WEBVTT

00:00:00.080 --> 00:00:02.430
The following content is
provided under a Creative

00:00:02.430 --> 00:00:03.810
Commons license.

00:00:03.810 --> 00:00:06.050
Your support will help
MIT OpenCourseWare

00:00:06.050 --> 00:00:10.160
continue to offer high quality
educational resources for free.

00:00:10.160 --> 00:00:12.700
To make a donation or to
view additional materials

00:00:12.700 --> 00:00:16.600
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:16.600 --> 00:00:17.263
at ocw.mit.edu.

00:00:26.590 --> 00:00:29.840
PROFESSOR: All right,
let's get started.

00:00:29.840 --> 00:00:33.336
So welcome to another exciting
lecture about security

00:00:33.336 --> 00:00:34.710
and why the world
is so terrible.

00:00:34.710 --> 00:00:37.390
So today we're going to
talk about private browsing

00:00:37.390 --> 00:00:38.860
modes, something
that a lot of you

00:00:38.860 --> 00:00:40.940
probably have a lot of
personal experience with.

00:00:40.940 --> 00:00:43.996
At a high level, what
is the goal of privacy?

00:00:43.996 --> 00:00:45.870
When security researchers
talk about privacy,

00:00:45.870 --> 00:00:47.140
what are they talking about?

00:00:47.140 --> 00:00:48.900
Well at a high level,
they're talking

00:00:48.900 --> 00:00:50.060
about the following goal.

00:00:50.060 --> 00:00:55.260
So any particular user should be
indistinguishable from a bunch

00:00:55.260 --> 00:00:56.200
of other users.

00:00:56.200 --> 00:00:58.560
In particular, the
activity of a given user

00:00:58.560 --> 00:01:02.480
should be non-incriminating
when viewed in light of activity

00:01:02.480 --> 00:01:05.099
from a bunch of other
different users.

00:01:05.099 --> 00:01:06.620
And so, as I
mentioned, today we're

00:01:06.620 --> 00:01:09.480
going to talk about privacy
in the specific context

00:01:09.480 --> 00:01:11.800
of private web browsing.

00:01:11.800 --> 00:01:14.460
And so there's actually
no formal definition

00:01:14.460 --> 00:01:16.960
for what private
web browsing means.

00:01:16.960 --> 00:01:19.020
There's a couple different
reasons for that.

00:01:19.020 --> 00:01:20.850
So one reason is
that web applications

00:01:20.850 --> 00:01:22.170
are very, very complicated.

00:01:22.170 --> 00:01:24.160
And they're adding new
features all the time

00:01:24.160 --> 00:01:26.140
like audio and video
capabilities and things

00:01:26.140 --> 00:01:26.870
like this.

00:01:26.870 --> 00:01:29.020
As a result, there's
this moving target

00:01:29.020 --> 00:01:30.910
in terms of what
browsers can do.

00:01:30.910 --> 00:01:32.520
And as a result,
what information

00:01:32.520 --> 00:01:35.470
they might be able to leak
about a particular user.

00:01:35.470 --> 00:01:37.740
And so what ends up happening
is that in practice,

00:01:37.740 --> 00:01:39.630
like with many things
involving browsers,

00:01:39.630 --> 00:01:41.990
there's this living standard.

00:01:41.990 --> 00:01:43.400
So different
browser vendors will

00:01:43.400 --> 00:01:45.420
implement different features,
particularly with respect

00:01:45.420 --> 00:01:46.370
to private browsing.

00:01:46.370 --> 00:01:49.150
Other vendors will look and
see what vendor X is doing.

00:01:49.150 --> 00:01:51.970
They will update
their own browser.

00:01:51.970 --> 00:01:54.260
So it's like a moving target.

00:01:54.260 --> 00:01:58.040
And as users grow to rely on
private browsing more and more,

00:01:58.040 --> 00:02:00.910
they end up a lot of times
actually finding bugs

00:02:00.910 --> 00:02:03.730
in private browsing mode,
as I'll discuss a couple

00:02:03.730 --> 00:02:05.430
minutes later in the lecture.

00:02:05.430 --> 00:02:07.510
And so private browsing
at a high level

00:02:07.510 --> 00:02:09.930
you can think of as
an aspirational goal.

00:02:09.930 --> 00:02:11.960
But we as a society
are continually

00:02:11.960 --> 00:02:14.070
refining what it means
to do private browsing

00:02:14.070 --> 00:02:16.470
and getting better
in some aspects--

00:02:16.470 --> 00:02:19.090
worse in some aspects-- as
we'll see a little bit later.

00:02:19.090 --> 00:02:22.920
So what exactly do we
mean by private browsing?

00:02:22.920 --> 00:02:23.440
It's tough.

00:02:23.440 --> 00:02:27.030
But the paper tries to formalize
it in two specific ways.

00:02:27.030 --> 00:02:31.640
So first of all, the paper
talks about a local attacker

00:02:31.640 --> 00:02:33.015
on private web browsing.

00:02:33.015 --> 00:02:34.640
This is someone who
is going to possess

00:02:34.640 --> 00:02:35.866
your machine after you've
finished a private browsing

00:02:35.866 --> 00:02:36.555
session.

00:02:36.555 --> 00:02:39.066
And they want to figure
out what sites you looked

00:02:39.066 --> 00:02:40.710
at in private browsing mode.

00:02:40.710 --> 00:02:44.520
And the paper also talks
about web attackers.

00:02:44.520 --> 00:02:47.425
The web attacker is someone
who controls the websites

00:02:47.425 --> 00:02:48.680
that you visit.

00:02:48.680 --> 00:02:51.750
And this web attacker might
want to try to figure out

00:02:51.750 --> 00:02:55.470
that you are some particular
person John or Jane as opposed

00:02:55.470 --> 00:02:58.460
to some amorphous user
that the website can't

00:02:58.460 --> 00:02:59.210
tell who they are.

00:02:59.210 --> 00:03:03.840
And so we'll look at each one
of these attacks in detail.

00:03:03.840 --> 00:03:07.400
But for now, suffice it to say
that if the attacker can launch

00:03:07.400 --> 00:03:10.488
both of these attacks--
both a local and a web

00:03:10.488 --> 00:03:12.783
attack-- that actually really
strengthens their ability

00:03:12.783 --> 00:03:14.160
to try to de-anonymize us.

00:03:14.160 --> 00:03:16.230
So, for example,
a local attacker

00:03:16.230 --> 00:03:18.820
who, for example, maybe
knows your IP address

00:03:18.820 --> 00:03:21.390
can actually talk to the
website and say, hey,

00:03:21.390 --> 00:03:23.640
have you seen this particular
IP address in your logs.

00:03:23.640 --> 00:03:24.310
If so, aha!

00:03:24.310 --> 00:03:28.744
You're looking at the user whose
machine I control right now.

00:03:28.744 --> 00:03:31.160
So it's actually pretty useful
from a security perspective

00:03:31.160 --> 00:03:33.299
to consider these
local and web attacks

00:03:33.299 --> 00:03:35.090
as separate things
and then to see

00:03:35.090 --> 00:03:37.110
how they can possibly compose.

00:03:37.110 --> 00:03:42.100
So let's look at this
first type of attacker,

00:03:42.100 --> 00:03:44.560
which is the local attacker.

00:03:49.880 --> 00:03:54.890
So as I mentioned, we
assume that this attacker

00:03:54.890 --> 00:04:02.700
is going to control the
user's machine post-session.

00:04:02.700 --> 00:04:07.784
And so by post-session, I
mean that the private browsing

00:04:07.784 --> 00:04:11.440
activity has already finished--
the user has perhaps gone

00:04:11.440 --> 00:04:12.809
off and done something else.

00:04:12.809 --> 00:04:13.850
They're not at the computer.

00:04:13.850 --> 00:04:15.891
And then the attacker
takes control of that machine

00:04:15.891 --> 00:04:17.890
and wants to figure
out what's going on.

00:04:17.890 --> 00:04:21.110
And so the security goal is
that, well, we don't want the attacker

00:04:21.110 --> 00:04:23.080
to be able to figure out
any of the websites

00:04:23.080 --> 00:04:27.310
that the user visited during
this private browsing activity.

00:04:27.310 --> 00:04:29.885
Now, the reason why the
post is actually important

00:04:29.885 --> 00:04:32.420
there is because if we assume
that the attacker can control

00:04:32.420 --> 00:04:34.787
the machine before the
user's private browsing,

00:04:34.787 --> 00:04:37.370
then basically it's game over,
right, because the attacker can

00:04:37.370 --> 00:04:41.730
install a keystroke logger-- the
attacker can subvert the binary

00:04:41.730 --> 00:04:44.120
that [INAUDIBLE] the browser.

00:04:44.120 --> 00:04:45.980
The attacker can subvert the OS.

00:04:45.980 --> 00:04:50.460
So we don't really care about
this pre-session attacker.

00:04:50.460 --> 00:04:52.800
And also note that we're not
trying to provide privacy

00:04:52.800 --> 00:04:57.189
for the user after the attacker
has controlled the machine.

00:04:57.189 --> 00:04:58.480
And that's for the same reason.

00:04:58.480 --> 00:04:59.910
Once the attacker
gets to the machine,

00:04:59.910 --> 00:05:02.535
he or she can do the same thing
that I mentioned-- a key logger.

00:05:02.535 --> 00:05:05.530
So, basically, once the
user leaves the machine,

00:05:05.530 --> 00:05:08.840
we don't assume any
forward notions of privacy.

00:05:08.840 --> 00:05:09.740
Does that make sense?

00:05:09.740 --> 00:05:11.580
It's pretty straightforward.

00:05:11.580 --> 00:05:14.700
And so you can imagine that
another goal that you might

00:05:14.700 --> 00:05:17.950
want to try to
satisfy here is you

00:05:17.950 --> 00:05:20.150
might want to try to
hide from the attacker

00:05:20.150 --> 00:05:23.820
that the user was employing
private browsing mode at all.

00:05:23.820 --> 00:05:26.290
Now the paper actually
said that's very difficult.

00:05:26.290 --> 00:05:28.530
This property is often
called plausible deniability.

00:05:28.530 --> 00:05:31.389
So your boss comes up to you
after you use private browsing,

00:05:31.389 --> 00:05:33.305
and says were you looking
at mylittlepony.com?

00:05:33.305 --> 00:05:35.027
No, no, I certainly wasn't.

00:05:35.027 --> 00:05:37.110
And I certainly wasn't
using private browsing mode

00:05:37.110 --> 00:05:39.444
to hide the fact that I was
looking at mylittlepony.com.

00:05:39.444 --> 00:05:40.818
So as I said, the
paper said it's

00:05:40.818 --> 00:05:42.390
difficult to provide
this property

00:05:42.390 --> 00:05:43.770
of plausible deniability.

00:05:43.770 --> 00:05:45.300
I'll give you some
concrete reasons

00:05:45.300 --> 00:05:47.190
why this might be the
case a little bit later

00:05:47.190 --> 00:05:48.570
on in the lecture.

00:05:48.570 --> 00:05:51.440
But that basically is an
overview of the local attacker.

00:05:51.440 --> 00:05:55.142
So one question we might
want to think about

00:05:55.142 --> 00:06:05.760
is what kinds of persistent
client-side state

00:06:05.760 --> 00:06:11.041
can be leaked by a
private browsing session?

00:06:14.730 --> 00:06:18.796
And by persistent,
I just mean stuff

00:06:18.796 --> 00:06:22.022
that will end up getting
stored on the local hard disk,

00:06:22.022 --> 00:06:24.716
the local SSD or whatever.

00:06:24.716 --> 00:06:27.215
So what kinds of state might
be leaked if we weren't careful

00:06:27.215 --> 00:06:29.381
when someone is doing this
type of private browsing?

00:06:29.381 --> 00:06:31.010
So one thing you
might be worried about

00:06:31.010 --> 00:06:38.150
is JavaScript-accessible state.

00:06:38.150 --> 00:06:45.391
So examples of this include
things like cookies and DOM

00:06:45.391 --> 00:06:45.890
storage.

00:06:49.400 --> 00:06:52.220
Another thing you might
be worried about--

00:06:52.220 --> 00:06:55.049
and this is what most people
think about when they think

00:06:55.049 --> 00:06:57.090
about what they want to
say in private browsing--

00:06:57.090 --> 00:06:59.514
is maybe the browser cache.

00:06:59.514 --> 00:07:01.680
So you don't want someone
to look in your cache

00:07:01.680 --> 00:07:04.200
and figure out here
are some images or HTML

00:07:04.200 --> 00:07:05.908
files from websites
you prefer people

00:07:05.908 --> 00:07:07.116
didn't know that you visited.

00:07:10.620 --> 00:07:16.377
Another important thing is
your history of visited sites.

00:07:19.680 --> 00:07:21.486
So many of your
relationships have

00:07:21.486 --> 00:07:24.032
been broken when the other person goes
to the browser-- starts typing

00:07:24.032 --> 00:07:26.240
something into the address
bar and all of a sudden

00:07:26.240 --> 00:07:28.281
it auto-completes to
something very embarrassing.

00:07:28.281 --> 00:07:30.245
So this is one
thing definitely you

00:07:30.245 --> 00:07:33.610
don't want to leak outside
the private browsing session.

00:07:33.610 --> 00:07:38.610
You can also think about
configuration state

00:07:38.610 --> 00:07:39.830
within the browser.

00:07:39.830 --> 00:07:43.270
And so here you could
think about things

00:07:43.270 --> 00:07:47.330
like client certificates.

00:07:47.330 --> 00:07:51.725
You could also think about
stuff like bookmarks.

00:07:55.170 --> 00:07:58.262
Maybe if you logged into a
particular site and the browser

00:07:58.262 --> 00:08:00.178
offers to store your
password, that's another type

00:08:00.178 --> 00:08:02.455
of configuration state that
you might not want leaking

00:08:02.455 --> 00:08:05.210
from private browsing mode.

00:08:05.210 --> 00:08:09.940
Downloaded files--
as we'll discuss,

00:08:09.940 --> 00:08:12.520
this one's a little
bit interesting

00:08:12.520 --> 00:08:15.972
because downloading a file
actually requires explicit user

00:08:15.972 --> 00:08:17.180
action to download that file.

00:08:17.180 --> 00:08:18.876
Maybe we do actually
want this stuff

00:08:18.876 --> 00:08:20.610
to leak outside of
private browsing mode.

00:08:20.610 --> 00:08:22.780
Maybe if you download something
in private browsing mode,

00:08:22.780 --> 00:08:25.247
it should actually be accessible
when you open the browser

00:08:25.247 --> 00:08:26.830
or use the machine
after that session.

00:08:26.830 --> 00:08:30.660
So we'll talk about this
a little bit in a second.

00:08:30.660 --> 00:08:34.320
And then, finally, during
private browsing mode,

00:08:34.320 --> 00:08:39.261
you might install new
plug-ins or browser extensions.

00:08:43.489 --> 00:08:45.030
That's another type
of state that you

00:08:45.030 --> 00:08:48.970
might imagine you don't
want to leak outside

00:08:48.970 --> 00:08:50.160
of private browsing mode.

00:08:50.160 --> 00:08:54.040
So, basically, current
private browsing modes typically

00:08:54.040 --> 00:08:58.750
try to prevent one, two, and
three from leaking outside

00:08:58.750 --> 00:09:00.351
of private browsing sessions.

00:09:00.351 --> 00:09:00.850
Right?

00:09:00.850 --> 00:09:02.808
So there shouldn't be
any cookies or DOM storage

00:09:02.808 --> 00:09:03.732
that gets out of there.

00:09:03.732 --> 00:09:05.940
Anything you put in a cache
during a private browsing

00:09:05.940 --> 00:09:07.404
session should be deleted.

00:09:07.404 --> 00:09:09.320
And you shouldn't have
any history of the URLs

00:09:09.320 --> 00:09:11.160
that you're using.

00:09:11.160 --> 00:09:14.510
Typically, private browsing modes
allow four, five, and six

00:09:14.510 --> 00:09:16.890
to leak
outside of a session.

00:09:16.890 --> 00:09:19.220
And there's some good
and some bad reasons

00:09:19.220 --> 00:09:20.570
why this might be the case.

00:09:20.570 --> 00:09:22.400
And as we'll
discuss later, we'll

00:09:22.400 --> 00:09:25.050
see if you allow anything to
leak from the private browsing

00:09:25.050 --> 00:09:28.130
session, that actually
radically increases the threat

00:09:28.130 --> 00:09:29.470
surface for privacy leaks.

00:09:29.470 --> 00:09:31.540
So it becomes much more
difficult to reason

00:09:31.540 --> 00:09:33.240
about what the
security properties are

00:09:33.240 --> 00:09:34.429
for private browsing mode.

00:09:34.429 --> 00:09:35.470
Does that all make sense?

00:09:35.470 --> 00:09:38.370
Anyone have any questions?

00:09:38.370 --> 00:09:41.320
It's pretty straightforward.

00:09:41.320 --> 00:09:45.320
So the next thing we're going
to talk about very briefly

00:09:45.320 --> 00:09:50.704
is network activity during
private browsing mode.

00:09:50.704 --> 00:09:53.460
And what's
interesting about this

00:09:53.460 --> 00:09:56.911
is that even if we
cover all this stuff--

00:09:56.911 --> 00:09:58.910
we don't allow private
browsing to leak anything

00:09:58.910 --> 00:10:02.060
from there-- the mere fact that
you're issuing network packets

00:10:02.060 --> 00:10:04.650
can leave evidence
of what you were doing.

00:10:04.650 --> 00:10:06.513
So imagine when you
want to go to foo.com,

00:10:06.513 --> 00:10:08.540
the website, your
machine actually

00:10:08.540 --> 00:10:12.259
has to issue a DNS resolution
request for foo.com.

00:10:12.259 --> 00:10:14.050
So even if you don't
leave any of this type

00:10:14.050 --> 00:10:15.508
of persistent state
up there, there

00:10:15.508 --> 00:10:18.310
may be records in
your local DNS cache

00:10:18.310 --> 00:10:21.681
that you, in fact, tried to
resolve the hostname foo.com.

00:10:21.681 --> 00:10:22.680
That's very interesting.

00:10:22.680 --> 00:10:24.680
So you can imagine
that browsers could

00:10:24.680 --> 00:10:27.660
try to flush the
DNS cache somehow

00:10:27.660 --> 00:10:29.390
after the private
session was over.

00:10:29.390 --> 00:10:30.370
Now, in practice,
that's actually

00:10:30.370 --> 00:10:31.911
tricky to do because
on many systems,

00:10:31.911 --> 00:10:34.480
you require administrator
privileges to do that.

00:10:34.480 --> 00:10:37.360
So it's not clear if you want
the browser running as root

00:10:37.360 --> 00:10:40.115
because browsers, as
we've seen, are somewhat

00:10:40.115 --> 00:10:42.020
untrustworthy individuals.

00:10:42.020 --> 00:10:44.130
And also too-- a
lot of DNS flush

00:10:44.130 --> 00:10:46.290
commands-- they don't
actually act per user.

00:10:46.290 --> 00:10:47.707
They flush the
entire cache, which

00:10:47.707 --> 00:10:50.165
is typically not what you would
want if you're implementing

00:10:50.165 --> 00:10:51.250
private browsing mode.

00:10:51.250 --> 00:10:53.000
You'd want to use a
type of surgical thing

00:10:53.000 --> 00:10:55.472
where I only want to get rid
of foo.com and things that

00:10:55.472 --> 00:10:57.597
were visited during this
private browsing session,

00:10:57.597 --> 00:10:59.470
but not delete other things.

00:10:59.470 --> 00:11:02.462
So in practice, that's kind
of a tricky thing to handle.

00:11:02.462 --> 00:11:03.920
And another tricky
thing to handle,

00:11:03.920 --> 00:11:08.310
which the paper mentions--
are these things

00:11:08.310 --> 00:11:10.620
that I'll call RAM artifacts.

00:11:13.330 --> 00:11:18.210
So the basic idea here is that
during private browsing mode,

00:11:18.210 --> 00:11:22.320
that private browser has to be
keeping some stuff in memory.

00:11:22.320 --> 00:11:24.290
And so even if the
private browsing mode

00:11:24.290 --> 00:11:29.630
doesn't issue any direct
I/Os to disk-- writes--

00:11:29.630 --> 00:11:32.620
the RAM that belongs to
that private browsing tab

00:11:32.620 --> 00:11:35.720
can still be reflected into
the page file, for example.

00:11:35.720 --> 00:11:38.490
It can still be reflected
into the hibernation file,

00:11:38.490 --> 00:11:40.330
for example, on a laptop.

00:11:40.330 --> 00:11:42.500
And so if that
state gets reflected

00:11:42.500 --> 00:11:45.450
into persistent storage, then
what may end up happening

00:11:45.450 --> 00:11:47.740
is that after your private
browsing session is over,

00:11:47.740 --> 00:11:50.160
the attacker can look in
your page file, for example,

00:11:50.160 --> 00:11:52.420
and find, for example,
JavaScript code that

00:11:52.420 --> 00:11:54.480
was reflected to
disk or find HTML

00:11:54.480 --> 00:11:56.320
that was reflected to disk.

00:11:56.320 --> 00:11:59.650
So we're going to have a
little demonstration of how

00:11:59.650 --> 00:12:01.160
this might work.

00:12:01.160 --> 00:12:04.080
So if you see up
here on the screen,

00:12:04.080 --> 00:12:09.750
I basically loaded up
a private browsing tab.

00:12:09.750 --> 00:12:11.890
And so what I'm
going to do is I'm

00:12:11.890 --> 00:12:15.810
going to go to some website.

00:12:15.810 --> 00:12:21.360
So this is for the PDOS
group here at CSAIL.

00:12:21.360 --> 00:12:23.160
I've loaded up that page.

00:12:23.160 --> 00:12:25.090
And then what I'm
going to do is use

00:12:25.090 --> 00:12:28.540
this fun command called gcore.

00:12:28.540 --> 00:12:30.740
So, basically, I'm
going to take a memory

00:12:30.740 --> 00:12:34.200
snapshot of this running page.

00:12:34.200 --> 00:12:37.410
And so I will do
the following magic.

00:12:48.780 --> 00:12:53.150
So basically there's
going to be some work

00:12:53.150 --> 00:12:58.345
that my terminal is doing to
generate that memory snapshot.

00:13:02.270 --> 00:13:04.980
So this takes a little
bit of time sometimes.

00:13:10.400 --> 00:13:16.220
Now, what's happening here.

00:13:16.220 --> 00:13:18.840
So now we've basically
generated the core file

00:13:18.840 --> 00:13:20.987
for that private browsing image.

00:13:20.987 --> 00:13:22.570
So what we're going
to do now is we're

00:13:22.570 --> 00:13:26.550
going to look
inside of that image

00:13:26.550 --> 00:13:30.790
and see if we can find
any mentions of PDOS.

00:13:33.832 --> 00:13:35.290
And so what's
interesting is we see

00:13:35.290 --> 00:13:39.410
a ton of instances of the
string PDOS in that memory image

00:13:39.410 --> 00:13:41.330
for the private browsing mode.

00:13:41.330 --> 00:13:43.590
And so what is interesting
is we actually see

00:13:43.590 --> 00:13:45.610
various prefixes for things.

00:13:45.610 --> 00:13:48.860
If we look further
up, we can see things

00:13:48.860 --> 00:13:52.630
like there's full URLs
here and things like this.

00:13:52.630 --> 00:13:55.280
You also find HTML
code in there.

00:13:55.280 --> 00:13:58.580
So the point here is
that if we found all this

00:13:58.580 --> 00:14:02.780
in the memory of that page, then
if this-- if any of those pages

00:14:02.780 --> 00:14:05.786
got put to disk in the
page file, then the attacker

00:14:05.786 --> 00:14:07.160
could basically
just run strings.

00:14:07.160 --> 00:14:09.602
So they could do what I
just did over the page file

00:14:09.602 --> 00:14:11.560
and try to find out what
sites you visited

00:14:11.560 --> 00:14:13.340
in private browsing mode.
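
NOTE
A minimal sketch in C of the kind of scan described above, assuming the
attacker already has a memory dump or page file on disk. The lecture itself
uses the strings and grep tools; the function name count_in_file and any
file names passed to it are hypothetical and only illustrate the idea of
searching raw bytes for a known site name.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Count how often `needle` occurs in the raw bytes of the file at `path`.
   Returns -1 if the file cannot be read. Hypothetical helper, not from the lecture.
   For simplicity the whole file is read into memory; a real page file may be
   too large for that and would need to be scanned in chunks. */
long count_in_file(const char *path, const char *needle) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return -1; }
    rewind(f);
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(f); return -1; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    size_t n = strlen(needle);
    long hits = 0;
    /* Naive byte-by-byte comparison; binary data around the match is ignored. */
    for (size_t i = 0; n > 0 && i + n <= got; i++) {
        if (memcmp(buf + i, needle, n) == 0) hits++;
    }
    free(buf);
    return hits;
}
```
Calling count_in_file on a dump with a needle such as "pdos" returns the
number of raw matches, which is roughly what piping strings into grep shows.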

00:14:13.340 --> 00:14:14.340
So does that make sense?

00:14:14.340 --> 00:14:17.540
Basically, the problem here
is that private browsing modes

00:14:17.540 --> 00:14:20.226
don't try to obfuscate RAM
basically or encrypt it

00:14:20.226 --> 00:14:21.051
in any way.

00:14:21.051 --> 00:14:23.300
And that seems like a pretty
fundamental thing because

00:14:23.300 --> 00:14:24.780
at a certain point,
the processor

00:14:24.780 --> 00:14:27.600
has to execute on
clear text data.

00:14:27.600 --> 00:14:31.814
And so this is actually
a pretty big challenge.

00:14:31.814 --> 00:14:33.230
So does anyone
have any questions?

00:14:33.230 --> 00:14:34.200
Yeah?

00:14:34.200 --> 00:14:37.472
AUDIENCE: So one thing is
I don't expect my browser

00:14:37.472 --> 00:14:38.951
to do that.

00:14:38.951 --> 00:14:41.021
One thing is that these
browsers-- the guarantee

00:14:41.021 --> 00:14:42.895
that they give you
through private browsing--

00:14:42.895 --> 00:14:45.360
the example they give is if
you're shopping for something,

00:14:45.360 --> 00:14:47.825
your layman friend
can't go on the computer

00:14:47.825 --> 00:14:49.179
and see the things.

00:14:49.179 --> 00:14:51.679
So can you talk a little bit
about what guarantees they give

00:14:51.679 --> 00:14:52.950
and if they had
to change anything

00:14:52.950 --> 00:14:54.360
as a consequence of this paper?

00:14:54.360 --> 00:14:57.030
PROFESSOR: Yeah, it's
very interesting.

00:14:57.030 --> 00:14:58.960
One thing you can look
at is when you open up

00:14:58.960 --> 00:15:00.335
a private browsing
tab, typically

00:15:00.335 --> 00:15:02.630
there will be a little
blurb that says, hey,

00:15:02.630 --> 00:15:03.730
welcome to incognito mode.

00:15:03.730 --> 00:15:05.230
Here's what we'll
help you against.

00:15:05.230 --> 00:15:07.200
We won't help you if someone
is standing behind you

00:15:07.200 --> 00:15:08.741
with a rubber hose
about to beat you.

00:15:08.741 --> 00:15:10.850
And so the browser
vendors themselves

00:15:10.850 --> 00:15:14.737
are a little bit cagey about
what guarantees they provide.

00:15:14.737 --> 00:15:17.320
And in fact, after the Snowden
incident, a lot of the browsers

00:15:17.320 --> 00:15:18.800
actually changed
that splash page

00:15:18.800 --> 00:15:20.533
because they wanted to
actually make it clear

00:15:20.533 --> 00:15:22.658
that we're not actually
protecting you from strong adversaries

00:15:22.658 --> 00:15:24.650
like the NSA or
something like that.

00:15:24.650 --> 00:15:26.505
So long story short,
what guarantees

00:15:26.505 --> 00:15:27.660
are they providing you?

00:15:27.660 --> 00:15:30.502
In practice, they're
providing that weak thing

00:15:30.502 --> 00:15:31.460
that you mentioned there.

00:15:31.460 --> 00:15:33.266
It's like a lay
person who wanted

00:15:33.266 --> 00:15:34.990
to see what you were
doing afterwards

00:15:34.990 --> 00:15:36.990
couldn't figure out
what you were doing.

00:15:36.990 --> 00:15:38.365
And we're assuming
the lay person

00:15:38.365 --> 00:15:41.227
can't run strings on the page
file or things like that.

00:15:41.227 --> 00:15:43.560
Now, the problem-- there's
actually two problems though.

00:15:43.560 --> 00:15:47.835
One problem is that first
of all, because browsers

00:15:47.835 --> 00:15:49.720
are so complicated,
they often don't even

00:15:49.720 --> 00:15:50.970
protect against the layperson.

00:15:50.970 --> 00:15:52.520
I can give you a
personal example.

00:15:52.520 --> 00:15:56.920
So a lot of times when you
see those ridiculous ads

00:15:56.920 --> 00:15:58.670
from "Huffington Post,"
like, oh, my gosh.

00:15:58.670 --> 00:16:00.920
It's like puppies trying to
help small puppies go down

00:16:00.920 --> 00:16:02.310
stairs and things like that.

00:16:02.310 --> 00:16:03.340
Right?

00:16:03.340 --> 00:16:06.010
Because I'm weak, I will
sometimes click on those things.

00:16:06.010 --> 00:16:08.093
But because I don't
want people to know that,

00:16:08.093 --> 00:16:10.734
I'll sometimes do that
in private browsing mode.

00:16:10.734 --> 00:16:12.900
So what will happen
is that sometimes I'll

00:16:12.900 --> 00:16:16.610
see those URLs
leak into my URL history

00:16:16.610 --> 00:16:19.240
in my regular, public
mode browser, which

00:16:19.240 --> 00:16:22.220
is precisely what this
stuff is designed not to do.

00:16:22.220 --> 00:16:25.346
So one problem is that
sometimes these browsers

00:16:25.346 --> 00:16:27.720
don't provide protection
against the layperson attackers.

00:16:27.720 --> 00:16:29.844
The second thing is I think
that there are actually

00:16:29.844 --> 00:16:32.270
a lot of people who would
like for private browsing mode

00:16:32.270 --> 00:16:34.652
to provide something
stronger, particularly

00:16:34.652 --> 00:16:36.640
with the whole Snowden thing.

00:16:36.640 --> 00:16:37.280
I think there are a lot
of people, increasingly,

00:16:37.280 --> 00:16:39.279
who would like private
browsing mode to protect,

00:16:39.279 --> 00:16:41.510
for example, against these
RAM artifact attacks,

00:16:41.510 --> 00:16:44.310
even though they may not be
able to technically articulate

00:16:44.310 --> 00:16:45.617
that goal.

00:16:45.617 --> 00:16:47.200
And so actually one
of the things I've

00:16:47.200 --> 00:16:48.120
done while I've been
here is to do

00:16:48.120 --> 00:16:50.351
some research on stronger
private browsing mode

00:16:50.351 --> 00:16:50.850
protection.

00:16:50.850 --> 00:16:52.350
So we can chat about
that afterwards.

00:16:52.350 --> 00:16:54.308
One of the things we
learn about all professors

00:16:54.308 --> 00:16:56.763
is that we will talk about
our research endlessly.

00:16:56.763 --> 00:16:59.013
So if you want to talk about
that for three hours just

00:16:59.013 --> 00:17:00.150
send me a calendar request.

00:17:00.150 --> 00:17:02.490
And we can do that.

00:17:02.490 --> 00:17:06.099
So, anyway, this is
basically a demonstration.

00:17:06.099 --> 00:17:07.180
Oh, you had a question?

00:17:07.180 --> 00:17:09.400
AUDIENCE: Yeah, about the RAM.

00:17:09.400 --> 00:17:12.375
So I'm not familiar with
how it works exactly.

00:17:12.375 --> 00:17:15.333
How come a browser can't
at the end of a session,

00:17:15.333 --> 00:17:19.384
just ask the OS to flush those
parts of RAM that it was using?

00:17:19.384 --> 00:17:20.800
PROFESSOR: So we're
actually going

00:17:20.800 --> 00:17:23.437
to get to that topic
in a couple of minutes.

00:17:23.437 --> 00:17:24.270
But you are correct.

00:17:24.270 --> 00:17:26.920
At a high level,
what you can imagine

00:17:26.920 --> 00:17:30.280
is that maybe the OS when it,
for example, killed a process,

00:17:30.280 --> 00:17:32.510
would actually go through
all those memory pages

00:17:32.510 --> 00:17:34.540
and write zeros to
all those pages.

00:17:34.540 --> 00:17:37.375
Or you could also imagine
that maybe the browser tried

00:17:37.375 --> 00:17:40.140
to pin all the pages in
memory to prevent anything

00:17:40.140 --> 00:17:42.410
from getting flushed out at all.

00:17:42.410 --> 00:17:44.880
So there are some
solutions that can do that.

00:17:44.880 --> 00:17:48.040
So hold onto that
question for one second.

00:17:48.040 --> 00:17:50.170
This is basically an
example of how data from RAM

00:17:50.170 --> 00:17:53.060
can leak onto disk
through paging activity.

00:17:53.060 --> 00:17:58.460
But note that data lifetime
is a bigger problem than just

00:17:58.460 --> 00:18:00.797
in the context of
private browsing.

00:18:00.797 --> 00:18:02.880
You can imagine that any
program that deals with,

00:18:02.880 --> 00:18:05.740
let's say, cryptographic
keys or user passwords

00:18:05.740 --> 00:18:06.930
will have this problem.

00:18:06.930 --> 00:18:09.540
Anytime you type in your
password to a program,

00:18:09.540 --> 00:18:12.330
the memory page which holds
that password can always get

00:18:12.330 --> 00:18:13.080
reflected to disk.

00:18:13.080 --> 00:18:17.560
So let me show you
another example of this.

00:18:17.560 --> 00:18:24.140
So let's say that we looked at
the following program, which

00:18:24.140 --> 00:18:25.660
is pretty simple.

00:18:25.660 --> 00:18:26.890
It's called memclear.

00:18:26.890 --> 00:18:28.700
So you see here at
the bottom, in main,

00:18:28.700 --> 00:18:33.410
we're just going to read in
some secret text file here.

00:18:33.410 --> 00:18:35.420
And then we're just
going to sleep forever.

00:18:35.420 --> 00:18:37.990
So what does that Read Secret do?

00:18:37.990 --> 00:18:42.470
Basically, it reads from the file.

00:18:42.470 --> 00:18:48.790
It's going to print out
the contents of that file.

00:18:48.790 --> 00:18:50.770
And then it's actually
going to clear out

00:18:50.770 --> 00:18:54.040
the buffer that was used to
store that secret information.
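
NOTE
A rough reconstruction in C of a program like the one described, offered as
a sketch only: the actual memclear source is not shown in this transcript.
The names read_secret and secret.txt, the buffer size, and the
MEMCLEAR_DEMO_MAIN switch are assumptions made for illustration.
```c
#include <stdio.h>
#include <string.h>
/* Read the file at `path`, print its contents, then wipe the local buffer.
   Returns the number of bytes read, or -1 if the file cannot be opened.
   Hypothetical reconstruction, not the lecture's actual code. */
long read_secret(const char *path) {
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    printf("%s", buf);
    /* Wipe only the application-visible copy. A compiler may remove this
       dead store; explicit_bzero (glibc, BSD) is one alternative. */
    memset(buf, 0, sizeof buf);
    return (long)n;
}
#ifdef MEMCLEAR_DEMO_MAIN
#include <unistd.h>
/* Demo entry point: read the secret once, then sleep forever so that the
   process stays alive and can be inspected with a tool such as gcore. */
int main(void) {
    read_secret("secret.txt");
    for (;;) sleep(60);
}
#endif
```
Compiling with -DMEMCLEAR_DEMO_MAIN gives a process that prints the file
and then idles, which matches the behavior described in the lecture.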

00:18:54.040 --> 00:18:55.290
So getting back to your issue.

00:18:55.290 --> 00:18:57.123
So one can imagine the
browser, for example,

00:18:57.123 --> 00:19:01.180
would try to just memset to
zero all the secrets that it

00:19:01.180 --> 00:19:05.940
encountered when it's
in private browsing mode.

00:19:05.940 --> 00:19:12.810
So if we look at the secret
file, it's not very fun.

00:19:12.810 --> 00:19:14.660
It just says, my
secrets are in a file.

00:19:14.660 --> 00:19:21.230
And then if we run this
program, in the background--

00:19:21.230 --> 00:19:22.210
so what did it do?

00:19:22.210 --> 00:19:24.040
So like I said, it
just printed it out.

00:19:24.040 --> 00:19:26.970
It read that file in, printed
out the secret value--

00:19:26.970 --> 00:19:28.930
cleared the memory
buffer that it

00:19:28.930 --> 00:19:30.400
used to print that stuff out.

00:19:30.400 --> 00:19:32.870
Now it's just sleeping
in the background.

00:19:32.870 --> 00:19:39.190
So once again, if we use
this fun gcore command,

00:19:39.190 --> 00:19:44.050
we can take a memory dump
of the memclear program

00:19:44.050 --> 00:19:46.730
that's running in
memory right now.

00:19:46.730 --> 00:19:51.050
OK, and then if we
do-- let's see which

00:19:51.050 --> 00:19:54.440
ones we're going to look at.

00:19:54.440 --> 00:20:00.390
So then if we look at--
this guy is the one we want.

00:20:00.390 --> 00:20:05.540
And then we do a
grep for a secret.

00:20:05.540 --> 00:20:07.650
So once again, we
see that if we look

00:20:07.650 --> 00:20:11.700
in the RAM image of
that running program,

00:20:11.700 --> 00:20:14.050
we found instances of
both the file name that

00:20:14.050 --> 00:20:17.240
was read in and also some
prefixes of the string

00:20:17.240 --> 00:20:20.390
contents of that
file, even though we

00:20:20.390 --> 00:20:24.190
wiped the buffer in
the C program itself.

00:20:24.190 --> 00:20:26.810
So you might say
why did this happen?

00:20:26.810 --> 00:20:28.720
This seems very, very strange.

00:20:28.720 --> 00:20:30.860
And the reason is that if
you think about the way

00:20:30.860 --> 00:20:34.710
that I/O works, it's
a layered kind of thing.

00:20:34.710 --> 00:20:37.830
So by the time that the
contents of that file

00:20:37.830 --> 00:20:41.320
get to the program, it's
already gone through, let's say,

00:20:41.320 --> 00:20:42.070
the kernel memory.

00:20:42.070 --> 00:20:45.740
It's already gone through maybe
like the C Standard Library

00:20:45.740 --> 00:20:47.430
to do I/O because
that library does

00:20:47.430 --> 00:20:48.980
buffering and stuff like that.

00:20:48.980 --> 00:20:50.730
And so what ends up
happening is that even

00:20:50.730 --> 00:20:54.870
if you memset the
application visible buffer,

00:20:54.870 --> 00:20:57.757
there are still instances
of secret data lying

00:20:57.757 --> 00:21:00.590
in many different places
throughout the system.

00:21:00.590 --> 00:21:02.300
And this is looking
at the user mode

00:21:02.300 --> 00:21:03.960
portion of this application.

00:21:03.960 --> 00:21:06.530
So there's probably still
data sitting around in maybe

00:21:06.530 --> 00:21:09.110
like the kernel I/O buffers
or things like that.
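
NOTE
Editor's sketch, not the lecture's memclear.c. A rough Python analogue,
assuming a hypothetical secret file and hypothetical names (SECRET,
read_and_wipe), of why wiping only the application-visible buffer
leaves other copies behind. The bytes object stands in for a lower
I/O layer's buffer; real kernel and library buffers are not modeled.
```python
import os
import tempfile
SECRET = b"my secrets are in a file"  # hypothetical file contents
def read_and_wipe(path):
    """Read a file, then zero only the application-visible buffer."""
    with open(path, "rb") as f:
        lower_layer_copy = f.read()  # stand-in for a library or kernel buffer
    app_buffer = bytearray(lower_layer_copy)  # the copy the application works with
    for i in range(len(app_buffer)):  # rough analogue of memset(buf, 0, len)
        app_buffer[i] = 0
    return app_buffer, lower_layer_copy
fd, path = tempfile.mkstemp()
os.write(fd, SECRET)
os.close(fd)
wiped, leftover = read_and_wipe(path)
os.remove(path)
print(wiped == bytearray(len(SECRET)))  # True when the application buffer is all zeros
print(leftover == SECRET)  # True when the other copy still holds the secret
```
Both checks holding at once is the point: the wipe succeeded, yet
the secret is still somewhere in the process.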

00:21:09.110 --> 00:21:10.740
So getting back
to your question,

00:21:10.740 --> 00:21:13.800
if you want to do what they
call secure deallocation,

00:21:13.800 --> 00:21:17.655
you can't just rely on
mechanisms at the application

00:21:17.655 --> 00:21:20.730
level because there may be other
places where that data lives.

00:21:20.730 --> 00:21:22.720
So what are some
examples of other places

00:21:22.720 --> 00:21:26.110
where this data might live?

00:21:26.110 --> 00:21:33.540
So, for example, it might
live in process memory.

00:21:33.540 --> 00:21:41.470
So these are things like
the heap and the stack.

00:21:41.470 --> 00:21:45.080
So when we did that memset
inside of memclear.c,

00:21:45.080 --> 00:21:47.520
we were basically
trying to address this.

00:21:47.520 --> 00:21:50.530
But what we found
out is that that

00:21:50.530 --> 00:21:54.260
is necessary, but insufficient
to actually clear all instances

00:21:54.260 --> 00:21:56.490
of that secret from memory.

00:21:56.490 --> 00:22:02.830
So where else might RAM
artifacts live or secret data

00:22:02.830 --> 00:22:05.790
persist-- so all
kinds of files--

00:22:05.790 --> 00:22:11.590
backups-- SQLite databases.

00:22:14.900 --> 00:22:18.291
If at any point, an application
takes something in RAM

00:22:18.291 --> 00:22:20.540
and writes it to one of these
things, then once again,

00:22:20.540 --> 00:22:23.640
the attacker may be able to
recover that after the attacker

00:22:23.640 --> 00:22:25.500
controls the disk.

00:22:25.500 --> 00:22:33.080
As I mentioned, kernel
memory is another common place

00:22:33.080 --> 00:22:35.840
where RAM secrets may
live because, once

00:22:35.840 --> 00:22:39.650
again, applications typically
do layered I/O in which

00:22:39.650 --> 00:22:42.580
each piece of data goes through
multiple parts of the stack.

00:22:42.580 --> 00:22:45.560
Think of like network
transmission, for example.

00:22:45.560 --> 00:22:48.052
First, the data has to come
to some network buffer that's

00:22:48.052 --> 00:22:49.420
probably inside the kernel.

00:22:49.420 --> 00:22:52.400
Then once again, it probably
goes through some buffers

00:22:52.400 --> 00:22:54.170
inside the C Standard Library.

00:22:54.170 --> 00:22:57.334
And then finally it will
go to the user mode--

00:22:57.334 --> 00:22:59.500
the part of the application
that the developer wrote

00:22:59.500 --> 00:23:02.400
him or herself.

00:23:02.400 --> 00:23:04.140
So that can actually
be a big problem.

00:23:04.140 --> 00:23:07.500
You can also think too of
freed memory pages as being

00:23:07.500 --> 00:23:09.900
a place where data can leak.

00:23:09.900 --> 00:23:14.610
So imagine that your
application allocates

00:23:14.610 --> 00:23:18.370
a bunch of memory using
whatever [INAUDIBLE] or whatnot.

00:23:18.370 --> 00:23:20.730
And then that process dies.

00:23:20.730 --> 00:23:23.190
And the kernel spins
up another process

00:23:23.190 --> 00:23:26.980
but hasn't actually zeroed
out all the physical RAM pages.

00:23:26.980 --> 00:23:29.912
So what could happen is that
when that new process spins up,

00:23:29.912 --> 00:23:32.370
it could just do a walk through
all these physical RAM pages

00:23:32.370 --> 00:23:33.995
and use a bunch of
memory and just do

00:23:33.995 --> 00:23:36.430
the same thing-- do the
strings thing-- see if there's

00:23:36.430 --> 00:23:38.090
anything interesting there.

00:23:38.090 --> 00:23:40.321
And then they might be able
to get secrets that way.

00:23:40.321 --> 00:23:41.820
So there's a lot
of ways information

00:23:41.820 --> 00:23:44.140
is leaked from the kernel.

00:23:44.140 --> 00:23:47.130
You could also think about
I/O buffers for things

00:23:47.130 --> 00:23:50.230
like the keyboard or
things like the mouse.

00:23:50.230 --> 00:23:52.870
There's just a bunch
of different vectors

00:23:52.870 --> 00:23:55.430
that data can leak
through the kernel.

00:23:59.650 --> 00:24:03.366
How might an attacker try to
get some of this information?

00:24:03.366 --> 00:24:04.740
Well, in some
cases, it's just as

00:24:04.740 --> 00:24:09.520
simple as reading the files--
so just read the page file.

00:24:09.520 --> 00:24:13.350
Read the hibernation file
and just see what's in there.

00:24:13.350 --> 00:24:16.060
Some file formats actually
embed different versions

00:24:16.060 --> 00:24:17.360
within themselves.

00:24:17.360 --> 00:24:19.610
For example, the way that
Microsoft Word used to work

00:24:19.610 --> 00:24:22.180
is that a single Word file
would actually contain versions

00:24:22.180 --> 00:24:23.850
of old pieces of data.

00:24:23.850 --> 00:24:25.910
So if you could get
access to that Word file,

00:24:25.910 --> 00:24:27.826
you could just sift
through the file format

00:24:27.826 --> 00:24:30.380
and so step through
all the old versions.

00:24:30.380 --> 00:24:33.580
And so as we have been
discussing in the last couple

00:24:33.580 --> 00:24:38.030
minutes, secure deallocation
is also a problem.

00:24:38.030 --> 00:24:40.430
It's not supported
across the full stack.

00:24:40.430 --> 00:24:42.610
So for example, in older
Linux kernels-- when

00:24:42.610 --> 00:24:45.830
you would create a
directory, a new directory,

00:24:45.830 --> 00:24:49.410
you could leak up to four
kilobytes of kernel memory.

00:24:49.410 --> 00:24:51.160
Only Zeus knows what's
inside that memory.

00:24:51.160 --> 00:24:55.870
And that's because Linux
wasn't actually zeroing out

00:24:55.870 --> 00:24:58.480
kernel memory that had been
allocated, deallocated,

00:24:58.480 --> 00:25:02.060
and then allocated
to something else.

00:25:02.060 --> 00:25:06.780
So as I mentioned before too--
if the kernel doesn't zero out

00:25:06.780 --> 00:25:09.390
pages that are given
to user mode processes,

00:25:09.390 --> 00:25:10.973
you can also have
user mode secret

00:25:10.973 --> 00:25:14.020
leaks through those types
of memory pages as well.

00:25:14.020 --> 00:25:21.990
Another thing is that-- SSDs--
many of them implement logging.

00:25:26.290 --> 00:25:32.250
And so in other words, when
you send a write to an SSD,

00:25:32.250 --> 00:25:35.480
oftentimes you are not
directly overwriting data,

00:25:35.480 --> 00:25:37.480
you're actually
writing to a log.

00:25:37.480 --> 00:25:40.260
And when a piece of
data becomes invalid,

00:25:40.260 --> 00:25:42.760
it's lazily reclaimed.

00:25:42.760 --> 00:25:46.664
So what that means is that if
you as the user get unlucky,

00:25:46.664 --> 00:25:49.205
and you've written a bunch of
data that hasn't been reclaimed

00:25:49.205 --> 00:25:51.440
by the SSD, then
maybe the attacker

00:25:51.440 --> 00:25:54.754
can look at that
hardware and say, oh, OK,

00:25:54.754 --> 00:25:55.920
I understand the log format.

00:25:55.920 --> 00:25:56.850
And even though
technically speaking,

00:25:56.850 --> 00:25:58.810
this data may be
invalid, I can still

00:25:58.810 --> 00:26:01.832
recover it because I understand
how the Flash translation layer

00:26:01.832 --> 00:26:03.040
works or something like that.
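
NOTE
Editor's sketch of the log-structured write idea described above. A toy
model only, with hypothetical names (ToyLogStore and its methods); it
assumes nothing about any real SSD's flash translation layer. It shows
that an overwrite appends a new entry, and the old data stays
physically present until garbage collection runs.
```python
class ToyLogStore:
    """Hypothetical append-only store; overwrites never touch old entries."""
    def __init__(self):
        self.log = []  # physical log: (logical_block, data) in write order
        self.latest = {}  # logical block number mapped to its current log index
    def write(self, block, data):
        self.log.append((block, data))
        self.latest[block] = len(self.log) - 1
    def read(self, block):
        return self.log[self.latest[block]][1]
    def stale_entries(self):
        """Data that is logically invalid but still physically present."""
        live = set(self.latest.values())
        return [data for i, (_, data) in enumerate(self.log) if i not in live]
    def garbage_collect(self):
        """Reclaim stale entries; until this runs, old data is recoverable."""
        live = sorted(self.latest.values())
        self.log = [self.log[i] for i in live]
        self.latest = {blk: i for i, (blk, _) in enumerate(self.log)}
store = ToyLogStore()
store.write(7, b"secret v1")
store.write(7, b"zeros....")  # the user "overwrites" block 7
before = store.stale_entries()  # what an attacker reading the raw log could find
store.garbage_collect()
after = store.stale_entries()
```
Here the overwrite is visible through read(), but the old value sits
in the raw log until the collector gets around to it.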

00:26:03.040 --> 00:26:04.550
And at a high
level, you can also

00:26:04.550 --> 00:26:10.020
have this problem with stolen
or discarded hardware as well.

00:26:10.020 --> 00:26:12.500
If you don't use encryption,
then a lot of times,

00:26:12.500 --> 00:26:14.270
you can just take
some disk that you

00:26:14.270 --> 00:26:15.900
found in a dumpster
somewhere-- you

00:26:15.900 --> 00:26:17.483
understand what the
physical layout is

00:26:17.483 --> 00:26:19.670
and recover data like that.

00:26:19.670 --> 00:26:21.570
So anyway, there's
a lot of problems

00:26:21.570 --> 00:26:25.220
with these RAM artifacts getting
stuck in persistent storage

00:26:25.220 --> 00:26:30.490
somehow and then being available
for an attacker later on.

00:26:30.490 --> 00:26:38.670
So how can we fix these
data lifetime problems?

00:26:42.570 --> 00:26:47.700
So we've already
discussed one solution,

00:26:47.700 --> 00:26:53.429
which is to basically
zero out memory

00:26:53.429 --> 00:26:54.470
when you're done with it.

00:26:57.680 --> 00:27:00.620
So whenever you deallocate
something, you just go through.

00:27:00.620 --> 00:27:02.970
You write a bunch of
zeros or some random thing

00:27:02.970 --> 00:27:04.595
and then essentially
hide the old data

00:27:04.595 --> 00:27:06.470
from someone else who
might come along later.

00:27:06.470 --> 00:27:09.011
So does anyone see
any potential problem with that?

00:27:13.500 --> 00:27:16.555
One problem you might imagine
is that as with all things

00:27:16.555 --> 00:27:20.130
in security, people always
complain about performance.

00:27:20.130 --> 00:27:22.940
And so when you say that
you zero out memory,

00:27:22.940 --> 00:27:26.410
maybe this isn't a problem
if your program is I/O bound.

00:27:26.410 --> 00:27:28.776
So you're waiting on some
slow, mechanical part

00:27:28.776 --> 00:27:30.110
of the hard disk or whatnot.

00:27:30.110 --> 00:27:32.862
But imagine if your
program is CPU bound.

00:27:32.862 --> 00:27:34.570
And maybe it's very
memory intensive too.

00:27:34.570 --> 00:27:36.630
So it's always allocating
and deallocating data.

00:27:36.630 --> 00:27:40.400
So maybe zeroing out memory
might be a performance cost

00:27:40.400 --> 00:27:42.499
that you don't want to pay.

00:27:42.499 --> 00:27:44.290
Typically this isn't
a problem in practice.

00:27:44.290 --> 00:27:45.990
But as we all know,
people love performance.

00:27:45.990 --> 00:27:47.740
This is sometimes an
objection that you'll

00:27:47.740 --> 00:27:49.200
have with this approach.

00:27:49.200 --> 00:27:51.903
Another thing you
can imagine doing

00:27:51.903 --> 00:27:53.486
is that instead of
zeroing out memory,

00:27:53.486 --> 00:28:03.130
you always encrypt data as
it goes to stable storage.

00:28:08.550 --> 00:28:11.210
So in a system like
this, basically,

00:28:11.210 --> 00:28:14.880
before the application ever
writes anything to disk,

00:28:14.880 --> 00:28:17.520
it's actually going to encrypt
it before it actually hits

00:28:17.520 --> 00:28:19.180
that SSD or that hard disk.

00:28:19.180 --> 00:28:22.857
Similarly, when the data comes
back in from stable storage,

00:28:22.857 --> 00:28:24.440
you're going to
decrypt it dynamically

00:28:24.440 --> 00:28:26.160
before you put it into RAM.

00:28:26.160 --> 00:28:29.410
And so what's interesting
about this approach is

00:28:29.410 --> 00:28:33.060
that if the key that you use
to decrypt and encrypt data--

00:28:33.060 --> 00:28:36.920
if you throw it away, then
once you throw it away,

00:28:36.920 --> 00:28:39.830
you've effectively
made that data on disk

00:28:39.830 --> 00:28:42.944
unrecoverable by the
attacker, assuming that you

00:28:42.944 --> 00:28:44.920
believe in cryptography.

00:28:44.920 --> 00:28:49.160
So this is very, very
nice because it gives us

00:28:49.160 --> 00:28:50.840
this nice property
that we don't have

00:28:50.840 --> 00:28:53.010
to remember per se all
the places where we've

00:28:53.010 --> 00:28:54.810
written this encrypted data.

00:28:54.810 --> 00:28:56.455
We can just say,
well, drop the keys.

00:28:56.455 --> 00:28:58.380
And I'll just treat
all that encrypted data

00:28:58.380 --> 00:29:01.230
as something that
I can allocate again.

00:29:01.230 --> 00:29:08.050
So, for example, if
you look at OpenBSD,

00:29:08.050 --> 00:29:14.610
they have this option where
you can do swap encryption.

00:29:14.610 --> 00:29:19.190
So you can basically
associate keys

00:29:19.190 --> 00:29:22.120
with various sections
of the page file.

00:29:22.120 --> 00:29:24.115
So it does this very
thing I mentioned.

00:29:24.115 --> 00:29:25.690
So every time you
boot the machine,

00:29:25.690 --> 00:29:27.720
it'll generate a
bunch of new keys.

00:29:27.720 --> 00:29:30.340
And then when your machine goes
down because you shut it down

00:29:30.340 --> 00:29:32.298
or you reboot it or
whatever, it will basically

00:29:32.298 --> 00:29:35.100
forget all the keys that it
used to encrypt that swap space.

00:29:35.100 --> 00:29:37.058
And then it can basically
say now all that swap

00:29:37.058 --> 00:29:38.520
is available to be used again.

00:29:38.520 --> 00:29:40.910
And so because those
keys are forgotten,

00:29:40.910 --> 00:29:42.740
one can assume that
the attacker can't look

00:29:42.740 --> 00:29:43.990
at the stuff that is in there.
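
NOTE
Editor's sketch of the "forget the key" idea, not OpenBSD's actual
implementation. EncryptedSwap and its methods are hypothetical names,
and the cipher is a toy keystream built from SHA-256 purely so the
example runs with the standard library; a real system would use a
vetted cipher. Setting the key to None also does not scrub it from
RAM, which a real kernel would have to do.
```python
import hashlib
import secrets
def keystream(key, page_no, length):
    """Toy keystream from SHA-256 over key, page number, and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = key + page_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:length])
def xor_page(key, page_no, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, page_no, len(data))))
class EncryptedSwap:
    """Hypothetical swap area whose key exists only in RAM."""
    def __init__(self):
        self.key = secrets.token_bytes(32)  # fresh random key generated at "boot"
        self.disk = {}  # page number mapped to ciphertext "on disk"
    def swap_out(self, page_no, page):
        self.disk[page_no] = xor_page(self.key, page_no, page)
    def swap_in(self, page_no):
        return xor_page(self.key, page_no, self.disk[page_no])
    def shutdown(self):
        self.key = None  # forget the key; the ciphertext stays on disk
swap = EncryptedSwap()
swap.swap_out(0, b"password=hunter2")
recovered = swap.swap_in(0)  # readable while the key is still around
swap.shutdown()
leftover = swap.disk[0]  # all a later attacker finds is ciphertext
```
After shutdown() nothing has to track where pages were written; the
data on "disk" is only as recoverable as the key is guessable.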

00:29:43.990 --> 00:29:47.127
AUDIENCE: What is
the [INAUDIBLE]?

00:29:47.127 --> 00:29:48.960
PROFESSOR: Ah, yeah,
that's a good question.

00:29:48.960 --> 00:29:52.960
I'm actually not sure what
sources of entropy it uses.

00:29:52.960 --> 00:29:56.200
OpenBSD is pretty
paranoid about security.

00:29:56.200 --> 00:29:58.557
So I imagine it
does things like it

00:29:58.557 --> 00:30:00.390
looks at let's say the
entropy pool gathered

00:30:00.390 --> 00:30:02.276
from user keyboard
input, for example,

00:30:02.276 --> 00:30:03.400
and other things like that.

00:30:03.400 --> 00:30:05.608
Yeah, I'm not actually sure
how it derives those keys.

00:30:05.608 --> 00:30:08.155
But you're exactly right that
if these sources of entropy

00:30:08.155 --> 00:30:10.197
that it uses are predictable,
then that basically

00:30:10.197 --> 00:30:12.029
shrinks the entropy
space of the key itself,

00:30:12.029 --> 00:30:13.788
which then makes the
key more vulnerable.

00:30:13.788 --> 00:30:18.189
AUDIENCE: So with the memory
it's capturing [INAUDIBLE].

00:30:21.940 --> 00:30:25.510
PROFESSOR: Yeah, so basically,
what this model assumes

00:30:25.510 --> 00:30:28.580
if all we are doing is looking
at the swap encryption,

00:30:28.580 --> 00:30:32.230
it assumes that the
RAM pages for the keys,

00:30:32.230 --> 00:30:34.159
for example, are
never swapped out.

00:30:34.159 --> 00:30:35.700
And that's actually
pretty easy to do

00:30:35.700 --> 00:30:38.180
if you're the OS-- you
just pin that page in memory.

00:30:38.180 --> 00:30:40.030
And this also doesn't
help you with someone

00:30:40.030 --> 00:30:42.465
who's got pins on the
memory bus or someone who

00:30:42.465 --> 00:30:44.590
can walk the kernel memory
pages or stuff like that.

00:30:44.590 --> 00:30:45.256
So you're right.

00:30:47.460 --> 00:30:49.190
AUDIENCE: In terms
of browsing, it

00:30:49.190 --> 00:30:51.641
helps with attackers that
come after the fact

00:30:51.641 --> 00:30:53.390
because if you have
to throw away the key,

00:30:53.390 --> 00:30:55.500
then after the fact,
there is no key in memory.

00:30:55.500 --> 00:30:57.083
PROFESSOR: Yeah,
that's exactly right.

00:30:57.083 --> 00:30:59.890
So what's nice about this
is that it essentially

00:30:59.890 --> 00:31:01.910
doesn't require modifications
to applications.

00:31:01.910 --> 00:31:04.810
Like you said, you can just
put any old thing atop this

00:31:04.810 --> 00:31:06.140
and get this property for free.

00:31:09.008 --> 00:31:11.790
AUDIENCE: Going back a bit--
if you look at the data

00:31:11.790 --> 00:31:16.272
before [INAUDIBLE] to RAM.

00:31:16.272 --> 00:31:18.355
How does that avoid the
RAM artifacts [INAUDIBLE]?

00:31:21.555 --> 00:31:23.513
PROFESSOR: OK, so if I
understand your question

00:31:23.513 --> 00:31:25.930
correctly, I think you're
worried about the fact

00:31:25.930 --> 00:31:29.081
that, sure, data is
encrypted when it's on disk,

00:31:29.081 --> 00:31:31.080
but then it actually can
sit in cleartext form

00:31:31.080 --> 00:31:34.710
somehow in the
actual memory itself.

00:31:34.710 --> 00:31:37.880
So this gets back to the
discussion that we had here.

00:31:37.880 --> 00:31:42.150
So ensuring that data
hits the disk encrypted

00:31:42.150 --> 00:31:44.445
doesn't actually protect
against an attacker who

00:31:44.445 --> 00:31:46.566
can look at RAM in real time.

00:31:46.566 --> 00:31:47.940
So basically what
we're saying is

00:31:47.940 --> 00:31:50.300
that if you're only worried
about this post-session

00:31:50.300 --> 00:31:52.800
attacker who can't, for
example, look at your RAM views

00:31:52.800 --> 00:31:54.354
in real time, this works fine.

00:31:54.354 --> 00:31:56.520
But you're exactly right
that this does not provide,

00:31:56.520 --> 00:31:58.469
for lack of a better
term, encrypted RAM.

00:31:58.469 --> 00:32:00.510
And there actually are
some research systems that

00:32:00.510 --> 00:32:01.880
try to do something like that.

00:32:01.880 --> 00:32:04.590
It gets a little bit tricky
because at some point when

00:32:04.590 --> 00:32:06.340
you look at your
hardware, your processor,

00:32:06.340 --> 00:32:10.276
it has to actually do
something on real data

00:32:10.276 --> 00:32:13.470
like if you want to do an add,
you have to pass it cleartext

00:32:13.470 --> 00:32:15.260
operands perhaps.

00:32:15.260 --> 00:32:17.670
There are also some
interesting research systems

00:32:17.670 --> 00:32:20.530
which actually try to do
computation on encrypted data.

00:32:20.530 --> 00:32:23.240
This is mind blowing
like "The Matrix."

00:32:23.240 --> 00:32:26.220
But suffice it to say that
protections that people have

00:32:26.220 --> 00:32:29.851
for in-RAM data are typically
much weaker than what

00:32:29.851 --> 00:32:32.477
they have for data that
lives on stable storage.

00:32:32.477 --> 00:32:33.268
You got a question?

00:32:33.268 --> 00:32:35.152
AUDIENCE: Yeah, but
does that [INAUDIBLE]

00:32:35.152 --> 00:32:38.710
because even though the attacker
has post-session access,

00:32:38.710 --> 00:32:41.851
that's just post-private
mode access.

00:32:41.851 --> 00:32:43.342
So there could
still be

00:32:43.342 --> 00:32:45.330
a public mode session going on.

00:32:45.330 --> 00:32:48.320
And the attacker would have
access to the machine, right?

00:32:48.320 --> 00:32:49.660
PROFESSOR: So you're worried
about if a concurrent--

00:32:49.660 --> 00:32:50.656
AUDIENCE: So if you
have a public mode tab

00:32:50.656 --> 00:32:51.989
and you have a private mode tab.

00:32:51.989 --> 00:32:54.171
You close the private tab
and the public mode tab

00:32:54.171 --> 00:32:58.761
stays on-- the attacker
could still dump the memory.

00:32:58.761 --> 00:33:00.813
And the RAM artifacts
would be problematic.

00:33:00.813 --> 00:33:01.680
Is that right?

00:33:01.680 --> 00:33:04.250
PROFESSOR: Yeah,
interesting-- so we

00:33:04.250 --> 00:33:07.110
will talk at the end of
lecture about an attack which

00:33:07.110 --> 00:33:08.710
is somewhat similar.

00:33:08.710 --> 00:33:11.242
So most of the threat
models of private browsing

00:33:11.242 --> 00:33:12.950
do not assume a
concurrent attacker at all.

00:33:12.950 --> 00:33:14.615
In other words, they
assume that when

00:33:14.615 --> 00:33:16.115
you're doing private
browsing, there

00:33:16.115 --> 00:33:18.480
is no other person
who has a public mode

00:33:18.480 --> 00:33:20.100
tab open or anything like that.

00:33:20.100 --> 00:33:24.880
But you are in fact correct that
the way that private browsing

00:33:24.880 --> 00:33:26.434
modes are often
implemented-- let's

00:33:26.434 --> 00:33:27.850
say you open up a
private browsing

00:33:27.850 --> 00:33:29.974
tab, you close that tab.

00:33:29.974 --> 00:33:31.890
You immediately run to
go get a cup of coffee.

00:33:31.890 --> 00:33:34.490
So one attack I will describe
is that Firefox, for example,

00:33:34.490 --> 00:33:37.830
still keeps statistics about,
let's say, memory allocation.

00:33:37.830 --> 00:33:39.405
So if the memory
for your private tab

00:33:39.405 --> 00:33:40.780
is actually lazily
garbage

00:33:40.780 --> 00:33:43.830
collected, then I can basically
go to about:memory or whatever

00:33:43.830 --> 00:33:46.794
and actually see URLs
and stuff from your tab.

00:33:46.794 --> 00:33:49.210
But yeah, long story
short, most of these attacker

00:33:49.210 --> 00:33:51.570
models do not assume
a concurrent attacker

00:33:51.570 --> 00:33:55.070
at the same time that
you're privately browsing.

00:33:55.070 --> 00:33:55.570
Make sense?

00:34:00.690 --> 00:34:03.312
So this is one thing that you
can do-- do swap encryption

00:34:03.312 --> 00:34:04.020
like I mentioned.

00:34:04.020 --> 00:34:06.862
This is nice because this gives
you some pretty cool security

00:34:06.862 --> 00:34:08.320
properties without
having to change

00:34:08.320 --> 00:34:10.510
the browser at all or
any of the applications

00:34:10.510 --> 00:34:11.630
running on top of this.

00:34:11.630 --> 00:34:15.290
And in practice, the CPU cost
of doing this kind of thing

00:34:15.290 --> 00:34:17.810
is much, much lower
than the actual cost

00:34:17.810 --> 00:34:19.879
of doing I/O in
general, particularly

00:34:19.879 --> 00:34:21.670
if you have a disk
because with disk you're

00:34:21.670 --> 00:34:22.989
particularly paying seek costs.

00:34:22.989 --> 00:34:24.449
That's a mechanical cost.

00:34:24.449 --> 00:34:27.360
This is all processing cost--
pure computational stuff.

00:34:27.360 --> 00:34:30.166
So typically this is not that
big of a performance hit.

00:34:36.159 --> 00:34:37.818
Oh, god there's physics here.

00:34:41.319 --> 00:34:45.980
This is always an adventure.

00:34:45.980 --> 00:34:52.320
So the next attacker that
we're going to look at

00:34:52.320 --> 00:34:57.940
is this web attacker
that I mentioned

00:34:57.940 --> 00:35:00.920
at the beginning of lecture.

00:35:00.920 --> 00:35:08.940
So the assumption here
is that the attacker

00:35:08.940 --> 00:35:17.376
controls the website
that the user is going

00:35:17.376 --> 00:35:22.066
to visit in private
browsing mode--

00:35:22.066 --> 00:35:27.686
however, the attacker does
not control the user's

00:35:27.686 --> 00:35:28.618
local machine.

00:35:32.350 --> 00:35:34.820
And so the security
goals that we

00:35:34.820 --> 00:35:38.990
want to have against the
web attacker are twofold.

00:35:41.680 --> 00:35:46.960
So first, we don't
want the attacker

00:35:46.960 --> 00:35:52.560
to be able to
identify the users.

00:35:55.560 --> 00:35:57.320
And by identify,
we just mean

00:35:57.320 --> 00:35:59.820
we don't want the attacker
to be able to distinguish

00:35:59.820 --> 00:36:02.778
the user from any
other user that happens

00:36:02.778 --> 00:36:04.640
to be visiting the site.

00:36:04.640 --> 00:36:08.140
And you also might
imagine that perhaps we

00:36:08.140 --> 00:36:15.340
don't want the attacker to be
able to tell whether or not we're

00:36:15.340 --> 00:36:18.940
using private browsing mode.

00:36:18.940 --> 00:36:24.430
So the attacker can't
tell that the user employs

00:36:24.430 --> 00:36:25.290
private browsing.

00:36:28.380 --> 00:36:33.330
And so as the paper
discusses, defending

00:36:33.330 --> 00:36:37.260
against the web attacker
is actually pretty tricky.

00:36:37.260 --> 00:36:39.000
So what does it
mean, for example,

00:36:39.000 --> 00:36:41.935
to identify different users?

00:36:41.935 --> 00:36:44.060
Like I said, at a high
level, as you could imagine,

00:36:44.060 --> 00:36:47.320
the user looks no different
than any other user

00:36:47.320 --> 00:36:48.910
that visits this site.

00:36:48.910 --> 00:36:50.460
So you can imagine
a web attacker

00:36:50.460 --> 00:36:53.170
might want to do one
of two specific things.

00:36:53.170 --> 00:36:56.400
It might want to say, OK,
I see multiple people who

00:36:56.400 --> 00:36:59.740
were visiting my site in
private browsing mode.

00:36:59.740 --> 00:37:02.890
You were visitor five,
seven, and eight.

00:37:02.890 --> 00:37:04.890
So in other words,
identifying a particular user

00:37:04.890 --> 00:37:07.820
within the context of multiple
private browsing sessions.

00:37:07.820 --> 00:37:09.920
The second thing the attacker
might want to do

00:37:09.920 --> 00:37:14.230
is actually try to link a user
across public and private mode

00:37:14.230 --> 00:37:15.120
browsing sessions.

00:37:15.120 --> 00:37:18.110
So I go to Amazon.com once
in public browsing mode.

00:37:18.110 --> 00:37:20.350
I then go to it again in
private browsing mode.

00:37:20.350 --> 00:37:22.366
Can the attacker
actually figure out

00:37:22.366 --> 00:37:23.740
that I'm actually
the same person

00:37:23.740 --> 00:37:24.600
across those two visits?

00:37:24.600 --> 00:37:25.150
Yes?

00:37:25.150 --> 00:37:27.900
AUDIENCE: This is all
modulo the IP address.

00:37:27.900 --> 00:37:31.370
PROFESSOR: Ah, yes,
that's exactly right.

00:37:31.370 --> 00:37:32.740
That is excellent foreshadowing.

00:37:32.740 --> 00:37:38.315
So right now I'm assuming that
either the user employs Tor or uses

00:37:38.315 --> 00:37:39.180
something like this.

00:37:39.180 --> 00:37:41.180
So yeah, we're punting
on this whole issue of IP

00:37:41.180 --> 00:37:42.270
admittedly for now.

00:37:42.270 --> 00:37:44.640
That's right.

00:37:44.640 --> 00:37:47.150
So yeah, this segues very well.

00:37:47.150 --> 00:37:48.960
So what's an easy way
to identify the user?

00:37:48.960 --> 00:37:50.780
As you suggested,
the IP address.

00:37:50.780 --> 00:37:53.260
So it's a pretty high
likelihood if you

00:37:53.260 --> 00:37:55.425
see two visits that are
sort of close in time

00:37:55.425 --> 00:37:57.590
relatively speaking
with the same IP

00:37:57.590 --> 00:38:00.900
with high likelihood that's
probably the same user.

00:38:00.900 --> 00:38:02.442
And this is in fact
the motivation-- one

00:38:02.442 --> 00:38:05.110
of the motivations
for stuff like Tor.

00:38:05.110 --> 00:38:08.510
And so we're actually going
to discuss Tor next lecture.

00:38:08.510 --> 00:38:10.320
So in case you
haven't heard of Tor,

00:38:10.320 --> 00:38:13.560
it's basically a tool which
tries to obscure things

00:38:13.560 --> 00:38:15.120
like your IP address.

00:38:15.120 --> 00:38:18.560
And you could actually
imagine layering Tor--

00:38:18.560 --> 00:38:22.210
having Tor be the foundation.

00:38:22.210 --> 00:38:24.630
And then you put private
browsing modes atop that.

00:38:24.630 --> 00:38:26.986
And that might give you some
stronger properties than

00:38:26.986 --> 00:38:31.680
you would get if you used private
browsing modes alone.

00:38:31.680 --> 00:38:34.610
But, anyway, so the thing
to mention about Tor

00:38:34.610 --> 00:38:37.940
though is that Tor does provide
some sense of IP anonymity.

00:38:37.940 --> 00:38:40.830
But it doesn't actually address
things like the data secrecy

00:38:40.830 --> 00:38:42.920
lifetime issues or
things like that.

00:38:42.920 --> 00:38:46.410
So Tor-- perhaps you can think
of it as maybe necessary,

00:38:46.410 --> 00:38:48.580
but insufficient for
a full implementation

00:38:48.580 --> 00:38:50.760
of private browsing mode.

00:38:50.760 --> 00:38:53.450
And so what's interesting
too is that even if a user

00:38:53.450 --> 00:38:57.800
employs Tor, there are still
ways that a web server can

00:38:57.800 --> 00:39:02.020
identify the user by looking
at the unique characteristics

00:39:02.020 --> 00:39:06.230
of that user's browser.

00:39:06.230 --> 00:39:09.080
So this is our final
demo for today.

00:39:09.080 --> 00:39:12.255
So let's see here.

00:39:12.255 --> 00:39:15.980
So I'm going to get rid of this guy.

00:39:15.980 --> 00:39:18.380
And then let's see.

00:39:18.380 --> 00:39:22.632
I am going to go to this
site called Panopticlick.

00:39:22.632 --> 00:39:23.840
Some of you may have heard of this.

00:39:23.840 --> 00:39:25.260
It's run by the EFF.

00:39:25.260 --> 00:39:29.640
The basic idea is it is going
to try to identify you the user

00:39:29.640 --> 00:39:32.940
by looking at various
characteristics of your web

00:39:32.940 --> 00:39:33.738
browser.

00:39:33.738 --> 00:39:37.410
So I'll show you
exactly what I mean.

00:39:37.410 --> 00:39:39.101
So I want to go--
the URL is very long.

00:39:39.101 --> 00:39:41.506
This is very stressful
for me to type in.

00:39:41.506 --> 00:39:43.911
So please don't judge if
it doesn't go through.

00:39:43.911 --> 00:39:45.354
Let's see.

00:39:45.354 --> 00:39:49.220
Panopticlick-- did it work?

00:39:49.220 --> 00:39:51.730
Yes, OK.

00:39:51.730 --> 00:39:54.030
So I am going to
go to this website.

00:39:54.030 --> 00:39:57.600
And it's run by
the folks at EFF.

00:39:57.600 --> 00:39:59.820
And I say, OK, test me.

00:39:59.820 --> 00:40:02.117
So what this is doing
is it's basically

00:40:02.117 --> 00:40:03.825
running a bunch of
JavaScript code, maybe

00:40:03.825 --> 00:40:05.730
an applet-- maybe some Java.

00:40:05.730 --> 00:40:08.110
And it's trying to
fingerprint my browser.

00:40:08.110 --> 00:40:12.115
And it's trying to figure out
how much unique information

00:40:12.115 --> 00:40:12.990
does it have.

00:40:12.990 --> 00:40:18.810
And so-- let me
increase the font here.

00:40:18.810 --> 00:40:20.960
So, for example, one
thing it looks at

00:40:20.960 --> 00:40:23.620
is it looks at you
see here what are

00:40:23.620 --> 00:40:27.060
all the details of the browser
plugins that I'm running.

00:40:27.060 --> 00:40:29.390
So basically it'll run
code in its web page

00:40:29.390 --> 00:40:31.454
that looks and sees do
I have Flash installed?

00:40:31.454 --> 00:40:32.370
What version of Flash?

00:40:32.370 --> 00:40:33.620
Do I have Java installed?

00:40:33.620 --> 00:40:35.970
What version of Java?

00:40:35.970 --> 00:40:39.190
So you can see that these
are all-- they can't even

00:40:39.190 --> 00:40:40.810
fit on the screen at one time.

00:40:40.810 --> 00:40:44.820
These are like all the various
plugins and ridiculous formats

00:40:44.820 --> 00:40:45.960
that my browser supports.

00:40:45.960 --> 00:40:48.774
Now, at a high level-- this
should be troubling to you

00:40:48.774 --> 00:40:49.940
if you're a security person.

00:40:49.940 --> 00:40:51.939
Am I actually actively
using all of these things

00:40:51.939 --> 00:40:53.180
at a given time?

00:40:53.180 --> 00:40:55.805
This gives me nightmares.

00:40:55.805 --> 00:40:57.930
So what ends up
happening is that web

00:40:57.930 --> 00:41:00.389
servers-- this web attacker--
they can run code like this.

00:41:00.389 --> 00:41:02.888
And they can figure out what
are all the plugins that you're

00:41:02.888 --> 00:41:03.840
looking at.

00:41:03.840 --> 00:41:05.970
Now if you look at these
two columns to the left,

00:41:05.970 --> 00:41:07.020
what are they?

00:41:07.020 --> 00:41:09.550
So you see up here.

00:41:09.550 --> 00:41:11.810
It says bits of
identifying information.

00:41:11.810 --> 00:41:15.760
And then one in x
browsers has this value.

00:41:15.760 --> 00:41:18.635
So, for example, if
we look at a plugin,

00:41:18.635 --> 00:41:21.979
it's saying there is
basically-- it's probably

00:41:21.979 --> 00:41:23.770
this is the number
that's more interesting.

00:41:23.770 --> 00:41:24.660
It's the one on the right.

00:41:24.660 --> 00:41:30.140
It's saying that 1 in
approximately 280,000 browsers

00:41:30.140 --> 00:41:33.610
has this exact set of plugins.

00:41:33.610 --> 00:41:37.960
So that's actually a pretty
specific way to fingerprint me.

00:41:37.960 --> 00:41:40.580
It's saying there are very,
very few people

00:41:40.580 --> 00:41:43.674
who have this exact set of
plugins and configurations.

00:41:43.674 --> 00:41:45.090
So as it turns
out, they're right.

00:41:45.090 --> 00:41:45.840
I am quite unique.

00:41:45.840 --> 00:41:50.104
But this is a problem from
the security perspective.

00:41:50.104 --> 00:41:50.770
So look at this.

00:41:50.770 --> 00:41:55.120
The screen size and the
color depths for my machine--

00:41:55.120 --> 00:41:57.830
1 in-- what is this?

00:41:57.830 --> 00:42:00.570
1.5 million.

00:42:00.570 --> 00:42:02.515
That's actually pretty shocking.

00:42:02.515 --> 00:42:07.050
So there's only one person in
a sample of 1.5 million people

00:42:07.050 --> 00:42:10.420
who have this
particular screen image.

00:42:10.420 --> 00:42:14.110
So these things-- they are
additive in some sense.

00:42:14.110 --> 00:42:17.340
So the more fingerprints
you have, the more easy

00:42:17.340 --> 00:42:21.180
it is for the attacker to
figure out exactly who you are.
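
NOTE
Editor's sketch, not part of the lecture: the "bits of identifying information" column is roughly log2 of the "one in x browsers" column, and bits from different attributes add only if the attributes are independent. That independence is an assumption made here for illustration; real attributes are often correlated, so the true combined figure may be lower. The function name is hypothetical.
```python
import math
def surprisal_bits(one_in_x: float) -> float:
    """Bits of identifying information for a value shared by 1 in x browsers."""
    return math.log2(one_in_x)
# Figures read off the screen during the demo.
plugins_bits = surprisal_bits(280_000)    # about 18.1 bits
screen_bits = surprisal_bits(1_500_000)   # about 20.5 bits
# ASSUMPTION: the two attributes are statistically independent, so bits add.
combined_bits = plugins_bits + screen_bits
print(f"plugins: {plugins_bits:.1f} bits, screen: {screen_bits:.1f} bits")
print(f"combined, if independent: {combined_bits:.1f} bits")
```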

00:42:21.180 --> 00:42:24.420
And so note this was done
purely from the server side.

00:42:24.420 --> 00:42:26.090
I just went to this web page.

00:42:26.090 --> 00:42:27.490
And I just did this.

00:42:27.490 --> 00:42:28.710
And this is what it got to.

00:42:28.710 --> 00:42:30.716
One second-- I want to
show one more thing.

00:42:30.716 --> 00:42:33.614
This was done in
private browsing mode.

00:42:33.614 --> 00:42:35.063
And let's see here.

00:42:38.927 --> 00:42:43.948
I will open up a regular
version of Firefox.

00:42:47.392 --> 00:42:51.850
Then I run this up again.

00:42:51.850 --> 00:42:55.490
So note that now I'm in
a public mode browser.

00:42:55.490 --> 00:42:57.050
Before I was in private mode.

00:42:57.050 --> 00:42:58.970
Now I am public mode.

00:42:58.970 --> 00:43:02.250
So what you'll see is that when
we look at the browser plugins,

00:43:02.250 --> 00:43:04.000
the extent to which I
can be fingerprinted

00:43:04.000 --> 00:43:05.820
is essentially the same.

00:43:05.820 --> 00:43:08.448
So there are going to be a few
plugins that may or may not

00:43:08.448 --> 00:43:10.274
load depending on
the vagaries of how

00:43:10.274 --> 00:43:11.440
privacy mode is implemented.

00:43:11.440 --> 00:43:13.512
But still, look at that.

00:43:13.512 --> 00:43:15.872
I'm still very easy
to fingerprint.

00:43:15.872 --> 00:43:18.392
And in fact, if you
look back at this guy

00:43:18.392 --> 00:43:20.100
again-- that screen
size and color depth.

00:43:20.100 --> 00:43:22.082
I didn't change that
actually between the two--

00:43:22.082 --> 00:43:23.790
between public and
private browsing mode.

00:43:23.790 --> 00:43:26.730
So that ability to fingerprint
there is basically the same.

00:43:26.730 --> 00:43:29.430
This is one reason why it's so
difficult to protect yourself

00:43:29.430 --> 00:43:33.110
against this web attack because
browsers themselves reveal

00:43:33.110 --> 00:43:35.749
so much information about you
just from their configuration.

00:43:35.749 --> 00:43:37.998
AUDIENCE: I am curious about the
screen size and color depth

00:43:37.998 --> 00:43:39.133
thing.

00:43:39.133 --> 00:43:39.966
How does it do that?

00:43:39.966 --> 00:43:42.336
How is that unique?

00:43:42.336 --> 00:43:44.887
How many screen sizes and
color depths are there?

00:43:44.887 --> 00:43:46.470
PROFESSOR: Well, I
think it's actually

00:43:46.470 --> 00:43:48.136
hiding some of the
magic that it's using

00:43:48.136 --> 00:43:49.430
to figure out what that is.

00:43:49.430 --> 00:43:51.638
So at a high level, how do
a lot of these tests work?

00:43:51.638 --> 00:43:55.250
So there's some parts of
your browser environment

00:43:55.250 --> 00:43:57.300
that are testable purely
by JavaScript code.

00:43:57.300 --> 00:43:59.866
So you can imagine that
you can essentially

00:43:59.866 --> 00:44:01.240
have JavaScript
code, which looks

00:44:01.240 --> 00:44:03.198
over the properties of
the window object, which

00:44:03.198 --> 00:44:05.370
is like a global
JavaScript namespace,

00:44:05.370 --> 00:44:07.741
and sees how do you
define this weird widget?

00:44:07.741 --> 00:44:09.240
How do you define
this weird widget?

00:44:09.240 --> 00:44:12.090
And if so, that might count
your plug-ins, let's say.

00:44:12.090 --> 00:44:14.650
Pages like this also typically
take advantage of the fact

00:44:14.650 --> 00:44:18.522
that Java applets
and Flash objects

00:44:18.522 --> 00:44:20.480
can look at all kinds of
more interesting stuff

00:44:20.480 --> 00:44:22.521
like the fonts that are
available on your machine

00:44:22.521 --> 00:44:23.660
and things like that.

00:44:23.660 --> 00:44:27.180
So as to the particular screen
size and color depth thing--

00:44:27.180 --> 00:44:28.555
I think-- don't
quote me on that.

00:44:28.555 --> 00:44:29.971
But I think what
ends up happening

00:44:29.971 --> 00:44:32.796
is it will try to run an applet,
let's say, that will actually

00:44:32.796 --> 00:44:35.360
try to query your graphics card
or whatever are the graphics

00:44:35.360 --> 00:44:38.334
interfaces in Java and poke
for different aspects of it.

00:44:38.334 --> 00:44:40.250
So I think it's actually
more than just screen

00:44:40.250 --> 00:44:41.030
size and depth.

00:44:41.030 --> 00:44:43.620
They just condense it
and present it as that.

00:44:43.620 --> 00:44:45.842
So at a high level, that's
how all these tricks work.

00:44:45.842 --> 00:44:47.300
So there's a bunch
of information

00:44:47.300 --> 00:44:48.830
you can snarf up
through JavaScript.

00:44:48.830 --> 00:44:50.720
Then you run a bunch
of plugins, which

00:44:50.720 --> 00:44:53.820
can typically access more stuff
and see what they can snarf up.

00:44:53.820 --> 00:44:56.910
And then you see
what's going on.
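
NOTE
Editor's sketch, not part of the lecture: one way a server could fold the attributes it gathers into a single identifier is to hash a canonical encoding of them. The attribute names and values below are hypothetical stand-ins for what script and plugin probes might report; the actual test site may well do something different.
```python
import hashlib
import json
def fingerprint(attributes: dict) -> str:
    """Hash a canonical (sorted-key) JSON encoding of the reported attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
# Hypothetical values; the same machine reports the same configuration in both modes.
private_mode_visit = {"plugins": ["Flash 11.2", "Java 1.7"], "screen": "1920x1080x24", "fonts": ["Arial", "Courier"]}
public_mode_visit = dict(private_mode_visit)
other_user = {"plugins": ["Flash 11.2"], "screen": "1366x768x24", "fonts": ["Arial"]}
print(fingerprint(private_mode_visit) == fingerprint(public_mode_visit))  # True
print(fingerprint(private_mode_visit) == fingerprint(other_user))         # False
```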

00:44:56.910 --> 00:44:58.736
Does it all make sense?

00:44:58.736 --> 00:45:01.152
Yeah, this is basically why
it's very difficult to protect

00:45:01.152 --> 00:45:02.520
against a web attacker.

00:45:02.520 --> 00:45:04.686
And in particular, getting
back to the discussion we

00:45:04.686 --> 00:45:07.940
had about Tor, right, even if
I had gone through Tor-- so

00:45:07.940 --> 00:45:12.145
you'll note the IP address--
you don't see it up here.

00:45:12.145 --> 00:45:13.867
And so you can
imagine that yeah,

00:45:13.867 --> 00:45:16.200
maybe this thing would actually
look at your IP address.

00:45:16.200 --> 00:45:17.405
But the thing is
like even if I didn't

00:45:17.405 --> 00:45:19.200
know what IP you were
coming from at all,

00:45:19.200 --> 00:45:21.924
I can do all these things.

00:45:21.924 --> 00:45:22.840
It's pretty maddening.

00:45:22.840 --> 00:45:23.890
It's pretty insane.

00:45:23.890 --> 00:45:25.682
So there are some
products out there

00:45:25.682 --> 00:45:28.690
that tried to do
things like imagine

00:45:28.690 --> 00:45:31.090
that you had a proxy
out in the cloud

00:45:31.090 --> 00:45:33.170
that all your web
traffic went through.

00:45:33.170 --> 00:45:34.680
And then imagine
that proxy tried

00:45:34.680 --> 00:45:40.250
to present a canonical
version of a browser runtime.

00:45:40.250 --> 00:45:42.890
And imagine that it would
always try to emulate,

00:45:42.890 --> 00:45:46.400
let's say, Firefox v 10.7.

00:45:46.400 --> 00:45:48.780
Then it would try to
send back the data

00:45:48.780 --> 00:45:51.930
that it rendered
as Firefox v 10.7.

00:45:51.930 --> 00:45:53.960
So some people would
try to attack this.

00:45:53.960 --> 00:45:54.970
It's sort of tricky.

00:45:54.970 --> 00:45:55.886
AUDIENCE: [INAUDIBLE].

00:45:58.896 --> 00:45:59.878
PROFESSOR: I am not--

00:45:59.878 --> 00:46:00.860
AUDIENCE: Is that
Tor distributions?

00:46:00.860 --> 00:46:02.333
Is that paired with
virtual machines?

00:46:02.333 --> 00:46:02.833
[INAUDIBLE]

00:46:05.527 --> 00:46:07.110
PROFESSOR: I see--
so the basic idea--

00:46:07.110 --> 00:46:09.443
is it a similar idea to what
we were just talking about?

00:46:09.443 --> 00:46:10.744
AUDIENCE: Yes, [INAUDIBLE].

00:46:10.744 --> 00:46:12.660
PROFESSOR: Yeah, so I
never heard of that one.

00:46:12.660 --> 00:46:14.535
I have heard of some of
these other projects.

00:46:14.535 --> 00:46:18.480
I'm imagining there's actually
some trickiness in getting

00:46:18.480 --> 00:46:20.495
systems like this to be
efficient a lot of times

00:46:20.495 --> 00:46:22.870
because particularly imagine
if you have something that's

00:46:22.870 --> 00:46:23.655
interactive.

00:46:23.655 --> 00:46:26.030
It's like you want to play a
game or something like that.

00:46:26.030 --> 00:46:28.790
It's a little bit
awkward to send my mouse

00:46:28.790 --> 00:46:30.650
click to some proxy.

00:46:30.650 --> 00:46:34.858
That proxy is then somehow
going to [INAUDIBLE].

00:46:34.858 --> 00:46:38.770
AUDIENCE: Let me clarify: the
workstation virtual machine

00:46:38.770 --> 00:46:41.215
actually runs
[INAUDIBLE] Firefox.

00:46:41.215 --> 00:46:44.160
In the proxy it's
known as a Tor.

00:46:44.160 --> 00:46:46.512
PROFESSOR: Ah, it's
just a Tor proxy.

00:46:46.512 --> 00:46:48.470
So if it's a Tor proxy,
sure, that's one thing.

00:46:48.470 --> 00:46:50.303
Then the only overhead
there you have to pay

00:46:50.303 --> 00:46:53.062
is the regular Tor
overhead of going

00:46:53.062 --> 00:46:54.840
through all the onion routers.

00:46:54.840 --> 00:46:57.860
Yeah, so what I was talking about:
there are systems--

00:46:57.860 --> 00:46:59.880
let's ignore the IP
anonymity for a second--

00:46:59.880 --> 00:47:01.820
that basically
try to say

00:47:01.820 --> 00:47:04.550
you have your own very
fingerprintable browser

00:47:04.550 --> 00:47:05.571
on your own machine.

00:47:05.571 --> 00:47:07.570
You don't want to expose
that to the web server.

00:47:07.570 --> 00:47:09.270
So essentially you
go through a proxy,

00:47:09.270 --> 00:47:10.686
which you can think
of as running all the time

00:47:10.686 --> 00:47:14.370
like a headless Firefox let's
say of some canonical version.

00:47:14.370 --> 00:47:16.760
The web server thinks it is
interacting with this thing.

00:47:16.760 --> 00:47:19.910
So if I go load this site, I
am perceived by the web server

00:47:19.910 --> 00:47:21.490
as Firefox 10.7 or whatever.

00:47:21.490 --> 00:47:23.910
If you go there, you are also
perceived as Firefox 10.7.

00:47:23.910 --> 00:47:26.954
Then behind the scenes it's
spitting out HTML and stuff

00:47:26.954 --> 00:47:29.000
like that it collected
from the proxy.

00:47:29.000 --> 00:47:32.680
So those two things
are orthogonal.

00:47:32.680 --> 00:47:35.620
AUDIENCE: But it seems like you
don't need a proxy for this.

00:47:35.620 --> 00:47:36.600
You could have browser
support for this, right?

00:47:36.600 --> 00:47:38.460
Meaning the Tor
browser does this

00:47:38.460 --> 00:47:42.150
already by trying to appear
as the most generic version

00:47:42.150 --> 00:47:42.870
of Firefox.

00:47:42.870 --> 00:47:44.560
PROFESSOR: Yeah,
so this is true.

00:47:44.560 --> 00:47:46.810
Although, I think a problem
with a lot of those things is

00:47:46.810 --> 00:47:49.311
that even if you try to lock
yourself into one version,

00:47:49.311 --> 00:47:51.560
there's still a lot of things
that can fingerprint it.

00:47:51.560 --> 00:47:53.768
So I think with the Tor
distribution, what they often

00:47:53.768 --> 00:47:56.950
do is they say, we control
what's in the Tor distribution.

00:47:56.950 --> 00:47:59.670
So if we all go down to the Tor
distribution, then forshizzle,

00:47:59.670 --> 00:48:04.000
we're both going to get Firefox
with the same Java version--

00:48:04.000 --> 00:48:05.624
the same so on and so forth.

00:48:05.624 --> 00:48:07.490
AUDIENCE: Well, it's
more than that though.

00:48:07.490 --> 00:48:09.823
They return screen sizes that
are the most common screen

00:48:09.823 --> 00:48:11.909
sizes whenever you
query the screen.

00:48:11.909 --> 00:48:13.034
PROFESSOR: That's all true.

00:48:13.034 --> 00:48:14.170
Yeah, so one thing that's
interesting to look

00:48:14.170 --> 00:48:16.639
at though-- the Tor team that
also put out-- the people who

00:48:16.639 --> 00:48:19.180
do the bundle-- they'll often
put out reports about what data

00:48:19.180 --> 00:48:20.120
still gets leaked.

00:48:20.120 --> 00:48:21.946
So stuff does still
get leaked out.

00:48:21.946 --> 00:48:23.297
But you're right.

00:48:23.297 --> 00:48:25.880
If you could-- the high-level
goal of that is very reasonable.

00:48:25.880 --> 00:48:27.590
It's saying that
if we all agreed

00:48:27.590 --> 00:48:29.629
to download the
same distribution

00:48:29.629 --> 00:48:32.170
and to then not trick it out by
adding plugins or stuff like that,

00:48:32.170 --> 00:48:33.253
then you're exactly right.

00:48:33.253 --> 00:48:35.197
That'd work great.

00:48:35.197 --> 00:48:36.030
Any other questions?

00:48:40.410 --> 00:48:44.030
Yeah, so that is
it for demo time.

00:48:51.845 --> 00:48:52.886
And there's more physics.

00:48:56.330 --> 00:48:59.560
This must have been a
riveting previous class.

00:48:59.560 --> 00:49:01.295
So we will ignore
that for the moment.

00:49:01.295 --> 00:49:01.920
Let's see here.

00:49:07.240 --> 00:49:08.340
So where were we?

00:49:12.116 --> 00:49:14.525
So what is the high-level
goal of privacy?

00:49:14.525 --> 00:49:15.900
And you can think
of it as what's

00:49:15.900 --> 00:49:18.420
your anonymity set
if you're a user?

00:49:18.420 --> 00:49:20.490
So in other words,
how many-- what's

00:49:20.490 --> 00:49:22.630
the size of people--
the number of people

00:49:22.630 --> 00:49:25.270
that you could be
confused for-- you

00:49:25.270 --> 00:49:26.857
could be mistaken
for by an attacker.

00:49:26.857 --> 00:49:28.940
And so what the browser
fingerprinting stuff shows

00:49:28.940 --> 00:49:32.620
is that oftentimes a web
attacker can narrow you

00:49:32.620 --> 00:49:35.360
down to a very, very
tight demographic

00:49:35.360 --> 00:49:38.510
without controlling anything on
your local machine whatsoever.
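
NOTE
Editor's sketch, not part of the lecture: the anonymity set for a user can be thought of as the number of visitors who present the same fingerprint. The fingerprint labels below are hypothetical; the point is only that users of an identical, unmodified bundle hide among each other, while a customized browser stands alone.
```python
from collections import Counter
def anonymity_set_sizes(observed: list) -> Counter:
    """Count how many visitors share each fingerprint value."""
    return Counter(observed)
# Hypothetical data: three users run an identical bundle, one has added plugins.
visitors = ["canonical", "canonical", "canonical", "custom-plugins"]
sizes = anonymity_set_sizes(visitors)
print(sizes["canonical"])       # 3: each of these users hides among three
print(sizes["custom-plugins"])  # 1: this user is uniquely identifiable
```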

00:49:38.510 --> 00:49:41.370
So that's actually a little
bit frightening to know.

00:49:44.020 --> 00:49:47.480
So you might want
to think about how

00:49:47.480 --> 00:49:50.480
can a web attacker determine if
you're using private browsing

00:49:50.480 --> 00:49:51.692
mode?

00:49:51.692 --> 00:49:53.400
Maybe that's [INAUDIBLE]
for some reason.

00:49:53.400 --> 00:49:56.260
So in the paper they
actually describe an attack

00:49:56.260 --> 00:49:58.400
that uses link colors.

00:49:58.400 --> 00:50:00.260
So remember, in
private browsing mode,

00:50:00.260 --> 00:50:01.730
the browser isn't
supposed to keep

00:50:01.730 --> 00:50:04.770
track of the history of
the sites that you visit.

00:50:04.770 --> 00:50:07.630
And so in the paper, the
authors describe an attack

00:50:07.630 --> 00:50:10.630
in which essentially the
attacker-controlled page

00:50:10.630 --> 00:50:14.510
creates an iframe to some URL
that the attacker controls

00:50:14.510 --> 00:50:16.780
and loads that inside
the attacker page.

00:50:16.780 --> 00:50:19.400
And then it basically
looks at the link colors.

00:50:19.400 --> 00:50:21.065
It creates a link
to that page-- that

00:50:21.065 --> 00:50:22.880
iframe it just
created-- and then sees

00:50:22.880 --> 00:50:26.810
whether the link color for that
link is the visited color.

00:50:26.810 --> 00:50:29.460
So it sees if it's purple versus blue.

00:50:29.460 --> 00:50:33.600
And the idea is that if you do this
test in private browsing mode,

00:50:33.600 --> 00:50:35.510
then presumably the
link colors should

00:50:35.510 --> 00:50:38.084
stay like the unvisited
color because the browser

00:50:38.084 --> 00:50:40.542
is supposed to be forgetting
about all these kinds of stuff.

00:50:40.542 --> 00:50:43.097
So that's the attack they
describe in the paper.

00:50:43.097 --> 00:50:45.055
What's interesting is
that this attack actually

00:50:45.055 --> 00:50:46.330
doesn't work anymore.

00:50:46.330 --> 00:50:49.280
So we actually discussed this
a couple of lectures back.

00:50:49.280 --> 00:50:51.430
So this attack that
the paper describes

00:50:51.430 --> 00:50:53.550
is the browser history
sniffing attack.

00:50:53.550 --> 00:50:55.640
So as we discussed a
couple of lectures ago,

00:50:55.640 --> 00:50:59.770
browsers now do not
expose correct link colors

00:50:59.770 --> 00:51:02.380
basically to JavaScript.

00:51:02.380 --> 00:51:06.290
And it's precisely to prevent
these types of attacks.

00:51:06.290 --> 00:51:08.775
So that particular part
of the paper is outdated.

00:51:08.775 --> 00:51:11.790
AUDIENCE: What does it point
to that browsers now also show

00:51:11.790 --> 00:51:14.774
links as purple in
private browsing mode

00:51:14.774 --> 00:51:16.444
and turn blue again
when you exit.

00:51:16.444 --> 00:51:18.110
PROFESSOR: Yeah, it's
a bit weird, yeah.

00:51:18.110 --> 00:51:20.442
They implemented that
attack-- the defense--

00:51:20.442 --> 00:51:22.275
I think before a lot
of the private browsing modes

00:51:22.275 --> 00:51:23.220
were popular.

00:51:23.220 --> 00:51:25.190
So now they do this
additional thing too.

00:51:25.190 --> 00:51:27.854
Long story short, the attack
they describe in the paper

00:51:27.854 --> 00:51:30.145
doesn't work because of some
of these browser history sniffing

00:51:30.145 --> 00:51:30.700
defenses.

00:51:30.700 --> 00:51:32.610
But you can still
imagine that there

00:51:32.610 --> 00:51:36.054
may be ways for the web attacker
to figure out if you are

00:51:36.054 --> 00:51:37.220
using private browsing mode.

00:51:37.220 --> 00:51:40.500
So for example, when you
do private browsing mode,

00:51:40.500 --> 00:51:42.640
any cookies that you
got from public mode

00:51:42.640 --> 00:51:45.340
should not be sent
during private mode.

00:51:45.340 --> 00:51:48.102
So in other words, if I go
to Amazon.com in public mode,

00:51:48.102 --> 00:51:50.050
I generate some cookies.

00:51:50.050 --> 00:51:52.521
Then I go to Amazon.com
in private browsing mode.

00:51:52.521 --> 00:51:54.270
When I contact Amazon.com
in private mode,

00:51:54.270 --> 00:51:57.320
those public mode cookies
should not be sent.

00:51:57.320 --> 00:52:02.420
That can actually act as
the sign to the web attacker

00:52:02.420 --> 00:52:04.500
that you actually are
using private mode.
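
NOTE
Editor's sketch, not part of the lecture: a server-side heuristic for the missing-cookie signal. The client key is an assumption (an IP address or a browser fingerprint could play that role), and the function name is hypothetical. A missing cookie is only a hint, since the user may simply have cleared cookies.
```python
def classify_visit(issued: set, client_key: str, sent_cookie: bool) -> str:
    """Classify a request given which clients were already issued a cookie."""
    if sent_cookie:
        return "returning"
    if client_key in issued:
        return "possibly private mode (or cookies cleared)"
    issued.add(client_key)  # first visit: the server would set a cookie now
    return "new"
issued_cookies = set()
print(classify_visit(issued_cookies, "203.0.113.7", sent_cookie=False))  # new
print(classify_visit(issued_cookies, "203.0.113.7", sent_cookie=True))   # returning
print(classify_visit(issued_cookies, "203.0.113.7", sent_cookie=False))  # possibly private mode (or cookies cleared)
```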

00:52:04.500 --> 00:52:06.940
AUDIENCE: This is also now
you're using the canvass

00:52:06.940 --> 00:52:08.612
in both of these events, right?

00:52:08.612 --> 00:52:10.770
So you need to know
the IP address.

00:52:10.770 --> 00:52:12.722
PROFESSOR: Yeah, that's right.

00:52:12.722 --> 00:52:14.610
AUDIENCE: So that
link that you were

00:52:14.610 --> 00:52:17.442
targeting with the link color
would be on a per IP basis.

00:52:17.442 --> 00:52:19.358
And you would have to
rely that the user first

00:52:19.358 --> 00:52:21.494
visited it as a public
mode, and you protect it.

00:52:21.494 --> 00:52:23.160
PROFESSOR: Ah, so the
link-- so the link

00:52:23.160 --> 00:52:26.800
attack you can actually do in
the context of a single page.

00:52:26.800 --> 00:52:29.300
So imagine that I,
the web attacker,

00:52:29.300 --> 00:52:30.819
construct a single page.

00:52:30.819 --> 00:52:33.110
I, the attacker, have JavaScript
that creates an iframe

00:52:33.110 --> 00:52:35.820
to foo.com like this.

00:52:35.820 --> 00:52:38.570
So that iframe will load
the contents of that page.

00:52:38.570 --> 00:52:40.570
And then I, the attacker,
in the parent frame

00:52:40.570 --> 00:52:42.840
can then create a
link element and then

00:52:42.840 --> 00:52:44.190
try to look at the color.

00:52:44.190 --> 00:52:46.330
This worked four years ago.

00:52:46.330 --> 00:52:49.880
So in that case, it doesn't rely
on the user having explicitly

00:52:49.880 --> 00:52:53.890
visited that iframe page at
all because I, the attacker,

00:52:53.890 --> 00:52:56.008
can create that in the
context of the page.

00:52:56.008 --> 00:52:59.330
I have gotten [INAUDIBLE].

00:52:59.330 --> 00:53:01.310
Any other questions?

00:53:01.310 --> 00:53:04.167
So yeah, so you can maybe
think about how cookies

00:53:04.167 --> 00:53:06.000
can reveal public and
private browsing modes

00:53:06.000 --> 00:53:08.660
and things like that.

00:53:08.660 --> 00:53:12.120
So one thing we
might think about

00:53:12.120 --> 00:53:21.210
is how we can provide
a stronger privacy

00:53:21.210 --> 00:53:25.520
guarantee for private browsers?

00:53:29.554 --> 00:53:35.050
And for the sake
of this discussion,

00:53:35.050 --> 00:53:41.630
let's just ignore
IP addresses for now

00:53:41.630 --> 00:53:45.260
because as we'll
discuss next lecture,

00:53:45.260 --> 00:53:47.670
we can used Tor to
maybe help with some

00:53:47.670 --> 00:53:49.330
of the privacy of IP addresses.

00:53:49.330 --> 00:53:51.971
So one thing you
can imagine doing

00:53:51.971 --> 00:53:55.712
is you can imagine
using VMs in some way

00:53:55.712 --> 00:54:06.200
to help provide stronger private
browsing guarantees-- so VM

00:54:06.200 --> 00:54:08.490
level privacy.

00:54:08.490 --> 00:54:11.290
And so the basic
idea is that you

00:54:11.290 --> 00:54:21.825
want to run each private
session inside of a separate VM.

00:54:25.020 --> 00:54:29.070
And then when the user
is done with that--

00:54:29.070 --> 00:54:31.830
so is finished with the
private browsing session,

00:54:31.830 --> 00:54:38.580
you basically delete the VM
after that session is done.

00:54:43.820 --> 00:54:47.870
So what's the advantage of this?

00:54:47.870 --> 00:54:51.230
Well, what's nice about
this is presumably

00:54:51.230 --> 00:54:52.730
you can get some
stronger guarantees

00:54:52.730 --> 00:54:58.910
about what privacy properties
you can provide to the user

00:54:58.910 --> 00:55:01.640
because, presumably,
the VM has a pretty

00:55:01.640 --> 00:55:06.606
clean interface to the I/O
path of the underlying host OS.

00:55:06.606 --> 00:55:07.980
So you can imagine
that maybe you

00:55:07.980 --> 00:55:13.000
combine this VMs into let's
say some type of a secure swap

00:55:13.000 --> 00:55:16.206
solution like Open BSD has--
give us another encrypted disk

00:55:16.206 --> 00:55:16.705
type thing.

00:55:16.705 --> 00:55:21.840
So you can imagine, OK, we have
a very clean separation of VM

00:55:21.840 --> 00:55:24.450
up here and all the I/Os
that are generated down here.

00:55:24.450 --> 00:55:27.420
And so that gives you
stronger guarantees

00:55:27.420 --> 00:55:30.891
than what you can get from the
browser, which wasn't designed

00:55:30.891 --> 00:55:33.390
from the ground up to think
very carefully about all the I/O

00:55:33.390 --> 00:55:35.764
paths and what secrets might
leak when it was in storage.

00:55:38.620 --> 00:55:42.330
So yes, that's
what's nice

00:55:42.330 --> 00:55:45.050
about this-- strong guarantees.

00:55:48.930 --> 00:55:52.560
And, also, what's nice
is it doesn't require

00:55:52.560 --> 00:55:57.060
any changes to your
applications-- that

00:55:57.060 --> 00:55:58.474
is to say to the browser.

00:55:58.474 --> 00:56:00.140
You take your browser,
put it inside one

00:56:00.140 --> 00:56:03.760
of these VMs-- then everything
gets better all magically.

00:56:03.760 --> 00:56:06.045
There's no application change.

00:56:06.045 --> 00:56:11.315
So what's bad about this--
I'll use an unhappy face

00:56:11.315 --> 00:56:12.330
to demonstrate that.

00:56:12.330 --> 00:56:17.110
So what's bad is first
of all, it's heavyweight.

00:56:17.110 --> 00:56:20.050
And by heavyweight,
I mean that any time you

00:56:20.050 --> 00:56:22.860
want to spin up one of these
private browsing sessions,

00:56:22.860 --> 00:56:25.000
you have to spin up a whole VM.

00:56:25.000 --> 00:56:27.260
And that can actually
be pretty painful.

00:56:27.260 --> 00:56:28.886
So perhaps users are
going to get upset

00:56:28.886 --> 00:56:30.760
because it's going to
take them a long time now

00:56:30.760 --> 00:56:32.660
to launch these private
browsing sessions.

00:56:32.660 --> 00:56:36.730
And the other problem, too, is
this solution actually has

00:56:36.730 --> 00:56:39.830
bad usability.

00:56:39.830 --> 00:56:43.080
And the reason I say that
is because now it's actually

00:56:43.080 --> 00:56:47.230
difficult for users to
do things like take files

00:56:47.230 --> 00:56:49.176
that they've saved in
private browsing mode

00:56:49.176 --> 00:56:52.190
and then take them to the
rest of their computer--

00:56:52.190 --> 00:56:54.731
any bookmarks that they generate
during private browsing mode

00:56:54.731 --> 00:56:57.110
that they actually
do want to persist

00:56:57.110 --> 00:56:59.485
will be difficult to
get those at the end.

00:56:59.485 --> 00:57:00.110
It can be done.

00:57:00.110 --> 00:57:02.120
But there's a lot
of friction here.

00:57:02.120 --> 00:57:03.400
So that's the bummer.

00:57:05.920 --> 00:57:11.720
So another thing that
you might imagine doing

00:57:11.720 --> 00:57:16.740
is something that looks
like approach number one.

00:57:16.740 --> 00:57:23.813
But we actually implement it
inside of the OS itself

00:57:23.813 --> 00:57:26.180
instead of in a virtual machine.

00:57:26.180 --> 00:57:28.500
So the basic idea
here is that you

00:57:28.500 --> 00:57:35.500
can imagine that each
process could potentially

00:57:35.500 --> 00:57:39.746
run in a privacy domain.

00:57:44.620 --> 00:57:51.340
So basically, the privacy
domain is the collection

00:57:51.340 --> 00:57:54.400
of OS global resources
that the process uses.

00:57:54.400 --> 00:57:56.680
And so the OS tracks
all that kind of stuff.

00:57:56.680 --> 00:58:00.190
And then once the process
dies, essentially the OS

00:58:00.190 --> 00:58:01.950
goes through and looks
at all the things

00:58:01.950 --> 00:58:04.230
that are in that
privacy domain set.

00:58:04.230 --> 00:58:09.050
And then securely deallocates
all those resources.
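
NOTE
Editor's sketch, not part of the lecture: a user-level toy that mimics the bookkeeping of a privacy domain by tracking the files a session creates and removing them at teardown. The class is hypothetical and is not an OS mechanism. It also uses a plain delete, whereas the approach described here would need secure erasure and would have to cover memory, swap, caches, and network state too.
```python
import os
import tempfile
class PrivacyDomain:
    """Toy stand-in: tracks files a session creates and removes them on exit."""
    def __init__(self):
        self.tracked_paths = []
    def create_file(self, data: bytes) -> str:
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        self.tracked_paths.append(path)
        return path
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        for path in self.tracked_paths:
            if os.path.exists(path):
                os.remove(path)  # plain unlink, NOT a secure erase
        self.tracked_paths.clear()
        return False
with PrivacyDomain() as domain:
    cache_path = domain.create_file(b"page cache")
    print(os.path.exists(cache_path))  # True while the session is alive
print(os.path.exists(cache_path))      # False after teardown
```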

00:58:09.050 --> 00:58:12.993
And so the advantage
of this over the VM

00:58:12.993 --> 00:58:20.330
is that it is lighter weight
because if you think about it,

00:58:20.330 --> 00:58:23.450
the VM is essentially agnostic
to all the OS state and all

00:58:23.450 --> 00:58:26.580
the application state that is
actually being used to run.

00:58:26.580 --> 00:58:29.266
So as a result, it probably
does more work than the OS

00:58:29.266 --> 00:58:31.880
would have to do because
the OS presumably

00:58:31.880 --> 00:58:35.082
knows all the points at which
the private browser would

00:58:35.082 --> 00:58:38.360
be touching I/O, and talking to the
network, and stuff like that.

00:58:38.360 --> 00:58:40.585
So maybe it even knows
things like you can actually

00:58:40.585 --> 00:58:43.980
clear the DNS cache
selectively, for example.

00:58:43.980 --> 00:58:46.560
So you can imagine
that it's much easier

00:58:46.560 --> 00:58:49.095
to spin these things up--
these privacy domains--

00:58:49.095 --> 00:58:51.090
and then to tear them down.

00:58:51.090 --> 00:58:53.930
However, the sad thing,
at least with respect

00:58:53.930 --> 00:58:58.580
to the virtual machine
solution, is that it's harder

00:58:58.580 --> 00:58:59.330
to get this right.

00:59:03.010 --> 00:59:07.292
So I just described
the VM approach

00:59:07.292 --> 00:59:09.000
as being heavyweight
because it's essentially

00:59:09.000 --> 00:59:12.650
agnostic to everything that's
running inside the container.

00:59:12.650 --> 00:59:14.832
But what's nice about
that is that allows

00:59:14.832 --> 00:59:18.585
the VM approach to only focus
on a few low-level interfaces.

00:59:18.585 --> 00:59:20.620
And it can focus
on those things.

00:59:20.620 --> 00:59:23.600
For example, the interface
the VM uses to write to disk,

00:59:23.600 --> 00:59:27.230
then you can have high
confidence that it's actually

00:59:27.230 --> 00:59:29.070
managed to contain everything.

00:59:29.070 --> 00:59:30.705
Whereas with the
OS-- if you think

00:59:30.705 --> 00:59:33.291
the OS is going to interpose
on individual file system

00:59:33.291 --> 00:59:35.790
interfaces-- perhaps individual
network interfaces and stuff

00:59:35.790 --> 00:59:37.714
like that-- it's much
more complicated to find

00:59:37.714 --> 00:59:42.667
all of those points at which
data can leak if you're going

00:59:42.667 --> 00:59:44.450
to do that at the OS level.

00:59:44.450 --> 00:59:45.782
So does that all make sense?

00:59:57.972 --> 00:59:59.263
Why is this physics everywhere?

01:00:02.124 --> 01:00:03.207
Ah, god, I'm being tested.

01:00:09.468 --> 01:00:10.926
Those are basically
some approaches

01:00:10.926 --> 01:00:13.952
we can use to provide
potentially stronger privacy

01:00:13.952 --> 01:00:16.202
guarantees than what's
implemented in private browsers

01:00:16.202 --> 01:00:18.240
right now.

01:00:18.240 --> 01:00:26.610
So one question you might
have is can we still

01:00:26.610 --> 01:00:33.250
de-anonymize the user
if the browser-- sorry,

01:00:33.250 --> 01:00:38.950
if the user is employing one of
these more powerful solutions--

01:00:38.950 --> 01:00:43.039
if the user is
surfing through a VM

01:00:43.039 --> 01:00:45.330
or surfing in one of these
privacy domains in the OS-- can

01:00:45.330 --> 01:00:46.960
we still figure
out who they are?

01:00:46.960 --> 01:00:48.420
And the answer is, yes.

01:00:48.420 --> 01:00:53.020
So maybe the VM is
unique for some reason.

01:00:56.800 --> 01:01:02.844
And so similar to how we were
able to fingerprint browsers

01:01:02.844 --> 01:01:04.760
using that Panopticlick
website, maybe there's

01:01:04.760 --> 01:01:07.460
something unique about the way
that the VM would be set up

01:01:07.460 --> 01:01:09.530
that allows us to fingerprint it.

01:01:09.530 --> 01:01:14.800
And it may in fact be the case
that maybe the virtual machine

01:01:14.800 --> 01:01:20.640
monitor or the OS itself
is unique in some ways.

01:01:20.640 --> 01:01:23.650
That would allow a web attacker
to figure out who the user was.

01:01:23.650 --> 01:01:28.440
And so one cute example of
this is TCP fingerprinting.

01:01:32.742 --> 01:01:34.200
So what's the big
idea behind this?

01:01:34.200 --> 01:01:35.980
So as it turns out,
the specification

01:01:35.980 --> 01:01:38.290
for the TCP protocol
actually allows

01:01:38.290 --> 01:01:40.420
some of the parameters
for the protocol

01:01:40.420 --> 01:01:44.080
to be set by the
implementation of the protocol.

01:01:44.080 --> 01:01:47.725
So, for example, TCP allows
implementers to choose things

01:01:47.725 --> 01:01:49.556
like initial packet
size-- the things that

01:01:49.556 --> 01:01:52.140
are sent out in the first part
of the TCP connection--

01:01:52.140 --> 01:01:55.000
it allows implementers to choose
things like the initial time

01:01:55.000 --> 01:01:57.870
to live in those packets.

01:01:57.870 --> 01:01:59.745
And so you can imagine,
and in fact, you

01:01:59.745 --> 01:02:01.995
don't have to imagine that
this is actually the truth.

01:02:01.995 --> 01:02:04.817
You can get off-the-shelf
tools like Nmap, for example,

01:02:04.817 --> 01:02:07.150
that can actually tell
what operating system you're

01:02:07.150 --> 01:02:10.340
running with high probability
just by sending you packets.

01:02:10.340 --> 01:02:13.040
They'll send these very
carefully crafted packets.

01:02:13.040 --> 01:02:15.042
And they will look and
see things like here's

01:02:15.042 --> 01:02:17.000
what the TTL was or here's
what the packet size

01:02:17.000 --> 01:02:20.090
distribution was-- here's what
the TCP sequence number was.

01:02:20.090 --> 01:02:22.394
And they basically have a
database of fingerprints.

01:02:22.394 --> 01:02:24.644
And they say, OK, if the
return packet has this, this,

01:02:24.644 --> 01:02:27.280
and this characteristic,
then the table

01:02:27.280 --> 01:02:29.420
says that you're probably
running for some reason

01:02:29.420 --> 01:02:30.650
Solaris.

01:02:30.650 --> 01:02:31.800
You're running Mac.

01:02:31.800 --> 01:02:34.120
You're running
Windows or whatever.
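
NOTE
A minimal sketch of the table lookup described above, not how any real tool
works internally. The signature table, its values, and the function names are
hypothetical and chosen only for illustration. TTL is strictly an IP header
field; it is grouped with the TCP parameters here as in the lecture.
```python
from typing import NamedTuple, Optional
#
class Signature(NamedTuple):
    os_name: str
    initial_ttl: int
    window_size: int
#
# Hypothetical fingerprint table; the values are illustrative, not measured.
SIGNATURES = [
    Signature("Linux", 64, 29200),
    Signature("Windows", 128, 8192),
    Signature("Solaris", 255, 49640),
]
#
def guess_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value.
    Each router hop decrements the TTL, so the sender started at or above it."""
    for initial in (32, 64, 128, 255):
        if observed_ttl <= initial:
            return initial
    return 255
#
def guess_os(observed_ttl: int, window_size: int) -> Optional[str]:
    """Return the OS whose signature matches, or None if nothing matches."""
    ttl = guess_initial_ttl(observed_ttl)
    for sig in SIGNATURES:
        if sig.initial_ttl == ttl and sig.window_size == window_size:
            return sig.os_name
    return None
```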

01:02:34.120 --> 01:02:36.770
So even if we use one of
these stronger approaches

01:02:36.770 --> 01:02:39.070
for private browsing
with a VM or an OS,

01:02:39.070 --> 01:02:41.570
you still may be able to run
one of those TCP fingerprinting

01:02:41.570 --> 01:02:45.360
attacks and learn a lot
about a particular user.

01:02:45.360 --> 01:02:50.302
And one thing that's
also interesting to note

01:02:50.302 --> 01:02:56.042
is that even if we use one of
these more powerful techniques

01:02:56.042 --> 01:02:59.070
to try to protect the
user, the user is still

01:02:59.070 --> 01:03:04.500
shared across both the public
and the private browsing

01:03:04.500 --> 01:03:05.550
session.

01:03:05.550 --> 01:03:07.460
The same person is still
physically using the machine.

01:03:07.460 --> 01:03:09.480
So why is it interesting?

01:03:09.480 --> 01:03:13.020
Well, it's interesting
because you yourself

01:03:13.020 --> 01:03:17.180
by the way that you use computers,
may leak information

01:03:17.180 --> 01:03:17.980
about yourself.

01:03:17.980 --> 01:03:22.780
So, for example,
as it turns out,

01:03:22.780 --> 01:03:26.140
users have unique
keystroke timing.

01:03:29.050 --> 01:03:32.600
So if I look at-- if I give
everyone in this class the same

01:03:32.600 --> 01:03:35.240
thing to type in--
the quick, brown fox--

01:03:35.240 --> 01:03:37.380
whatever that nonsense
is-- and I actually look

01:03:37.380 --> 01:03:42.320
at the inter-key press timing,
we'll all have these unique

01:03:42.320 --> 01:03:44.790
distributions that can
potentially be used

01:03:44.790 --> 01:03:46.890
to fingerprint us.
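
NOTE
A rough sketch of comparing inter-key press timings, assuming every user typed
the same phrase and that key-press timestamps are available in milliseconds.
The distance measure and all names are assumptions made for illustration;
real keystroke-dynamics systems use richer statistical models.
```python
from statistics import mean
#
def inter_key_intervals(press_times_ms):
    """Turn key-press timestamps into the gaps between consecutive presses."""
    return [b - a for a, b in zip(press_times_ms, press_times_ms[1:])]
#
def profile_distance(intervals_a, intervals_b):
    """Mean absolute difference between two equal-length interval lists."""
    if len(intervals_a) != len(intervals_b):
        raise ValueError("both samples must cover the same phrase")
    return mean(abs(a - b) for a, b in zip(intervals_a, intervals_b))
#
def closest_user(sample, enrolled):
    """Return the enrolled user whose timing profile is nearest to the sample."""
    return min(enrolled, key=lambda name: profile_distance(sample, enrolled[name]))
```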

01:03:46.890 --> 01:03:50.960
Another thing that's
interesting is that users

01:03:50.960 --> 01:03:52.510
have unique writing styles.

01:03:55.850 --> 01:04:00.500
So there's this
branch of security

01:04:00.500 --> 01:04:02.525
that is called stylometry.

01:04:06.060 --> 01:04:12.270
And the basic idea here is to
figure out if I am an attacker,

01:04:12.270 --> 01:04:14.410
can I figure out
who you are just

01:04:14.410 --> 01:04:16.460
by looking at writing
samples from you?

01:04:16.460 --> 01:04:18.730
So imagine that
for whatever reason

01:04:18.730 --> 01:04:21.190
you're hanging out on 4chan--
don't hang out on 4chan--

01:04:21.190 --> 01:04:23.697
and I want to figure out if
you've actually, in fact,

01:04:23.697 --> 01:04:24.780
been hanging out on 4chan.

01:04:24.780 --> 01:04:27.240
So perhaps what
I can do is I can

01:04:27.240 --> 01:04:30.970
look at a bunch of
different posts from 4chan.

01:04:30.970 --> 01:04:34.692
Maybe I can cluster those
posts into sets of posts

01:04:34.692 --> 01:04:37.999
that I think look
stylistically the same.

01:04:37.999 --> 01:04:39.790
And then what I can do
is I can find things

01:04:39.790 --> 01:04:42.580
that you've written publicly
where you're actually

01:04:42.580 --> 01:04:43.652
attributed as the author.

01:04:43.652 --> 01:04:45.610
I'll look at your homework
assignments or papers

01:04:45.610 --> 01:04:47.276
that you've written
or things like that.

01:04:47.276 --> 01:04:49.630
And I'll see do you match
any of these clusters

01:04:49.630 --> 01:04:51.130
from these 4chan comments.
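
NOTE
A toy sketch of the matching step described above: represent each text by how
often it uses a few common function words, then pick the cluster most similar
to a known writing sample. The word list, the cosine measure, and all names
are assumptions for illustration, not the method of any particular tool.
```python
import math
from collections import Counter
#
# Hypothetical feature set: a handful of common function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")
#
def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]
#
def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
#
def best_cluster(known_text, clusters):
    """Name of the cluster whose text is most similar to known_text."""
    target = style_vector(known_text)
    return max(
        clusters,
        key=lambda name: cosine_similarity(target, style_vector(clusters[name])),
    )
```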

01:04:51.130 --> 01:04:53.900
And if so, then maybe I can,
say, send you a stern note.

01:04:53.900 --> 01:04:55.121
Talk to the parents that
their kid has gone off

01:04:55.121 --> 01:04:56.090
the beaten path.

01:04:56.090 --> 01:04:57.700
Get off of 4chan.

01:04:57.700 --> 01:05:00.460
So for that reason, you might like
to look at this thing called

01:05:00.460 --> 01:05:00.960
stylometry.

01:05:00.960 --> 01:05:03.100
It's actually quite interesting.

01:05:03.100 --> 01:05:06.371
Does anyone have any
questions about that?

01:05:06.371 --> 01:05:06.870
Excellent.

01:05:09.620 --> 01:05:15.040
So we discussed how we
might be able to use

01:05:15.040 --> 01:05:19.340
VMs or modified operating systems
to provide private browsing

01:05:19.340 --> 01:05:20.130
support.

01:05:20.130 --> 01:05:23.010
And so you might wonder, OK,
well, then why don't browsers

01:05:23.010 --> 01:05:25.895
require users to do
one of these things--

01:05:25.895 --> 01:05:28.270
to have one of these tricked
out VMs or tricked out OSes?

01:05:28.270 --> 01:05:30.050
So why do browsers
take it upon themselves

01:05:30.050 --> 01:05:31.560
to implement all this stuff?

01:05:31.560 --> 01:05:34.200
And so the main reason
is deployability.

01:05:34.200 --> 01:05:36.210
So in fact, browser
vendors typically

01:05:36.210 --> 01:05:39.290
do not want to ask their
users to do anything special

01:05:39.290 --> 01:05:42.550
to use the browser besides
install the browser binary

01:05:42.550 --> 01:05:43.050
itself.

01:05:43.050 --> 01:05:44.582
This is similar
to the motivation

01:05:44.582 --> 01:05:45.720
of Native Client.

01:05:45.720 --> 01:05:47.840
So Google wants to
add these cool features

01:05:47.840 --> 01:05:49.020
to end users' computers.

01:05:49.020 --> 01:05:50.640
But it doesn't
want to force users

01:05:50.640 --> 01:05:53.795
to install some special
version of Windows or Linux

01:05:53.795 --> 01:05:54.620
or whatever.

01:05:54.620 --> 01:05:58.610
So Google basically says, we'll
take care of this ourselves.

01:05:58.610 --> 01:06:01.100
Then another reason
is actually usability.

01:06:01.100 --> 01:06:04.920
So a lot of these
VM and OS-level

01:06:04.920 --> 01:06:07.260
solutions for private
browsing-- as we've discussed,

01:06:07.260 --> 01:06:08.960
they make it more
difficult for users

01:06:08.960 --> 01:06:12.160
to persist state from
private browsing sessions

01:06:12.160 --> 01:06:15.830
that they do actually want to
persist, like downloaded files,

01:06:15.830 --> 01:06:19.480
like bookmarks they create
and things like that.

01:06:19.480 --> 01:06:21.539
So basically the browser
vendors say, well,

01:06:21.539 --> 01:06:23.580
if we implement private
browsing modes ourselves,

01:06:23.580 --> 01:06:25.730
we can actually allow
users to do those things.

01:06:25.730 --> 01:06:27.780
We can allow users to
take downloaded files

01:06:27.780 --> 01:06:29.470
from private browsing
mode and take them

01:06:29.470 --> 01:06:30.594
to the rest of the machine.

01:06:30.594 --> 01:06:32.710
So that seems nice at first.

01:06:32.710 --> 01:06:35.635
But note, of course,
that allowing users

01:06:35.635 --> 01:06:37.740
to export some type
of private state

01:06:37.740 --> 01:06:40.490
actually opens up a lot of
security vulnerabilities.

01:06:40.490 --> 01:06:43.420
It makes it very difficult to
analyze security properties

01:06:43.420 --> 01:06:49.770
that the resulting private browsing
modes actually provide.

01:06:49.770 --> 01:06:53.665
And so in the paper,
they try to characterize

01:06:53.665 --> 01:06:57.900
the different types of browser
states that can be modified

01:06:57.900 --> 01:07:01.400
and how current private
browsing modes actually handle

01:07:01.400 --> 01:07:03.080
the modifications of state.

01:07:03.080 --> 01:07:06.292
So the paper describes
this taxonomy

01:07:06.292 --> 01:07:12.122
of browser state changes.

01:07:12.122 --> 01:07:14.540
And so there are four
things in the taxonomy.

01:07:14.540 --> 01:07:22.110
So one type of state
change is initiated

01:07:22.110 --> 01:07:25.109
by the website itself.

01:07:25.109 --> 01:07:26.442
And there's no user interaction.

01:07:29.750 --> 01:07:33.880
And so examples of this
type of state change

01:07:33.880 --> 01:07:37.580
think about stuff like
when a cookie gets

01:07:37.580 --> 01:07:43.482
set-- when something
gets added to the address

01:07:43.482 --> 01:07:49.250
history of the browser--
maybe within a browser

01:07:49.250 --> 01:07:52.270
cache or something.

01:07:52.270 --> 01:07:56.210
And so for this type
of state, basically,

01:07:56.210 --> 01:07:57.935
private browsing
mode says this state

01:07:57.935 --> 01:08:01.240
persists within a private browsing
mode session.

01:08:01.240 --> 01:08:03.300
But it basically is
going to be destroyed

01:08:03.300 --> 01:08:05.982
when that private browsing
session concludes.

01:08:05.982 --> 01:08:10.419
And so the intuition behind
this is that because there

01:08:10.419 --> 01:08:14.158
is no user interaction
in creating this state,

01:08:14.158 --> 01:08:16.241
then perhaps the right
thing for the browser to do

01:08:16.241 --> 01:08:21.050
is assume that the user
wouldn't want that to persist.

01:08:21.050 --> 01:08:25.094
So another type of
browser state change

01:08:25.094 --> 01:08:32.569
is initiated by the website
that the user is visiting.

01:08:32.569 --> 01:08:37.314
But there is some type of
user interaction involved

01:08:37.314 --> 01:08:40.189
in the state change.

01:08:40.189 --> 01:08:45.234
So an example of this might
be the user installs a client

01:08:45.234 --> 01:08:53.359
certificate or maybe
there's a saved password.

01:08:53.359 --> 01:08:57.920
So the user tries to
login to something.

01:08:57.920 --> 01:09:00.130
And the browser says
very helpfully would you

01:09:00.130 --> 01:09:01.608
like to save this password?

01:09:01.608 --> 01:09:03.649
And then if the user
says, yes, then these types

01:09:03.649 --> 01:09:05.616
of things, say
passwords, can actually

01:09:05.616 --> 01:09:08.970
be used outside of the
private browsing mode.

01:09:08.970 --> 01:09:12.460
And so it's a little
bit unclear in principle

01:09:12.460 --> 01:09:14.927
what the policy
for this should be.

01:09:14.927 --> 01:09:16.950
So what ends up
happening in practice

01:09:16.950 --> 01:09:20.260
is that browsers
typically allow state

01:09:20.260 --> 01:09:23.095
in this category that is set
in private browsing modes

01:09:23.095 --> 01:09:26.200
to persist outside of
that private browsing mode

01:09:26.200 --> 01:09:29.995
under the intuition that the
user did have to say yes or no.

01:09:29.995 --> 01:09:31.744
If the user said, yes,
then maybe the user

01:09:31.744 --> 01:09:35.689
is smart enough to
understand that if they

01:09:35.689 --> 01:09:38.066
save some password
for some unsavory site

01:09:38.066 --> 01:09:39.649
and someone comes
on later and figures

01:09:39.649 --> 01:09:42.950
that out, that's the user's
fault-- not the browser's fault.

01:09:42.950 --> 01:09:45.330
So it's a little unclear
what the best policy is here.

01:09:45.330 --> 01:09:46.795
But in practice, this
type of state change

01:09:46.795 --> 01:09:49.086
is allowed to persist outside
of private browsing mode.

01:09:52.360 --> 01:09:54.860
So there's another type
of state change, which is

01:09:54.860 --> 01:09:59.790
purely initiated by the user.

01:09:59.790 --> 01:10:05.590
And so here you can think about
things like setting a bookmark

01:10:05.590 --> 01:10:08.420
or maybe downloading files.

01:10:11.800 --> 01:10:13.880
And so the story
for this state is

01:10:13.880 --> 01:10:15.700
similar to the story
for the state up here.

01:10:15.700 --> 01:10:18.440
So basically because
the user was explicitly

01:10:18.440 --> 01:10:20.492
involved in the
creation of that state.

01:10:20.492 --> 01:10:22.408
Private browsing modes
typically say, OK, it's

01:10:22.408 --> 01:10:25.174
OK to persist these
types of changes

01:10:25.174 --> 01:10:29.040
to the outside world outside
of private browsing mode.

01:10:29.040 --> 01:10:31.450
Then there's some
sets of state which

01:10:31.450 --> 01:10:40.720
are actually unrelated to any
particular session at all.

01:10:40.720 --> 01:10:46.100
So this is stuff, for example,
like an update to the browser

01:10:46.100 --> 01:10:53.890
itself-- the actual binary
that constitutes the browser.

01:10:53.890 --> 01:10:56.327
And so the way the browser
vendors think about this state

01:10:56.327 --> 01:10:57.826
is this state is
essentially assumed

01:10:57.826 --> 01:11:01.651
to be part of the single,
global state that's

01:11:01.651 --> 01:11:04.540
available to both public
and private browsing modes.
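
NOTE
The four categories and their handling, as described in the lecture, can be
summarized in a small lookup table. This is only a sketch: the enum and
function names are made up for illustration, and the examples in the comments
are the ones mentioned in the lecture.
```python
from enum import Enum
#
class StateChange(Enum):
    SITE_NO_USER = "initiated by website, no user interaction"
    SITE_WITH_USER = "initiated by website, with user interaction"
    USER_ONLY = "initiated purely by user"
    SESSION_INDEPENDENT = "unrelated to any session"
#
# Does state of this kind outlive the private browsing session?
PERSISTS_AFTER_PRIVATE_SESSION = {
    StateChange.SITE_NO_USER: False,        # cookies, history, cache
    StateChange.SITE_WITH_USER: True,       # client certificates, saved passwords
    StateChange.USER_ONLY: True,            # bookmarks, downloaded files
    StateChange.SESSION_INDEPENDENT: True,  # browser updates (shared global state)
}
#
def survives_private_session(change: StateChange) -> bool:
    """True if this kind of state is kept after the private session ends."""
    return PERSISTS_AFTER_PRIVATE_SESSION[change]
```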

01:11:04.540 --> 01:11:06.585
And so eventually,
if you look at it,

01:11:06.585 --> 01:11:09.210
there's actually quite a lot
of state that will actually

01:11:09.210 --> 01:11:11.930
potentially leak outside
of private browsing mode,

01:11:11.930 --> 01:11:14.539
particularly if there's
user volition involved.

01:11:14.539 --> 01:11:16.080
So it's interesting
to think about is

01:11:16.080 --> 01:11:22.150
this the right trade-off
between security and privacy?

01:11:22.150 --> 01:11:25.899
So what's interesting is
that-- so the paper actually

01:11:25.899 --> 01:11:32.045
says that it's difficult to
sort of prevent a local attacker

01:11:32.045 --> 01:11:34.136
from detecting whether
or not you've been

01:11:34.136 --> 01:11:35.700
using private browsing mode.

01:11:35.700 --> 01:11:37.200
And the paper was
a little bit vague

01:11:37.200 --> 01:11:38.810
about why this
might be the case.

01:11:38.810 --> 01:11:40.700
So one reason why
this might be the case

01:11:40.700 --> 01:11:43.457
is because some of this
state that actually leaks

01:11:43.457 --> 01:11:46.140
from private browsing mode
to public browsing mode,

01:11:46.140 --> 01:11:47.967
essentially it can
actually contain

01:11:47.967 --> 01:11:50.960
hints that the state was generated
in private browsing mode.

01:11:50.960 --> 01:11:53.940
So for example, on
Firefox and Chrome,

01:11:53.940 --> 01:11:58.524
when you generate a bookmark
in private browsing mode,

01:11:58.524 --> 01:12:00.440
that bookmark has a bunch
of metadata with it.

01:12:00.440 --> 01:12:02.860
So for example, the
time that it was visited

01:12:02.860 --> 01:12:03.780
and things like that.

01:12:03.780 --> 01:12:06.350
So in many cases,
that metadata will

01:12:06.350 --> 01:12:08.682
be set to zero or
some null value

01:12:08.682 --> 01:12:11.140
if that bookmark was generated
inside of a private browsing

01:12:11.140 --> 01:12:12.133
mode.

01:12:12.133 --> 01:12:14.216
So then later on if someone
controls your machine,

01:12:14.216 --> 01:12:16.650
and they look at your
bookmark information--

01:12:16.650 --> 01:12:19.580
if they see this metadata set
to this zero or null value,

01:12:19.580 --> 01:12:22.590
they can say, aha, that
bookmark was probably generated

01:12:22.590 --> 01:12:25.140
in private browsing mode.
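
NOTE
A sketch of the check a local attacker might run over bookmark records. The
record layout and the field name last_visit_time are hypothetical; real
browsers store bookmarks in their own formats, which are not modeled here.
```python
def looks_private(bookmark: dict) -> bool:
    """Flag a bookmark whose visit metadata is zero, None, or missing."""
    return not bookmark.get("last_visit_time")
#
def suspicious_bookmarks(bookmarks):
    """URLs of bookmarks that may have been created in private browsing mode."""
    return [b["url"] for b in bookmarks if looks_private(b)]
```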

01:12:25.140 --> 01:12:28.775
So one thing to think
about is typically

01:12:28.775 --> 01:12:30.290
we talk about browser security.

01:12:30.290 --> 01:12:32.250
We talk about, OK,
what can people do

01:12:32.250 --> 01:12:34.356
with JavaScript or HTML or CSS.

01:12:34.356 --> 01:12:35.980
One thing you might
want to think about

01:12:35.980 --> 01:12:38.400
is, well, what can people do
with plug-ins or extensions?

01:12:38.400 --> 01:12:40.350
So in the context
of private browsing,

01:12:40.350 --> 01:12:41.910
plug-ins and
extensions are quite

01:12:41.910 --> 01:12:46.260
interesting because they're
not constrained in most cases

01:12:46.260 --> 01:12:48.020
by the same origin policy.

01:12:48.020 --> 01:12:49.840
That constrains
stuff like JavaScript.

01:12:49.840 --> 01:12:52.340
And what's interesting is that
these extensions and plug-ins

01:12:52.340 --> 01:12:54.875
typically run with
very high authority.

01:12:54.875 --> 01:12:57.500
Loosely speaking, you can think
of them as like kernel modules.

01:12:57.500 --> 01:12:59.020
They implement new
features directly

01:12:59.020 --> 01:13:01.470
inside the browsers themselves.

01:13:01.470 --> 01:13:03.280
And so that's a
little bit problematic

01:13:03.280 --> 01:13:05.450
because these plug-ins
and extensions are often

01:13:05.450 --> 01:13:09.030
developed by someone who is
not the actual browser vendor.

01:13:09.030 --> 01:13:10.580
So what that means
is that someone

01:13:10.580 --> 01:13:12.496
is trying to do something
nice and provide you

01:13:12.496 --> 01:13:15.580
with this nice value add in this
browser plug-in or extension.

01:13:15.580 --> 01:13:17.380
But that implementor
might not fully

01:13:17.380 --> 01:13:19.775
understand the context,
the security context,

01:13:19.775 --> 01:13:22.140
in which that extension runs.

01:13:22.140 --> 01:13:25.760
So that extension may not
implement private browsing mode

01:13:25.760 --> 01:13:26.450
semantics.

01:13:26.450 --> 01:13:29.710
Or it may try to implement
it but do it in a bad way.

01:13:29.710 --> 01:13:33.052
And so as I'll describe in
a couple of minutes, that's

01:13:33.052 --> 01:13:35.110
actually bad from the
security perspective

01:13:35.110 --> 01:13:37.401
because that means if we add
some of these new plug-ins

01:13:37.401 --> 01:13:39.100
or extensions, you
now can't strongly

01:13:39.100 --> 01:13:42.990
reason about what the
resulting [INAUDIBLE] are.

01:13:42.990 --> 01:13:45.430
Now, one thing that's nice
is that plug-ins are actually

01:13:45.430 --> 01:13:47.920
probably going the
way of the dinosaurs.

01:13:47.920 --> 01:13:50.387
So as you probably know, HTML5
adds all these new features

01:13:50.387 --> 01:13:51.970
like the audio tag
and the video tag,

01:13:51.970 --> 01:13:53.010
and stuff like that.

01:13:53.010 --> 01:13:56.440
And so a lot of these new
features were designed to allow

01:13:56.440 --> 01:13:58.030
people to get away
from plug-ins--

01:13:58.030 --> 01:14:01.745
to get away from Java--
to get away from Flash.

01:14:01.745 --> 01:14:03.610
So when people in the
past wanted to do things

01:14:03.610 --> 01:14:06.560
like have rich 2D
or 3D graphics,

01:14:06.560 --> 01:14:08.660
they'd have to do something
like Java or Flash.

01:14:08.660 --> 01:14:10.460
Now they can use
things like WebGL.

01:14:10.460 --> 01:14:12.960
They can use things
like the canvas tag.

01:14:12.960 --> 01:14:14.980
So probably plug-ins
are going away.

01:14:14.980 --> 01:14:16.410
In fact, the IE
team for example,

01:14:16.410 --> 01:14:17.910
has said that in a
couple years they

01:14:17.910 --> 01:14:20.410
don't think anybody's going to
be using plug-ins whatsoever.

01:14:20.410 --> 01:14:22.246
It's all going to
be HTML5 type stuff.

01:14:22.246 --> 01:14:24.870
In fact, if you go to YouTube--
I don't know if you've noticed.

01:14:24.870 --> 01:14:26.650
But a lot of times if
you go to the video,

01:14:26.650 --> 01:14:30.250
the video is actually using--
it's called an HTML5 player.

01:14:30.250 --> 01:14:34.290
They've gone away from their
standard plugin-based one.

01:14:34.290 --> 01:14:35.415
So that's very interesting.

01:14:35.415 --> 01:14:37.600
You can already see
sites trying to move

01:14:37.600 --> 01:14:39.049
towards this new no-plug-in world.

01:14:39.049 --> 01:14:40.590
However, extensions
are probably here

01:14:40.590 --> 01:14:42.423
to stay for at least
the foreseeable future.

01:14:42.423 --> 01:14:45.409
So it's still important
to get those right.

01:14:45.409 --> 01:14:47.450
So, yeah, the last thing
that I wanted to discuss

01:14:47.450 --> 01:14:51.340
is that the paper was written in
2010-- that's four years ago.

01:14:51.340 --> 01:14:52.930
So you might think
to yourself what's

01:14:52.930 --> 01:14:55.250
changed about private browsing?

01:14:55.250 --> 01:14:57.470
And so at a high level,
private browsing mode

01:14:57.470 --> 01:14:59.580
is still tricky to get right.

01:14:59.580 --> 01:15:02.370
And the reason why it's
tricky to get right--

01:15:02.370 --> 01:15:03.220
a couple of reasons.

01:15:03.220 --> 01:15:05.430
So first of all, because
the browser [INAUDIBLE]

01:15:05.430 --> 01:15:10.560
is still growing because of
things like this HTML5 stuff.

01:15:10.560 --> 01:15:13.500
The interface, which needs
to be secure with respect

01:15:13.500 --> 01:15:15.505
to private browsing
mode, that frontier

01:15:15.505 --> 01:15:17.160
is always getting bigger.

01:15:17.160 --> 01:15:19.230
And also a lot of
times developers--

01:15:19.230 --> 01:15:22.950
they are more focused on
adding cool, new features.

01:15:22.950 --> 01:15:24.360
And then the
privacy implications

01:15:24.360 --> 01:15:26.340
get taken up later on.

01:15:26.340 --> 01:15:29.377
And so in practice, it is
still tricky to produce

01:15:29.377 --> 01:15:31.710
a private browsing mode which
catches all potential data

01:15:31.710 --> 01:15:33.430
leaks.

01:15:33.430 --> 01:15:37.600
So one example, there
was a Firefox bug fix

01:15:37.600 --> 01:15:39.680
from January, 2014.

01:15:39.680 --> 01:15:44.060
And the basic idea is
there is this extension--

01:15:44.060 --> 01:15:49.050
it's called pdf.js, which
is basically a way

01:15:49.050 --> 01:15:55.020
to look at PDF files using
pure HTML5 interfaces.

01:15:55.020 --> 01:15:58.280
And so as it turns
out, this extension

01:15:58.280 --> 01:16:03.010
was allowing public mode cookies
to leak when it was being

01:16:03.010 --> 01:16:06.446
used in private browsing mode.

01:16:06.446 --> 01:16:08.440
The idea is that let's
say that you visit

01:16:08.440 --> 01:16:10.600
some website in public mode.

01:16:10.600 --> 01:16:11.850
You want to download some PDF.

01:16:11.850 --> 01:16:13.470
Maybe you get some
cookie that comes back.

01:16:13.470 --> 01:16:15.180
You come back in
private browsing mode.

01:16:15.180 --> 01:16:17.850
You want to view another
PDF from that site.

01:16:17.850 --> 01:16:20.215
And then pdf.js is actually
sending those public mode

01:16:20.215 --> 01:16:23.800
cookies along with any private
mode things that were set.

01:16:23.800 --> 01:16:26.110
And so in the lecture
notes, I actually

01:16:26.110 --> 01:16:29.639
have a link to the
Bugzilla discussion

01:16:29.639 --> 01:16:30.680
about the particular bug.

01:16:30.680 --> 01:16:32.600
So the fix was
actually quite simple

01:16:32.600 --> 01:16:34.267
once they realized
this was the problem.

01:16:34.267 --> 01:16:36.100
Basically they just
have to add a check that

01:16:36.100 --> 01:16:38.680
says morally speaking, am
I in private browsing mode?

01:16:38.680 --> 01:16:41.020
If so, do some things--
and one of those things

01:16:41.020 --> 01:16:43.140
is to not send the cookies.

01:16:43.140 --> 01:16:45.630
So the fix here is
actually quite simple.
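
NOTE
A sketch of the kind of check described above, not the actual pdf.js patch.
It assumes two separate cookie stores keyed by hostname; the function name,
parameters, and data layout are hypothetical.
```python
from urllib.parse import urlsplit
#
def cookies_for_request(url, private_mode, public_jar, private_jar):
    """Pick cookies only from the jar that matches the current browsing mode,
    so public-mode cookies are never attached to a private-mode request."""
    host = urlsplit(url).hostname
    jar = private_jar if private_mode else public_jar
    return dict(jar.get(host, {}))
```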

01:16:45.630 --> 01:16:49.070
But the challenge was
that once again, people

01:16:49.070 --> 01:16:51.500
added this cool, new extension.

01:16:51.500 --> 01:16:53.920
But it hadn't really
crossed their mind

01:16:53.920 --> 01:16:57.590
to do this full, invasive audit.

01:16:57.590 --> 01:17:00.270
And say where are all
the places at which

01:17:00.270 --> 01:17:03.720
private browsing mode
semantics might be impacted

01:17:03.720 --> 01:17:05.445
by this particular plug-in.

01:17:05.445 --> 01:17:06.930
There's another
interesting one too.

01:17:06.930 --> 01:17:09.405
This is actually
the discussion we

01:17:09.405 --> 01:17:11.751
had about 30 minutes ago
about what happens if you have

01:17:11.751 --> 01:17:14.250
private tabs and public tabs
that are open at the same time

01:17:14.250 --> 01:17:15.570
or very close to each other.

01:17:15.570 --> 01:17:18.080
There is actually
a bug in Firefox.

01:17:18.080 --> 01:17:19.870
I think that's from--
let's see here--

01:17:19.870 --> 01:17:22.750
yeah, 2011, which
is still unfixed.

01:17:22.750 --> 01:17:24.360
And the basic idea
is that if you

01:17:24.360 --> 01:17:27.740
go to a tab in private
browsing mode-- OK,

01:17:27.740 --> 01:17:28.655
you go do some stuff.

01:17:28.655 --> 01:17:31.875
You then close that tab.

01:17:31.875 --> 01:17:34.170
You then open a new
public mode tab.

01:17:34.170 --> 01:17:40.906
And you go to about:memory.

01:17:40.906 --> 01:17:43.354
So as you probably know, browsers
define these fake URLs

01:17:43.354 --> 01:17:45.520
that tell you information
about how the browser works.

01:17:45.520 --> 01:17:47.706
So you go to the private
tab, close it up,

01:17:47.706 --> 01:17:49.539
then go to about:memory.

01:17:49.539 --> 01:17:51.080
This is going to
tell you information

01:17:51.080 --> 01:17:53.830
about all the objects that
Firefox has allocated.

01:17:53.830 --> 01:17:58.000
So what would happen is that
window objects are typically

01:17:58.000 --> 01:18:01.362
deallocated-- they are
[INAUDIBLE] in Firefox.

01:18:01.362 --> 01:18:03.820
So what ends up happening is
that when you open up that new

01:18:03.820 --> 01:18:06.880
public mode tab, go to
about:memory you can actually

01:18:06.880 --> 01:18:11.670
find information still about
that private mode window such

01:18:11.670 --> 01:18:13.087
as things like a
URL, for example,

01:18:13.087 --> 01:18:15.545
that will tell you how much
memory it allocated and all that

01:18:15.545 --> 01:18:16.140
kind of stuff.

01:18:16.140 --> 01:18:17.570
And it's all in plain text.

01:18:17.570 --> 01:18:20.762
And so that's an example
of how these very subtle

01:18:20.762 --> 01:18:22.470
interfaces in browsers
can actually

01:18:22.470 --> 01:18:24.340
leak a lot of information.

01:18:24.340 --> 01:18:26.395
And so it's very interesting.

01:18:26.395 --> 01:18:28.020
If you look at the
Bugzilla discussion,

01:18:28.020 --> 01:18:31.244
it's actually pretty interesting
to see how these problems get

01:18:31.244 --> 01:18:32.160
resolved in real life.

01:18:32.160 --> 01:18:35.170
And I put a link to it.
So there is a message

01:18:35.170 --> 01:18:39.025
that this bug was deprioritized
when it became clear

01:18:39.025 --> 01:18:42.020
that the potential solution
was more involved than

01:18:42.020 --> 01:18:44.552
originally anticipated.

01:18:44.552 --> 01:18:46.239
So that's a pretty
long discussion

01:18:46.239 --> 01:18:47.280
about how do we fix this.

01:18:47.280 --> 01:18:49.070
And it involved changing the
way that garbage collection is

01:18:49.070 --> 01:18:49.610
done.

01:18:49.610 --> 01:18:53.350
And it's very tricky because
if you invoke it too often

01:18:53.350 --> 01:18:55.100
then it hurts performance.

01:18:55.100 --> 01:18:57.230
So there's this long
discussion about this.

01:18:57.230 --> 01:18:58.810
So they said, "It
was deprioritized

01:18:58.810 --> 01:19:00.809
when it was clear the
solution was more involved

01:19:00.809 --> 01:19:01.746
than anticipated."

01:19:01.746 --> 01:19:04.810
And then in response,
a developer said,

01:19:04.810 --> 01:19:06.250
"That is very sad to hear.

01:19:06.250 --> 01:19:08.100
This could pretty much
defeat the purpose

01:19:08.100 --> 01:19:10.440
of things like session
store forgetting

01:19:10.440 --> 01:19:12.130
about closed private windows."

01:19:12.130 --> 01:19:14.977
So the developers care
about this stuff.

01:19:14.977 --> 01:19:16.560
Like in the case of
the session store,

01:19:16.560 --> 01:19:19.780
this is a storage
feature for HTML5--

01:19:19.780 --> 01:19:21.936
they had gone to
a lot of trouble

01:19:21.936 --> 01:19:25.780
to make it delete
things that belong

01:19:25.780 --> 01:19:28.320
to these closed private windows.

01:19:28.320 --> 01:19:30.440
But, basically, what
this bug did-- it still--

01:19:30.440 --> 01:19:32.060
it basically still
left information

01:19:32.060 --> 01:19:35.260
about that stuff sitting
around in memory somewhere.

01:19:35.260 --> 01:19:37.841
So long story short,
it's still very difficult

01:19:37.841 --> 01:19:39.090
to get private browsing right.

01:19:39.090 --> 01:19:41.765
And in fact, there are actually
off-the-shelf forensics tools

01:19:41.765 --> 01:19:43.431
that you can download
that will actually

01:19:43.431 --> 01:19:47.959
look for evidence of both public
and private browsing modes.

01:19:47.959 --> 01:19:49.625
So if you're an
attacker, you don't have

01:19:49.625 --> 01:19:50.910
to roll your own custom tool.

01:19:50.910 --> 01:19:52.436
There's this one
they call Magnet.

01:19:52.436 --> 01:19:54.615
I think it's the Internet
Evidence Finder.

01:19:54.615 --> 01:19:55.740
You just go get this thing.

01:19:55.740 --> 01:19:57.670
It'll do things like
look through your page

01:19:57.670 --> 01:19:59.090
file for RAM artifacts.

01:19:59.090 --> 01:20:00.730
And it will give
you a very nice GUI.

01:20:00.730 --> 01:20:02.570
It'll say here are
the images I found.

01:20:02.570 --> 01:20:04.540
Here are the URLs.

01:20:04.540 --> 01:20:07.240
So in practice, these
private browsing modes

01:20:07.240 --> 01:20:08.740
still do leak some information.

01:20:08.740 --> 01:20:11.190
All right, so next section,
we'll talk about Tor.