WEBVTT

00:00:00.080 --> 00:00:02.430
The following content is
provided under a Creative

00:00:02.430 --> 00:00:03.810
Commons license.

00:00:03.810 --> 00:00:06.060
Your support will help
MIT OpenCourseWare

00:00:06.060 --> 00:00:10.150
continue to offer high quality
educational resources for free.

00:00:10.150 --> 00:00:12.700
To make a donation or to
view additional materials

00:00:12.700 --> 00:00:16.600
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:16.600 --> 00:00:17.305
at ocw.mit.edu.

00:00:26.380 --> 00:00:28.855
PROFESSOR: All right, guys.

00:00:28.855 --> 00:00:30.850
Let's get started with
the next installment

00:00:30.850 --> 00:00:34.290
of our exciting journey
into computer security.

00:00:34.290 --> 00:00:36.665
Today, we're actually going
to talk about web security.

00:00:36.665 --> 00:00:39.760
Web security is, actually,
one of my favorite topics

00:00:39.760 --> 00:00:41.657
to talk about because
it really exposes you

00:00:41.657 --> 00:00:43.543
to the true horrors
of the world.

00:00:43.543 --> 00:00:45.126
It's very easy to
think, as a student,

00:00:45.126 --> 00:00:46.780
that everything will be
great when you graduate.

00:00:46.780 --> 00:00:48.516
Today's lecture and
the next lecture

00:00:48.516 --> 00:00:51.006
will be telling you that's,
in fact, not the case.

00:00:51.006 --> 00:00:52.030
Everything's terrible.

00:00:52.030 --> 00:00:53.760
So what is the web?

00:00:53.760 --> 00:00:57.230
Well back in the olden days, the
web was, actually, much simpler

00:00:57.230 --> 00:00:58.630
than it is today, right.

00:00:58.630 --> 00:01:00.780
So clients, which is
to say the browsers,

00:01:00.780 --> 00:01:03.030
couldn't really do anything
with respect to displaying

00:01:03.030 --> 00:01:04.030
rich or active content.

00:01:04.030 --> 00:01:06.540
Basically they could just get
static images, static text,

00:01:06.540 --> 00:01:07.960
and that was about it.

00:01:07.960 --> 00:01:10.512
Now the server side was
a little more interesting

00:01:10.512 --> 00:01:13.320
because even if there was static
content on the client side,

00:01:13.320 --> 00:01:15.830
maybe the server was
talking to databases,

00:01:15.830 --> 00:01:18.700
maybe it was talking to other
machines on the server side.

00:01:18.700 --> 00:01:20.050
Things like that.

00:01:20.050 --> 00:01:22.950
So for a very long time,
the notion of web security,

00:01:22.950 --> 00:01:26.060
basically, meant looking at
what the server was doing.

00:01:26.060 --> 00:01:27.760
And to this point
in this class, we've

00:01:27.760 --> 00:01:29.430
essentially taken that approach.

00:01:29.430 --> 00:01:33.450
So we looked at things like
buffer overflow attacks.

00:01:33.450 --> 00:01:35.890
So how clients can trick
the server into doing things

00:01:35.890 --> 00:01:37.140
the server doesn't want to do.

00:01:37.140 --> 00:01:39.500
You also looked
at the OKWS server

00:01:39.500 --> 00:01:43.750
and looked at how we can do
some privilege isolation there.

00:01:43.750 --> 00:01:46.230
So to this point, we, sort
of, looked at security

00:01:46.230 --> 00:01:50.489
through the experiences
that were actually

00:01:50.489 --> 00:01:52.530
experienced by the server-side
resources themselves.

00:01:52.530 --> 00:01:55.269
But now, actually, the
browser is very interesting

00:01:55.269 --> 00:01:56.810
to think about, in
terms of security,

00:01:56.810 --> 00:02:02.484
because the browser is super,
super complicated these days.

00:02:05.460 --> 00:02:07.950
So now there's all kinds
of insane, dynamic stuff

00:02:07.950 --> 00:02:09.350
that the browser
can actually do.

00:02:09.350 --> 00:02:13.320
So for example, you probably
heard of JavaScript.

00:02:13.320 --> 00:02:16.450
So JavaScript now
allows pages to execute

00:02:16.450 --> 00:02:18.465
client side code,
Turing complete,

00:02:18.465 --> 00:02:20.350
can do all kinds of wacky stuff.

00:02:20.350 --> 00:02:22.320
There is the DOM
model, which we'll

00:02:22.320 --> 00:02:25.100
talk about in more
depth later today.

00:02:25.100 --> 00:02:27.350
The DOM model, essentially,
allows JavaScript code

00:02:27.350 --> 00:02:31.480
to dynamically change the
visual appearance of the page.

00:02:31.480 --> 00:02:36.166
Fiddle with things like font
stylings and stuff like that.

00:02:36.166 --> 00:02:40.630
There's XMLHttpRequest.

00:02:40.630 --> 00:02:44.250
These are, basically,
a way for JavaScript

00:02:44.250 --> 00:02:47.350
to asynchronously fetch
contents from servers.

00:02:47.350 --> 00:02:53.520
You may also hear XMLHttpRequests
referred to as AJAX.

00:02:53.520 --> 00:02:56.030
Asynchronous
JavaScript and XML.

00:02:56.030 --> 00:02:58.760
There are things
like WebSockets.

00:02:58.760 --> 00:03:02.780
This is, actually,
a recently introduced API.

00:03:02.780 --> 00:03:05.960
So WebSockets, essentially,
allow a full duplex

00:03:05.960 --> 00:03:08.260
communication between
clients and servers.

00:03:08.260 --> 00:03:09.920
Communication going both ways.

00:03:09.920 --> 00:03:12.610
We've got all kinds
of multimedia support.

00:03:16.230 --> 00:03:22.630
So for example, we have
things like the video tag,

00:03:22.630 --> 00:03:26.167
which allows a web
page to play video

00:03:26.167 --> 00:03:27.250
without using a Flash app.

00:03:27.250 --> 00:03:30.110
It can actually just
play that video natively.

00:03:30.110 --> 00:03:34.170
There's also geolocation.

00:03:34.170 --> 00:03:39.180
So now a web page can actually
determine, physically,

00:03:39.180 --> 00:03:40.190
where you are.

00:03:40.190 --> 00:03:42.680
For example, if you're running
a web page on a smartphone,

00:03:42.680 --> 00:03:45.360
the browser can actually
access your GPS unit.

00:03:45.360 --> 00:03:48.550
If you're accessing a
page on a desktop browser,

00:03:48.550 --> 00:03:51.460
it can actually look at
your Wi-Fi connection

00:03:51.460 --> 00:03:54.310
and connect to Google's
Wi-Fi geolocation service

00:03:54.310 --> 00:03:56.130
to figure out where
exactly you are.

00:03:56.130 --> 00:03:57.130
That's, kind of, insane.

00:03:57.130 --> 00:03:57.630
Right?

00:03:57.630 --> 00:04:00.470
But now web pages can
do that kind of stuff.

00:04:00.470 --> 00:04:05.000
So we've also talked
about things like NaCl,

00:04:05.000 --> 00:04:09.300
for example, which allows
browsers to run native code.

00:04:09.300 --> 00:04:11.371
So there's many,
many other features

00:04:11.371 --> 00:04:12.620
that I haven't mentioned here.

00:04:12.620 --> 00:04:14.240
But suffice it to
say the browser

00:04:14.240 --> 00:04:16.480
is now incredibly complicated.

00:04:16.480 --> 00:04:19.750
So what does this mean from
the perspective of security?

00:04:19.750 --> 00:04:22.140
Well basically, it means
that we're screwed.

00:04:22.140 --> 00:04:22.640
Right?

00:04:22.640 --> 00:04:25.590
The threat surface for that
right there is enormous.

00:04:25.590 --> 00:04:28.580
And loosely speaking, when
you're thinking about security,

00:04:28.580 --> 00:04:31.460
you can think of a graph that,
sort of, looks like this.

00:04:31.460 --> 00:04:37.230
So you've got the
likelihood of correctness.

00:04:41.214 --> 00:04:43.630
And then, you've got the number
of features that you have.

00:04:48.430 --> 00:04:51.035
And so you know, this graph
starts up here at 100.

00:04:51.035 --> 00:04:53.630
Well of course, we
never even start at 100,

00:04:53.630 --> 00:04:55.680
even with very simple
code because we can't even

00:04:55.680 --> 00:04:58.190
do bubble sort right.

00:04:58.190 --> 00:05:00.470
So essentially, that curve
looks something like this.

00:05:00.470 --> 00:05:03.490
And web browsers
are right over here.

00:05:03.490 --> 00:05:05.140
So as we'll discuss
today, there's

00:05:05.140 --> 00:05:09.210
all kinds of wacky security bugs
that are arising constantly.

00:05:09.210 --> 00:05:11.020
And as soon as the
old ones are fixed,

00:05:11.020 --> 00:05:12.660
new ones are rising
because people

00:05:12.660 --> 00:05:14.530
keep adding these new features.

00:05:14.530 --> 00:05:16.270
Oftentimes, without
thinking about what

00:05:16.270 --> 00:05:19.270
the security implications
of those features are.

00:05:19.270 --> 00:05:22.400
So if you think about what a
web application is these days,

00:05:22.400 --> 00:05:24.720
well it's this client thing
and it's a server thing.

00:05:24.720 --> 00:05:28.220
And a web application now spans
multiple programming languages,

00:05:28.220 --> 00:05:30.770
multiple machines, and
multiple hardware platforms.

00:05:30.770 --> 00:05:32.972
You could be using
Firefox on Windows.

00:05:32.972 --> 00:05:35.430
Then it's going to go talk to
a machine in the cloud that's

00:05:35.430 --> 00:05:36.030
running Linux.

00:05:36.030 --> 00:05:38.230
It's running the Apache server.

00:05:38.230 --> 00:05:41.460
Maybe it's running an ARM chip
as opposed to x86 or something

00:05:41.460 --> 00:05:43.020
like that, or the
other way around.

00:05:43.020 --> 00:05:47.210
So long story short, there's all
these problems of composition.

00:05:47.210 --> 00:05:49.935
There's all these software
layers and all these hardware

00:05:49.935 --> 00:05:53.797
layers that all can impact
security in some way.

00:05:53.797 --> 00:05:54.880
But it's also complicated.

00:05:54.880 --> 00:05:58.470
It's not quite clear how we can
make sense of the entire whole.

00:05:58.470 --> 00:06:03.170
So for example, one common
problem with the web

00:06:03.170 --> 00:06:06.385
is this problem of
a parsing context.

00:06:10.220 --> 00:06:12.050
So as an example,
suppose that you

00:06:12.050 --> 00:06:16.220
had something in a page
that looked like this.

00:06:16.220 --> 00:06:19.260
You declare a script tag.

00:06:19.260 --> 00:06:22.470
Inside that script tag,
you declare a variable.

00:06:22.470 --> 00:06:24.440
There's some string here.

00:06:24.440 --> 00:06:29.700
And let's say that this string
comes from an untrusted party.

00:06:29.700 --> 00:06:34.810
Either the user or another
machine or something like that.

00:06:34.810 --> 00:06:36.710
And then, you close
that script tag.

00:06:40.360 --> 00:06:42.074
So this stuff is trusted.

00:06:42.074 --> 00:06:42.990
This stuff is trusted.

00:06:42.990 --> 00:06:44.280
This stuff is not trusted.

00:06:44.280 --> 00:06:45.390
So can anybody
figure out why there

00:06:45.390 --> 00:06:47.775
might be some problems here if
we take this untrusted string

00:06:47.775 --> 00:06:48.608
and put it in there?

00:06:51.422 --> 00:06:55.728
AUDIENCE: You can have a closing
quote mark in [INAUDIBLE]

00:06:55.728 --> 00:06:57.159
and then have some [INAUDIBLE].

00:06:57.159 --> 00:06:58.640
PROFESSOR: Right,
right, exactly.

00:06:58.640 --> 00:07:01.320
So the problem is there
are multiple contexts

00:07:01.320 --> 00:07:04.590
that this untrusted code
could, sort of, break into.

00:07:04.590 --> 00:07:09.390
So for example, if the untrusted
code had a double quote here,

00:07:09.390 --> 00:07:14.056
now we've closed the definition
of this JavaScript string.

00:07:14.056 --> 00:07:16.055
So now we've exited the
JavaScript string context

00:07:16.055 --> 00:07:18.570
and entered the regular
JavaScript execution context.

00:07:18.570 --> 00:07:20.610
And then the attacker
can put regular JavaScript

00:07:20.610 --> 00:07:22.540
code here and go to town.

00:07:22.540 --> 00:07:25.580
Alternatively, the
attacker could just

00:07:25.580 --> 00:07:31.220
put a closing script tag here.

00:07:31.220 --> 00:07:31.850
Right?

00:07:31.850 --> 00:07:35.270
And then, at that
point, the attacker

00:07:35.270 --> 00:07:38.820
can, sort of, get out of
the JavaScript context

00:07:38.820 --> 00:07:40.940
and then get into
the HTML context.

00:07:40.940 --> 00:07:44.250
Maybe to define some new HTML
nodes or something like that.

00:07:44.250 --> 00:07:46.090
So you see this problem
with composition

00:07:46.090 --> 00:07:48.185
all over the place in
the web because there

00:07:48.185 --> 00:07:49.810
are so many different
languages and

00:07:49.810 --> 00:07:51.018
runtimes for you to think about.

00:07:51.018 --> 00:07:54.575
HTML, CSS, JavaScript, maybe
MySQL on the server side,

00:07:54.575 --> 00:07:56.820
and so on and so forth.

00:07:56.820 --> 00:07:59.540
So this is just
a classic example

00:07:59.540 --> 00:08:02.240
of why you have to do something
called content sanitization.

00:08:02.240 --> 00:08:05.410
So whenever you get
untrusted input from someone,

00:08:05.410 --> 00:08:07.700
you actually need to
analyze it very carefully

00:08:07.700 --> 00:08:11.720
to make sure that it's not being
used as a vector for an attack.
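
NOTE
A minimal sketch of the parsing-context problem described above, assuming a page builder that pastes an untrusted string into an inline script. The helper names (naiveInline, toScriptSafeLiteral, saferInline) are hypothetical, and the escaping shown covers only this one context (a JavaScript string inside a script tag); other contexts need different handling.
```javascript
// Hypothetical helpers for illustration; not taken from the lecture.
// Naive version: the untrusted string lands inside a JavaScript string
// literal, which itself sits inside an HTML script element.
function naiveInline(untrusted) {
  return '<script>var x = "' + untrusted + '";</script>';
}
// Safer version for this one context: JSON.stringify escapes quotes and
// backslashes (guards the string context), and the angle-bracket escapes
// keep a closing script tag from appearing (guards the HTML context).
function toScriptSafeLiteral(untrusted) {
  return JSON.stringify(String(untrusted))
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}
function saferInline(untrusted) {
  return "<script>var x = " + toScriptSafeLiteral(untrusted) + ";</script>";
}
const quoteAttack = '"; alert(1); //';
const tagAttack = "</script><h1>injected</h1>";
console.log(naiveInline(quoteAttack)); // attacker code escapes the string
console.log(saferInline(quoteAttack)); // the quote stays inside the string
console.log(saferInline(tagAttack));   // no early closing script tag
```
With the naive helper, the double quote ends the string and the rest runs as code; with the escaped literal, the same input stays data in both the JavaScript and the HTML context.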

00:08:11.720 --> 00:08:14.420
So another reason why
web security is so tricky

00:08:14.420 --> 00:08:17.510
is because the web
specifications are incredibly

00:08:17.510 --> 00:08:19.130
long, they're
incredibly tedious,

00:08:19.130 --> 00:08:21.857
they're incredibly boring, and
they're often inconsistent.

00:08:21.857 --> 00:08:23.440
So when I say the
web specifications,

00:08:23.440 --> 00:08:26.000
I mean things like the
definition of JPEG,

00:08:26.000 --> 00:08:28.522
the definition of CSS,
the definition of HTML.

00:08:28.522 --> 00:08:29.980
These documents
are, like, the size

00:08:29.980 --> 00:08:33.480
of the EU constitution and
equally as easy to understand.

00:08:33.480 --> 00:08:36.130
So what ends up happening
is that the browser vendors

00:08:36.130 --> 00:08:37.549
see all these specs.

00:08:37.549 --> 00:08:40.080
And they essentially
say, OK, thanks for that.

00:08:40.080 --> 00:08:42.169
I'm going to do something
that somewhat resembles

00:08:42.169 --> 00:08:43.320
what these specs look like.

00:08:43.320 --> 00:08:44.610
Then they call it a day
and they laugh about it

00:08:44.610 --> 00:08:45.410
with their friends.

00:08:45.410 --> 00:08:48.160
OK, so what ends up happening
is that these specifications

00:08:48.160 --> 00:08:52.550
end up being like these vague,
aspirational documents that

00:08:52.550 --> 00:08:55.109
don't always accurately reflect
what real browsers are doing.

00:08:55.109 --> 00:08:57.150
And if you want to understand
the horror of this,

00:08:57.150 --> 00:08:59.450
you can go to this site
called quirksmode.org.

00:08:59.450 --> 00:09:01.050
I mean, don't go to this
site if you want to be happy.

00:09:01.050 --> 00:09:01.925
But you can go there.

00:09:01.925 --> 00:09:05.540
And it actually documents all of
these terrible inconsistencies

00:09:05.540 --> 00:09:08.030
that browsers have with
respect to what happens

00:09:08.030 --> 00:09:10.337
when the user presses a key.

00:09:10.337 --> 00:09:12.670
There should just be one key
press event that's generated.

00:09:12.670 --> 00:09:13.840
You are so wrong.

00:09:13.840 --> 00:09:15.680
So go to quirksmode.org
and check that out,

00:09:15.680 --> 00:09:17.075
and see what's going on.

00:09:17.075 --> 00:09:18.450
So anyway, in this
lecture, we're

00:09:18.450 --> 00:09:21.514
going to focus on the client
side of the web application.

00:09:21.514 --> 00:09:22.930
In particular,
we're going to look

00:09:22.930 --> 00:09:26.250
at how we can isolate
content from different web

00:09:26.250 --> 00:09:28.610
providers that has
to coexist, somehow,

00:09:28.610 --> 00:09:31.720
in the same machine
and the same browser.

00:09:31.720 --> 00:09:34.012
So at a high level, there's
this fundamental difference

00:09:34.012 --> 00:09:35.386
between the way
you traditionally

00:09:35.386 --> 00:09:37.362
think of a desktop
application and the way

00:09:37.362 --> 00:09:39.420
you think of a web application.

00:09:39.420 --> 00:09:42.490
Abstractly speaking, most of the
desktop applications that you

00:09:42.490 --> 00:09:45.320
use, you can think of it as
coming from a single principal.

00:09:45.320 --> 00:09:47.490
So Word comes from Microsoft.

00:09:47.490 --> 00:09:49.925
And maybe TurboTax comes
from Mr. and Mrs. TurboTax,

00:09:49.925 --> 00:09:51.470
so on and so forth.

00:09:51.470 --> 00:09:54.870
But when you look at a
web application, something

00:09:54.870 --> 00:09:58.830
that looks to you, visually,
as a single application

00:09:58.830 --> 00:10:01.226
is actually composed of a
bunch of different content

00:10:01.226 --> 00:10:02.600
from a bunch of
different people.

00:10:02.600 --> 00:10:05.260
So you go to CNN, it looks
like it's all on one tab.

00:10:05.260 --> 00:10:08.670
But each of those visual
things that you see

00:10:08.670 --> 00:10:10.420
may, in fact, come
from someone different.

00:10:10.420 --> 00:10:15.530
So let's just look at a
very simple example here.

00:10:15.530 --> 00:10:20.020
So let's say that we were
looking at the following site.

00:10:20.020 --> 00:10:24.654
So http://foo.com.

00:10:24.654 --> 00:10:26.854
And we're just
looking at index.html.

00:10:30.730 --> 00:10:35.170
So you know, you look
at your browser tab.

00:10:35.170 --> 00:10:36.240
What might you see?

00:10:36.240 --> 00:10:41.830
So one thing that you might
see is an advertisement.

00:10:41.830 --> 00:10:43.230
So you might see
an advertisement

00:10:43.230 --> 00:10:45.660
in the form of a GIF.

00:10:45.660 --> 00:10:49.082
And maybe that was
downloaded from ads.com.

00:10:51.974 --> 00:10:56.270
Then you also might see, let's
say, an analytics library.

00:10:58.980 --> 00:11:03.062
And maybe this comes
from google.com.

00:11:06.970 --> 00:11:09.470
So these libraries are very
popular for doing things

00:11:09.470 --> 00:11:11.760
like tracking how many
people have loaded your page,

00:11:11.760 --> 00:11:14.600
looking to see where
people click on things

00:11:14.600 --> 00:11:17.040
to see which parts of their
site are the most interesting

00:11:17.040 --> 00:11:19.550
for people to interact
with, so on and so forth.

00:11:19.550 --> 00:11:22.690
And you might also have
another JavaScript library.

00:11:22.690 --> 00:11:24.810
Let's say it's jQuery.

00:11:27.440 --> 00:11:33.370
And maybe that comes
from cdn.foo.com.

00:11:33.370 --> 00:11:38.190
So some content distribution
network that foo.com runs.

00:11:38.190 --> 00:11:40.470
jQuery is a very popular
library for doing things

00:11:40.470 --> 00:11:41.822
like GUI manipulation.

00:11:41.822 --> 00:11:42.530
Things like that.

00:11:42.530 --> 00:11:44.670
So a lot of popular
websites have jQuery.

00:11:44.670 --> 00:11:47.140
Although, they serve it
from different places.

00:11:47.140 --> 00:11:53.170
And then, on this page
you might see some HTML.

00:11:53.170 --> 00:11:54.860
And here's where
you might see stuff

00:11:54.860 --> 00:12:02.250
like buttons for the user to
click on, text input, and so

00:12:02.250 --> 00:12:03.020
on and so forth.

00:12:05.700 --> 00:12:08.120
So that's just raw
HTML on the page.

00:12:08.120 --> 00:12:12.690
And then, you might
see what they call

00:12:12.690 --> 00:12:19.075
inline JavaScript from foo.com.

00:12:21.865 --> 00:12:27.480
By inline, I mean you
have a script tag.

00:12:27.480 --> 00:12:31.029
And then, you have
a closed script tag.

00:12:31.029 --> 00:12:32.820
And then you just have
some JavaScript code

00:12:32.820 --> 00:12:34.430
included in there directly.

00:12:34.430 --> 00:12:39.400
That's as opposed to where
you say something like script.

00:12:39.400 --> 00:12:43.780
And then, the source
equals something that

00:12:43.780 --> 00:12:45.199
lives on some server remotely.

00:12:45.199 --> 00:12:46.990
So this is what's called
inline JavaScript.

00:12:46.990 --> 00:12:49.115
This is what's referred to
as an externally defined

00:12:49.115 --> 00:12:49.830
JavaScript file.

00:12:49.830 --> 00:12:53.172
So you might have some inline
JavaScript there from foo.com.

00:12:53.172 --> 00:12:55.130
And the other thing that
you might have in here

00:12:55.130 --> 00:12:58.960
is actually a frame.

00:12:58.960 --> 00:13:01.680
So we'll talk about frames
a bit more in a little bit,

00:13:01.680 --> 00:13:04.450
but think of a frame as almost
like a separate JavaScript

00:13:04.450 --> 00:13:05.770
universe.

00:13:05.770 --> 00:13:08.960
It's a little bit equivalent
to a process in Unix.

00:13:08.960 --> 00:13:13.500
So maybe this frame here,
maybe this guy belongs

00:13:13.500 --> 00:13:15.226
to https://facebook.com/
likethis.html.

00:13:30.690 --> 00:13:36.940
So maybe here we have
some inline JavaScript

00:13:36.940 --> 00:13:40.240
from Facebook.

00:13:40.240 --> 00:13:43.300
And then, maybe, we
also have some image.

00:13:43.300 --> 00:13:46.410
So you know, f.jpeg.

00:13:49.040 --> 00:14:00.140
That comes from
https://facebook.com.

00:14:00.140 --> 00:14:06.370
OK, so this is what a single
tab might have in its contents.

00:14:06.370 --> 00:14:08.052
But as I just
mentioned, all this

00:14:08.052 --> 00:14:10.510
can, potentially, come from
all these different principals.

00:14:10.510 --> 00:14:12.301
So there's a bunch of
interesting questions

00:14:12.301 --> 00:14:14.320
that we can ask about
an application that

00:14:14.320 --> 00:14:15.200
looks like this.

00:14:15.200 --> 00:14:19.354
So for example, can this
analytics code from google.com

00:14:19.354 --> 00:14:21.660
actually access
JavaScript state that

00:14:21.660 --> 00:14:23.880
resides in the jQuery code?

00:14:23.880 --> 00:14:26.840
So to first approximation,
maybe that seems like a bad idea

00:14:26.840 --> 00:14:29.577
because these two pieces of
code came from different places.

00:14:29.577 --> 00:14:31.160
But then again, maybe
it's actually OK

00:14:31.160 --> 00:14:35.170
because, presumably, foo.com
brought both of these libraries

00:14:35.170 --> 00:14:37.270
in so that they can
work with each other.

00:14:37.270 --> 00:14:38.280
So who knows.

00:14:38.280 --> 00:14:40.370
Another question
you might have is

00:14:40.370 --> 00:14:43.090
can the analytics
code here actually

00:14:43.090 --> 00:14:44.880
interact with the
text inputs here?

00:14:44.880 --> 00:14:47.350
So for example, can
the analytics code

00:14:47.350 --> 00:14:49.460
define event handlers?

00:14:49.460 --> 00:14:51.720
So a little bit of
background in JavaScript.

00:14:51.720 --> 00:14:54.720
JavaScript has a single-threaded,
event-driven model.

00:14:54.720 --> 00:14:56.280
So basically, in
each frame, there's

00:14:56.280 --> 00:14:58.970
just an event loop that's just
constantly pulling events.

00:14:58.970 --> 00:15:01.570
Key presses, network events,
timers, and stuff like that.

00:15:01.570 --> 00:15:03.260
And then, seeing if there
are any handlers associated

00:15:03.260 --> 00:15:04.009
with those events.

00:15:04.009 --> 00:15:05.460
And if so, it fires them.
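
NOTE
A toy sketch of the single-threaded, event-driven model just described, assuming one queue of pending events per frame. ToyEventLoop and its methods are hypothetical names for illustration; a real browser event loop is far more involved.
```javascript
// Toy model of one frame's event loop; every name here is hypothetical.
class ToyEventLoop {
  constructor() {
    this.handlers = new Map(); // event type -> array of handler functions
    this.queue = [];           // pending events, oldest first
  }
  addHandler(type, fn) {
    if (!this.handlers.has(type)) this.handlers.set(type, []);
    this.handlers.get(type).push(fn);
  }
  post(type, data) {
    this.queue.push({ type, data });
  }
  // Drain the queue one event at a time; handlers never run concurrently.
  run() {
    while (this.queue.length > 0) {
      const event = this.queue.shift();
      for (const fn of this.handlers.get(event.type) || []) fn(event);
    }
  }
}
const loop = new ToyEventLoop();
const log = [];
loop.addHandler("keypress", (e) => log.push("key:" + e.data));
loop.post("keypress", "a");
loop.post("timer", 1); // no handler registered, so nothing fires
loop.post("keypress", "b");
loop.run();
console.log(log);
```
The security question in the lecture is who gets to call something like addHandler for a given piece of HTML; this sketch deliberately leaves that check out.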

00:15:05.460 --> 00:15:08.800
So who should be able to define
event handlers for this HTML?

00:15:08.800 --> 00:15:10.510
Should google.com
be able to do it?

00:15:10.510 --> 00:15:14.520
It's not from foo.com
so maybe, maybe not.

00:15:14.520 --> 00:15:16.890
Another question, too, is
what's the relationship

00:15:16.890 --> 00:15:19.930
between this Facebook frame
here and the larger frame?

00:15:19.930 --> 00:15:23.680
The Facebook frame
is HTTPS, secure.

00:15:23.680 --> 00:15:26.460
foo.com is HTTP, nonsecure.

00:15:26.460 --> 00:15:29.090
So how should these two
things be able to interact?

00:15:29.090 --> 00:15:31.900
So basically, to
answer these questions,

00:15:31.900 --> 00:15:38.015
browsers use a security model
called the same-origin policy.

00:15:43.910 --> 00:15:47.294
So there's, sort
of, this vague goal

00:15:47.294 --> 00:15:49.460
because a lot of things
with respect to web security

00:15:49.460 --> 00:15:50.436
are, kind of, vague
because nobody

00:15:50.436 --> 00:15:51.477
knows what they're doing.

00:15:51.477 --> 00:15:58.140
But the basic idea
is two websites

00:15:58.140 --> 00:16:03.654
should not be able to
tamper with each other,

00:16:03.654 --> 00:16:05.272
unless they want to.

00:16:14.090 --> 00:16:19.860
So defining what tampering
means was actually easier

00:16:19.860 --> 00:16:21.300
when the web was simpler.

00:16:21.300 --> 00:16:23.032
But as we keep adding
these new APIs,

00:16:23.032 --> 00:16:24.990
it's more and more
difficult to understand what

00:16:24.990 --> 00:16:26.760
this non-tampering goal means.

00:16:26.760 --> 00:16:29.550
So for example,
it's obviously bad

00:16:29.550 --> 00:16:32.010
if two websites, which
don't trust each other,

00:16:32.010 --> 00:16:34.850
can overwrite each
other's visual display.

00:16:34.850 --> 00:16:36.970
That seems like an
obviously bad thing.

00:16:36.970 --> 00:16:39.000
It seems like an
obviously good thing

00:16:39.000 --> 00:16:41.680
if two websites, which
want to collaborate,

00:16:41.680 --> 00:16:44.977
are able to, somehow,
exchange data in a safe way.

00:16:44.977 --> 00:16:47.310
So you can think of mashup
sites you may have heard of.

00:16:47.310 --> 00:16:49.040
So sometimes you'll see
these things on the internet.

00:16:49.040 --> 00:16:50.990
It's like someone
takes Google map data,

00:16:50.990 --> 00:16:52.995
and then takes the
location of food trucks.

00:16:52.995 --> 00:16:54.620
And then, you have
this amazing mashup

00:16:54.620 --> 00:16:57.140
that allows you to eat cheaply
and avoid salmonella, right?

00:16:57.140 --> 00:16:59.930
So that seems like a thing
you should be able to do.

00:16:59.930 --> 00:17:02.695
But how, exactly, do we enable
that type of composition?

00:17:02.695 --> 00:17:05.069
Then there's other things that
are, kind of, hard to say.

00:17:05.069 --> 00:17:07.910
So for example, if JavaScript
code comes from origin

00:17:07.910 --> 00:17:11.270
x inside of a page
that's from origin y,

00:17:11.270 --> 00:17:15.920
how exactly should that code
and that content compose?

00:17:15.920 --> 00:17:23.220
So the strategy that the
same-origin policy uses can be

00:17:23.220 --> 00:17:25.579
roughly described as follows.

00:17:25.579 --> 00:17:38.830
So each resource is
assigned an origin, which

00:17:38.830 --> 00:17:41.680
we'll discuss in a second.

00:17:44.790 --> 00:17:49.740
And essentially,
a JavaScript code

00:17:49.740 --> 00:17:57.430
can only access resources
from its own origin.

00:18:05.820 --> 00:18:08.820
So this is the
high level strategy

00:18:08.820 --> 00:18:10.064
the same origin policy uses.

00:18:10.064 --> 00:18:11.355
But the devil's in the details.

00:18:11.355 --> 00:18:13.271
And there's a ton of
exceptions, which we're

00:18:13.271 --> 00:18:15.450
going to look into in a second.

00:18:15.450 --> 00:18:17.180
But first of all,
before we proceed,

00:18:17.180 --> 00:18:19.930
let's define what an origin is.

00:18:19.930 --> 00:18:29.310
So an origin is, basically,
a network protocol scheme

00:18:29.310 --> 00:18:36.140
plus a host name plus a port.

00:18:39.540 --> 00:18:44.952
So for example, we can have
something like http://foo.com.

00:18:47.664 --> 00:18:49.310
And then, maybe,
it's index.html.

00:18:55.130 --> 00:18:58.536
So the scheme here is HTTP.

00:18:58.536 --> 00:19:02.400
And the host name is foo.com.

00:19:02.400 --> 00:19:03.840
And the port is 80.

00:19:03.840 --> 00:19:06.530
Now the port, in this
case, is implicit.

00:19:06.530 --> 00:19:08.830
The port is the port
on the server side

00:19:08.830 --> 00:19:10.560
that the client uses to connect.

00:19:10.560 --> 00:19:13.490
So if you see a URL
from the HTTP scheme

00:19:13.490 --> 00:19:16.550
and there's no port that's
explicitly supplied, then,

00:19:16.550 --> 00:19:19.000
implicitly, that port is 80.

00:19:19.000 --> 00:19:26.220
So then, if we look at
something like the HTTPS,

00:19:26.220 --> 00:19:29.764
once again, foo.com index.html.

00:19:33.340 --> 00:19:37.270
So these two URLs have
the same host name.

00:19:37.270 --> 00:19:37.800
Right?

00:19:37.800 --> 00:19:40.880
But they have, actually,
different schemes.

00:19:40.880 --> 00:19:42.690
HTTPS vs HTTP.

00:19:42.690 --> 00:19:46.710
And also, here, the
port is implicitly 443.

00:19:46.710 --> 00:19:48.880
That's the default HTTPS port.

00:19:48.880 --> 00:19:51.940
So these two URLs have
different origins.

00:19:51.940 --> 00:19:54.840
And then, as a final
example, if you

00:19:54.840 --> 00:20:00.830
had a site like HTTP
bar.com, then you

00:20:00.830 --> 00:20:03.740
can use this colon
notation here.

00:20:03.740 --> 00:20:07.330
8181.

00:20:07.330 --> 00:20:09.680
You know, these
things beyond here

00:20:09.680 --> 00:20:12.915
don't matter with respect to
the same origin policy, at least

00:20:12.915 --> 00:20:15.150
with respect to this
very simple example.

00:20:15.150 --> 00:20:17.930
Here, we see that we have
a scheme of HTTP, a host

00:20:17.930 --> 00:20:22.230
name of bar.com, and here we've
explicitly specified the port.

00:20:22.230 --> 00:20:25.771
So in this case, it's a
non-default port of 8181.

00:20:25.771 --> 00:20:26.770
So does that make sense?

00:20:26.770 --> 00:20:29.480
It's pretty straightforward.

00:20:29.480 --> 00:20:33.970
OK, so this is, basically,
what an origin is.

00:20:33.970 --> 00:20:39.630
Loosely speaking, you can think
of an origin as a UID in Unix

00:20:39.630 --> 00:20:43.950
with the frame being loosely
considered as, like, a process.

00:20:43.950 --> 00:20:53.410
So there are four basic
ideas behind the browser's

00:20:53.410 --> 00:20:56.100
implementation of the
same origin policy.

00:20:56.100 --> 00:21:08.350
So first idea is each origin
has client side resources.

00:21:14.180 --> 00:21:17.590
So what are examples
of those resources?

00:21:17.590 --> 00:21:21.560
Things like cookies.

00:21:21.560 --> 00:21:25.170
Now you can think of
cookies as a very simple way

00:21:25.170 --> 00:21:29.560
to implement state in a
stateless protocol like HTTP.

00:21:29.560 --> 00:21:31.740
Basically, a cookie is
like a tiny file that's

00:21:31.740 --> 00:21:33.614
associated with each origin.

00:21:33.614 --> 00:21:35.780
And we'll talk about the
specifics of this in a bit.

00:21:35.780 --> 00:21:38.238
But the basic idea is that when
the browser sends a request

00:21:38.238 --> 00:21:40.960
to a particular website,
it includes any cookies

00:21:40.960 --> 00:21:43.320
that the client has
for that website.

00:21:43.320 --> 00:21:46.230
And you can use these
cookies for things

00:21:46.230 --> 00:21:48.385
like implementing
password remembering.

00:21:48.385 --> 00:21:50.480
Maybe if you were going
to an ecommerce site,

00:21:50.480 --> 00:21:53.935
you can remember stuff
about a user's shopping cart

00:21:53.935 --> 00:21:55.960
in these cookies,
so on and so forth.

00:21:55.960 --> 00:21:59.530
So cookies are one
thing that each origin

00:21:59.530 --> 00:22:01.070
can be associated with.

00:22:01.070 --> 00:22:04.180
Also, you can think
of DOM storage

00:22:04.180 --> 00:22:06.170
as another one of
these resources.

00:22:06.170 --> 00:22:08.350
This is a fairly new interface.

00:22:08.350 --> 00:22:11.820
But think of DOM storage
as just a key value store.

00:22:11.820 --> 00:22:14.600
So DOM storage allows
an origin to say,

00:22:14.600 --> 00:22:16.562
for this given key,
which is a string,

00:22:16.562 --> 00:22:18.020
let me associate
it with this given

00:22:18.020 --> 00:22:21.650
value, which is also a string.
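
NOTE
A toy sketch of DOM storage as a per-origin key-value store with string keys and string values, as described above. ToyDomStorage is a hypothetical stand-in rather than the browser's storage interface; it only illustrates that each origin sees its own store.
```javascript
// Toy per-origin string store; an illustration, not the real browser interface.
class ToyDomStorage {
  constructor() {
    this.stores = new Map(); // origin string -> Map of key -> value
  }
  storeFor(urlString) {
    const origin = new URL(urlString).origin; // for example "http://foo.com"
    if (!this.stores.has(origin)) this.stores.set(origin, new Map());
    return this.stores.get(origin);
  }
  setItem(urlString, key, value) {
    // Keys and values are both coerced to strings.
    this.storeFor(urlString).set(String(key), String(value));
  }
  getItem(urlString, key) {
    const store = this.storeFor(urlString);
    return store.has(String(key)) ? store.get(String(key)) : null;
  }
}
const storage = new ToyDomStorage();
storage.setItem("http://foo.com/index.html", "visits", 7);
console.log(storage.getItem("http://foo.com/other.html", "visits"));
console.log(storage.getItem("https://foo.com/index.html", "visits"));
```
A different page on the same origin reads the stored string back, while the HTTPS version of the site, being a different origin, sees nothing.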

00:22:21.650 --> 00:22:26.390
Another thing that is
associated with an origin

00:22:26.390 --> 00:22:28.545
is a JavaScript namespace.

00:22:32.810 --> 00:22:34.840
So that JavaScript
namespace defines

00:22:34.840 --> 00:22:36.530
what functions and
what interfaces

00:22:36.530 --> 00:22:38.887
are available to the origin.

00:22:38.887 --> 00:22:40.470
Some of those
interfaces are built in.

00:22:40.470 --> 00:22:42.890
Like, let's say, the string
prototype and stuff like that.

00:22:42.890 --> 00:22:44.514
And then, an application
might actually

00:22:44.514 --> 00:22:47.620
fill the JavaScript namespace
with some other content.

00:22:47.620 --> 00:22:53.180
There's also this thing
called the DOM tree.

00:22:53.180 --> 00:22:56.580
So DOM is short for
Document Object Model.

00:22:56.580 --> 00:22:58.410
And the DOM tree
is, essentially,

00:22:58.410 --> 00:23:03.090
a JavaScript reflection
of the HTML in a page.

00:23:03.090 --> 00:23:07.410
So you can imagine
that the DOM tree

00:23:07.410 --> 00:23:14.690
has a node for the topmost
HTML node in the HTML.

00:23:14.690 --> 00:23:20.820
And then, it's going to have
a node for the head tag.

00:23:20.820 --> 00:23:24.470
Then, it's going to have
a node for the body tag.

00:23:27.176 --> 00:23:29.200
All right, so on and so forth.

00:23:29.200 --> 00:23:32.270
So the way that a lot
of dynamic web pages

00:23:32.270 --> 00:23:35.470
are made dynamic is
the JavaScript code

00:23:35.470 --> 00:23:37.630
can access this data
structure in JavaScript

00:23:37.630 --> 00:23:39.249
that mirrors the HTML content.

00:23:39.249 --> 00:23:41.040
So you can imagine an
animation takes place

00:23:41.040 --> 00:23:43.000
by changing some
of these nodes down

00:23:43.000 --> 00:23:46.670
here to implement different
organizations of various tags.

00:23:46.670 --> 00:23:49.290
So that's what the DOM tree is.

00:23:49.290 --> 00:23:53.085
There's also a
visual display area.

00:23:57.398 --> 00:23:59.611
Although, we'll see that
the visual display area

00:23:59.611 --> 00:24:01.860
actually interacts very
strangely with the same origin

00:24:01.860 --> 00:24:02.910
policy.

00:24:02.910 --> 00:24:04.170
So on and so forth.

00:24:04.170 --> 00:24:06.950
So at a high level,
each origin has access

00:24:06.950 --> 00:24:10.160
to some set of client side
resources of these types.

00:24:10.160 --> 00:24:13.290
Does that make sense?

00:24:13.290 --> 00:24:21.920
And then, the second big
idea is that each frame

00:24:21.920 --> 00:24:28.100
gets the origin of its URL.

00:24:34.060 --> 00:24:35.790
So as I mentioned
before, a frame

00:24:35.790 --> 00:24:39.850
is, roughly, analogous
to a process in Unix.

00:24:39.850 --> 00:24:41.780
It's, kind of, like
a name space that

00:24:41.780 --> 00:24:45.700
aggregates a bunch of
other different resources.

00:24:45.700 --> 00:24:55.380
So third idea is that
scripts, so JavaScript code,

00:24:55.380 --> 00:25:09.700
execute with the authority
of its frame's origin.

00:25:18.510 --> 00:25:22.990
OK, so what that means is that if
foo.com imports a JavaScript

00:25:22.990 --> 00:25:24.130
file from bar.com.

00:25:24.130 --> 00:25:26.200
Well, that JavaScript
file is going

00:25:26.200 --> 00:25:30.780
to be able to act with
the authority of foo.com.

00:25:30.780 --> 00:25:34.125
So loosely speaking, this
is, sort of, similar to

00:25:34.125 --> 00:25:36.220
if you were in the
Unix world to run

00:25:36.220 --> 00:25:38.610
a binary that, sort of,
belonged in someone else's home

00:25:38.610 --> 00:25:39.380
directory.

00:25:39.380 --> 00:25:41.760
That thing would, sort of,
execute with your privileges

00:25:41.760 --> 00:25:43.650
there.

00:25:43.650 --> 00:25:50.020
And the fourth thing is
there's passive content.

00:25:50.020 --> 00:25:55.980
So by passive content I mean
things like images,

00:25:55.980 --> 00:25:57.490
for example.

00:25:57.490 --> 00:26:00.217
Or CSS files or things like that.

00:26:00.217 --> 00:26:01.800
These are things,
which we don't think

00:26:01.800 --> 00:26:03.750
of as having executable code.

00:26:03.750 --> 00:26:08.800
So passive content gets zero
authority from the browser.

00:26:16.430 --> 00:26:19.070
So that, kind of, makes sense.

00:26:19.070 --> 00:26:21.270
We'll see why this fourth
thing is a little bit

00:26:21.270 --> 00:26:22.280
subtle in a second.

00:26:22.280 --> 00:26:25.080
So going back to
our example here.

00:26:25.080 --> 00:26:27.830
So we see, for example,
that the Google Analytics

00:26:27.830 --> 00:26:32.425
script and the jQuery script
can access all kinds of stuff

00:26:32.425 --> 00:26:33.630
in foo.com.

00:26:33.630 --> 00:26:35.970
So for example, they can
read and write cookies.

00:26:35.970 --> 00:26:39.440
They can do things like attach
event handlers to buttons here.

00:26:39.440 --> 00:26:41.500
So on and so forth.

00:26:41.500 --> 00:26:44.900
If we look at the Facebook
frame and its relationship

00:26:44.900 --> 00:26:47.090
to the larger foo.com
frame, then we

00:26:47.090 --> 00:26:49.440
see that they're from
different origins

00:26:49.440 --> 00:26:51.830
because they have
different schemes here.

00:26:51.830 --> 00:26:54.660
They have different host names.

00:26:54.660 --> 00:26:55.560
Different ports.

00:26:55.560 --> 00:26:58.630
So what this means is that they
are, to a first approximation,

00:26:58.630 --> 00:27:00.010
isolated.
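
NOTE
A minimal sketch of the origin comparison used above: two URLs share an
origin only if scheme, host name, and port all match. The helper names
originOf and sameOrigin are hypothetical; the URL class is the standard one.
```javascript
// Default ports per scheme, so http://foo.com and http://foo.com:80 compare equal.
const DEFAULT_PORTS = { "http:": "80", "https:": "443" };
function originOf(urlString) {
  const url = new URL(urlString);
  // URL.port is the empty string when the URL uses the scheme's default port.
  const port = url.port || DEFAULT_PORTS[url.protocol] || "";
  return { scheme: url.protocol, host: url.hostname, port };
}
function sameOrigin(a, b) {
  const x = originOf(a);
  const y = originOf(b);
  return x.scheme === y.scheme && x.host === y.host && x.port === y.port;
}
```
Under this check the foo.com page and the Facebook frame differ in host
name, so they land in different origins and are isolated by default.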

00:27:00.010 --> 00:27:03.630
Now they can communicate
if they both opt

00:27:03.630 --> 00:27:07.885
into it using this interface
called postMessage.

00:27:12.540 --> 00:27:17.320
So postMessage allows
two different frames

00:27:17.320 --> 00:27:20.960
to exchange asynchronous
immutable messages

00:27:20.960 --> 00:27:21.750
with each other.

00:27:21.750 --> 00:27:25.080
So think of this facility
as allowing Facebook

00:27:25.080 --> 00:27:27.310
to try to send a string.

00:27:27.310 --> 00:27:30.860
Not a reference, a string up
to the enclosing foo.com frame.

00:27:30.860 --> 00:27:34.420
Now note that if foo.com doesn't
want to receive those messages,

00:27:34.420 --> 00:27:35.430
it doesn't have to.

00:27:35.430 --> 00:27:37.940
So this has to be opt
in from both sides

00:27:37.940 --> 00:27:41.220
to get this thing to work.
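
NOTE
A rough model of the two-sided opt-in described above. In a real browser
the sender calls targetWindow.postMessage(message, targetOrigin) and the
receiver listens for the "message" event, checking event.origin. FrameModel
below is a hypothetical stand-in that runs without a browser, and it delivers
synchronously, whereas real delivery is asynchronous.
```javascript
class FrameModel {
  constructor(origin) {
    this.origin = origin;
    this.handlers = []; // stays empty unless the frame opts in to receiving messages
    this.received = [];
  }
  onMessage(handler) {
    this.handlers.push(handler);
  }
  // Called by a sender to deliver text to this frame; targetOrigin is who the sender expects to reach.
  postMessage(sender, text, targetOrigin) {
    if (targetOrigin !== "*" && targetOrigin !== this.origin) return; // not the intended recipient
    const event = { data: String(text), origin: sender.origin }; // a copied string, never a reference
    for (const handler of this.handlers) handler(event);
  }
}
const fooFrame = new FrameModel("https://foo.com");
const facebookFrame = new FrameModel("https://facebook.com");
const evilFrame = new FrameModel("https://evil.example");
// foo.com opts in, but only to messages that really come from facebook.com.
fooFrame.onMessage((event) => {
  if (event.origin === "https://facebook.com") fooFrame.received.push(event.data);
});
fooFrame.postMessage(facebookFrame, "like-button-clicked", "https://foo.com");
fooFrame.postMessage(evilFrame, "give me your cookies", "https://foo.com");
facebookFrame.postMessage(fooFrame, "hello", "https://facebook.com"); // ignored: no handler registered
```
Only the message from facebook.com ends up in fooFrame.received. The frame
that never registered a handler receives nothing at all.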

00:27:41.220 --> 00:27:45.880
So note that the JavaScript
code here in the Facebook frame

00:27:45.880 --> 00:27:51.860
cannot issue an XMLHttpRequest
to the foo.com server.

00:27:51.860 --> 00:27:54.460
That's once again because
network destinations also

00:27:54.460 --> 00:27:56.710
have these origins that
are associated with them.

00:27:56.710 --> 00:28:00.220
So because Facebook.com does not
have the same origin as foo.com

00:28:00.220 --> 00:28:05.610
it can't asynchronously fetch
stuff from it via XMLHttpRequest.

00:28:05.610 --> 00:28:08.370
So the last thing
we can look at we

00:28:08.370 --> 00:28:10.480
can say, OK, we got an
image up here from ads.com.

00:28:10.480 --> 00:28:12.302
This is rule number
four over there.

00:28:12.302 --> 00:28:13.760
So it seems pretty
straightforward.

00:28:13.760 --> 00:28:14.690
This is an image.

00:28:14.690 --> 00:28:15.890
It has no executable code.

00:28:15.890 --> 00:28:18.350
So clearly, the browser's
going to give it no authority.

00:28:18.350 --> 00:28:20.320
Now that seems kind
of like a dumb thing.

00:28:20.320 --> 00:28:22.260
Like, why are you even
talking about images

00:28:22.260 --> 00:28:23.800
having authority or
not having authority?

00:28:23.800 --> 00:28:26.258
It seems obvious that images
shouldn't be able to do stuff.

00:28:26.258 --> 00:28:28.530
Well it's a security class.

00:28:28.530 --> 00:28:32.005
So clearly, there is mischief
that hides in statement number

00:28:32.005 --> 00:28:32.970
four up there.

00:28:32.970 --> 00:28:39.600
So what happens if the browser
incorrectly parses an object

00:28:39.600 --> 00:28:42.422
and misattributes its type?

00:28:42.422 --> 00:28:44.630
So you can actually get into
security problems there.

00:28:44.630 --> 00:28:46.660
And this was actually a
real security problem.

00:28:46.660 --> 00:28:49.340
So there's this thing called
the MIME sniffing attack.

00:28:49.340 --> 00:28:50.876
So the MIME type--
I mean, you've

00:28:50.876 --> 00:28:52.000
probably seen these before.

00:28:52.000 --> 00:28:56.176
You know, it's something
like text/html

00:28:56.176 --> 00:28:58.360
or image/jpeg, things like that.

00:28:58.360 --> 00:29:00.240
This was like a MIME type.

00:29:00.240 --> 00:29:04.690
So old versions of IE used to
do something that they thought

00:29:04.690 --> 00:29:06.470
was going to be helpful for you.

00:29:06.470 --> 00:29:08.410
So sometimes what
web servers will do

00:29:08.410 --> 00:29:13.519
is they will misattribute the
file extension of an object.

00:29:13.519 --> 00:29:15.310
So you can imagine that
a web server that's

00:29:15.310 --> 00:29:19.050
been configured incorrectly
might attach a dot HTML

00:29:19.050 --> 00:29:21.830
suffix to something
that's really an image.

00:29:21.830 --> 00:29:24.470
Or it might attach
a dot JPEG suffix

00:29:24.470 --> 00:29:26.910
to something that's really HTML.

00:29:26.910 --> 00:29:29.190
So what IE would do
back in the olden

00:29:29.190 --> 00:29:31.040
days is try to help you out.

00:29:31.040 --> 00:29:32.250
So IE would go out.

00:29:32.250 --> 00:29:34.270
It would go fetch this resource.

00:29:34.270 --> 00:29:37.020
And it would say,
OK, this resource

00:29:37.020 --> 00:29:39.840
claims to be of some type,
according to its file name

00:29:39.840 --> 00:29:40.570
extension.

00:29:40.570 --> 00:29:43.520
But then it would actually
look at the first 256 bytes

00:29:43.520 --> 00:29:45.620
of what was in that object.

00:29:45.620 --> 00:29:48.089
And if it found certain
magic values in there

00:29:48.089 --> 00:29:50.380
that indicated that there
was a different type for that

00:29:50.380 --> 00:29:54.440
object, it would just say, hey,
I found something cool here.

00:29:54.440 --> 00:29:56.630
The web server
misidentified the object.

00:29:56.630 --> 00:29:59.640
Let me just treat the
object like it's the type

00:29:59.640 --> 00:30:01.779
that I found in these
first 256 bytes.

00:30:01.779 --> 00:30:03.570
And then, everybody's
a winner because I've

00:30:03.570 --> 00:30:05.028
helped the web
server developer out

00:30:05.028 --> 00:30:08.396
because now their website's
going to render properly.

00:30:08.396 --> 00:30:09.770
And the user's
going to like this

00:30:09.770 --> 00:30:11.290
because they get to
unlock this content that

00:30:11.290 --> 00:30:12.850
would have been garbage before.

00:30:12.850 --> 00:30:15.320
But this is clearly
a vulnerability

00:30:15.320 --> 00:30:20.260
because suppose that a page
includes some passive content.

00:30:20.260 --> 00:30:23.400
Like, let's say, an image
from a domain that's

00:30:23.400 --> 00:30:25.340
controlled by the attacker.

00:30:25.340 --> 00:30:28.750
Now from the perspective of
the victim page, it's saying,

00:30:28.750 --> 00:30:32.820
even if this attacker site is
evil, it's passive content.

00:30:32.820 --> 00:30:34.189
It can't do anything.

00:30:34.189 --> 00:30:36.230
Like, at worst, it displays
an unfortunate image.

00:30:36.230 --> 00:30:38.130
But it can't actually
access any code

00:30:38.130 --> 00:30:40.820
because passive content
gets zero authority.

00:30:40.820 --> 00:30:44.790
But what would happen is that
IE could sniff this image.

00:30:44.790 --> 00:30:46.300
The first 256 bytes.

00:30:46.300 --> 00:30:48.230
And the attacker
could intentionally

00:30:48.230 --> 00:30:51.096
put HTML and
JavaScript in there.

00:30:51.096 --> 00:30:53.220
So what would happen is
that the victim site brings

00:30:53.220 --> 00:30:54.930
in what it thinks is an image.

00:30:54.930 --> 00:30:58.260
IE coerces it into
HTML and JavaScript.

00:30:58.260 --> 00:31:02.300
And then, executes that
code in the context

00:31:02.300 --> 00:31:04.790
of that enclosing page.
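
NOTE
A toy reconstruction of the sniffing behavior described above, to show why
it is dangerous. The marker strings and the function names naiveSniff and
strictType are assumptions for illustration; the real magic values old IE
looked for are not given in the lecture.
```javascript
// Naive sniffing: look at the first 256 bytes and let what is found there
// override the type the server declared.
function naiveSniff(declaredType, bytes) {
  const head = String.fromCharCode(...bytes.subarray(0, 256)).toLowerCase();
  if (head.includes("<html") || head.includes("<script")) return "text/html";
  return declaredType;
}
// Safer policy: never upgrade passive content to an executable type.
function strictType(declaredType, bytes) {
  return declaredType;
}
// An attacker-supplied "image" whose leading bytes contain markup and script.
const fakeImage = new TextEncoder().encode("GIF89a<html><script>alert(document.cookie)</script></html>");
```
With the naive policy the fake image is promoted to text/html and its
script would run with the including page's authority. The strict policy
keeps treating it as an image.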

00:31:04.790 --> 00:31:07.360
So does that attack make sense?

00:31:07.360 --> 00:31:07.860
So

00:31:07.860 --> 00:31:12.420
This is, sort of, an example
of how complex browsers are

00:31:12.420 --> 00:31:17.000
and how adding even a very
well intentioned feature

00:31:17.000 --> 00:31:22.010
can cause these very
subtle security bugs.

00:31:22.010 --> 00:31:26.740
So let's now dig down
and take a deeper look

00:31:26.740 --> 00:31:29.870
at how the browser
secures various resources.

00:31:29.870 --> 00:31:36.515
So let's look at frames
and window objects.

00:31:42.100 --> 00:31:46.550
So frames represent these
separate JavaScript universes

00:31:46.550 --> 00:31:48.720
that we discussed over here.

00:31:48.720 --> 00:31:51.610
I mean, implementation
wise, a frame

00:31:51.610 --> 00:31:55.400
with respect to JavaScript
is an instance of a DOM node.

00:31:55.400 --> 00:31:57.010
So I forget where
I drew-- oh, yeah.

00:31:57.010 --> 00:31:58.030
This DOM node up here.

00:31:58.030 --> 00:32:01.340
So the frame would
exist as a DOM node

00:32:01.340 --> 00:32:03.080
object somewhere in
this hierarchy that's

00:32:03.080 --> 00:32:04.730
visible to JavaScript.

00:32:04.730 --> 00:32:07.900
In JavaScript, the window
object is actually an alias

00:32:07.900 --> 00:32:09.030
for the global name space.

00:32:09.030 --> 00:32:10.321
It's, kind of, this wacky idea.

00:32:10.321 --> 00:32:12.980
Like, if you were to define
this global variable named x,

00:32:12.980 --> 00:32:16.500
you can also access it
via the name window.x.

00:32:16.500 --> 00:32:19.260
OK, so basically, frames
and window objects

00:32:19.260 --> 00:32:22.450
are very powerful references
for you to be able to access.

00:32:22.450 --> 00:32:24.662
And they actually contain
pointers to each other.

00:32:24.662 --> 00:32:26.120
The frame can
[INAUDIBLE] a pointer

00:32:26.120 --> 00:32:28.479
to the associated window
object and vice versa.

00:32:28.479 --> 00:32:30.020
So these two things
are, essentially,

00:32:30.020 --> 00:32:31.130
equivalently powerful.

00:32:31.130 --> 00:32:43.220
So frame and window objects get
the origin of the framed URL.

00:32:49.910 --> 00:32:54.650
Or because there's always
an or in web security,

00:32:54.650 --> 00:33:10.530
they can get a suffix of
the original domain name.

00:33:10.530 --> 00:33:11.890
The original origin.

00:33:11.890 --> 00:33:18.200
So for example, a
frame could start off

00:33:18.200 --> 00:33:21.470
having an initial origin.

00:33:21.470 --> 00:33:26.770
x dot y dot z dot com.

00:33:26.770 --> 00:33:30.180
So let's ignore the scheme
and the port for a second.

00:33:30.180 --> 00:33:33.020
So initially, the page
can start off like this.

00:33:33.020 --> 00:33:39.470
It can then intentionally
say I want to set my origin

00:33:39.470 --> 00:33:41.782
to be y dot z dot com.

00:33:41.782 --> 00:33:42.820
A suffix of that.

00:33:42.820 --> 00:33:44.320
And the way that
it indicates this

00:33:44.320 --> 00:33:51.080
is by doing an assignment
to the special document

00:33:51.080 --> 00:33:56.090
dot domain value that's
accessible via JavaScript.

00:33:56.090 --> 00:33:59.600
So we can set document dot
domain explicitly to this right

00:33:59.600 --> 00:34:00.150
here.

00:34:00.150 --> 00:34:02.060
And that's allowable
because this guy

00:34:02.060 --> 00:34:04.160
is a suffix of that guy.

00:34:04.160 --> 00:34:07.880
And then, similarly,
it could also

00:34:07.880 --> 00:34:10.770
set document dot domain to
z.com and effectively reset

00:34:10.770 --> 00:34:12.770
its origin like that.

00:34:12.770 --> 00:34:16.980
Now what it cannot do is
it cannot do something like

00:34:16.980 --> 00:34:23.729
setting document domain
to a dot y dot z dot com.

00:34:23.729 --> 00:34:25.270
That's disallowed
because this is not

00:34:25.270 --> 00:34:29.370
a proper
suffix of the original origin.

00:34:29.370 --> 00:34:35.536
And also, it cannot set
its suffix to dot com.

00:34:35.536 --> 00:34:39.510
So does anyone have any theories
about why this is a bad idea?

00:34:39.510 --> 00:34:40.270
Right, exactly.

00:34:40.270 --> 00:34:41.760
So people are laughing
because, clearly, this

00:34:41.760 --> 00:34:43.593
is going to bring out
the apocalypse, right.

00:34:43.593 --> 00:34:45.330
So if it does this,
then this means

00:34:45.330 --> 00:34:49.639
that the site could somehow
be able to impact cookies

00:34:49.639 --> 00:34:52.050
or things like that in
any dot com site, which

00:34:52.050 --> 00:34:53.250
will be pretty devastating.

00:34:53.250 --> 00:34:56.210
The motivation for why these
types of things are allowable

00:34:56.210 --> 00:34:59.910
is because, presumably,
these origins

00:34:59.910 --> 00:35:02.130
have some type of preexisting
trust relationship.

00:35:02.130 --> 00:35:04.330
So this seems to be vaguely OK.

00:35:04.330 --> 00:35:05.890
Whereas, this would
seem to be bad.
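
NOTE
A minimal sketch of the document.domain rule as stated above, ignoring
scheme and port: the new value must be a suffix of the current host name
taken at a dot boundary, and it may not be a bare top-level domain. The
function name and the tiny TOP_LEVEL set are hypothetical stand-ins.
```javascript
// Illustrative only; a real browser uses a much larger list of registrable suffixes.
const TOP_LEVEL = new Set(["com", "edu", "org"]);
function canSetDocumentDomain(currentHost, requested) {
  if (TOP_LEVEL.has(requested)) return false; // a bare top-level domain is never allowed
  if (requested === currentHost) return true;
  // Must be a suffix that starts at a dot boundary: "y.z.com" for "x.y.z.com".
  return currentHost.endsWith("." + requested);
}
```
So x.y.z.com may become y.z.com or z.com, but not a.y.z.com and not com.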

00:35:05.890 --> 00:35:07.650
AUDIENCE: So you can
make these splits

00:35:07.650 --> 00:35:10.999
on any dot or actual end point?

00:35:10.999 --> 00:35:12.811
Like, for example,
for your x.y.zz.com,

00:35:12.811 --> 00:35:14.956
can you change
that to your z.com?

00:35:14.956 --> 00:35:16.730
PROFESSOR: No, it's
on every dot.

00:35:16.730 --> 00:35:17.936
AUDIENCE: OK.

00:35:17.936 --> 00:35:20.150
Is there a reason
that it wasn't made

00:35:20.150 --> 00:35:27.560
so that you could specify
super- or subdomain,

00:35:27.560 --> 00:35:31.820
but somehow they had to agree
on where the information was

00:35:31.820 --> 00:35:33.050
coming from.

00:35:33.050 --> 00:35:36.370
So, like, you said some kind of
I want to consider all of these

00:35:36.370 --> 00:35:37.674
to be the same origin as me.

00:35:37.674 --> 00:35:39.940
So any of them can attack me.

00:35:39.940 --> 00:35:42.945
And then you made this
symmetric in order for me

00:35:42.945 --> 00:35:44.315
to impact them as well?

00:35:44.315 --> 00:35:48.170
[INAUDIBLE] .com means anything
that's .com can impact me.

00:35:48.170 --> 00:35:50.410
And then you put [INAUDIBLE].

00:35:50.410 --> 00:35:51.799
PROFESSOR: Yeah, it's tricky.

00:35:51.799 --> 00:35:53.840
So there's a couple of
different answers to that.

00:35:53.840 --> 00:35:55.770
So first of all, people were
very worried about this attack

00:35:55.770 --> 00:35:56.440
here.

00:35:56.440 --> 00:36:00.570
So they wanted to make
the domain manipulation

00:36:00.570 --> 00:36:03.540
language be, at least,
somewhat easy to understand.

00:36:03.540 --> 00:36:05.859
So they don't allow
more baroque settings.

00:36:05.859 --> 00:36:08.150
I'll get to one thing in a
second, which kind of allows

00:36:08.150 --> 00:36:10.720
what you're talking about but
only with respect to domain

00:36:10.720 --> 00:36:11.220
[INAUDIBLE].

00:36:11.220 --> 00:36:12.370
I'll get to that in one second.

00:36:12.370 --> 00:36:15.070
And another thing to mention, too, is
that the postMessage interface

00:36:15.070 --> 00:36:18.230
does allow arbitrary domains
to communicate with each other

00:36:18.230 --> 00:36:20.080
if they both opt into it.

00:36:20.080 --> 00:36:22.700
So in practice, people
use postMessage

00:36:22.700 --> 00:36:25.040
to do cross-domain
communication if they

00:36:25.040 --> 00:36:27.510
can't set their origins
to be the same using

00:36:27.510 --> 00:36:30.060
these tricks here.

00:36:30.060 --> 00:36:35.780
So yeah, so browsers
can constrain or widen,

00:36:35.780 --> 00:36:37.880
I should say, their
domain to these suffixes

00:36:37.880 --> 00:36:39.404
of the original domain.

00:36:39.404 --> 00:36:41.570
And there's also this little
interesting quirk here,

00:36:41.570 --> 00:36:45.980
which is that browsers actually
distinguish between a document

00:36:45.980 --> 00:36:48.150
dot domain value
that has been written

00:36:48.150 --> 00:36:50.306
and one that has not
been written, OK.

00:36:50.306 --> 00:36:51.805
And there's a subtle
reason for this

00:36:51.805 --> 00:36:52.930
we'll get into in a second.

00:36:52.930 --> 00:37:03.100
So basically, two frames
can access each other

00:37:03.100 --> 00:37:06.840
if one of two things is true.

00:37:06.840 --> 00:37:13.290
The first thing is both of
the frames set document dot

00:37:13.290 --> 00:37:19.390
domain to the same value.

00:37:24.330 --> 00:37:27.630
And the other way that two
frames can access each other

00:37:27.630 --> 00:37:36.110
is that neither of those frames
has changed document domain.

00:37:42.310 --> 00:37:46.110
And of course, both
values have to match.

00:37:46.110 --> 00:37:49.278
And there's a value match.

00:37:52.090 --> 00:37:57.290
So the reason for
this is a bit subtle.

00:37:57.290 --> 00:38:02.540
But the basic idea is that
these two rules prevent a domain

00:38:02.540 --> 00:38:06.060
from being attacked by
one of its own buggy

00:38:06.060 --> 00:38:08.150
or malicious sub-domains.

00:38:08.150 --> 00:38:08.760
OK?

00:38:08.760 --> 00:38:13.110
So imagine that you have
the domain x.y.z.com.

00:38:16.540 --> 00:38:19.985
And then, imagine that it's
trying to attack y.z.com.

00:38:23.526 --> 00:38:29.040
So this guy up here
is buggy or evil.

00:38:32.080 --> 00:38:36.140
So what this guy could try to do
is actually shorten his domain

00:38:36.140 --> 00:38:36.730
to be y.z.com.

00:38:36.730 --> 00:38:40.320
And then, start messing
around with JavaScript state,

00:38:40.320 --> 00:38:42.170
or cookies or stuff
like that here.

00:38:42.170 --> 00:38:42.690
Right?

00:38:42.690 --> 00:38:45.420
So basically, what these
two rules over here will say

00:38:45.420 --> 00:38:49.600
is that if y.z.com does not
want to actually allow anyone

00:38:49.600 --> 00:38:51.910
to interact with
it, it will never

00:38:51.910 --> 00:38:54.560
change its
document.domain value

00:38:54.560 --> 00:38:57.860
so that when this frame
up here does shorten it,

00:38:57.860 --> 00:38:59.340
the browser will say aha.

00:38:59.340 --> 00:39:00.700
You've shortened it.

00:39:00.700 --> 00:39:01.470
You have not.

00:39:01.470 --> 00:39:03.379
There's a match here
in terms of the values.

00:39:03.379 --> 00:39:04.920
But this person
hasn't indicated they

00:39:04.920 --> 00:39:08.209
want to opt into this
type of chicanery.
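
NOTE
A minimal sketch of the two access rules above. Each frame is modeled as
a plain object holding its effective domain and a flag recording whether it
ever assigned document.domain. The field names domain and wroteDomain are
hypothetical, and scheme and port are ignored as in the lecture.
```javascript
function framesCanAccess(a, b) {
  if (a.domain !== b.domain) return false; // the values must match in either case
  const bothWrote = a.wroteDomain && b.wroteDomain;
  const neitherWrote = !a.wroteDomain && !b.wroteDomain;
  return bothWrote || neitherWrote;
}
const quietParent = { domain: "y.z.com", wroteDomain: false }; // y.z.com never touched document.domain
const shortenedChild = { domain: "y.z.com", wroteDomain: true }; // x.y.z.com shortened itself
const optedInParent = { domain: "y.z.com", wroteDomain: true }; // y.z.com explicitly opted in
```
The shortened subdomain matches quietParent by value but is still refused,
because only one side wrote document.domain. It can reach optedInParent.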

00:39:08.209 --> 00:39:09.250
So does that make sense?

00:39:12.850 --> 00:39:18.610
OK, so that is, basically,
how frames work with respect

00:39:18.610 --> 00:39:19.860
to the same origin policy.

00:39:23.280 --> 00:39:27.200
So then we can look at how
DOM nodes are treated.

00:39:27.200 --> 00:39:31.700
So DOM nodes, it's pretty
straightforward for DOM nodes.

00:39:31.700 --> 00:39:33.870
So DOM nodes, basically,
get the origin

00:39:33.870 --> 00:39:35.950
of their surrounding frame.

00:39:35.950 --> 00:39:37.140
Makes sense.

00:39:37.140 --> 00:39:38.590
Then we can look at cookies.

00:39:38.590 --> 00:39:44.770
Cookies are complicated
and a bit tricky.

00:39:44.770 --> 00:39:50.400
So cookies have a domain.

00:39:50.400 --> 00:39:52.555
And they have a path.

00:39:55.810 --> 00:40:01.040
So for example, you can imagine
a cookie might be associated

00:40:01.040 --> 00:40:02.680
with the following information.

00:40:02.680 --> 00:40:06.880
So *.mit.edu.

00:40:06.880 --> 00:40:12.030
And then, /6.858.

00:40:12.030 --> 00:40:14.660
So you've got this domain
thing sitting here,

00:40:14.660 --> 00:40:18.000
and then, you've got this
path thing sitting over here.

00:40:18.000 --> 00:40:23.000
So note that this domain
can be a, possibly complete,

00:40:23.000 --> 00:40:24.670
suffix of the page's
current domain.

00:40:24.670 --> 00:40:26.378
So you can play,
somewhat, similar tricks

00:40:26.378 --> 00:40:27.280
as we had over there.

00:40:27.280 --> 00:40:29.300
And note that this path
here can actually just

00:40:29.300 --> 00:40:33.690
be set just to the slash with
nothing else there, which

00:40:33.690 --> 00:40:37.280
indicates that all
paths in the domain

00:40:37.280 --> 00:40:40.230
should be able to have
access to this cookie here.

00:40:40.230 --> 00:40:41.940
But in this case,
we actually have

00:40:41.940 --> 00:40:43.800
one of these nonempty paths.

00:40:43.800 --> 00:40:46.500
So whoever sets this
cookie, basically,

00:40:46.500 --> 00:40:49.630
gets to choose what the
domain in the path look like.

00:40:49.630 --> 00:40:51.950
And it can actually
be set by the server

00:40:51.950 --> 00:40:54.190
or can be set on
the client side.

00:40:54.190 --> 00:40:56.260
So on the client side,
you can basically

00:40:56.260 --> 00:41:00.985
write to this JavaScript
object called document.cookie.

00:41:04.200 --> 00:41:06.540
And there's, sort of,
this Byzantine format

00:41:06.540 --> 00:41:08.426
that you can use to
indicate all these paths

00:41:08.426 --> 00:41:09.300
and things like that.

00:41:09.300 --> 00:41:11.320
But suffice to say
it can be done.

00:41:11.320 --> 00:41:13.280
So JavaScript can set
cookies like this.

00:41:13.280 --> 00:41:14.690
And also, the
server can actually

00:41:14.690 --> 00:41:18.880
set cookies on HTTP responses when
they come back over the wire.

00:41:18.880 --> 00:41:21.740
So you can, basically, just
use the Set-Cookie header,

00:41:21.740 --> 00:41:24.590
if you're the server, to
set some of these things.

00:41:24.590 --> 00:41:30.530
And note that there's
also a secure flag

00:41:30.530 --> 00:41:34.520
that you can set in the cookie
to indicate that it's an HTTPS

00:41:34.520 --> 00:41:38.330
cookie, meaning that
HTTP content should not

00:41:38.330 --> 00:41:41.110
be able to access that cookie.
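
NOTE
A small sketch of what such a cookie looks like on the wire. The helper
formatSetCookie and the cookie name and value are hypothetical; the
attribute names Domain, Path, and Secure are the standard ones.
```javascript
// Build the value of a Set-Cookie response header from its parts.
function formatSetCookie({ name, value, domain, path = "/", secure = false }) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (domain) parts.push(`Domain=${domain}`);
  parts.push(`Path=${path}`);
  if (secure) parts.push("Secure"); // only sent back over HTTPS
  return parts.join("; ");
}
const header = formatSetCookie({ name: "session", value: "abc123", domain: "mit.edu", path: "/6.858", secure: true });
```
A server would send this string in a Set-Cookie response header. Client-side
script can assign a string in the same format to document.cookie.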

00:41:41.110 --> 00:41:45.210
So that's the basic
idea behind cookies.

00:41:45.210 --> 00:41:48.780
Now note that whenever the
browser generates a request

00:41:48.780 --> 00:41:50.580
to a particular web
server, it's going

00:41:50.580 --> 00:41:54.720
to include all of the matching
cookies in that request.

00:41:54.720 --> 00:41:56.540
So there's a little
bit of, sort of,

00:41:56.540 --> 00:41:58.093
string matching
and algorithms that

00:41:58.093 --> 00:42:00.289
have to take place to
figure out what are all

00:42:00.289 --> 00:42:01.830
the exact cookies
that should be sent

00:42:01.830 --> 00:42:03.180
to the server for
a particular request

00:42:03.180 --> 00:42:04.990
because you can have
all these weird,

00:42:04.990 --> 00:42:06.632
sort of, suffix
domain things going on

00:42:06.632 --> 00:42:07.590
and so on and so forth.

00:42:07.590 --> 00:42:12.890
But that's the basic
idea behind cookies.
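
NOTE
A simplified sketch of the matching the browser performs when deciding which
cookies accompany a request: domain suffix match, path prefix match, and the
secure flag. The function names and the sample cookie jar are hypothetical,
and real browsers apply more rules than shown here.
```javascript
function domainMatches(host, cookieDomain) {
  const d = cookieDomain.replace(/^\./, "").toLowerCase(); // tolerate a leading dot
  const h = host.toLowerCase();
  return h === d || h.endsWith("." + d);
}
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // "/6.858" should match "/6.858/labs" but not "/6.8580".
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}
function cookiesForRequest(cookies, urlString) {
  const url = new URL(urlString);
  return cookies.filter((c) =>
    domainMatches(url.hostname, c.domain) &&
    pathMatches(url.pathname, c.path) &&
    (!c.secure || url.protocol === "https:"));
}
const jar = [
  { name: "class", domain: "mit.edu", path: "/6.858", secure: false },
  { name: "site", domain: "mit.edu", path: "/", secure: false },
  { name: "login", domain: "mit.edu", path: "/", secure: true },
];
```
A plain HTTP request under /6.858 on any mit.edu host carries "class" and
"site" but not the secure "login" cookie.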

00:42:12.890 --> 00:42:16.224
So does that all make sense?

00:42:16.224 --> 00:42:18.654
AUDIENCE: So can frames
access each other's cookies

00:42:18.654 --> 00:42:21.084
if they match those rules?

00:42:21.084 --> 00:42:24.150
PROFESSOR: Yeah, so
frames can do that.

00:42:24.150 --> 00:42:28.430
But it's dependent on how the
document.domain has been set.

00:42:28.430 --> 00:42:32.315
And then, it's dependent
on what the cookie domain

00:42:32.315 --> 00:42:33.760
and path have been set to.

00:42:33.760 --> 00:42:36.860
So yeah, after a bunch of
these string comparisons,

00:42:36.860 --> 00:42:38.610
yes, frames can access
each other's cookies

00:42:38.610 --> 00:42:39.810
if all those tests pass.

00:42:44.220 --> 00:42:47.400
OK, so yes, that leads me
into the next question.

00:42:47.400 --> 00:42:50.240
So we're trying to figure
out how different frames can

00:42:50.240 --> 00:42:51.580
access each other's cookies.

00:42:51.580 --> 00:42:54.040
So what's the problem?

00:42:54.040 --> 00:42:56.910
What would be the problem if
we allowed arbitrary frames

00:42:56.910 --> 00:42:59.516
to write arbitrary
people's cookies?

00:42:59.516 --> 00:43:00.390
So what do you think?

00:43:06.513 --> 00:43:08.950
Well, it will be bad,
suffice it to say.

00:43:08.950 --> 00:43:11.150
The reason it would be bad
is because, once again,

00:43:11.150 --> 00:43:16.340
these cookies allow the
client side of the application

00:43:16.340 --> 00:43:18.940
to store per-user data.

00:43:18.940 --> 00:43:22.180
So you can imagine that if
an attacker could control

00:43:22.180 --> 00:43:24.960
or overwrite a user's cookie,
the attacker could actually,

00:43:24.960 --> 00:43:27.940
for example, change
that cookie for Gmail

00:43:27.940 --> 00:43:33.140
to make the user log into
the attacker's Gmail account.

00:43:33.140 --> 00:43:35.230
So when the user logged
into the attacker's Gmail

00:43:35.230 --> 00:43:38.820
account, any email
that the user typed in

00:43:38.820 --> 00:43:40.680
could be read by the
attacker, for example.

00:43:40.680 --> 00:43:42.680
You could also imagine
that someone could tamper

00:43:42.680 --> 00:43:44.177
with the Amazon.com cookie.

00:43:44.177 --> 00:43:46.510
You know, put all kinds of
embarrassing ridiculous stuff

00:43:46.510 --> 00:43:49.066
in your shopping cart,
perhaps, and so on and so forth.

00:43:49.066 --> 00:43:51.690
So cookies are, actually, a very
important resource to protect.

00:43:51.690 --> 00:43:54.580
And a lot of web
security attacks

00:43:54.580 --> 00:43:59.390
try to steal that cookie to
do various kinds of evil.

00:43:59.390 --> 00:44:01.750
So here's another
interesting question

00:44:01.750 --> 00:44:03.370
with respect to cookies.

00:44:03.370 --> 00:44:08.420
So let's say that you've
got the site that's

00:44:08.420 --> 00:44:12.290
coming from foo.co.uk.

00:44:14.800 --> 00:44:18.690
So should the site
from this host name

00:44:18.690 --> 00:44:24.120
be allowed to set
a cookie for co.uk?

00:44:26.630 --> 00:44:30.146
So this is a bit subtle
because, according

00:44:30.146 --> 00:44:31.520
to the rules that
we've discussed

00:44:31.520 --> 00:44:37.320
before, a site from here should
be able to shorten its domain,

00:44:37.320 --> 00:44:41.000
set a cookie for this, and
that all seems to be legal.

00:44:41.000 --> 00:44:42.760
Now of course, as
a human, we think

00:44:42.760 --> 00:44:45.430
this is kind of suspicious
because, as a human,

00:44:45.430 --> 00:44:48.820
we actually understand that
this is morally speaking

00:44:48.820 --> 00:44:51.790
a single atomic domain.

00:44:51.790 --> 00:44:54.640
Morally speaking, this
is equivalent to .com.

00:44:54.640 --> 00:44:55.640
The British got screwed.

00:44:55.640 --> 00:44:56.311
They have to have
a dot in there.

00:44:56.311 --> 00:44:58.186
But that's not their
fault. History's unfair.

00:44:58.186 --> 00:44:58.720
Right?

00:44:58.720 --> 00:45:02.470
So morally speaking,
this is a single domain.

00:45:02.470 --> 00:45:05.040
So you actually have to have
some special infrastructure

00:45:05.040 --> 00:45:08.260
to get the cookie setting
rules to work out correctly.

00:45:08.260 --> 00:45:12.400
So essentially, Mozilla,
they have this website

00:45:12.400 --> 00:45:15.430
called publicsuffix.org.

00:45:21.220 --> 00:45:25.030
And basically, what
this website contains

00:45:25.030 --> 00:45:29.360
are lists of these rules for
how cookies, and origins,

00:45:29.360 --> 00:45:32.480
and domains should be shrunk
given that some things might

00:45:32.480 --> 00:45:33.590
have dots in them.

00:45:33.590 --> 00:45:37.010
But actually, they should be
treated as a single, sort of,

00:45:37.010 --> 00:45:38.930
atomic thing.

00:45:38.930 --> 00:45:41.500
So actually, when your
browser is figuring out

00:45:41.500 --> 00:45:44.512
how it should do all these
various cookie manipulations,

00:45:44.512 --> 00:45:46.220
it's actually going
to consult this site.

00:45:46.220 --> 00:45:47.730
Or it's going to have
this baked in somehow

00:45:47.730 --> 00:45:49.230
or something like
that to make sure

00:45:49.230 --> 00:45:52.790
that foo.co.uk can't actually
just shorten its domain

00:45:52.790 --> 00:45:54.070
to co.uk.

00:45:54.070 --> 00:45:56.980
And then, perform
some chicanery.
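
NOTE
A minimal sketch of the public-suffix check described above. The four-entry
PUBLIC_SUFFIXES set is a made-up excerpt for illustration; real browsers
use the full list published at publicsuffix.org. The function name is
hypothetical.
```javascript
// Entries here are treated as single atomic units, like "com".
const PUBLIC_SUFFIXES = new Set(["com", "edu", "uk", "co.uk"]);
function canSetCookieDomain(host, cookieDomain) {
  if (PUBLIC_SUFFIXES.has(cookieDomain)) return false; // would affect unrelated sites
  return host === cookieDomain || host.endsWith("." + cookieDomain);
}
```
With the list in place, foo.co.uk can scope a cookie to itself but not to
co.uk, even though co.uk is a dot-boundary suffix of its host name.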

00:45:56.980 --> 00:45:59.220
So once again, this
is very subtle.

00:45:59.220 --> 00:46:01.770
And a lot of the
interesting web security

00:46:01.770 --> 00:46:04.740
issues that we find
come about because a lot

00:46:04.740 --> 00:46:07.120
of the original infrastructure
was designed just

00:46:07.120 --> 00:46:08.590
for the English language.

00:46:08.590 --> 00:46:11.150
You know, for ASCII text
or something like this.

00:46:11.150 --> 00:46:15.460
It wasn't designed for an
international community.

00:46:15.460 --> 00:46:18.275
So as the internet became more
popular, people said, hey,

00:46:18.275 --> 00:46:20.150
we made some pretty big
design decisions here

00:46:20.150 --> 00:46:20.840
at the beginning.

00:46:20.840 --> 00:46:22.298
We should actually
make this usable

00:46:22.298 --> 00:46:25.181
for people who don't use our narrow
understanding of what language

00:46:25.181 --> 00:46:25.680
means.

00:46:25.680 --> 00:46:27.319
You run into all
these crazy problems.

00:46:27.319 --> 00:46:28.860
And I'll give you
another example of one

00:46:28.860 --> 00:46:31.520
of those in a later lecture.

00:46:31.520 --> 00:46:34.400
So does this all make sense?

00:46:34.400 --> 00:46:36.220
OK.

00:46:36.220 --> 00:46:44.930
So with respect to
XML HTTP requests,

00:46:44.930 --> 00:46:50.740
how are they treated by
the same origin policy?

00:46:53.310 --> 00:46:58.510
So by default, JavaScript can
only generate one of these

00:46:58.510 --> 00:47:01.720
if it's going to
its origin server.

00:47:01.720 --> 00:47:05.500
However, there's this
new interface called

00:47:05.500 --> 00:47:08.476
cross-origin resource sharing, or CORS.

00:47:08.476 --> 00:47:13.970
All right, so this
is the same origin

00:47:13.970 --> 00:47:20.500
unless the server has
enabled this CORS thing.

00:47:24.120 --> 00:47:29.960
So basically, this adds a new
HTTP response header called

00:47:29.960 --> 00:47:36.480
Access-Control-Allow-Origin.

00:47:42.100 --> 00:47:43.960
So let's say that
JavaScript from foo.com

00:47:43.960 --> 00:47:47.470
wants to make an XML
HTTP request to bar.com.

00:47:47.470 --> 00:47:51.280
So that's cross origin, as we
described in the rules so far.

00:47:51.280 --> 00:47:55.380
So if the server in bar.com
wants to allow this,

00:47:55.380 --> 00:47:59.220
it will return in its HTTP
response this header here

00:47:59.220 --> 00:48:07.670
that's going to say, yes, I
allow, for example, foo.com

00:48:07.670 --> 00:48:13.220
to send me these cross
origin XML HTTP requests.

00:48:13.220 --> 00:48:15.270
The server on bar.com
could actually say no.

00:48:15.270 --> 00:48:17.230
It could refuse the request.

00:48:17.230 --> 00:48:21.440
In which case, the browser
would fail the XML HTTP request.

00:48:21.440 --> 00:48:23.260
So this is, sort of,
a new thing that's

00:48:23.260 --> 00:48:27.270
come up in large part because
of these mash up applications.

00:48:27.270 --> 00:48:30.732
This need for,
somehow, applications

00:48:30.732 --> 00:48:32.690
from different developers
and different domains

00:48:32.690 --> 00:48:35.930
to be able to share data in
some type of constrained way.

00:48:35.930 --> 00:48:38.085
So this could also be
an asterisk over here

00:48:38.085 --> 00:48:40.220
if anybody can fetch
the data cross-origin,

00:48:40.220 --> 00:48:42.630
so on and so forth.
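
NOTE A simplified sketch of the browser-side decision described above, assuming the only input is the value of the Access-Control-Allow-Origin response header. It ignores preflight requests and credentials, which real CORS also involves, and the function name cors_allows is hypothetical.

```python
from typing import Optional


def cors_allows(requesting_origin: str, allow_origin: Optional[str]) -> bool:
    """Decide whether a cross-origin response may be exposed to the script."""
    if allow_origin is None:
        # bar.com sent no header, so it did not opt in: fail the request.
        return False
    allow_origin = allow_origin.strip()
    # An asterisk means anybody can fetch the data cross-origin.
    return allow_origin == "*" or allow_origin == requesting_origin


# JavaScript from foo.com asks bar.com; bar.com names foo.com in the header.
print(cors_allows("http://foo.com", "http://foo.com"))
# bar.com says nothing, so the browser fails the request.
print(cors_allows("http://foo.com", None))
```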

00:48:42.630 --> 00:48:45.316
So I think that's
pretty straightforward.

00:48:45.316 --> 00:48:47.190
So I mean, there's a
bunch of other resources

00:48:47.190 --> 00:48:50.220
we could look at.

00:48:50.220 --> 00:48:52.030
For example, images.

00:48:52.030 --> 00:48:56.310
So a frame can load images from
any origin that it desires.

00:48:56.310 --> 00:49:01.044
But it can't actually inspect
the bits in that image

00:49:01.044 --> 00:49:02.710
because, somehow, the
same origin policy

00:49:02.710 --> 00:49:04.790
says that having
different origins

00:49:04.790 --> 00:49:07.870
directly inspect each other's
content is a bad thing.

00:49:07.870 --> 00:49:10.390
So the frame can't
inspect the bits.

00:49:10.390 --> 00:49:12.005
But it can, actually,
infer things

00:49:12.005 --> 00:49:14.630
like what the size of the image
is because it can actually

00:49:14.630 --> 00:49:17.490
see where the other
DOM nodes in that page

00:49:17.490 --> 00:49:18.910
have been placed, for example.

00:49:18.910 --> 00:49:20.700
So this is another one of
these weird instances where

00:49:20.700 --> 00:49:22.390
the same origin
policy is ostensibly

00:49:22.390 --> 00:49:24.180
trying to prevent all
information leakage.

00:49:24.180 --> 00:49:25.805
But it can't actually
prevent all of it

00:49:25.805 --> 00:49:27.670
because embedding
inherently reveals

00:49:27.670 --> 00:49:29.960
some types of information.

00:49:29.960 --> 00:49:33.280
CSS has a similar
story to images.

00:49:33.280 --> 00:49:38.140
So a frame can embed
CSS from any origin.

00:49:38.140 --> 00:49:41.850
However, it cannot directly
inspect the text inside that

00:49:41.850 --> 00:49:44.150
CSS file, if it's from
a different origin.

00:49:44.150 --> 00:49:47.640
But it can actually infer what
this CSS does because it just

00:49:47.640 --> 00:49:49.130
can create a bunch of nodes.

00:49:49.130 --> 00:49:51.370
And then, see how their
styling gets changed.

00:49:51.370 --> 00:49:53.740
So it's a bit wacky.

00:49:53.740 --> 00:49:59.020
JavaScript is actually
my favorite example

00:49:59.020 --> 00:50:01.440
of how this same
origin policy struggles

00:50:01.440 --> 00:50:04.550
to maintain any type of
intellectual consistency.

00:50:04.550 --> 00:50:08.740
So the idea here is that, if
you do a cross origin fetch

00:50:08.740 --> 00:50:10.980
of JavaScript, that is allowed.

00:50:10.980 --> 00:50:13.200
You can allow that
external JavaScript

00:50:13.200 --> 00:50:15.810
to execute in the
context of your own page.

00:50:15.810 --> 00:50:19.300
You cannot, however, look
at the source code for it.

00:50:19.300 --> 00:50:21.750
So if you have a
script tag source

00:50:21.750 --> 00:50:23.700
equals something
outside your domain,

00:50:23.700 --> 00:50:25.922
then when that
source gets executed,

00:50:25.922 --> 00:50:27.130
you can call functions in it.

00:50:27.130 --> 00:50:29.296
But you can't actually look
at the JavaScript source

00:50:29.296 --> 00:50:30.470
code in it.

00:50:30.470 --> 00:50:31.370
OK, fine.

00:50:31.370 --> 00:50:32.510
So that seems very nice.

00:50:32.510 --> 00:50:34.343
However, there are a
bunch of holes in this.

00:50:34.343 --> 00:50:38.040
So for example, JavaScript is
a dynamic scripting language.

00:50:38.040 --> 00:50:40.920
And functions are
first class objects.

00:50:40.920 --> 00:50:47.090
So for any function f, you
can just call f.toString().

00:50:47.090 --> 00:50:49.950
And that will give you the
source code for the function.

00:50:49.950 --> 00:50:51.590
And people do this all the time.

00:50:51.590 --> 00:50:54.654
Do things like dynamic
rewriting and stuff like that.

00:50:54.654 --> 00:50:56.070
So you know the
same origin policy

00:50:56.070 --> 00:50:57.880
doesn't allow you
to directly look

00:50:57.880 --> 00:51:00.140
at the contents of
the script tag itself?

00:51:00.140 --> 00:51:02.864
You can just call this
for any public function

00:51:02.864 --> 00:51:04.530
that that external
script has given you.

00:51:04.530 --> 00:51:06.600
And just get the
source code like that.

00:51:06.600 --> 00:51:08.190
Another thing you
could imagine doing

00:51:08.190 --> 00:51:11.540
is you could just get your
home server from your domain

00:51:11.540 --> 00:51:13.980
to just fetch the
source code for you.

00:51:13.980 --> 00:51:16.630
And then, just send
it back to you.

00:51:16.630 --> 00:51:17.275
So oops.

00:51:17.275 --> 00:51:19.400
I mean, you essentially
just asked your home server

00:51:19.400 --> 00:51:20.610
to run Wget.

00:51:20.610 --> 00:51:22.180
And you get the
source code that way.

00:51:22.180 --> 00:51:24.370
OK, so that's, kind
of, I think, goofy.

00:51:24.370 --> 00:51:27.290
So long story short,
the same origin policies

00:51:27.290 --> 00:51:28.556
here are a bit odd.

00:51:28.556 --> 00:51:30.903
AUDIENCE: Presume that
part of the reason they

00:51:30.903 --> 00:51:33.281
do it is to prevent the user
from fetching JavaScript

00:51:33.281 --> 00:51:35.030
because then cookies
will be sent as well.

00:51:35.030 --> 00:51:37.284
So you can get JavaScript
tailored to you.

00:51:37.284 --> 00:51:38.276
PROFESSOR: Yeah.

00:51:38.276 --> 00:51:40.260
AUDIENCE: So if you get your
server to fetch it for you,

00:51:40.260 --> 00:51:41.252
it won't have the user's
cookies [INAUDIBLE].

00:51:41.252 --> 00:51:42.640
PROFESSOR: That is true.

00:51:42.640 --> 00:51:44.790
Although, in practice,
a lot of times,

00:51:44.790 --> 00:51:49.119
the raw source code, itself, is
not user tailored in practice.

00:51:49.119 --> 00:51:50.660
But you're right
that it will prevent

00:51:50.660 --> 00:51:52.960
some cookie-mediated
attacks like that.

00:51:52.960 --> 00:51:54.630
Modulo, some of the
cookie [INAUDIBLE].

00:51:54.630 --> 00:51:57.160
But that's exactly correct.

00:51:57.160 --> 00:52:02.505
So because it's actually pretty
easy for users and applications

00:52:02.505 --> 00:52:04.809
to get JavaScript source
code, a lot of times,

00:52:04.809 --> 00:52:06.600
JavaScript source code,
when it's deployed,

00:52:06.600 --> 00:52:09.070
it's actually
obfuscated and minified.

00:52:09.070 --> 00:52:11.860
So if you've ever tried to look
and see how a web page works,

00:52:11.860 --> 00:52:13.800
if you look at the
source, sometimes people

00:52:13.800 --> 00:52:16.330
will do things like remove
all the white space.

00:52:16.330 --> 00:52:18.440
They will also change
all the variable names

00:52:18.440 --> 00:52:21.070
to be super short and have
all these exclamation marks.

00:52:21.070 --> 00:52:23.940
Looks like cartoon characters
cursing in the cartoons.

00:52:23.940 --> 00:52:25.650
So that's, sort of,
like a cheap form

00:52:25.650 --> 00:52:27.290
of digital rights management.

00:52:27.290 --> 00:52:32.090
But it's all, ultimately,
a bit of a crap shoot

00:52:32.090 --> 00:52:34.490
because you can do
things like execute

00:52:34.490 --> 00:52:36.450
that code in your own browser.

00:52:36.450 --> 00:52:37.300
See what it does.

00:52:37.300 --> 00:52:38.220
Sniff the network.

00:52:38.220 --> 00:52:40.352
See who it talks to,
so on and so forth.

00:52:40.352 --> 00:52:44.480
But that's, basically, the same
origin story for JavaScript.

00:52:44.480 --> 00:52:45.255
Plug-ins--

00:52:45.255 --> 00:52:46.754
AUDIENCE: I was
under the impression

00:52:46.754 --> 00:52:50.051
that the reason you
do that is [INAUDIBLE]

00:52:50.051 --> 00:52:52.330
take less time to download
rather than [INAUDIBLE].

00:52:52.330 --> 00:52:54.820
PROFESSOR: So that is also
a reason they do that, too.

00:52:54.820 --> 00:52:56.100
That's a good point.

00:52:56.100 --> 00:53:00.312
But I mean, if you type
into the internet, sort of,

00:53:00.312 --> 00:53:02.500
web page obfuscation
or stuff like that,

00:53:02.500 --> 00:53:06.110
people often try to, somehow,
make some type of secrets

00:53:06.110 --> 00:53:08.330
into either their HTML
or their JavaScript.

00:53:08.330 --> 00:53:10.170
Maybe they want to
obscure the protocol,

00:53:10.170 --> 00:53:13.230
for example, that the client
uses to talk to the server.

00:53:13.230 --> 00:53:16.222
Some people will also do the
obfuscation for that reason.

00:53:16.222 --> 00:53:17.680
Pure minification--
in other words,

00:53:17.680 --> 00:53:19.944
just making the
variable names small

00:53:19.944 --> 00:53:21.360
and removing the
[INAUDIBLE] space--

00:53:21.360 --> 00:53:24.875
yeah, that's mainly just to save
download bandwidth, download time.

00:53:28.611 --> 00:53:31.710
OK, so that's the
story for JavaScript.

00:53:31.710 --> 00:53:34.150
There's also plug-ins.

00:53:34.150 --> 00:53:39.440
So this is stuff like
Java and things like this.

00:53:39.440 --> 00:53:42.799
So a frame can easily run a
plug-in from any origin.

00:53:42.799 --> 00:53:44.590
Now plug-ins, depending
on who you believe,

00:53:44.590 --> 00:53:46.548
are actually going
the way of the dinosaurs.

00:53:46.548 --> 00:53:48.530
Because a lot of the
new HTML 5 features,

00:53:48.530 --> 00:53:50.030
like video tag and
things like this,

00:53:50.030 --> 00:53:51.488
can actually do
stuff that you used

00:53:51.488 --> 00:53:53.574
to only be able to do
with a plug-in like Java.

00:53:53.574 --> 00:53:55.490
So it's not clear how
much longer these things

00:53:55.490 --> 00:53:58.460
are going to be around.

00:53:58.460 --> 00:53:59.590
OK, so any questions?

00:54:02.992 --> 00:54:07.560
OK, so remember that when
a browser generates an HTTP

00:54:07.560 --> 00:54:11.090
request it automatically
includes the relevant cookies

00:54:11.090 --> 00:54:12.080
in that request.

00:54:12.080 --> 00:54:18.250
So what happens if a
malicious site generates

00:54:18.250 --> 00:54:21.280
a URL that looks like this?

00:54:21.280 --> 00:54:24.538
So for example, it
creates a new child frame.

00:54:24.538 --> 00:54:28.370
It sets that URL to bank.com.

00:54:28.370 --> 00:54:31.990
And then, it actually tries to
mimic what the browser would

00:54:31.990 --> 00:54:36.910
do if there was going to
be a transfer of money

00:54:36.910 --> 00:54:39.780
between the user
and someone else.

00:54:44.140 --> 00:54:49.605
So in this URL, in this
frame that the attacker

00:54:49.605 --> 00:54:53.020
is trying to create, it tries
to invoke this transfer command

00:54:53.020 --> 00:54:53.520
here.

00:54:53.520 --> 00:54:54.872
Say $500.

00:54:54.872 --> 00:54:58.960
And that should go to the
attacker's account at the bank.

00:54:58.960 --> 00:55:01.800
Now the attacker
page, which the user

00:55:01.800 --> 00:55:04.516
visited because, somehow,
the attacker is [INAUDIBLE]

00:55:04.516 --> 00:55:07.450
go there.

00:55:07.450 --> 00:55:09.160
What's interesting
about this is that,

00:55:09.160 --> 00:55:11.760
even though the
attacker page won't

00:55:11.760 --> 00:55:14.930
be able to see the contents
of this child frame

00:55:14.930 --> 00:55:18.020
because it's probably going
to be in a different origin.

00:55:18.020 --> 00:55:21.880
The bank.com page will still
do what the attacker wants

00:55:21.880 --> 00:55:24.220
because the browser's going
to transfer all the user's

00:55:24.220 --> 00:55:25.506
cookies with this request.

00:55:25.506 --> 00:55:27.130
It's going to look
at this command here

00:55:27.130 --> 00:55:29.080
and say, oh, the user
must've, somehow,

00:55:29.080 --> 00:55:31.770
asked me to transfer $500
to this mysteriously named

00:55:31.770 --> 00:55:32.990
individual named attacker.

00:55:32.990 --> 00:55:34.070
OK, I'll do it.

00:55:34.070 --> 00:55:36.030
All right, seems reasonable.

00:55:36.030 --> 00:55:38.080
So that's a problem.

00:55:38.080 --> 00:55:39.620
And the reason
this attack works

00:55:39.620 --> 00:55:42.850
is because, essentially,
the attacker

00:55:42.850 --> 00:55:45.760
can figure out
deterministically what

00:55:45.760 --> 00:55:47.520
this command should look like.

00:55:47.520 --> 00:55:49.825
There's no randomness
in this command here.
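
NOTE A small sketch of why the lack of randomness matters. The endpoint transfer.cgi and the parameter names cmd, amount, and to are assumptions for illustration; the lecture only says the URL asks bank.com to transfer $500 to the attacker. The point is that the attacker can build exactly the same URL ahead of time, for any victim.

```python
from urllib.parse import urlencode


def forged_transfer_url(amount: int, recipient: str) -> str:
    """Build the request URL an attacker would put in a hidden child frame."""
    # Hypothetical endpoint and parameter names.
    query = urlencode({"cmd": "transfer", "amount": amount, "to": recipient})
    return "https://bank.com/transfer.cgi?" + query


# Nothing here depends on the victim, so the URL is fully predictable.
print(forged_transfer_url(500, "attacker"))
```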

00:55:49.825 --> 00:55:51.200
So essentially,
what the attacker

00:55:51.200 --> 00:55:54.164
can do is try this on his
or her own bank account,

00:55:54.164 --> 00:55:55.580
figure out this
protocol, and then

00:55:55.580 --> 00:55:58.780
just, somehow, force the
user's browser to execute

00:55:58.780 --> 00:56:00.560
this on the attacker's behalf.

00:56:00.560 --> 00:56:08.639
So this is what's called a
cross site request forgery.

00:56:12.180 --> 00:56:16.690
So sometimes you hear
this is called CSRF.

00:56:16.690 --> 00:56:21.680
C-S-R-F.

00:56:21.680 --> 00:56:25.620
So the solution to
fixing this attack here

00:56:25.620 --> 00:56:28.540
is that you actually just need
to include some randomness

00:56:28.540 --> 00:56:30.680
in this URL that's generated.

00:56:30.680 --> 00:56:32.420
A type of randomness
that the attacker

00:56:32.420 --> 00:56:33.810
can't guess statically.

00:56:33.810 --> 00:56:42.960
So for example, you can imagine
that inside the bank's web page

00:56:42.960 --> 00:56:45.110
it's going to have some form.

00:56:45.110 --> 00:56:47.010
The form is the
thing, which actually

00:56:47.010 --> 00:56:48.460
generates requests like this.

00:56:48.460 --> 00:56:53.475
So maybe the action of
that form is transfer.cgi.

00:56:57.690 --> 00:57:02.330
And then, inside this form,
you're going to have an input.

00:57:02.330 --> 00:57:05.322
Inputs are usually used to
get in user input like text,

00:57:05.322 --> 00:57:07.280
key presses, mouse clicks,
and stuff like that.

00:57:07.280 --> 00:57:09.310
But we can actually
give this input

00:57:09.310 --> 00:57:12.950
a type of hidden, which
means that it's not

00:57:12.950 --> 00:57:16.280
shown to the user.

00:57:16.280 --> 00:57:19.060
And then, we can give
it this attribute.

00:57:19.060 --> 00:57:24.190
We'll call it CSRF.

00:57:24.190 --> 00:57:26.020
And then, we'll give
it some random value.

00:57:31.790 --> 00:57:33.380
You know, a72f.

00:57:33.380 --> 00:57:35.240
Whatever.

00:57:35.240 --> 00:57:37.620
So remember, this is
generated on the server side.

00:57:37.620 --> 00:57:41.320
So when the user goes to this
page, on the server side,

00:57:41.320 --> 00:57:43.270
it somehow generates
this random value here

00:57:43.270 --> 00:57:46.940
and embeds that in the HTML
that the user receives.

00:57:46.940 --> 00:57:49.390
So when the user
submits this form,

00:57:49.390 --> 00:57:52.140
then this URL that we
have up here will actually

00:57:52.140 --> 00:58:03.620
have this extra thing up here,
which is this token here.

00:58:03.620 --> 00:58:05.250
So what this does
is that this now

00:58:05.250 --> 00:58:08.198
means that the
attacker would have

00:58:08.198 --> 00:58:10.450
to be able to guess the
particular random token

00:58:10.450 --> 00:58:13.060
that the server generated
for the user each time

00:58:13.060 --> 00:58:14.460
the user had gone to the page.

00:58:14.460 --> 00:58:17.720
So if you have sufficient
randomness here,

00:58:17.720 --> 00:58:20.230
the attacker can't just
forge one of these things

00:58:20.230 --> 00:58:23.250
because if the attacker
guesses the wrong token,

00:58:23.250 --> 00:58:25.364
then the server always
will reject your request.

00:58:25.364 --> 00:58:26.985
AUDIENCE: Well why
should these always

00:58:26.985 --> 00:58:30.450
be included in the URL and not
in the body of the [INAUDIBLE]?

00:58:35.286 --> 00:58:36.160
PROFESSOR: Yeah, yeah.

00:58:36.160 --> 00:58:38.836
So HTTPS helps a
lot of these things.

00:58:38.836 --> 00:58:40.502
And there's actually
no intrinsic reason

00:58:40.502 --> 00:58:42.240
why you couldn't put
some of this stuff

00:58:42.240 --> 00:58:44.000
in the body of the request.

00:58:44.000 --> 00:58:47.319
There's some legacy reasons why
forms, sort of, work like this.

00:58:47.319 --> 00:58:48.110
But you're correct.

00:58:48.110 --> 00:58:50.690
And in practice, you can put
that information somewhere else

00:58:50.690 --> 00:58:51.660
in the HTTPS request.

00:58:51.660 --> 00:58:54.000
But note that just moving
that information, for example,

00:58:54.000 --> 00:58:56.280
to the body of the
request, there's

00:58:56.280 --> 00:58:59.080
still a challenge there,
potentially because if there's

00:58:59.080 --> 00:59:01.350
something there that
the attacker can guess.

00:59:01.350 --> 00:59:03.635
Then the attacker may
still be able to, somehow,

00:59:03.635 --> 00:59:05.410
conjure up that URL.

00:59:05.410 --> 00:59:08.510
For example, by making an
XML HTTP request and then,

00:59:08.510 --> 00:59:10.370
explicitly, setting
the body to this thing

00:59:10.370 --> 00:59:11.911
that the attacker
knows how to guess.

00:59:11.911 --> 00:59:15.154
AUDIENCE: Well if the
attacker just gives you a URL,

00:59:15.154 --> 00:59:19.934
then that just gets encoded
in the header of [INAUDIBLE].

00:59:19.934 --> 00:59:22.620
PROFESSOR: If the attacker
just gives you a URL.

00:59:22.620 --> 00:59:26.320
So if you're just
setting a frame to URL,

00:59:26.320 --> 00:59:28.826
then, that's all that
the attacker can control.

00:59:28.826 --> 00:59:30.450
But if you're using
an XML HTTP request,

00:59:30.450 --> 00:59:32.970
if somehow the attacker
can generate one of those,

00:59:32.970 --> 00:59:38.050
then the XML HTTP request interface actually
allows you to set the body.

00:59:38.050 --> 00:59:39.972
AUDIENCE: The XML
HTTP request would

00:59:39.972 --> 00:59:41.740
be limited by same origin.

00:59:41.740 --> 00:59:44.110
But the attacker could just
write a form and submit it.

00:59:44.110 --> 00:59:46.820
There's nothing [INAUDIBLE]
submitting a form like using

00:59:46.820 --> 00:59:47.330
[INAUDIBLE].

00:59:47.330 --> 00:59:49.910
And then, it's sent in the body.

00:59:49.910 --> 00:59:50.710
But it's still--

00:59:50.710 --> 00:59:51.710
PROFESSOR: That's right.

00:59:51.710 --> 00:59:55.290
So XML HTTP request is
limited to the same origin.

00:59:55.290 --> 00:59:58.190
However, if for example,
the attacker can,

00:59:58.190 --> 01:00:01.380
maybe, do something
like this, for example.

01:00:01.380 --> 01:00:04.070
And the attacker can inject
the XML HTTP request here,

01:00:04.070 --> 01:00:05.090
which would then execute
with the authority

01:00:05.090 --> 01:00:05.965
of the embedded page.

01:00:10.650 --> 01:00:13.250
AUDIENCE: Can the
attacker [INAUDIBLE]

01:00:13.250 --> 01:00:16.741
by inspecting the
HTML source code?

01:00:16.741 --> 01:00:19.360
PROFESSOR: Yes, that's actually
a good question. Right, so

01:00:19.360 --> 01:00:22.830
it depends on what the
attacker has access to.

01:00:22.830 --> 01:00:25.431
If the attacker-- for example,
by doing something goofy

01:00:25.431 --> 01:00:30.110
like that-- can actually
access this JavaScript property

01:00:30.110 --> 01:00:33.870
called innerHTML.

01:00:33.870 --> 01:00:35.680
This is a property
[INAUDIBLE], right.

01:00:35.680 --> 01:00:39.970
So if I do
document.body.innerHTML,

01:00:39.970 --> 01:00:42.536
I will get all of the HTML
that's inside that page

01:00:42.536 --> 01:00:43.140
right now.

01:00:43.140 --> 01:00:43.640
So yeah.

01:00:43.640 --> 01:00:45.612
So if the attacker can
do this, then yeah.

01:00:45.612 --> 01:00:46.570
Then you're in trouble.

01:00:46.570 --> 01:00:47.624
That's right.

01:00:47.624 --> 01:00:49.040
So a lot of these
details, though,

01:00:49.040 --> 01:00:50.970
depend on exactly what the
attacker can and can't do.

01:00:50.970 --> 01:00:52.230
So it, kind of, makes sense.

01:00:52.230 --> 01:00:54.780
So if the attacker can or
cannot generate Ajax requests,

01:00:54.780 --> 01:00:55.800
that means one thing.

01:00:55.800 --> 01:00:57.690
The attacker can or cannot
look at the raw HTML,

01:00:57.690 --> 01:00:58.760
then you have another thing.

01:00:58.760 --> 01:00:59.551
So on and so forth.

01:01:02.041 --> 01:01:02.540
All right.

01:01:02.540 --> 01:01:03.660
So yeah.

01:01:03.660 --> 01:01:06.340
So this token-based
thing is a popular way

01:01:06.340 --> 01:01:10.270
to get around
these CSRF attacks.

01:01:10.270 --> 01:01:17.970
All right, so another
thing we can look at

01:01:17.970 --> 01:01:19.595
are network addresses.

01:01:22.711 --> 01:01:25.210
So this gets into some of the
conversation we've been having

01:01:25.210 --> 01:01:30.610
about who the attacker can and cannot
contact via XML HTTP request,

01:01:30.610 --> 01:01:31.240
for example.

01:01:36.450 --> 01:01:40.848
So with respect to
network addresses,

01:01:40.848 --> 01:01:47.210
a frame can send HTTP
and HTTPS requests

01:01:47.210 --> 01:01:50.560
to a host plus a port
that matches its origin.

01:01:50.560 --> 01:01:54.950
But note that the security
of the same origin policy is,

01:01:54.950 --> 01:01:58.600
actually, very tightly tied
with the security of the DNS

01:01:58.600 --> 01:02:01.730
infrastructure because
all the same origin

01:02:01.730 --> 01:02:04.360
policies' rules are
based upon what names mean.

01:02:04.360 --> 01:02:06.080
So if you can control
what names mean,

01:02:06.080 --> 01:02:08.260
you can actually launch some
pretty vicious attacks.

01:02:08.260 --> 01:02:14.520
So an example of this is
the DNS rebinding attack.

01:02:19.940 --> 01:02:25.010
So in this attack, the
goal of the attacker

01:02:25.010 --> 01:02:40.577
is to run attacker-controlled
JavaScript with the authority

01:02:40.577 --> 01:02:42.997
of some victim website.

01:02:42.997 --> 01:02:44.330
We'll just call them victim.com.

01:02:48.232 --> 01:02:50.440
So the attacker wants to
bust the same origin policy

01:02:50.440 --> 01:02:53.240
and somehow run code
that he has written

01:02:53.240 --> 01:02:55.620
with the authority
of some other site.

01:02:55.620 --> 01:02:59.480
So here's the approach.

01:02:59.480 --> 01:03:03.740
So the first thing that
the attacker is going to do

01:03:03.740 --> 01:03:07.460
is register a domain name.

01:03:10.670 --> 01:03:13.040
So let's say we just
call that attacker.com.

01:03:18.746 --> 01:03:19.960
Very simple to do.

01:03:19.960 --> 01:03:21.089
Just pay a couple of bucks.

01:03:21.089 --> 01:03:21.880
You're ready to go.

01:03:21.880 --> 01:03:24.390
You own your own domain name.

01:03:24.390 --> 01:03:26.220
So note that the
attacker is also

01:03:26.220 --> 01:03:28.970
going to set up a
DNS server to respond

01:03:28.970 --> 01:03:32.490
to name resolution
requests for objects

01:03:32.490 --> 01:03:33.960
that reside in attacker.com.

01:03:33.960 --> 01:03:35.810
So the second thing
that has to happen

01:03:35.810 --> 01:03:40.980
is that the user has
to visit attacker.com.

01:03:44.291 --> 01:03:47.260
In particular, the user has
to visit some website that

01:03:47.260 --> 01:03:49.190
hangs off of this domain name.

01:03:49.190 --> 01:03:50.990
This part is
actually not tricky.

01:03:50.990 --> 01:03:53.110
So you can create
an ad campaign.

01:03:53.110 --> 01:03:54.040
Free iPad.

01:03:54.040 --> 01:03:54.800
Everybody wants
a free iPad, even

01:03:54.800 --> 01:03:56.730
though I don't know anyone
who's ever won a free iPad.

01:03:56.730 --> 01:03:57.647
They click on this.

01:03:57.647 --> 01:03:58.230
They're there.

01:03:58.230 --> 01:04:00.063
It's in the phishing
email, so on and so forth.

01:04:00.063 --> 01:04:01.330
This part's not hard.

01:04:01.330 --> 01:04:03.030
So what's going to happen?

01:04:03.030 --> 01:04:10.430
So this is actually going
to cause the browser

01:04:10.430 --> 01:04:25.560
to generate a DNS
request to attacker.com

01:04:25.560 --> 01:04:27.540
because this page
has some objects that

01:04:27.540 --> 01:04:30.950
refer to some objects
that live in attacker.com.

01:04:30.950 --> 01:04:34.810
The browser's going to say I've
never seen this domain before.

01:04:34.810 --> 01:04:38.292
Let me send the DNS resolution
request to attacker.com.

01:04:38.292 --> 01:04:39.750
So what's going to
end up happening

01:04:39.750 --> 01:04:42.570
is that the attacker's
DNS server is going

01:04:42.570 --> 01:04:45.090
to respond to that request.

01:04:45.090 --> 01:04:49.100
But it's going to respond
with a DNS result that

01:04:49.100 --> 01:04:51.630
has a very short time to live.

01:04:51.630 --> 01:04:52.210
OK?

01:04:52.210 --> 01:04:54.540
Meaning that the
browser will think

01:04:54.540 --> 01:04:58.300
that it's only valid for a
very short period of time

01:04:58.300 --> 01:05:00.400
before it has to go out
and revalidate that.

01:05:00.400 --> 01:05:02.070
OK?

01:05:02.070 --> 01:05:17.780
So in other words, the attacker
response has a small TTL.

01:05:20.600 --> 01:05:21.260
OK, fine.

01:05:21.260 --> 01:05:23.990
So the user gets
the response back.

01:05:23.990 --> 01:05:27.460
The malicious website is now
running on the user side.

01:05:27.460 --> 01:05:30.580
Meanwhile, while the user's
interacting with the site,

01:05:30.580 --> 01:05:34.580
the attacker is going
to configure the DNS

01:05:34.580 --> 01:05:37.310
server that he controls.

01:05:37.310 --> 01:05:45.390
The attacker is going to
bind the attacker.com name

01:05:45.390 --> 01:05:50.940
to victim.com's IP address.

01:05:56.600 --> 01:05:57.260
Right?

01:05:57.260 --> 01:06:02.050
So what that means is that
now if the user's browser asks

01:06:02.050 --> 01:06:04.600
for a domain name resolution
for something that

01:06:04.600 --> 01:06:06.730
resides in attacker.com,
it's actually

01:06:06.730 --> 01:06:10.152
going to get some internal
address to victim.com.

01:06:10.152 --> 01:06:12.750
This is actually very subtle.

01:06:12.750 --> 01:06:15.720
Now why can the attacker's
DNS resolver do that?

01:06:15.720 --> 01:06:18.530
Because the attacker
configures it to do so.

01:06:18.530 --> 01:06:19.970
The attacker's DNS
server does not

01:06:19.970 --> 01:06:23.387
have to consult victim.com
to do this rebinding.

01:06:23.387 --> 01:06:25.970
So perhaps, you can see some of
the outline of the attack now.

01:06:25.970 --> 01:06:32.450
So what will happen
is that the website

01:06:32.450 --> 01:06:44.185
wants to fetch a new object
via, let's say, AJAX.

01:06:47.480 --> 01:06:50.300
And it thinks that
that AJAX request

01:06:50.300 --> 01:06:53.520
is going to go to attacker.com
somewhere externally.

01:06:53.520 --> 01:07:00.950
But this AJAX request
actually goes to victim.com.

01:07:05.800 --> 01:07:08.110
And the reason why that's
bad is because now we've

01:07:08.110 --> 01:07:10.240
got this code on the
client side that

01:07:10.240 --> 01:07:16.270
resides on the attacker.com web
page that's actually accessing

01:07:16.270 --> 01:07:19.070
now data that is from
a different origin

01:07:19.070 --> 01:07:20.990
from victim.com.

01:07:20.990 --> 01:07:23.150
So once this step of
the attack completes,

01:07:23.150 --> 01:07:26.765
then the attacker.com web page
can send that content back

01:07:26.765 --> 01:07:30.600
to the server using [INAUDIBLE]
or do other things like that.
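
NOTE A toy, in-memory simulation of the rebinding steps above; it does no real networking. The class names, the one-second TTL, and both IP addresses are made up for illustration. The browser here checks same origin by host name only, so once the short-lived DNS answer expires, a request that still looks same-origin lands on the victim's address.

```python
from typing import Dict, Tuple


class AttackerDNS:
    """Toy authoritative server for attacker.com (addresses are made up)."""

    def __init__(self, attacker_ip: str, victim_ip: str) -> None:
        self.answer = attacker_ip
        self.victim_ip = victim_ip

    def rebind(self) -> None:
        # The attacker controls this server; victim.com is never consulted.
        self.answer = self.victim_ip

    def resolve(self, name: str) -> Tuple[str, float]:
        # A tiny TTL forces the browser to ask again almost immediately.
        return self.answer, 1.0


class Browser:
    """Toy browser whose same-origin check looks only at the host name."""

    def __init__(self, dns: AttackerDNS) -> None:
        self.dns = dns
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry)

    def lookup(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]
        ip, ttl = self.dns.resolve(name)
        self.cache[name] = (ip, now + ttl)
        return ip

    def ajax(self, page_host: str, target_host: str, now: float) -> str:
        """Return the IP address the request is actually sent to."""
        if page_host != target_host:
            raise PermissionError("cross-origin request blocked")
        return self.lookup(target_host, now)


dns = AttackerDNS("203.0.113.5", "10.0.0.7")
browser = Browser(dns)
browser.ajax("attacker.com", "attacker.com", now=0.0)  # page loads normally
dns.rebind()  # attacker points the name at the victim's address
print(browser.ajax("attacker.com", "attacker.com", now=5.0))
```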

01:07:30.600 --> 01:07:32.709
So does this attack make sense?

01:07:32.709 --> 01:07:35.000
AUDIENCE: Wouldn't it be more
sensible to do the attack

01:07:35.000 --> 01:07:36.560
the other way around?

01:07:36.560 --> 01:07:41.732
So to [INAUDIBLE] victim.com
to the attacker's IP address.

01:07:41.732 --> 01:07:43.940
Because that way you're the
same origin as victim.com

01:07:43.940 --> 01:07:47.710
so you can get all
the cookies and such.

01:07:47.710 --> 01:07:50.330
PROFESSOR: Yeah, so that
would work, too, as well.

01:07:50.330 --> 01:07:53.850
So what's nice about
this though is that,

01:07:53.850 --> 01:07:58.547
presumably, this allows you to do
nice things like port scanning

01:07:58.547 --> 01:07:59.380
and stuff like that.

01:07:59.380 --> 01:08:01.500
I mean, your approach
will work, right.

01:08:01.500 --> 01:08:04.680
But I think here the
reason why you do--

01:08:04.680 --> 01:08:05.780
AUDIENCE: [INAUDIBLE].

01:08:05.780 --> 01:08:07.280
PROFESSOR: Because,
essentially, you

01:08:07.280 --> 01:08:11.460
can do things like constantly
rebind what attacker.com points

01:08:11.460 --> 01:08:15.680
to to different machine names
and different ports inside

01:08:15.680 --> 01:08:17.394
of victim.com's network.

01:08:17.394 --> 01:08:19.060
So then, you can,
sort of, step through.

01:08:19.060 --> 01:08:22.240
So in other words, let's say
that the attacker.com web page

01:08:22.240 --> 01:08:28.899
always thinks it's
going to attacker.com

01:08:28.899 --> 01:08:32.540
and issuing an
AJAX request there.

01:08:32.540 --> 01:08:35.270
So every time the
DNS server rebinds,

01:08:35.270 --> 01:08:37.910
it [INAUDIBLE] to some
different IP address

01:08:37.910 --> 01:08:39.693
inside of victim.com's network.

01:08:39.693 --> 01:08:42.109
So it can just, sort of, step
through the IP addresses one

01:08:42.109 --> 01:08:47.369
by one and see if anybody's
responding to those requests.

01:08:47.369 --> 01:08:51.560
AUDIENCE: But the client,
the user you're attacking,

01:08:51.560 --> 01:08:55.280
doesn't necessarily have inside
access to victim.com's network.

01:08:55.280 --> 01:08:57.550
PROFESSOR: So what this
attack, typically, assumes

01:08:57.550 --> 01:09:00.390
is that there are certain
firewall rules that

01:09:00.390 --> 01:09:03.400
would prevent attacker.com
from outside the network

01:09:03.400 --> 01:09:05.970
from actually looking through
each one of the IP addresses

01:09:05.970 --> 01:09:07.354
inside of victim.com.

01:09:07.354 --> 01:09:09.270
However, if you're inside
corp.net-- if you're

01:09:09.270 --> 01:09:11.540
inside the corporate
firewall, let's say--

01:09:11.540 --> 01:09:16.384
then machines often do have the
ability to contact [INAUDIBLE].

01:09:16.384 --> 01:09:17.300
AUDIENCE: [INAUDIBLE].

01:09:17.300 --> 01:09:18.430
PROFESSOR: Yeah, yeah.

01:09:18.430 --> 01:09:20.270
Exactly.

01:09:20.270 --> 01:09:23.229
AUDIENCE: Does this
work over HTTPS?

01:09:23.229 --> 01:09:25.270
PROFESSOR: Ah, so that's
an interesting question.

01:09:25.270 --> 01:09:29.960
So HTTPS has these keys.

01:09:29.960 --> 01:09:33.090
So the way you'd have to
get this to work with HTTPS

01:09:33.090 --> 01:09:41.497
is if somehow, for example,
if attacker.com could-- let

01:09:41.497 --> 01:09:44.005
me think about this.

01:09:44.005 --> 01:09:47.990
Yeah, it's interesting
because, presumably,

01:09:47.990 --> 01:09:51.450
if you were using HTTPS, then
when you sent out this AJAX

01:09:51.450 --> 01:09:53.510
request, the victim
machine wouldn't

01:09:53.510 --> 01:09:56.830
have the attacker's HTTPS keys.

01:09:56.830 --> 01:10:00.896
So the cryptography
would fail somehow.

01:10:00.896 --> 01:10:02.270
So I think HTTPS
would stop that.

01:10:02.270 --> 01:10:07.590
AUDIENCE: Or if the victim
only has things on HTTPS?

01:10:07.590 --> 01:10:08.570
PROFESSOR: Yeah.

01:10:08.570 --> 01:10:10.352
So I think that would stop it.

01:10:14.771 --> 01:10:20.663
AUDIENCE: If you
configure the [INAUDIBLE]

01:10:20.663 --> 01:10:24.100
use the initial or receiving
result [INAUDIBLE]?

01:10:24.100 --> 01:10:25.580
PROFESSOR: That's
a good question.

01:10:25.580 --> 01:10:26.280
I'm actually not
sure about that.

01:10:26.280 --> 01:10:27.739
So actually, a lot
of these attacks

01:10:27.739 --> 01:10:29.821
were dependent on the devil
in the details, right?

01:10:29.821 --> 01:10:31.732
So I'm not actually
sure how that would work.

01:10:31.732 --> 01:10:33.190
AUDIENCE: It uses
the first domain.

01:10:33.190 --> 01:10:34.898
PROFESSOR: It would
use the first domain?

01:10:34.898 --> 01:10:37.460
OK.

01:10:37.460 --> 01:10:37.960
Yep?

01:10:37.960 --> 01:10:40.030
AUDIENCE: So why
can't the attacker

01:10:40.030 --> 01:10:46.319
respond with the victim's IP
address in the first place?

01:10:46.319 --> 01:10:48.110
PROFESSOR: So why
can't-- what do you mean?

01:10:48.110 --> 01:10:50.900
AUDIENCE: [INAUDIBLE].

01:10:50.900 --> 01:10:53.630
Why has the attacker
team [INAUDIBLE]

01:10:53.630 --> 01:10:57.777
has to respond with the
attacker's IP [INAUDIBLE]?

01:10:57.777 --> 01:10:58.860
PROFESSOR: Oh, well, yeah.

01:10:58.860 --> 01:11:00.318
Since the attacker
has to, somehow,

01:11:00.318 --> 01:11:01.970
get its own code on
the victim machine

01:11:01.970 --> 01:11:05.090
first before it can then start
doing this nonsense where it's

01:11:05.090 --> 01:11:06.300
looking inside the network.

01:11:06.300 --> 01:11:08.231
So it's that initial
step where it

01:11:08.231 --> 01:11:10.467
has to put that code
on the victim's machine.

01:11:10.467 --> 01:11:12.050
All right, so in the
interest of time,

01:11:12.050 --> 01:11:13.133
let's keep moving forward.

01:11:13.133 --> 01:11:15.556
But come see me
after class if you

01:11:15.556 --> 01:11:19.100
want to follow up on the question.

01:11:19.100 --> 01:11:22.560
So that's the DNS
rebinding attack.

01:11:22.560 --> 01:11:24.546
So how can you fix this?

01:11:24.546 --> 01:11:25.920
So one way you
could fix it is

01:11:25.920 --> 01:11:29.040
to modify your
client-side DNS resolver

01:11:29.040 --> 01:11:31.700
so that external
host names can never

01:11:31.700 --> 01:11:33.215
resolve to an internal IP address.

01:11:33.215 --> 01:11:35.590
It's, kind of, goofy that
someone outside of your network

01:11:35.590 --> 01:11:37.756
should be able to create a
DNS binding for something

01:11:37.756 --> 01:11:38.840
inside of your network.

01:11:38.840 --> 01:11:40.740
That's the most
straightforward solution.

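NOTE
A minimal sketch of the resolver-side fix just described, written in TypeScript.
The function names, the corp.net suffix, and the exact list of IPv4 ranges are
illustrative assumptions, not something the lecture specifies. The idea is only
that a host name outside the internal domain must never bind to an internal address.
```typescript
// Hypothetical helper: true if an IPv4 address is private, loopback, or link-local.
function isInternalIPv4(ip: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) {
    throw new Error("not an IPv4 address: " + ip);
  }
  const parts = m.slice(1).map(Number);
  if (parts.some((p) => p > 255)) {
    throw new Error("octet out of range: " + ip);
  }
  const a = parts[0];
  const b = parts[1];
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}
// Hypothetical policy: a name outside the internal suffix may not map to an internal address.
function allowBinding(hostname: string, ip: string, internalSuffix: string): boolean {
  const dotted = "." + internalSuffix;
  const isInternalName =
    hostname === internalSuffix || hostname.slice(-dotted.length) === dotted;
  return isInternalName || !isInternalIPv4(ip);
}
```
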
01:11:40.740 --> 01:11:43.310
You could also imagine that
the browser could do something

01:11:43.310 --> 01:11:44.620
called DNS pinning.

01:11:44.620 --> 01:11:47.760
Whereby, if it receives
a DNS resolution record,

01:11:47.760 --> 01:11:51.240
then it will always treat
that record as valid for,

01:11:51.240 --> 01:11:53.895
let's say, 30 minutes,
regardless of whether it

01:11:53.895 --> 01:11:56.740
has a short TTL set inside
it because that also

01:11:56.740 --> 01:11:58.177
prevents the attack.

01:11:58.177 --> 01:12:00.260
That solution is a little
bit tricky because there

01:12:00.260 --> 01:12:02.920
are some sites that actually,
intentionally, use dynamic DNS

01:12:02.920 --> 01:12:05.170
and do things like load
balancing and stuff like that.

01:12:05.170 --> 01:12:08.230
So the first solution is
probably the better one.

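NOTE
A rough sketch of DNS pinning as described above, in TypeScript. The class name,
the resolve signature, and the injected clock value are assumptions made for
illustration; the 30-minute window is the figure mentioned in the lecture.
A real browser cache would be more involved, which is part of why pinning
interacts badly with sites that rely on dynamic DNS.
```typescript
type Pin = { ip: string; pinnedAt: number };
// Hypothetical browser-side cache that pins the first answer seen for a host name.
class PinningCache {
  private pins: Record<string, Pin | undefined> = Object.create(null);
  private pinMs: number;
  constructor(pinMs: number = 30 * 60 * 1000) {
    this.pinMs = pinMs;
  }
  // freshIp is what DNS answers right now; now is the current time in milliseconds.
  resolve(hostname: string, freshIp: string, now: number): string {
    const pin = this.pins[hostname];
    if (pin !== undefined && now - pin.pinnedAt < this.pinMs) {
      // Still inside the pin window: keep the old address, whatever TTL the record carried.
      return pin.ip;
    }
    this.pins[hostname] = { ip: freshIp, pinnedAt: now };
    return freshIp;
  }
}
```
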
01:12:08.230 --> 01:12:13.240
OK, so here is, sort
of, a fun attack.

01:12:13.240 --> 01:12:18.680
So we've talked about
a lot of resources

01:12:18.680 --> 01:12:20.628
that the origin protects--
the same origin

01:12:20.628 --> 01:12:20.930
policy protects.

01:12:20.930 --> 01:12:21.805
So what about pixels?

01:12:25.230 --> 01:12:27.520
So how does the same origin
policy protect pixels?

01:12:27.520 --> 01:12:31.350
Well as it turns out, pixels
don't really have an origin.

01:12:31.350 --> 01:12:35.040
So each frame gets its
own little bounding box.

01:12:35.040 --> 01:12:36.480
Just a square, basically.

01:12:36.480 --> 01:12:40.710
So a frame can draw wherever
it wants on that square.

01:12:40.710 --> 01:12:42.910
So this is, actually,
a problem because what

01:12:42.910 --> 01:12:45.700
this means is that
a parent frame can

01:12:45.700 --> 01:12:49.030
draw on top of its child frame.

01:12:49.030 --> 01:12:51.250
So this can lead to some
very insidious attacks.

01:12:51.250 --> 01:12:59.040
So let's say that the
attacker creates some page.

01:12:59.040 --> 01:13:02.620
And let's say,
inside of that page,

01:13:02.620 --> 01:13:09.420
the attacker says
click to win the iPad.

01:13:09.420 --> 01:13:11.690
The very same standard thing.

01:13:11.690 --> 01:13:13.090
So this is the parent frame.

01:13:13.090 --> 01:13:15.320
Now what the parent frame
can do is actually create

01:13:15.320 --> 01:13:23.140
a child frame that is actually
the Facebook Like button frame.

01:13:27.850 --> 01:13:32.630
So Facebook allows you to run
this little piece of Facebook

01:13:32.630 --> 01:13:34.210
code you can put on your page.

01:13:34.210 --> 01:13:36.340
You know, if the user
clicks Like, then that means

01:13:36.340 --> 01:13:37.970
that it'll go on
Facebook and say, hey,

01:13:37.970 --> 01:13:40.640
the user likes the
particular page.

01:13:40.640 --> 01:13:43.255
So we've got this
child frame over here.

01:13:45.852 --> 01:13:47.560
That actually turned
out remarkably well.

01:13:47.560 --> 01:13:51.480
Anyway, so you've got
this Like thing over here.

01:13:51.480 --> 01:13:58.200
Now what the attacker can do
is actually overlay this frame

01:13:58.200 --> 01:14:01.070
on top of the click
to get the free iPad

01:14:01.070 --> 01:14:04.720
and also make this invisible.

01:14:04.720 --> 01:14:06.252
So CSS lets you do that.

01:14:06.252 --> 01:14:07.730
So what's going to happen?

01:14:07.730 --> 01:14:10.260
As we've already established,
everybody wants a free iPad.

01:14:10.260 --> 01:14:12.370
So the user's going
to go to this site,

01:14:12.370 --> 01:14:16.609
click on this thing-- this area
of the screen-- thinking

01:14:16.609 --> 01:14:18.900
that they're going to click
here and get the free iPad.

01:14:18.900 --> 01:14:21.060
But in reality, they're
clicking the Like button

01:14:21.060 --> 01:14:23.130
that they can't see
that's invisible.

01:14:23.130 --> 01:14:25.560
It's layered
on top using the z-index.

01:14:25.560 --> 01:14:27.640
So what that means is
that now maybe they

01:14:27.640 --> 01:14:30.310
go check their Facebook profile,
and they've liked attacker.com.

01:14:30.310 --> 01:14:33.300
You know, and they don't
remember how that happened.

01:14:33.300 --> 01:14:36.050
So this is actually called
a clickjacking attack

01:14:36.050 --> 01:14:38.910
because you can imagine you
can do all kinds of evil things

01:14:38.910 --> 01:14:39.410
here.

01:14:39.410 --> 01:14:43.610
So you can imagine you could
steal passwords this way.

01:14:43.610 --> 01:14:44.770
You could get raw input.

01:14:44.770 --> 01:14:46.270
I mean, it's madness.

01:14:46.270 --> 01:14:49.760
So once again, this
happens because the parent,

01:14:49.760 --> 01:14:53.720
essentially, gets the right
to draw over anything that's

01:14:53.720 --> 01:14:56.140
inside this bounding box.

01:14:56.140 --> 01:15:00.084
So does that attack make sense?

01:15:00.084 --> 01:15:00.724
Yeah.

01:15:00.724 --> 01:15:02.140
AUDIENCE: [INAUDIBLE],
what do you

01:15:02.140 --> 01:15:06.400
mean the parent gets to draw
over anything [INAUDIBLE]?

01:15:06.400 --> 01:15:08.900
PROFESSOR: So what I'm
trying to indicate here

01:15:08.900 --> 01:15:14.415
is that, visually speaking,
what the user just sees is this.

01:15:14.415 --> 01:15:16.040
AUDIENCE: Oh, that's
the parent frame.

01:15:16.040 --> 01:15:17.140
PROFESSOR: Yeah, this
is the parent frame.

01:15:17.140 --> 01:15:17.380
That's right.

01:15:17.380 --> 01:15:17.930
This is the child frame.

01:15:17.930 --> 01:15:20.120
So visually speaking,
the user just sees this.

01:15:20.120 --> 01:15:23.790
But using the miracle of my da
Vinci style drawing techniques,

01:15:23.790 --> 01:15:27.340
this is actually overlaid
atop this transparently.

01:15:27.340 --> 01:15:28.720
So that's the child frame.

01:15:28.720 --> 01:15:30.505
That's the parent frame.

01:15:30.505 --> 01:15:32.380
OK so, there's a couple
different solutions--

01:15:32.380 --> 01:15:34.575
you can imagine--
for solving this.

01:15:34.575 --> 01:15:40.320
The first solution is to
use frame-busting code.

01:15:43.850 --> 01:15:47.200
So you can actually use
JavaScript expressions

01:15:47.200 --> 01:15:50.910
to figure out if you have
been put into a frame

01:15:50.910 --> 01:15:51.920
by someone else.

01:15:51.920 --> 01:15:59.490
So like, one of these tests is
you compare the reference self

01:15:59.490 --> 01:16:01.750
to top.

01:16:01.750 --> 01:16:04.330
So in the JavaScript
world, self refers

01:16:04.330 --> 01:16:06.800
to the frame that you
yourself are in.

01:16:06.800 --> 01:16:10.700
Top refers to the frame at the
top of the frame hierarchy.

01:16:10.700 --> 01:16:12.846
So if you do this
test and you find out

01:16:12.846 --> 01:16:14.780
that self is not
equal to top, then you

01:16:14.780 --> 01:16:16.570
realize that you
are a child frame.

01:16:16.570 --> 01:16:19.039
And then you can refuse to
load or do things like this.

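NOTE
The self-versus-top test described above can be sketched in TypeScript as follows.
The FrameLike interface and the isFramed name are hypothetical; they exist only so
the check can be exercised outside a browser. In a page you would pass window.
What to do when framed (refuse to render, navigate the top frame, and so on) is a
separate choice that this sketch leaves open.
```typescript
// Minimal shape of the window properties the test needs; real code would pass window.
interface FrameLike {
  self: unknown;
  top: unknown;
}
// True when this document is not the topmost frame, i.e. someone has framed it.
function isFramed(win: FrameLike): boolean {
  return win.self !== win.top;
}
// Browser usage (assumption: run early in page load):
//   if (isFramed(window)) { document.documentElement.style.display = "none"; }
```
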
01:16:19.039 --> 01:16:20.580
So this, in fact,
is what will happen

01:16:20.580 --> 01:16:22.844
if you try to create a frame
for, let's say, CNN.com.

01:16:22.844 --> 01:16:24.760
You can actually look
in the JavaScript source

01:16:24.760 --> 01:16:26.940
and see that it does
this test because CNN.com

01:16:26.940 --> 01:16:29.980
doesn't want other people
taking credit for its content.

01:16:29.980 --> 01:16:31.755
So it only wants to
be the topmost frame.

01:16:31.755 --> 01:16:33.550
So that's one solution
you can use here.

01:16:33.550 --> 01:16:35.216
The other solution
that you can use here

01:16:35.216 --> 01:16:39.890
is also to have your web
server send this HTTP response

01:16:39.890 --> 01:16:41.900
header called X-Frame-Options.

01:16:45.180 --> 01:16:47.520
So when the web server
returns a response,

01:16:47.520 --> 01:16:48.690
it can set this header.

01:16:48.690 --> 01:16:50.870
And it can basically
say, hey, browser,

01:16:50.870 --> 01:16:54.740
do not allow anyone to put
my content inside of a frame.

01:16:54.740 --> 01:16:56.830
So that allows the browser
to do the enforcement.

01:16:56.830 --> 01:16:59.540
So that's pretty
straightforward.

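NOTE
A small sketch of the server side of this, in TypeScript. The header name
X-Frame-Options and its DENY and SAMEORIGIN values are real; the HeaderMap type
and the withFrameProtection helper are hypothetical stand-ins for whatever your
web server or framework uses to set response headers.
```typescript
type HeaderMap = Record<string, string>;
// Returns a copy of the response headers with framing forbidden.
// "DENY" forbids all framing; "SAMEORIGIN" still allows same-origin parents.
function withFrameProtection(
  headers: HeaderMap,
  policy: "DENY" | "SAMEORIGIN" = "DENY"
): HeaderMap {
  const out: HeaderMap = {};
  for (const name in headers) {
    if (Object.prototype.hasOwnProperty.call(headers, name)) {
      out[name] = headers[name];
    }
  }
  out["X-Frame-Options"] = policy;
  return out;
}
```
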
01:16:59.540 --> 01:17:02.460
So there's a bunch of
other, sort of, crazy

01:17:02.460 --> 01:17:04.151
attacks that you can launch.

01:17:04.151 --> 01:17:06.150
Here's another one that's
actually pretty funny.

01:17:06.150 --> 01:17:08.860
So as I was mentioning
before, the fact

01:17:08.860 --> 01:17:11.180
that we're now living in a
web that's internationalized

01:17:11.180 --> 01:17:14.502
actually means that there's
all these issues that

01:17:14.502 --> 01:17:17.900
come up involving names and
how you represent host names.

01:17:17.900 --> 01:17:23.434
So for example, let's say that
you see this letter right here.

01:17:23.434 --> 01:17:24.600
So what does this look like?

01:17:24.600 --> 01:17:26.120
This looks like a C, right?

01:17:26.120 --> 01:17:27.490
What is this?

01:17:27.490 --> 01:17:30.460
A C in ASCII in
the Latin alphabet?

01:17:30.460 --> 01:17:33.250
Or is this a C in Cyrillic?

01:17:33.250 --> 01:17:34.870
Hard to say, right?

01:17:34.870 --> 01:17:37.890
So you can end up having these
really strange attacks where

01:17:37.890 --> 01:17:44.210
attackers will register a
domain name, like cats.com,

01:17:44.210 --> 01:17:45.350
for example.

01:17:45.350 --> 01:17:48.340
But this is a Cyrillic C.

01:17:48.340 --> 01:17:50.724
So users will go to this domain.

01:17:50.724 --> 01:17:52.140
They might click
on it or whatever

01:17:52.140 --> 01:17:55.840
thinking they're going to
Latin alphabet C, cats.com.

01:17:55.840 --> 01:17:58.450
But instead, they're
going to an attacker one.

01:17:58.450 --> 01:18:01.824
And then, all kinds of madness
can happen from there, as well.

01:18:01.824 --> 01:18:03.240
So you might have
heard of attacks

01:18:03.240 --> 01:18:05.240
like this, such as
typosquatting attacks

01:18:05.240 --> 01:18:11.900
where people register for
names like F-C-E book.com.

01:18:16.440 --> 01:18:20.170
This is a common fumble finger
typing for Facebook.com.

01:18:20.170 --> 01:18:23.745
So if you control this, you're
going to get a ton of traffic

01:18:23.745 --> 01:18:26.456
from people who think they're
going to Facebook.com.

01:18:26.456 --> 01:18:29.130
So there's a bunch of different,
sort of, wacky attacks

01:18:29.130 --> 01:18:31.710
that you can launch
through the domain

01:18:31.710 --> 01:18:34.806
registry system that are tricky
to defend from first principles

01:18:34.806 --> 01:18:37.180
because how are you going to
prevent users from mistyping

01:18:37.180 --> 01:18:38.540
things, for example?

01:18:38.540 --> 01:18:41.700
Or how would the browser
indicate to the user, hey,

01:18:41.700 --> 01:18:43.110
this is Cyrillic?

01:18:43.110 --> 01:18:45.260
Is the browser going to
alert the user every time

01:18:45.260 --> 01:18:46.820
Cyrillic fonts are included?

01:18:46.820 --> 01:18:49.070
That's going to make people
angry if they actually use

01:18:49.070 --> 01:18:51.220
Cyrillic as their native font.

01:18:51.220 --> 01:18:54.040
So it's not quite clear,
technologically speaking,

01:18:54.040 --> 01:18:56.940
how we deal with
some of those issues.

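NOTE
One heuristic that sidesteps part of the problem just raised, sketched in
TypeScript: flag a host-name label only when it mixes Latin and Cyrillic letters,
so names written entirely in Cyrillic are left alone. The function name and the
choice of the basic Cyrillic block (U+0400 to U+04FF) are assumptions for
illustration, and an all-Cyrillic lookalike would still pass this check.
```typescript
// Hypothetical check: true if a label contains both ASCII Latin and Cyrillic letters.
function mixesLatinAndCyrillic(label: string): boolean {
  let latin = false;
  let cyrillic = false;
  for (let i = 0; i < label.length; i++) {
    const c = label.charCodeAt(i);
    if ((c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a)) {
      latin = true;
    }
    if (c >= 0x0400 && c <= 0x04ff) {
      cyrillic = true;
    }
  }
  return latin && cyrillic;
}
// Example input: "\u0441ats" starts with Cyrillic es (U+0441), which renders like Latin c.
```
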
01:18:56.940 --> 01:19:01.430
So yeah, there's a bunch
of other security issues

01:19:01.430 --> 01:19:02.790
that are very subtle here.

01:19:02.790 --> 01:19:07.670
One thing that's interesting
is if you look at plugins.

01:19:07.670 --> 01:19:10.900
So how do plugins treat
the same origin policy?

01:19:10.900 --> 01:19:15.442
Well plugins often have very
subtle incompatibilities

01:19:15.442 --> 01:19:17.150
with the rest of the
browser with respect

01:19:17.150 --> 01:19:17.941
to the same origin.

01:19:17.941 --> 01:19:20.480
So for example, if you
look at a Java plug-in,

01:19:20.480 --> 01:19:25.020
Java, oftentimes, assumes
that different host

01:19:25.020 --> 01:19:28.730
names that have
the same IP address

01:19:28.730 --> 01:19:31.420
actually have the same origin.

01:19:31.420 --> 01:19:34.580
That's actually a pretty big
deviation from the standard

01:19:34.580 --> 01:19:37.450
interpretation of the same
origin policy because this

01:19:37.450 --> 01:19:45.620
means that if you have something
like x.y.com and, let's say,

01:19:45.620 --> 01:19:50.640
z.y.com, if they map
onto the same IP address,

01:19:50.640 --> 01:19:53.940
then Java will consider these
to be in the same origin,

01:19:53.940 --> 01:19:55.580
which is a problem
if, for example,

01:19:55.580 --> 01:19:58.390
this site gets [? owned ?]
but this one doesn't.

01:19:58.390 --> 01:19:59.970
So there's a bunch
of other corner

01:19:59.970 --> 01:20:01.420
cases involving plug-ins.

01:20:01.420 --> 01:20:05.190
You can refer to The Tangled
Web to see some more about some

01:20:05.190 --> 01:20:07.910
of those types of things.

01:20:07.910 --> 01:20:09.740
So the final thing that
I want to discuss--

01:20:09.740 --> 01:20:11.323
you can see the
lecture notes for more

01:20:11.323 --> 01:20:13.851
examples of crazy attacks
that people can launch--

01:20:13.851 --> 01:20:15.600
but the final thing
that I want to discuss

01:20:15.600 --> 01:20:19.680
is this screen sharing attack.

01:20:19.680 --> 01:20:22.680
So HTML 5 actually
defines this new API

01:20:22.680 --> 01:20:26.630
by which a web page can allow
all the bits in its screen

01:20:26.630 --> 01:20:28.560
to be shared with
another browser

01:20:28.560 --> 01:20:30.630
or shared with the server.

01:20:30.630 --> 01:20:32.230
This seems like a
really cool idea

01:20:32.230 --> 01:20:34.170
because now I can do
collaborative foo.

01:20:34.170 --> 01:20:36.406
We can collaborate on a
document at the same time.

01:20:36.406 --> 01:20:38.405
And it's exciting because
we live in the future.

01:20:38.405 --> 01:20:40.950
But what's funny
about this is that,

01:20:40.950 --> 01:20:44.420
when they designed this API,
and it's a very new API,

01:20:44.420 --> 01:20:47.560
they apparently didn't think
about same origin policies

01:20:47.560 --> 01:20:49.260
at all.

01:20:49.260 --> 01:20:54.070
So what that means is that
if you have some page that

01:20:54.070 --> 01:20:57.775
has multiple frames, then
any one of these frames,

01:20:57.775 --> 01:21:00.180
if it is granted
permission to take

01:21:00.180 --> 01:21:04.840
a screenshot of your monitor,
can take a screenshot

01:21:04.840 --> 01:21:07.630
of the entire
thing, regardless

01:21:07.630 --> 01:21:11.200
of what origin that other
content's coming from.

01:21:11.200 --> 01:21:14.340
So this is, actually, a
pretty devastating flaw

01:21:14.340 --> 01:21:16.875
in the same origin policy.

01:21:16.875 --> 01:21:19.250
So there's some pretty obvious
fixes you can think about.

01:21:19.250 --> 01:21:23.500
So for example, if this person's
given screenshot capabilities,

01:21:23.500 --> 01:21:25.310
only let it take a
screenshot of this.

01:21:25.310 --> 01:21:25.810
Right?

01:21:25.810 --> 01:21:26.700
Not this whole thing.

01:21:26.700 --> 01:21:29.010
Why didn't the browser vendors
implement it like this?

01:21:29.010 --> 01:21:32.410
Because there's such pressure
to compete on features,

01:21:32.410 --> 01:21:35.595
and to innovate on features, and
to get that next new thing out

01:21:35.595 --> 01:21:36.150
there.

01:21:36.150 --> 01:21:38.441
So for example, a lot of the
questions that people were

01:21:38.441 --> 01:21:40.940
asking about this particular
lecture online [INAUDIBLE]

01:21:40.940 --> 01:21:42.711
was like, well, why
couldn't you do this?

01:21:42.711 --> 01:21:44.210
Wouldn't this thing
make more sense?

01:21:44.210 --> 01:21:46.030
It seems like this current
scheme is brain dead.

01:21:46.030 --> 01:21:47.460
Wouldn't this other
one be better?

01:21:47.460 --> 01:21:48.210
And the answer is, yes.

01:21:48.210 --> 01:21:48.895
Everything, yes.

01:21:48.895 --> 01:21:50.850
That's exactly correct.

01:21:50.850 --> 01:21:53.460
Almost anything would
be better than this.

01:21:53.460 --> 01:21:56.030
I'm ashamed to be
associated with this.

01:21:56.030 --> 01:21:57.220
But this is what we had.

01:21:57.220 --> 01:21:59.440
So what ends up happening
is if you look at the nuts

01:21:59.440 --> 01:22:01.507
and bolts of how web
browsers get developed,

01:22:01.507 --> 01:22:03.590
people are a little bit
better about security now.

01:22:03.590 --> 01:22:05.256
But like, with the
screen sharing thing,

01:22:05.256 --> 01:22:08.290
people were so pumped to
get this thing out there,

01:22:08.290 --> 01:22:10.310
they didn't realize
that it's going to leak

01:22:10.310 --> 01:22:12.920
all the bits on your screen.

01:22:12.920 --> 01:22:14.864
So now we're at this
point with the web

01:22:14.864 --> 01:22:16.530
where-- I mean, look
at all these things

01:22:16.530 --> 01:22:18.310
that we've discussed today.

01:22:18.310 --> 01:22:20.200
So if we were going
to start from scratch

01:22:20.200 --> 01:22:22.280
and come up with a
better security policy,

01:22:22.280 --> 01:22:25.020
what fraction of websites
that you have today

01:22:25.020 --> 01:22:26.870
are going to actually work?

01:22:26.870 --> 01:22:28.941
Like, approximately,
0.2% of them.

01:22:28.941 --> 01:22:29.440
Right?

01:22:29.440 --> 01:22:30.731
So users are going to complain.

01:22:30.731 --> 01:22:33.090
And this is another constant
story with security.

01:22:33.090 --> 01:22:36.040
Once you give users a feature,
it's often very difficult

01:22:36.040 --> 01:22:40.280
to claw that back, even if
that feature is insecure.

01:22:40.280 --> 01:22:42.450
So today, we discussed a
lot of different things

01:22:42.450 --> 01:22:44.120
about the same origin
policy and stuff like that.

01:22:44.120 --> 01:22:45.720
Next lecture, we'll
go into some more

01:22:45.720 --> 01:22:48.680
depth about some of those things
we talked about [INAUDIBLE].