WEBVTT

00:00:00.070 --> 00:00:02.430
The following content is
provided under a Creative

00:00:02.430 --> 00:00:03.820
Commons license.

00:00:03.820 --> 00:00:06.060
Your support will help
MIT OpenCourseWare

00:00:06.060 --> 00:00:10.150
continue to offer high quality,
educational resources for free.

00:00:10.150 --> 00:00:12.700
To make a donation, or to
view additional materials

00:00:12.700 --> 00:00:16.600
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:16.600 --> 00:00:17.260
at ocw.mit.edu.

00:00:26.985 --> 00:00:27.860
PROFESSOR: All right.

00:00:27.860 --> 00:00:29.760
Let's get started.

00:00:29.760 --> 00:00:32.409
So today we're going to
talk about capabilities,

00:00:32.409 --> 00:00:36.310
continue our discussion of how
to do privilege separation.

00:00:36.310 --> 00:00:39.960
And remember last week we
talked about how Unix provides

00:00:39.960 --> 00:00:41.910
some mechanisms for
applications to use

00:00:41.910 --> 00:00:45.600
if they want to privilege
separate the application's

00:00:45.600 --> 00:00:46.649
internal structure.

00:00:46.649 --> 00:00:48.940
And today we're going to talk
about capabilities, which

00:00:48.940 --> 00:00:53.720
is a very different way of
thinking about privileges

00:00:53.720 --> 00:00:56.220
that an application might have.

00:00:56.220 --> 00:00:59.030
And this is why we have actually
these two somewhat distinct

00:00:59.030 --> 00:01:06.840
readings for today, one of which
is this confused deputy problem

00:01:06.840 --> 00:01:10.982
and how to make your privileges
much more explicit when you're

00:01:10.982 --> 00:01:12.940
writing software so that
you don't accidentally

00:01:12.940 --> 00:01:14.595
use the wrong privileges.

00:01:14.595 --> 00:01:16.470
And then the second
paper is about the system

00:01:16.470 --> 00:01:20.700
called Capsicum, which is all
about sandboxing and running

00:01:20.700 --> 00:01:22.930
some piece of code
with fewer privileges

00:01:22.930 --> 00:01:26.420
so that it, very much
like [INAUDIBLE],

00:01:26.420 --> 00:01:29.786
if it's compromised, the
damage isn't that great.

00:01:29.786 --> 00:01:31.830
Now it turns out
that the authors

00:01:31.830 --> 00:01:34.380
of both of these
readings really think

00:01:34.380 --> 00:01:37.610
capabilities are the answer,
because they let you manipulate

00:01:37.610 --> 00:01:42.540
privileges in a rather different
way from how Unix, let's say,

00:01:42.540 --> 00:01:44.812
thinks about privileges.

00:01:44.812 --> 00:01:47.270
So to get started, maybe let's
look at this confused deputy

00:01:47.270 --> 00:01:48.880
problem and try
to understand what

00:01:48.880 --> 00:01:52.980
is this problem that Norman
Hardy ran into and was

00:01:52.980 --> 00:01:54.590
so perplexed by.

00:01:54.590 --> 00:01:56.854
So the paper is
written-- well, it

00:01:56.854 --> 00:01:58.395
was written quite
a while ago, and it

00:01:58.395 --> 00:02:01.020
uses syntax for file names
that's a bit surprising.

00:02:01.020 --> 00:02:04.480
But we can try to at least
transcribe his problem

00:02:04.480 --> 00:02:07.690
into more familiar syntax
with Unix-style path

00:02:07.690 --> 00:02:08.947
names, et cetera.

00:02:08.947 --> 00:02:10.530
So as far as I can
tell, what is going

00:02:10.530 --> 00:02:13.880
on in their system is that they
had a Fortran compiler, which

00:02:13.880 --> 00:02:16.310
sort of dates their
design at some level, too.

00:02:16.310 --> 00:02:22.030
But their Fortran compiler
lived in /sysx/fort,

00:02:22.030 --> 00:02:26.150
and they wanted to change
this Fortran compiler,

00:02:26.150 --> 00:02:29.554
so they would keep statistics
about what was compiled,

00:02:29.554 --> 00:02:31.720
what parts of a compiler
were particularly expensive

00:02:31.720 --> 00:02:33.410
presumably, et cetera.

00:02:33.410 --> 00:02:36.120
So he wanted to make sure this
Fortran compiler would somehow

00:02:36.120 --> 00:02:39.110
end up writing to
this file /sysx/stat,

00:02:39.110 --> 00:02:44.360
that it would record information
about various invocations

00:02:44.360 --> 00:02:46.350
of the compiler.

00:02:46.350 --> 00:02:50.070
And the way they did this is,
in their operating system, they

00:02:50.070 --> 00:02:52.170
had something kind
of like the setuid

00:02:52.170 --> 00:02:54.040
that we talked about in Unix.

00:02:54.040 --> 00:02:57.360
Except there, they called
it the home files license.

00:02:57.360 --> 00:03:01.380
And what it means is that
if you ran /sysx/fort,

00:03:01.380 --> 00:03:05.710
and this program had this
so-called home files license,

00:03:05.710 --> 00:03:09.860
then this process that you just
ran would have extra privileges

00:03:09.860 --> 00:03:13.102
on being able to write
everything in /sysx.

00:03:13.102 --> 00:03:15.310
So it would have these extra
privileges on everything

00:03:15.310 --> 00:03:18.819
in /sysx/, basically, star.

00:03:18.819 --> 00:03:20.610
It could access all
those files in addition

00:03:20.610 --> 00:03:22.985
to anything that it could
access because the user ran it,

00:03:22.985 --> 00:03:25.190
for example.

00:03:25.190 --> 00:03:27.030
So the particular
problem they ran into

00:03:27.030 --> 00:03:31.236
is that some clever user
was able to do this.

00:03:31.236 --> 00:03:32.860
So they would run
the Fortran compiler,

00:03:32.860 --> 00:03:35.151
and the Fortran compiler
would take arguments very much

00:03:35.151 --> 00:03:36.790
like GCC takes arguments.

00:03:36.790 --> 00:03:39.590
And they would compile
something like foo.f.

00:03:39.590 --> 00:03:41.620
Here is my Fortran source code.

00:03:41.620 --> 00:03:48.120
And they'd say, well, put that
output -o into /sysx/stat.

00:03:48.120 --> 00:03:50.700
Or more damagingly
in their case,

00:03:50.700 --> 00:03:54.470
there was another file in
/sysx that was the billing file

00:03:54.470 --> 00:03:56.390
for all the customers
on the system.

00:03:56.390 --> 00:04:01.850
So you could similarly ask the
Fortran compiler to compile

00:04:01.850 --> 00:04:05.800
the source file and put the
output into some special file

00:04:05.800 --> 00:04:07.980
in /sysx.

00:04:07.980 --> 00:04:10.860
And in their case,
this actually worked.

00:04:10.860 --> 00:04:12.570
Even though the user
themselves didn't

00:04:12.570 --> 00:04:15.430
have access to write to
this file or directory,

00:04:15.430 --> 00:04:18.620
because the compiler had
this extra privilege--

00:04:18.620 --> 00:04:21.660
this home files
license, in their case--

00:04:21.660 --> 00:04:24.590
it was able to
overwrite these files

00:04:24.590 --> 00:04:28.784
despite that not being really
the developer's intention.
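
NOTE
The failure described above can be sketched as a toy model in Python.
This is only an illustration under stated assumptions, not the original system:
Process, LICENSE_PREFIX, user_may_write, ambient_open_for_write and the file
name /sysx/bill are hypothetical names chosen for this sketch.
```python
from dataclasses import dataclass
#
LICENSE_PREFIX = "/sysx/"  # assumed: directory covered by the home files license
#
@dataclass
class Process:
    user: str               # the user who invoked the program
    has_home_license: bool  # True for a program like /sysx/fort
#
def user_may_write(user: str, path: str) -> bool:
    # Toy policy (assumption): a user may write only under their own home directory.
    return path.startswith(f"/home/{user}/")
#
def ambient_open_for_write(proc: Process, path: str) -> bool:
    # Ambient check: succeed if ANY privilege the process holds allows the write.
    by_user = user_may_write(proc.user, path)
    by_license = proc.has_home_license and path.startswith(LICENSE_PREFIX)
    return by_user or by_license
#
compiler = Process(user="alice", has_home_license=True)
# Intended uses: the user's output file and the statistics file.
print(ambient_open_for_write(compiler, "/home/alice/foo.out"))
print(ambient_open_for_write(compiler, "/sysx/stat"))
# The attack: the output name is user-supplied, yet the license authorizes it.
print(ambient_open_for_write(compiler, "/sysx/bill"))
```
All three calls succeed, because the check never asks on whose behalf the
file is being opened.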

00:04:28.784 --> 00:04:29.450
This make sense?

00:04:29.450 --> 00:04:31.116
This is the rough
problem they ran into?

00:04:31.116 --> 00:04:32.515
So who do they blame?

00:04:32.515 --> 00:04:33.765
What do they think went wrong?

00:04:40.995 --> 00:04:42.780
Or how would you
design it differently

00:04:42.780 --> 00:04:46.150
to avoid running
into such problems?

00:04:46.150 --> 00:04:48.770
So the thing they sort
of think about here,

00:04:48.770 --> 00:04:51.930
or they talk about
in this write up,

00:04:51.930 --> 00:04:55.240
is that they believe this
Fortran compiler should

00:04:55.240 --> 00:04:57.990
be very careful when it's
using its privileges.

00:04:57.990 --> 00:04:59.960
Because, at some level,
the Fortran compiler

00:04:59.960 --> 00:05:01.570
has two types of privileges.

00:05:01.570 --> 00:05:05.660
It has one stemming from the
fact the user invoked it,

00:05:05.660 --> 00:05:08.140
so the user should be
able to access the source

00:05:08.140 --> 00:05:10.050
file, like foo.f.

00:05:10.050 --> 00:05:11.860
And if it was some
other user, maybe

00:05:11.860 --> 00:05:14.680
it wouldn't be able to
access the user source code.

00:05:14.680 --> 00:05:17.590
And the other sort of privilege
is from this home files

00:05:17.590 --> 00:05:20.830
license thing that allows it to
write to these special files.

00:05:20.830 --> 00:05:23.480
And internally, in the
source code of the compiler,

00:05:23.480 --> 00:05:25.920
when they open a
file, the compiler

00:05:25.920 --> 00:05:28.900
should have been very explicit
about which of these privileges

00:05:28.900 --> 00:05:31.910
it wants to exercise
when opening a file

00:05:31.910 --> 00:05:34.372
or performing some
privileged operation.

00:05:34.372 --> 00:05:36.330
But their compiler was
not written in this way.

00:05:36.330 --> 00:05:38.140
It just called
open, read, write,

00:05:38.140 --> 00:05:39.550
like any other program would do.

00:05:39.550 --> 00:05:42.440
And it would implicitly use
all the privileges that it has,

00:05:42.440 --> 00:05:45.033
which combines-- well,
in their system design,

00:05:45.033 --> 00:05:47.410
it was sort of the union
of the user privileges

00:05:47.410 --> 00:05:51.086
and these home files
license privileges.

00:05:51.086 --> 00:05:52.790
That make sense?

00:05:52.790 --> 00:05:55.390
So these guys were really
interested in fixing

00:05:55.390 --> 00:05:56.180
this problem.

00:05:56.180 --> 00:05:59.240
And they were sort of calling
this compiler this confused

00:05:59.240 --> 00:06:00.964
deputy, because it
needs to disambiguate

00:06:00.964 --> 00:06:02.505
these multiple
privileges that it has

00:06:02.505 --> 00:06:06.800
and carefully use them
in the right instance.
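
NOTE
One way to picture "being explicit about which privilege to exercise" is a
toy Python model in which every open names exactly one source of authority.
This is a hedged sketch, not the paper's mechanism: Authority, open_for_write,
compile_source and the path /sysx/bill are hypothetical names for illustration.
```python
from enum import Enum
#
class Authority(Enum):
    USER = "user"        # privileges of whoever invoked the compiler
    LICENSE = "license"  # the compiler's own home files license
#
def user_may_write(user: str, path: str) -> bool:
    # Toy policy (assumption): a user may write only under their own home directory.
    return path.startswith(f"/home/{user}/")
#
def open_for_write(user: str, path: str, using: Authority) -> bool:
    # Exactly one source of authority is consulted per call; there is no union.
    if using is Authority.USER:
        return user_may_write(user, path)
    return path.startswith("/sysx/")
#
def compile_source(user: str, output_path: str) -> dict:
    return {
        # The output name came from the user, so open it as the user.
        "output": open_for_write(user, output_path, Authority.USER),
        # The statistics file is the compiler's own business.
        "stats": open_for_write(user, "/sysx/stat", Authority.LICENSE),
    }
#
print(compile_source("alice", "/home/alice/foo.out"))
print(compile_source("alice", "/sysx/bill"))
```
With the authority named at each call site, the user-supplied output path
can no longer borrow the license.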

00:06:06.800 --> 00:06:09.350
So I guess one thing
we could try to look at

00:06:09.350 --> 00:06:15.120
is how would we design
such a compiler in Unix?

00:06:15.120 --> 00:06:18.024
So in their system, they had
this home files license thing.

00:06:18.024 --> 00:06:20.190
Other mechanisms, then they
introduced capabilities.

00:06:20.190 --> 00:06:21.750
We'll talk about them shortly.

00:06:21.750 --> 00:06:24.830
But could we solve
this in a Unix system?

00:06:24.830 --> 00:06:27.080
Suppose you had to write
this Fortran compiler in Unix

00:06:27.080 --> 00:06:29.566
and write to a special file
and avoid this confused

00:06:29.566 --> 00:06:30.190
deputy problem.

00:06:30.190 --> 00:06:32.775
What would you do?

00:06:32.775 --> 00:06:33.275
Any ideas?

00:06:35.802 --> 00:06:37.760
I guess you could just
declare this a bad plan.

00:06:37.760 --> 00:06:40.212
Like don't keep statistics.

00:06:40.212 --> 00:06:42.028
Yeah?

00:06:42.028 --> 00:06:44.649
AUDIENCE: [INAUDIBLE].

00:06:44.649 --> 00:06:45.315
PROFESSOR: Sure.

00:06:45.315 --> 00:06:46.670
That could be, right?

00:06:46.670 --> 00:06:47.530
Well, yeah.

00:06:47.530 --> 00:06:50.530
So you could not
support flags like -o.

00:06:50.530 --> 00:06:52.210
On the other hand,
you might want

00:06:52.210 --> 00:06:55.980
to allow specifying which
source code you want

00:06:55.980 --> 00:06:58.196
to compile so that maybe
you could read the billing

00:06:58.196 --> 00:06:59.820
file or read the
statistics file, which

00:06:59.820 --> 00:07:01.230
maybe should be secret.

00:07:01.230 --> 00:07:02.897
Or maybe the source
code has-- maybe you

00:07:02.897 --> 00:07:04.646
can supply the
source code on standard input,

00:07:04.646 --> 00:07:06.330
but it has include
statements, so

00:07:06.330 --> 00:07:08.370
it needs to include other
pieces of source code.

00:07:08.370 --> 00:07:09.354
So that's a little tricky.

00:07:09.354 --> 00:07:11.729
AUDIENCE: You could split up
the application [INAUDIBLE].

00:07:16.905 --> 00:07:17.530
PROFESSOR: Yes.

00:07:17.530 --> 00:07:20.270
So another potentially good
design is to split it up,

00:07:20.270 --> 00:07:20.770
right?

00:07:20.770 --> 00:07:23.130
And realize that this
fort compiler really

00:07:23.130 --> 00:07:25.525
doesn't need both of these
privileges at the same time.

00:07:25.525 --> 00:07:33.420
So maybe we should have, in our Unix
world, /bin/fortcc or something,

00:07:33.420 --> 00:07:36.570
the compiler, and then this guy
is just a regular program with

00:07:36.570 --> 00:07:37.790
no extra privileges.

00:07:37.790 --> 00:07:41.980
And then we'll also maybe
have a /bin/fortlog,

00:07:41.980 --> 00:07:44.350
which is going to be a special
program with some extra

00:07:44.350 --> 00:07:47.640
privileges and it'll log some
statistics about what's going

00:07:47.640 --> 00:07:49.410
on in the compiler.

00:07:49.410 --> 00:07:53.010
And fortcc is going
to invoke this guy.

00:07:53.010 --> 00:07:56.020
So how do we give this
guy extra privileges?

00:07:56.020 --> 00:07:56.520
Yeah?

00:07:56.520 --> 00:07:58.153
AUDIENCE: Well, maybe if you
use something like setuid

00:07:58.153 --> 00:08:00.930
or something, like fortlog,
then presumably any other user

00:08:00.930 --> 00:08:03.034
could also log arbitrary
data through it.

00:08:03.034 --> 00:08:03.700
PROFESSOR: Yeah.

00:08:03.700 --> 00:08:04.719
So this is not so great.

00:08:04.719 --> 00:08:06.510
Because on fortlog,
presumably the only way

00:08:06.510 --> 00:08:07.968
to give extra
privileges in Unix is

00:08:07.968 --> 00:08:11.170
to in fact make it owned by, I
don't know, maybe the fort UID,

00:08:11.170 --> 00:08:14.490
and that's also setuid.

00:08:14.490 --> 00:08:17.550
So every time you run it, it
switches to this Fortran UID.

00:08:17.550 --> 00:08:19.580
And maybe there's some
special stats file.

00:08:19.580 --> 00:08:23.170
But then in fact anyone can
invoke this fortlog thingy.

00:08:23.170 --> 00:08:24.730
Which is maybe not great.

00:08:24.730 --> 00:08:26.940
Now anyone can write
to the stats file.

00:08:26.940 --> 00:08:29.782
But maybe this example is not
the biggest security concern

00:08:29.782 --> 00:08:31.490
about someone corrupting
your statistics.

00:08:31.490 --> 00:08:33.220
But suppose this
was a billing file.

00:08:33.220 --> 00:08:36.072
Then maybe the same problems
would be slightly more acute.

00:08:36.072 --> 00:08:36.571
Yeah?

00:08:36.571 --> 00:08:39.674
AUDIENCE: But you can always
make your [INAUDIBLE] stats

00:08:39.674 --> 00:08:40.340
you want, right?

00:08:40.340 --> 00:08:41.298
Instead of [INAUDIBLE].

00:08:44.940 --> 00:08:46.930
PROFESSOR: So in
some sense, yeah.

00:08:46.930 --> 00:08:48.960
If you're willing to
live with arbitrary

00:08:48.960 --> 00:08:51.262
stuff in your statistics
or logging file,

00:08:51.262 --> 00:08:52.220
then maybe that's true.

00:08:52.220 --> 00:08:54.212
AUDIENCE: Even if
you [INAUDIBLE],

00:08:54.212 --> 00:08:56.702
you can already make your C
code have whatever statistics

00:08:56.702 --> 00:08:57.994
that you'd want to be recorded.

00:08:57.994 --> 00:08:58.868
PROFESSOR: You could.

00:08:58.868 --> 00:08:59.585
Yeah.

00:08:59.585 --> 00:09:00.084
Yeah.

00:09:00.084 --> 00:09:01.524
So it might be
that in this case,

00:09:01.524 --> 00:09:03.940
it doesn't really matter that
you can log arbitrary stuff.

00:09:03.940 --> 00:09:05.240
So that's true.

00:09:05.240 --> 00:09:06.040
Yeah.

00:09:06.040 --> 00:09:08.480
So if you cared about who can
invoke this fortlog thing,

00:09:08.480 --> 00:09:10.063
could you really do
something about it

00:09:10.063 --> 00:09:12.484
in Unix, or not so much?

00:09:12.484 --> 00:09:12.984
Yeah?

00:09:12.984 --> 00:09:14.892
AUDIENCE: [INAUDIBLE].

00:09:14.892 --> 00:09:18.090
It would make both
of them setuid.

00:09:18.090 --> 00:09:23.120
Now the fortcc would
read the source files.

00:09:23.120 --> 00:09:26.430
It would switch back to the
saved UID, just the user UID.

00:09:26.430 --> 00:09:31.060
Remote fortlog in
a setuid, which has

00:09:31.060 --> 00:09:32.485
permissions to execute fortlog.

00:09:32.485 --> 00:09:37.812
And that fortlog would
setuid again [INAUDIBLE].

00:09:37.812 --> 00:09:38.520
PROFESSOR: Right.

00:09:38.520 --> 00:09:39.020
Yeah.

00:09:39.020 --> 00:09:42.710
So there is this rather
elaborate mechanism in Unix

00:09:42.710 --> 00:09:46.280
that we skipped on last
Monday's lecture, that

00:09:46.280 --> 00:09:48.780
actually allows an
application to switch

00:09:48.780 --> 00:09:50.190
between multiple UIDs.

00:09:50.190 --> 00:09:53.800
If it was setuid to some
user ID, then it could say,

00:09:53.800 --> 00:09:55.730
well, now I want to
run with this user ID.

00:09:55.730 --> 00:09:57.480
Now I want to run with
this other user ID.

00:09:57.480 --> 00:10:00.820
And it could sort of carefully
alternate between these.

00:10:00.820 --> 00:10:02.320
It's a little tricky
to do it right,

00:10:02.320 --> 00:10:04.213
but it's probably doable.
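
NOTE
A rough Python sketch of the UID-switching pattern mentioned above, using the
real os.getuid, os.geteuid and os.seteuid calls. The helper names effective_uid
and read_source_as_invoker are hypothetical. Interpreted scripts generally do
not honor the setuid bit, so this only illustrates the call pattern a setuid
binary would follow; run unprivileged, it simply switches to the same UID.
```python
import os
from contextlib import contextmanager
#
@contextmanager
def effective_uid(uid: int):
    # Temporarily run with `uid` as the effective UID, then switch back.
    previous = os.geteuid()
    os.seteuid(uid)
    try:
        yield
    finally:
        os.seteuid(previous)
#
def read_source_as_invoker(path: str) -> bytes:
    # In a setuid binary, os.getuid() is the real UID: the user who ran it.
    with effective_uid(os.getuid()):
        with open(path, "rb") as f:
            return f.read()
```
Opening the user's file while the effective UID is the invoker's means the
kernel checks that open against the user's privileges only.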

00:10:04.213 --> 00:10:06.224
So that's one potential design.

00:10:06.224 --> 00:10:08.140
I guess another hack you
could maybe try to do

00:10:08.140 --> 00:10:10.740
is make this fortlog
binary only executable

00:10:10.740 --> 00:10:14.790
to a particular group and
make fortcc a setgid binary

00:10:14.790 --> 00:10:15.622
to that group.

00:10:15.622 --> 00:10:17.830
It's not great, because it
obliterates whatever group

00:10:17.830 --> 00:10:19.950
list the user had initially.

00:10:19.950 --> 00:10:21.200
But who knows?

00:10:21.200 --> 00:10:24.190
Maybe that's better
than nothing.

00:10:24.190 --> 00:10:26.550
Anyway, so it's a
fairly tricky problem

00:10:26.550 --> 00:10:29.600
to solve in an entirely
satisfactory fashion

00:10:29.600 --> 00:10:31.812
with these Unix mechanisms.

00:10:31.812 --> 00:10:33.770
Although, maybe you should
rethink your problem

00:10:33.770 --> 00:10:35.640
and not worry about
your statistics

00:10:35.640 --> 00:10:38.970
file as much in the first place.

00:10:38.970 --> 00:10:45.150
But how do we think about what's
going wrong in the design?

00:10:45.150 --> 00:10:47.920
Well, there's two things we
could try to learn from this,

00:10:47.920 --> 00:10:49.925
or basically, what went wrong.

00:10:53.120 --> 00:10:58.180
And one interpretation that
Norman Hardy wants us to take away

00:10:58.180 --> 00:11:01.560
is this notion that he
calls ambient authority.

00:11:06.730 --> 00:11:08.300
So what is ambient authority?

00:11:08.300 --> 00:11:10.230
Can anyone figure
out what they meant?

00:11:10.230 --> 00:11:12.230
They've never
exactly defined this.

00:11:12.230 --> 00:11:12.730
Yeah?

00:11:12.730 --> 00:11:14.248
AUDIENCE: It means you
have the authority given

00:11:14.248 --> 00:11:15.590
to you by the environment.

00:11:15.590 --> 00:11:19.464
So as if [INAUDIBLE]
user with no limitations.

00:11:19.464 --> 00:11:20.130
PROFESSOR: Yeah.

00:11:20.130 --> 00:11:24.040
So you're making an
operation, and you can specify

00:11:24.040 --> 00:11:25.177
what operation you want.

00:11:25.177 --> 00:11:27.760
But the decision of whether that
operation is going to succeed

00:11:27.760 --> 00:11:30.850
comes from some extra implicit
parameters in your process,

00:11:30.850 --> 00:11:31.660
for example.

00:11:31.660 --> 00:11:34.970
And in Unix, you can figure
out what this ambient authority

00:11:34.970 --> 00:11:36.490
check might look like.

00:11:36.490 --> 00:11:38.860
So if you make a system
call, then you probably

00:11:38.860 --> 00:11:41.080
supplied some sort of a
name to a system call.

00:11:41.080 --> 00:11:43.340
And inside of the
kernel, this gets

00:11:43.340 --> 00:11:45.570
mapped to some
sort of an object.

00:11:45.570 --> 00:11:48.580
And the object presumably has
some kind of an access control

00:11:48.580 --> 00:11:52.110
list on it, like the permissions
on a file, et cetera.

00:11:52.110 --> 00:11:53.930
So there are some
permissions that you

00:11:53.930 --> 00:11:56.460
can get from the object.

00:11:56.460 --> 00:11:58.770
And that should decide
whether an operation

00:11:58.770 --> 00:12:00.480
is going to be
allowed on this name

00:12:00.480 --> 00:12:02.180
that the application supplied.

00:12:02.180 --> 00:12:04.400
This is sort of what the
application gets to see.

00:12:04.400 --> 00:12:06.850
Inside of the
kernel, there's also

00:12:06.850 --> 00:12:09.780
the current user ID of the
process making the calls.

00:12:09.780 --> 00:12:12.600
So this is the current process UID.

00:12:15.140 --> 00:12:18.250
And this thing goes
into the decision

00:12:18.250 --> 00:12:22.710
of whether to allow a
particular operation or not.

00:12:22.710 --> 00:12:24.770
So it's the current
process user ID

00:12:24.770 --> 00:12:27.210
that's this ambient privilege.

00:12:27.210 --> 00:12:29.240
Whatever operation you're
going to try to do,

00:12:29.240 --> 00:12:31.540
the kernel will actually
try, in some sense,

00:12:31.540 --> 00:12:35.815
as hard as possible to allow
it by using your current UID,

00:12:35.815 --> 00:12:39.410
and your current GID and
whatever other extra privileges

00:12:39.410 --> 00:12:40.500
you might have.

00:12:40.500 --> 00:12:43.120
And as long as there's some set
of privileges that allow you

00:12:43.120 --> 00:12:45.690
to do it, it'll let you do it.
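
NOTE
The ambient check described above can be modeled in a few lines of Python.
This sketch follows the lecture's simplified description, where any privilege
the process holds may satisfy the request. Real Unix kernels are stricter
about ordering: they consult only the first matching class (owner, then
group, then other). The function name ambient_allows is hypothetical.
```python
def ambient_allows(proc_uid, proc_gids, owner, group, mode, want):
    # `mode` is a Unix permission mode such as 0o640.
    # `want` is a 3-bit rwx mask: 0o4 for read, 0o2 for write, 0o1 for execute.
    allowed = mode & 0o7                  # bits granted to everyone
    if group in proc_gids:
        allowed |= (mode >> 3) & 0o7      # bits granted through group membership
    if proc_uid == owner:
        allowed |= (mode >> 6) & 0o7      # bits granted to the owner
    return (allowed & want) == want
#
# The caller names only the object and the operation; the UID and GIDs are
# implicit parameters taken from the process, which is the ambient part.
print(ambient_allows(1000, {1000}, owner=1000, group=1000, mode=0o640, want=0o2))
print(ambient_allows(1001, {1001}, owner=1000, group=1000, mode=0o640, want=0o4))
```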

00:12:45.690 --> 00:12:47.065
Which is maybe
not the best thing

00:12:47.065 --> 00:12:51.080
to do if you aren't fully aware
of what all these privileges are.

00:12:51.080 --> 00:12:53.010
Maybe you don't want
to use all of them

00:12:53.010 --> 00:12:57.910
to open a particular file or
make some other operation.

00:12:57.910 --> 00:13:01.867
Does this make sense, roughly
what ambient privilege is?

00:13:01.867 --> 00:13:03.325
In the case of an
operating system,

00:13:03.325 --> 00:13:05.910
it basically ends up being
the fact that a process has

00:13:05.910 --> 00:13:07.680
some sort of a user ID.

00:13:07.680 --> 00:13:11.570
Are there non-OS examples
of ambient privilege

00:13:11.570 --> 00:13:12.710
you guys can think of?

00:13:12.710 --> 00:13:15.280
Like when you're making
an operation, something

00:13:15.280 --> 00:13:17.525
about the identity of
the caller determines

00:13:17.525 --> 00:13:18.900
whether they'll succeed or not.

00:13:21.640 --> 00:13:23.765
Like one example is
probably firewalls, as well.

00:13:23.765 --> 00:13:25.610
So this is just an OS example.

00:13:25.610 --> 00:13:29.940
Another ambient privilege is
the firewalls on the network.

00:13:29.940 --> 00:13:32.570
Because any operation
you do from a machine

00:13:32.570 --> 00:13:35.890
inside of a firewall is
going to be allowed because,

00:13:35.890 --> 00:13:37.410
well, you just have
that IP address,

00:13:37.410 --> 00:13:39.930
or you're on that
side of a network.

00:13:39.930 --> 00:13:43.870
And if you're outside, the same
operation will be disallowed.

00:13:43.870 --> 00:13:47.330
So it's also a solar problem.

00:13:47.330 --> 00:13:50.850
Say you visit some website,
and the website includes a link

00:13:50.850 --> 00:13:53.794
to some different
server, well, maybe you

00:13:53.794 --> 00:13:55.710
don't want to use the
privileges that you have

00:13:55.710 --> 00:13:58.500
on the inside of your
network to access that link.

00:13:58.500 --> 00:14:00.500
Because maybe it'll access
your internal printer

00:14:00.500 --> 00:14:02.470
and exploit it in some way.

00:14:02.470 --> 00:14:05.021
And really, the guy that
provided you the link

00:14:05.021 --> 00:14:06.396
shouldn't have
been able to reach

00:14:06.396 --> 00:14:08.397
the printer in the first
place, because they

00:14:08.397 --> 00:14:09.230
were on the outside.

00:14:09.230 --> 00:14:14.190
Or a firewall that your browser,
maybe by visiting a link,

00:14:14.190 --> 00:14:15.885
will be tricked into doing this.

00:14:15.885 --> 00:14:19.510
It's sort of a moral
equivalent of this confused deputy

00:14:19.510 --> 00:14:21.010
problem at the network level.

00:14:21.010 --> 00:14:22.010
Yeah?

00:14:22.010 --> 00:14:25.344
AUDIENCE: [INAUDIBLE] permission
are directly affected also.

00:14:25.344 --> 00:14:26.010
PROFESSOR: Yeah.

00:14:26.010 --> 00:14:28.070
AUDIENCE: Because it's
essentially DAC, potentially,

00:14:28.070 --> 00:14:28.830
in the Capsicum.

00:14:28.830 --> 00:14:29.280
PROFESSOR: Yeah.

00:14:29.280 --> 00:14:31.250
So this is pretty much
what the Capsicum guys

00:14:31.250 --> 00:14:33.550
think of as discretionary
access control.

00:14:33.550 --> 00:14:35.800
And the fact that it's
discretionary, well,

00:14:35.800 --> 00:14:38.697
this is not quite what
discretionary access control

00:14:38.697 --> 00:14:39.470
means.

00:14:39.470 --> 00:14:41.790
But what discretionary
access control means

00:14:41.790 --> 00:14:45.350
is that the user, or
the owner of an object,

00:14:45.350 --> 00:14:48.609
can decide what the security policy
will look like for an object.

00:14:48.609 --> 00:14:51.025
Which seems very natural in a
Unix setting. It's my files,

00:14:51.025 --> 00:14:51.970
I can decide what I want.

00:14:51.970 --> 00:14:54.386
I can give them to you, or I
can keep them private, great.

00:14:55.960 --> 00:14:58.700
So almost all DAC
systems do look

00:14:58.700 --> 00:15:01.300
like this, because they want to
have some sort of permissions

00:15:01.300 --> 00:15:04.450
that a user could modify
to control the security

00:15:04.450 --> 00:15:07.800
policy for their files.

00:15:07.800 --> 00:15:11.910
The flip side is
mandatory access control.

00:15:11.910 --> 00:15:15.257
We'll talk about it in a little
while, but at some level,

00:15:15.257 --> 00:15:17.340
they have this very
philosophically different view

00:15:17.340 --> 00:15:17.881
of the world.

00:15:17.881 --> 00:15:20.000
They think, well,
you're the user.

00:15:20.000 --> 00:15:22.240
But someone else will
set the security policy

00:15:22.240 --> 00:15:24.460
for how you use this computer.

00:15:24.460 --> 00:15:29.000
And this sort of came out of the
military in the '70s or '80s,

00:15:29.000 --> 00:15:32.946
when they really wanted to have
classified computer systems

00:15:32.946 --> 00:15:34.654
where, well, you're
working on some stuff

00:15:34.654 --> 00:15:35.613
and it's marked secret.

00:15:35.613 --> 00:15:37.737
I'm working on some stuff
that's marked top secret.

00:15:37.737 --> 00:15:39.113
So my stuff just
can't go to you.

00:15:39.113 --> 00:15:41.112
It's not up to me whether
to set the permissions

00:15:41.112 --> 00:15:42.000
on a file, et cetera.

00:15:42.000 --> 00:15:44.830
It's just not allowed
by some guy in charge.

00:15:44.830 --> 00:15:46.630
So mandatory access
control is really

00:15:46.630 --> 00:15:49.640
trying to enforce these
different kinds of policies

00:15:49.640 --> 00:15:52.500
in the first place,
where there's

00:15:52.500 --> 00:15:54.610
the user and the
application developer.

00:15:54.610 --> 00:15:56.910
And then there's some guy
separate from the user

00:15:56.910 --> 00:15:59.472
and the developer
that sets the policy.

00:15:59.472 --> 00:16:02.492
And, as you can sort of guess,
it doesn't always work out.

00:16:02.492 --> 00:16:03.950
Well, we'll talk
about it in a bit.

00:16:03.950 --> 00:16:06.001
But that's what discretionary
versus mandatory

00:16:06.001 --> 00:16:10.110
means in access control.

00:16:10.110 --> 00:16:11.310
All right.

00:16:11.310 --> 00:16:14.480
So there's many other examples
that you could imagine where

00:16:14.480 --> 00:16:16.040
we have ambient authority.

00:16:16.040 --> 00:16:20.910
And it's not inherently bad,
but it's just something

00:16:20.910 --> 00:16:22.637
that you have to be
very careful about.

00:16:22.637 --> 00:16:24.470
If you have a system
with ambient authority,

00:16:24.470 --> 00:16:27.020
you should probably
be very careful

00:16:27.020 --> 00:16:29.595
if you're performing
privileged operations.

00:16:29.595 --> 00:16:31.220
You should make sure
that you're really

00:16:31.220 --> 00:16:35.980
using the right authority
and not accidentally being

00:16:35.980 --> 00:16:39.146
tricked very much like this
Fortran compiler 20 years ago.

00:16:39.146 --> 00:16:41.580
25 now.

00:16:41.580 --> 00:16:42.450
All right.

00:16:42.450 --> 00:16:45.470
So this is one interpretation
of what goes wrong.

00:16:45.470 --> 00:16:47.487
And it's not
necessarily the only way

00:16:47.487 --> 00:16:49.070
to think about what
goes wrong, right?

00:16:49.070 --> 00:16:51.192
Another possibility
is that, well,

00:16:51.192 --> 00:16:53.400
wouldn't it be nice if it
was easy for an application

00:16:53.400 --> 00:16:56.440
to tell whether it should
access a file on behalf

00:16:56.440 --> 00:16:57.445
of some principal?

00:16:57.445 --> 00:17:00.700
So maybe another problem
is that the access control

00:17:00.700 --> 00:17:02.024
checks are complicated.

00:17:07.381 --> 00:17:10.294
So in some sense, when the
Fortran compiler is running,

00:17:10.294 --> 00:17:13.900
and it's opening a file
on behalf of a user,

00:17:13.900 --> 00:17:16.660
it basically needs to
replicate the same exact logic

00:17:16.660 --> 00:17:20.240
we see drawn out here, except
that the Fortran compiler needs

00:17:20.240 --> 00:17:22.490
to plug-in something else here.

00:17:22.490 --> 00:17:25.770
Instead of using its current
privileges, and all of them,

00:17:25.770 --> 00:17:27.470
it should just
replicate this check

00:17:27.470 --> 00:17:32.150
and try to make it with a
different set of privileges.

00:17:32.150 --> 00:17:34.110
So in Unix, this
turns out to be fairly

00:17:34.110 --> 00:17:36.920
tricky to do, because
there's many places

00:17:36.920 --> 00:17:38.500
where these security
checks happen.

00:17:38.500 --> 00:17:41.020
If you have symbolic links,
then the symbolic link

00:17:41.020 --> 00:17:43.660
gets looked up, and
that path name also

00:17:43.660 --> 00:17:47.540
gets evaluated with someone's
privileges, et cetera.

00:17:47.540 --> 00:17:50.220
But it might be
that, in some system,

00:17:50.220 --> 00:17:51.940
you could simplify
this access control

00:17:51.940 --> 00:17:55.632
check, where you could do it
yourself in an application.

00:17:55.632 --> 00:17:59.320
Does that seem like a
reasonable plan to you guys?

00:17:59.320 --> 00:18:01.960
Would you go with that?

00:18:01.960 --> 00:18:03.640
Any dangers of
replicating these checks?

00:18:03.640 --> 00:18:04.260
Yeah?

00:18:04.260 --> 00:18:06.865
AUDIENCE: Well, if you do the
checks in the application,

00:18:06.865 --> 00:18:08.594
you could just
not do the checks.

00:18:08.594 --> 00:18:09.260
PROFESSOR: Yeah.

00:18:09.260 --> 00:18:10.360
So you could easily
miss the checks.

00:18:10.360 --> 00:18:11.360
That's absolutely right.

00:18:11.360 --> 00:18:13.680
So in some sense, what the
Fortran compiler did here,

00:18:13.680 --> 00:18:15.370
well, they didn't even bother
trying to do the checks,

00:18:15.370 --> 00:18:16.659
not that they screwed them up.

00:18:16.659 --> 00:18:18.950
Another possibility, in
addition to missing the checks,

00:18:18.950 --> 00:18:21.589
is maybe the kernel
will change over time,

00:18:21.589 --> 00:18:23.380
and it will have slightly
different checks.

00:18:23.380 --> 00:18:25.100
It will introduce some
extra security measure,

00:18:25.100 --> 00:18:26.766
and the application
will be left behind.

00:18:26.766 --> 00:18:28.455
And it will implement
old style checks.

00:18:28.455 --> 00:18:31.280
And probably not a great plan.

00:18:31.280 --> 00:18:34.862
So recall, one good
idea in security

00:18:34.862 --> 00:18:36.590
is to have economy
of mechanism.

00:18:36.590 --> 00:18:39.222
So there's only a small number
of places that are enforcing

00:18:39.222 --> 00:18:40.180
your security policies.

00:18:40.180 --> 00:18:41.890
You probably don't
want to replicate

00:18:41.890 --> 00:18:45.520
the same functionality in
applications and in the kernel,

00:18:45.520 --> 00:18:46.020
et cetera.

00:18:46.020 --> 00:18:48.090
You really want to boil
it down to one place.

00:18:48.090 --> 00:18:50.900
That roughly makes sense?

00:18:50.900 --> 00:18:52.070
OK.

00:18:52.070 --> 00:18:56.980
So what is this
capability, I guess,

00:18:56.980 --> 00:19:02.220
idea, and how might it
solve this ambient authority problem?

00:19:02.220 --> 00:19:05.150
Well, there's some formal
definition for the thing.

00:19:05.150 --> 00:19:08.570
But really, you can get very
close by thinking of Unix file

00:19:08.570 --> 00:19:11.270
descriptors as a capability.

00:19:11.270 --> 00:19:15.210
So I guess the alternative
to this picture,

00:19:15.210 --> 00:19:18.470
in capability world,
is that instead

00:19:18.470 --> 00:19:20.510
of having the
application supply a name,

00:19:20.510 --> 00:19:22.510
and you look up an object,
you get a permission,

00:19:22.510 --> 00:19:24.180
you decide whether
to allow it based

00:19:24.180 --> 00:19:25.910
on some ambient
authority, instead,

00:19:25.910 --> 00:19:28.920
the capability
picture looks very simple.

00:19:28.920 --> 00:19:32.230
You have a capability, and
if you have a capability,

00:19:32.230 --> 00:19:35.270
it points to an object.

00:19:35.270 --> 00:19:37.482
And maybe the capability
has some small number

00:19:37.482 --> 00:19:40.450
of restrictions of what
you can do with an object.

00:19:40.450 --> 00:19:43.340
But basically, if you have
the capability to an object,

00:19:43.340 --> 00:19:44.830
you can access the object.

00:19:44.830 --> 00:19:46.420
It's actually very simple.

00:19:46.420 --> 00:19:49.280
So there's no ambient
authority that

00:19:49.280 --> 00:19:51.470
decides whether an
operation on a capability

00:19:51.470 --> 00:19:53.310
is going to be allowed.

00:19:53.310 --> 00:19:55.290
The only thing is that
maybe the capability has

00:19:55.290 --> 00:19:57.623
a couple of extra bits, or
this mask that they described

00:19:57.623 --> 00:19:59.629
in the paper, which
says, well, you

00:19:59.629 --> 00:20:02.240
have a capability for this
file, but it's restricted

00:20:02.240 --> 00:20:03.470
to read operations only.

00:20:03.470 --> 00:20:07.440
Or it's restricted to write
or append operations only.

00:20:07.440 --> 00:20:10.885
And then your security decisions
are all of a sudden very easy.

00:20:10.885 --> 00:20:12.260
Because if you
have a capability,

00:20:12.260 --> 00:20:13.410
you can do something.

00:20:13.410 --> 00:20:15.248
If you don't, you can't.
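
NOTE A minimal Python sketch of the rule just described, assuming a capability is an object reference plus a rights mask. The names Rights, Capability, cap_read, and cap_append are hypothetical and do not come from the papers; this illustrates the decision rule only, not Capsicum's API.
```python
from dataclasses import dataclass
from enum import Flag, auto
class Rights(Flag):
    READ = auto()
    WRITE = auto()
    APPEND = auto()
@dataclass(frozen=True)
class Capability:
    obj: bytearray   # the object this capability points to
    rights: Rights   # mask restricting what the holder may do with it
def cap_read(cap: Capability) -> bytes:
    # The only check is against the capability's own mask: no user ID, no path lookup.
    if not (cap.rights & Rights.READ):
        raise PermissionError("capability lacks READ")
    return bytes(cap.obj)
def cap_append(cap: Capability, data: bytes) -> None:
    # Either WRITE or APPEND in the mask is enough to add data at the end.
    if not (cap.rights & (Rights.WRITE | Rights.APPEND)):
        raise PermissionError("capability lacks WRITE/APPEND")
    cap.obj.extend(data)
```
Holding a Capability with only Rights.READ lets cap_read succeed and makes cap_append raise PermissionError, whoever the caller is.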

00:20:15.248 --> 00:20:17.940
Make sense?

00:20:17.940 --> 00:20:21.430
So I guess one important
property of capabilities

00:20:21.430 --> 00:20:25.000
is that they should
actually be unforgeable,

00:20:25.000 --> 00:20:27.396
as the papers talk about.

00:20:27.396 --> 00:20:29.020
So what does it mean
to be unforgeable,

00:20:29.020 --> 00:20:31.900
or why do we want this
in this capability world?

00:20:34.980 --> 00:20:37.700
Well, I guess this actually
may be almost too obvious here.

00:20:37.700 --> 00:20:39.324
Well, if you can make
up any capability

00:20:39.324 --> 00:20:41.275
you want-- I can make
up a capability for any

00:20:41.275 --> 00:20:42.849
of your guys' files
and go access it.

00:20:42.849 --> 00:20:44.640
So if I can make it
up, and I'll access it,

00:20:44.640 --> 00:20:47.760
and there's nothing else in the
security design, that stops me

00:20:47.760 --> 00:20:54.030
from accessing an object once
I can manufacture a capability.

00:20:54.030 --> 00:20:55.870
So it's important that
these capabilities

00:20:55.870 --> 00:20:58.765
can't be made up out of
thin air by the application

00:20:58.765 --> 00:21:01.340
or by whatever's running.

00:21:01.340 --> 00:21:05.170
How is this getting enforced, if
we think of file descriptors

00:21:05.170 --> 00:21:07.249
as capabilities?

00:21:07.249 --> 00:21:09.040
So many of you guys
actually submitted this

00:21:09.040 --> 00:21:11.300
as the big question
about Capsicum.

00:21:11.300 --> 00:21:13.080
What do you think?

00:21:13.080 --> 00:21:17.490
What prevents an application
from synthesizing a capability

00:21:17.490 --> 00:21:20.490
in this file descriptor world?

00:21:20.490 --> 00:21:24.310
Could you synthesize
a capability?

00:21:24.310 --> 00:21:24.950
Yeah?

00:21:24.950 --> 00:21:26.949
AUDIENCE: Well, it was
probably like a structure

00:21:26.949 --> 00:21:29.364
and a construct
that says that they

00:21:29.364 --> 00:21:31.504
have a capability for
certain file descriptors.

00:21:31.504 --> 00:21:32.170
PROFESSOR: Yeah.

00:21:32.170 --> 00:21:35.510
So it's actually
fairly easy to see

00:21:35.510 --> 00:21:37.500
what goes on once you
look at what exactly

00:21:37.500 --> 00:21:38.666
is a file descriptor, right?

00:21:38.666 --> 00:21:40.230
So a file descriptor
is basically

00:21:40.230 --> 00:21:42.040
just some sort of an integer.

00:21:42.040 --> 00:21:44.756
And this integer--
like in Unix, you

00:21:44.756 --> 00:21:46.880
have file descriptor 0,
which refers to your input,

00:21:46.880 --> 00:21:48.796
file descriptor 1 which
refers to your output.

00:21:48.796 --> 00:21:49.470
And so on.

00:21:49.470 --> 00:21:52.580
But really, these are just
integers in user space.

00:21:52.580 --> 00:21:56.120
And this is what the
application can presumably do,

00:21:56.120 --> 00:21:58.380
and it can choose
any integer it wants.

00:21:58.380 --> 00:22:00.190
But whenever you
try to do something

00:22:00.190 --> 00:22:02.570
to a file descriptor, which
is one of these integers,

00:22:02.570 --> 00:22:05.640
the kernel will always
interpret the integer

00:22:05.640 --> 00:22:08.680
according to your current
process's file descriptor

00:22:08.680 --> 00:22:09.490
table.

00:22:09.490 --> 00:22:12.430
So for every PID-- let's
say, well, this is PID,

00:22:12.430 --> 00:22:13.395
I don't know, 57.

00:22:13.395 --> 00:22:14.830
So some process is running.

00:22:14.830 --> 00:22:18.750
It has an open file
table, and each integer

00:22:18.750 --> 00:22:20.560
supplied from
user space refers

00:22:20.560 --> 00:22:23.185
to some entry in this table.

00:22:23.185 --> 00:22:26.650
And of course, the kernel
should check that the integer

00:22:26.650 --> 00:22:28.000
is in bounds in this table.

00:22:28.000 --> 00:22:29.630
It isn't negative.

00:22:29.630 --> 00:22:31.890
It doesn't go past
the end of the table.

00:22:31.890 --> 00:22:34.050
Otherwise, it will have
the usual buffer overflow

00:22:34.050 --> 00:22:35.630
problems, et cetera.

00:22:35.630 --> 00:22:38.517
But if you carefully
check that the integer is

00:22:38.517 --> 00:22:41.380
in bounds in the
kernel implementation,

00:22:41.380 --> 00:22:44.670
then the only possible
things that the application

00:22:44.670 --> 00:22:46.550
can refer to by
a file descriptor

00:22:46.550 --> 00:22:48.910
are entries in this table.

00:22:48.910 --> 00:22:51.060
So presumably, the
kernel will somehow

00:22:51.060 --> 00:22:54.640
make sure that you legitimately
got a particular capability.

00:22:54.640 --> 00:22:58.810
So when you, for example, open a
file outside of this capability

00:22:58.810 --> 00:23:03.240
model in Unix, well, the kernel,
after the open call succeeds,

00:23:03.240 --> 00:23:07.420
it's going to change that
file descriptor table

00:23:07.420 --> 00:23:10.090
entry to point to a
particular open file,

00:23:10.090 --> 00:23:11.126
like maybe open /etc/pwd.

00:23:14.350 --> 00:23:17.380
And now, the entry at
this slot on the table

00:23:17.380 --> 00:23:18.580
points to an open file.

00:23:18.580 --> 00:23:20.080
Some of them might
actually be null.

00:23:20.080 --> 00:23:23.260
Maybe you don't have an open
file with a particular index

00:23:23.260 --> 00:23:24.660
in this table.

00:23:24.660 --> 00:23:29.000
And as a result, what does it
mean to forge a capability?

00:23:29.000 --> 00:23:30.700
The only thing you
can do in user space

00:23:30.700 --> 00:23:32.460
is make up an integer.

00:23:32.460 --> 00:23:35.230
And the only integers that
would make sense to make up

00:23:35.230 --> 00:23:38.560
would be entries that point to
non-null entries in this table.

00:23:38.560 --> 00:23:42.910
And those guys are exactly the
capabilities that you have.
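
NOTE A toy Python model of the per-process descriptor table described above, to make the unforgeability argument concrete. ToyProcess, install, and lookup are hypothetical names, and this is not real kernel code; it only mirrors the bounds check and the null-slot check from the lecture.
```python
class ToyProcess:
    """Toy model of one process's kernel-side file descriptor table."""
    def __init__(self, table_size: int = 8):
        self.fd_table = [None] * table_size   # None models a null (unused) slot
    def install(self, open_file: str) -> int:
        # Models a successful open(): use the lowest free slot and return its index.
        fd = self.fd_table.index(None)
        self.fd_table[fd] = open_file
        return fd
    def lookup(self, fd: int) -> str:
        # User space may supply any integer, so bounds-check it before indexing.
        if not isinstance(fd, int) or fd < 0 or fd >= len(self.fd_table):
            raise OSError("EBADF: descriptor out of range")
        entry = self.fd_table[fd]
        if entry is None:
            raise OSError("EBADF: no open file in this slot")
        return entry
```
Whatever integer the application makes up, lookup can only ever return an entry that this process legitimately obtained through install.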

00:23:42.910 --> 00:23:45.620
So does that make sense why
it's difficult, in this file

00:23:45.620 --> 00:23:47.750
descriptor world, to
actually forge capabilities

00:23:47.750 --> 00:23:48.542
in the first place?

00:23:48.542 --> 00:23:49.708
So it's kind of cool, right?

00:23:49.708 --> 00:23:52.130
Like the only files that
you have opened are exactly

00:23:52.130 --> 00:23:53.420
the things you can operate on.

00:23:53.420 --> 00:23:56.740
And there's nothing else
that you can potentially

00:23:56.740 --> 00:23:59.996
touch and affect.

00:23:59.996 --> 00:24:00.820
Make sense?

00:24:00.820 --> 00:24:01.403
Any questions?

00:24:05.630 --> 00:24:06.610
All right.

00:24:06.610 --> 00:24:07.110
OK.

00:24:07.110 --> 00:24:09.990
So I guess, how
would capabilities

00:24:09.990 --> 00:24:12.540
help solve the ambient
authority problem

00:24:12.540 --> 00:24:14.820
that Norman Hardy is excited
about with his Fortran

00:24:14.820 --> 00:24:16.020
compiler?

00:24:16.020 --> 00:24:19.790
So what would be the file
descriptor moral equivalent

00:24:19.790 --> 00:24:22.600
solution to this
/sysx/fort thing?

00:24:25.682 --> 00:24:27.140
Do they actually
solve the problem?

00:24:27.140 --> 00:24:28.016
Yeah?

00:24:28.016 --> 00:24:31.590
AUDIENCE: Well, they just use
the appropriate capabilities

00:24:31.590 --> 00:24:33.160
whenever they're needed.

00:24:33.160 --> 00:24:36.660
So when you have to access the
output file, in the statistics,

00:24:36.660 --> 00:24:39.378
you use the capability
[INAUDIBLE] file.

00:24:39.378 --> 00:24:42.320
But when you're accessing the
file you're about to read,

00:24:42.320 --> 00:24:44.714
you don't use that capability.

00:24:44.714 --> 00:24:45.380
PROFESSOR: Yeah.

00:24:45.380 --> 00:24:48.370
So I guess really what it
boils down to is that somehow

00:24:48.370 --> 00:24:51.560
the Fortran compiler should just
already have a file descriptor

00:24:51.560 --> 00:24:54.280
open for that /sysx/stat file.

00:24:54.280 --> 00:24:57.660
So they don't really describe,
in their short paper,

00:24:57.660 --> 00:24:59.950
about how you would
get that capability.

00:24:59.950 --> 00:25:02.340
But it basically means
you shouldn't really

00:25:02.340 --> 00:25:04.250
pass file names around.

00:25:04.250 --> 00:25:05.925
You should instead
pass file descriptors.

00:25:05.925 --> 00:25:08.270
So you could actually come
up with a perhaps much more

00:25:08.270 --> 00:25:12.540
elegant design for a Unix
replacement of the Fortran

00:25:12.540 --> 00:25:14.290
compiler using capabilities.

00:25:14.290 --> 00:25:19.530
So maybe the plan is we should
just have a Fortran compiler

00:25:19.530 --> 00:25:22.310
front end that doesn't
have any extra privileges,

00:25:22.310 --> 00:25:25.750
and it takes all these arguments
you give it, and converts

00:25:25.750 --> 00:25:30.340
all the path names you supply to
it into open file descriptors.

00:25:30.340 --> 00:25:33.540
So the alternative design
I am thinking of here

00:25:33.540 --> 00:25:36.160
is that maybe we'd
have a program

00:25:36.160 --> 00:25:38.200
fort1, which is the front end.

00:25:38.200 --> 00:25:40.345
And it would take some
sort of a file, foo.f,

00:25:40.345 --> 00:25:45.390
and all the other
arguments, -o, whatever.

00:25:45.390 --> 00:25:48.470
And it doesn't actually
implement any of the compiler

00:25:48.470 --> 00:25:50.020
logic, anything else.

00:25:50.020 --> 00:25:52.080
All it looks for is path
names in its arguments,

00:25:52.080 --> 00:25:54.870
and it's going to open
them and establish

00:25:54.870 --> 00:25:55.991
file descriptors for them.

00:25:56.471 --> 00:25:58.054
And the cool thing
is that, because it

00:25:58.054 --> 00:26:01.570
has no extra privileges, if
the user doesn't have access

00:26:01.570 --> 00:26:03.520
to some file name,
then it will fail.

00:26:03.520 --> 00:26:04.720
So that's great.

00:26:04.720 --> 00:26:07.280
And then once this front end
has opened all these file

00:26:07.280 --> 00:26:10.990
descriptors, it can execute
some privileged extra component,

00:26:10.990 --> 00:26:14.500
like the actual setuid
Fortran compiler.

00:26:14.500 --> 00:26:16.520
So maybe then it'll run fort.

00:26:16.520 --> 00:26:19.075
This guy's maybe setuid to
some special user ID that

00:26:19.075 --> 00:26:21.230
has access to the stats file.

00:26:21.230 --> 00:26:23.750
But it doesn't actually accept
any path names as input.

00:26:23.750 --> 00:26:27.250
All it's going to do is
accept file descriptors.

00:26:27.250 --> 00:26:29.550
And, in that case,
the file descriptor

00:26:29.550 --> 00:26:33.980
is already proof that the
caller had access to open the file.
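
NOTE A rough Python sketch of the fort1/fort split sketched above, assuming a POSIX system. Both halves run in one process here purely for illustration; in the design from the lecture, fort1 would execute a separate setuid fort with the descriptors inherited. The stats_fd parameter stands in for a descriptor the privileged component holds on its own, and the "compiled:" output is a placeholder, not real compilation.
```python
import os
def fort1(src_path: str, out_path: str):
    """Unprivileged front end: turn the user's path names into open descriptors."""
    # These opens run with the caller's own privileges, so a path the user
    # cannot access fails here, before any privileged code is involved.
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError:
        os.close(src_fd)
        raise
    return src_fd, out_fd
def fort(src_fd: int, out_fd: int, stats_fd: int) -> None:
    """Stand-in for the privileged compiler: accepts descriptors, never path names."""
    source = os.read(src_fd, 65536)
    os.write(out_fd, b"compiled:" + source)   # placeholder for real compilation
    os.write(stats_fd, b"1 compile\n")        # statistics go only to its own descriptor
```
Because fort never sees a path name, the user has no way to point it at the stats file as an output target.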

00:26:33.980 --> 00:26:35.845
Does the property make sense?

00:26:35.845 --> 00:26:37.800
So it of course doesn't
solve every issue.

00:26:37.800 --> 00:26:40.570
I'm just sort of sketching out
how capabilities might help.

00:26:40.570 --> 00:26:43.565
But that's roughly the plan,
is that you should demonstrate

00:26:43.565 --> 00:26:45.760
the fact that you have
access to a particular name

00:26:45.760 --> 00:26:49.190
by just opening it and passing
a capability, instead of saying,

00:26:49.190 --> 00:26:51.140
why don't you try
to open this file

00:26:51.140 --> 00:26:54.457
and maybe accidentally
use some extra privileges.

00:26:54.457 --> 00:26:54.956
Yes.

00:26:54.956 --> 00:26:56.354
AUDIENCE: So does
this generalize

00:26:56.354 --> 00:26:59.137
to having one process
per capability?

00:26:59.137 --> 00:27:00.470
PROFESSOR: Does this generalize?

00:27:00.470 --> 00:27:02.330
Well, of course you can have
as many processes as you want.

00:27:02.330 --> 00:27:04.288
You can have multiple
processes per capability,

00:27:04.288 --> 00:27:05.324
but I'm not sure--

00:27:05.324 --> 00:27:06.240
AUDIENCE: [INAUDIBLE].

00:27:12.930 --> 00:27:16.480
PROFESSOR: I'm still not sure
what you mean by one process.

00:27:16.480 --> 00:27:19.222
AUDIENCE: So we have [INAUDIBLE]
capabilities the user has.

00:27:19.801 --> 00:27:20.800
PROFESSOR: That's right.

00:27:20.800 --> 00:27:22.633
AUDIENCE: And then we
have fort with access

00:27:22.633 --> 00:27:24.211
to this stats file.

00:27:24.211 --> 00:27:25.210
PROFESSOR: That's right.

00:27:25.210 --> 00:27:25.470
Yeah.

00:27:25.470 --> 00:27:27.595
So the way to think of it
is, you don't necessarily

00:27:27.595 --> 00:27:31.516
need a separate process
for every capability.

00:27:31.516 --> 00:27:35.140
Because here, the fort1
thing might open many files

00:27:35.140 --> 00:27:38.590
and might pass many capabilities
to the privileged fort

00:27:38.590 --> 00:27:40.435
component.

00:27:40.435 --> 00:27:42.060
The problem here--
the reason that this

00:27:42.060 --> 00:27:44.030
might seem like you
want a separate process

00:27:44.030 --> 00:27:48.427
for every capability
is that we're

00:27:48.427 --> 00:27:51.010
sort of dealing with this weird
interface between capabilities

00:27:51.010 --> 00:27:52.450
and ambient privileges.

00:27:52.450 --> 00:27:54.780
Because fort1 sort of does
have ambient privilege.

00:27:54.780 --> 00:27:56.155
And what we're
doing is basically

00:27:56.155 --> 00:27:59.100
we're converting this ambient
privilege into capabilities

00:27:59.100 --> 00:28:00.890
in this fort1 process.

00:28:00.890 --> 00:28:02.580
So if you have multiple
different kinds

00:28:02.580 --> 00:28:05.035
of ambient privilege, or
multiple different privileges

00:28:05.035 --> 00:28:07.730
that you want to carefully
use, then maybe what you want

00:28:07.730 --> 00:28:10.320
is a separate process
holding that privilege.

00:28:10.320 --> 00:28:12.820
And whenever you want to use a
particular set of privileges,

00:28:12.820 --> 00:28:14.520
you'll ask the
corresponding process

00:28:14.520 --> 00:28:16.800
to please perform an operation.

00:28:16.800 --> 00:28:19.120
And if it succeeds, give
me back the capability.

00:28:19.120 --> 00:28:21.210
So that's maybe one
way to think of this.

00:28:24.000 --> 00:28:26.336
There have actually been some
operating system designs that

00:28:26.336 --> 00:28:30.770
are entirely capability-based,
there are no ambient privileges

00:28:30.770 --> 00:28:31.564
whatsoever.

00:28:31.564 --> 00:28:32.480
And it's kind of cool.

00:28:32.480 --> 00:28:35.961
Unfortunately, it's more of
sort of an interesting reading

00:28:35.961 --> 00:28:36.460
experience.

00:28:36.460 --> 00:28:37.905
Like oh, yeah, you can do it.

00:28:37.905 --> 00:28:38.920
That's pretty cool.

00:28:38.920 --> 00:28:42.680
But it's probably not
really practical to use

00:28:42.680 --> 00:28:45.540
in a real system, unfortunately.

00:28:45.540 --> 00:28:48.300
It turns out that you
really do want not so much

00:28:48.300 --> 00:28:51.200
ambient privilege but being
able to name an object

00:28:51.200 --> 00:28:53.960
and tell someone about an object
without conveying necessarily

00:28:53.960 --> 00:28:56.060
the rights to that object.

00:28:56.060 --> 00:28:57.670
So maybe I don't
know what privileges

00:28:57.670 --> 00:29:00.599
you might have over some
shared document, but I do

00:29:00.599 --> 00:29:02.890
want to tell you, hey, well,
there's a shared document.

00:29:02.890 --> 00:29:04.230
If you can read it, read it.

00:29:04.230 --> 00:29:05.605
If you can write it,
great, write it.

00:29:05.605 --> 00:29:07.830
But I don't want to
necessarily convey any rights.

00:29:07.830 --> 00:29:10.960
I just want to tell you, hey,
there's this thing, go try it.

00:29:10.960 --> 00:29:13.540
So it's a bit of a bummer
in a capability world

00:29:13.540 --> 00:29:16.930
that it really forces
you to never talk

00:29:16.930 --> 00:29:21.050
about objects without conveying
rights to that object.

00:29:21.050 --> 00:29:24.910
So it's an important
idea to know about,

00:29:24.910 --> 00:29:27.240
and to use it in some
parts of a system,

00:29:27.240 --> 00:29:29.639
but probably not the
be-all and end-all solution

00:29:29.639 --> 00:29:31.930
to security, much like almost
anything else [INAUDIBLE]

00:29:31.930 --> 00:29:33.419
about here.

00:29:33.419 --> 00:29:33.918
Make sense?

00:29:33.918 --> 00:29:34.810
Yeah?

00:29:34.810 --> 00:29:37.720
AUDIENCE: So if the process
has capabilities given to it

00:29:37.720 --> 00:29:40.811
by some other process,
and it happens

00:29:40.811 --> 00:29:43.395
to already have the capability
to that object, that's greater.

00:29:43.395 --> 00:29:45.269
Can it compare them to
make sure that they're

00:29:45.269 --> 00:29:46.629
about the same object?

00:29:46.629 --> 00:29:48.420
Or will it just use
the one that's greater?

00:29:48.420 --> 00:29:50.919
PROFESSOR: So the thing is that
a process doesn't implicitly

00:29:50.919 --> 00:29:51.794
use the capabilities.

00:29:51.794 --> 00:29:53.627
So that's the cool thing
about capabilities.

00:29:53.627 --> 00:29:55.760
You have to explicitly name
which one you're using.

00:29:55.760 --> 00:29:57.680
So think of it in terms
of file descriptors.

00:29:57.680 --> 00:30:01.820
Suppose that I give you an open
file descriptor for some file,

00:30:01.820 --> 00:30:02.807
and it's read only.

00:30:02.807 --> 00:30:04.890
And then someone else gives
you another capability

00:30:04.890 --> 00:30:07.431
for some other-- maybe the same
file, maybe a different file,

00:30:07.431 --> 00:30:08.760
and it's read/write.

00:30:08.760 --> 00:30:10.390
It's not all of a
sudden that if you're

00:30:10.390 --> 00:30:12.869
trying to write to the
first file descriptor

00:30:12.869 --> 00:30:14.660
you had that was read
only, all of a sudden

00:30:14.660 --> 00:30:16.390
those writes will start
succeeding because you

00:30:16.390 --> 00:30:19.270
have this extra writeable
file descriptor open.
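
NOTE A small Python check of the point above, assuming a POSIX system: holding a writable descriptor for a file does not make writes through a separate read-only descriptor succeed. The function name write_via_readonly_fd is hypothetical; on POSIX systems the failing write is expected to report EBADF.
```python
import errno
import os
import tempfile
def write_via_readonly_fd() -> int:
    """Return the errno from writing via a read-only fd while a writable fd is also open."""
    rw, path = tempfile.mkstemp()        # one capability: read/write
    ro = os.open(path, os.O_RDONLY)      # another capability: same file, read only
    try:
        os.write(rw, b"ok")              # allowed: this descriptor carries write rights
        try:
            os.write(ro, b"nope")        # the kernel checks only the descriptor we named
        except OSError as e:
            return e.errno
        return 0
    finally:
        os.close(ro)
        os.close(rw)
        os.unlink(path)
```
The caller always names one descriptor explicitly, so rights never leak from one capability to another.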

00:30:19.270 --> 00:30:21.407
So that's sort of
the cool thing.

00:30:21.407 --> 00:30:22.990
You don't want this
ambient privilege.

00:30:22.990 --> 00:30:24.920
Because if you think
of these capabilities

00:30:24.920 --> 00:30:27.245
as a bunch of privileges
that just keep accumulating

00:30:27.245 --> 00:30:29.190
in your process, then
you'll actually just

00:30:29.190 --> 00:30:30.690
end up with ambient
privilege again.

00:30:30.690 --> 00:30:32.849
You just have all these
magic capabilities,

00:30:32.849 --> 00:30:34.765
and people have actually
built such libraries.

00:30:34.765 --> 00:30:37.197
Basically, well, they manage
your capabilities for you.

00:30:37.197 --> 00:30:38.280
They sort of collect them.

00:30:38.280 --> 00:30:39.680
And when you try to
perform an operation,

00:30:39.680 --> 00:30:40.670
they look for the
capabilities and find

00:30:40.670 --> 00:30:42.250
the one that'll make it work.

00:30:42.250 --> 00:30:44.500
That exactly brings you back
to this ambient authority

00:30:44.500 --> 00:30:45.890
that you were trying to avoid.

00:30:45.890 --> 00:30:47.390
So the cool thing
about capabilities

00:30:47.390 --> 00:30:50.670
is that it's almost like
a programming construct,

00:30:50.670 --> 00:30:52.875
where it makes it
easy for you-- which

00:30:52.875 --> 00:30:54.875
is a rare thing in
security-- it makes it easier

00:30:54.875 --> 00:30:56.950
for you to write
code that specifies

00:30:56.950 --> 00:30:59.200
exactly what privileges you
want to use from a security

00:30:59.200 --> 00:30:59.700
standpoint.

00:30:59.700 --> 00:31:02.570
And it's actually fairly
natural code to write.

00:31:02.570 --> 00:31:05.280
So if you get into that mindset
of always carrying around

00:31:05.280 --> 00:31:07.450
this privilege with the
object you're accessing,

00:31:07.450 --> 00:31:09.210
it seems like a
cool thing to do.

00:31:09.210 --> 00:31:12.750
It doesn't always make
sense, but sometimes it does.

00:31:12.750 --> 00:31:16.070
Any other questions?

00:31:16.070 --> 00:31:16.640
OK.

00:31:16.640 --> 00:31:20.150
So that's more on
the ambient authority

00:31:20.150 --> 00:31:21.730
that we've looked at here.

00:31:21.730 --> 00:31:23.640
It turns out that
capabilities are also

00:31:23.640 --> 00:31:26.100
great for other
problems, as well.

00:31:26.100 --> 00:31:30.000
And in particular, the
problem of managing privileges

00:31:30.000 --> 00:31:33.700
often shows up when you want
to run some untrustworthy code.

00:31:33.700 --> 00:31:35.370
Because you want
to really control

00:31:35.370 --> 00:31:37.280
which privileges you
give it, because you

00:31:37.280 --> 00:31:40.590
think it will misuse any
privileges you give it at all.

00:31:40.590 --> 00:31:44.150
And this is the slightly
different point of view

00:31:44.150 --> 00:31:46.960
from which the authors
of the Capsicum paper

00:31:46.960 --> 00:31:50.640
are coming at capabilities.

00:31:50.640 --> 00:31:53.575
So they're of course clearly
aware of this ambient authority

00:31:53.575 --> 00:31:55.450
problem, but it's sort
of a different problem

00:31:55.450 --> 00:31:57.720
that you might or might
not care about solving.

00:31:57.720 --> 00:32:00.960
But the particular thing
they really care about

00:32:00.960 --> 00:32:04.776
is they have a really large
privileged application,

00:32:04.776 --> 00:32:06.150
and they worry
that there's going

00:32:06.150 --> 00:32:10.480
to be bugs in different parts
of that application source code.

00:32:10.480 --> 00:32:12.900
So they would like to
reduce the privileges

00:32:12.900 --> 00:32:16.380
of different components
of that application.

00:32:16.380 --> 00:32:20.480
So in that sense, the story
is very similar to OKWS.

00:32:20.480 --> 00:32:24.459
So you have-- for
sandboxing, you

00:32:24.459 --> 00:32:27.000
have some large application,
you break it up into components,

00:32:27.000 --> 00:32:30.270
and you will limit what
privileges each component has.

00:32:30.270 --> 00:32:31.520
So where does this make sense?

00:32:31.520 --> 00:32:34.140
Like OKWS is
clearly one example.

00:32:34.140 --> 00:32:36.010
What are other situations
where you might

00:32:36.010 --> 00:32:40.280
care about privilege separation?

00:32:40.280 --> 00:32:43.707
Well, I guess in the paper
they describe the examples they

00:32:43.707 --> 00:32:44.540
actually got to run.

00:32:44.540 --> 00:32:48.320
So things like tcpdump
and other applications

00:32:48.320 --> 00:32:50.285
that parse network data.

00:32:50.285 --> 00:32:53.890
So why do they worry so
much about applications

00:32:53.890 --> 00:32:56.000
that parse network inputs?

00:32:56.000 --> 00:32:57.580
What goes wrong in tcpdump?

00:32:57.580 --> 00:32:58.656
Why are they so paranoid?

00:32:58.656 --> 00:33:01.036
AUDIENCE: Well, an attacker
can control what's being sent

00:33:01.036 --> 00:33:01.988
and what's being called.

00:33:01.988 --> 00:33:02.470
PROFESSOR: Yeah.

00:33:02.470 --> 00:33:04.020
I think what they
really worry about is,

00:33:04.020 --> 00:33:06.603
very much like with OKWS, they
worry about that attack surface

00:33:06.603 --> 00:33:08.900
and how much can an attacker
really control the inputs?

00:33:08.900 --> 00:33:11.970
And with these network
parsing programs,

00:33:11.970 --> 00:33:14.698
there's a lot of control
that the attacker has.

00:33:14.698 --> 00:33:16.100
They have the exact packet.

00:33:16.100 --> 00:33:18.355
And the reason that
this was so problematic

00:33:18.355 --> 00:33:21.400
is that if you're
writing code in C that

00:33:21.400 --> 00:33:23.920
has to parse data
structures, you're presumably

00:33:23.920 --> 00:33:26.100
going to do lots of
pointer manipulations,

00:33:26.100 --> 00:33:28.830
copying bytes into
arrays, allocating memory.

00:33:28.830 --> 00:33:32.450
And as you are now experts,
this is super fragile.

00:33:32.450 --> 00:33:34.875
And you can easily have
memory management errors

00:33:34.875 --> 00:33:38.155
that lead to pretty
disastrous consequences.

00:33:38.155 --> 00:33:39.530
So this is the
reason why they're

00:33:39.530 --> 00:33:43.990
very excited about sandboxing
various network protocol

00:33:43.990 --> 00:33:45.790
parsing things, et cetera.

00:33:45.790 --> 00:33:47.850
Another probably
real world instance

00:33:47.850 --> 00:33:50.070
where you really care about
this is in your browser.

00:33:50.070 --> 00:33:52.070
You probably want to
sandbox your Flash plug-in,

00:33:52.070 --> 00:33:54.960
or your Java
extension, or whatnot.

00:33:54.960 --> 00:33:56.570
Because they're
pretty large attack

00:33:56.570 --> 00:33:58.430
surfaces as well
that have gotten

00:33:58.430 --> 00:34:01.352
exploited pretty aggressively.

00:34:01.352 --> 00:34:02.810
So it seems like
a reasonable plan.

00:34:02.810 --> 00:34:04.726
Like if you're writing
some piece of software,

00:34:04.726 --> 00:34:06.980
you want to sandbox
different components of it.

00:34:06.980 --> 00:34:08.790
What about more generally,
if you download something

00:34:08.790 --> 00:34:10.498
from the internet,
and you want to run it

00:34:10.498 --> 00:34:12.889
with fewer privileges?

00:34:12.889 --> 00:34:16.989
Is this sort of Capsicum style
isolation a good plan for that?

00:34:16.989 --> 00:34:19.500
I could download some random
screensaver or some game

00:34:19.500 --> 00:34:20.290
from the internet.

00:34:20.290 --> 00:34:21.590
And I want to run
it on my computer,

00:34:21.590 --> 00:34:23.381
and I want to make sure
it doesn't screw up

00:34:23.381 --> 00:34:24.690
whatever I have lying around.

00:34:27.802 --> 00:34:28.760
Would you use Capsicum?

00:34:28.760 --> 00:34:31.588
Would this be a good plan?

00:34:31.588 --> 00:34:33.046
Yeah?

00:34:33.046 --> 00:34:35.476
AUDIENCE: You could write
a sandboxing program,

00:34:35.476 --> 00:34:38.878
which you'd use Capsicum
to sandbox [INAUDIBLE].

00:34:42.652 --> 00:34:43.360
PROFESSOR: Right.

00:34:43.360 --> 00:34:44.900
You could try to use Capsicum.

00:34:44.900 --> 00:34:46.150
So how would you use Capsicum?

00:34:46.150 --> 00:34:49.380
Well, you'd just enter into the
sandbox mode with cap_enter.

00:34:49.380 --> 00:34:53.330
And then you run the program.

00:34:53.330 --> 00:34:54.514
Would you expect it to work?

00:34:56.887 --> 00:34:59.220
I guess the problem is that
if the program wasn't really

00:34:59.220 --> 00:35:01.155
expecting to be
sandboxed with Capsicum,

00:35:01.155 --> 00:35:04.920
then all of a sudden the
program will try to open any

00:35:04.920 --> 00:35:07.460
simplified-- it'll
open a shared library,

00:35:07.460 --> 00:35:09.430
and it can't open
the shared library,

00:35:09.430 --> 00:35:11.570
because it can't
open /lib/ something else.

00:35:11.570 --> 00:35:13.810
That's not allowed
in capability mode.
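
NOTE A toy Python model of why an unmodified program breaks after cap_enter. ToySandboxedProcess and its methods are hypothetical and are not the FreeBSD API; the sketch only mirrors the rule that path lookups in the global namespace are refused in capability mode, while descriptors opened beforehand keep working. The library path is made up for the example.
```python
class ToySandboxedProcess:
    """Toy model of capability mode: just the access rule, nothing more."""
    def __init__(self):
        self.capability_mode = False
        self.fds = {}        # descriptor number -> name of the open object
        self.next_fd = 3     # 0, 1, 2 are conventionally stdin, stdout, stderr
    def open(self, path: str) -> int:
        # Opening by global path name is refused once the process is sandboxed.
        if self.capability_mode:
            raise PermissionError("not permitted in capability mode: " + path)
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = path
        return fd
    def cap_enter(self) -> None:
        self.capability_mode = True   # one-way switch: this model has no way back
    def read(self, fd: int) -> str:
        # Descriptors acquired before cap_enter remain usable afterwards.
        return "contents of " + self.fds[fd]
```
A program written for this mode opens what it needs first and only then calls cap_enter; one that tries to load a shared library afterwards hits the PermissionError.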

00:35:13.810 --> 00:35:16.790
So it's a bit of a problem.

00:35:16.790 --> 00:35:18.800
So typically, these
sandboxing techniques

00:35:18.800 --> 00:35:21.685
that we're going to look at
here-- capability-style

00:35:21.685 --> 00:35:24.850
stuff, and so on--
really are best

00:35:24.850 --> 00:35:27.400
used when the developer
is sort of building

00:35:27.400 --> 00:35:30.110
the application aware
that the code is

00:35:30.110 --> 00:35:31.882
going to run in this mode.

00:35:31.882 --> 00:35:34.260
There's probably other kinds
of sandboxing techniques

00:35:34.260 --> 00:35:36.550
that could be used
for unmodified code,

00:35:36.550 --> 00:35:40.270
but then the focus, or the
requirements, change a bit.

00:35:40.270 --> 00:35:42.410
So in Capsicum,
they don't really

00:35:42.410 --> 00:35:43.910
worry about backwards
compatibility.

00:35:43.910 --> 00:35:45.320
Well, we have to open
files differently?

00:35:45.320 --> 00:35:46.770
Sure, we'll open
them differently.

00:35:46.770 --> 00:35:48.820
Whereas, if you want
to run existing code,

00:35:48.820 --> 00:35:51.330
you probably want
something more like maybe

00:35:51.330 --> 00:35:52.450
a full virtual machine.

00:35:52.450 --> 00:35:55.040
So you could open a
VM and run it there.

00:35:55.040 --> 00:35:58.400
And it's very
compatible, and there's

00:35:58.400 --> 00:36:03.440
no question that it'll just
run, and probably not--

00:36:03.440 --> 00:36:07.060
Well, it's actually a
good thought exercise.

00:36:07.060 --> 00:36:11.970
Should we use virtual machines
to sandbox instead of Capsicum?

00:36:11.970 --> 00:36:12.886
AUDIENCE: [INAUDIBLE].

00:36:12.886 --> 00:36:13.690
PROFESSOR: Yeah.

00:36:13.690 --> 00:36:16.510
The overheads are probably
quite significant.

00:36:16.510 --> 00:36:20.715
So the memory overhead
is pretty bad.

00:36:20.715 --> 00:36:21.325
It could be.

00:36:21.325 --> 00:36:22.900
But what if we don't care
about memory overhead?

00:36:22.900 --> 00:36:24.691
So maybe virtual machines
get really good,

00:36:24.691 --> 00:36:28.080
and they don't actually
use that much memory.

00:36:28.080 --> 00:36:30.210
Is it still a bad plan?

00:36:30.210 --> 00:36:32.708
AUDIENCE: [INAUDIBLE].

00:36:32.708 --> 00:36:33.374
PROFESSOR: Yeah.

00:36:33.374 --> 00:36:37.160
So it's kind of hard to control
what happens on the network,

00:36:37.160 --> 00:36:40.150
because either you give the
virtual machine no access

00:36:40.150 --> 00:36:42.570
to the network at all, or
you connect to a network

00:36:42.570 --> 00:36:45.800
through NAT mode or something
in Preview or VMware.

00:36:45.800 --> 00:36:47.550
And then it can access
the whole internet.

00:36:47.550 --> 00:36:52.652
So you have to much more
explicitly control network

00:36:52.652 --> 00:36:55.110
by maybe setting up firewall
rules for the virtual machine,

00:36:55.110 --> 00:36:55.797
et cetera.

00:36:55.797 --> 00:36:56.880
That's maybe not so great.

00:36:56.880 --> 00:36:58.890
What if you don't
care about network?

00:36:58.890 --> 00:37:04.240
What if you're some simple
video or tcpdump parser.

00:37:04.240 --> 00:37:05.260
You just spin up a VM.

00:37:05.260 --> 00:37:07.000
It's going to parse
your tcpdump packets

00:37:07.000 --> 00:37:09.490
and spit you back
the representation

00:37:09.490 --> 00:37:11.850
that tcpdump wants
to print to the user.

00:37:11.850 --> 00:37:14.190
So there's no real
network I/O. Maybe you're,

00:37:14.190 --> 00:37:20.820
for some reason
[INAUDIBLE] still?

00:37:20.820 --> 00:37:23.340
AUDIENCE: Because the
initialization overhead

00:37:23.340 --> 00:37:24.656
is still large.

00:37:24.656 --> 00:37:25.490
PROFESSOR: Yeah.

00:37:25.490 --> 00:37:27.823
So it's maybe like an initial
overhead of starting a VM.

00:37:27.823 --> 00:37:28.620
So that's true.

00:37:28.620 --> 00:37:32.030
There's some performance stuff.

00:37:32.030 --> 00:37:32.530
Yeah.

00:37:32.530 --> 00:37:34.780
AUDIENCE: Well, you might
want to have database writes

00:37:34.780 --> 00:37:35.762
and things like that.

00:37:35.762 --> 00:37:36.200
PROFESSOR: Yeah.

00:37:36.200 --> 00:37:38.158
But even more generally,
what you're getting at

00:37:38.158 --> 00:37:41.140
is what if there's real
data that you care about here?

00:37:41.140 --> 00:37:42.840
And it's really hard to share.

00:37:42.840 --> 00:37:45.990
So VMs are really
much more of a

00:37:45.990 --> 00:37:50.040
separation mechanism, where
you can't really share stuff

00:37:50.040 --> 00:37:51.970
across VMs very easily.

00:37:51.970 --> 00:37:53.640
So it's good for
situations where

00:37:53.640 --> 00:37:57.090
you have a very isolated program
you want to run, you basically

00:37:57.090 --> 00:37:59.470
don't want to share any
files, any directories,

00:37:59.470 --> 00:38:01.830
any processes, any pipes even.

00:38:01.830 --> 00:38:03.640
And you just let
it run separately.

00:38:03.640 --> 00:38:04.290
So it's great.

00:38:04.290 --> 00:38:07.340
It's probably, in some ways,
stronger isolation than what

00:38:07.340 --> 00:38:10.340
Capsicum provides, because
there's probably fewer

00:38:10.340 --> 00:38:12.865
ways for things to go wrong.

00:38:12.865 --> 00:38:14.240
And, you know,
all these problems

00:38:14.240 --> 00:38:15.640
we talked about so far.

00:38:15.640 --> 00:38:18.189
But it's also not applicable
in many of the situations

00:38:18.189 --> 00:38:19.730
where you might want
to use Capsicum,

00:38:19.730 --> 00:38:21.880
because in Capsicum,
you can actually

00:38:21.880 --> 00:38:26.645
share files at a very fine
granularity between sandbox

00:38:26.645 --> 00:38:30.342
[INAUDIBLE] by just giving it
capability to [INAUDIBLE] file.

00:38:30.342 --> 00:38:32.550
This is something that's
very easy to do in Capsicum,

00:38:32.550 --> 00:38:35.220
and would require quite
a bit of machinery

00:38:35.220 --> 00:38:37.280
in a virtual machine setting.

00:38:37.280 --> 00:38:40.720
That makes sense?

00:38:40.720 --> 00:38:43.200
Questions?

00:38:43.200 --> 00:38:44.330
All right.

00:38:44.330 --> 00:38:47.600
So does that seem like
a useful primitive

00:38:47.600 --> 00:38:49.340
to have to maybe sandbox stuff.

00:38:49.340 --> 00:38:53.040
So I guess we're going to
talk about different ways

00:38:53.040 --> 00:38:54.900
to try to sandbox something.

00:38:54.900 --> 00:38:58.060
And Capsicum in particular
is the new thing here

00:38:58.060 --> 00:38:59.270
that uses capabilities.

00:38:59.270 --> 00:39:05.810
But just by comparison,
I guess, you

00:39:05.810 --> 00:39:08.350
can do some sandboxing in
Unix, as we saw with OKWS.

00:39:08.350 --> 00:39:08.850
Right?

00:39:08.850 --> 00:39:13.170
It's just not great from
several standpoints.

00:39:13.170 --> 00:39:17.860
So let's maybe take
the example of tcpdump

00:39:17.860 --> 00:39:24.530
and see why tcpdump is difficult
to sandbox with Unix mechanisms.

00:39:24.530 --> 00:39:27.880
So remember, in the Capsicum
paper, these guys took tcpdump.

00:39:27.880 --> 00:39:32.570
And the way tcpdump
works is that it

00:39:32.570 --> 00:39:39.080
opens some special sockets and
then runs basically parsing

00:39:39.080 --> 00:39:41.010
logic on network packets.

00:39:41.010 --> 00:39:44.860
And it proceeds and prints them
out to the user's terminal.

00:39:44.860 --> 00:39:51.180
So what would it take to sandbox
tcpdump with Unix primitives?

00:39:51.180 --> 00:39:54.066
To run it with restricted privileges?

00:39:54.066 --> 00:39:55.870
So I guess the one
problem with Unix

00:39:55.870 --> 00:39:59.300
is that you basically have
to-- well, the only way

00:39:59.300 --> 00:40:01.890
to really change
privileges is to change

00:40:01.890 --> 00:40:04.152
the inputs into the
decision function that

00:40:04.152 --> 00:40:06.610
decides whether you can actually
access some object or not.

00:40:06.610 --> 00:40:09.160
And the only things
you can really change

00:40:09.160 --> 00:40:11.860
are, well, you can change
the privileges of the process,

00:40:11.860 --> 00:40:14.300
which means setting its
UID to something else.

00:40:14.300 --> 00:40:15.800
Or you could change
the permissions

00:40:15.800 --> 00:40:21.510
on various objects that are
laying around in your system.

00:40:21.510 --> 00:40:23.330
Or probably both,
in fact, right?

00:40:23.330 --> 00:40:25.110
If you wanted to
sandbox tcpdump,

00:40:25.110 --> 00:40:27.850
you'd probably have to
pick some extra user ID

00:40:27.850 --> 00:40:31.612
and switch to that
while you're running.

00:40:31.612 --> 00:40:36.660
Probably not an ideal
plan, because you probably

00:40:36.660 --> 00:40:39.340
don't mean for multiple
instances of tcpdump

00:40:39.340 --> 00:40:41.049
to run as the same user ID.

00:40:41.049 --> 00:40:42.840
So if I compromise one
instance of tcpdump,

00:40:42.840 --> 00:40:45.307
it doesn't really mean I
want to allow that attacker

00:40:45.307 --> 00:40:47.515
to now control the other
instances of tcpdump running

00:40:47.515 --> 00:40:49.070
on my machine.

00:40:49.070 --> 00:40:53.614
So that's potentially a bad
part of using user IDs here.

00:40:53.614 --> 00:40:55.530
Another problem is that,
in Unix, you actually

00:40:55.530 --> 00:40:58.924
have to be root in
order to change the user

00:40:58.924 --> 00:41:01.215
ID of the process to something
else, or change its privileges

00:41:01.215 --> 00:41:03.200
or switch them to
something else.

00:41:03.200 --> 00:41:05.060
That's not great either.

00:41:05.060 --> 00:41:08.080
And another problem
is that, regardless

00:41:08.080 --> 00:41:11.700
of what your user ID
is, there could be files

00:41:11.700 --> 00:41:13.830
that allow access to them.

00:41:13.830 --> 00:41:16.760
So there could be world
writable or world readable files

00:41:16.760 --> 00:41:17.800
in your file system.

00:41:17.800 --> 00:41:19.730
Like your /etc/passwd file.

00:41:19.730 --> 00:41:22.370
Regardless of what your
UID is, the process

00:41:22.370 --> 00:41:24.420
will still be able to
read that password file.

00:41:24.420 --> 00:41:26.070
So that's not so nice.

00:41:26.070 --> 00:41:29.850
So as a result, in order
to sandbox in Unix,

00:41:29.850 --> 00:41:36.257
you probably have to do both--
some UID changing and maybe

00:41:36.257 --> 00:41:38.340
careful look at the
permissions of all the objects

00:41:38.340 --> 00:41:40.507
to convince yourself that
there's no world writeable

00:41:40.507 --> 00:41:41.714
file that's really sensitive.

00:41:41.714 --> 00:41:43.130
Or there's no
world readable file

00:41:43.130 --> 00:41:45.742
that you don't want that
hacker to get access to.
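
NOTE Editor's sketch, not from the lecture. One way to do the permission audit just described
is to walk a directory tree and flag every regular file that any UID can read or write.
The function name and the tuple layout are assumptions made for illustration only.
```python
import os
import stat
def find_world_accessible(root):
    """Return sorted (path, world_readable, world_writable) tuples for regular files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # the file vanished or cannot be examined; skip it
            if not stat.S_ISREG(mode):
                continue  # this sketch reports regular files only
            readable = bool(mode & stat.S_IROTH)  # "other" read bit
            writable = bool(mode & stat.S_IWOTH)  # "other" write bit
            if readable or writable:
                hits.append((path, readable, writable))
    return sorted(hits)
```
Even a clean report only describes the moment of the scan, which is part of why this approach is fragile.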

00:41:45.742 --> 00:41:48.200
And I guess [INAUDIBLE] true
that you get another mechanism

00:41:48.200 --> 00:41:49.530
in Unix that you can use.

00:41:49.530 --> 00:41:50.920
But it all starts to add up.

00:41:50.920 --> 00:41:52.420
If you see it
through, then it might

00:41:52.420 --> 00:41:56.681
be hard to share files or
share directories and so on.

00:41:56.681 --> 00:41:57.680
So does that make sense?

00:41:57.680 --> 00:42:00.466
Just in terms of
contrast for what

00:42:00.466 --> 00:42:02.393
Capsicum is trying to solve?

00:42:02.393 --> 00:42:06.160
Any questions about Unix stuff?

00:42:06.160 --> 00:42:06.950
All right.

00:42:06.950 --> 00:42:10.650
So let's look at how Capsicum
tries to solve this problem.

00:42:10.650 --> 00:42:13.680
So in Capsicum, as
we keep alluding to,

00:42:13.680 --> 00:42:18.330
the plan is very much that once
you enter the sandboxing mode,

00:42:18.330 --> 00:42:20.879
everything is going
to be accessed only

00:42:20.879 --> 00:42:21.670
through capabilities.

00:42:21.670 --> 00:42:23.490
So if you don't
have a capability,

00:42:23.490 --> 00:42:27.610
you simply cannot
access any objects.

00:42:27.610 --> 00:42:32.000
So these guys, in the
paper, make a huge deal

00:42:32.000 --> 00:42:34.870
about global namespaces.

00:42:34.870 --> 00:42:37.720
So what's this thing
about a global namespace,

00:42:37.720 --> 00:42:39.460
and why are they so
worried about it?

00:42:43.155 --> 00:42:44.780
What's an example of
a global namespace

00:42:44.780 --> 00:42:47.379
these guys worry about?

00:42:47.379 --> 00:42:48.534
AUDIENCE: [INAUDIBLE].

00:42:48.534 --> 00:42:49.200
PROFESSOR: Yeah.

00:42:49.200 --> 00:42:51.634
So a file system for them
is sort of the prime example

00:42:51.634 --> 00:42:52.550
of a global namespace.

00:42:52.550 --> 00:42:55.420
You can start at slash, and
you can basically enumerate

00:42:55.420 --> 00:42:56.690
any file you could, right?

00:42:56.690 --> 00:42:59.450
Like go to someone's
home directory--

00:42:59.450 --> 00:43:03.748
/home/nickolai/
something, something.

00:43:03.748 --> 00:43:04.860
Why is this bad?

00:43:04.860 --> 00:43:08.470
Why are they against global
namespaces in Capsicum?

00:43:14.350 --> 00:43:15.100
What do you think?

00:43:15.100 --> 00:43:15.460
Yeah?

00:43:15.460 --> 00:43:17.543
AUDIENCE: Well, if you
have the wrong permissions,

00:43:17.543 --> 00:43:20.534
then use authorities, and
then you can get in trouble.

00:43:20.534 --> 00:43:21.200
PROFESSOR: Yeah.

00:43:21.200 --> 00:43:23.116
So the problem is that
this is Unix after all.

00:43:23.116 --> 00:43:27.370
So there are still regular
permissions on files.

00:43:27.370 --> 00:43:29.790
So maybe you really want
to sandbox some process

00:43:29.790 --> 00:43:31.804
so it can't read anything
at all in the system

00:43:31.804 --> 00:43:32.970
and can't write to anything.

00:43:32.970 --> 00:43:35.530
But if you can name a file
starting from slash,

00:43:35.530 --> 00:43:38.060
you'll find some stupid user
that has a world writable

00:43:38.060 --> 00:43:39.970
file in their home directory.

00:43:39.970 --> 00:43:43.874
And that would be not so great
for the sandboxing client.

00:43:43.874 --> 00:43:46.290
And I guess more generally,
the way they're thinking of it

00:43:46.290 --> 00:43:50.430
is that, with capabilities, you
could, in principle, enumerate

00:43:50.430 --> 00:43:53.122
exactly all the objects
that a process has.

00:43:53.122 --> 00:43:56.030
Because you could just
enumerate all the capabilities

00:43:56.030 --> 00:43:58.350
in the file descriptor table,
or whatever it is that's

00:43:58.350 --> 00:44:00.250
storing capabilities for you.

00:44:00.250 --> 00:44:03.970
And those are the only things
that the process could ever

00:44:03.970 --> 00:44:05.734
touch.
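
NOTE Editor's sketch, not from the lecture. If descriptors are the only capabilities, then
listing a process's open descriptors lists everything it can touch. The helper below probes
descriptor numbers with fstat; its name and the default limit are assumptions for illustration.
```python
import os
def open_descriptors(limit=256):
    """Return the descriptor numbers below limit that are currently open in this process."""
    fds = []
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # nothing is open at this number
        fds.append(fd)
    return fds
```
Outside capability mode this list is not a bound, because the process can still name new objects starting from slash.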

00:44:05.734 --> 00:44:07.900
And if you ever have access
to a global namespace,

00:44:07.900 --> 00:44:09.090
then this is
potentially unbounded.

00:44:09.090 --> 00:44:10.540
Because you could--
even if you have

00:44:10.540 --> 00:44:11.920
some limited set
of capabilities,

00:44:11.920 --> 00:44:14.850
maybe you'll start from slash
again and find some new file,

00:44:14.850 --> 00:44:16.510
and you'll never
really know what

00:44:16.510 --> 00:44:19.745
is the set of
operations or objects

00:44:19.745 --> 00:44:22.120
that a process could access.

00:44:22.120 --> 00:44:25.370
So this is the reason they're so
worried about global namespaces

00:44:25.370 --> 00:44:28.775
because it goes against their
goal of precisely controlling

00:44:28.775 --> 00:44:33.880
all the things that a sandbox
process should have access to.

00:44:33.880 --> 00:44:36.440
Make sense?

00:44:36.440 --> 00:44:37.590
All right.

00:44:37.590 --> 00:44:39.850
So they tried to eliminate
global namespaces

00:44:39.850 --> 00:44:44.590
with a bunch of kernel changes
to the FreeBSD, in their case,

00:44:44.590 --> 00:44:47.960
kernel to make sure that
all the operations go

00:44:47.960 --> 00:44:52.220
through some kind of capability,
which is, in their case,

00:44:52.220 --> 00:44:54.190
a file descriptor.

00:44:54.190 --> 00:44:57.800
So just to double check, do
we really need kernel changes?

00:44:57.800 --> 00:45:00.350
What if we just do
this in a library?

00:45:00.350 --> 00:45:03.040
So we implement Capsicum, for which
they already have a library.

00:45:03.040 --> 00:45:05.700
And all we do is we change
all these functions,

00:45:05.700 --> 00:45:08.590
like open, read, and write,
to all exclusively use

00:45:08.590 --> 00:45:09.927
capabilities.

00:45:09.927 --> 00:45:12.010
So all operations will go
through some capability,

00:45:12.010 --> 00:45:16.193
and look it up in the
file table, et cetera.

00:45:16.193 --> 00:45:17.140
Does that work?

00:45:17.140 --> 00:45:17.640
Yeah?

00:45:17.640 --> 00:45:19.730
AUDIENCE: You could
always make a sys call.

00:45:19.730 --> 00:45:20.010
PROFESSOR: Yeah.

00:45:20.010 --> 00:45:22.551
So the problem is that there
is this existing set of system

00:45:22.551 --> 00:45:23.866
calls the kernel will accept.

00:45:23.866 --> 00:45:25.866
And even if you
implement a nice library,

00:45:25.866 --> 00:45:28.240
it doesn't prevent a bad
process or a compromised process

00:45:28.240 --> 00:45:29.656
from making the
sys call directly.

00:45:29.656 --> 00:45:32.270
And then you have to
have the kernel enforce

00:45:32.270 --> 00:45:33.786
something or other.

00:45:33.786 --> 00:45:34.286
Yeah?

00:45:34.286 --> 00:45:36.724
AUDIENCE: [INAUDIBLE].

00:45:36.724 --> 00:45:37.390
PROFESSOR: Yeah.

00:45:37.390 --> 00:45:39.247
So I think it's a
question of-- I guess

00:45:39.247 --> 00:45:40.330
what is your threat model?

00:45:40.330 --> 00:45:40.830
Exactly.

00:45:40.830 --> 00:45:42.580
So for the compiler,
the threat model

00:45:42.580 --> 00:45:47.230
is that the programmer is
maybe not paying attention

00:45:47.230 --> 00:45:50.240
a whole lot, but it's not really
a compromised compiler process,

00:45:50.240 --> 00:45:51.710
not running arbitrary code.

00:45:51.710 --> 00:45:54.750
So if we just help the
well-meaning developer do

00:45:54.750 --> 00:45:58.590
the right thing, then a
library will probably suffice.

00:45:58.590 --> 00:46:00.990
On the other hand, if we're
talking about a process that

00:46:00.990 --> 00:46:03.110
could be executing
arbitrary code

00:46:03.110 --> 00:46:05.610
and could be trying to
bypass our mechanisms

00:46:05.610 --> 00:46:07.210
in any possible
way, then we have

00:46:07.210 --> 00:46:09.370
to have a strong
enforcement boundary.

00:46:09.370 --> 00:46:12.160
And a library doesn't provide
any kind of strong enforcement

00:46:12.160 --> 00:46:12.660
guarantees.

00:46:12.660 --> 00:46:16.311
Whereas a kernel, in
our case, would do that.

00:46:16.311 --> 00:46:16.810
OK.

00:46:16.810 --> 00:46:20.805
So what do they actually make in
terms of changes to the kernel?

00:46:20.805 --> 00:46:25.270
So I guess the first
thing is this system call

00:46:25.270 --> 00:46:26.780
that they call cap_enter.

00:46:30.750 --> 00:46:33.049
And what happens once
you run cap_enter?

00:46:33.049 --> 00:46:35.215
Once you've [INAUDIBLE]
cap_enter from your process?

00:46:38.309 --> 00:46:39.850
So as far as I can
tell, what happens

00:46:39.850 --> 00:46:44.950
is that the kernel will stop
accepting any system calls that

00:46:44.950 --> 00:46:47.635
refer to global namespaces.

00:46:47.635 --> 00:46:49.260
And the only thing
you'll be able to do

00:46:49.260 --> 00:46:52.650
is refer to existing
file descriptors

00:46:52.650 --> 00:46:54.810
that you have open
in your process.

00:46:54.810 --> 00:46:58.340
So cap_enter will put your
process in a special mode where

00:46:58.340 --> 00:47:02.265
you cannot use the regular
system call open,

00:47:02.265 --> 00:47:06.059
and instead you have to
do things like openat.

00:47:06.059 --> 00:47:07.475
So there's this
new sort of family

00:47:07.475 --> 00:47:10.830
of system calls, in Unix-like
operating systems, where

00:47:10.830 --> 00:47:13.280
instead of having open
take a single path name,

00:47:13.280 --> 00:47:15.850
you can actually
use openat, where

00:47:15.850 --> 00:47:17.560
you pass it a first
argument which

00:47:17.560 --> 00:47:20.110
is a file descriptor
for a directory

00:47:20.110 --> 00:47:23.640
and the second is
some sort of a name.

00:47:23.640 --> 00:47:27.610
And the openat system
call will open this name

00:47:27.610 --> 00:47:31.250
relative to whatever directory
the file descriptor points to.
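
NOTE Editor's sketch, not from the lecture. Python exposes openat through the dir_fd argument
of os.open, so the call shape described here can be tried from user space. The helper names are
assumptions for illustration. This alone does not enter capability mode, and an absolute name
would still ignore dir_fd; refusing such names is the kernel's job in Capsicum.
```python
import os
def open_relative(dir_fd, name, flags=os.O_RDONLY):
    """Open name relative to an already-open directory descriptor (the openat system call)."""
    return os.open(name, flags, dir_fd=dir_fd)
def read_relative(dir_fd, name):
    """Read a whole file that is named relative to dir_fd."""
    fd = open_relative(dir_fd, name)
    with os.fdopen(fd, "rb") as f:  # fdopen takes ownership and closes fd on exit
        return f.read()
```
The directory descriptor plays the role of the capability: without it there is nothing to resolve the name against.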

00:47:31.250 --> 00:47:33.430
So this is a much more
capability-like version

00:47:33.430 --> 00:47:36.930
of open, where you can still
have file descriptors pointing

00:47:36.930 --> 00:47:42.580
to directories, but
you can-- well, sorry.

00:47:42.580 --> 00:47:44.795
You can still direct
your operation.

00:47:44.795 --> 00:47:46.170
But in order to
do this, you have

00:47:46.170 --> 00:47:47.872
to have a capability
to the directory

00:47:47.872 --> 00:47:49.830
in the form of an open
file descriptor for that

00:47:49.830 --> 00:47:51.200
[INAUDIBLE].

00:47:51.200 --> 00:47:53.944
Make sense?

00:47:53.944 --> 00:47:55.290
OK.

00:47:55.290 --> 00:47:58.480
So do they need any
other kernel changes?

00:47:58.480 --> 00:48:00.630
Is there anything
else they worry about?

00:48:04.520 --> 00:48:06.086
So I guess there's
another-- yeah?

00:48:06.086 --> 00:48:07.650
AUDIENCE: [INAUDIBLE].

00:48:07.650 --> 00:48:08.316
PROFESSOR: Yeah.

00:48:08.316 --> 00:48:10.274
So what do they do about
network access, right?

00:48:10.274 --> 00:48:12.073
So what happens in
capability mode?

00:48:12.073 --> 00:48:14.281
AUDIENCE: I guess they have
capabilities for security

00:48:14.281 --> 00:48:17.365
packets [INAUDIBLE].

00:48:17.365 --> 00:48:17.990
PROFESSOR: Yes.

00:48:17.990 --> 00:48:19.365
So I think the
way they basically

00:48:19.365 --> 00:48:22.682
do it is that they treat the
network as a global namespace,

00:48:22.682 --> 00:48:23.890
very much like a file system.

00:48:23.890 --> 00:48:28.020
So I think once you
enter capability mode,

00:48:28.020 --> 00:48:30.660
you cannot create a new socket.

00:48:30.660 --> 00:48:33.320
Or you cannot create a new
socket and connect to some

00:48:33.320 --> 00:48:36.321
arbitrary machine, or to some
arbitrary address or port

00:48:36.321 --> 00:48:36.820
number.

00:48:36.820 --> 00:48:40.710
You have to basically create all
the connections you want ahead

00:48:40.710 --> 00:48:42.420
of time and fill them
in as capabilities.

00:48:42.420 --> 00:48:44.670
Or maybe you'd have to get
them from someone that will

00:48:44.670 --> 00:48:46.185
pass you a file descriptor.
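
NOTE Editor's sketch, not from the lecture. Handing a descriptor to someone else is ordinary
Unix-domain socket descriptor passing. The sketch uses socket.send_fds and socket.recv_fds
(Python 3.9 or later) and keeps both ends in one process for brevity; in practice the two ends
would be different processes. The function name is an assumption for illustration.
```python
import socket
def pass_descriptor(fd):
    """Send fd across a Unix-domain socket pair and return the receiver's copy of it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        socket.send_fds(sender, [b"x"], [fd])  # one byte of payload carries the descriptor
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
        return fds[0]
    finally:
        sender.close()
        receiver.close()
```
The receiver gets a new descriptor number that refers to the same open object, which is what makes a descriptor usable as a transferable capability.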

00:48:46.185 --> 00:48:48.655
But basically, once
you're in capability mode,

00:48:48.655 --> 00:48:51.280
the set of file descriptors you
have open completely enumerates

00:48:51.280 --> 00:48:52.821
all the machines
you'll ever talk to.

00:48:52.821 --> 00:48:54.430
So you can have
open connections.

00:48:54.430 --> 00:48:55.846
Maybe you're
listening on a port.

00:48:55.846 --> 00:48:57.050
That's OK.

00:48:57.050 --> 00:48:59.790
But you cannot connect
to an address specified

00:48:59.790 --> 00:49:02.453
by an absolute name, kind of
like a global namespace would

00:49:02.453 --> 00:49:03.866
allow you to do it.

00:49:03.866 --> 00:49:05.150
That make sense?

00:49:05.150 --> 00:49:09.310
So it's access through the
networking namespace, as well.

00:49:09.310 --> 00:49:11.840
What do they do for processes?

00:49:11.840 --> 00:49:14.400
So another global
namespace, I guess, in Unix,

00:49:14.400 --> 00:49:16.670
is the PIDs themselves.

00:49:16.670 --> 00:49:18.875
So the example of a
system call that operates

00:49:18.875 --> 00:49:20.090
in this namespace is "kill."

00:49:20.090 --> 00:49:22.549
So I could kill PID 25.

00:49:22.549 --> 00:49:24.840
And I could-- well, presumably
I'll put a signal number

00:49:24.840 --> 00:49:26.110
in there, too.

00:49:26.110 --> 00:49:31.040
But I could actually kill a
process by its PID number.

00:49:31.040 --> 00:49:35.320
How do they fix
this in Capsicum?

00:49:35.320 --> 00:49:36.130
What's their plan?

00:49:41.553 --> 00:49:42.269
Yeah?

00:49:42.269 --> 00:49:44.018
AUDIENCE: File descriptors
with processes.

00:49:44.018 --> 00:49:44.520
PROFESSOR: Yeah.

00:49:44.520 --> 00:49:45.130
It's actually kind of cool.

00:49:45.130 --> 00:49:47.300
It's like, I wish Unix
had this all along.

00:49:47.300 --> 00:49:50.640
Which is that, instead of
having these different kinds

00:49:50.640 --> 00:49:54.630
of numbers or PIDs, instead,
when you fork off a process,

00:49:54.630 --> 00:49:56.620
you actually have a
new variant of fork

00:49:56.620 --> 00:50:01.300
called pdfork, or
Process Descriptor Fork.

00:50:01.300 --> 00:50:04.560
And what it does is when
it creates a child process,

00:50:04.560 --> 00:50:07.700
it actually sticks a reference
to that child process

00:50:07.700 --> 00:50:10.320
into your file descriptor
table somewhere.

00:50:10.320 --> 00:50:11.730
And this is your new process.

00:50:11.730 --> 00:50:13.700
And you can operate
on a child process

00:50:13.700 --> 00:50:15.409
by specifying the file
descriptor number.

00:50:15.409 --> 00:50:17.491
Well, it would be pretty
cool, because you can now

00:50:17.491 --> 00:50:19.550
pass your child
process to someone else

00:50:19.550 --> 00:50:21.580
and say, well, if you
can go and kill them now,

00:50:21.580 --> 00:50:24.230
or you can manage this
process however you want,

00:50:24.230 --> 00:50:26.560
you'll get notifications
when the process dies.

00:50:26.560 --> 00:50:31.000
It'll look like a readable
file descriptor, et cetera.

00:50:31.000 --> 00:50:34.530
So they really try to
homogenize everything

00:50:34.530 --> 00:50:38.930
into looking like a file
descriptor of some sort here.

00:50:38.930 --> 00:50:40.695
And with these
kernel changes, you

00:50:40.695 --> 00:50:43.300
can finally have all
the functionalities

00:50:43.300 --> 00:50:44.330
you might care about.

00:50:44.330 --> 00:50:46.110
You have the support
for sockets already,

00:50:46.110 --> 00:50:48.160
process descriptors, et cetera.

00:50:48.160 --> 00:50:52.350
And you have a way
of constraining

00:50:52.350 --> 00:50:53.840
what the process can do.

00:50:53.840 --> 00:50:56.470
Because it cannot refer to any
of the global names anymore

00:50:56.470 --> 00:50:59.690
after [INAUDIBLE].

00:50:59.690 --> 00:51:00.610
All right.

00:51:00.610 --> 00:51:03.050
Any questions?

00:51:03.050 --> 00:51:05.700
So here's an interesting puzzle.

00:51:05.700 --> 00:51:07.820
I was trying to
understand from the paper.

00:51:07.820 --> 00:51:10.410
They make a big
deal about dot dot

00:51:10.410 --> 00:51:12.820
in looking up directory names.

00:51:12.820 --> 00:51:16.210
So they basically say, well,
once you're in capability mode,

00:51:16.210 --> 00:51:19.430
when you pass a
particular name to openat,

00:51:19.430 --> 00:51:21.416
you cannot use dot
dot in those names.

00:51:21.416 --> 00:51:23.040
And presumably, if
you have a symlink,

00:51:23.040 --> 00:51:25.205
if a symlink's target
contains dot dot,

00:51:25.205 --> 00:51:28.780
they will reject it if
you're in capability mode.

00:51:28.780 --> 00:51:31.830
So is this strictly required?

00:51:31.830 --> 00:51:33.980
Could you imagine a
safe design in principle

00:51:33.980 --> 00:51:35.610
that allows the use of dot dot?

00:51:40.330 --> 00:51:41.040
Yeah.

00:51:41.040 --> 00:51:43.664
AUDIENCE: Well, you'd need to be
able to find whether they have

00:51:43.664 --> 00:51:46.892
a file or a capability that
allows access to the parent

00:51:46.892 --> 00:51:47.880
directory.

00:51:47.880 --> 00:51:48.640
PROFESSOR: Right.

00:51:48.640 --> 00:51:50.181
AUDIENCE: So it's
trivial to go down,

00:51:50.181 --> 00:51:52.908
because any subdirectory--
you already have access to it

00:51:52.908 --> 00:51:53.490
by having the capability.

00:51:53.490 --> 00:51:54.190
PROFESSOR: That's right.

00:51:54.190 --> 00:51:54.740
Yeah.

00:51:54.740 --> 00:51:56.050
AUDIENCE: But going
up, you need to see

00:51:56.050 --> 00:51:58.050
whether you have any
capabilities for the parent

00:51:58.050 --> 00:51:58.810
directory.

00:51:58.810 --> 00:51:59.810
PROFESSOR: That's right.

00:51:59.810 --> 00:52:00.300
Yeah.

00:52:00.300 --> 00:52:01.060
AUDIENCE: Search for it somehow.

00:52:01.060 --> 00:52:01.220
PROFESSOR: Yeah.

00:52:01.220 --> 00:52:01.955
So that's a little bit tricky.

00:52:01.955 --> 00:52:03.503
And also, it goes
against the grain

00:52:03.503 --> 00:52:06.490
of this whole explicit
authority thing.

00:52:06.490 --> 00:52:09.895
What about if you're
using dot dot inside sort

00:52:09.895 --> 00:52:11.415
of a single open call?

00:52:11.415 --> 00:52:15.800
So for example, what if you
call something like openat some

00:52:15.800 --> 00:52:18.050
particular directory or
file descriptor number,

00:52:18.050 --> 00:52:20.332
and you open something like,
I don't know, b/c/../..?

00:52:26.690 --> 00:52:28.910
In principle, this
might be safe, right?

00:52:28.910 --> 00:52:31.290
Because you go down some
directory, and then you just

00:52:31.290 --> 00:52:33.770
climb back up out of it.

00:52:33.770 --> 00:52:34.660
Yeah?

00:52:34.660 --> 00:52:36.824
AUDIENCE: What if
c is [INAUDIBLE]?

00:52:36.824 --> 00:52:37.490
PROFESSOR: Yeah.

00:52:37.490 --> 00:52:38.560
So it's a little bit
tricky, of course,

00:52:38.560 --> 00:52:40.570
to define exactly what
it means to be safe.

00:52:40.570 --> 00:52:41.070
Right?

00:52:41.070 --> 00:52:44.350
You probably have to make sure
that c isn't a symlink that

00:52:44.350 --> 00:52:46.160
goes somewhere else and so on.

00:52:46.160 --> 00:52:46.660
Yeah.

00:52:46.660 --> 00:52:48.190
That's a fairly tricky
proposition, to get this right.

00:52:48.190 --> 00:52:50.106
And I think, in the
paper, what they basically

00:52:50.106 --> 00:52:52.000
argue about is
that it's actually

00:52:52.000 --> 00:52:54.630
quite difficult in practice
to implement a set of checks

00:52:54.630 --> 00:52:57.990
that's sufficient and
avoids all the possible race

00:52:57.990 --> 00:52:59.640
conditions here.

00:52:59.640 --> 00:53:02.020
So they basically just
do the conservative thing

00:53:02.020 --> 00:53:04.190
and disallow any
dot dot at any time

00:53:04.190 --> 00:53:07.520
once you're in capability mode.
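
NOTE Editor's sketch, not from the lecture. A toy user-space model of the conservative rule:
accept only relative names with no dot dot component. The function name is an assumption for
illustration. The real check lives in the kernel's name lookup and must also handle symlinks,
which a string check like this one cannot see.
```python
def allowed_in_capability_mode(name):
    """Toy check: accept only relative names that contain no '..' path component."""
    if name.startswith("/"):
        return False  # absolute names refer to the global namespace
    return ".." not in name.split("/")
```
Note that b/c/../.. is rejected even though it might look harmless, which matches the conservative choice described here.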

00:53:07.520 --> 00:53:09.330
There's some interesting
race conditions

00:53:09.330 --> 00:53:10.496
you could come up with here.

00:53:10.496 --> 00:53:14.000
The lecture notes
have more details.

00:53:14.000 --> 00:53:16.010
But basically I
think these guys are

00:53:16.010 --> 00:53:18.560
being extra cautious in
defining what's allowed

00:53:18.560 --> 00:53:22.700
and what's not allowed
in capability mode.

00:53:22.700 --> 00:53:23.567
OK.

00:53:23.567 --> 00:53:25.620
So here, to answer
your question,

00:53:25.620 --> 00:53:27.036
once you enter
capability mode, it

00:53:27.036 --> 00:53:30.505
seems to be all controlled
by your file table.

00:53:30.505 --> 00:53:33.641
Does your UID still matter,
once you enter capability mode?

00:53:41.020 --> 00:53:43.340
[INAUDIBLE]

00:53:43.340 --> 00:53:44.080
Yeah?

00:53:44.080 --> 00:53:46.080
AUDIENCE: Well, you could
still launch a process

00:53:46.080 --> 00:53:48.077
that doesn't use capabilities.

00:53:48.077 --> 00:53:48.660
PROFESSOR: No.

00:53:48.660 --> 00:53:50.187
Actually, no, you can't.

00:53:50.187 --> 00:53:52.520
You have to make sure that--
otherwise you could escape,

00:53:52.520 --> 00:53:54.811
like well, I can't access--
why don't you run this guy?

00:53:54.811 --> 00:53:56.258
[INAUDIBLE]

00:53:56.258 --> 00:53:59.535
So yeah, cap_enter is inherited
by all the children, which

00:53:59.535 --> 00:54:01.550
is actually hugely important.

00:54:01.550 --> 00:54:02.050
Yeah?

00:54:06.000 --> 00:54:09.190
Anyone else?

00:54:09.190 --> 00:54:10.990
So what if we kill the UID?

00:54:10.990 --> 00:54:13.591
So suppose we are
going into cap_enter,

00:54:13.591 --> 00:54:15.590
and we just kill the UID
of the current process.

00:54:15.590 --> 00:54:17.476
We don't actually care
what it is anymore.

00:54:17.476 --> 00:54:19.225
And then the process
tries to open a file.

00:54:19.225 --> 00:54:22.350
What checks should apply?

00:54:22.350 --> 00:54:22.850
Yeah?

00:54:22.850 --> 00:54:25.191
AUDIENCE: Oh, I was
thinking that the UID is

00:54:25.191 --> 00:54:26.690
useful for logging
purposes as well,

00:54:26.690 --> 00:54:28.580
like being able to tell
if you did something.

00:54:28.580 --> 00:54:29.130
PROFESSOR: So
yeah, you're right.

00:54:29.130 --> 00:54:29.460
Actually, yeah.

00:54:29.460 --> 00:54:30.930
So that would be actually
kind of damaging, right?

00:54:30.930 --> 00:54:33.500
Like I spawned some sandbox
process on my machine

00:54:33.500 --> 00:54:34.669
and it loses the UID.

00:54:34.669 --> 00:54:36.460
I'm like I have a
hundred processes running

00:54:36.460 --> 00:54:38.730
on my machine, and I have
no idea what they are.

00:54:38.730 --> 00:54:40.400
So that's probably not a good
plan for a management purpose.

00:54:40.400 --> 00:54:41.555
You're absolutely right.

00:54:41.555 --> 00:54:44.170
But I'm just sort of
hypothetically saying, well,

00:54:44.170 --> 00:54:45.920
do we need it for
access control, I guess.

00:54:45.920 --> 00:54:46.750
Yeah?

00:54:46.750 --> 00:54:48.280
AUDIENCE: Maybe if
this UID is only

00:54:48.280 --> 00:54:50.790
supposed to be able to
access this file by reading

00:54:50.790 --> 00:54:54.075
or whatever, but you have
the file descriptor for it,

00:54:54.075 --> 00:54:55.450
but then if you
lose the UID, you

00:54:55.450 --> 00:54:57.960
might get permissions to write
[INAUDIBLE] or something?

00:54:57.960 --> 00:54:58.780
PROFESSOR: Yeah.

00:54:58.780 --> 00:55:03.410
I think actually where it
shows up is in directories.

00:55:03.410 --> 00:55:05.287
Because once you have a
capability to a file,

00:55:05.287 --> 00:55:06.120
that's basically it.

00:55:06.120 --> 00:55:08.600
You have it open with particular
privileges, et cetera.

00:55:08.600 --> 00:55:11.519
But the problem is that they
have this hybrid design where

00:55:11.519 --> 00:55:13.560
they say, well, you can
actually add capabilities

00:55:13.560 --> 00:55:15.510
to directories, and
you can open a new file

00:55:15.510 --> 00:55:17.030
as you're running along.

00:55:17.030 --> 00:55:19.375
And it might be the case
that you have a capability

00:55:19.375 --> 00:55:22.200
to a directory, like /etc.

00:55:22.200 --> 00:55:24.450
And you don't have access
to necessarily all the files

00:55:24.450 --> 00:55:25.520
in /etc.

00:55:25.520 --> 00:55:27.440
But once you enter
capability mode,

00:55:27.440 --> 00:55:29.860
you can now try to open
those files by saying, well,

00:55:29.860 --> 00:55:31.840
I have access to
the /etc directory.

00:55:31.840 --> 00:55:32.850
It's open already.

00:55:32.850 --> 00:55:34.620
Why don't you give
me the file named

00:55:34.620 --> 00:55:36.060
passwd in that directory?

00:55:36.060 --> 00:55:38.780
And the kernel still needs to
make an access control decision

00:55:38.780 --> 00:55:42.090
on whether to allow you to
open a file in that directory

00:55:42.090 --> 00:55:45.010
with either read mode or
write mode or what have you.

00:55:45.010 --> 00:55:47.490
So I think this is the one
place where you still need

00:55:47.490 --> 00:55:50.620
this ambient privilege, to
some extent, because they're

00:55:50.620 --> 00:55:53.140
trying to build this
compatible design where

00:55:53.140 --> 00:55:56.780
you can have semi-natural
semantics for how directories

00:55:56.780 --> 00:55:57.670
work.

00:55:57.670 --> 00:55:59.410
Does that make sense?

00:55:59.410 --> 00:56:02.920
It's like one leftover place,
kind of for compatibility

00:56:02.920 --> 00:56:05.884
reasons, or at least the
way that Unix file systems

00:56:05.884 --> 00:56:07.660
are typically set up.
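
NOTE
Editorial sketch, not from the lecture: the directory-relative lookup
described above, shown with the POSIX openat-style call that Python
exposes as os.open(..., dir_fd=...). This is not Capsicum code. The
helper name open_beneath, the demo function, and the file name "motd"
are assumptions for illustration. The point is that holding the
directory descriptor is not enough: the kernel still applies the
file's own permission check when the name is opened.
```python
import os
import tempfile
def open_beneath(dir_fd, name, flags=os.O_RDONLY):
    """Open `name` relative to an already-open directory descriptor."""
    # Reject absolute paths and ".." so the lookup stays under dir_fd.
    # (This sketch checks in user space; it is only an illustration.)
    if os.path.isabs(name) or ".." in name.split(os.sep):
        raise PermissionError("lookup must stay beneath the directory")
    # The kernel still checks the file's own permission bits here.
    return os.open(name, flags, dir_fd=dir_fd)
def demo():
    """Create a temp directory, then read a file through its descriptor."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "motd"), "w") as f:
            f.write("hello")
        dir_fd = os.open(d, os.O_RDONLY)
        try:
            fd = open_beneath(dir_fd, "motd")
            try:
                return os.read(fd, 100)
            finally:
                os.close(fd)
        finally:
            os.close(dir_fd)
```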

00:56:07.660 --> 00:56:09.649
AUDIENCE: Are there
any other places?

00:56:09.649 --> 00:56:10.690
PROFESSOR: Good question.

00:56:10.690 --> 00:56:12.240
I couldn't think
of one off hand,

00:56:12.240 --> 00:56:14.531
but I guess I would have to
get their FreeBSD source

00:56:14.531 --> 00:56:17.980
code to really figure
out what's going on.

00:56:17.980 --> 00:56:20.150
I think most of the
other situations

00:56:20.150 --> 00:56:22.069
don't really
require a UID check.

00:56:22.069 --> 00:56:23.860
Because for networking,
it doesn't show up.

00:56:23.860 --> 00:56:27.406
I think for process descriptors
it doesn't show up, either.

00:56:27.406 --> 00:56:29.660
If you have it, then
you just have it.

00:56:29.660 --> 00:56:33.421
So I think it probably is
just file system operations.

00:56:33.421 --> 00:56:35.920
For shared memory, it's also--
once you have a shared memory

00:56:35.920 --> 00:56:37.760
segment, you have it open.

00:56:41.232 --> 00:56:41.841
Yeah?

00:56:41.841 --> 00:56:43.216
AUDIENCE: Could
you explain again

00:56:43.216 --> 00:56:47.404
how exactly the user ID matters
if you have a capability?

00:56:47.404 --> 00:56:48.070
PROFESSOR: Yeah.

00:56:48.070 --> 00:56:51.810
So I think where it
matters is, you have

00:56:51.810 --> 00:56:54.910
a capability to a directory.

00:56:54.910 --> 00:56:57.770
The question is, what does
the capability represent?

00:56:57.770 --> 00:57:01.260
So one interpretation that--
for example, some capability

00:57:01.260 --> 00:57:03.130
systems take, not Capsicum.

00:57:03.130 --> 00:57:04.130
Pure capability systems.

00:57:04.130 --> 00:57:06.870
They say, well, if you have
a capability to a directory,

00:57:06.870 --> 00:57:08.828
then of course you have
access to all the files

00:57:08.828 --> 00:57:11.392
in that directory, no
questions about it.

00:57:11.392 --> 00:57:13.225
And in Unix, this is
typically not the case.

00:57:13.225 --> 00:57:16.110
You can open a
directory like /etc,

00:57:16.110 --> 00:57:18.670
but there's lots of system
files in there that are maybe

00:57:18.670 --> 00:57:21.917
private, like the private key of
your server is stored in there.

00:57:21.917 --> 00:57:24.250
And just because you can look
at a directory and open it

00:57:24.250 --> 00:57:26.820
and list it doesn't mean that
you can open the files

00:57:26.820 --> 00:57:28.310
in that directory.

00:57:28.310 --> 00:57:32.392
So in Capsicum, if you
open a directory like /etc,

00:57:32.392 --> 00:57:33.850
and then you enter
capability mode.

00:57:33.850 --> 00:57:35.190
And then you say,
well, hey, I don't

00:57:35.190 --> 00:57:36.200
know what this directory is.

00:57:36.200 --> 00:57:37.658
I just have a file
descriptor to it.

00:57:37.658 --> 00:57:39.342
There's a file in
there called "key."

00:57:39.342 --> 00:57:41.390
Why don't you open
that file "key"?

00:57:41.390 --> 00:57:44.070
And at this point,
you probably don't

00:57:44.070 --> 00:57:46.270
want to allow this
capability-based process

00:57:46.270 --> 00:57:48.480
to just open it, because
that wasn't the intent.

00:57:48.480 --> 00:57:52.060
That would allow you to bypass
the Unix permissions on a file.

00:57:52.060 --> 00:57:54.250
So I think the
authors of this paper

00:57:54.250 --> 00:57:59.850
are careful to design a
system which would not violate

00:57:59.850 --> 00:58:01.600
existing security mechanisms.

00:58:01.600 --> 00:58:04.462
AUDIENCE: So you're saying
that you can, in some cases,

00:58:04.462 --> 00:58:06.370
use a combination of the two?

00:58:06.370 --> 00:58:08.760
So even though you'll be able
to change into the directory,

00:58:08.760 --> 00:58:10.760
inside the directory,
which files you can access

00:58:10.760 --> 00:58:11.839
depends on your user ID?

00:58:11.839 --> 00:58:12.880
PROFESSOR: Yeah, exactly.

00:58:12.880 --> 00:58:16.645
So in Capsicum, the way they
get it to work in practice

00:58:16.645 --> 00:58:19.890
is that, actually, before
you enter capability mode,

00:58:19.890 --> 00:58:20.666
you have to guess.

00:58:20.666 --> 00:58:22.415
Well, what files am I
going to need later?

00:58:22.415 --> 00:58:23.970
I'm going to need
some shared libraries.

00:58:23.970 --> 00:58:25.060
I'll need some text files.

00:58:25.060 --> 00:58:26.644
I'll need some templates.

00:58:26.644 --> 00:58:28.560
I'll need some network
connections, et cetera.

00:58:28.560 --> 00:58:30.960
So you open all these
things ahead of time.

00:58:30.960 --> 00:58:33.970
And you don't always necessarily
know which exact file you need.

00:58:33.970 --> 00:58:35.754
So what these guys
support is, well,

00:58:35.754 --> 00:58:38.045
you can actually just open
a directory file descriptor,

00:58:38.045 --> 00:58:38.780
as well.

00:58:38.780 --> 00:58:41.460
And then I can look up the
particular files later.

00:58:41.460 --> 00:58:42.960
But it might be
that the files don't

00:58:42.960 --> 00:58:44.209
have all the same permissions.

00:58:44.209 --> 00:58:46.760
So that's exactly
the reason, yeah.

00:58:46.760 --> 00:58:49.610
Make sense?

00:58:49.610 --> 00:58:50.940
All right.

00:58:50.940 --> 00:58:55.560
So this is the kernel
mechanism part of it.

00:58:55.560 --> 00:59:01.830
Why do they also need this
library, libcapsicum?

00:59:01.830 --> 00:59:04.410
I guess there's two things that
they support in that library,

00:59:04.410 --> 00:59:07.330
as far as I can tell,
or two main things.

00:59:07.330 --> 00:59:15.342
One is that they implement this
function they call lch_start

00:59:15.342 --> 00:59:21.930
that you should use
instead of cap_enter.

00:59:21.930 --> 00:59:25.600
And the other sort of
feature the library provides

00:59:25.600 --> 00:59:31.120
in libcapsicum is this
notion called fd lists

00:59:31.120 --> 00:59:33.600
instead of passing file
descriptors by number.

00:59:33.600 --> 00:59:35.030
So this fd list
thing is probably

00:59:35.030 --> 00:59:36.460
the easiest thing to explain.

00:59:36.460 --> 00:59:40.940
It's basically a generalization,
or maybe a clean up,

00:59:40.940 --> 00:59:43.520
of how Unix manages
and passes file

00:59:43.520 --> 00:59:46.220
descriptors between processes.

00:59:46.220 --> 00:59:49.580
So in traditional
Unix and Linux,

00:59:49.580 --> 00:59:52.910
how you use it today, typically
when you launch a process,

00:59:52.910 --> 00:59:54.550
you can pass it some
file descriptors.

00:59:54.550 --> 00:59:56.020
You just open some
file descriptors

00:59:56.020 --> 00:59:58.485
at particular integer
numbers in this table

00:59:58.485 --> 01:00:00.610
and you run the child
process that you want to run.

01:00:00.610 --> 01:00:03.180
Or you run a particular
binary, and it

01:00:03.180 --> 01:00:08.000
inherits all these open
slots in the fd table.

01:00:08.000 --> 01:00:10.370
But there's no real good
way to name these things

01:00:10.370 --> 01:00:11.730
other than by number.

01:00:11.730 --> 01:00:15.244
So the somewhat
surprising convention,

01:00:15.244 --> 01:00:16.660
if you haven't
[INAUDIBLE] before,

01:00:16.660 --> 01:00:18.750
is that, well, slot
0 is your input.

01:00:18.750 --> 01:00:20.940
Slot 1 is your output.

01:00:20.940 --> 01:00:24.010
Slot 2 is where you should
print error messages to.

01:00:24.010 --> 01:00:27.370
And that's how
Unix sort of works.

01:00:27.370 --> 01:00:32.240
And it sort of works OK if you
are just passing these three

01:00:32.240 --> 01:00:35.430
files or streams to a process.

01:00:35.430 --> 01:00:37.570
But in Capsicum,
what's happening

01:00:37.570 --> 01:00:41.140
is that you're passing many
more file descriptors around.

01:00:41.140 --> 01:00:43.894
So you're passing a file
descriptor for some files.

01:00:43.894 --> 01:00:46.310
You're passing a file descriptor
for a network connection,

01:00:46.310 --> 01:00:49.320
for a shared library,
what have you.

01:00:49.320 --> 01:00:52.060
And it becomes much more tedious
to manage all these numbers.

01:00:52.060 --> 01:00:55.370
So basically, libcapsicum
provides an abstraction

01:00:55.370 --> 01:00:59.460
for naming these passed file
descriptors between processes

01:00:59.460 --> 01:01:01.810
by some sort of a
hierarchical name,

01:01:01.810 --> 01:01:06.980
instead of just these opaque
integers, if you will.

01:01:06.980 --> 01:01:08.410
So that's one sort
of simple thing

01:01:08.410 --> 01:01:10.240
that they provide
in their library.

01:01:10.240 --> 01:01:13.260
So I can pass a file
descriptor to a process

01:01:13.260 --> 01:01:14.100
and give it a name.

01:01:14.100 --> 01:01:16.100
And it doesn't really
matter what number it has,

01:01:16.100 --> 01:01:16.982
which is a little easier.

01:01:16.982 --> 01:01:17.968
That make sense?

01:01:17.968 --> 01:01:19.450
OK.
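
NOTE
Editorial sketch, not from the lecture: one way to picture the fd
list idea of naming passed descriptors instead of relying on their
integer slots. This is not libcapsicum's actual API. The functions
encode_fdlist, decode_fdlist, and lookup, and the "name=fd" string
format, are hypothetical. A parent could hand such a string to a
child (for example in an argument) next to the descriptors themselves.
```python
def encode_fdlist(named_fds):
    """Serialize {name: fd_number} as 'name=fd,name=fd'."""
    return ",".join(f"{name}={fd}" for name, fd in sorted(named_fds.items()))
def decode_fdlist(text):
    """Parse the string produced by encode_fdlist back into a dict."""
    result = {}
    for item in filter(None, text.split(",")):
        # Split on the last "=" so names may contain "/" freely.
        name, _, fd = item.rpartition("=")
        result[name] = int(fd)
    return result
def lookup(fdlist, name):
    """Find a descriptor by its hierarchical name instead of its number."""
    return fdlist[name]
```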

01:01:19.450 --> 01:01:21.120
So then they have
this other mechanism,

01:01:21.120 --> 01:01:25.906
this much more elaborate
way to start a sandbox.

01:01:25.906 --> 01:01:29.740
This lch, libcapsicum Host,
API for starting a sandbox,

01:01:29.740 --> 01:01:33.342
instead of just entering
the capability mode.

01:01:33.342 --> 01:01:34.050
So what happened?

01:01:34.050 --> 01:01:36.396
Why do they need something
more than just entering

01:01:36.396 --> 01:01:37.392
capability mode?

01:01:37.392 --> 01:01:39.950
What are you worried about
when creating a sandbox?

01:01:39.950 --> 01:01:40.810
Yeah?

01:01:40.810 --> 01:01:43.502
AUDIENCE: It erases
all the inherited stuff

01:01:43.502 --> 01:01:45.524
to give you a clean start.

01:01:45.524 --> 01:01:46.190
PROFESSOR: Yeah.

01:01:46.190 --> 01:01:48.430
So I think they
worry about trying

01:01:48.430 --> 01:01:51.230
to enumerate what are all the
things the sandbox has access

01:01:51.230 --> 01:01:51.870
to.

01:01:51.870 --> 01:01:56.160
And the problem is that if
you just call cap_enter,

01:01:56.160 --> 01:01:58.560
technically, at the kernel
mechanism level, as we talked

01:01:58.560 --> 01:01:59.285
about just now, it works.

01:01:59.285 --> 01:01:59.785
Right?

01:01:59.785 --> 01:02:02.270
It just prevents you from
opening any new capabilities.

01:02:02.270 --> 01:02:05.230
But the problem is that there
might be lots of existing stuff

01:02:05.230 --> 01:02:08.780
that the process
already has access to.

01:02:08.780 --> 01:02:11.256
So I guess the simplest
example is maybe

01:02:11.256 --> 01:02:13.930
there are some file descriptors
that you forgot you had opened,

01:02:13.930 --> 01:02:17.310
and it'll just get
inherited by this process.

01:02:17.310 --> 01:02:20.470
So one example is they
were looking at tcpdump.

01:02:20.470 --> 01:02:23.950
And they realized that-- well,
first, they changed tcpdump

01:02:23.950 --> 01:02:27.500
just by calling
cap_enter at the point

01:02:27.500 --> 01:02:30.594
just before they were about to
parse all the network input.

01:02:30.594 --> 01:02:32.760
So this works well, in some
sense, because you can't

01:02:32.760 --> 01:02:34.290
get any more capabilities.

01:02:34.290 --> 01:02:36.331
But then they looked at
the open file descriptors,

01:02:36.331 --> 01:02:39.285
and they realized that you have
complete access to the user's

01:02:39.285 --> 01:02:41.720
terminal, because you have an
open file descriptor to it.

01:02:41.720 --> 01:02:43.145
So you can actually
sniff all the keystrokes

01:02:43.145 --> 01:02:45.225
that the user is typing
and all that stuff.

01:02:45.225 --> 01:02:48.602
So it's probably not a
great plan for tcpdump.

01:02:48.602 --> 01:02:51.060
If it's compromised, you probably
don't want it sniffing everything

01:02:51.060 --> 01:02:52.950
you're typing.

01:02:52.950 --> 01:02:56.520
So instead they-- well,
in tcpdump's case,

01:02:56.520 --> 01:03:00.900
they manually changed
these file descriptors

01:03:00.900 --> 01:03:03.010
to add some capability
bits to them,

01:03:03.010 --> 01:03:05.360
to restrict what kinds
of operations you can do.

01:03:05.360 --> 01:03:07.990
So remember, the capability,
at least in Capsicum,

01:03:07.990 --> 01:03:11.030
has these extra bits that say,
here's the class of operations

01:03:11.030 --> 01:03:13.310
you can perform on
a file descriptor.

01:03:13.310 --> 01:03:17.650
So they basically take what
used to be file descriptor 0.

01:03:17.650 --> 01:03:20.700
It pointed to the
user's terminal, tty.

01:03:20.700 --> 01:03:23.670
And originally, this was
just a direct pointer

01:03:23.670 --> 01:03:25.880
to the tty structure
in the kernel.

01:03:25.880 --> 01:03:27.570
What they do is they
actually-- in order

01:03:27.570 --> 01:03:30.070
to limit the kind of operations
you can perform on this file

01:03:30.070 --> 01:03:31.930
descriptor, they basically
introduced some extra data

01:03:31.930 --> 01:03:32.930
structure in the middle.

01:03:32.930 --> 01:03:34.810
This guy will point
to the terminal.

01:03:34.810 --> 01:03:36.730
And the file
descriptor itself will

01:03:36.730 --> 01:03:39.950
point to some sort of
a capability structure.

01:03:39.950 --> 01:03:43.040
And inside of it is the
pointer to the real file

01:03:43.040 --> 01:03:46.685
that you're trying to access,
as well as some restricted bits

01:03:46.685 --> 01:03:51.590
or permissions on
that file descriptor

01:03:51.590 --> 01:03:53.280
object that you can do.

01:03:53.280 --> 01:03:55.740
In their case, they basically
can say for tcpdump's standard

01:03:55.740 --> 01:03:57.585
input, you cannot
do anything on it.

01:03:57.585 --> 01:03:59.602
You can just see that it
exists, and that's it.

01:03:59.602 --> 01:04:01.564
For the output file
descriptor, they say,

01:04:01.564 --> 01:04:03.980
well, you can write to it, but
you maybe can't reposition.

01:04:03.980 --> 01:04:07.710
You can't [INAUDIBLE]
back and forth, et cetera.
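
NOTE
Editorial sketch, not from the lecture: a user-space model of the
indirection described above, where the descriptor points at a small
capability structure that holds the real file plus a mask of allowed
operations. In Capsicum this lives in the kernel. The class name
Capability and the RIGHT_* constants are invented for illustration
and are not Capsicum's real right names.
```python
import io
RIGHT_READ = 1
RIGHT_WRITE = 2
RIGHT_SEEK = 4
class Capability:
    """Wraps a file object together with a mask of allowed operations."""
    def __init__(self, target, rights):
        self._target = target
        self._rights = rights
    def _check(self, right):
        # Every operation is checked against the mask before it is forwarded.
        if not self._rights & right:
            raise PermissionError("operation not permitted by capability")
    def read(self, n=-1):
        self._check(RIGHT_READ)
        return self._target.read(n)
    def write(self, data):
        self._check(RIGHT_WRITE)
        return self._target.write(data)
    def seek(self, pos):
        self._check(RIGHT_SEEK)
        return self._target.seek(pos)
def demo():
    """Write through a write-only capability, then try a forbidden seek."""
    backing = io.BytesIO()
    out = Capability(backing, RIGHT_WRITE)
    out.write(b"packet summary\n")
    try:
        out.seek(0)
    except PermissionError:
        return backing.getvalue(), "seek denied"
    return backing.getvalue(), "seek allowed"
```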

01:04:07.710 --> 01:04:10.280
Make sense?

01:04:10.280 --> 01:04:11.900
So what else would
you worry about,

01:04:11.900 --> 01:04:12.570
in terms of starting a sandbox?

01:04:12.570 --> 01:04:14.810
So there is, I guess, the
file descriptor state.

01:04:14.810 --> 01:04:16.234
Anything else that matters?

01:04:21.448 --> 01:04:24.320
Well, I guess in Unix it's
file descriptors and memory.

01:04:24.320 --> 01:04:25.670
That's pretty much it.

01:04:25.670 --> 01:04:29.400
So the other thing that
these guys worry about

01:04:29.400 --> 01:04:32.250
is that it might be that
in your address space,

01:04:32.250 --> 01:04:34.600
you previously allocated
some sensitive data.

01:04:34.600 --> 01:04:36.920
And the process
that you sandbox

01:04:36.920 --> 01:04:38.830
is going to be able to
read all its memory.

01:04:38.830 --> 01:04:40.205
So if there's
maybe some password

01:04:40.205 --> 01:04:42.420
that you checked before when
the user was logging in,

01:04:42.420 --> 01:04:44.150
and you haven't
cleared that yet,

01:04:44.150 --> 01:04:45.749
well, the sandbox
process will be

01:04:45.749 --> 01:04:47.165
able to read that
and do something

01:04:47.165 --> 01:04:49.050
maybe interesting to that.

01:04:49.050 --> 01:04:50.920
So the way they
solved this problem

01:04:50.920 --> 01:04:55.100
is, in lch_start, you basically
have to start a program fresh.

01:04:55.100 --> 01:04:57.270
You basically take a program.

01:04:57.270 --> 01:04:59.197
You explicitly package
up all the arguments

01:04:59.197 --> 01:05:00.030
you want to give it.

01:05:00.030 --> 01:05:01.590
You explicitly package up
all the file descriptors

01:05:01.590 --> 01:05:02.860
you want to give it.

01:05:02.860 --> 01:05:04.235
And then you start
a new process,

01:05:04.235 --> 01:05:06.410
or you would call
exec to reinitialize

01:05:06.410 --> 01:05:09.200
your whole virtual memory space.

01:05:09.200 --> 01:05:11.080
And then there's no
question about what

01:05:11.080 --> 01:05:14.370
is the set of sensitive
data or extra privileges

01:05:14.370 --> 01:05:15.510
that this process has.

01:05:15.510 --> 01:05:18.160
It's exactly what you
passed to lch_start,

01:05:18.160 --> 01:05:22.040
in terms of a program name,
arguments, and capabilities.
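
NOTE
Editorial sketch, not from the lecture: the "start fresh" idea, shown
with Python's subprocess module rather than lch_start. A new program
image is exec'ed, so nothing from the parent's memory carries over,
and only the descriptor listed in pass_fds is inherited beyond the
standard three. The names start_fresh and CHILD are assumptions. This
sketch does not enter capability mode, and it does inherit the
parent's environment variables.
```python
import os
import subprocess
import sys
# The child reads from the descriptor number given as its only argument.
CHILD = "import os, sys; sys.stdout.write(os.read(int(sys.argv[1]), 100).decode())"
def start_fresh(data):
    """Run a new interpreter that receives one explicitly passed pipe."""
    r, w = os.pipe()
    try:
        os.write(w, data)
        os.close(w)
        w = None
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, str(r)],
            pass_fds=(r,),      # the only extra descriptor the child inherits
            close_fds=True,     # other descriptors above 2 are closed
            stdout=subprocess.PIPE,
            check=True,
        )
        return proc.stdout
    finally:
        os.close(r)
        if w is not None:
            os.close(w)
```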

01:05:22.040 --> 01:05:24.540
Does that make sense?

01:05:24.540 --> 01:05:27.160
AUDIENCE: What would happen
if the process that you're

01:05:27.160 --> 01:05:29.494
starting is a setuid 0 binary?

01:05:29.494 --> 01:05:30.160
PROFESSOR: Yeah.

01:05:30.160 --> 01:05:35.380
I think these guys say
that they don't actually

01:05:35.380 --> 01:05:38.020
allow setuid binaries
in capability mode,

01:05:38.020 --> 01:05:39.860
just to avoid some
weird interactions that

01:05:39.860 --> 01:05:40.905
would show up.

01:05:40.905 --> 01:05:42.940
I think the rule
that they implement

01:05:42.940 --> 01:05:45.263
is that you could have
a setuid program that

01:05:45.263 --> 01:05:47.770
gets its privileges
from a setuid binary,

01:05:47.770 --> 01:05:50.950
and then it can call
cap_enter or lch_start.

01:05:50.950 --> 01:05:52.890
But once you're in
capability mode,

01:05:52.890 --> 01:05:54.640
you cannot regain
extra privileges.

01:05:54.640 --> 01:05:58.110
In principle, this could work,
but it would be very weird.

01:05:58.110 --> 01:06:00.680
Because remember, the only
place where the UID matters,

01:06:00.680 --> 01:06:02.275
once you're in
capability mode, is

01:06:02.275 --> 01:06:04.150
in opening these files
inside of a directory.

01:06:04.150 --> 01:06:07.080
So it's not clear this
is really a great plan

01:06:07.080 --> 01:06:10.850
for getting more privileges
or [INAUDIBLE] there.

01:06:10.850 --> 01:06:11.350
Make sense?

01:06:11.350 --> 01:06:12.790
Yeah?

01:06:12.790 --> 01:06:14.270
AUDIENCE: We talked
about earlier

01:06:14.270 --> 01:06:17.575
why the library doesn't really
support strict separation

01:06:17.575 --> 01:06:19.165
between those two.

01:06:19.165 --> 01:06:21.390
And then we just mentioned
all these problems

01:06:21.390 --> 01:06:23.800
that you could use
[INAUDIBLE], so we're still

01:06:23.800 --> 01:06:26.680
not under a restriction to use
lch_start necessarily, right?

01:06:26.680 --> 01:06:27.680
PROFESSOR: That's right.

01:06:27.680 --> 01:06:28.179
Yeah.

01:06:28.179 --> 01:06:30.510
So lch_start, here's sort
of the way to think of it.

01:06:30.510 --> 01:06:32.960
So you have an application,
like maybe tcpdump.

01:06:32.960 --> 01:06:36.309
Or gzip is the other
thing they work with.

01:06:36.309 --> 01:06:37.725
And what you're
basically assuming

01:06:37.725 --> 01:06:40.390
is the application is
probably not compromised,

01:06:40.390 --> 01:06:42.960
and there is some core part
of the application that you

01:06:42.960 --> 01:06:44.730
worry about sandboxing.

01:06:44.730 --> 01:06:47.570
In tcpdump's case, it's
actually parsing packets

01:06:47.570 --> 01:06:48.730
coming from the network.

01:06:48.730 --> 01:06:50.660
In gzip's case, it's
actually taking the file

01:06:50.660 --> 01:06:51.915
and decompressing it.

01:06:51.915 --> 01:06:54.250
And you're basically assuming,
well, up until a point,

01:06:54.250 --> 01:06:56.250
the process is probably
doing all the right things.

01:06:56.250 --> 01:06:57.041
It's not exploited.

01:06:57.041 --> 01:06:59.420
There's probably not a bug
yet for the [INAUDIBLE] even.

01:06:59.420 --> 01:07:00.795
So at that point,
you're trusting

01:07:00.795 --> 01:07:04.210
that it will run lch_start
correctly and correctly set up

01:07:04.210 --> 01:07:06.580
the image, correctly set
up all the capabilities,

01:07:06.580 --> 01:07:09.870
and then restrict itself from
making any further system calls

01:07:09.870 --> 01:07:11.840
outside of capability mode.

01:07:11.840 --> 01:07:13.490
And then you run
the dangerous stuff.

01:07:13.490 --> 01:07:16.590
And by then, this setup
has happened correctly,

01:07:16.590 --> 01:07:20.252
and there's no way to
escape out of that sandbox.

01:07:20.252 --> 01:07:22.570
Make sense?

01:07:22.570 --> 01:07:23.690
All right.

01:07:23.690 --> 01:07:28.230
So I guess let's look at how
you actually use capability mode

01:07:28.230 --> 01:07:30.584
to sandbox applications.

01:07:30.584 --> 01:07:32.250
So we talked a little
bit about tcpdump.

01:07:32.250 --> 01:07:36.005
How do you isolate this process?

01:07:36.005 --> 01:07:38.410
Another interesting
example they had

01:07:38.410 --> 01:07:44.660
was this gzip program that
compresses, decompresses files.

01:07:44.660 --> 01:07:47.010
So why do they worry
about sandboxing it?

01:07:47.010 --> 01:07:50.420
I guess they worry that the
decompression code is going

01:07:50.420 --> 01:07:52.740
to be potentially
buggy, or maybe there's

01:07:52.740 --> 01:07:54.880
some memory management
errors in how

01:07:54.880 --> 01:07:58.100
they manage the buffers during
decompression, et cetera.

01:07:58.100 --> 01:08:05.450
So could they-- well, one
interesting question, I guess,

01:08:05.450 --> 01:08:10.390
is why are the changes to
gzip seemingly much more

01:08:10.390 --> 01:08:16.109
complicated than for tcpdump?

01:08:23.670 --> 01:08:24.170
Any guesses?

01:08:26.655 --> 01:08:28.029
Well as far as
you can tell, it's

01:08:28.029 --> 01:08:31.640
mostly just a question of how
the application is structured

01:08:31.640 --> 01:08:32.439
internally, right?

01:08:32.439 --> 01:08:39.170
So if you had an application
that simply compressed

01:08:39.170 --> 01:08:42.029
a single file, or
decompressed a single file,

01:08:42.029 --> 01:08:48.125
then it might be OK for us to
just run it in capability mode

01:08:48.125 --> 01:08:49.249
without really changing it.

01:08:49.249 --> 01:08:52.540
You just give it a new standard
in for something to decompress,

01:08:52.540 --> 01:08:55.830
and the standard out goes
to the decompressed output,

01:08:55.830 --> 01:08:57.300
and that would work fine.

01:08:57.300 --> 01:08:59.830
The problem, as is
almost always the case

01:08:59.830 --> 01:09:01.899
here with these kinds of
sandboxing techniques,

01:09:01.899 --> 01:09:04.830
is that the application actually
has much more complicated logic

01:09:04.830 --> 01:09:05.330
around it.

01:09:05.330 --> 01:09:07.359
So gzip, for
example, can compress

01:09:07.359 --> 01:09:09.490
multiple files, et cetera.

01:09:09.490 --> 01:09:13.580
And in that case, you have some
sort of a driver process on top

01:09:13.580 --> 01:09:15.450
which actually has
these extra privileges

01:09:15.450 --> 01:09:18.899
to open multiple files, to
create things, et cetera.

01:09:18.899 --> 01:09:22.300
And the core logic needs to be
often another helper process.

01:09:22.300 --> 01:09:24.600
And it was just
the case in gzip

01:09:24.600 --> 01:09:27.359
that the application
wasn't structured

01:09:27.359 --> 01:09:29.890
in a way where this was already
a separate process doing

01:09:29.890 --> 01:09:31.689
all the decompression
or compression.

01:09:31.689 --> 01:09:36.020
So they had to change
gzip's core implementation,

01:09:36.020 --> 01:09:42.050
and, well, some structure of
the gzip application, instead

01:09:42.050 --> 01:09:44.560
of just passing the data
to the decompression

01:09:44.560 --> 01:09:47.060
function to actually
send it over an RPC call

01:09:47.060 --> 01:09:49.859
or really just write it to
some almost file descriptor

01:09:49.859 --> 01:09:52.660
to a helper process that
runs on the side

01:09:52.660 --> 01:09:54.200
and performs all
the decompression

01:09:54.200 --> 01:09:55.940
with almost no privileges.

01:09:55.940 --> 01:09:57.760
The only thing it
can do is return

01:09:57.760 --> 01:10:00.090
the decompressed data,
or the compressed data,

01:10:00.090 --> 01:10:02.670
back to the caller process.
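
NOTE
Editorial sketch, not from the lecture: the restructuring described
for gzip, where the risky decompression runs in a separate helper
process and the driver only exchanges bytes with it over pipes. The
names decompress_in_helper and HELPER are assumptions, zlib stands in
for gzip's code, and the helper here is NOT actually sandboxed. In the
Capsicum version the helper would enter capability mode before
touching the input.
```python
import subprocess
import sys
import zlib
# The helper reads compressed bytes on stdin and writes the result to stdout.
HELPER = "import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
def decompress_in_helper(blob):
    """Send compressed bytes to a separate process and read the result back."""
    proc = subprocess.run(
        [sys.executable, "-c", HELPER],
        input=blob,                 # compressed data goes in over a pipe
        stdout=subprocess.PIPE,     # decompressed data comes back over a pipe
        check=True,
    )
    return proc.stdout
```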

01:10:02.670 --> 01:10:03.670
That roughly make sense?

01:10:03.670 --> 01:10:06.230
What's going on in gzip?

01:10:06.230 --> 01:10:07.820
All right.

01:10:07.820 --> 01:10:12.180
So I guess one thing we asked
for the homework is how do you

01:10:12.180 --> 01:10:13.667
actually use Capsicum in OKWS?

01:10:13.667 --> 01:10:14.750
So what do you guys think?

01:10:14.750 --> 01:10:17.025
Would it be useful?

01:10:17.025 --> 01:10:19.385
Would the OKWS guys
have been excited

01:10:19.385 --> 01:10:23.980
and switched to FreeBSD because
this was much easier to use?

01:10:23.980 --> 01:10:25.590
Or is this a wash?

01:10:25.590 --> 01:10:26.777
So what do you think?

01:10:26.777 --> 01:10:28.360
How would you use
Capsicum in FreeBSD?

01:10:28.360 --> 01:10:30.954
Would this be much different?

01:10:30.954 --> 01:10:31.890
Yeah.

01:10:31.890 --> 01:10:33.765
AUDIENCE: So it means
you can get rid of some

01:10:33.765 --> 01:10:36.944
of the jailing [INAUDIBLE].

01:10:36.944 --> 01:10:37.610
PROFESSOR: Yeah.

01:10:37.610 --> 01:10:38.109
That's true.

01:10:38.109 --> 01:10:40.600
So chroot seems to be completely
superseded by this plan

01:10:40.600 --> 01:10:42.980
of having directory file
descriptors and capabilities.

01:10:42.980 --> 01:10:43.646
So that's great.

01:10:43.646 --> 01:10:45.980
So you don't need chroot,
or setting it up.

01:10:45.980 --> 01:10:46.770
That seems messy.

01:10:46.770 --> 01:10:48.270
And this is much
more precise, also.

01:10:48.270 --> 01:10:49.996
Because you can--
instead of having

01:10:49.996 --> 01:10:51.870
a chroot with lots of
little things in there,

01:10:51.870 --> 01:10:54.397
where you have to maybe set the
permissions on there carefully.

01:10:54.397 --> 01:10:56.480
You can just open exactly
the files that you need.

01:10:56.480 --> 01:10:58.800
So that seems like a plus.

01:10:58.800 --> 01:11:00.788
Any other benefits?

01:11:00.788 --> 01:11:01.288
Yeah.

01:11:01.288 --> 01:11:02.236
AUDIENCE: [INAUDIBLE].

01:11:06.502 --> 01:11:08.120
PROFESSOR: In OKWS, you mean?

01:11:08.120 --> 01:11:09.036
AUDIENCE: [INAUDIBLE].

01:11:09.036 --> 01:11:09.438
PROFESSOR: Yeah.

01:11:09.438 --> 01:11:11.880
So in OKWS, right, you have
this OK launcher daemon that

01:11:11.880 --> 01:11:14.150
had to launch all these guys.

01:11:14.150 --> 01:11:15.870
And it was the parent process.

01:11:15.870 --> 01:11:18.030
Only when they die,
the signal goes back

01:11:18.030 --> 01:11:22.197
to this okld to restart
the crashed process.

01:11:22.197 --> 01:11:24.155
And that thing had to run as root, because it
run this root, because it

01:11:24.155 --> 01:11:25.700
had to sandbox things.

01:11:25.700 --> 01:11:28.140
There's actually a number of
things you could do better

01:11:28.140 --> 01:11:31.240
with Capsicum in OKWS.

01:11:31.240 --> 01:11:33.200
So one example is
you could probably

01:11:33.200 --> 01:11:35.410
have okld have many
fewer privileges.

01:11:35.410 --> 01:11:39.410
Because it might need to be
root initially to get port 80.

01:11:39.410 --> 01:11:42.516
But after that, it could set
up sandboxes for everyone else

01:11:42.516 --> 01:11:43.640
without being root anymore.

01:11:43.640 --> 01:11:44.670
So that's kind of cool.

01:11:44.670 --> 01:11:46.620
And maybe you can
even delegate the job

01:11:46.620 --> 01:11:48.870
of responding a process
to someone else,

01:11:48.870 --> 01:11:50.930
maybe a per-service
monitor daemon

01:11:50.930 --> 01:11:54.430
that just has this
process descriptor handle,

01:11:54.430 --> 01:11:56.950
or process descriptor
for the child process,

01:11:56.950 --> 01:11:58.870
and whenever it crashes,
starts a new one.

01:11:58.870 --> 01:12:02.745
So I think this process
[INAUDIBLE] helps things a lot.

01:12:02.745 --> 01:12:06.160
And the fact that you can create
a sandbox without being root

01:12:06.160 --> 01:12:09.542
is also quite helpful, as well.

01:12:09.542 --> 01:12:11.000
Any other stuff,
what you could do?

01:12:11.000 --> 01:12:11.440
Yeah?

01:12:11.440 --> 01:12:12.320
AUDIENCE: You
could give each one

01:12:12.320 --> 01:12:14.387
a file descriptor with
append only mode to the log.

01:12:14.387 --> 01:12:15.053
PROFESSOR: Yeah.

01:12:15.053 --> 01:12:16.750
So that's pretty cool.

01:12:16.750 --> 01:12:19.560
So as we were talking
last time, in OKWS,

01:12:19.560 --> 01:12:23.675
well, the oklogd maybe could
tamper with the log file.

01:12:23.675 --> 01:12:25.373
And who knows what
the kernel will

01:12:25.373 --> 01:12:27.710
allow it to do once it has
a file descriptor on the log

01:12:27.710 --> 01:12:28.670
file itself.

01:12:28.670 --> 01:12:30.090
But here, the fact
that we can do

01:12:30.090 --> 01:12:33.010
much more of a
precise capability map

01:12:33.010 --> 01:12:35.562
on a file descriptor, well,
we could give it a log file

01:12:35.562 --> 01:12:37.895
and say, well, you could just
write to it, but not seek.

01:12:37.895 --> 01:12:40.150
So that basically
means append only,

01:12:40.150 --> 01:12:41.935
if you're the only
writer to that file.

01:12:41.935 --> 01:12:43.060
So that seems kind of nice.

01:12:43.060 --> 01:12:45.270
And you could prevent
it from reading a file.

01:12:45.270 --> 01:12:47.140
You could say, well, you can
only write, but not read,

01:12:47.140 --> 01:12:48.270
which is something
that's probably

01:12:48.270 --> 01:12:50.519
difficult to do with Unix
permissions alone right now.

01:12:53.253 --> 01:12:54.630
Make sense?

01:12:54.630 --> 01:12:57.120
Any other ideas for how
Capsicum might help?

01:12:59.680 --> 01:13:01.815
Would you wish there was
more stuff in Capsicum?

01:13:01.815 --> 01:13:03.670
I guess we always wish
there was more stuff.

01:13:03.670 --> 01:13:05.128
AUDIENCE: So one
thing that perhaps

01:13:05.128 --> 01:13:07.326
may be tricky is the
service daemons need

01:13:07.326 --> 01:13:11.617
to connect to their
backend databases somehow.

01:13:11.617 --> 01:13:13.470
Which might be remote.

01:13:13.470 --> 01:13:15.235
But you don't want
the launch daemon

01:13:15.235 --> 01:13:17.235
to know about which
services each service

01:13:17.235 --> 01:13:18.722
is going to connect to.

01:13:18.722 --> 01:13:19.680
PROFESSOR: Maybe, yeah.

01:13:19.680 --> 01:13:20.930
That's a good question, right?

01:13:20.930 --> 01:13:23.990
So in Capsicum, as we
were talking about,

01:13:23.990 --> 01:13:25.780
the network is in
global namespace.

01:13:25.780 --> 01:13:27.570
You have to have
existing file descriptors

01:13:27.570 --> 01:13:29.910
for all the outstanding
connections ahead of time.

01:13:29.910 --> 01:13:30.576
AUDIENCE: Right.

01:13:30.576 --> 01:13:33.675
But you don't necessarily want
okld to open up all the sockets

01:13:33.675 --> 01:13:34.700
for all the services.

01:13:34.700 --> 01:13:37.940
Because it might not know where
the services are connected.

01:13:37.940 --> 01:13:38.140
PROFESSOR: That's right.

01:13:38.140 --> 01:13:38.510
Yeah.

01:13:38.510 --> 01:13:39.960
So that's a little bit
of an awkward thing.

01:13:39.960 --> 01:13:40.850
I absolutely agree.

01:13:40.850 --> 01:13:42.700
And this is part
of the reason why

01:13:42.700 --> 01:13:44.950
I think capabilities
haven't completely

01:13:44.950 --> 01:13:46.830
subsumed everything
in the security world,

01:13:46.830 --> 01:13:48.350
is because they are
kind of awkward to use.

01:13:48.350 --> 01:13:50.430
Because the guy that gives
you all the privileges

01:13:50.430 --> 01:13:52.638
has to know exactly what
things you're going to need,

01:13:52.638 --> 01:13:55.100
like these connections
to backend servers.

01:13:55.100 --> 01:13:58.150
So at some level, maybe this
is not such a huge problem

01:13:58.150 --> 01:13:58.650
in OKWS.

01:13:58.650 --> 01:14:01.330
Because the launcher
daemon has to read a config

01:14:01.330 --> 01:14:03.610
file and is going to pass
the token to the service

01:14:03.610 --> 01:14:04.401
in the first place.

01:14:04.401 --> 01:14:07.070
So maybe the token is going
to contain the host and port

01:14:07.070 --> 01:14:08.580
number that
you're connecting to.

01:14:08.580 --> 01:14:09.080
But I agree.

01:14:09.080 --> 01:14:10.360
It's not great.

01:14:10.360 --> 01:14:12.590
Because especially,
suppose the database server

01:14:12.590 --> 01:14:13.780
disconnects you.

01:14:13.780 --> 01:14:15.150
Well, you're kind of stuck now.

01:14:15.150 --> 01:14:17.135
The file descriptor is
not connected anymore,

01:14:17.135 --> 01:14:18.060
and you can't
connect to a new one.

01:14:18.060 --> 01:14:20.476
So basically, if the database
server crashes, or restarts,

01:14:20.476 --> 01:14:22.130
or the network
breaks, you basically

01:14:22.130 --> 01:14:24.500
have to terminate,
get yourself respawned,

01:14:24.500 --> 01:14:27.230
so you can get a new one of
these connections passed to you.

01:14:27.230 --> 01:14:29.104
So it's maybe not a
great plan in that sense.

01:14:29.104 --> 01:14:32.518
AUDIENCE: Could we wrap the
system call, the function

01:14:32.518 --> 01:14:35.144
[INAUDIBLE] to open a
socket so that it calls

01:14:35.144 --> 01:14:37.602
the middleman instead of the
socket that the users send out

01:14:37.602 --> 01:14:39.254
to [INAUDIBLE]?

01:14:39.254 --> 01:14:39.920
PROFESSOR: Yeah.

01:14:39.920 --> 01:14:43.130
This is what I think the
FreeBSD guys have done since.

01:14:43.130 --> 01:14:46.312
Well, there's a
bunch of situations

01:14:46.312 --> 01:14:48.770
like this, where you want to
open some file after the fact,

01:14:48.770 --> 01:14:50.728
or you want to connect
to something after going

01:14:50.728 --> 01:14:51.880
into capability mode.

01:14:51.880 --> 01:14:54.060
So the FreeBSD
developers have added

01:14:54.060 --> 01:14:58.250
this daemon called Casper, that
every capability-based process

01:14:58.250 --> 01:14:59.470
has a handle on.

01:14:59.470 --> 01:15:03.010
And this Casper daemon runs
outside of capability mode,

01:15:03.010 --> 01:15:04.470
and basically
listens to requests

01:15:04.470 --> 01:15:06.380
from sandboxed processes.

01:15:06.380 --> 01:15:09.790
And if you want
to open some file,

01:15:09.790 --> 01:15:12.400
or if you want to open a network
connection, or send a packet,

01:15:12.400 --> 01:15:14.980
or something, but you didn't
have the right capability

01:15:14.980 --> 01:15:18.250
beforehand, then this Casper
daemon will do it for you.

01:15:18.250 --> 01:15:21.022
But it carefully
maintains a list of things

01:15:21.022 --> 01:15:22.980
that every sandbox process
should or should not

01:15:22.980 --> 01:15:24.010
be able to do.

01:15:24.010 --> 01:15:25.870
So it's like a system service.

01:15:25.870 --> 01:15:28.400
So when you start a
capability process,

01:15:28.400 --> 01:15:30.900
or enter capability
mode, by default,

01:15:30.900 --> 01:15:33.620
this Casper thing will not allow
you to do anything extra funny.

01:15:33.620 --> 01:15:35.250
But you could say,
well, hey, I'm

01:15:35.250 --> 01:15:37.050
going to start the
sandbox process.

01:15:37.050 --> 01:15:40.750
And you can ask Casper,
well, please allow my process

01:15:40.750 --> 01:15:42.977
to do the following
things later.

01:15:42.977 --> 01:15:43.810
So you could, right?

01:15:43.810 --> 01:15:46.240
And the cool thing is that
you can pass file descriptors

01:15:46.240 --> 01:15:48.700
or capabilities through
fd passing in Unix.

01:15:48.700 --> 01:15:51.520
So once you have a handle
on this Casper guy,

01:15:51.520 --> 01:15:55.120
you can get more
capabilities later on.
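
NOTE
Editor's sketch of the fd-passing idea described above, not FreeBSD's Casper code.
It assumes Python 3.9+ on a Unix system (socket.send_fds and socket.recv_fds).
The names broker_send_file, sandbox_recv_file, and demo are hypothetical.
A helper opens a file by path and hands over only the open descriptor.
```python
import os
import socket
import tempfile
# Hypothetical stand-in for a Casper-like helper: it may use the global
# file namespace, so it opens the path and sends just the descriptor.
def broker_send_file(sock, path):
    fd = os.open(path, os.O_RDONLY)
    try:
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)  # the receiver gets its own copy of the descriptor
# The "sandboxed" side never sees a path, only the received descriptor.
def sandbox_recv_file(sock):
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]
def demo():
    broker_end, sandbox_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from the broker")
        path = tmp.name
    try:
        broker_send_file(broker_end, path)
        fd = sandbox_recv_file(sandbox_end)
        try:
            return os.read(fd, 100)
        finally:
            os.close(fd)
    finally:
        os.unlink(path)
        broker_end.close()
        sandbox_end.close()
```
Calling demo() returns the bytes read through the passed descriptor.
A real helper would also check each request against a per-process policy.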

01:15:55.120 --> 01:15:58.680
So it's, again, a trade-off
between being a pure capability

01:15:58.680 --> 01:16:04.330
world versus actually being
programmable or easy to use.

01:16:04.330 --> 01:16:06.110
So it seems to be working out.

01:16:06.110 --> 01:16:10.230
I think the particular thing
they use it for in FreeBSD,

01:16:10.230 --> 01:16:13.350
or the thing that shows up
often, is making DNS queries.

01:16:13.350 --> 01:16:15.600
So you want to be able to
make DNS queries once you're

01:16:15.600 --> 01:16:16.150
in a sandbox.

01:16:16.150 --> 01:16:18.608
And actually, this is a problem
they ran into with tcpdump.

01:16:18.608 --> 01:16:20.850
Because when tcpdump is
printing your packets,

01:16:20.850 --> 01:16:22.580
it wants to print the host
name for an IP address.

01:16:22.580 --> 01:16:24.680
In order to do this, it has
to talk to a DNS server.

01:16:24.680 --> 01:16:26.263
But you probably
don't want to connect

01:16:26.263 --> 01:16:28.940
to a DNS server ahead of
time, or to every DNS server

01:16:28.940 --> 01:16:30.320
you might ever need.

01:16:30.320 --> 01:16:32.230
So instead, they use
this helper daemon

01:16:32.230 --> 01:16:35.440
that's going to make
DNS queries for you.

01:16:35.440 --> 01:16:37.388
Make sense?

01:16:37.388 --> 01:16:38.750
All right.

01:16:38.750 --> 01:16:42.905
So I guess the last thing
I wanted to talk about

01:16:42.905 --> 01:16:46.310
is what are the security
guarantees that Capsicum

01:16:46.310 --> 01:16:46.810
provides?

01:16:46.810 --> 01:16:49.120
So should you trust it?

01:16:49.120 --> 01:16:50.700
How could Capsicum go wrong?

01:16:53.399 --> 01:16:55.440
Presumably you can always
have security problems,

01:16:55.440 --> 01:16:57.870
regardless of what mechanism
you're using underneath.

01:16:57.870 --> 01:16:59.370
But what particular
things should we

01:16:59.370 --> 01:17:01.930
worry about in
Capsicum when we're

01:17:01.930 --> 01:17:03.310
building some system here?

01:17:06.710 --> 01:17:08.680
Suppose you have to
attack this thing.

01:17:08.680 --> 01:17:11.970
You have to attack this
tcpdump thing, or gzip,

01:17:11.970 --> 01:17:14.060
or whatever it is
that they implemented.

01:17:14.060 --> 01:17:18.039
What would you look at, in
terms of bugs or problems?

01:17:18.039 --> 01:17:19.872
AUDIENCE: Well, it
depends on the developers

01:17:19.872 --> 01:17:21.524
knowing what they're doing.

01:17:21.524 --> 01:17:24.220
So they might give
a bad capability.

01:17:24.220 --> 01:17:25.220
PROFESSOR: That's right.

01:17:25.220 --> 01:17:25.350
Yeah.

01:17:25.350 --> 01:17:27.710
So actually, one
interesting property of Capsicum

01:17:27.710 --> 01:17:30.640
is that it's not a guarantee
that the user of the system

01:17:30.640 --> 01:17:31.430
gets.

01:17:31.430 --> 01:17:33.290
It's really a tool
that the developer

01:17:33.290 --> 01:17:38.260
has to build more trustworthy
or better application software.

01:17:38.260 --> 01:17:40.095
But I, as a user of the
system, have no idea

01:17:40.095 --> 01:17:41.553
whether this is a
good or bad thing

01:17:41.553 --> 01:17:43.178
that the application
is using Capsicum.

01:17:43.178 --> 01:17:46.440
You could totally misuse it,
you're absolutely right.

01:17:46.440 --> 01:17:49.170
So maybe one example is,
as they show in the paper,

01:17:49.170 --> 01:17:51.490
you could give too many
privileges to the sandbox

01:17:51.490 --> 01:17:51.990
process.

01:17:51.990 --> 01:17:53.810
Like the tcpdump
helper, or maybe

01:17:53.810 --> 01:17:55.030
it has access to my console.

01:17:55.030 --> 01:17:57.900
And that's not so great,
but it's hard for me

01:17:57.900 --> 01:18:01.130
as a user to really tell this
in a general purpose fashion.

01:18:01.130 --> 01:18:01.828
Yeah?

01:18:01.828 --> 01:18:05.443
AUDIENCE: It might also be that
when you set the permissions

01:18:05.443 --> 01:18:09.100
to the masks on any
given file descriptor

01:18:09.100 --> 01:18:11.304
that you set too
permissive masks.

01:18:11.304 --> 01:18:11.970
PROFESSOR: Yeah.

01:18:11.970 --> 01:18:12.170
Right.

01:18:12.170 --> 01:18:13.720
So it's not just the
file descriptors.

01:18:13.720 --> 01:18:15.610
Also, what can you do with
those file descriptors?

01:18:15.610 --> 01:18:16.220
You're right.

01:18:16.220 --> 01:18:16.450
Yes.

01:18:16.450 --> 01:18:18.140
These masks are another
part of the story

01:18:18.140 --> 01:18:21.460
that you have to watch out for.

01:18:21.460 --> 01:18:21.980
OK.

01:18:21.980 --> 01:18:23.594
So suppose we got
the masks right.

01:18:23.594 --> 01:18:25.010
We got the file
descriptors right.

01:18:25.010 --> 01:18:26.120
We haven't used lch_start.

01:18:26.120 --> 01:18:28.740
There's nothing extra in memory.

01:18:28.740 --> 01:18:30.532
AUDIENCE: [INAUDIBLE].

01:18:30.532 --> 01:18:31.490
PROFESSOR: That's true.

01:18:31.490 --> 01:18:31.990
Yes.

01:18:31.990 --> 01:18:34.030
So maybe there's like
something before you even

01:18:34.030 --> 01:18:35.950
enter capability
mode that's damaging.

01:18:35.950 --> 01:18:39.030
So it only helps
once you jump in.

01:18:39.030 --> 01:18:42.240
And one slightly
annoying thing is

01:18:42.240 --> 01:18:47.360
that it seems like you can't do
a whole lot inside of capability

01:18:47.360 --> 01:18:51.560
mode, not in the sense that you
can't run large computations,

01:18:51.560 --> 01:18:55.010
but you can't really put a large
part of a complicated system

01:18:55.010 --> 01:18:55.900
into capability mode.

01:18:55.900 --> 01:18:57.358
Because inevitably,
in Unix, you'll

01:18:57.358 --> 01:18:59.820
need to do something
with new processes,

01:18:59.820 --> 01:19:01.870
opening network
connections, et cetera.

01:19:01.870 --> 01:19:03.487
And you'll probably
need to use some

01:19:03.487 --> 01:19:05.130
of these global
namespaces that are not

01:19:05.130 --> 01:19:06.790
available in capability mode.

01:19:06.790 --> 01:19:08.330
So it's probably
going to be quite

01:19:08.330 --> 01:19:12.790
difficult to put large chunks
of logic or intricate system

01:19:12.790 --> 01:19:15.370
code inside of capability mode.

01:19:15.370 --> 01:19:19.760
So only well-defined
chunks of an application

01:19:19.760 --> 01:19:22.500
are likely to be running
in capability mode.

01:19:22.500 --> 01:19:23.000
It depends.

01:19:23.000 --> 01:19:25.520
I don't know if this is
entirely true or not.

01:19:25.520 --> 01:19:27.180
In Chrome, for example,
large processes

01:19:27.180 --> 01:19:30.460
do run in capability
mode in their design.

01:19:30.460 --> 01:19:32.960
It might be that
you basically have

01:19:32.960 --> 01:19:37.190
to have non-capability mode
chunks of your application

01:19:37.190 --> 01:19:40.390
because you want it to
interoperate nicely with Unix,

01:19:40.390 --> 01:19:44.330
or whatever it is you're
running alongside of it.

01:19:44.330 --> 01:19:44.910
OK.

01:19:44.910 --> 01:19:48.460
Any other thing you
should worry about?

01:19:48.460 --> 01:19:49.170
Yeah?

01:19:49.170 --> 01:19:51.450
AUDIENCE: Well, whether they
implemented capabilities

01:19:51.450 --> 01:19:52.090
correctly.

01:19:52.090 --> 01:19:53.012
PROFESSOR: Yeah.

01:19:53.012 --> 01:19:55.320
AUDIENCE: Whether they've
covered all the system calls.

01:19:55.320 --> 01:19:55.750
PROFESSOR: That's right.

01:19:55.750 --> 01:19:56.010
Yes.

01:19:56.010 --> 01:19:58.220
So that's actually a huge
problem, in some sense,

01:19:58.220 --> 01:19:58.980
already.

01:19:58.980 --> 01:20:01.230
If you think about
it, there's probably

01:20:01.230 --> 01:20:03.960
hundreds of system calls
that the kernel provides you.

01:20:03.960 --> 01:20:06.529
And they're not especially
precisely documented,

01:20:06.529 --> 01:20:08.695
so you probably have to
look at their implementation

01:20:08.695 --> 01:20:11.242
and see, for every
system call, if there's

01:20:11.242 --> 01:20:13.650
some way for the
applications to get

01:20:13.650 --> 01:20:16.010
the system call to
perform some operation

01:20:16.010 --> 01:20:18.600
on some extra object that you didn't
have a file descriptor for.

01:20:18.600 --> 01:20:20.490
And most Unix
system calls weren't

01:20:20.490 --> 01:20:22.870
written with the
expectation that everything

01:20:22.870 --> 01:20:24.600
has to be an operation
on a file descriptor.

01:20:24.600 --> 01:20:27.160
So you really have to get
every system call right.

01:20:27.160 --> 01:20:30.100
And probably more worrying
is that the kernel has

01:20:30.100 --> 01:20:32.300
to be free of bugs,
like buffer overflows

01:20:32.300 --> 01:20:34.884
or whatever other memory
corruption like you guys

01:20:34.884 --> 01:20:35.800
explained [INAUDIBLE].

01:20:35.800 --> 01:20:37.940
Otherwise, all of this
is complete nonsense.

01:20:37.940 --> 01:20:40.300
You just run arbitrary
assembly code in the kernel,

01:20:40.300 --> 01:20:43.298
and you have full
control of the machine.

01:20:43.298 --> 01:20:44.214
AUDIENCE: [INAUDIBLE].

01:20:52.225 --> 01:20:53.300
PROFESSOR: Yeah.

01:20:53.300 --> 01:20:54.100
I guess, yeah.

01:20:54.100 --> 01:20:55.900
So the one thing I didn't
get a chance to talk about

01:20:55.900 --> 01:20:56.816
is alternative things.

01:20:56.816 --> 01:20:58.150
So this is in FreeBSD.

01:20:58.150 --> 01:20:59.990
Linux has this thing
called [INAUDIBLE],

01:20:59.990 --> 01:21:04.140
that allows you to specify which
system calls you can run.

01:21:04.140 --> 01:21:06.070
If you squint, it's
kind of like Capsicum

01:21:06.070 --> 01:21:08.190
but very different, in
the sense that Capsicum

01:21:08.190 --> 01:21:09.731
talks about specific
file descriptors

01:21:09.731 --> 01:21:11.010
that you can operate.

01:21:11.010 --> 01:21:12.812
And in Linux, the
[INAUDIBLE] mechanism

01:21:12.812 --> 01:21:14.520
lets you talk about
specific system calls

01:21:14.520 --> 01:21:16.040
that you could run.

01:21:16.040 --> 01:21:18.670
So it's probably
less fine grained,

01:21:18.670 --> 01:21:22.110
but it's what's
available in Linux today.

01:21:22.110 --> 01:21:24.289
And it's actually
probably a good idea

01:21:24.289 --> 01:21:26.622
to look at your application
and see what system calls

01:21:26.622 --> 01:21:29.450
you expect it to make
and then code in a filter

01:21:29.450 --> 01:21:31.770
and allow it to make
only those system calls.

01:21:31.770 --> 01:21:34.311
The problem is that if you have
any interesting applications,

01:21:34.311 --> 01:21:36.215
it'll probably run exec
and open and write,

01:21:36.215 --> 01:21:38.680
and that's probably enough
to do quite a bit of damage

01:21:38.680 --> 01:21:39.264
to the system.

01:21:39.264 --> 01:21:41.763
So that's why you probably want
the more fine-grained system

01:21:41.763 --> 01:21:43.170
like Capsicum,
where you can say,

01:21:43.170 --> 01:21:45.630
well, you can run write,
but only on this thing,

01:21:45.630 --> 01:21:49.350
not on my entire home directory.
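
NOTE
Editor's sketch of the contrast drawn above: filtering by system call versus
rights attached to a specific descriptor. This is a toy model only.
It does not call any Linux or Capsicum API, and every class, function,
and file name in it is hypothetical.
```python
# Toy model only: nothing here talks to a real kernel mechanism.
class SyscallFilter:
    """Allows or denies by system call name alone, ignoring the target."""
    def __init__(self, allowed):
        self.allowed = set(allowed)
    def permits(self, syscall, target):
        return syscall in self.allowed
class DescriptorRights:
    """Allows an operation only on descriptors that carry that right."""
    def __init__(self):
        self.rights = {}
    def grant(self, fd_name, *ops):
        self.rights.setdefault(fd_name, set()).update(ops)
    def permits(self, syscall, target):
        return syscall in self.rights.get(target, set())
def compare():
    coarse = SyscallFilter({"read", "write"})
    fine = DescriptorRights()
    fine.grant("output.log", "write")
    targets = ["output.log", "home/.ssh/authorized_keys"]
    # Each row: (target, allowed by syscall filter, allowed by per-fd rights)
    return [(t, coarse.permits("write", t), fine.permits("write", t)) for t in targets]
```
In this model the syscall filter permits write on both targets, while the
per-descriptor rights permit it only on the one object that was granted.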

01:21:49.350 --> 01:21:49.850
All right.

01:21:49.850 --> 01:21:51.520
So I guess we're out of
time to talk about Capsicum.

01:21:51.520 --> 01:21:53.250
So let's talk about
Native Client

01:21:53.250 --> 01:21:56.840
on Wednesday and a different
way to sandbox programs.