WEBVTT

00:00:00.070 --> 00:00:02.430
The following content is
provided under a Creative

00:00:02.430 --> 00:00:03.810
Commons license.

00:00:03.810 --> 00:00:06.060
Your support will help
MIT OpenCourseWare

00:00:06.060 --> 00:00:10.150
continue to offer high-quality
educational resources for free.

00:00:10.150 --> 00:00:12.700
To make a donation or to
view additional materials

00:00:12.700 --> 00:00:16.600
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:16.600 --> 00:00:17.310
at ocw.mit.edu.

00:00:26.169 --> 00:00:27.210
PROFESSOR: Hey, everyone.

00:00:27.210 --> 00:00:28.076
Good on that?

00:00:28.076 --> 00:00:29.480
All right, cool.

00:00:29.480 --> 00:00:34.477
So today we're going to talk
about the economics of spam

00:00:34.477 --> 00:00:35.740
and security in general.

00:00:35.740 --> 00:00:37.355
And so up to this
point in the class,

00:00:37.355 --> 00:00:40.540
we've mainly talked about the
technical aspects of security.

00:00:40.540 --> 00:00:42.680
So we've looked at things
like buffer overflows,

00:00:42.680 --> 00:00:46.540
the same-origin policy, Tor, and
all kinds of things like that.

00:00:46.540 --> 00:00:49.550
And so the context
for that discussion

00:00:49.550 --> 00:00:53.780
was that we were looking at
how an adversary can compromise

00:00:53.780 --> 00:00:54.560
a system.

00:00:54.560 --> 00:00:56.570
We tried to devise
a threat model that

00:00:56.570 --> 00:00:58.820
would describe the types of
things we want to prevent,

00:00:58.820 --> 00:01:00.320
and then we tried
to think about how

00:01:00.320 --> 00:01:03.400
we could design systems
that would help us to defend

00:01:03.400 --> 00:01:05.129
against that threat model.

00:01:05.129 --> 00:01:07.560
So today we're going to look
at an altered perspective.

00:01:07.560 --> 00:01:09.950
And the perspective
that we'll look at today

00:01:09.950 --> 00:01:13.520
is, why is the attacker trying
to compromise your system?

00:01:13.520 --> 00:01:17.189
Why is the attacker trying to
do these evil things to us?

00:01:17.189 --> 00:01:18.730
And so there's a
bunch of the reasons

00:01:18.730 --> 00:01:20.750
you can imagine why
attackers might be

00:01:20.750 --> 00:01:22.510
trying to do these evil things.

00:01:22.510 --> 00:01:25.805
So some of these attacks are
done for ideological reasons.

00:01:25.805 --> 00:01:27.804
So think about
people who perceive

00:01:27.804 --> 00:01:30.220
themselves to be political
activists, or things like that.

00:01:30.220 --> 00:01:32.950
Or if you think about
Stuxnet, for example.

00:01:32.950 --> 00:01:35.490
Sometimes it's like governments
attacking other governments.

00:01:35.490 --> 00:01:38.470
And so for these
types of attacks

00:01:38.470 --> 00:01:41.265
money, economics, is not
the primary motivation

00:01:41.265 --> 00:01:42.816
for the attack to take place.

00:01:42.816 --> 00:01:45.050
And what's interesting
is that it's actually

00:01:45.050 --> 00:01:48.540
hard to make these attacks go
away, other than generically

00:01:48.540 --> 00:01:51.357
making computers more secure.

00:01:51.357 --> 00:01:53.190
There's not really some
financial thumbscrew

00:01:53.190 --> 00:01:57.010
you can turn to make these
attackers disincentivized

00:01:57.010 --> 00:01:57.940
to do things.

00:01:57.940 --> 00:02:02.170
However, there are
some types of attacks

00:02:02.170 --> 00:02:04.900
that do involve a strong
economic component,

00:02:04.900 --> 00:02:07.690
and those are some of the things
we're going to look at today.

00:02:07.690 --> 00:02:08.990
One of the interesting
things, though,

00:02:08.990 --> 00:02:09.929
is that for a lot
of these attacks

00:02:09.929 --> 00:02:12.099
that don't have an economic
component, in that we

00:02:12.099 --> 00:02:14.640
can't use regulations and things
like that to try and prevent

00:02:14.640 --> 00:02:15.139
them.

00:02:15.139 --> 00:02:17.426
It can sometimes be
difficult to figure out

00:02:17.426 --> 00:02:19.800
how we'd be able to stop them
at all beyond, like I said,

00:02:19.800 --> 00:02:21.549
just trying to make
computers more secure.

00:02:21.549 --> 00:02:23.570
For example, Stuxnet's
a great idea.

00:02:23.570 --> 00:02:26.850
So this is the malware
that was attacking

00:02:26.850 --> 00:02:30.740
some of the industrial software
in Iran, with the centrifuges.

00:02:30.740 --> 00:02:34.430
So we all kind of know where
Stuxnet came from, right?

00:02:34.430 --> 00:02:36.850
We basically know it was the
Americans and the Israelis.

00:02:36.850 --> 00:02:37.370
Basically.

00:02:37.370 --> 00:02:40.000
But can we prove that
in a court of law?

00:02:40.000 --> 00:02:43.344
Like, who can we sue, to say
You put Stuxnet on our machine?

00:02:43.344 --> 00:02:44.885
So it becomes a
little bit murky when

00:02:44.885 --> 00:02:47.100
you have some of these
attacks, where it's not clear

00:02:47.100 --> 00:02:49.720
you can sue the Federal
Reserve, or you can sue Israel,

00:02:49.720 --> 00:02:50.770
for something like this.

00:02:50.770 --> 00:02:52.000
And furthermore, no
one's gone on the record

00:02:52.000 --> 00:02:53.750
as officially claiming
that it was them.

00:02:53.750 --> 00:02:56.660
So there's some very interesting
legal and financial issues

00:02:56.660 --> 00:02:58.243
that get involved
when you look at how

00:02:58.243 --> 00:02:59.460
to prevent these attacks.

00:02:59.460 --> 00:03:01.770
So there are many
kinds of computer crime

00:03:01.770 --> 00:03:04.440
that are driven by
economic motivations.

00:03:04.440 --> 00:03:07.050
So for example, state-sponsored
industrial espionage,

00:03:07.050 --> 00:03:07.819
for instance.

00:03:07.819 --> 00:03:10.110
So this is one thing that
some of our previous speakers

00:03:10.110 --> 00:03:10.660
have talked about.

00:03:10.660 --> 00:03:12.230
Sometimes governments
try to hack

00:03:12.230 --> 00:03:14.540
into other governments
or other industries

00:03:14.540 --> 00:03:17.562
to steal intellectual
property, or things like that.

00:03:17.562 --> 00:03:20.020
And what's interesting is that,
like the attacks that we'll

00:03:20.020 --> 00:03:21.840
look at today, which
are spam, you'll

00:03:21.840 --> 00:03:24.750
see that actually take some
money to make some money.

00:03:24.750 --> 00:03:27.770
Spammers actually have to
invest in an infrastructure

00:03:27.770 --> 00:03:30.100
before they can actually
send these messages out.

00:03:30.100 --> 00:03:32.630
And so if you have these
attacks where it takes money

00:03:32.630 --> 00:03:34.290
to make money, and
you can figure out

00:03:34.290 --> 00:03:37.314
what that financial sort of
tool chain looks like, then

00:03:37.314 --> 00:03:38.730
maybe you can think
about applying

00:03:38.730 --> 00:03:43.580
upstream financial pressure to
stop that downstream malware

00:03:43.580 --> 00:03:46.470
attacks or security problems.

00:03:46.470 --> 00:03:47.900
And so I think the
take-home point

00:03:47.900 --> 00:03:50.840
is that if we look at the
context of spam in particular,

00:03:50.840 --> 00:03:54.550
spammers will stop sending spam
if it becomes unprofitable.

00:03:54.550 --> 00:03:56.980
One of the sad truths of
the world that we continue

00:03:56.980 --> 00:03:59.260
to get spam messages
because it's cheap for them

00:03:59.260 --> 00:04:02.465
to send them, and 2% to 3%
of our fellow human beings

00:04:02.465 --> 00:04:05.050
will actually click on
links and look at stuff.

00:04:05.050 --> 00:04:08.430
And so as long as these costs
for sending these messages out

00:04:08.430 --> 00:04:10.775
are so low, then even if
the hit rates are low,

00:04:10.775 --> 00:04:12.900
people can still make money
off that kind of stuff.

00:04:12.900 --> 00:04:19.200
So for today we're
going to look at attacks

00:04:19.200 --> 00:04:24.266
that have a significant
economic component to them.

00:04:27.020 --> 00:04:30.110
And so one interesting
example which I actually just

00:04:30.110 --> 00:04:33.490
read about takes place in China.

00:04:33.490 --> 00:04:37.710
And so in China they
have this problem

00:04:37.710 --> 00:04:41.680
with what they call
text message cars.

00:04:41.680 --> 00:04:46.350
So the basic idea here is
that people drive around

00:04:46.350 --> 00:04:49.790
with these cars that have
these radio antennas attached

00:04:49.790 --> 00:04:50.730
to the side.

00:04:50.730 --> 00:04:52.770
And they can essentially
do-- think of it

00:04:52.770 --> 00:04:55.520
almost like a man in the middle
between people's mobile cell

00:04:55.520 --> 00:04:57.850
phones and the actual
cellphone tower.

00:04:57.850 --> 00:05:00.360
And so they can basically run
around in these troll cars,

00:05:00.360 --> 00:05:02.420
and they can get all of
these cell phone numbers,

00:05:02.420 --> 00:05:06.600
and then use that car to
send spam messages directly

00:05:06.600 --> 00:05:09.190
to the numbers that
they've collected using

00:05:09.190 --> 00:05:12.040
this sort of vehicle take.

00:05:12.040 --> 00:05:13.850
So these text message
cars can actually

00:05:13.850 --> 00:05:21.440
send upward of 200,000
messages a day,

00:05:21.440 --> 00:05:23.100
which is an incredibly
high number.

00:05:23.100 --> 00:05:25.630
And the cost of labor over
there is actually very cheap.

00:05:25.630 --> 00:05:28.134
So it's very inexpensive
to hire a driver,

00:05:28.134 --> 00:05:29.800
drive around one of
these cars, and just

00:05:29.800 --> 00:05:32.070
snoop on people's traffic
and send them spam.

00:05:32.070 --> 00:05:33.970
So let's look at the
economics of this.

00:05:33.970 --> 00:05:40.530
So what is the cost
of the evil antenna,

00:05:40.530 --> 00:05:43.350
this thing that
allows people to take

00:05:43.350 --> 00:05:45.630
these messages off the air?

00:05:45.630 --> 00:05:50.530
Roughly speaking, it's
somewhere in the order of about

00:05:50.530 --> 00:05:53.790
1600 bucks, give or take.

00:05:53.790 --> 00:05:59.760
So how much profit can
these people make a day?

00:05:59.760 --> 00:06:01.470
So in a hilarious
coincidence, this

00:06:01.470 --> 00:06:06.074
is also roughly 1600 dollars.

00:06:06.074 --> 00:06:07.240
So this is very interesting.

00:06:07.240 --> 00:06:10.230
What this means is that once
you buy one of these things,

00:06:10.230 --> 00:06:12.872
then in a day essentially
you've made back your money.

00:06:12.872 --> 00:06:16.260
So that's great, from the
perspective of being a spammer.

00:06:16.260 --> 00:06:18.835
Now you might say, OK, but you
might get caught by the police

00:06:18.835 --> 00:06:21.210
and then you might get put in
jail or have to pay a fine.

00:06:21.210 --> 00:06:29.650
So in the case of the fines,
the fines for getting caught

00:06:29.650 --> 00:06:32.100
are less than 5K.

00:06:35.220 --> 00:06:37.810
And people rarely get caught.

00:06:37.810 --> 00:06:40.215
And so these are the
types of calculations

00:06:40.215 --> 00:06:41.715
we have to look at
when we're trying

00:06:41.715 --> 00:06:44.360
to think about how
to economically deter

00:06:44.360 --> 00:06:45.620
these spammers.

00:06:45.620 --> 00:06:47.060
Because if these
spammers only get

00:06:47.060 --> 00:06:49.870
caught a couple times a
year, and they basically

00:06:49.870 --> 00:06:52.570
make back their hardware
costs in a single day,

00:06:52.570 --> 00:06:54.360
it's very tricky
to figure out how

00:06:54.360 --> 00:06:56.605
we can use financial
disincentives to make them

00:06:56.605 --> 00:06:58.330
stop doing this kind of stuff.

00:06:58.330 --> 00:07:02.790
And what's interesting is that
in China the mobile carriers

00:07:02.790 --> 00:07:05.160
are also somewhat
implicit in this scheme.

00:07:05.160 --> 00:07:06.740
So every time you
send a spam, you're

00:07:06.740 --> 00:07:09.540
going to send some small amount
of money to the mobile carrier,

00:07:09.540 --> 00:07:09.720
right?

00:07:09.720 --> 00:07:10.420
A couple cents.

00:07:10.420 --> 00:07:11.970
It works that way
over here as well.

00:07:11.970 --> 00:07:14.280
Now over here in
Europe in many cases,

00:07:14.280 --> 00:07:16.450
the mobile carriers
have decided that they

00:07:16.450 --> 00:07:18.610
don't want angry customers
contacting them saying,

00:07:18.610 --> 00:07:20.970
I'm getting hit by these
spam messages all the time.

00:07:20.970 --> 00:07:23.410
But apparently a lot of the
Chinese mobile carriers,

00:07:23.410 --> 00:07:24.910
at least the top
three ones, they're

00:07:24.910 --> 00:07:26.780
actually seeing
these spam messages

00:07:26.780 --> 00:07:29.070
as a source of revenue.

00:07:29.070 --> 00:07:31.970
They actually think this
is a nice way for them

00:07:31.970 --> 00:07:32.950
to get some free money.

00:07:32.950 --> 00:07:36.810
So in fact these telcos
have set up these things

00:07:36.810 --> 00:07:41.414
called 106 prefix numbers.

00:07:41.414 --> 00:07:44.190
I don't know if you've
heard of these before.

00:07:44.190 --> 00:07:44.849
[BANGING]

00:07:44.849 --> 00:07:48.521
But the original-- there's
apparently a ghost in the room.

00:07:48.521 --> 00:07:50.810
The original purpose
of these numbers

00:07:50.810 --> 00:07:53.710
was to do things for
non-commercial reasons.

00:07:53.710 --> 00:07:56.180
For example, imagine
that you run a company,

00:07:56.180 --> 00:07:58.120
and you want to send a
bunch of text messages

00:07:58.120 --> 00:07:59.540
to all of your employees.

00:07:59.540 --> 00:08:02.205
You can use one of
these 106 numbers,

00:08:02.205 --> 00:08:05.730
and you would basically be
able to send things in bulk.

00:08:05.730 --> 00:08:08.510
You'd be able to avoid some
of the built-in rate-limiting

00:08:08.510 --> 00:08:10.840
mechanisms they had
in the cell network.

00:08:10.840 --> 00:08:12.630
So there's this nice
thing sitting around

00:08:12.630 --> 00:08:14.820
that spammers can actually use.

00:08:14.820 --> 00:08:16.800
And so as it turns
out, I think it's

00:08:16.800 --> 00:08:26.050
something like 55% of the mobile
span that gets sent in China

00:08:26.050 --> 00:08:30.620
comes from one of
these 106 numbers.

00:08:30.620 --> 00:08:32.900
So this is a really
interesting case study

00:08:32.900 --> 00:08:36.180
of how these financial
numbers work out,

00:08:36.180 --> 00:08:37.710
and how sometimes
you can actually

00:08:37.710 --> 00:08:41.630
have these sort of perverse
incentives, where in this case

00:08:41.630 --> 00:08:44.160
the cellphone carriers
are just going along

00:08:44.160 --> 00:08:47.223
with these scams
and these schemes.

00:08:47.223 --> 00:08:49.056
And there'll be a link
in the lecture notes.

00:08:49.056 --> 00:08:50.968
There's an interesting
Economist article about this.

00:08:50.968 --> 00:08:51.759
[BANGING CONTINUES]

00:08:51.759 --> 00:08:55.800
There is like a pan-African
drum circle back there.

00:08:55.800 --> 00:08:57.450
This is super exciting, though.

00:08:57.450 --> 00:08:57.950
I like it.

00:08:57.950 --> 00:08:59.430
I am being
adversarially attacked.

00:08:59.430 --> 00:09:00.383
That's OK.

00:09:00.383 --> 00:09:02.130
We will play through the pain.

00:09:02.130 --> 00:09:03.780
Perhaps this is the Mossad.

00:09:03.780 --> 00:09:06.920
They don't want me to
talk about Stuxnet.

00:09:06.920 --> 00:09:09.040
Another interesting
thing about security

00:09:09.040 --> 00:09:12.400
is that there are actually
many companies that

00:09:12.400 --> 00:09:14.470
deal in cyber arms.

00:09:14.470 --> 00:09:17.802
So this is kind of
something out of G.I. Joe,

00:09:17.802 --> 00:09:20.260
but there are actually these
companies that will sit around

00:09:20.260 --> 00:09:22.976
and they will actually
sell you malware,

00:09:22.976 --> 00:09:24.350
they will sell
you exploits, they

00:09:24.350 --> 00:09:26.300
will sell you things like this.

00:09:26.300 --> 00:09:34.430
So one example is this
company that's called Endgame.

00:09:34.430 --> 00:09:42.210
And so for example for
about $1.5 million,

00:09:42.210 --> 00:09:45.940
Endgame will give
you IP addresses

00:09:45.940 --> 00:09:53.195
and the physical locations of
millions of unpatched machines.

00:09:57.460 --> 00:10:00.450
So they have sort of vantage
points all over the internet,

00:10:00.450 --> 00:10:02.620
and they know all kinds
of interesting information

00:10:02.620 --> 00:10:04.690
about machines that
you may or may not

00:10:04.690 --> 00:10:07.690
want to attack if, for
example, you're a government,

00:10:07.690 --> 00:10:09.890
or if you're another agency
or something like that.

00:10:09.890 --> 00:10:15.650
For about $2.5 million,
they will give you

00:10:15.650 --> 00:10:22.990
what is delightfully called a
zero-day subscription package.

00:10:22.990 --> 00:10:28.170
And so if you sign up for
this, then basically you

00:10:28.170 --> 00:10:30.800
will get 25 exploits
a year, they

00:10:30.800 --> 00:10:33.130
claim, for that much money.

00:10:33.130 --> 00:10:36.060
And so you'll get those exploits
in your inbox or whatever.

00:10:36.060 --> 00:10:39.880
Once again, you can do with
these things whatever you want.

00:10:39.880 --> 00:10:41.565
You've clearly got
2.5 million dollars,

00:10:41.565 --> 00:10:43.660
so you've got a lot of spare
time to think about this stuff,

00:10:43.660 --> 00:10:44.320
presumably.

00:10:44.320 --> 00:10:46.240
And so what's
interesting is that a lot

00:10:46.240 --> 00:10:48.420
of people who work in
these cyber arms dealers,

00:10:48.420 --> 00:10:50.850
they're actually ex
three-letter agencies.

00:10:50.850 --> 00:10:53.662
They're ex-CIA, or ex-NSA,
or things like this.

00:10:53.662 --> 00:10:55.120
It's interesting
to think about who

00:10:55.120 --> 00:10:57.867
are the actual customers of
these cyber arms dealers.

00:10:57.867 --> 00:10:59.450
Some of them are
actually governments,

00:10:59.450 --> 00:11:01.199
like the American
government, for example.

00:11:01.199 --> 00:11:03.310
And they use these things
to attack other nations,

00:11:03.310 --> 00:11:04.070
or whatever.

00:11:04.070 --> 00:11:06.337
But some of the people
who buy this stuff

00:11:06.337 --> 00:11:07.920
are actually,
increasingly, companies.

00:11:07.920 --> 00:11:09.670
So one thing we'll
talk about a little bit

00:11:09.670 --> 00:11:12.260
at the end of the lecture is
how sometimes companies are now

00:11:12.260 --> 00:11:13.968
taking cybersecurity
into their own hands

00:11:13.968 --> 00:11:17.000
and sometimes doing
what's called hackbacks.

00:11:17.000 --> 00:11:19.026
So without getting the
government involved,

00:11:19.026 --> 00:11:20.900
companies that are
attacked by cybercriminals

00:11:20.900 --> 00:11:22.680
will sometimes go
back and explicitly

00:11:22.680 --> 00:11:24.810
try to take out people
who tried to steal

00:11:24.810 --> 00:11:26.070
their intellectual property.

00:11:26.070 --> 00:11:28.430
And they've used some very
inventive legal arguments

00:11:28.430 --> 00:11:30.140
to justify this, and
so far it's actually

00:11:30.140 --> 00:11:31.098
been fairly successful.

00:11:31.098 --> 00:11:33.395
So this is an interesting
aspect of cyber warfare.

00:11:33.395 --> 00:11:35.135
AUDIENCE: How is
any of that legal?

00:11:38.910 --> 00:11:39.910
PROFESSOR: Well, so.

00:11:39.910 --> 00:11:42.181
I mean, information
wants to be free, dude.

00:11:42.181 --> 00:11:42.680
Right?

00:11:42.680 --> 00:11:46.910
So if you think about stuff
like this, for example.

00:11:46.910 --> 00:11:49.895
Just telling you stuff
isn't necessarily illegal.

00:11:49.895 --> 00:11:52.020
I mean, it gets a
little bit gray.

00:11:52.020 --> 00:11:54.860
But for example, if I tell
you that look over there,

00:11:54.860 --> 00:11:59.550
there's a house, and the lock
doesn't work on that door.

00:11:59.550 --> 00:12:00.730
Can I have 20 bucks?

00:12:00.730 --> 00:12:02.540
That's not necessarily illegal.

00:12:02.540 --> 00:12:04.730
Because as it turns
out, these companies

00:12:04.730 --> 00:12:06.880
have, like, hordes
of lawyers that

00:12:06.880 --> 00:12:08.880
look into things like this.

00:12:08.880 --> 00:12:10.654
But in many cases, if
you think about it,

00:12:10.654 --> 00:12:12.320
you can search for
stuff on the internet

00:12:12.320 --> 00:12:14.986
and go to websites that tell you
things like how to build bombs,

00:12:14.986 --> 00:12:16.460
for example.

00:12:16.460 --> 00:12:19.170
Just posting that
information typically

00:12:19.170 --> 00:12:21.392
is not illegal, because
you're just learning.

00:12:21.392 --> 00:12:22.850
What if I'm a
chemist, for example?

00:12:22.850 --> 00:12:24.680
Or something like this.

00:12:24.680 --> 00:12:27.200
So a lot of times, just
giving someone knowledge

00:12:27.200 --> 00:12:29.045
is not necessarily illegal.

00:12:29.045 --> 00:12:31.290
But you're right that
there's some gray areas here,

00:12:31.290 --> 00:12:34.220
and as we'll talk about with
some of these hackbacks,

00:12:34.220 --> 00:12:35.250
it's not always clear.

00:12:35.250 --> 00:12:38.730
For example, if I am a bank, I'm
not a government, I'm a bank.

00:12:38.730 --> 00:12:39.500
I get hacked.

00:12:39.500 --> 00:12:40.600
It's not always
clear that I actually

00:12:40.600 --> 00:12:42.058
have the legal
authority to go back

00:12:42.058 --> 00:12:44.690
and, let's say, try to shut down
a botnet or things like that.

00:12:44.690 --> 00:12:46.680
Companies have done
stuff like that.

00:12:46.680 --> 00:12:50.670
But I think this is an
example where the law is

00:12:50.670 --> 00:12:54.610
lagging behind practice.

00:12:54.610 --> 00:12:56.170
And so people have
used things like,

00:12:56.170 --> 00:12:57.970
we will use copyright
infringement law

00:12:57.970 --> 00:12:59.880
to attack botnets as a company.

00:12:59.880 --> 00:13:02.260
Because they're selling
legal goods of ours,

00:13:02.260 --> 00:13:04.470
so we'll use IP infringement.

00:13:04.470 --> 00:13:06.470
Like, this is probably
not what Thomas Jefferson

00:13:06.470 --> 00:13:07.845
was thinking when
he was thinking

00:13:07.845 --> 00:13:09.360
about how these laws work.

00:13:09.360 --> 00:13:11.370
So this is a little bit
of a cat-and-mouse game.

00:13:11.370 --> 00:13:15.650
So we'll do a little bit of
that later in the lecture.

00:13:15.650 --> 00:13:17.940
So, yes, this is
very interesting.

00:13:17.940 --> 00:13:21.130
Basically what this all
means is that there's

00:13:21.130 --> 00:13:28.760
this marketplace for all kinds
of computational resources

00:13:28.760 --> 00:13:32.500
that you might use as someone
who wants to launch attacks.

00:13:32.500 --> 00:13:34.070
So for example,
there's a marketplace

00:13:34.070 --> 00:13:39.270
for compromised systems.

00:13:39.270 --> 00:13:40.980
So, for example, you
can go to the darker

00:13:40.980 --> 00:13:43.820
places of the internet,
you can purchase

00:13:43.820 --> 00:13:47.460
entire compromised machines
that might be part of a botnet.

00:13:47.460 --> 00:13:51.130
You can actually buy access
to a compromised website,

00:13:51.130 --> 00:13:52.070
for example.

00:13:52.070 --> 00:13:55.130
You might use that website
to post spam, or put up

00:13:55.130 --> 00:13:57.630
evil links, or things like that.

00:13:57.630 --> 00:14:00.810
You can also get access to
compromised email accounts,

00:14:00.810 --> 00:14:02.265
like Gmail or Yahoo accounts.

00:14:02.265 --> 00:14:03.640
As we'll talk
later, those things

00:14:03.640 --> 00:14:05.726
are very very powerful
for an attacker.

00:14:05.726 --> 00:14:08.190
And you may also just buy
sort of a subscription

00:14:08.190 --> 00:14:09.472
service for a botnet.

00:14:09.472 --> 00:14:11.180
You'll just have this
thing lying around.

00:14:11.180 --> 00:14:13.280
You can use it to send denial
of service attacks or things

00:14:13.280 --> 00:14:13.780
like that.

00:14:13.780 --> 00:14:15.350
So there's a
marketplace for that.

00:14:15.350 --> 00:14:18.650
There's a marketplace for tools.

00:14:18.650 --> 00:14:22.170
So you can get, as an attacker,
off-the-shelf malware kits,

00:14:22.170 --> 00:14:23.470
for example.

00:14:23.470 --> 00:14:26.370
You can use perhaps
arms dealers like this

00:14:26.370 --> 00:14:27.893
to get access to
zero-day exploits

00:14:27.893 --> 00:14:30.510
so you can write your own
malware, so on and so forth.

00:14:30.510 --> 00:14:32.620
And there's also
a big marketplace

00:14:32.620 --> 00:14:38.150
for stolen user information.

00:14:38.150 --> 00:14:40.480
So this is stuff like
Social Security numbers,

00:14:40.480 --> 00:14:44.040
credit card numbers, email
addresses, so on and so forth.

00:14:44.040 --> 00:14:45.710
So it's all out
there on the internet

00:14:45.710 --> 00:14:47.717
if you're just willing
to look for it.

00:14:47.717 --> 00:14:49.350
And so the paper
that we're going

00:14:49.350 --> 00:14:52.550
to look at today
basically focused

00:14:52.550 --> 00:14:56.990
on one aspect of this,
which is the spam ecosystem.

00:15:00.110 --> 00:15:02.020
And so in particular,
they look at the sale

00:15:02.020 --> 00:15:06.850
of pharmaceuticals, of
knockoff goods, and software.

00:15:06.850 --> 00:15:09.420
And so they basically
break this spam ecosystem

00:15:09.420 --> 00:15:11.100
into three parts.

00:15:11.100 --> 00:15:15.230
They break it into advertising.

00:15:15.230 --> 00:15:18.020
So this is the
process of somehow

00:15:18.020 --> 00:15:22.570
getting a user to click
on a spam link somehow.

00:15:22.570 --> 00:15:25.300
And then once they've
done that, there's

00:15:25.300 --> 00:15:29.890
this issue of click support.

00:15:29.890 --> 00:15:33.665
So this is the notion that
once the user clicks the link,

00:15:33.665 --> 00:15:36.165
there has to be some type of
web server, DNS infrastructure,

00:15:36.165 --> 00:15:38.220
so on and so forth
on the back end that

00:15:38.220 --> 00:15:40.790
actually presents the spam
website that the user goes to.

00:15:40.790 --> 00:15:43.076
And then the final
part is realization.

00:15:45.820 --> 00:15:48.910
So this is actually
allowing the user

00:15:48.910 --> 00:15:51.650
to say they want
to buy something.

00:15:51.650 --> 00:15:53.950
The user sends money
to the spammers,

00:15:53.950 --> 00:15:57.230
and the user's going to get some
product back in the back end.

00:15:57.230 --> 00:16:01.450
And so this is where all
of the money makes place.

00:16:01.450 --> 00:16:04.160
And so a lot of this
stuff is actually

00:16:04.160 --> 00:16:10.070
outsourced to what the paper
calls affiliate programs.

00:16:13.050 --> 00:16:15.650
And so you can think of
these affiliate programs

00:16:15.650 --> 00:16:20.030
as essentially doing a
lot of the back-end grunt

00:16:20.030 --> 00:16:23.130
work of talking to banks
and Visa and MasterCard

00:16:23.130 --> 00:16:24.200
and things like this.

00:16:24.200 --> 00:16:26.044
And so a lot of
times, the spammers,

00:16:26.044 --> 00:16:27.710
they don't want to
deal with that stuff.

00:16:27.710 --> 00:16:29.640
They just want to
create the links

00:16:29.640 --> 00:16:32.520
and do-- you can think of it
as the advertising component.

00:16:32.520 --> 00:16:34.230
And so a lot of
times the spammers

00:16:34.230 --> 00:16:37.920
themselves, they will
work on a commission.

00:16:37.920 --> 00:16:42.340
So they will get, let's
say, anywhere between 30%

00:16:42.340 --> 00:16:49.890
and maybe 50% of the final
sale that they deliver to one

00:16:49.890 --> 00:16:52.670
of these back-end affiliates.

00:16:52.670 --> 00:16:55.541
So does that all make
sense at a high level?

00:16:55.541 --> 00:16:56.040
OK.

00:16:56.040 --> 00:17:02.570
So what we'll do is we'll look
at each component of this spam

00:17:02.570 --> 00:17:05.230
trajectory, and then see how
it works, and then maybe think

00:17:05.230 --> 00:17:07.505
about how we'd to be able
to shut down spammers

00:17:07.505 --> 00:17:11.540
at different levels
of this [INAUDIBLE].

00:17:11.540 --> 00:17:14.609
So the first thing we'll look
at is the advertising component.

00:17:18.992 --> 00:17:21.450
And so, like I mentioned, the
basic idea of the advertising

00:17:21.450 --> 00:17:29.440
is, how do you get the
user to click on a link?

00:17:34.180 --> 00:17:36.630
That's the primary question
we'll be concerned with here.

00:17:36.630 --> 00:17:39.320
And so the typical
thing, as we all know,

00:17:39.320 --> 00:17:42.457
is you're going to email
spam, although as we discussed

00:17:42.457 --> 00:17:43.915
at the beginning
of lecture, people

00:17:43.915 --> 00:17:45.670
are starting to use
text messages and some

00:17:45.670 --> 00:17:48.890
of these other forms
of communication.

00:17:48.890 --> 00:17:50.760
You could also imagine
maybe here we're

00:17:50.760 --> 00:17:53.305
going to start using
social networks as well.

00:17:53.305 --> 00:17:54.763
So now when you go
to Facebook, not

00:17:54.763 --> 00:17:56.929
only are you polluted by
your real friends' content,

00:17:56.929 --> 00:17:58.940
you're also polluted
by spam messages too.

00:17:58.940 --> 00:18:03.390
So this is about
economics, this discussion.

00:18:03.390 --> 00:18:05.190
So one interesting
question is, how much

00:18:05.190 --> 00:18:08.350
does it cost to actually
send out these spam messages.

00:18:08.350 --> 00:18:12.250
And so as it turns out, it's
not very expensive at all.

00:18:12.250 --> 00:18:18.454
For about 60 bucks, you can
spend a million spam messages.

00:18:21.150 --> 00:18:23.760
So that's a super,
super low cost.

00:18:23.760 --> 00:18:26.190
And this cost is
actually much lower

00:18:26.190 --> 00:18:28.220
if you're directly
operating a botnet.

00:18:28.220 --> 00:18:29.990
You can cut out the middleman.

00:18:29.990 --> 00:18:32.570
But even if you are
renting one of the botnets

00:18:32.570 --> 00:18:35.890
from one of these marketplaces,
this is still super, super low.

00:18:35.890 --> 00:18:38.154
AUDIENCE: So how many of
those are actually effective?

00:18:38.154 --> 00:18:40.072
As in, they don't get filtered?

00:18:40.072 --> 00:18:41.780
PROFESSOR: Ah, so
that's a good question.

00:18:41.780 --> 00:18:44.300
So that leads to my next point.

00:18:44.300 --> 00:18:46.299
So you're sending
a million spams,

00:18:46.299 --> 00:18:47.840
but then they're
going to get dropped

00:18:47.840 --> 00:18:49.174
at various points along the way.

00:18:49.174 --> 00:18:51.006
They're going to get
caught in spam filters,

00:18:51.006 --> 00:18:53.048
people will-- they see it
but they just delete it

00:18:53.048 --> 00:18:55.005
because they know that
an email that has, like,

00:18:55.005 --> 00:18:56.700
18 dollar signs should
just be deleted.

00:18:56.700 --> 00:18:58.940
So if you look at
the conversion rate,

00:18:58.940 --> 00:19:00.870
you'll see that the
click rates are actually

00:19:00.870 --> 00:19:04.320
very low because of
things like spam filters

00:19:04.320 --> 00:19:05.290
and stuff like that.

00:19:05.290 --> 00:19:10.200
And also many users are
trained to avoid these things.

00:19:10.200 --> 00:19:11.950
Click rates are low.

00:19:11.950 --> 00:19:15.170
And this is why
sending spam has to be

00:19:15.170 --> 00:19:18.800
super, super cheap,
because you will not

00:19:18.800 --> 00:19:20.040
get a lot of conversions.

00:19:20.040 --> 00:19:21.850
So for example, there have been
some empirical studies that

00:19:21.850 --> 00:19:23.016
looked at these click rates.

00:19:23.016 --> 00:19:31.030
And one study found that they
looked at 350 million spam

00:19:31.030 --> 00:19:34.650
messages, and they
found that out

00:19:34.650 --> 00:19:37.650
of those 350 million
messages, there

00:19:37.650 --> 00:19:44.960
was only about 10,000
clicks on those messages.

00:19:44.960 --> 00:19:46.710
So there's a massive
dropoff here.

00:19:46.710 --> 00:19:49.750
And then out of
these 10,000 clicks

00:19:49.750 --> 00:19:52.567
there were only 28
purchase attempts.

00:19:55.680 --> 00:19:58.430
So that's super, super low.

00:19:58.430 --> 00:20:01.010
And so that's why it's
extremely important

00:20:01.010 --> 00:20:04.275
for this entire ecosystem to be
very cheap from the perspective

00:20:04.275 --> 00:20:04.820
of a spammer.

00:20:04.820 --> 00:20:06.653
Because I mean, look
at these dropoffs here.

00:20:06.653 --> 00:20:08.780
These are multiple
orders of magnitude.

00:20:08.780 --> 00:20:13.636
And so that's why one might
hope that at least in theory we

00:20:13.636 --> 00:20:15.010
could squeeze--
like for example,

00:20:15.010 --> 00:20:17.880
we could drive this
number up maybe just $10.

00:20:17.880 --> 00:20:20.280
Maybe that has some
catastrophic knockdown effect

00:20:20.280 --> 00:20:22.440
on how profitable this stuff is.

00:20:22.440 --> 00:20:24.995
So it's very important
for the spammers

00:20:24.995 --> 00:20:26.880
that everything be
as cheap as possible.

00:20:26.880 --> 00:20:28.848
AUDIENCE: So those
10,000 clicks.

00:20:28.848 --> 00:20:33.768
Again, how many of
those 350 million emails

00:20:33.768 --> 00:20:35.911
were filtered out of the inbox?

00:20:35.911 --> 00:20:39.854
I'm just trying to get a sense
of out of how many emails

00:20:39.854 --> 00:20:41.270
those clicks were
out of, to gauge

00:20:41.270 --> 00:20:45.577
how effective spam filtering is
versus how silly us humans are.

00:20:45.577 --> 00:20:47.410
PROFESSOR: Yeah, that
I'm not actually sure.

00:20:47.410 --> 00:20:49.960
That's a good question.

00:20:49.960 --> 00:20:52.870
AUDIENCE: So I was just
listening to a talk

00:20:52.870 --> 00:20:55.490
by Jeff Walker on
Friday about this stuff,

00:20:55.490 --> 00:20:59.350
and he says that on
the order of 20% to 40%

00:20:59.350 --> 00:21:02.990
of clicks going to one of
these websites actually

00:21:02.990 --> 00:21:04.425
goes from a user's spam folder.

00:21:04.425 --> 00:21:07.363
So users go in their spam
folder, looking for this stuff,

00:21:07.363 --> 00:21:08.238
and they click on it.

00:21:08.238 --> 00:21:10.070
So presumably there's
a class of customers

00:21:10.070 --> 00:21:11.842
that are looking for
this, and if they're

00:21:11.842 --> 00:21:14.300
looking for it-- oh, yeah, I'll
just go into my spam folder

00:21:14.300 --> 00:21:15.340
to find this.

00:21:15.340 --> 00:21:17.850
So it's not clear that things
going into spam folders

00:21:17.850 --> 00:21:19.324
are getting zero clicks.

00:21:19.324 --> 00:21:21.740
PROFESSOR: Yeah, I've heard
anecdotal reports of that too.

00:21:21.740 --> 00:21:24.900
Some people, even for
legitimate emails,

00:21:24.900 --> 00:21:26.980
they'll mark it as spam
just so that if there's

00:21:26.980 --> 00:21:29.512
a shoulder-surfer,
like at work, who's

00:21:29.512 --> 00:21:30.970
seeing them go to
Gmail, let's say,

00:21:30.970 --> 00:21:33.440
they won't come and see that
you've subscribed to, you know,

00:21:33.440 --> 00:21:33.940
whatever.

00:21:33.940 --> 00:21:35.950
And then they can secretly
go into the spam folder,

00:21:35.950 --> 00:21:37.910
they know it's not deleted,
and look at this stuff.

00:21:37.910 --> 00:21:38.890
This is actually a
really interesting point.

00:21:38.890 --> 00:21:41.020
There's this whole
psychology of who

00:21:41.020 --> 00:21:42.804
it is that actually
clicks on these links.

00:21:42.804 --> 00:21:45.470
And so I think one of the papers
that I linked to in the lecture

00:21:45.470 --> 00:21:49.830
notes talks about why these
Nigerian scams still work.

00:21:49.830 --> 00:21:52.810
Because you'd think that
anyone who basically

00:21:52.810 --> 00:21:54.440
has either common
sense themselves,

00:21:54.440 --> 00:21:56.270
or a friend who
has common sense,

00:21:56.270 --> 00:21:59.120
would never click on one of
these Nigerian email scams.

00:21:59.120 --> 00:21:59.620
Right?

00:21:59.620 --> 00:22:04.470
But it turns out that the
Nigerian meme is actually

00:22:04.470 --> 00:22:08.450
useful for spammers
to filter out idiots.

00:22:08.450 --> 00:22:12.260
In other words, if you are so
foolish that you would still

00:22:12.260 --> 00:22:15.210
click on a Nigerian
email, then oh, OK, you're

00:22:15.210 --> 00:22:19.230
going to do one of these
conversion things here.

00:22:19.230 --> 00:22:21.540
When you think about it,
that's one of the key things

00:22:21.540 --> 00:22:22.350
that spammers need.

00:22:22.350 --> 00:22:24.370
They need people
who are gullible

00:22:24.370 --> 00:22:28.370
enough or idealistic enough to
click through on these things.

00:22:28.370 --> 00:22:31.490
There's a whole sort of
psychology behind this.

00:22:31.490 --> 00:22:32.833
It's very interesting.

00:22:32.833 --> 00:22:36.037
AUDIENCE: So each of these
purchases, about how much

00:22:36.037 --> 00:22:37.254
are they worth?

00:22:37.254 --> 00:22:38.670
PROFESSOR: That's
a good question.

00:22:38.670 --> 00:22:41.560
So it actually depends
on the type of thing

00:22:41.560 --> 00:22:42.850
that you're looking at.

00:22:42.850 --> 00:22:45.930
A lot of these purchases are not
actually super high in value.

00:22:45.930 --> 00:22:48.500
So you're thinking that
someone's buying herbal Viagra

00:22:48.500 --> 00:22:50.530
or they're buying like
a knockoff Windows

00:22:50.530 --> 00:22:51.870
license or things like that.

00:22:51.870 --> 00:22:54.453
And in fact, a lot of times when
they're buying these knockoff

00:22:54.453 --> 00:22:55.924
products, presumably
the price is

00:22:55.924 --> 00:22:58.215
lower than what they'd actually
get in the real market,

00:22:58.215 --> 00:23:00.510
because otherwise you could
just go down to your local mall

00:23:00.510 --> 00:23:01.480
and buy these things.

00:23:01.480 --> 00:23:03.521
So a lot of times these
purchases you're actually

00:23:03.521 --> 00:23:05.850
making are less
than 1,000 dollars,

00:23:05.850 --> 00:23:09.205
and oftentimes a
lot less than that.

00:23:09.205 --> 00:23:11.310
Any other questions?

00:23:11.310 --> 00:23:12.180
OK.

00:23:12.180 --> 00:23:14.515
So these conversion rates
are super, super low.

00:23:14.515 --> 00:23:16.430
So like I said, one
of the key things

00:23:16.430 --> 00:23:22.680
to do as a defender is to
try to basically make spam

00:23:22.680 --> 00:23:29.380
more expensive for the spammer.

00:23:29.380 --> 00:23:31.020
So there's a couple
different ways

00:23:31.020 --> 00:23:32.920
you might think
about doing that.

00:23:32.920 --> 00:23:40.170
One way you might think about
doing that are IP blacklists.

00:23:40.170 --> 00:23:43.540
So maybe ISPs or
someone else basically

00:23:43.540 --> 00:23:45.545
collects this list
of IPS that are

00:23:45.545 --> 00:23:48.125
known to be bad, that are
known to come from spammers.

00:23:48.125 --> 00:23:51.630
And then we just don't let
these people send traffic.

00:23:51.630 --> 00:23:54.430
So this kinda-sorta used
to work for a while.

00:23:54.430 --> 00:23:58.470
But now it's so much
easier for the attackers

00:23:58.470 --> 00:24:00.756
to use techniques like
DNS redirection and stuff

00:24:00.756 --> 00:24:02.260
like that, that we'll talk
about in a little bit,

00:24:02.260 --> 00:24:04.200
this doesn't actually
work out very well.

00:24:04.200 --> 00:24:06.420
Because now there's a much
larger set of addresses

00:24:06.420 --> 00:24:08.890
that spammers can
send spam from,

00:24:08.890 --> 00:24:10.970
and they can also
dynamically switch

00:24:10.970 --> 00:24:15.480
the binding between
hostnames and web servers

00:24:15.480 --> 00:24:18.250
and all these types of things So
this doesn't work out so well.

00:24:18.250 --> 00:24:20.760
Another idea that's been
around for a long time

00:24:20.760 --> 00:24:27.600
is charging for email in some
way, so each email you send,

00:24:27.600 --> 00:24:30.840
you have to pay
some micropayment.

00:24:30.840 --> 00:24:33.024
So that currency could be
a couple different things.

00:24:33.024 --> 00:24:34.565
So you might imagine
that if I wanted

00:24:34.565 --> 00:24:36.360
to send you an email,
maybe I'd have to pay

00:24:36.360 --> 00:24:38.390
a tenth of a tenth of a penny.

00:24:38.390 --> 00:24:41.174
And that's no big deal
for me, because I don't

00:24:41.174 --> 00:24:42.340
send that many emails a day.

00:24:42.340 --> 00:24:44.798
But if you're a spammer trying
to operate at these volumes,

00:24:44.798 --> 00:24:46.000
then that quickly adds up.

00:24:46.000 --> 00:24:48.360
That destroys their value chain.

00:24:48.360 --> 00:24:49.945
Another idea that
people have had

00:24:49.945 --> 00:24:53.590
is, what if you used
computation as a currency?

00:24:53.590 --> 00:24:55.740
This is the idea
that before my email

00:24:55.740 --> 00:24:57.370
server will accept
an email from me,

00:24:57.370 --> 00:24:58.842
I have to solve some puzzle.

00:24:58.842 --> 00:25:01.680
I have to do some math trick,
or something like that.

00:25:01.680 --> 00:25:03.840
Once again, that
cuts down the rate

00:25:03.840 --> 00:25:07.642
at which these bulk
mailers can send messages.

00:25:07.642 --> 00:25:10.215
Also, we're all familiar
with CAPTCHAs, too.

00:25:10.215 --> 00:25:11.590
This is basically
the idea that I

00:25:11.590 --> 00:25:14.750
have to look at some
picture of nine animals

00:25:14.750 --> 00:25:16.260
and find the cat
instead of the dog,

00:25:16.260 --> 00:25:18.074
or type in some weird
squiggly number that

00:25:18.074 --> 00:25:19.990
looks like a migraine,
or something like that.

00:25:19.990 --> 00:25:24.280
So there have been
all kinds of ideas

00:25:24.280 --> 00:25:26.772
for charging for email to
stop this kind of stuff

00:25:26.772 --> 00:25:28.680
from happening.

00:25:28.680 --> 00:25:31.180
One of the classic problems,
though, with all these schemes,

00:25:31.180 --> 00:25:35.120
is who's going to be the
first one to implement it.

00:25:35.120 --> 00:25:37.172
And if all the email
providers don't move forward

00:25:37.172 --> 00:25:38.880
at the same time, then
of course spammers

00:25:38.880 --> 00:25:41.088
are just going to migrate
to the email providers that

00:25:41.088 --> 00:25:42.682
don't require these techniques.

00:25:42.682 --> 00:25:44.890
So there's been the problem
of how do we get everyone

00:25:44.890 --> 00:25:47.010
to upgrade en masse.

00:25:47.010 --> 00:25:48.930
And there's this
issue of, well, what

00:25:48.930 --> 00:25:52.360
would happen if a user
device is compromised?

00:25:52.360 --> 00:25:54.900
So maybe if someone breaks
into my Gmail account,

00:25:54.900 --> 00:25:56.275
then maybe they're
going to force

00:25:56.275 --> 00:26:00.330
me to pay 350 million
micropayments, which

00:26:00.330 --> 00:26:02.555
could individually bankrupt me.

00:26:02.555 --> 00:26:04.805
And so it's not quite clear
that some of these schemes

00:26:04.805 --> 00:26:06.335
are ready for
primetime, but they

00:26:06.335 --> 00:26:07.920
do represent an interesting
thought experiment

00:26:07.920 --> 00:26:09.700
about how you might be able
to stop some of this stuff

00:26:09.700 --> 00:26:10.658
from the senders' side.

00:26:10.658 --> 00:26:13.582
AUDIENCE: So how do they work
with mailing lists, where you

00:26:13.582 --> 00:26:14.790
have these big mailing lists?

00:26:14.790 --> 00:26:15.340
PROFESSOR: Yeah,
so there's problems

00:26:15.340 --> 00:26:17.820
with that, and with
mailing list aggregation.

00:26:17.820 --> 00:26:20.050
So it's very, very tricky,
because there are actually

00:26:20.050 --> 00:26:22.722
some bulk mails that
you do want to send.

00:26:22.722 --> 00:26:24.930
I mean, you might imagine
having some heuristic where

00:26:24.930 --> 00:26:27.010
you look at the size
of the mailing list

00:26:27.010 --> 00:26:29.702
and maybe you scale the
payment according to that.

00:26:29.702 --> 00:26:31.160
So for example,
maybe heuristically

00:26:31.160 --> 00:26:33.950
you think it's reasonable
to send email to 1000 folks

00:26:33.950 --> 00:26:36.790
but not to 350 million folks,
or something like this.

00:26:36.790 --> 00:26:39.331
But you're right that there are
a lot of practical limitation

00:26:39.331 --> 00:26:42.280
issues that come out
with this kind of stuff.

00:26:42.280 --> 00:26:51.080
So what the adversary can do
to get around some of this?

00:26:51.080 --> 00:26:54.070
There are basically
three workarounds

00:26:54.070 --> 00:26:58.170
that adversaries might try.

00:26:58.170 --> 00:27:02.820
So one thing they can
do is just use botnets,

00:27:02.820 --> 00:27:11.512
because botnets have a lot of
IPs that the attacker can use.

00:27:11.512 --> 00:27:12.970
And so for example,
even if someone

00:27:12.970 --> 00:27:15.340
were trying to do something
like IP blacklists,

00:27:15.340 --> 00:27:17.960
then maybe the attacker can
cycle through a bunch of IPs

00:27:17.960 --> 00:27:19.650
in this botnet and
maybe get around

00:27:19.650 --> 00:27:22.510
some of that
blacklist filtering.

00:27:22.510 --> 00:27:28.320
They can also try to use
compromised webmail accounts

00:27:28.320 --> 00:27:29.260
to send spam.

00:27:32.210 --> 00:27:35.560
So the reason why
these are super useful

00:27:35.560 --> 00:27:38.870
is because sites
like Gmail or Yahoo

00:27:38.870 --> 00:27:43.170
or Hotmail, those services can't
be blacklisted, because they're

00:27:43.170 --> 00:27:44.095
super, super powerful.

00:27:44.095 --> 00:27:46.230
So if you blacklisted
the entire service,

00:27:46.230 --> 00:27:48.188
then you're probably
going to shut down service

00:27:48.188 --> 00:27:50.020
for tens of millions of people.

00:27:50.020 --> 00:27:54.320
Now of course, these individual
services can shut down you.

00:27:54.320 --> 00:27:56.654
And so that will
actually happen once they

00:27:56.654 --> 00:27:59.070
have these heuristics running
that see that you're sending

00:27:59.070 --> 00:28:00.570
to a lot of people
you've never sent

00:28:00.570 --> 00:28:01.980
before, and so on and so forth.

00:28:01.980 --> 00:28:05.660
A lot of AI strategy takes
place on the webmail server side

00:28:05.660 --> 00:28:07.324
to try to predict these things.

00:28:07.324 --> 00:28:09.490
But these things can be
very valuable to an attacker

00:28:09.490 --> 00:28:13.100
because even if your
compromised account is not

00:28:13.100 --> 00:28:16.250
used to send a lot of emails,
it can be used to send emails

00:28:16.250 --> 00:28:18.210
to people that you know.

00:28:18.210 --> 00:28:20.170
So maybe it allows the
attacker to do things

00:28:20.170 --> 00:28:22.740
like spearfishing more
easily, or things like that.

00:28:22.740 --> 00:28:24.250
People are more likely
to click on an email that

00:28:24.250 --> 00:28:26.000
comes from an address
that they recognize.

00:28:26.000 --> 00:28:29.500
So that's a very
powerful technique there.

00:28:29.500 --> 00:28:31.210
And then attackers
can also try to do

00:28:31.210 --> 00:28:38.530
things like hijack IP addresses
from legitimate owners.

00:28:38.530 --> 00:28:42.790
So as was mentioned
briefly in Mark's talk,

00:28:42.790 --> 00:28:45.380
there's this protocol
called BGP that

00:28:45.380 --> 00:28:48.130
basically is used to control
routing on the internet.

00:28:48.130 --> 00:28:49.960
So there are these
attacks that people

00:28:49.960 --> 00:28:52.120
can do whereby they
will essentially say,

00:28:52.120 --> 00:28:55.905
hey, I'm actually the owner of
some prefix of IP addresses,

00:28:55.905 --> 00:28:57.530
even though they
don't actually own it.

00:28:57.530 --> 00:28:59.734
So all the traffic that's
involving those addresses

00:28:59.734 --> 00:29:01.650
will go in towards the
attacker, and then they

00:29:01.650 --> 00:29:04.520
can actually use those addresses
to send out spam from there.

00:29:04.520 --> 00:29:05.790
Then once they're
done with their evil,

00:29:05.790 --> 00:29:07.373
they can release the
BGP advertisement

00:29:07.373 --> 00:29:10.220
and then go try to do
this somewhere else.

00:29:10.220 --> 00:29:12.940
There's a lot of research
in how you can essentially

00:29:12.940 --> 00:29:15.810
think of ways to authenticate
BGP by advertisement

00:29:15.810 --> 00:29:18.290
or otherwise prevent
these IP address hijacks.

00:29:18.290 --> 00:29:19.440
So there's a bunch of
different techniques

00:29:19.440 --> 00:29:21.398
that attackers can do to
try to get around some

00:29:21.398 --> 00:29:24.840
of these defensive techniques.

00:29:24.840 --> 00:29:28.030
So this can all be done,
but still, these defenses,

00:29:28.030 --> 00:29:28.739
they're not free.

00:29:28.739 --> 00:29:31.279
So presumably the attacker has
to pay for the botnet somehow,

00:29:31.279 --> 00:29:33.590
they have to get inside
these webmail accounts.

00:29:33.590 --> 00:29:36.330
And so any of these
defenses that you can do

00:29:36.330 --> 00:29:39.856
will help to drive the cost
up of generating these spams.

00:29:39.856 --> 00:29:41.230
So as such, they're
still useful,

00:29:41.230 --> 00:29:45.610
even though they are
not perfect defenses.

00:29:45.610 --> 00:29:48.760
So what do these
botnets look like?

00:29:48.760 --> 00:29:55.770
So at a high level, you
have the proverbial cloud

00:29:55.770 --> 00:29:56.785
from your cloud diagram.

00:29:56.785 --> 00:29:59.780
You have your command and
control infrastructure up here,

00:29:59.780 --> 00:30:01.760
and this is the
thing that actually

00:30:01.760 --> 00:30:08.220
sends commands to all of the
individual bots down here.

00:30:08.220 --> 00:30:11.490
So the spammer will talk to
the C&C and will say hey,

00:30:11.490 --> 00:30:14.130
here's my new spam
messages I want to send,

00:30:14.130 --> 00:30:17.445
and then maybe these bots will
act on behalf of their command

00:30:17.445 --> 00:30:19.570
and control infrastructure
and start sending emails

00:30:19.570 --> 00:30:21.460
to a bunch of people.

00:30:21.460 --> 00:30:23.030
So let's see here.

00:30:23.030 --> 00:30:25.230
So why are these bots useful?

00:30:25.230 --> 00:30:27.592
Well, as I mentioned here,
they have IP addresses,

00:30:27.592 --> 00:30:28.550
which are super useful.

00:30:28.550 --> 00:30:31.050
But of course they also have
the associated bandwidth there.

00:30:31.050 --> 00:30:32.551
They also have
computational cycles.

00:30:32.551 --> 00:30:33.925
Sometimes these
bots are actually

00:30:33.925 --> 00:30:35.240
used as web servers themselves.

00:30:35.240 --> 00:30:37.610
So these things are
very, very useful.

00:30:37.610 --> 00:30:40.905
And they also serve as
a layer of indirection.

00:30:40.905 --> 00:30:43.740
So, as we're to discuss in
more detail in a second,

00:30:43.740 --> 00:30:46.460
indirection is very
useful for attackers.

00:30:46.460 --> 00:30:49.590
That means that if law
enforcement or whatnot shuts

00:30:49.590 --> 00:30:51.724
down this level, well, if
the command and control

00:30:51.724 --> 00:30:53.890
infrastructure's still
alive, then maybe the spammer

00:30:53.890 --> 00:30:55.672
can just attach this
command and control

00:30:55.672 --> 00:30:57.380
infrastructure to a
different set of bots

00:30:57.380 --> 00:30:59.040
and keep on running.

00:30:59.040 --> 00:31:01.670
So that's one reason why
these bots are very useful.

00:31:01.670 --> 00:31:04.370
And these bots can scale
to the order of magnitude

00:31:04.370 --> 00:31:06.860
of millions of IP addresses.

00:31:06.860 --> 00:31:09.550
So as it turns out, people
will click random links

00:31:09.550 --> 00:31:11.700
involving malware all the time.

00:31:11.700 --> 00:31:13.796
So these things can get
very, very, very large.

00:31:13.796 --> 00:31:15.920
And so some of these
takedowns that these companies

00:31:15.920 --> 00:31:18.253
get involved in, with trying
to take down these botnets,

00:31:18.253 --> 00:31:20.561
they involve millions
upon millions of machines.

00:31:20.561 --> 00:31:22.900
So they're very
technically challenging.

00:31:22.900 --> 00:31:25.780
So how much does it cost to
get your malware installed

00:31:25.780 --> 00:31:27.060
on all these bots?

00:31:27.060 --> 00:31:29.080
Remember, these
are all typically

00:31:29.080 --> 00:31:30.680
regular end-user machines.

00:31:30.680 --> 00:31:34.300
So the cost for getting
your malware on one of these

00:31:34.300 --> 00:31:45.640
machines, so price per post,
is about $0.10 for U.S.

00:31:45.640 --> 00:31:58.370
hosts and on the order of
$0.01 for posts in Asia.

00:31:58.370 --> 00:32:00.640
So it's interesting there's
this differential here.

00:32:00.640 --> 00:32:01.620
There might a couple
of different reasons

00:32:01.620 --> 00:32:03.020
we can imagine for why that is.

00:32:03.020 --> 00:32:09.240
It might be that people are
prone to think that connections

00:32:09.240 --> 00:32:11.860
originating from the U.S. are
more likely to be trustworthy.

00:32:11.860 --> 00:32:14.390
It may also be that
because there's

00:32:14.390 --> 00:32:15.890
pirated software
running here, stuff

00:32:15.890 --> 00:32:18.430
that's not actively up to
date with respect to patches.

00:32:18.430 --> 00:32:21.100
It's actually easier to
get botnet posts over here.

00:32:21.100 --> 00:32:24.000
So you'll see some very
interesting statistics

00:32:24.000 --> 00:32:27.180
about how some of these rates
might fluctuate, for example,

00:32:27.180 --> 00:32:29.410
as you see companies
like Microsoft go out

00:32:29.410 --> 00:32:32.169
and try to stamp down on
piracy and things like that.

00:32:32.169 --> 00:32:33.710
But anyway, this is
a rough estimate.

00:32:33.710 --> 00:32:38.260
Suffice it to say, this
is not super expensive.

00:32:38.260 --> 00:32:41.480
So what does-- any questions
before we continue?

00:32:41.480 --> 00:32:41.980
OK.

00:32:41.980 --> 00:32:45.340
So what does this command
and control infrastructure

00:32:45.340 --> 00:32:46.060
look like?

00:32:46.060 --> 00:32:49.580
So you can imagine that in one
substantiation, the simplest

00:32:49.580 --> 00:32:55.090
substantiation, this is
just some centralized setup.

00:32:55.090 --> 00:32:58.185
And so this is maybe
one machine or maybe

00:32:58.185 --> 00:32:59.840
some small number of machines.

00:32:59.840 --> 00:33:01.990
The attacker gets to
log into those machines

00:33:01.990 --> 00:33:04.490
and essentially just send these
commands out to the botnets

00:33:04.490 --> 00:33:05.195
from there.

00:33:05.195 --> 00:33:06.653
So if it's going
to be centralized,

00:33:06.653 --> 00:33:10.890
then it's going to be very
useful for the attacker to have

00:33:10.890 --> 00:33:13.160
what's known as
bulletproof hosting.

00:33:17.480 --> 00:33:19.545
So the idea behind
bulletproof hosting

00:33:19.545 --> 00:33:23.980
is that you want to put
this command and control

00:33:23.980 --> 00:33:31.750
infrastructure on servers that
reside in ISPs that ignore

00:33:31.750 --> 00:33:33.570
requests from banks or
from law enforcement

00:33:33.570 --> 00:33:35.980
to take down servers.

00:33:35.980 --> 00:33:38.200
So there are actually
bulletproof servers that exist.

00:33:38.200 --> 00:33:40.699
They charge a premium, because
there is a little bit of risk

00:33:40.699 --> 00:33:41.500
involved there.

00:33:41.500 --> 00:33:44.041
But if you can manage to host
one of your command and control

00:33:44.041 --> 00:33:45.819
centers there, it's
going to be very nice.

00:33:45.819 --> 00:33:47.860
Because then when the
American government or when

00:33:47.860 --> 00:33:50.220
Goldman Sachs or whoever
says hey, shut this guy down,

00:33:50.220 --> 00:33:52.500
they're running spam,
the provider will say,

00:33:52.500 --> 00:33:53.390
how can you make me?

00:33:53.390 --> 00:33:55.199
I run in a different
legal jurisdiction.

00:33:55.199 --> 00:33:57.490
I don't have to follow your
intellectual property laws.

00:33:57.490 --> 00:33:58.922
So on and so forth.

00:33:58.922 --> 00:33:59.880
So this is very useful.

00:33:59.880 --> 00:34:02.330
Like I said, these
types of hosts

00:34:02.330 --> 00:34:05.300
actually charge a risk
premium for running

00:34:05.300 --> 00:34:06.850
that kind of service.

00:34:06.850 --> 00:34:09.489
And so the other alternative for
running the C&C infrastructure

00:34:09.489 --> 00:34:13.639
is, this could be a
peer-to-peer network.

00:34:17.280 --> 00:34:22.042
And so the idea here is that
maybe this is sort of-- you

00:34:22.042 --> 00:34:24.250
can almost think of it as
a mini-botnet up there too.

00:34:24.250 --> 00:34:25.965
So the entire control
infrastructure

00:34:25.965 --> 00:34:28.250
is spread across many
different machines,

00:34:28.250 --> 00:34:30.010
and maybe at any
given time there's

00:34:30.010 --> 00:34:32.270
a different machine that's
responsible for sending

00:34:32.270 --> 00:34:34.484
commands to all of these
worker nodes down here.

00:34:34.484 --> 00:34:36.109
And so this is nice,
because it doesn't

00:34:36.109 --> 00:34:39.370
require you to have access to
one of these bulletproof hosts.

00:34:39.370 --> 00:34:42.040
You can construct the
C&C infrastructure

00:34:42.040 --> 00:34:44.900
using regular bots.

00:34:44.900 --> 00:34:47.179
The P2P aspect of it
makes it a little more

00:34:47.179 --> 00:34:49.820
difficult to provide guarantees
about the availability

00:34:49.820 --> 00:34:52.047
of the hosts that are up
here, but it does have

00:34:52.047 --> 00:34:53.255
some other nice advantages.

00:34:53.255 --> 00:34:55.428
At a high level, those
are the two approaches

00:34:55.428 --> 00:34:57.610
that people can use.

00:34:57.610 --> 00:35:08.130
So what happens if the hosting
service gets taken down?

00:35:12.590 --> 00:35:17.740
Well, there's a couple things
that the adversary can do.

00:35:17.740 --> 00:35:23.895
So they can use DNS to
essentially redirect requests.

00:35:30.440 --> 00:35:34.060
So let's say that
someone attacks,

00:35:34.060 --> 00:35:36.610
or someone issues a takedown
for the DNS infrastructure

00:35:36.610 --> 00:35:37.870
for something like this.

00:35:37.870 --> 00:35:39.870
As long as the back-end
servers are still alive,

00:35:39.870 --> 00:35:44.750
what the attacker
can do is basically--

00:35:44.750 --> 00:35:51.330
the attacker creates lists
of server IP addresses.

00:35:55.114 --> 00:35:58.285
And there may be hundreds or
thousands of these IP addresses

00:35:58.285 --> 00:35:59.600
that it collects.

00:35:59.600 --> 00:36:08.210
And then it will bind
each one to a host

00:36:08.210 --> 00:36:13.090
name for a very
short period of time.

00:36:13.090 --> 00:36:16.610
So let's say maybe
for 300 seconds.

00:36:20.317 --> 00:36:22.400
And so what's nice about
this is that if someone's

00:36:22.400 --> 00:36:24.370
trying to run
heuristics that say,

00:36:24.370 --> 00:36:28.140
if I see some particular
server sending

00:36:28.140 --> 00:36:32.197
more than 1,000 spam-like
messages in a given period

00:36:32.197 --> 00:36:34.780
I'm going to try to issue some
kind of takedown to them, well,

00:36:34.780 --> 00:36:37.795
these types of techniques will
maybe help the attacker fly

00:36:37.795 --> 00:36:40.086
under the radar of those
types of detection techniques.

00:36:40.086 --> 00:36:41.990
Because essentially every
300 seconds they're saying,

00:36:41.990 --> 00:36:43.245
OK, I'm going to be
serving spam from here,

00:36:43.245 --> 00:36:45.620
then I'm going to be serving
spam from here, serving spam

00:36:45.620 --> 00:36:46.960
from here, so on and so forth.

00:36:46.960 --> 00:36:49.292
So this is a nice use
of indirection, at least

00:36:49.292 --> 00:36:50.960
from the attacker's perspective.

00:36:50.960 --> 00:36:55.640
And so, as I mentioned earlier,
these types of indirection

00:36:55.640 --> 00:36:58.000
are of one of the key
ways that attackers

00:36:58.000 --> 00:37:02.710
try to evade law enforcement
and these detection heuristics.

00:37:02.710 --> 00:37:05.540
So you might think about,
well, what if we just

00:37:05.540 --> 00:37:07.480
take down the DNS server?

00:37:07.480 --> 00:37:09.337
How hard is it to do that?

00:37:09.337 --> 00:37:10.795
Well, as the paper
describes, there

00:37:10.795 --> 00:37:12.160
are a couple different
layers on which

00:37:12.160 --> 00:37:13.500
you can attack these spammers.

00:37:13.500 --> 00:37:17.409
So you can try to take down the
attacker's domain registration.

00:37:17.409 --> 00:37:18.950
That's basically
the thing that says,

00:37:18.950 --> 00:37:25.050
like, hey, if you're looking
for russianpharma.rx.biz.org,

00:37:25.050 --> 00:37:27.299
then here's the DNS
server that you talk to.

00:37:27.299 --> 00:37:29.090
You can imagine attacking
it at that level.

00:37:29.090 --> 00:37:30.548
You could also
imagine attacking it

00:37:30.548 --> 00:37:34.060
at the level of taking down
the spammer's DNS server,

00:37:34.060 --> 00:37:36.120
the thing to which you'll
be redirected once you

00:37:36.120 --> 00:37:38.552
look at that top-level domain.

00:37:38.552 --> 00:37:40.260
And so what's tricky
is that the attacker

00:37:40.260 --> 00:37:43.540
can use these sort of
fast flux techniques

00:37:43.540 --> 00:37:44.800
at every different level.

00:37:44.800 --> 00:37:47.600
So, for example, they
can rotate the servers

00:37:47.600 --> 00:37:49.360
they use to act as
their DNS servers.

00:37:49.360 --> 00:37:54.970
They can rotate the web servers
they use to send out the spam.

00:37:54.970 --> 00:37:56.388
And so on and so forth.

00:37:56.388 --> 00:37:58.221
So that's just a
high-level review

00:37:58.221 --> 00:37:59.846
of how people can
use multiple machines

00:37:59.846 --> 00:38:03.810
to try to avoid detection.

00:38:03.810 --> 00:38:09.540
So as I mentioned earlier,
you can use compromised

00:38:09.540 --> 00:38:14.660
webmail accounts to send spam.

00:38:20.900 --> 00:38:25.190
And the power of that is
that if you can get access

00:38:25.190 --> 00:38:27.065
to someone's account,
then you don't actually

00:38:27.065 --> 00:38:28.773
have to install malware
on their machine.

00:38:28.773 --> 00:38:30.374
You can actually
access their account

00:38:30.374 --> 00:38:32.290
from the privacy of your
own machine, wherever

00:38:32.290 --> 00:38:33.373
it is that you're located.

00:38:33.373 --> 00:38:36.074
And as we were
discussing earlier,

00:38:36.074 --> 00:38:37.740
this is useful for
spearfishing attacks,

00:38:37.740 --> 00:38:40.690
because you can send this spam
message as the person whose

00:38:40.690 --> 00:38:42.570
account it actually belongs to.

00:38:42.570 --> 00:38:44.280
And so as a result
the webmail providers

00:38:44.280 --> 00:38:47.714
are very motivated to shut
this kind of thing down.

00:38:47.714 --> 00:38:49.380
Because if they don't
do that, then they

00:38:49.380 --> 00:38:51.600
risk being blacklisted
as a whole.

00:38:51.600 --> 00:38:54.880
All the users risk being flagged
as spam, which they don't want.

00:38:54.880 --> 00:38:58.140
And also the provider actually
needs to somehow monetize

00:38:58.140 --> 00:38:58.870
their service.

00:38:58.870 --> 00:39:01.750
They actually need real
users to be doing things

00:39:01.750 --> 00:39:03.550
like clicking on ads
in the righthand bar

00:39:03.550 --> 00:39:04.880
of their webmail account.

00:39:04.880 --> 00:39:08.380
So the higher the proportion of
their users which are spamming,

00:39:08.380 --> 00:39:10.535
the less likely advertisers
are to advertise

00:39:10.535 --> 00:39:11.920
in their webmail system.

00:39:11.920 --> 00:39:13.972
So the webmail
account providers are

00:39:13.972 --> 00:39:17.280
very incentivized to shut
down this kind of stuff.

00:39:17.280 --> 00:39:20.330
So how do they try to
detect this type of spam?

00:39:20.330 --> 00:39:21.450
They use those heuristics.

00:39:21.450 --> 00:39:24.350
They might try to use CAPTCHAs.

00:39:24.350 --> 00:39:27.180
If they suspect that you've
sent some spam-like messages,

00:39:27.180 --> 00:39:28.835
let's say five
times in a row, they

00:39:28.835 --> 00:39:30.960
might ask you to type in
one of those fuzzy letters

00:39:30.960 --> 00:39:32.950
or whatever.

00:39:32.950 --> 00:39:35.150
Suffice it to say, though,
a lot of these techniques

00:39:35.150 --> 00:39:36.520
don't work very well.

00:39:36.520 --> 00:39:41.650
If you look at the
price per account,

00:39:41.650 --> 00:39:43.425
so how much you
as a spammer would

00:39:43.425 --> 00:39:45.880
have to pay to get
one of these things,

00:39:45.880 --> 00:39:47.590
it's still super, super cheap.

00:39:47.590 --> 00:39:54.860
So it's on the order of $0.01 to
$0.05 for an account on Yahoo,

00:39:54.860 --> 00:39:56.770
Gmail, Hotmail,
something like that.

00:39:56.770 --> 00:39:59.030
So once again, this
is very, very low.

00:39:59.030 --> 00:40:01.580
And so this does not act as
an effective disincentive

00:40:01.580 --> 00:40:04.670
for spammers to try to
do these types of things.

00:40:04.670 --> 00:40:08.590
So this maybe is a
little bit disappointing,

00:40:08.590 --> 00:40:10.740
because it seems
like everywhere we

00:40:10.740 --> 00:40:13.160
go, we have to solve
these CAPTCHAs if we

00:40:13.160 --> 00:40:15.459
want to buy things
or send emails or do

00:40:15.459 --> 00:40:16.250
that kind of stuff.

00:40:16.250 --> 00:40:20.480
So basically, what
happened to CAPTCHAs?

00:40:20.480 --> 00:40:24.660
They were supposed to make
all this bad stuff go away.

00:40:24.660 --> 00:40:29.580
And as it turns
out, the attacker

00:40:29.580 --> 00:40:34.250
can build services
to solve CAPTCHAs.

00:40:37.580 --> 00:40:41.210
So this can be automated,
just like anything else.

00:40:44.380 --> 00:40:46.860
As it turns out, the
economics for this

00:40:46.860 --> 00:40:49.440
is that if you want
to solve one CAPTCHA,

00:40:49.440 --> 00:40:57.521
then it's approximately $0.001
dollar to solve a CAPTCHA.

00:40:57.521 --> 00:40:59.930
Which is nothing.

00:40:59.930 --> 00:41:02.830
And this can be done with
very, very low latency, too.

00:41:02.830 --> 00:41:05.400
So CAPTCHAs essentially
are not presenting

00:41:05.400 --> 00:41:08.620
most large-scale spammers
with a high barrier

00:41:08.620 --> 00:41:10.200
for sending these spams.

00:41:10.200 --> 00:41:12.630
And so how is this being done?

00:41:12.630 --> 00:41:14.290
If it's this cheap,
you might think,

00:41:14.290 --> 00:41:17.182
maybe it's being done all
by computers, by software.

00:41:17.182 --> 00:41:18.140
But it's not, actually.

00:41:18.140 --> 00:41:21.434
So a lot of this
is done by humans.

00:41:25.780 --> 00:41:29.903
In particular, the attacker
can outsource this in one

00:41:29.903 --> 00:41:30.650
of two ways.

00:41:30.650 --> 00:41:32.191
So first of all the
attacker can just

00:41:32.191 --> 00:41:34.570
find a labor market
where the cost of labor

00:41:34.570 --> 00:41:36.340
is very, very cheap.

00:41:36.340 --> 00:41:39.740
So you can employ humans
to essentially act

00:41:39.740 --> 00:41:42.154
as CAPTCHA solvers for you.

00:41:42.154 --> 00:41:44.070
You, the spammer, are
presented with a CAPTCHA

00:41:44.070 --> 00:41:45.240
by Gmail or whatever.

00:41:45.240 --> 00:41:47.470
You, the spammer,
then send that CAPTCHA

00:41:47.470 --> 00:41:49.290
over to some human
sitting somewhere.

00:41:49.290 --> 00:41:51.690
They solve for you, they've
earned some small amount

00:41:51.690 --> 00:41:54.340
of money, and then
you send their answer

00:41:54.340 --> 00:41:56.410
to the legitimate site.

00:41:56.410 --> 00:42:02.160
You could also do this
with Mechanical Turk.

00:42:02.160 --> 00:42:05.064
Have you guys heard
of Mechanical Turk?

00:42:05.064 --> 00:42:07.800
I've asked the question, my
back is turned, [INAUDIBLE].

00:42:07.800 --> 00:42:11.230
OK, so Mechanical
Turk is pretty neat,

00:42:11.230 --> 00:42:12.980
I mean neat if you're
trying to do evil.

00:42:12.980 --> 00:42:13.880
So what's nice about
that is that you

00:42:13.880 --> 00:42:16.192
can post these tasks on
Mechanical Turk and say,

00:42:16.192 --> 00:42:18.650
hey, I have a picture-solving
game, or something like this.

00:42:18.650 --> 00:42:20.390
Or you can just come
out and say straight up,

00:42:20.390 --> 00:42:22.015
I've got some CAPTCHAs
I want to solve.

00:42:22.015 --> 00:42:23.990
You post a price, and
then basically the market

00:42:23.990 --> 00:42:26.466
will match you with people who
are willing to do that task.

00:42:26.466 --> 00:42:28.840
And then they'll do it for
you, they'll post the answers.

00:42:28.840 --> 00:42:34.060
So this actually automates
a lot of actually finding

00:42:34.060 --> 00:42:37.180
the labor pool for the spammer.

00:42:37.180 --> 00:42:38.907
The problem with
this is that you

00:42:38.907 --> 00:42:40.365
have more overhead
for the spammer,

00:42:40.365 --> 00:42:43.955
because Amazon has to take
some cut of that profit that's

00:42:43.955 --> 00:42:44.890
generated from that.

00:42:44.890 --> 00:42:48.410
But that's very nice there.

00:42:48.410 --> 00:42:50.780
Another thing that
attackers can do

00:42:50.780 --> 00:42:55.530
is they can actually reuse
CAPTCHAs on legitimate sites.

00:42:55.530 --> 00:42:58.610
So there's some CAPTCHA that
the attacker wants to solve.

00:42:58.610 --> 00:43:00.590
They then have some
legitimate site

00:43:00.590 --> 00:43:03.590
on the side where they present
that exact same CAPTCHA,

00:43:03.590 --> 00:43:06.510
and get a real visitor to
figure out what that CAPTCHA is.

00:43:06.510 --> 00:43:08.680
Then they come back
over to the first site

00:43:08.680 --> 00:43:11.590
and then use that
answer as the answer.

00:43:11.590 --> 00:43:14.001
And like all these
crowdsourcing-type things,

00:43:14.001 --> 00:43:15.626
if you don't trust
your users, then you

00:43:15.626 --> 00:43:17.540
can maybe replicate the work.

00:43:17.540 --> 00:43:19.880
So you send the CAPTCHA to
maybe two or three people.

00:43:19.880 --> 00:43:21.963
And then you come back in
and use majority voting,

00:43:21.963 --> 00:43:25.430
take whatever that majority
vote was as your CAPTCHA answer.

00:43:25.430 --> 00:43:27.190
And so these are
some of the reasons

00:43:27.190 --> 00:43:29.270
why the CAPTCHA
defenses don't work

00:43:29.270 --> 00:43:31.130
as well as you might think.

00:43:31.130 --> 00:43:34.590
So the providers, so for example
Gmail or Yahoo or whatever,

00:43:34.590 --> 00:43:37.840
can to try to implement
more frequent CAPTCHAs

00:43:37.840 --> 00:43:42.200
to try to push the friction
level up for the spammer.

00:43:42.200 --> 00:43:44.320
The problem there is
that then regular users

00:43:44.320 --> 00:43:45.490
will get irritated.

00:43:45.490 --> 00:43:47.960
So a good example
of this is Gmail's

00:43:47.960 --> 00:43:49.210
two-factor authentication.

00:43:49.210 --> 00:43:51.610
It's actually a super good idea.

00:43:51.610 --> 00:43:53.585
Whenever Gmail will
detect that you're

00:43:53.585 --> 00:43:55.320
trying to use Gmail
from a machine

00:43:55.320 --> 00:43:57.580
that it doesn't
know about, it'll

00:43:57.580 --> 00:44:00.025
basically send you a text
message saying hey, enter

00:44:00.025 --> 00:44:02.940
this verification
code into Gmail

00:44:02.940 --> 00:44:05.170
before you can actually
continue to use the service.

00:44:05.170 --> 00:44:07.336
And so what's funny is that
it's a super great idea,

00:44:07.336 --> 00:44:09.370
but at least for me,
I get super irritated

00:44:09.370 --> 00:44:11.044
when I have to get
that text message.

00:44:11.044 --> 00:44:13.210
Like, I know it's good for
me, but I just get angry.

00:44:13.210 --> 00:44:13.918
It's frictionful.

00:44:13.918 --> 00:44:15.479
And so I'll do it
if I don't migrate

00:44:15.479 --> 00:44:17.020
to a lot of different
machines a lot,

00:44:17.020 --> 00:44:19.640
but if I had to do it any
more than I did right now,

00:44:19.640 --> 00:44:22.800
it's unclear that I'd feel
as happy about it as I do.

00:44:22.800 --> 00:44:24.690
So there's this very
interesting sort

00:44:24.690 --> 00:44:27.060
of tradeoff between the
security that people

00:44:27.060 --> 00:44:29.660
say that they want and the
security measures that they're

00:44:29.660 --> 00:44:30.740
willing to put up with.

00:44:30.740 --> 00:44:32.490
So as a result,
it's very difficult

00:44:32.490 --> 00:44:35.485
for the webmail providers to
increase the amount of CAPTCHAs

00:44:35.485 --> 00:44:38.620
and still keep users happy.

00:44:38.620 --> 00:44:40.490
OK, so any other questions
before we move on

00:44:40.490 --> 00:44:41.360
to click support?

00:44:41.360 --> 00:44:45.824
AUDIENCE: So is one of the
reasons for the non-adoption

00:44:45.824 --> 00:44:49.296
of encrypted emails,
besides the [INAUDIBLE]

00:44:49.296 --> 00:44:52.770
is that spam filters have
a very, very big part?

00:44:52.770 --> 00:44:56.374
PROFESSOR: Ah, because then they
can't inspect messages and see

00:44:56.374 --> 00:44:57.040
what's going on.

00:44:57.040 --> 00:44:57.998
That's a good question.

00:44:57.998 --> 00:44:59.530
I think it's
actually hard to say.

00:44:59.530 --> 00:45:01.820
I don't know, because it's a
little bit of a chicken and egg

00:45:01.820 --> 00:45:02.320
problem.

00:45:02.320 --> 00:45:05.260
So because there isn't a huge
volume of encrypted email,

00:45:05.260 --> 00:45:07.977
it's unclear whether
spammers are actually trying

00:45:07.977 --> 00:45:09.060
to take advantage of that.

00:45:09.060 --> 00:45:11.130
But I could see that
maybe being a problem.

00:45:11.130 --> 00:45:12.810
I mean, people
have looked at ways

00:45:12.810 --> 00:45:16.880
to do computation
over encrypted data.

00:45:16.880 --> 00:45:19.430
So maybe you could think
about doing something there.

00:45:19.430 --> 00:45:20.880
But it's always tricky.

00:45:20.880 --> 00:45:22.560
So for example,
with spam, people

00:45:22.560 --> 00:45:25.730
have these spam filters that
were based on Markov models

00:45:25.730 --> 00:45:26.810
and things like that.

00:45:26.810 --> 00:45:27.935
So what do the spammers do?

00:45:27.935 --> 00:45:30.950
They start making these
images that basically

00:45:30.950 --> 00:45:32.480
can't be seen by
the text scanners,

00:45:32.480 --> 00:45:34.313
but then have the
spamming content in there.

00:45:34.313 --> 00:45:38.290
So it's always an arms race.

00:45:38.290 --> 00:45:38.995
All right.

00:45:38.995 --> 00:45:44.100
So let's move on
to click support.

00:45:44.100 --> 00:45:49.000
So what is this about?

00:45:49.000 --> 00:45:51.870
So once the advertising
step has succeeded

00:45:51.870 --> 00:45:54.930
and the user is given a link, so
these are clicks on that link,

00:45:54.930 --> 00:46:01.630
so the user contacts
some DNS server

00:46:01.630 --> 00:46:09.010
after clicking on that
link to basically translate

00:46:09.010 --> 00:46:18.130
some hostname that was
in that link to some IP.

00:46:18.130 --> 00:46:21.940
And then after that
translation takes place,

00:46:21.940 --> 00:46:34.980
the user has to contact some
web server that has that IP.

00:46:34.980 --> 00:46:37.080
So to make all this
work, the spammer

00:46:37.080 --> 00:46:44.838
has to register a domain name.

00:46:44.838 --> 00:46:54.570
And then the spammer
has to run a DNS server,

00:46:54.570 --> 00:46:56.920
and then they have
to run a web server.

00:47:02.930 --> 00:47:05.010
So this is essentially
what the spammer

00:47:05.010 --> 00:47:07.950
has to do to make this click
support thing work out.

00:47:07.950 --> 00:47:10.376
So one question you
might have is, well,

00:47:10.376 --> 00:47:13.380
why wouldn't the
spammer just use

00:47:13.380 --> 00:47:18.101
raw IP addresses, for example,
like in these spam URLs?

00:47:18.101 --> 00:47:20.100
And so does anyone have
any thoughts about that?

00:47:20.100 --> 00:47:25.046
Why wouldn't you just
have 183.4.4 dot whatever,

00:47:25.046 --> 00:47:27.530
instead of having something
like russianjewels.biz?

00:47:27.530 --> 00:47:29.495
AUDIENCE: Because
it looks sketchy,

00:47:29.495 --> 00:47:30.694
it makes it easier to tell.

00:47:30.694 --> 00:47:31.360
PROFESSOR: Yeah.

00:47:31.360 --> 00:47:34.814
So one thing, one would
hope, is that a user would

00:47:34.814 --> 00:47:37.230
look at this thing that just
has a bunch of numbers in it,

00:47:37.230 --> 00:47:39.962
and they'd say, well,
this clearly seems weird.

00:47:39.962 --> 00:47:42.420
As it turns out, this will only
weed out some of the users,

00:47:42.420 --> 00:47:43.461
but you're exactly right.

00:47:43.461 --> 00:47:46.225
There's a subset of people you
would lose just because nobody

00:47:46.225 --> 00:47:47.730
wants to click on that.

00:47:47.730 --> 00:47:50.210
Another reason is
that once again,

00:47:50.210 --> 00:47:53.580
having this sort of DNS
infrastructure up here

00:47:53.580 --> 00:47:56.220
gives the attacker another
level of indirection.

00:47:56.220 --> 00:47:59.900
So once again, if the legal
authorities or whoever

00:47:59.900 --> 00:48:02.280
shut down the DNS
infrastructure but they somehow

00:48:02.280 --> 00:48:05.400
don't manage to shut down
that back-end web server,

00:48:05.400 --> 00:48:07.524
then the spammer can
conjure up a different sort

00:48:07.524 --> 00:48:09.190
of front end for their
service and maybe

00:48:09.190 --> 00:48:11.930
try to use that same web
server on the back end.

00:48:11.930 --> 00:48:13.450
So that's another
reason, I think,

00:48:13.450 --> 00:48:16.960
that people don't
typically put these raw IP

00:48:16.960 --> 00:48:21.020
addresses in their spam URLs.

00:48:21.020 --> 00:48:27.400
So another example of how this
redirection comes into play--

00:48:27.400 --> 00:48:29.790
how this indirection
comes into play, sorry--

00:48:29.790 --> 00:48:37.445
is that these spam URLs often
point to redirection sites.

00:48:43.070 --> 00:48:48.660
And so these are sites like
bit.ly, or things like that.

00:48:48.660 --> 00:48:52.793
And so in addition to
things like bit.ly,

00:48:52.793 --> 00:48:55.870
you could also imagine
that a compromised

00:48:55.870 --> 00:49:02.515
website can actually
also act as a redirecter.

00:49:05.310 --> 00:49:09.134
You just put the appropriate
HTML or JavaScript in there

00:49:09.134 --> 00:49:10.675
that when the user
goes to that site,

00:49:10.675 --> 00:49:13.520
it's then going to
redirect the user's browser

00:49:13.520 --> 00:49:15.674
to some other different site.

00:49:15.674 --> 00:49:17.590
So once again, this
useful because it provides

00:49:17.590 --> 00:49:19.320
that level of indirection.

00:49:19.320 --> 00:49:21.585
And it actually acts
as a force multiplier,

00:49:21.585 --> 00:49:25.770
so you have a single
spamming web server back end,

00:49:25.770 --> 00:49:29.180
but then you can name it
using different things.

00:49:29.180 --> 00:49:32.480
And that will allow
you to maybe confuse

00:49:32.480 --> 00:49:35.980
filters who have blacklisted,
let's say, 10% of your URLs,

00:49:35.980 --> 00:49:37.970
but not the other 90% of them.

00:49:37.970 --> 00:49:40.290
So this is a very,
very common technique.

00:49:40.290 --> 00:49:45.770
And then another thing is
that sometimes the spammers

00:49:45.770 --> 00:49:58.070
can use botnets as web servers
or maybe as proxies, as DNS

00:49:58.070 --> 00:50:01.800
servers, and so and so forth.

00:50:01.800 --> 00:50:04.990
We mentioned this a
little bit earlier,

00:50:04.990 --> 00:50:07.860
but this is another example of
how the more machines you have

00:50:07.860 --> 00:50:10.237
as an attacker, the more
defense that gives you.

00:50:10.237 --> 00:50:12.320
Because you can hide your
evil amongst a watershed

00:50:12.320 --> 00:50:12.870
of machines.

00:50:16.758 --> 00:50:20.160
All right.

00:50:20.160 --> 00:50:22.802
So in some cases, one of the
things the paper talks about

00:50:22.802 --> 00:50:24.010
is these affiliate providers.

00:50:24.010 --> 00:50:29.290
These affiliate providers kind
of act as evil clearinghouses.

00:50:29.290 --> 00:50:31.905
They will help to automate some
of the tedium of interacting

00:50:31.905 --> 00:50:34.020
with the banks, and
things like this,

00:50:34.020 --> 00:50:35.730
on behalf of you, the spammer.

00:50:35.730 --> 00:50:37.610
So one thing you
might wonder is, well,

00:50:37.610 --> 00:50:39.800
why can't the law
enforcement just take down

00:50:39.800 --> 00:50:41.090
the affiliate providers?

00:50:41.090 --> 00:50:43.152
They seem kind of
like a choke point.

00:50:43.152 --> 00:50:45.110
And the thing is that
these affiliate providers

00:50:45.110 --> 00:50:48.310
are kind of like SPECTRE
from the James Bond movies.

00:50:48.310 --> 00:50:50.220
They're very
decentralized themselves.

00:50:50.220 --> 00:50:53.184
So it's very difficult to
point to an affiliate provider

00:50:53.184 --> 00:50:55.350
at this particular machine,
and we'll just shut down

00:50:55.350 --> 00:50:56.530
that particular machine.

00:50:56.530 --> 00:50:58.000
Oftentimes the
affiliate providers

00:50:58.000 --> 00:50:59.640
are distributed themselves.

00:50:59.640 --> 00:51:01.800
So that means that it's
actually pretty tricky for,

00:51:01.800 --> 00:51:04.770
let's say, the FBI, to just
go to some affiliate program

00:51:04.770 --> 00:51:07.840
and say, thou shalt
not do this anymore.

00:51:07.840 --> 00:51:09.420
Another interesting
thing, too, is

00:51:09.420 --> 00:51:12.830
that the paper mentions
that in many countries

00:51:12.830 --> 00:51:14.640
IP laws are different,
for example.

00:51:14.640 --> 00:51:17.600
So the FBI may not be able to
enforce intellectual properties

00:51:17.600 --> 00:51:19.430
that we have with
other countries.

00:51:19.430 --> 00:51:21.520
And also, according
to the paper,

00:51:21.520 --> 00:51:23.755
in many of these spam
forums, the spammers

00:51:23.755 --> 00:51:26.790
claim they are providing a
useful, legitimate service

00:51:26.790 --> 00:51:28.370
to Western countries.

00:51:28.370 --> 00:51:30.720
They say that
essentially, prices

00:51:30.720 --> 00:51:32.380
are too high for
some of these things,

00:51:32.380 --> 00:51:34.900
in these Western countries,
and that the fact that people

00:51:34.900 --> 00:51:37.850
are clicking on demand indicates
there's a legitimate need

00:51:37.850 --> 00:51:41.970
to buy Windows copies that
may be riddled with malware.

00:51:41.970 --> 00:51:44.399
So a lot of times the
spammers themselves

00:51:44.399 --> 00:51:46.190
don't feel that they're
doing anything bad.

00:51:46.190 --> 00:51:48.050
And as we'll discuss
a little bit later,

00:51:48.050 --> 00:51:50.430
the spammers do often
actually give you

00:51:50.430 --> 00:51:52.476
the stuff that you've
paid money for,

00:51:52.476 --> 00:51:54.642
which for me was one of the
most surprising outcomes

00:51:54.642 --> 00:51:55.790
of the paper.

00:51:55.790 --> 00:51:59.610
And so we'll discuss why
that is in a little bit.

00:51:59.610 --> 00:52:02.030
So one thing that
the paper talks about

00:52:02.030 --> 00:52:05.380
is various takedown
strategies that you

00:52:05.380 --> 00:52:09.680
can imagine employing to
try to stop a spammer.

00:52:09.680 --> 00:52:11.420
So one thing it
talked about, they

00:52:11.420 --> 00:52:24.900
said that only a few
number of registrars host

00:52:24.900 --> 00:52:27.955
domains for many affiliates.

00:52:32.330 --> 00:52:37.195
And so what that means is
that most of these affiliate

00:52:37.195 --> 00:52:40.900
programs are-- there's sort
of this one-to-one binding

00:52:40.900 --> 00:52:43.350
between affiliates and
the registrars that

00:52:43.350 --> 00:52:45.950
are dealing with their domain
name and infrastructure.

00:52:45.950 --> 00:52:48.360
It's very rare that you
have a single domain name

00:52:48.360 --> 00:52:51.280
registrar who's going
to be associated

00:52:51.280 --> 00:52:53.390
with a bunch of different
affiliate programs.

00:52:53.390 --> 00:52:55.056
So what that means
is that in many cases

00:52:55.056 --> 00:52:57.240
there's not this, like,
master decapitation strike

00:52:57.240 --> 00:52:58.520
you could launch,
where you'd take out

00:52:58.520 --> 00:53:00.603
this particular registrar
and then all of a sudden

00:53:00.603 --> 00:53:03.360
the entire spam
infrastructure falls down.

00:53:03.360 --> 00:53:09.670
They found similar results
for things like web servers.

00:53:09.670 --> 00:53:12.330
It's very rare that
one ISP will actually

00:53:12.330 --> 00:53:16.230
host a ton of web servers for
a ton of affiliate programs.

00:53:16.230 --> 00:53:17.910
This distributed
nature, once again,

00:53:17.910 --> 00:53:20.000
makes it very difficult
to say, if we just

00:53:20.000 --> 00:53:23.050
take out these three things
then the whole ecosystem just

00:53:23.050 --> 00:53:25.560
crumbles.

00:53:25.560 --> 00:53:27.300
So that's a little
bit disappointing,

00:53:27.300 --> 00:53:29.130
because one would
hope that there'd

00:53:29.130 --> 00:53:34.000
be one web server in Evildonia,
where if we could just

00:53:34.000 --> 00:53:36.865
take down Evildonia, then people
would stop sending us spam.

00:53:36.865 --> 00:53:38.290
That's actually not true.

00:53:38.290 --> 00:53:40.490
As we'll see later,
though, that may

00:53:40.490 --> 00:53:42.470
be true to some extent
at the banking back end.

00:53:42.470 --> 00:53:44.990
And so maybe we can actually
put the squeeze on there.

00:53:44.990 --> 00:53:48.580
So anyway, I was alluding to
earlier about this realization

00:53:48.580 --> 00:53:51.320
phase.

00:53:51.320 --> 00:53:57.220
So the realization phase is what
happens after you, the user,

00:53:57.220 --> 00:54:00.050
have decided to buy something.

00:54:00.050 --> 00:54:03.660
So the realization phase
consists of two parts.

00:54:03.660 --> 00:54:07.770
The user pays for whatever
goods they've bought,

00:54:07.770 --> 00:54:14.140
or they want to buy, and
then the user hopefully

00:54:14.140 --> 00:54:17.700
will receive those goods.

00:54:17.700 --> 00:54:20.450
So either in the
mail because they're

00:54:20.450 --> 00:54:23.180
buying some type
of knockoff drug,

00:54:23.180 --> 00:54:25.489
or they get some
software download

00:54:25.489 --> 00:54:27.780
because they want to get some
fake version of Photoshop

00:54:27.780 --> 00:54:28.780
or something like that.

00:54:28.780 --> 00:54:33.870
And so the money flow
looks something like this.

00:54:33.870 --> 00:54:38.840
We start with the
customer here, and they're

00:54:38.840 --> 00:54:44.180
going to tell the merchant hey,
I want to go buy something.

00:54:44.180 --> 00:54:47.430
They will send some
credit card info here,

00:54:47.430 --> 00:54:50.050
and then the merchant is
going to talk to the payment

00:54:50.050 --> 00:54:52.800
processor.

00:54:52.800 --> 00:54:54.840
And this is
essentially a middleman

00:54:54.840 --> 00:54:58.650
that helps the
merchant, the spammer,

00:54:58.650 --> 00:55:00.710
deal with some of the
intricacies of interacting

00:55:00.710 --> 00:55:03.160
with the credit card system.

00:55:03.160 --> 00:55:07.320
The payment processor will
talk to the acquiring bank.

00:55:10.097 --> 00:55:12.180
So the acquiring bank,
that's the merchant's bank.

00:55:17.630 --> 00:55:20.000
And then the acquiring bank--
running out of space here.

00:55:20.000 --> 00:55:24.120
So, violating all
good design standards,

00:55:24.120 --> 00:55:25.880
we will come up here.

00:55:25.880 --> 00:55:28.860
So the acquiring bank is
then going to talk to-- they

00:55:28.860 --> 00:55:33.400
call them in the paper
the association network,

00:55:33.400 --> 00:55:35.940
but just think of this as Visa.

00:55:35.940 --> 00:55:40.170
This is the credit
card network up here.

00:55:40.170 --> 00:55:42.290
And then finally the
association network,

00:55:42.290 --> 00:55:48.460
Visa or MasterCard or whatever,
talks to the issuing bank.

00:55:48.460 --> 00:55:52.070
So that issuing bank
is the customer's bank.

00:55:52.070 --> 00:55:57.067
And essentially
the Visa or whoever

00:55:57.067 --> 00:55:59.150
is going to go to the
customer's bank and say hey,

00:55:59.150 --> 00:56:00.191
is this a legit purchase?

00:56:00.191 --> 00:56:01.570
Is this a legit transaction?

00:56:01.570 --> 00:56:03.280
And if this is a
legit transaction,

00:56:03.280 --> 00:56:04.970
then the money
will actually flow

00:56:04.970 --> 00:56:06.255
through this entire system.

00:56:06.255 --> 00:56:11.810
So this is what the end-to-end
financial workflow looks like.

00:56:11.810 --> 00:56:13.992
And so this workflow
can actually

00:56:13.992 --> 00:56:14.950
process a lot of money.

00:56:14.950 --> 00:56:18.030
So one of the papers that we
mentioned in the lecture notes

00:56:18.030 --> 00:56:20.090
shows that a single
affiliate can

00:56:20.090 --> 00:56:23.530
get more than $10 million
dollars at this workflow here.

00:56:23.530 --> 00:56:26.580
And so in practice, you
might think that oh,

00:56:26.580 --> 00:56:29.610
why wouldn't the acquiring
bank or the issuing

00:56:29.610 --> 00:56:31.980
bank say, something
looks kind of fishy here?

00:56:31.980 --> 00:56:35.740
As it turns, in many
cases, they don't.

00:56:35.740 --> 00:56:37.960
And so this gets into this
interesting discussion

00:56:37.960 --> 00:56:45.580
about why is it that these
workflows are often tolerated

00:56:45.580 --> 00:56:46.790
by the financial system.

00:56:46.790 --> 00:56:54.480
For example, why do
spammers properly

00:56:54.480 --> 00:56:55.650
classify their transactions?

00:56:58.930 --> 00:57:05.160
So if you want to send
something through this system,

00:57:05.160 --> 00:57:08.942
you have to tag that transaction
with some type of type.

00:57:08.942 --> 00:57:10.650
You have to say, this
is pharmaceuticals,

00:57:10.650 --> 00:57:13.250
this is software, this is
whatever, this is whatever.

00:57:13.250 --> 00:57:15.300
So you might think
that as a spammer,

00:57:15.300 --> 00:57:18.390
you wouldn't actually
want to do this.

00:57:18.390 --> 00:57:22.157
If you were selling fake
Flintstones vitamins,

00:57:22.157 --> 00:57:23.990
maybe you don't want
to say this is actually

00:57:23.990 --> 00:57:25.810
a pharmaceutical transaction.

00:57:25.810 --> 00:57:28.170
And what's interesting is
that spammers do actually

00:57:28.170 --> 00:57:30.840
properly classify these
transactions in many cases.

00:57:30.840 --> 00:57:37.660
And the reason is that there are
high fines if you misclassify.

00:57:40.520 --> 00:57:46.590
So essentially what happens is
that these association networks

00:57:46.590 --> 00:57:50.440
like Visa or Mastercard,
in many cases

00:57:50.440 --> 00:57:52.985
they are OK, perhaps,
with transactions

00:57:52.985 --> 00:57:54.730
that are slightly shady.

00:57:54.730 --> 00:57:57.810
But they don't want to be blamed
for being a money launderer,

00:57:57.810 --> 00:58:00.330
or for trying to
deceive the authorities.

00:58:00.330 --> 00:58:04.480
So as long as you properly
classify what you do, then

00:58:04.480 --> 00:58:06.970
in a certain sense this
gives the association

00:58:06.970 --> 00:58:08.790
networks a little
bit of, well, listen,

00:58:08.790 --> 00:58:10.700
they told us what was going on.

00:58:10.700 --> 00:58:12.540
Maybe the law was a
little bit unclear.

00:58:12.540 --> 00:58:14.410
But we, at least,
Visa or MasterCard,

00:58:14.410 --> 00:58:18.140
did not try to hide the
intent of this transaction.

00:58:18.140 --> 00:58:20.195
So spammers do oftentimes
properly classify

00:58:20.195 --> 00:58:22.750
their transactions.

00:58:22.750 --> 00:58:23.879
So that's interesting.

00:58:23.879 --> 00:58:25.920
It seems like they're
playing within the confines

00:58:25.920 --> 00:58:27.520
of the system a little bit.

00:58:27.520 --> 00:58:30.450
So another question
I mentioned earlier

00:58:30.450 --> 00:58:33.970
is, why send anything to users?

00:58:38.240 --> 00:58:41.400
Because presumably you're a
spammer, so you're a criminal,

00:58:41.400 --> 00:58:41.900
right?

00:58:41.900 --> 00:58:45.545
So why wouldn't it just be cool
if you just took people's money

00:58:45.545 --> 00:58:46.340
and then ran?

00:58:46.340 --> 00:58:48.050
I mean, that'd be
the ultimate crime.

00:58:48.050 --> 00:58:53.260
So as it turns out, they
actually send things to users

00:58:53.260 --> 00:58:59.150
because, surprise surprise,
high fines if they don't.

00:58:59.150 --> 00:59:00.780
So it's this very
entertaining system

00:59:00.780 --> 00:59:03.660
whereby spammers kind of want
to do things that are legal,

00:59:03.660 --> 00:59:06.000
when they actually
can't use Bitcoins yet.

00:59:06.000 --> 00:59:08.634
They actually have to work
within the constraints

00:59:08.634 --> 00:59:09.800
of this pre-existing system.

00:59:09.800 --> 00:59:12.485
So as it turns out, there
are these high fines

00:59:12.485 --> 00:59:19.000
if you, and by you
I mean the spammer,

00:59:19.000 --> 00:59:20.160
have too many chargebacks.

00:59:24.050 --> 00:59:29.370
So a chargeback is
essentially when a customer

00:59:29.370 --> 00:59:31.280
tells their credit
card company, hey,

00:59:31.280 --> 00:59:34.805
I didn't get the thing that
I was supposed to get that I

00:59:34.805 --> 00:59:36.040
bought with your credit card.

00:59:36.040 --> 00:59:38.120
Or I got it, but
they didn't like it.

00:59:38.120 --> 00:59:41.400
So if you're a spammer and you
have too many customers saying

00:59:41.400 --> 00:59:43.150
things like this,
then you will actually

00:59:43.150 --> 00:59:45.580
get charged very,
very high fines.

00:59:45.580 --> 00:59:50.550
And as we saw earlier, the
clickthrough rates for spam

00:59:50.550 --> 00:59:52.285
are super, super low.

00:59:52.285 --> 00:59:55.070
The conversion rates
are super, super low.

00:59:55.070 --> 00:59:58.290
So even just one or two
fines might wipe out

00:59:58.290 --> 01:00:00.302
your entire profit
for a month, let's

01:00:00.302 --> 01:00:01.510
say, for something like this.

01:00:01.510 --> 01:00:03.860
So spammers are really
motivated to avoid these fines

01:00:03.860 --> 01:00:04.850
in both cases.

01:00:04.850 --> 01:00:07.920
AUDIENCE: Would using
Paypal obscure any of that,

01:00:07.920 --> 01:00:10.590
like the relationship
with the bank?

01:00:10.590 --> 01:00:13.690
PROFESSOR: Well,
typically, yes and no.

01:00:13.690 --> 01:00:17.930
So you can think of those--
Paypal is in many respects

01:00:17.930 --> 01:00:20.410
very similar to
Visa or MasterCard.

01:00:20.410 --> 01:00:24.420
So it has very similar
regulations that oversee it,

01:00:24.420 --> 01:00:27.080
because it bears many of
the same types of risks.

01:00:27.080 --> 01:00:31.122
I do think that Visa
has slightly stricter

01:00:31.122 --> 01:00:32.580
restrictions on
some of this stuff,

01:00:32.580 --> 01:00:34.000
as we'll talk about in a second.

01:00:34.000 --> 01:00:35.375
But for all intents
and purposes,

01:00:35.375 --> 01:00:37.012
Paypal looks very similar.

01:00:37.012 --> 01:00:39.200
AUDIENCE: Is there
any sort of idea

01:00:39.200 --> 01:00:42.405
of having a group where you
make some sort of account

01:00:42.405 --> 01:00:44.520
and then intentionally go
to a bunch of spammers,

01:00:44.520 --> 01:00:48.180
buy a bunch of things, and then
ask for a bunch of chargebacks

01:00:48.180 --> 01:00:50.590
whether or not they
send it to you?

01:00:50.590 --> 01:00:52.470
So that they incur these fines.

01:00:52.470 --> 01:00:55.110
Or report them for
misclassifying things,

01:00:55.110 --> 01:00:57.540
in order to just make
them pay these fines.

01:00:57.540 --> 01:00:59.830
PROFESSOR: That's interesting.

01:00:59.830 --> 01:01:00.706
It's like vigilantes.

01:01:00.706 --> 01:01:01.871
AUDIENCE: Spam the spammers.

01:01:01.871 --> 01:01:03.030
PROFESSOR: Yeah, exactly.

01:01:03.030 --> 01:01:04.988
I don't know if I've
heard anything about that.

01:01:04.988 --> 01:01:09.630
I do know that the
spammers do try to detect

01:01:09.630 --> 01:01:11.350
people who are trolling them.

01:01:11.350 --> 01:01:14.710
So for example, one thing that
they talked about in the paper

01:01:14.710 --> 01:01:18.160
a little bit is that
spammers-- so how

01:01:18.160 --> 01:01:21.519
did the authors of the
paper determine all this?

01:01:21.519 --> 01:01:23.310
They actually got a
bunch of spam messages,

01:01:23.310 --> 01:01:24.685
they clicked on
a bunch of stuff.

01:01:24.685 --> 01:01:26.230
They got a special
Visa card they

01:01:26.230 --> 01:01:28.870
used to purchase this stuff,
and then so on and so forth.

01:01:28.870 --> 01:01:31.250
So spammers obviously
don't like this.

01:01:31.250 --> 01:01:33.810
And so in the paper they
call this test buys.

01:01:33.810 --> 01:01:35.620
Spammers want to
prevent these test buys

01:01:35.620 --> 01:01:38.430
from researchers who are trying
to figure out what's going on.

01:01:38.430 --> 01:01:41.990
So one thing that some spammers
did-- do, I should say--

01:01:41.990 --> 01:01:45.330
is they actually require
proof of your identity

01:01:45.330 --> 01:01:46.730
before you can buy something.

01:01:46.730 --> 01:01:49.820
So they might ask you to send
a picture of your photo ID,

01:01:49.820 --> 01:01:51.470
or something like that.

01:01:51.470 --> 01:01:53.790
In particular,
some people started

01:01:53.790 --> 01:01:58.000
doing this after Visa tightened
up some of their rules

01:01:58.000 --> 01:01:58.720
about spam.

01:01:58.720 --> 01:02:04.500
Now, the problem with this
is that most people who

01:02:04.500 --> 01:02:07.000
would click on span
apparently are still

01:02:07.000 --> 01:02:10.470
reluctant to send their photo
ID to just some random person.

01:02:10.470 --> 01:02:12.527
So there's a bunch
of-- I've linked

01:02:12.527 --> 01:02:14.027
one of these articles
in the lecture

01:02:14.027 --> 01:02:15.460
notes-- there's a bunch
of hilarious commentary

01:02:15.460 --> 01:02:18.200
from a spammer bulletin board,
where they say oh no, Visa's

01:02:18.200 --> 01:02:19.260
cracking down on us.

01:02:19.260 --> 01:02:21.390
We try to ask for
people's photo IDs,

01:02:21.390 --> 01:02:23.820
but they don't want to send
it to us for some reason.

01:02:23.820 --> 01:02:25.840
And it's so weird that people
wouldn't want to do that,

01:02:25.840 --> 01:02:27.490
but they will give them
their credit card number.

01:02:27.490 --> 01:02:29.198
But anyway, so long
story short, spammers

01:02:29.198 --> 01:02:33.375
are highly incentivized to try
to detect that kind of stuff.

01:02:33.375 --> 01:02:36.854
AUDIENCE: So for chargebacks,
if you don't necessarily

01:02:36.854 --> 01:02:40.333
want your bank to know that you
were buying these completely

01:02:40.333 --> 01:02:44.309
shady items, do a lot of
users actually do chargebacks

01:02:44.309 --> 01:02:45.800
if they don't get the item?

01:02:45.800 --> 01:02:47.800
Or are they too embarrassed?

01:02:47.800 --> 01:02:49.466
PROFESSOR: Yeah,
that's a good question.

01:02:49.466 --> 01:02:52.540
I don't know what
fraction of people

01:02:52.540 --> 01:02:54.890
are in the set of
people who bought

01:02:54.890 --> 01:02:56.830
herbal Flintstones
vitamins, were disappointed

01:02:56.830 --> 01:02:58.290
by herbal Flintstones
vitamins, and then,

01:02:58.290 --> 01:03:00.706
yeah, told their bank-- but
what's interesting, though, is

01:03:00.706 --> 01:03:03.016
that the bank has to
know in the first place

01:03:03.016 --> 01:03:04.390
that they're going
to this place,

01:03:04.390 --> 01:03:06.120
right, because the
thing went through.

01:03:06.120 --> 01:03:09.634
So avoiding the chargeback, I
don't think you're going to--

01:03:09.634 --> 01:03:11.300
but by doing the
chargeback, let me say,

01:03:11.300 --> 01:03:13.799
I don't think you'd reveal any
extra information to the bank

01:03:13.799 --> 01:03:15.000
that they wouldn't already know.

01:03:15.000 --> 01:03:17.291
Because they had to clear
the transaction first for you

01:03:17.291 --> 01:03:19.000
to actually get it
and be disappointed.

01:03:19.000 --> 01:03:22.320
AUDIENCE: So then roughly how
many chargebacks is too much?

01:03:22.320 --> 01:03:24.410
PROFESSOR: So some of the
figures I've heard here

01:03:24.410 --> 01:03:26.862
are greater than 1%.

01:03:26.862 --> 01:03:28.445
So in other words,
if you're a spammer

01:03:28.445 --> 01:03:30.890
and you have more than 1%
of your transactions causing

01:03:30.890 --> 01:03:33.142
these problems,
you get in trouble.

01:03:33.142 --> 01:03:35.475
And I wouldn't be surprised
if it was a little bit lower

01:03:35.475 --> 01:03:37.794
than that, but 1% is the
number that I've heard.

01:03:41.220 --> 01:03:41.720
All right.

01:03:41.720 --> 01:03:44.540
So to me, like I
said, this was one

01:03:44.540 --> 01:03:46.607
of the most interesting
parts of the paper.

01:03:46.607 --> 01:03:48.940
Because I would have thought
that a lot of spamming just

01:03:48.940 --> 01:03:50.234
involved straight-up fraud.

01:03:50.234 --> 01:03:52.150
That people clicked on
links, they sent money,

01:03:52.150 --> 01:03:53.149
they never got anything.

01:03:53.149 --> 01:03:55.272
But as it turns out,
because these spammers have

01:03:55.272 --> 01:03:58.130
to go through this
network which has

01:03:58.130 --> 01:04:02.330
all these mechanisms
to prevent fraud,

01:04:02.330 --> 01:04:06.892
they end up having to actually
ship things over to users.

01:04:06.892 --> 01:04:10.030
So that's kind of neat.

01:04:10.030 --> 01:04:12.400
And so another
reason why spammers

01:04:12.400 --> 01:04:14.940
want to do these things,
properly classify transactions

01:04:14.940 --> 01:04:16.610
and actually send
things to users,

01:04:16.610 --> 01:04:24.650
is that only a few
banks are actually

01:04:24.650 --> 01:04:28.320
willing to interact
with spammers.

01:04:32.590 --> 01:04:38.894
And so what this means
is that if the spammer is

01:04:38.894 --> 01:04:40.560
getting a lot of
chargebacks, or getting

01:04:40.560 --> 01:04:42.685
in trouble with the bank
or the credit card company

01:04:42.685 --> 01:04:44.549
or whatever, and
some bank decides,

01:04:44.549 --> 01:04:46.090
I can't do business
with you anymore,

01:04:46.090 --> 01:04:49.030
there's not a really
large set of other banks

01:04:49.030 --> 01:04:53.120
that the spammer could go to
to continue their chicanery.

01:04:53.120 --> 01:04:57.440
So one study of this stuff
found that there are basically

01:04:57.440 --> 01:05:06.290
only 30 acquiring banks that
spammers were seen to use over

01:05:06.290 --> 01:05:07.530
some two-year period.

01:05:07.530 --> 01:05:09.360
That's actually not very high.

01:05:09.360 --> 01:05:14.166
So there is this
other incentive to not

01:05:14.166 --> 01:05:15.790
be too goofy with
the financial system,

01:05:15.790 --> 01:05:18.165
because you don't really have
too many other places to go

01:05:18.165 --> 01:05:20.300
if you break those
relationships.

01:05:20.300 --> 01:05:25.140
So it seems like maybe
this is a good choke point

01:05:25.140 --> 01:05:26.910
to try to cut down on spam.

01:05:26.910 --> 01:05:29.075
So we've already discussed
how things like botnets

01:05:29.075 --> 01:05:31.140
give the attack a
lot of IP addresses.

01:05:31.140 --> 01:05:33.919
There's a lot of
different types of hosts

01:05:33.919 --> 01:05:36.210
who are willing to run web
servers, so on and so forth.

01:05:36.210 --> 01:05:37.751
But this number
actually seems small.

01:05:37.751 --> 01:05:41.660
So maybe we can actually
attack spamming here.

01:05:41.660 --> 01:05:43.920
But as I alluded to earlier,
it's a little bit tricky

01:05:43.920 --> 01:05:46.900
to do this because of things
like differing IP laws,

01:05:46.900 --> 01:05:50.290
because of things
like the fact that it

01:05:50.290 --> 01:05:54.830
can be sort of tricky to
actually say that spammers

01:05:54.830 --> 01:05:57.560
are doing something illegal.

01:05:57.560 --> 01:06:00.230
So if you are
using spam messages

01:06:00.230 --> 01:06:03.220
to sell someone-- let's make
this up, let's say sugar,

01:06:03.220 --> 01:06:04.130
sugar's delicious.

01:06:04.130 --> 01:06:07.252
It's not illegal to sell
sugar, even at cut-rate prices.

01:06:07.252 --> 01:06:08.710
So even though the
way that you may

01:06:08.710 --> 01:06:11.400
have drawn the user
to that purchase

01:06:11.400 --> 01:06:13.970
was sort of
duplicitous or gross,

01:06:13.970 --> 01:06:17.180
it is not in and of itself
illegal to sell someone sugar.

01:06:17.180 --> 01:06:18.860
And so as it turns
out, a lot of spam

01:06:18.860 --> 01:06:21.647
sort of falls into
this gray area,

01:06:21.647 --> 01:06:23.480
where the things that
the spammers are doing

01:06:23.480 --> 01:06:26.510
are distasteful, but maybe
not necessarily as illegal

01:06:26.510 --> 01:06:27.370
as you'd think.

01:06:27.370 --> 01:06:30.350
Now, for stuff like
pirated software,

01:06:30.350 --> 01:06:31.742
there it's much more clear-cut.

01:06:31.742 --> 01:06:33.700
But suffice it to say,
it's not always the case

01:06:33.700 --> 01:06:35.710
that you can just point to one
of these banks and say hey,

01:06:35.710 --> 01:06:36.918
your customers are criminals.

01:06:36.918 --> 01:06:38.220
Because that's not always true.

01:06:38.220 --> 01:06:44.870
Particularly if there's not a
very strong paper trail that

01:06:44.870 --> 01:06:48.230
attaches the financial
transaction to some spam

01:06:48.230 --> 01:06:51.160
URL that was the origin
of the transaction.

01:06:51.160 --> 01:06:55.050
It's often very difficult to
prove those types of links.

01:06:55.050 --> 01:06:58.260
OK, so since this
paper was published,

01:06:58.260 --> 01:07:00.952
the credit card networks
have taken some actions.

01:07:00.952 --> 01:07:02.910
So this paper actually
made a pretty big splash

01:07:02.910 --> 01:07:04.100
when it came out.

01:07:04.100 --> 01:07:07.430
And so the association networks
like Visa and MasterCard

01:07:07.430 --> 01:07:09.560
and all of them were
wondering, what can we

01:07:09.560 --> 01:07:13.510
do to cut down on
some of this spam?

01:07:13.510 --> 01:07:15.360
So interestingly, after
the paper came out,

01:07:15.360 --> 01:07:18.710
some pharmaceutical companies
and software vendors actually

01:07:18.710 --> 01:07:21.000
lodged complaints with Visa.

01:07:21.000 --> 01:07:22.450
So if you remember
from the paper,

01:07:22.450 --> 01:07:25.790
Visa was the association
network the researchers used

01:07:25.790 --> 01:07:28.640
to make these test
buys, these dummy buys.

01:07:28.640 --> 01:07:30.890
So it's a little
bit unfortunate,

01:07:30.890 --> 01:07:33.600
but that then showed
some of these companies

01:07:33.600 --> 01:07:37.510
that hey, Visa can be used
as the association network

01:07:37.510 --> 01:07:39.280
to fund some of this
spam, or to translate

01:07:39.280 --> 01:07:41.590
some of this spam traffic.

01:07:41.590 --> 01:07:44.700
So some people
complained about that.

01:07:44.700 --> 01:07:51.270
So Visa made some policy
changes in response

01:07:51.270 --> 01:07:53.600
to some of the issues
that were brought up

01:07:53.600 --> 01:07:56.460
in the paper and some
of the complaints

01:07:56.460 --> 01:07:59.120
that they got as
a result. So now,

01:07:59.120 --> 01:08:07.090
for example, all
pharmaceutical sales are now

01:08:07.090 --> 01:08:11.780
labeled by Visa as high-risk.

01:08:14.990 --> 01:08:19.439
So what this means is
that if a bank acts

01:08:19.439 --> 01:08:27.859
as an acquirer for these
high-risk transactions,

01:08:27.859 --> 01:08:31.569
then Visa will have some more
stringent regulations they will

01:08:31.569 --> 01:08:34.460
put on that merchant-side bank.

01:08:34.460 --> 01:08:36.729
For example, they
will require that bank

01:08:36.729 --> 01:08:38.920
to engage in a risk
management program,

01:08:38.920 --> 01:08:40.970
and they may be audited
more frequently,

01:08:40.970 --> 01:08:42.229
and so on and so forth.

01:08:42.229 --> 01:08:45.410
So Visa made that change.

01:08:45.410 --> 01:08:52.430
And Visa also changed
its operating guidelines.

01:08:52.430 --> 01:08:58.720
So its operating
guidelines, now they

01:08:58.720 --> 01:09:07.220
explicitly enumerate and
forbid illegal sales of drugs

01:09:07.220 --> 01:09:08.970
and trademark-enforcing goods.

01:09:12.050 --> 01:09:14.689
So the reason why they did
this is that by tightening up

01:09:14.689 --> 01:09:17.270
this language, it is
now easier for them

01:09:17.270 --> 01:09:21.737
to issue more aggressive fines
against banks and merchants

01:09:21.737 --> 01:09:25.680
that they feel are doing
things like selling

01:09:25.680 --> 01:09:29.859
illegal pharmaceuticals or
selling knockoff versions

01:09:29.859 --> 01:09:32.065
of watches or things like that.

01:09:32.065 --> 01:09:33.815
So once again, there's
still a lot of spam

01:09:33.815 --> 01:09:36.590
that's in that gray area where
it's not necessarily illegal.

01:09:36.590 --> 01:09:37.624
It's just that the
customers were required

01:09:37.624 --> 01:09:38.665
to do certain techniques.

01:09:38.665 --> 01:09:40.459
And this is very
useful because now Visa

01:09:40.459 --> 01:09:44.450
can drop some much
bigger hammers on folks.

01:09:44.450 --> 01:09:46.450
And as I mentioned before,
some of the spammers

01:09:46.450 --> 01:09:48.420
tried to react to
this by saying,

01:09:48.420 --> 01:09:50.880
well, let's just
prevent these test buys.

01:09:50.880 --> 01:09:52.796
Because not only do
security researchers do

01:09:52.796 --> 01:09:54.902
these test buys, but the
association networks can

01:09:54.902 --> 01:09:55.860
do these test buys too.

01:09:55.860 --> 01:09:58.160
So they did some things like
the photo ID type stuff,

01:09:58.160 --> 01:10:01.820
and that tended not to
work out super well.

01:10:01.820 --> 01:10:04.460
And so at least a few years
after these changes were made,

01:10:04.460 --> 01:10:05.900
this did have an impact.

01:10:05.900 --> 01:10:09.160
I'm not sure what the latest
state-of-the-art is with

01:10:09.160 --> 01:10:12.014
respect to trolling these
Visa policy changes,

01:10:12.014 --> 01:10:14.430
but it was kind of cool to see
this paper have this impact

01:10:14.430 --> 01:10:16.574
in real life.

01:10:16.574 --> 01:10:18.740
So one interesting thing
they mentioned in the paper

01:10:18.740 --> 01:10:21.825
is they talked about
the ethical aspects

01:10:21.825 --> 01:10:23.260
of doing security research.

01:10:23.260 --> 01:10:27.960
And in particular, doing this
research about the spam chain.

01:10:27.960 --> 01:10:31.530
To actually understand how some
of this banking stuff worked,

01:10:31.530 --> 01:10:34.700
these researchers actually
had to make purchases.

01:10:34.700 --> 01:10:37.890
They actually had to
give money to people

01:10:37.890 --> 01:10:39.310
in exchange for these products.

01:10:39.310 --> 01:10:41.420
And so in the paper they
go through this kind

01:10:41.420 --> 01:10:44.857
of semi-hilarious defensive
section where they say,

01:10:44.857 --> 01:10:46.690
we totally burned
everything that we bought.

01:10:46.690 --> 01:10:47.398
We didn't use it.

01:10:47.398 --> 01:10:49.972
We talked to the companies
whose pirated software we

01:10:49.972 --> 01:10:51.320
were buying before we got it.

01:10:51.320 --> 01:10:53.240
But these things are actually
pretty important to go through,

01:10:53.240 --> 01:10:55.100
particularly if you're
within a university setting.

01:10:55.100 --> 01:10:56.600
Because as you may
know, if you want

01:10:56.600 --> 01:10:59.174
to do anything that involves--
particularly human research,

01:10:59.174 --> 01:11:01.590
but anything that might have
these ethical sort of aspects

01:11:01.590 --> 01:11:04.060
to it, you have to get things
cleared by lawyers, sometimes

01:11:04.060 --> 01:11:06.121
by an IRB, and things like that.

01:11:06.121 --> 01:11:07.870
So it's actually pretty
important for them

01:11:07.870 --> 01:11:10.820
to jump through these hoops,
because at the end of the day

01:11:10.820 --> 01:11:13.090
they have to at least be
somewhat confident that they

01:11:13.090 --> 01:11:16.170
weren't supporting some
deeply nefarious activity

01:11:16.170 --> 01:11:18.130
in some far-flung
corner of the world.

01:11:18.130 --> 01:11:20.640
So that was another interesting
part of the paper, too.

01:11:20.640 --> 01:11:23.390
And other people have talked in
this class about things like,

01:11:23.390 --> 01:11:27.610
what are the ethics of releasing
zero-day exploits if you

01:11:27.610 --> 01:11:29.360
know they haven't been
patched by someone?

01:11:29.360 --> 01:11:30.818
So it's a really
interesting aspect

01:11:30.818 --> 01:11:32.075
of doing security research.

01:11:32.075 --> 01:11:36.350
AUDIENCE: Is there any sort of
oversight on security ethics?

01:11:36.350 --> 01:11:39.042
Because in the paper, they
said the IRB wasn't interested.

01:11:39.042 --> 01:11:41.000
PROFESSOR: Yeah, so that
was super interesting.

01:11:41.000 --> 01:11:41.500
Yes.

01:11:41.500 --> 01:11:44.470
They said the IRB wasn't
interested, I think,

01:11:44.470 --> 01:11:48.940
because there was no
obvious human subject.

01:11:48.940 --> 01:11:50.890
But I think that at
most universities,

01:11:50.890 --> 01:11:53.015
you couldn't just
say, oh, there's

01:11:53.015 --> 01:11:54.515
no direct human
subject, let me just

01:11:54.515 --> 01:11:58.220
go buy some stuff from somebody
at the end of a spam link.

01:11:58.220 --> 01:12:00.170
And what they describe
in the paper, actually

01:12:00.170 --> 01:12:01.240
in the acknowledgment
section, they

01:12:01.240 --> 01:12:02.730
thank this whole set of people.

01:12:02.730 --> 01:12:06.024
Like, Sally at Legal,
so-and-so at the Philosophers

01:12:06.024 --> 01:12:07.440
For Ethical Computing
Association,

01:12:07.440 --> 01:12:09.440
and stuff like that.

01:12:09.440 --> 01:12:12.650
I don't think there's
actually a, how would

01:12:12.650 --> 01:12:16.820
you say it, an
America-wide standard

01:12:16.820 --> 01:12:18.420
for doing this type of research.

01:12:18.420 --> 01:12:20.070
I know that each
university's IRB

01:12:20.070 --> 01:12:22.640
has slightly different policies
of what they do and do not

01:12:22.640 --> 01:12:26.639
allow, but I don't think
there's a blanket policy.

01:12:26.639 --> 01:12:29.477
AUDIENCE: Out of the
350 million spam URLs

01:12:29.477 --> 01:12:33.840
they tracked, of the 28 that
actually responded, is there

01:12:33.840 --> 01:12:37.554
any chance that an appreciable
number of those 28 spam

01:12:37.554 --> 01:12:39.637
responses were coming from
researchers researching

01:12:39.637 --> 01:12:42.332
on spam?

01:12:42.332 --> 01:12:44.540
PROFESSOR: Well, it's true
that this type of calculus

01:12:44.540 --> 01:12:46.320
is actually one
reason why I think

01:12:46.320 --> 01:12:49.302
the authors went to such
lengths to defend themselves.

01:12:49.302 --> 01:12:51.780
Because if you think
about it, the reason

01:12:51.780 --> 01:12:53.680
why those statistics
are so hilarious

01:12:53.680 --> 01:12:56.210
is that it means that if you
were to add five or remove

01:12:56.210 --> 01:12:58.340
five, that's the difference
between a spammer being

01:12:58.340 --> 01:13:00.090
able to give their
kids, like, a real gift

01:13:00.090 --> 01:13:01.952
versus a piece of coal.

01:13:01.952 --> 01:13:03.410
Because those
numbers are so small.

01:13:06.712 --> 01:13:08.587
So with regard to that
particular [INAUDIBLE]

01:13:08.587 --> 01:13:10.545
that I gave you, I don't
know how many of those

01:13:10.545 --> 01:13:12.190
were researchers.

01:13:12.190 --> 01:13:15.420
But I do think in general--
like I said, the spammers,

01:13:15.420 --> 01:13:17.010
they want to take your money.

01:13:17.010 --> 01:13:19.460
And so if they could
find some equilibrium

01:13:19.460 --> 01:13:23.200
whereby security researchers
could do test buys,

01:13:23.200 --> 01:13:25.650
but that had no impact
on their overall sales,

01:13:25.650 --> 01:13:26.949
they'd be fine with that.

01:13:26.949 --> 01:13:27.990
They just want the money.

01:13:27.990 --> 01:13:29.615
But the tricky thing
is that, let's say

01:13:29.615 --> 01:13:32.520
that-- let's make some
number up-- half of those 35

01:13:32.520 --> 01:13:34.560
were test buys, and
that resulted in people

01:13:34.560 --> 01:13:37.490
putting pressure on the banks,
and then instead of 35 they'd

01:13:37.490 --> 01:13:38.470
be getting two.

01:13:38.470 --> 01:13:39.380
That they don't want.

01:13:39.380 --> 01:13:41.956
So that's why they're so
motivated to stop that stuff.

01:13:41.956 --> 01:13:44.436
AUDIENCE: How much of
this is blind emailing

01:13:44.436 --> 01:13:45.924
versus any sort of filtering?

01:13:45.924 --> 01:13:48.652
Because I'm sure they
could run some models

01:13:48.652 --> 01:13:51.380
and get that 350 million
down to, like, one page.

01:13:51.380 --> 01:13:54.350
PROFESSOR: Yeah, so it's all
about the cost-benefit analysis

01:13:54.350 --> 01:13:56.350
from the perspective
of the spammer.

01:13:56.350 --> 01:13:59.660
So I think that you're right,
and there are actually--

01:13:59.660 --> 01:14:02.922
there's a marketplace
for more targeted stuff.

01:14:02.922 --> 01:14:05.380
In particular, that's where
some of those compromised email

01:14:05.380 --> 01:14:07.650
accounts can become very useful.

01:14:07.650 --> 01:14:10.170
But I think what you
see is that people

01:14:10.170 --> 01:14:14.774
tend to go for the more
focused stuff, like the more

01:14:14.774 --> 01:14:16.190
focused spam emails,
for what they

01:14:16.190 --> 01:14:17.960
view as higher-reward targets.

01:14:17.960 --> 01:14:21.240
So for example,
political groups.

01:14:21.240 --> 01:14:24.010
People associated with the
Dalai Lama, for instance.

01:14:24.010 --> 01:14:26.620
There, the perceived
value of being

01:14:26.620 --> 01:14:28.260
able to get into that
system is so high

01:14:28.260 --> 01:14:30.958
that people will spend the
time to do this kind of stuff.

01:14:30.958 --> 01:14:32.333
AUDIENCE: It would
be interesting

01:14:32.333 --> 01:14:33.940
if there was one
company dedicated

01:14:33.940 --> 01:14:35.788
to finding all the
gullible grandmas

01:14:35.788 --> 01:14:37.640
and putting their
emails into stuff.

01:14:37.640 --> 01:14:38.270
PROFESSOR: Oh, interesting.

01:14:38.270 --> 01:14:38.780
I see.

01:14:38.780 --> 01:14:40.154
So basically having
some database

01:14:40.154 --> 01:14:42.660
where it's like, totally send
spam to this person, because--

01:14:42.660 --> 01:14:43.700
AUDIENCE: It works.

01:14:43.700 --> 01:14:45.908
PROFESSOR: I wouldn't be
surprised if stuff like that

01:14:45.908 --> 01:14:49.310
existed, but I don't
know if they do.

01:14:49.310 --> 01:14:52.110
So one last thing that I
wanted to mention is that,

01:14:52.110 --> 01:14:54.730
and I alluded to this a
bit earlier in the lecture,

01:14:54.730 --> 01:14:57.970
that some companies have taken
to doing these things they

01:14:57.970 --> 01:14:59.357
call hackbacks.

01:14:59.357 --> 01:15:01.440
So the idea is that, let's
say that you're a bank,

01:15:01.440 --> 01:15:02.981
someone tries to
break into your bank

01:15:02.981 --> 01:15:04.440
and steal your information.

01:15:04.440 --> 01:15:07.040
That bank will then,
of their own volition,

01:15:07.040 --> 01:15:10.780
go back to those hackers
and try to do something.

01:15:10.780 --> 01:15:13.116
Where something may be as
quote-on-quote innocuous

01:15:13.116 --> 01:15:15.090
as shutting down
the botnet, or maybe

01:15:15.090 --> 01:15:16.920
they try to steal
their information back,

01:15:16.920 --> 01:15:17.794
and things like that.

01:15:17.794 --> 01:15:20.940
This has actually
become very much more

01:15:20.940 --> 01:15:22.550
common than it used to be.

01:15:22.550 --> 01:15:26.910
And one reason for this is that
because the legal system has

01:15:26.910 --> 01:15:30.261
a little bit slow in adapting
to some of these threats,

01:15:30.261 --> 01:15:32.760
some of these institutions, in
particular software companies

01:15:32.760 --> 01:15:34.852
and banks, are tired of
waiting for government--

01:15:34.852 --> 01:15:36.560
like, their national
government-- to deal

01:15:36.560 --> 01:15:37.540
with stuff.

01:15:37.540 --> 01:15:40.630
So what ends up happening
is that, for example, there

01:15:40.630 --> 01:15:43.000
was this big botnet
in 2013 that was

01:15:43.000 --> 01:15:45.690
hosting all kinds of pirated
goods and things like that.

01:15:45.690 --> 01:15:51.010
And so this huge coalition of
Microsoft, American Express,

01:15:51.010 --> 01:15:53.350
Paypal, a bunch of them
launched an operation

01:15:53.350 --> 01:15:55.379
to take down a botnet.

01:15:55.379 --> 01:15:56.920
They themselves took
down the botnet.

01:15:56.920 --> 01:15:58.586
They lurked around
for a while, they

01:15:58.586 --> 01:16:01.210
learned about where the command
and control infrastructure was.

01:16:01.210 --> 01:16:02.690
They actually went
in there, took

01:16:02.690 --> 01:16:04.773
control of the command and
control infrastructure,

01:16:04.773 --> 01:16:06.685
identified where all
the end-user bots were.

01:16:06.685 --> 01:16:08.590
And they could send
them messages saying,

01:16:08.590 --> 01:16:10.630
you need to patch your machine.

01:16:10.630 --> 01:16:13.790
And so it's a very interesting
area of intersection

01:16:13.790 --> 01:16:15.960
between security and the law.

01:16:15.960 --> 01:16:17.850
Because what part
of American law,

01:16:17.850 --> 01:16:21.810
for example, gave those
companies the right to do that?

01:16:21.810 --> 01:16:24.880
So what Microsoft
lawyers said, at least,

01:16:24.880 --> 01:16:26.530
is that they said
these botnets were

01:16:26.530 --> 01:16:29.380
violating Microsoft trademarks.

01:16:29.380 --> 01:16:31.450
So for example, if you
sell pirated goods,

01:16:31.450 --> 01:16:34.222
and you're saying this
is Windows, for example,

01:16:34.222 --> 01:16:36.180
but it's not actually
Windows or it didn't come

01:16:36.180 --> 01:16:38.630
from an official channel,
then Microsoft says OK,

01:16:38.630 --> 01:16:40.340
you're violating our trademark.

01:16:40.340 --> 01:16:43.330
Therefore we can
hack your botnet.

01:16:43.330 --> 01:16:46.980
It's a little interesting to
see how that leap of logic

01:16:46.980 --> 01:16:47.760
took place.

01:16:47.760 --> 01:16:49.280
But the courts allowed it.

01:16:49.280 --> 01:16:51.440
And this is increasingly
happening more and more.

01:16:51.440 --> 01:16:54.440
And the banks in particular seem
to be pretty upset about this,

01:16:54.440 --> 01:16:57.386
because there seems to be a
lot of state-level sponsorship

01:16:57.386 --> 01:16:58.835
of some of these banking hacks.

01:16:58.835 --> 01:17:00.840
And the bankers care
about the money,

01:17:00.840 --> 01:17:02.350
and so when they
lose this money,

01:17:02.350 --> 01:17:04.000
they get very upset about that.

01:17:04.000 --> 01:17:06.470
And so it's
interesting to see how

01:17:06.470 --> 01:17:09.630
some of the burden
for doing cyber

01:17:09.630 --> 01:17:11.940
security, in particular
offensive operations,

01:17:11.940 --> 01:17:14.800
has now shifted a little bit
more to the private sector.

01:17:14.800 --> 01:17:17.750
So it's not quite clear what
the long-term implications are.

01:17:17.750 --> 01:17:18.250
OK.

01:17:18.250 --> 01:17:19.770
That's the end of
the lecture, and I

01:17:19.770 --> 01:17:21.890
guess we will see
you on Wednesday

01:17:21.890 --> 01:17:25.240
and we'll go through
the class projects.