WEBVTT

00:00:00.060 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.019
Commons license.

00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.730
continue to offer high quality,
educational resources for free.

00:00:10.730 --> 00:00:13.330
To make a donation or
view additional materials

00:00:13.330 --> 00:00:17.236
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.236 --> 00:00:17.861
at ocw.mit.edu.

00:00:26.600 --> 00:00:29.342
NANCY LYNCH: OK so today, you're
going to see something new.

00:00:29.342 --> 00:00:30.800
In fact all this
week, you're going

00:00:30.800 --> 00:00:33.510
to see something that's quite
different from what you've

00:00:33.510 --> 00:00:36.960
been studying in this course.

00:00:36.960 --> 00:00:37.950
These are algorithms.

00:00:37.950 --> 00:00:42.380
But they're for a completely
different sort of model.

00:00:42.380 --> 00:00:46.110
Distributed algorithms,
OK, so what are they?

00:00:46.110 --> 00:00:48.190
So now instead of
having algorithms

00:00:48.190 --> 00:00:50.480
that run on a typical
computer, you're

00:00:50.480 --> 00:00:55.400
going to have algorithms that
run on a network of processors.

00:00:55.400 --> 00:00:57.350
Or it could be on
one machine that

00:00:57.350 --> 00:01:01.330
has multiple processors,
multiprocessors with shared memory.

00:01:05.970 --> 00:01:09.370
Much of computing is
distributed algorithms now.

00:01:09.370 --> 00:01:11.550
They solve problems
like communication

00:01:11.550 --> 00:01:18.560
on the internet, data
management over a network,

00:01:18.560 --> 00:01:21.260
allocating resources
in a network setting,

00:01:21.260 --> 00:01:23.540
synchronizing,
reaching agreement

00:01:23.540 --> 00:01:28.990
among different agents
at remote locations.

00:01:28.990 --> 00:01:31.470
So these are all distributed
problems, not things

00:01:31.470 --> 00:01:34.610
that you just solve
on one computer.

00:01:34.610 --> 00:01:38.120
The kinds of algorithms you
design for these settings

00:01:38.120 --> 00:01:45.420
have to work under extremely
difficult platforms

00:01:45.420 --> 00:01:48.360
because what you have is
concurrent activity that's

00:01:48.360 --> 00:01:51.840
going on at many locations,
many processors doing things

00:01:51.840 --> 00:01:53.220
at the same time.

00:01:53.220 --> 00:01:55.800
And you don't know exactly
when everybody's going

00:01:55.800 --> 00:01:57.970
to perform their activities.

00:01:57.970 --> 00:02:02.090
You can have different
sorts of timing uncertainty.

00:02:02.090 --> 00:02:05.010
The order of events isn't clear.

00:02:05.010 --> 00:02:08.830
There could be inputs that
arrive at different locations.

00:02:08.830 --> 00:02:12.150
And then you also have to
deal with failure and recovery

00:02:12.150 --> 00:02:15.190
of some of the processors or
some of the channels involved

00:02:15.190 --> 00:02:16.522
in the computation.

00:02:16.522 --> 00:02:17.980
You don't think of
any of this when

00:02:17.980 --> 00:02:20.315
you're just trying to run an
algorithm on one computer.

00:02:22.920 --> 00:02:25.990
So distributed algorithms
can be pretty complicated.

00:02:25.990 --> 00:02:28.210
It's not easy to design them.

00:02:28.210 --> 00:02:30.654
And after you design
them, you still

00:02:30.654 --> 00:02:32.070
have to make sure
they're correct.

00:02:32.070 --> 00:02:34.290
So there are issues involved
in proving them correct

00:02:34.290 --> 00:02:35.730
and analyzing them.

00:02:35.730 --> 00:02:37.980
A little bit of
history, the field

00:02:37.980 --> 00:02:42.000
pretty much started
around the late '60s.

00:02:42.000 --> 00:02:46.330
Edsger Dijkstra was one of the
earliest leaders in the field.

00:02:46.330 --> 00:02:49.850
He won one of the first
Turing Awards.

00:02:49.850 --> 00:02:52.780
Leslie Lamport won the
Turing Award last year.

00:02:52.780 --> 00:02:55.590
Although he actually
started as a very young guy,

00:02:55.590 --> 00:02:59.470
way back in the early
days of the field.

00:02:59.470 --> 00:03:01.770
If you want to look at some
sources, I have a book.

00:03:01.770 --> 00:03:04.390
There's another textbook
by Attiya and Welch.

00:03:04.390 --> 00:03:06.710
There's a new series of
monographs that basically

00:03:06.710 --> 00:03:10.190
try to summarize many of the
important research topics

00:03:10.190 --> 00:03:12.250
in distributed computing theory.

00:03:12.250 --> 00:03:16.170
And the last two lines have a
couple of the main conferences

00:03:16.170 --> 00:03:18.620
in the field.

00:03:18.620 --> 00:03:21.750
OK so I can't do that
much in one week.

00:03:21.750 --> 00:03:24.610
What I'll do is just
introduce the area,

00:03:24.610 --> 00:03:29.140
by showing you two common
models for distributed networks.

00:03:29.140 --> 00:03:32.686
And just introduce a very
few fundamental algorithms,

00:03:32.686 --> 00:03:35.060
and you'll see along the way
some techniques for modeling

00:03:35.060 --> 00:03:37.030
and analyzing them.

00:03:37.030 --> 00:03:39.740
OK the two models here are
synchronous distributed

00:03:39.740 --> 00:03:44.820
networks, and asynchronous
distributed networks.

00:03:44.820 --> 00:03:47.050
The problems I'll look at
in the synchronous setting

00:03:47.050 --> 00:03:50.860
are a simple problem of leader
election, which is a symmetry

00:03:50.860 --> 00:03:53.400
breaking problem, basically.

00:03:53.400 --> 00:03:58.227
Maximal independent set
problem, and then a couple

00:03:58.227 --> 00:04:00.060
of problems that should
look familiar to you

00:04:00.060 --> 00:04:04.530
from the settings of this
class, establishing structures

00:04:04.530 --> 00:04:08.360
like breadth-first spanning
trees and shortest paths trees.

00:04:08.360 --> 00:04:10.800
In the asynchronous case
I'll revisit these last two

00:04:10.800 --> 00:04:13.290
problems, setting up
breadth-first and shortest path

00:04:13.290 --> 00:04:15.100
trees.

00:04:15.100 --> 00:04:17.620
OK so I mentioned
something about modeling

00:04:17.620 --> 00:04:19.430
and proofs and analysis.

00:04:19.430 --> 00:04:23.030
Turns out, getting the
formal models right

00:04:23.030 --> 00:04:25.730
and getting real
proofs tends to be

00:04:25.730 --> 00:04:27.830
pretty important for
distributed algorithms

00:04:27.830 --> 00:04:31.180
because with all the stuff
going on, they're complicated.

00:04:31.180 --> 00:04:34.030
And it's easy to make mistakes.

00:04:34.030 --> 00:04:38.110
The kinds of models that we use
are interacting state machines,

00:04:38.110 --> 00:04:39.320
inputs and outputs.

00:04:39.320 --> 00:04:41.640
They send each other messages.

00:04:41.640 --> 00:04:44.050
But the kinds of
proofs you do typically

00:04:44.050 --> 00:04:46.210
use invariants, a
technique that you're very

00:04:46.210 --> 00:04:47.680
familiar with from this class.

00:04:47.680 --> 00:04:50.670
You can still use them
in a distributed setting.

00:04:50.670 --> 00:04:53.910
And you still prove them
the same way, by induction.

00:04:53.910 --> 00:04:56.640
Something else that comes up a
lot in the distributed setting

00:04:56.640 --> 00:05:00.690
is modeling and proofs
using levels of abstraction.

00:05:00.690 --> 00:05:02.810
You might want to give
an abstract description

00:05:02.810 --> 00:05:04.860
of an algorithm and
prove that that works.

00:05:04.860 --> 00:05:07.720
And then you have a very
detailed, complicated,

00:05:07.720 --> 00:05:11.480
lower level description that you
can prove implements the higher

00:05:11.480 --> 00:05:13.286
level description.

00:05:13.286 --> 00:05:15.460
That's another
popular technique.

00:05:15.460 --> 00:05:17.670
You use different kinds
of complexity measures.

00:05:17.670 --> 00:05:21.510
For time complexity,
you would measure rounds

00:05:21.510 --> 00:05:25.610
if it's the synchronous
model, or some approximation

00:05:25.610 --> 00:05:28.660
to real time, if it's
the asynchronous model.

00:05:28.660 --> 00:05:31.410
You also count communication,
either the number

00:05:31.410 --> 00:05:33.660
of messages you send, or
the total number of bits

00:05:33.660 --> 00:05:35.421
that you send in an algorithm.

00:05:38.000 --> 00:05:40.550
So throughout
these two lectures,

00:05:40.550 --> 00:05:44.300
we'll be looking at
distributed networks.

00:05:44.300 --> 00:05:45.740
So you start with a graph.

00:05:45.740 --> 00:05:49.830
Let's just look at
undirected graphs this week.

00:05:49.830 --> 00:05:52.050
We use n in this
field for what you're

00:05:52.050 --> 00:05:56.490
calling v, the total number
of nodes in the network

00:05:56.490 --> 00:06:01.780
or vertices in the graph.

00:06:01.780 --> 00:06:05.200
We use the notation gamma of
u to mean the neighbors of u

00:06:05.200 --> 00:06:06.910
in the graph.

00:06:06.910 --> 00:06:11.060
So every vertex of the graph has
a set of immediate neighboring

00:06:11.060 --> 00:06:11.810
vertices.

00:06:11.810 --> 00:06:13.730
That's gamma of u.

00:06:13.730 --> 00:06:19.310
And the degree of u is the size
of the neighborhood, the number

00:06:19.310 --> 00:06:22.090
of neighbors of the vertex.

00:06:22.090 --> 00:06:24.050
OK so we start with the graph.

00:06:24.050 --> 00:06:25.740
But now we're
going to plunk down

00:06:25.740 --> 00:06:29.350
a process, some kind
of active entity

00:06:29.350 --> 00:06:31.900
at each vertex of the graph.

00:06:31.900 --> 00:06:33.520
So this is some
kind of automaton.

00:06:33.520 --> 00:06:36.130
If you've taken automata
theory, it's not really

00:06:36.130 --> 00:06:39.960
finite state machines, it's more
like infinite state automata

00:06:39.960 --> 00:06:43.760
that can interact
with each other.

00:06:43.760 --> 00:06:47.820
So we usually talk about
vertices in a graph, processes

00:06:47.820 --> 00:06:49.160
at the vertices of a graph.

00:06:49.160 --> 00:06:51.930
But sometimes we get
sloppy and just say nodes.

00:06:51.930 --> 00:06:54.980
And we could mean either the
vertex or the active thing

00:06:54.980 --> 00:06:56.740
running at the vertex.

00:06:56.740 --> 00:06:59.840
Can't keep them
straight all the time.

00:06:59.840 --> 00:07:02.120
OK and then with the
edges of the graph,

00:07:02.120 --> 00:07:05.120
we would put
communication channels,

00:07:05.120 --> 00:07:08.690
one in each direction,
so that the processes

00:07:08.690 --> 00:07:11.700
can communicate over the edges.

00:07:11.700 --> 00:07:13.840
This week I'm not going
to talk about what

00:07:13.840 --> 00:07:16.750
happens when you introduce
failures because we just

00:07:16.750 --> 00:07:17.610
don't have time.

00:07:17.610 --> 00:07:20.800
A lot of distributed computing
theory deals with what

00:07:20.800 --> 00:07:24.340
happens when some of the
components in your system fail.

00:07:24.340 --> 00:07:27.330
How do you cope with that?

00:07:27.330 --> 00:07:29.880
So we'll start right in
with synchronous distributed

00:07:29.880 --> 00:07:30.380
algorithms.

00:07:32.956 --> 00:07:34.580
A source for that,
if you're interested

00:07:34.580 --> 00:07:38.180
is the first technical
chapter in my book.

00:07:38.180 --> 00:07:41.060
OK so you have processes
at the nodes of a graph,

00:07:41.060 --> 00:07:42.270
like I just said.

00:07:42.270 --> 00:07:45.830
They communicate using messages.

00:07:45.830 --> 00:07:49.460
So think of each process as not
knowing who his neighbors are,

00:07:49.460 --> 00:07:51.930
not knowing anything
about the graph.

00:07:51.930 --> 00:07:53.080
So what do they have?

00:07:53.080 --> 00:07:53.990
They have ports.

00:07:53.990 --> 00:07:57.410
You could say they have output
ports, on which they could send

00:07:57.410 --> 00:08:01.360
a message, and then some
input ports on which messages

00:08:01.360 --> 00:08:02.900
can come in.

00:08:02.900 --> 00:08:06.060
So in general, the
process doesn't know

00:08:06.060 --> 00:08:08.660
who the ports are connected to.

00:08:08.660 --> 00:08:10.470
It just has local
names for the ports,

00:08:10.470 --> 00:08:13.800
like one, two, three,
up to the degree.

00:08:13.800 --> 00:08:17.110
If you have any questions
just stop me and ask,

00:08:17.110 --> 00:08:19.190
if something's not clear.

00:08:19.190 --> 00:08:20.802
Otherwise I'll go pretty fast.

00:08:20.802 --> 00:08:22.510
And I know that none
of this is familiar.

00:08:25.620 --> 00:08:27.470
So in general, the
processes don't have

00:08:27.470 --> 00:08:31.070
to be distinguishable at all.

00:08:31.070 --> 00:08:35.127
So they don't have to have
special unique identifiers

00:08:35.127 --> 00:08:36.710
so you could tell
the processes apart.

00:08:36.710 --> 00:08:38.995
They could be
completely identical.

00:08:38.995 --> 00:08:40.870
Well if they have
different numbers of ports,

00:08:40.870 --> 00:08:43.370
they're not exactly identical.

00:08:43.370 --> 00:08:44.870
They certainly
know how many ports

00:08:44.870 --> 00:08:47.540
they have, and at least the
local names for the ports.

00:08:51.320 --> 00:08:52.817
Good so these are
processes sitting

00:08:52.817 --> 00:08:53.900
at the nodes of the graph.

00:08:53.900 --> 00:08:55.350
What do they do?

00:08:55.350 --> 00:08:56.490
So they execute.

00:08:56.490 --> 00:09:00.900
And we talk about an
execution of this network.

00:09:00.900 --> 00:09:04.310
It goes in synchronous
rounds, and every round,

00:09:04.310 --> 00:09:06.620
every process
looks at its state,

00:09:06.620 --> 00:09:08.820
and decides what
messages it's going

00:09:08.820 --> 00:09:12.460
to send on all of the ports.

00:09:12.460 --> 00:09:15.425
So it could send different
messages on different ports.

00:09:18.110 --> 00:09:20.200
So then what happens
is all the messages

00:09:20.200 --> 00:09:23.540
that the processes decide to
send get put onto the channels

00:09:23.540 --> 00:09:26.810
and they get delivered to
the process at the other end.

00:09:26.810 --> 00:09:30.520
So the process at the
other end is in some state.

00:09:30.520 --> 00:09:32.090
All these messages come in.

00:09:32.090 --> 00:09:34.840
It updates its state, based
on the arriving messages.

00:09:34.840 --> 00:09:37.750
So it changes state in
response to whatever comes in.
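
NOTE
Editor's sketch, not part of the lecture: one way to picture a synchronous round as just described (choose messages from state, deliver them, update state). All names here (run_round, wiring, msgs_fn, trans_fn) are hypothetical, and the tiny example algorithm is only an assumption for illustration.
```python
# Hypothetical names throughout; an illustration, not the lecture's formal model.
def run_round(states, wiring, msgs_fn, trans_fn):
    """Execute one synchronous round and return the new list of states.
    wiring[(u, out_port)] = (v, in_port) describes one directed channel."""
    # Step 1: every process picks its outgoing messages from its current state.
    outgoing = [msgs_fn(state) for state in states]  # each entry: out_port to message
    # Step 2: all messages are delivered to the process at the other end.
    inboxes = [{} for _ in states]
    for (u, out_port), (v, in_port) in wiring.items():
        if out_port in outgoing[u]:
            inboxes[v][in_port] = outgoing[u][out_port]
    # Step 3: every process updates its state from the arriving messages.
    return [trans_fn(state, inbox) for state, inbox in zip(states, inboxes)]
# Example: two nodes joined by one edge (a channel in each direction).
wiring = {(0, 1): (1, 1), (1, 1): (0, 1)}
send_state = lambda state: {1: state}                      # send own state on port 1
keep_max = lambda state, inbox: max([state, *inbox.values()])
print(run_round([3, 7], wiring, send_state, keep_max))     # [7, 7]
```
Local computation is deliberately free here; only rounds and messages would be counted.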

00:09:42.880 --> 00:09:46.780
And this is completely different
from this semester so far.

00:09:46.780 --> 00:09:48.880
We're going to completely
ignore the costs

00:09:48.880 --> 00:09:51.560
of the local computation.

00:09:51.560 --> 00:09:54.642
So each node can compute
some complicated algorithm

00:09:54.642 --> 00:09:56.600
of the sort you've been
studying in this class,

00:09:56.600 --> 00:09:59.310
and we usually don't
consider that cost.

00:09:59.310 --> 00:10:03.610
We're more worried about
the communication costs.

00:10:03.610 --> 00:10:07.690
And so we'll be focusing on the
number of rounds that it takes,

00:10:07.690 --> 00:10:11.170
in the synchronous case, and
the number of communication

00:10:11.170 --> 00:10:14.710
messages or bits.

00:10:14.710 --> 00:10:15.460
OK so far?

00:10:18.650 --> 00:10:20.150
So let's start on
the first problem.

00:10:20.150 --> 00:10:22.250
Here's a graph.

00:10:22.250 --> 00:10:24.460
The nodes start out
possibly identical,

00:10:24.460 --> 00:10:27.300
but you want to somehow
distinguish one of them

00:10:27.300 --> 00:10:30.200
to be a leader.

00:10:30.200 --> 00:10:34.150
So you have this arbitrary,
connected, undirected graph.

00:10:34.150 --> 00:10:36.280
And exactly one
process is supposed

00:10:36.280 --> 00:10:37.650
to elect itself the leader.

00:10:37.650 --> 00:10:40.645
That means it outputs a
special leader signal.

00:10:43.360 --> 00:10:45.554
So exactly one should do that.

00:10:45.554 --> 00:10:46.720
So why do you want a leader?

00:10:46.720 --> 00:10:52.080
Well in practice, leaders
can coordinate things.

00:10:52.080 --> 00:10:54.160
They can take charge
of communication,

00:10:54.160 --> 00:10:56.110
and inform other
nodes when they're

00:10:56.110 --> 00:10:57.380
allowed to send messages.

00:10:57.380 --> 00:10:59.440
They can coordinate
the processing of data.

00:10:59.440 --> 00:11:01.610
Basically it allows
you to centralize

00:11:01.610 --> 00:11:03.190
some of the computation.

00:11:03.190 --> 00:11:05.400
It can schedule the
other processes.

00:11:05.400 --> 00:11:07.680
It can allocate the resources.

00:11:07.680 --> 00:11:10.020
It could help to reach
agreement among the processes,

00:11:10.020 --> 00:11:12.280
if they start out with
different opinions about what

00:11:12.280 --> 00:11:13.196
is supposed to happen.

00:11:15.782 --> 00:11:17.990
All right so let's start
out with a very simple case.

00:11:17.990 --> 00:11:18.740
You have a clique.

00:11:18.740 --> 00:11:22.500
Here's a four clique, where
all the vertices are directly

00:11:22.500 --> 00:11:24.490
connected to all
the other vertices,

00:11:24.490 --> 00:11:27.932
with bidirectional channels.

00:11:27.932 --> 00:11:29.265
And the processes are identical.

00:11:31.962 --> 00:11:33.420
So I should have
asked you, instead

00:11:33.420 --> 00:11:36.790
of just giving the
answer here, but are they

00:11:36.790 --> 00:11:39.550
able to elect a leader?

00:11:39.550 --> 00:11:42.770
So this theorem says that in
general, that's impossible.

00:11:42.770 --> 00:11:46.880
Or it's not possible, in
the most general case.

00:11:46.880 --> 00:11:48.730
If you have, no
matter what n is,

00:11:48.730 --> 00:11:53.100
let's just say we have an
n vertex clique for some n.

00:11:53.100 --> 00:11:57.040
It's not possible to have
any algorithm that you

00:11:57.040 --> 00:12:01.760
can have all the processes
run, if it's deterministic

00:12:01.760 --> 00:12:04.550
and the processes start
out all indistinguishable.

00:12:04.550 --> 00:12:07.940
There's no way
that they can elect

00:12:07.940 --> 00:12:09.430
a single node as a leader.

00:12:09.430 --> 00:12:12.030
So do you have an intuition
for why that might be the case?

00:12:14.982 --> 00:12:16.434
Yeah.

00:12:16.434 --> 00:12:17.934
AUDIENCE: They're
all connected, and

00:12:17.934 --> 00:12:21.378
the cross-problem
communication in one round

00:12:21.378 --> 00:12:23.697
is equal, then to
be equal likely

00:12:23.697 --> 00:12:24.822
to select each one of them.

00:12:24.822 --> 00:12:26.300
It would be--

00:12:26.300 --> 00:12:30.290
NANCY LYNCH: It's deterministic
so there's no likelihood here.

00:12:30.290 --> 00:12:34.410
And nobody is doing
any selecting.

00:12:34.410 --> 00:12:36.640
You're talking as if there's
somebody who's choosing

00:12:36.640 --> 00:12:38.260
a process to do something.

00:12:38.260 --> 00:12:40.870
There isn't anyone in charge.

00:12:40.870 --> 00:12:43.700
So this is a really
different way of thinking.

00:12:43.700 --> 00:12:46.180
AUDIENCE: So every node is
essentially the exact same.

00:12:46.180 --> 00:12:49.370
So if it says, OK, let's
assume I'm going to be leader,

00:12:49.370 --> 00:12:53.400
everyone is going to assume
they're going to be leader.

00:12:53.400 --> 00:12:55.420
NANCY LYNCH: That's exactly
the right intuition.

00:12:55.420 --> 00:12:56.810
They can't
distinguish themselves

00:12:56.810 --> 00:12:59.960
because they're always
going to do the same thing.

00:12:59.960 --> 00:13:01.340
Let's look at a
very simple case.

00:13:01.340 --> 00:13:05.420
Suppose we have just two nodes,
two node clique, two nodes

00:13:05.420 --> 00:13:07.970
connected by channels.

00:13:07.970 --> 00:13:09.470
These are identical.

00:13:09.470 --> 00:13:10.600
They're deterministic.

00:13:10.600 --> 00:13:11.900
What can they do?

00:13:11.900 --> 00:13:14.530
Well you could try to design
algorithms for one of them

00:13:14.530 --> 00:13:16.680
to elect itself as the leader.

00:13:16.680 --> 00:13:18.960
But you can show,
by using induction,

00:13:18.960 --> 00:13:20.530
that the processes
are actually going

00:13:20.530 --> 00:13:25.230
to remain in the same state
as each other forever,

00:13:25.230 --> 00:13:28.260
however many rounds you execute.

00:13:28.260 --> 00:13:30.130
So let's slow down.

00:13:30.130 --> 00:13:32.090
We can work by contradiction.

00:13:32.090 --> 00:13:36.460
Suppose you have an algorithm
that solves this problem.

00:13:36.460 --> 00:13:38.270
Both of the processes,
they're identical.

00:13:38.270 --> 00:13:40.676
They start in the
same start state.

00:13:40.676 --> 00:13:42.300
Let's say there's a
unique start state.

00:13:46.350 --> 00:13:49.750
So we could prove by induction
on the number of rounds

00:13:49.750 --> 00:13:53.250
that after any number
of rounds, say r rounds,

00:13:53.250 --> 00:13:57.770
the processes are still
in identical states.

00:13:57.770 --> 00:13:59.770
So the inductive step
is, all right, they're

00:13:59.770 --> 00:14:01.960
in identical states after
some number of rounds.

00:14:01.960 --> 00:14:03.677
Let's look at the next round.

00:14:03.677 --> 00:14:04.760
They're in the same state.

00:14:04.760 --> 00:14:08.330
So they generate
the same messages.

00:14:08.330 --> 00:14:09.900
So they send each other
the same messages.

00:14:09.900 --> 00:14:12.680
They receive the same message.

00:14:12.680 --> 00:14:15.140
And then they make
the same state change.

00:14:15.140 --> 00:14:17.080
So they stay in the same state.

00:14:19.800 --> 00:14:22.260
And you can tweak
this, and say how this

00:14:22.260 --> 00:14:24.025
works for-- yeah, question?

00:14:24.025 --> 00:14:28.010
AUDIENCE: So in what ways is
the proof a contradiction?

00:14:28.010 --> 00:14:29.260
NANCY LYNCH: I'm not finished.

00:14:29.260 --> 00:14:30.620
You're exactly right.

00:14:30.620 --> 00:14:33.850
We have to finish by using the
requirements of the problem.

00:14:33.850 --> 00:14:38.410
Since the algorithm has to solve
the leader election problem,

00:14:38.410 --> 00:14:41.170
the requirements say that
eventually, one of them

00:14:41.170 --> 00:14:45.460
has to output leader.

00:14:45.460 --> 00:14:46.820
And what happens when he does?

00:14:50.940 --> 00:14:51.440
Anyone?

00:14:51.440 --> 00:14:51.730
Yeah.

00:14:51.730 --> 00:14:54.240
AUDIENCE: You have the other node also
outputting the leader signal.

00:14:54.240 --> 00:14:56.790
NANCY LYNCH: Yeah the other one
would also do the same thing.

00:14:56.790 --> 00:14:59.820
We're saying round by round,
they stay in the same state.

00:14:59.820 --> 00:15:04.360
So as someone said before,
when one guy outputs leader,

00:15:04.360 --> 00:15:08.210
at the same round the other
guy will output leader as well.

00:15:08.210 --> 00:15:10.545
So that's a contradiction
to the problem requirements.

00:15:10.545 --> 00:15:12.170
Notice we didn't
assume anything at all

00:15:12.170 --> 00:15:15.040
about exactly how
the algorithm works.

00:15:15.040 --> 00:15:17.990
We're just saying, however it
works, it can't solve this,

00:15:17.990 --> 00:15:20.110
under the assumptions
that the nodes

00:15:20.110 --> 00:15:21.830
are indistinguishable
and deterministic.
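
NOTE
Editor's sketch, not part of the lecture: a small simulation of the two-node argument. The particular msg_fn and trans_fn are arbitrary, hypothetical choices; the point is only that whatever deterministic rule is plugged in, two identical processes starting in the same state stay in the same state round after round.
```python
def simulate_two_node_clique(start_state, msg_fn, trans_fn, rounds):
    """Run two identical deterministic processes; return the state pair after each round."""
    a = b = start_state
    history = [(a, b)]
    for _ in range(rounds):
        msg_from_a, msg_from_b = msg_fn(a), msg_fn(b)    # same code, same state
        a, b = trans_fn(a, msg_from_b), trans_fn(b, msg_from_a)
        history.append((a, b))
    return history
# An arbitrary (hypothetical) deterministic algorithm; any other choice behaves alike.
msg_fn = lambda state: (state * 31 + 7) % 101
trans_fn = lambda state, received: (state * state + received) % 1009
history = simulate_two_node_clique(0, msg_fn, trans_fn, rounds=50)
print(all(a == b for a, b in history))   # True
```
So if one process ever reaches a state that outputs leader, the other reaches it in the same round.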

00:15:24.780 --> 00:15:26.710
So as you can see,
this will extend if you

00:15:26.710 --> 00:15:30.680
have larger cliques of size n.

00:15:30.680 --> 00:15:33.710
So now the process has
not just one output port,

00:15:33.710 --> 00:15:38.080
it has n minus 1 output ports to
connect to all the other nodes.

00:15:38.080 --> 00:15:41.370
Let's say they're numbered
1 through n minus 1.

00:15:41.370 --> 00:15:45.240
And one of the possibilities,
and one I'll use in this proof

00:15:45.240 --> 00:15:47.980
is that the ports happen to
be numbered consistently.

00:15:47.980 --> 00:15:52.470
So that if you have output
port number k at one node,

00:15:52.470 --> 00:15:57.320
it's connected to input port
number k at the other end.

00:15:57.320 --> 00:16:00.207
So that's one way
things can match up.

00:16:00.207 --> 00:16:01.790
All right if that's
the case, we could

00:16:01.790 --> 00:16:03.230
do the same proof we just did.

00:16:03.230 --> 00:16:06.560
Show by induction that all
the processes in the clique

00:16:06.560 --> 00:16:09.580
remain in the same
state forever.

00:16:09.580 --> 00:16:10.652
So same proof.

00:16:10.652 --> 00:16:12.610
Suppose you have an
algorithm that solves it.

00:16:12.610 --> 00:16:14.620
They all began in
the same state.

00:16:14.620 --> 00:16:17.080
You show by induction that
they all remain in the same state.

00:16:19.690 --> 00:16:21.640
Well so now we slow
down a little bit.

00:16:21.640 --> 00:16:25.920
Each process sends a possibly
different message on each port.

00:16:25.920 --> 00:16:28.540
But everybody sends the
same message on port k

00:16:28.540 --> 00:16:30.980
because they're all
indistinguishable.

00:16:30.980 --> 00:16:33.080
And then because the
way the ports match up,

00:16:33.080 --> 00:16:36.370
everybody receives the
same message on port k.

00:16:36.370 --> 00:16:38.120
And then they make the
same state changes.

00:16:41.030 --> 00:16:43.456
AUDIENCE: Does this
proof imply that there's

00:16:43.456 --> 00:16:46.442
a kernel for simplifying the
graph when you find a clique?

00:16:50.240 --> 00:16:53.250
NANCY LYNCH: No because
if you have a graph that

00:16:53.250 --> 00:16:55.060
consists of a clique
and then let's say,

00:16:55.060 --> 00:16:57.330
some other stuff,
maybe the leader

00:16:57.330 --> 00:16:59.770
could be somebody
outside the clique.

00:16:59.770 --> 00:17:01.920
So you can't just
say because there's

00:17:01.920 --> 00:17:04.619
a clique that you can't elect
a leader because you could

00:17:04.619 --> 00:17:08.091
break the symmetry of the graph
with other stuff in the graph.

00:17:08.091 --> 00:17:09.055
Yeah?

00:17:09.055 --> 00:17:11.947
AUDIENCE: What assumptions do
we make to know that for each k,

00:17:11.947 --> 00:17:14.035
they receive the same message?

00:17:14.035 --> 00:17:15.410
NANCY LYNCH:
Because everybody is

00:17:15.410 --> 00:17:18.109
going to send the same message
on the same numbered port,

00:17:18.109 --> 00:17:19.192
because they're identical.

00:17:22.079 --> 00:17:23.933
And one way the ports
can be hooked up,

00:17:23.933 --> 00:17:26.349
and we have to tolerate all
ways they could be hooked up--

00:17:26.349 --> 00:17:28.800
say an adversary
hooks them up-- is

00:17:28.800 --> 00:17:32.430
that port k,
somebody's output port,

00:17:32.430 --> 00:17:36.890
is the other end's
input port numbered k.

00:17:36.890 --> 00:17:38.710
So then they all
receive the same message

00:17:38.710 --> 00:17:41.374
on their port number k.

00:17:41.374 --> 00:17:41.874
Yeah?

00:17:41.874 --> 00:17:43.317
AUDIENCE: Is it actually
possible to always hook up

00:17:43.317 --> 00:17:44.150
the ports that way?

00:17:44.150 --> 00:17:48.480
I mean, it's like wrapped
with three vertices.

00:17:48.480 --> 00:17:51.310
NANCY LYNCH: Well I'm
just doing it for cliques.

00:17:51.310 --> 00:17:53.390
Yeah it is.

00:17:53.390 --> 00:17:54.610
Yeah you could do it.

00:17:54.610 --> 00:17:57.470
I mean you could have port
one always going clockwise,

00:17:57.470 --> 00:18:00.780
and port two going
counterclockwise,

00:18:00.780 --> 00:18:03.560
I mean, there's always a
way to do that in a clique.

00:18:03.560 --> 00:18:06.330
I checked that.

00:18:06.330 --> 00:18:09.240
So what you've just seen is
one of the very basic problems

00:18:09.240 --> 00:18:11.780
for distributed algorithms,
which is breaking symmetry

00:18:11.780 --> 00:18:13.680
among identical processes.

00:18:13.680 --> 00:18:17.850
And you see that deterministic,
indistinguishable processes

00:18:17.850 --> 00:18:19.140
just can't do it.

00:18:19.140 --> 00:18:21.610
So we have to have
something more.

00:18:21.610 --> 00:18:23.100
So what do you
think we could add

00:18:23.100 --> 00:18:24.545
to make this problem solvable?

00:18:27.260 --> 00:18:28.680
AUDIENCE: [INAUDIBLE] processes.

00:18:28.680 --> 00:18:30.174
NANCY LYNCH: I can't hear.

00:18:30.174 --> 00:18:31.090
AUDIENCE: Probability.

00:18:31.090 --> 00:18:33.135
NANCY LYNCH: Probability, OK, anything else?

00:18:36.210 --> 00:18:39.320
So we could have the processes
actually distinguishable.

00:18:39.320 --> 00:18:42.710
The common way in this area is
to say that each process has

00:18:42.710 --> 00:18:43.720
an identifier.

00:18:43.720 --> 00:18:47.690
Like, you buy a chip and it's
got some identifier burned in.

00:18:47.690 --> 00:18:50.160
OK so you have some kind
of unique identifiers.

00:18:50.160 --> 00:18:53.520
Or you can use randomness.

00:18:53.520 --> 00:18:57.230
OK for unique
identifiers, you assume

00:18:57.230 --> 00:19:00.430
everybody has some
number or some identifier

00:19:00.430 --> 00:19:01.890
that it knows what it is.

00:19:01.890 --> 00:19:07.050
It's built into its state, let's
say, a special state variable.

00:19:07.050 --> 00:19:08.740
They're totally
ordered, generally.

00:19:08.740 --> 00:19:15.170
They could be integers, or
from some totally ordered set.

00:19:15.170 --> 00:19:16.760
When you say unique
identifiers,

00:19:16.760 --> 00:19:20.810
it means that different
identifiers could

00:19:20.810 --> 00:19:23.360
appear any place in the graph.

00:19:23.360 --> 00:19:27.430
But each identifier can
appear at most once.

00:19:27.430 --> 00:19:29.870
You can have a huge identifier
space in a small graph.

00:19:29.870 --> 00:19:32.700
But you're just selecting
some identifiers

00:19:32.700 --> 00:19:36.790
to put in the
processes in the graph.

00:19:36.790 --> 00:19:37.720
So that's one set up.

00:19:37.720 --> 00:19:41.880
And the other one, of
course, is using randomness.

00:19:41.880 --> 00:19:44.930
So let's look at the
unique identifiers first.

00:19:44.930 --> 00:19:46.270
Now the problem becomes easy.

00:19:46.270 --> 00:19:48.330
Let's look at the clique again.

00:19:48.330 --> 00:19:51.970
Suppose there's an
algorithm-- well, let's

00:19:51.970 --> 00:19:53.920
construct an algorithm
that consists

00:19:53.920 --> 00:19:58.760
of deterministic processes
with unique identifiers.

00:19:58.760 --> 00:20:02.250
And we're going to guarantee
to elect a leader in the graph.

00:20:02.250 --> 00:20:03.990
And moreover, it's
just going to take

00:20:03.990 --> 00:20:06.180
one round of communication.

00:20:06.180 --> 00:20:10.210
And it's only going to
use n squared messages.

00:20:10.210 --> 00:20:11.190
How could that work?

00:20:17.160 --> 00:20:20.340
Everybody in this click
has a unique identifier.

00:20:20.340 --> 00:20:22.540
What would they do?

00:20:22.540 --> 00:20:23.400
Send it out, right?

00:20:23.400 --> 00:20:25.860
So you can just send
it on all your ports.

00:20:25.860 --> 00:20:28.250
Everybody would send its
unique identifier on all

00:20:28.250 --> 00:20:29.740
its output ports.

00:20:29.740 --> 00:20:33.540
And then they collect the unique
identifiers from everyone else.

00:20:33.540 --> 00:20:37.360
So everybody sees the
same set of identifiers.

00:20:37.360 --> 00:20:40.870
And so the process with the
maximum unique identifier

00:20:40.870 --> 00:20:43.409
knows that it's the only
one with that identifier.

00:20:43.409 --> 00:20:44.450
And it's the biggest one.

00:20:44.450 --> 00:20:46.120
So it can elect
itself the leader.
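
NOTE
Editor's sketch, not part of the lecture: the one-round, maximum-identifier election on a clique, written centrally for readability. The function name and the sample UIDs are hypothetical. The exact message count is n * (n - 1), which is the "n squared" order of magnitude mentioned in the lecture.
```python
def elect_leader_in_clique(uids):
    """One synchronous round on a clique: everyone sends its UID on all ports,
    then the process holding the maximum UID elects itself.
    Returns (list of leader flags, number of messages sent)."""
    n = len(uids)
    # Each process receives the UIDs of the other n - 1 processes.
    received = [[uids[j] for j in range(n) if j != i] for i in range(n)]
    messages = sum(len(inbox) for inbox in received)   # n * (n - 1)
    # Purely local decision: am I larger than everything I heard?
    is_leader = [all(uids[i] > other for other in received[i]) for i in range(n)]
    return is_leader, messages
flags, messages = elect_leader_in_clique([17, 42, 5, 23])
print(flags, messages)   # [False, True, False, False] 12
```
Each process decides using only its own UID and what arrived on its ports; nobody is "in charge".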

00:20:49.250 --> 00:20:51.790
So all you need is unique
identifiers and the ability

00:20:51.790 --> 00:20:54.070
to exchange them reliably.

00:20:54.070 --> 00:20:55.930
And you can elect
somebody easily.

00:20:58.810 --> 00:21:03.050
Randomness, well,
various ways to do it.

00:21:03.050 --> 00:21:07.270
But one idea is the processes
could just choose identifiers

00:21:07.270 --> 00:21:08.700
randomly.

00:21:08.700 --> 00:21:13.420
You take a sufficiently large
set of possible identifiers,

00:21:13.420 --> 00:21:16.540
and so if they just choose
uniformly at random,

00:21:16.540 --> 00:21:19.640
they're likely to choose
all different identifiers.

00:21:19.640 --> 00:21:22.590
Once you have these
randomly chosen identifiers

00:21:22.590 --> 00:21:26.770
you could use them like the
really unique identifiers.

00:21:26.770 --> 00:21:29.700
The only thing is you might,
there's a small chance

00:21:29.700 --> 00:21:31.170
that you'll have a duplicate.

00:21:31.170 --> 00:21:34.520
In which case, you want to be
able to detect that and repeat

00:21:34.520 --> 00:21:36.100
this.

00:21:36.100 --> 00:21:40.112
So first of all, how big
a set do you need?

00:21:40.112 --> 00:21:41.070
Well here's an example.

00:21:43.950 --> 00:21:46.410
Suppose that you have
the n processes choosing

00:21:46.410 --> 00:21:51.390
at random, independently
from a space of size r.

00:21:51.390 --> 00:21:57.030
Identifiers are the
numbers one through r.

00:21:57.030 --> 00:22:01.290
OK and r is going
to depend on n.

00:22:01.290 --> 00:22:03.230
It's going to be like n
squared, but it's also

00:22:03.230 --> 00:22:06.230
going to depend on epsilon,
which is the error probability

00:22:06.230 --> 00:22:08.270
that you're interested in.

00:22:08.270 --> 00:22:11.710
Turns out that n squared over
2 epsilon is good enough.

00:22:11.710 --> 00:22:15.300
OK so you have your IDs
space at least that large.

00:22:15.300 --> 00:22:18.820
And then you can guarantee that
with probability at least 1

00:22:18.820 --> 00:22:22.130
minus epsilon, all the
numbers that everybody chooses

00:22:22.130 --> 00:22:24.342
are different.

00:22:24.342 --> 00:22:25.300
It's a very easy proof.

00:22:25.300 --> 00:22:27.980
The probability-- just look
at two particular processes--

00:22:27.980 --> 00:22:31.050
what's the probability that
they choose the same number?

00:22:31.050 --> 00:22:32.594
It's just 1 over r, right.

00:22:32.594 --> 00:22:34.260
Because they're both
choosing at random.

00:22:34.260 --> 00:22:35.690
The first one chooses something.

00:22:35.690 --> 00:22:37.470
The probability
that the second one

00:22:37.470 --> 00:22:41.080
chooses the same thing
is just 1 over r.

00:22:41.080 --> 00:22:42.600
But now you can
take a union bound,

00:22:42.600 --> 00:22:49.020
just add up the probabilities
of any pair having a duplicate.

00:22:49.020 --> 00:22:52.520
And so you have
around n squared over 2 pairs.

00:22:52.520 --> 00:22:57.500
And so multiplying 1 over
r by n squared over 2

00:22:57.500 --> 00:23:00.590
still keeps your probability
less than or equal to epsilon,

00:23:00.590 --> 00:23:02.820
your error probability.
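
Written out, the union bound argument above looks like this, for n processes choosing independently and uniformly from the numbers 1 through r. The symbols are the ones used in the lecture; only the typesetting is added here.

```latex
\Pr[\text{some two processes choose the same ID}]
  \;\le\; \binom{n}{2} \cdot \frac{1}{r}
  \;\le\; \frac{n^2}{2r}
  \;\le\; \epsilon
  \qquad \text{whenever } r \ge \frac{n^2}{2\epsilon}.
```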

00:23:02.820 --> 00:23:08.740
So you can choose
identifiers using randomness.

00:23:08.740 --> 00:23:11.640
With large enough space,
with very high probability,

00:23:11.640 --> 00:23:15.910
you can get them to
be all different.

00:23:15.910 --> 00:23:17.795
And now here's how
the algorithm works.

00:23:20.460 --> 00:23:24.630
So you get an algorithm that
would finish in only one round,

00:23:24.630 --> 00:23:26.980
with probability
1 minus epsilon.

00:23:26.980 --> 00:23:28.300
But it will be correct.

00:23:28.300 --> 00:23:30.640
And it will have
repeated rounds,

00:23:30.640 --> 00:23:32.840
in case the first
round doesn't work.

00:23:32.840 --> 00:23:35.900
But the expected
time is just 1 over 1

00:23:35.900 --> 00:23:39.130
minus epsilon, not very big.
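
The expected-time claim is the usual geometric-trials calculation. As a sketch, assume each attempt succeeds independently with the same probability p, which is at least 1 minus epsilon (the symbol p and the random variable T for the number of attempts are notation introduced here):

```latex
\mathbb{E}[T] \;=\; \sum_{k \ge 1} k \,(1-p)^{k-1}\, p \;=\; \frac{1}{p} \;\le\; \frac{1}{1-\epsilon}.
```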

00:23:39.130 --> 00:23:40.380
What's the algorithm?

00:23:40.380 --> 00:23:43.880
Well processes just choose the
random IDs from the big space,

00:23:43.880 --> 00:23:45.200
like we just said.

00:23:45.200 --> 00:23:47.770
They exchange their IDs.

00:23:47.770 --> 00:23:50.030
And now, everybody
can see everyone's ID,

00:23:50.030 --> 00:23:52.750
but they also can tell
if there's a duplicate,

00:23:52.750 --> 00:23:55.030
in particular if the maximum is not unique.

00:23:55.030 --> 00:23:57.680
So if the maximum is unique,
fine, the maximum wins.

00:23:57.680 --> 00:23:59.500
And everyone knows that.

00:23:59.500 --> 00:24:01.190
Otherwise you have a problem.

00:24:01.190 --> 00:24:02.190
And you have to repeat.

00:24:02.190 --> 00:24:06.200
And you just keep doing
that until you succeed.

00:24:06.200 --> 00:24:08.650
So this can just
continue, but it's

00:24:08.650 --> 00:24:11.860
likely to finish very fast,
if you have a high likelihood

00:24:11.860 --> 00:24:13.560
of having no duplicates.
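
A minimal sketch of this retry loop, under assumptions: the function name `randomized_leader_election` and the passed-in `rng` are hypothetical, the clique is modeled by letting every process see the full list of IDs, and the repeat condition used is "the maximum is not unique", as stated above.

```python
import math
import random


def randomized_leader_election(n, epsilon, rng):
    """Return (leader_index, rounds_used) for n anonymous processes in a clique."""
    r = math.ceil(n * n / (2 * epsilon))  # ID space size from the union bound
    rounds = 0
    while True:
        rounds += 1
        # Each process independently picks an ID uniformly from 1..r.
        ids = [rng.randint(1, r) for _ in range(n)]
        top = max(ids)
        # Every process sees all IDs, so all agree on whether the maximum is unique.
        if ids.count(top) == 1:
            return ids.index(top), rounds


leader, rounds = randomized_leader_election(n=10, epsilon=0.1, rng=random.Random(0))
print(leader, rounds)
```

With n = 10 and epsilon = 0.1 the ID space has 500 values, so most runs should finish in the first round.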

00:24:17.310 --> 00:24:20.440
Questions about the
leader election?

00:24:20.440 --> 00:24:23.910
So the story was, it's
impossible without something

00:24:23.910 --> 00:24:27.640
to help you distinguish
some processes.

00:24:27.640 --> 00:24:29.286
You can do it with
unique identifiers.

00:24:29.286 --> 00:24:30.410
You can do it with randomness.

00:24:36.680 --> 00:24:42.240
Second problem is called
maximal independent set.

00:24:42.240 --> 00:24:44.820
So you have a picture of
a maximal independent set

00:24:44.820 --> 00:24:47.020
in a graph here.

00:24:47.020 --> 00:24:49.790
Let's try this.

00:24:49.790 --> 00:24:51.120
Yeah cursor.

00:24:51.120 --> 00:24:53.670
So the maximal independent
set in the graph is here.

00:24:53.670 --> 00:24:57.300
But this is something I'll
come back to in a minute.

00:24:57.300 --> 00:25:00.010
This is actually a use of
the maximal independent set

00:25:00.010 --> 00:25:02.600
to model what happens
in a certain kind

00:25:02.600 --> 00:25:06.140
of biological system.

00:25:06.140 --> 00:25:07.660
What's a maximal
independent set?

00:25:07.660 --> 00:25:13.750
So you start with a general,
undirected graph network.

00:25:13.750 --> 00:25:18.280
And the problem is to choose
a subset of the nodes so that

00:25:18.280 --> 00:25:21.000
they form what we call
a maximal independent set.

00:25:21.000 --> 00:25:22.180
Let's break that down.

00:25:22.180 --> 00:25:23.430
What does this mean?

00:25:23.430 --> 00:25:26.810
Independent means you don't
have any two neighbors that

00:25:26.810 --> 00:25:30.310
are both in the set.

00:25:30.310 --> 00:25:32.960
So you don't want to get
two neighbors in the set.

00:25:32.960 --> 00:25:37.510
Maximal means that
whatever set you choose,

00:25:37.510 --> 00:25:42.480
you can't add any more nodes
without violating independence.

00:25:42.480 --> 00:25:44.010
So now this should
look something

00:25:44.010 --> 00:25:45.860
like a couple of
homework problems

00:25:45.860 --> 00:25:48.800
that you had from the
beginning and recently.

00:25:48.800 --> 00:25:52.180
But I'm not saying that it's
maximum independent set.

00:25:52.180 --> 00:25:54.420
I'm not saying you have to
have the global, largest

00:25:54.420 --> 00:25:55.970
number of nodes.

00:25:55.970 --> 00:25:58.960
I'm just saying it has
to be a local optimum,

00:25:58.960 --> 00:26:01.850
in the sense that you can't
add any more nodes to your set

00:26:01.850 --> 00:26:05.180
without violating the
independence property.

00:26:05.180 --> 00:26:06.820
Make sense?

00:26:06.820 --> 00:26:09.560
There's two examples,
the same graph,

00:26:09.560 --> 00:26:12.910
two different maximal
independent sets.

00:26:12.910 --> 00:26:18.350
The green nodes, here
we have four green nodes

00:26:18.350 --> 00:26:22.135
that are independent, not
neighbors of each other.

00:26:22.135 --> 00:26:23.760
And they're maximal,
in that I couldn't

00:26:23.760 --> 00:26:26.540
add any of the red
nodes into a set

00:26:26.540 --> 00:26:31.150
without violating the
independence property.

00:26:31.150 --> 00:26:34.080
But then over here, we have a
second maximal independent set

00:26:34.080 --> 00:26:35.850
for the same graph.

00:26:35.850 --> 00:26:39.160
Now we just have two nodes.

00:26:39.160 --> 00:26:41.760
And you can't add
any of the red nodes

00:26:41.760 --> 00:26:44.960
without violating the
independence property.

00:26:44.960 --> 00:26:48.550
In other words, every
node is either in the MIS,

00:26:48.550 --> 00:26:51.810
or has a neighbor in the MIS.

00:26:51.810 --> 00:26:56.620
There's nothing else you can
do to add nodes to the MIS.

00:26:56.620 --> 00:27:00.175
So the notion of maximal
independence, does that make sense?
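
The two conditions can be checked mechanically. The sketch below is an illustration only: the function names and the five-node path graph are hypothetical, and a graph is assumed to be a dict mapping each node to the set of its neighbors.

```python
def is_independent(graph, chosen):
    """True if no two chosen nodes are neighbors."""
    return all(v not in chosen for u in chosen for v in graph[u])


def is_maximal_independent_set(graph, chosen):
    """Independent, and every node is in the set or has a neighbor in it."""
    covered = all(u in chosen or any(v in chosen for v in graph[u]) for u in graph)
    return is_independent(graph, chosen) and covered


# Hypothetical example: a path a - b - c - d - e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(is_maximal_independent_set(path, {"a", "c", "e"}))  # three nodes
print(is_maximal_independent_set(path, {"b", "d"}))       # two nodes, still maximal
print(is_maximal_independent_set(path, {"a", "e"}))       # c could still be added
```

As in the two pictures of the same graph, the first two sets have different sizes but both are maximal; the third is independent but not maximal.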

00:27:04.120 --> 00:27:08.430
All right, so to make this
a distributed problem,

00:27:08.430 --> 00:27:11.490
let's start out assuming we
have no unique identifier.

00:27:11.490 --> 00:27:12.869
Actually, for this
whole problem,

00:27:12.869 --> 00:27:14.660
we're not going to have
unique identifiers.

00:27:14.660 --> 00:27:17.580
They're all going
to be identical.

00:27:17.580 --> 00:27:19.990
The processes do need
one piece of information,

00:27:19.990 --> 00:27:24.010
which is some approximation
to n, the size of the network,

00:27:24.010 --> 00:27:27.160
the total number of vertices.

00:27:27.160 --> 00:27:29.990
So we would like to
have these nodes somehow

00:27:29.990 --> 00:27:35.860
cooperate to compute an MIS
of the entire network graph.

00:27:35.860 --> 00:27:39.570
What that means is every process
should find out whether it

00:27:39.570 --> 00:27:41.380
is in the MIS or not.

00:27:41.380 --> 00:27:43.780
If it is, it should output in.

00:27:43.780 --> 00:27:46.060
And if it's not,
it'll just output out.

00:27:49.110 --> 00:27:51.570
So you don't have to
actually compute this,

00:27:51.570 --> 00:27:53.150
like you're used
to solving problems

00:27:53.150 --> 00:27:55.950
like this, where somebody
has to gather all

00:27:55.950 --> 00:27:57.990
the information in one place.

00:27:57.990 --> 00:27:59.280
Nobody gathers anything.

00:27:59.280 --> 00:28:01.360
Everybody just has to
know whether or not

00:28:01.360 --> 00:28:02.228
they're in the MIS.

00:28:05.880 --> 00:28:07.760
So as you can
imagine, this is going

00:28:07.760 --> 00:28:10.000
to be unsolvable
in certain graphs

00:28:10.000 --> 00:28:14.870
by deterministic algorithms,
by the same kind of symmetry

00:28:14.870 --> 00:28:19.810
breaking problems that you
saw for leader election.

00:28:19.810 --> 00:28:22.320
So we're going to move right
to randomized algorithms

00:28:22.320 --> 00:28:25.400
for this problem.

00:28:25.400 --> 00:28:28.180
Some applications
of distributed MIS,

00:28:28.180 --> 00:28:30.230
well they come up in
communication networks,

00:28:30.230 --> 00:28:32.860
where you want to choose--
let's say you have a very

00:28:32.860 --> 00:28:35.040
dense network of processes.

00:28:35.040 --> 00:28:37.830
You want to choose just
a few nodes, which would

00:28:37.830 --> 00:28:39.770
be like an overlay network.

00:28:39.770 --> 00:28:41.850
You would choose some
nodes who could take charge

00:28:41.850 --> 00:28:44.710
of communication that you can
communicate on this overlay

00:28:44.710 --> 00:28:46.910
network, and then in
the end, each node

00:28:46.910 --> 00:28:50.960
can take care of communicating
with its many neighbors.

00:28:50.960 --> 00:28:54.250
So that's a common
sort of application.

00:28:54.250 --> 00:28:56.670
But it also comes
up in other places.

00:28:56.670 --> 00:28:59.300
A great example is in
developmental biology, where

00:28:59.300 --> 00:29:04.170
a couple of years ago, there
was a paper in Science by Afek,

00:29:04.170 --> 00:29:05.980
Alon-- there's like
eight authors on that.

00:29:05.980 --> 00:29:11.380
But Ziv Bar-Joseph was the
lead author of this paper.

00:29:11.380 --> 00:29:15.730
So the idea is you have a
bunch of cells in a fruit fly.

00:29:15.730 --> 00:29:18.830
And during development,
some of those cells

00:29:18.830 --> 00:29:21.300
are supposed to
distinguish themselves

00:29:21.300 --> 00:29:24.730
as being what's called
sensory organ precursor cells.

00:29:24.730 --> 00:29:28.880
The property that you
want is that, actually, you

00:29:28.880 --> 00:29:31.940
would like a maximal independent
set of the cells to become

00:29:31.940 --> 00:29:34.370
distinguished in this way.

00:29:34.370 --> 00:29:36.800
So they wrote a paper about
it, got published in Science.

00:29:36.800 --> 00:29:39.790
They basically designed a
new distributed algorithm

00:29:39.790 --> 00:29:43.709
that closely mirrored what
happened in the fruit fly

00:29:43.709 --> 00:29:44.500
during development.

00:29:48.420 --> 00:29:52.240
So what I'm going to show you
is a very well-known algorithm,

00:29:52.240 --> 00:29:55.780
a classical algorithm for MIS.

00:29:55.780 --> 00:29:58.690
This is by Michael Luby.

00:29:58.690 --> 00:30:02.070
Very simple algorithm,
it executes in phases.

00:30:02.070 --> 00:30:05.430
Each phase has two rounds.

00:30:05.430 --> 00:30:07.690
So you start out with all
the nodes being active.

00:30:07.690 --> 00:30:08.810
They're all involved.

00:30:08.810 --> 00:30:12.060
They don't know what they're
going to end up with.

00:30:12.060 --> 00:30:15.410
And at each phase, some
of the active nodes

00:30:15.410 --> 00:30:18.580
are going to decide
they're in the MIS.

00:30:18.580 --> 00:30:21.970
Some others will decide
they're out of the MIS.

00:30:21.970 --> 00:30:24.400
And some others won't know yet.

00:30:24.400 --> 00:30:27.230
So then you just continue
to the next phase,

00:30:27.230 --> 00:30:30.880
with all the remaining nodes
and the edges between them.

00:30:30.880 --> 00:30:32.670
So you're basically
going to settle

00:30:32.670 --> 00:30:35.150
what happens with some
subset of the nodes,

00:30:35.150 --> 00:30:36.945
and then reduce the
graph and continue.

00:30:39.870 --> 00:30:40.870
So that's the algorithm.

00:30:40.870 --> 00:30:43.000
So what do you do in each phase?

00:30:43.000 --> 00:30:46.115
Here's what an active
node does at a phase.

00:30:46.115 --> 00:30:47.810
Two rounds.

00:30:47.810 --> 00:30:50.930
The first round, it
picks a random value

00:30:50.930 --> 00:30:54.640
in a large space, the same
kind of idea as before.

00:30:54.640 --> 00:30:57.680
This time it's 1 up
to n to the fifth.

00:30:57.680 --> 00:31:01.790
It sends that random value
to all its neighbors,

00:31:01.790 --> 00:31:06.360
receives the values from all
its still active neighbors,

00:31:06.360 --> 00:31:11.310
and then it just looks to see
if its value is greater than all

00:31:11.310 --> 00:31:13.190
the values it received.

00:31:13.190 --> 00:31:14.450
So then it's a local maximum.

00:31:14.450 --> 00:31:16.830
It has chosen a value
that's strictly greater

00:31:16.830 --> 00:31:19.640
than the values chosen
by all its neighbors.

00:31:19.640 --> 00:31:24.040
So then it decides to join
the MIS and it outputs in.

00:31:24.040 --> 00:31:26.372
But now you want to make
sure none of its neighbors--

00:31:26.372 --> 00:31:27.830
you know that none
of its neighbors

00:31:27.830 --> 00:31:31.200
are going to join
the MIS at round one.

00:31:31.200 --> 00:31:34.040
Because you know this
guy's chosen value

00:31:34.040 --> 00:31:36.930
is larger, strictly larger,
than all its neighbors.

00:31:36.930 --> 00:31:39.450
But now you want to tell them
that they should not join.

00:31:39.450 --> 00:31:40.650
They should be out.

00:31:40.650 --> 00:31:49.080
So if you join the
MIS you're going

00:31:49.080 --> 00:31:54.740
to announce that by sending
messages to all your neighbors.

00:31:54.740 --> 00:32:02.510
And then anybody who
receives an announcement can

00:32:02.510 --> 00:32:05.470
decide it's not going to be
in the MIS and it outputs out.

00:32:05.470 --> 00:32:10.260
Because it knows it has a
neighbor that's in the MIS.

00:32:10.260 --> 00:32:15.050
So if you decided in or out
at this phase, you're done.

00:32:15.050 --> 00:32:16.420
You become inactive.

00:32:16.420 --> 00:32:18.190
And only the
remaining active guys

00:32:18.190 --> 00:32:20.570
continue to the next phase.

00:32:20.570 --> 00:32:21.200
Make sense?

00:32:24.240 --> 00:32:26.220
Any questions about how
the algorithm works?
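
The phase structure just described can be sketched as a centralized simulation. This is a sketch under assumptions, not the lecture's own code: the name `luby_mis`, the dict-of-neighbor-sets graph format, and the 6-cycle example are hypothetical, and the two rounds of each phase are modeled as two steps inside one loop iteration.

```python
import random


def luby_mis(graph, rng):
    """graph maps each node to the set of its neighbors; returns (mis, phases)."""
    n = len(graph)
    active = set(graph)
    mis = set()
    phases = 0
    while active:
        phases += 1
        # Round 1: each active node draws a value in 1..n^5 and sends it to neighbors.
        value = {u: rng.randint(1, n ** 5) for u in active}
        # A node joins if its value is strictly greater than all active neighbors'.
        joiners = {
            u for u in active
            if all(value[u] > value[v] for v in graph[u] if v in active)
        }
        # Round 2: joiners announce; active neighbors of a joiner output "out".
        out = {v for u in joiners for v in graph[u] if v in active}
        mis |= joiners
        active -= joiners | out
    return mis, phases


# Hypothetical example graph: a 6-cycle.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
mis, phases = luby_mis(cycle, random.Random(1))
print(sorted(mis), phases)
```

A node with no active neighbors passes the "greater than all neighbors" test vacuously, so it joins, and fresh values are drawn in every phase.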

00:32:32.480 --> 00:32:34.020
An animation.

00:32:34.020 --> 00:32:37.770
All right so all the
nodes start out identical.

00:32:37.770 --> 00:32:39.347
They all pick IDs.

00:32:39.347 --> 00:32:40.930
So here's some numbers
that they pick.

00:32:40.930 --> 00:32:45.410
So which nodes are going
to now join the MIS?

00:32:45.410 --> 00:32:50.480
16, and the one that chose 13.

00:32:50.480 --> 00:32:52.230
Good, so they're in the MIS.

00:32:52.230 --> 00:32:55.070
And then at the same phase,
all of their neighbors,

00:32:55.070 --> 00:33:02.750
those four red nodes, are going
to decide to be out of the MIS.

00:33:02.750 --> 00:33:04.840
And now you're left with
the remaining four nodes.

00:33:04.840 --> 00:33:07.290
We don't keep going
with the same IDs.

00:33:07.290 --> 00:33:08.140
We start over.

00:33:08.140 --> 00:33:10.840
We want the rounds
to be independent.

00:33:10.840 --> 00:33:14.610
So if they choose
again, they get new IDs.

00:33:14.610 --> 00:33:19.200
And now the guy with the
12 and the guy with the 18

00:33:19.200 --> 00:33:22.180
are going to join the
MIS at this phase.

00:33:22.180 --> 00:33:27.280
And their neighbors will
decide not to be in the MIS.

00:33:27.280 --> 00:33:30.800
That leaves us with just one
node, the guy who had four.

00:33:30.800 --> 00:33:33.240
Next phase, he
chooses another ID.

00:33:33.240 --> 00:33:36.652
But he has no neighbors
so by default,

00:33:36.652 --> 00:33:38.110
he's bigger than
all the neighbors.

00:33:38.110 --> 00:33:39.340
So he just joins the MIS.

00:33:42.719 --> 00:33:43.760
So that's how this works.

00:33:43.760 --> 00:33:45.600
Very simple algorithm,
and it actually

00:33:45.600 --> 00:33:48.240
works to find an
MIS very quickly.

00:33:53.380 --> 00:33:57.150
Why does this give
you independence?

00:33:57.150 --> 00:34:00.310
How do we know that if
this ever terminates,

00:34:00.310 --> 00:34:02.960
if everybody decides, how do
we know that we don't ever

00:34:02.960 --> 00:34:08.440
have two neighbors that
decided to be in the MIS?

00:34:08.440 --> 00:34:09.600
Yeah.

00:34:09.600 --> 00:34:11.580
AUDIENCE: Because once
a node joins the MIS,

00:34:11.580 --> 00:34:14.550
it broadcasts to
its neighbors that--

00:34:14.550 --> 00:34:16.630
NANCY LYNCH: Right.

00:34:16.630 --> 00:34:18.750
The only way you join
the MIS is if you

00:34:18.750 --> 00:34:21.750
have the unique maximum
value in your neighborhood.

00:34:21.750 --> 00:34:25.469
And when you do, all your
neighbors become inactive.

00:34:25.469 --> 00:34:29.020
So you're certainly going
to have independence.

00:34:29.020 --> 00:34:33.199
Maximality, if it
terminates, the final set

00:34:33.199 --> 00:34:37.159
is not going to allow you
to add any more nodes.

00:34:37.159 --> 00:34:37.659
Why?

00:34:37.659 --> 00:34:40.290
Because a node is only
going to become inactive

00:34:40.290 --> 00:34:45.460
if it joins the MIS, or
a neighbor joins the MIS.

00:34:45.460 --> 00:34:47.159
And we just continue
this algorithm

00:34:47.159 --> 00:34:51.080
until all the nodes
become inactive.

00:34:51.080 --> 00:34:55.010
So either the node is in the
MIS or a neighbor is in the MIS.

00:34:55.010 --> 00:34:58.170
So you can't possibly
add any more.

00:34:58.170 --> 00:35:00.350
Yes?

00:35:00.350 --> 00:35:01.970
So this has the
basic correctness

00:35:01.970 --> 00:35:04.590
properties, but what
you're probably wondering,

00:35:04.590 --> 00:35:07.940
is why is this efficient enough?

00:35:07.940 --> 00:35:10.120
Why is it efficient?

00:35:10.120 --> 00:35:13.850
Well we could say that with high
probability, of probability 1,

00:35:13.850 --> 00:35:15.065
it will eventually terminate.

00:35:17.590 --> 00:35:25.020
More quantitative, we can
state this theorem that says,

00:35:25.020 --> 00:35:28.770
with probability at
least 1 minus 1 over n,

00:35:28.770 --> 00:35:33.490
all the nodes decide
within four log n phases.

00:35:33.490 --> 00:35:35.540
Since n is the
number of nodes, this

00:35:35.540 --> 00:35:38.670
doesn't tell us that you get
probability 1 of eventually

00:35:38.670 --> 00:35:39.290
terminating.

00:35:39.290 --> 00:35:42.310
But we can repeat this
and get the same sort

00:35:42.310 --> 00:35:47.520
of bound repeatedly
for successive phases.

00:35:47.520 --> 00:35:50.860
But let's just focus on getting
probability at least 1 minus 1

00:35:50.860 --> 00:35:57.245
over n that all nodes decide
within about four log n phases.

00:36:00.270 --> 00:36:01.680
So let's see what
this is saying.

00:36:01.680 --> 00:36:04.580
You have this big
complicated graph.

00:36:04.580 --> 00:36:08.920
And in one round, for this to
be like log n behavior, what

00:36:08.920 --> 00:36:10.885
has to happen at each phase?

00:36:13.740 --> 00:36:16.120
You have to reduce it by
some constant fraction.

00:36:16.120 --> 00:36:18.650
The number of nodes,
say, should go down.

00:36:18.650 --> 00:36:23.220
So it's sort of how
the proof will go.

00:36:23.220 --> 00:36:25.310
So we start out
with a Lemma saying,

00:36:25.310 --> 00:36:27.660
you're choosing
these IDs at random.

00:36:27.660 --> 00:36:30.584
You want a high probability
that they're all different.

00:36:30.584 --> 00:36:32.500
So we have a lemma like
the one we had before.

00:36:32.500 --> 00:36:35.920
It says, with probability
at least 1 minus 1

00:36:35.920 --> 00:36:38.530
over n squared, in each of

00:36:38.530 --> 00:36:41.310
the phases
up to four log n,

00:36:41.310 --> 00:36:44.650
everybody's choosing a
different random value.

00:36:44.650 --> 00:36:48.880
All the nodes choose different
values at each phase.

00:36:48.880 --> 00:36:51.810
So this lets us
ignore the possibility

00:36:51.810 --> 00:36:54.000
that you have repeats.

00:36:54.000 --> 00:36:55.610
So we'll come back
to that at the end.

00:36:58.147 --> 00:37:00.480
All right, so we're going to
pretend that in each phase,

00:37:00.480 --> 00:37:02.340
all the random
numbers are different.

00:37:04.960 --> 00:37:09.070
So the key idea of this is
to show that the graph has

00:37:09.070 --> 00:37:13.019
to shrink enough at each phase.

00:37:13.019 --> 00:37:14.560
So the way we're
going to say that is

00:37:14.560 --> 00:37:17.500
not in terms of the
nodes, but in terms

00:37:17.500 --> 00:37:18.740
of the number of edges.

00:37:18.740 --> 00:37:22.670
We're going to say at
each phase, the expected

00:37:22.670 --> 00:37:26.140
number of edges that are
live-- why is that shaking?

00:37:31.680 --> 00:37:32.240
OK.

00:37:32.240 --> 00:37:33.760
The expected number
of edges that

00:37:33.760 --> 00:37:38.300
are live at the end of the
phase is at most half the number

00:37:38.300 --> 00:37:41.570
that were live at the
beginning of the phase.

00:37:41.570 --> 00:37:45.410
So an edge is live, if its
endpoints are still live.

00:37:45.410 --> 00:37:48.795
So instead of talking about
reducing the number of nodes

00:37:48.795 --> 00:37:50.170
by a constant
fraction, I'm going

00:37:50.170 --> 00:37:52.550
to reduce the number
of remaining edges

00:37:52.550 --> 00:37:56.710
by a constant fraction
at each phase.

00:37:56.710 --> 00:37:58.860
So this is what
I'm going to prove.

00:37:58.860 --> 00:38:02.260
So now I've got three
slides, the only three

00:38:02.260 --> 00:38:04.690
slides today that have
calculations on them.

00:38:04.690 --> 00:38:07.300
So probably have
to pay attention,

00:38:07.300 --> 00:38:09.320
if you want to follow
the calculations online.

00:38:09.320 --> 00:38:11.390
So let's see why.

00:38:11.390 --> 00:38:12.560
But the goal is clear?

00:38:12.560 --> 00:38:14.570
We have to reduce
the number of edges

00:38:14.570 --> 00:38:16.750
that remain by a factor of two.

00:38:19.270 --> 00:38:23.470
So this is actually a new
proof of this algorithm's

00:38:23.470 --> 00:38:24.180
performance.

00:38:24.180 --> 00:38:28.150
The proof in the original
papers is pretty complicated.

00:38:28.150 --> 00:38:32.770
This is a very
intuitive, neat proof.

00:38:32.770 --> 00:38:35.020
So the first line
of the proof says

00:38:35.020 --> 00:38:38.820
if you have a node that
has a neighbor that

00:38:38.820 --> 00:38:43.200
chooses a value that's bigger
than all of its own neighbors--

00:38:43.200 --> 00:38:45.006
so u has a neighbor w.

00:38:45.006 --> 00:38:48.760
W chooses a value that's
bigger than all w's neighbors.

00:38:48.760 --> 00:38:49.640
But let's say more.

00:38:49.640 --> 00:38:53.220
Let's say it's also bigger than
all of u's other neighbors,

00:38:53.220 --> 00:38:55.670
besides w.

00:38:55.670 --> 00:38:59.270
So w is really big, bigger
than all w's neighbors,

00:38:59.270 --> 00:39:02.960
bigger than all of
u's other neighbors.

00:39:02.960 --> 00:39:08.240
If that happens, then
what happens to u?

00:39:08.240 --> 00:39:12.160
Well we know that w is going
to decide to join the MIS.

00:39:12.160 --> 00:39:16.260
And u is going to
definitely die,

00:39:16.260 --> 00:39:18.230
is not going to join the MIS.

00:39:18.230 --> 00:39:20.560
Right?

00:39:20.560 --> 00:39:21.060
OK?

00:39:21.060 --> 00:39:23.920
I don't want to lose
people in the first line.

00:39:23.920 --> 00:39:26.310
Question?

00:39:26.310 --> 00:39:27.890
Here's a picture.

00:39:27.890 --> 00:39:28.490
Here's u.

00:39:31.870 --> 00:39:34.800
And it has a neighbor w.

00:39:34.800 --> 00:39:40.780
And let's say that w's chosen
value is greater than all

00:39:40.780 --> 00:39:43.090
of w's neighbors, but
also greater than all

00:39:43.090 --> 00:39:44.785
of u's other neighbors.

00:39:47.510 --> 00:39:49.650
Yes?

00:39:49.650 --> 00:39:53.160
If w has that, w is
going to join the MIS,

00:39:53.160 --> 00:39:56.480
and u is going to
definitely not join the MIS.

00:39:56.480 --> 00:40:00.470
It's going to decide
out in this phase.

00:40:00.470 --> 00:40:02.076
OK so far?

00:40:02.076 --> 00:40:05.830
AUDIENCE: Why do you need
w to have value greater

00:40:05.830 --> 00:40:07.750
than u's neighbors?

00:40:07.750 --> 00:40:10.630
Because if w is greater than
all of its neighbors then it's--

00:40:10.630 --> 00:40:13.630
NANCY LYNCH: --be in the MIS
and u will not be in the MIS.

00:40:13.630 --> 00:40:15.900
And that seems like
it ought to be enough.

00:40:15.900 --> 00:40:19.240
But look at the next line.

00:40:19.240 --> 00:40:21.510
Well the line after this one.

00:40:21.510 --> 00:40:24.960
What's the probability that
w chooses a value like that?

00:40:28.080 --> 00:40:31.570
So if it's going to be bigger
than all u's neighbors, and all

00:40:31.570 --> 00:40:33.450
of w's neighbors,
and keeping in mind

00:40:33.450 --> 00:40:35.320
that they are each
other's neighbors,

00:40:35.320 --> 00:40:37.920
it turns out that
there are

00:40:37.920 --> 00:40:43.410
at most degree u plus degree
w nodes involved here.

00:40:43.410 --> 00:40:45.790
W has to have the
biggest of all of those,

00:40:45.790 --> 00:40:48.810
so it's going to
have the probability

00:40:48.810 --> 00:40:53.060
1 over the number of nodes
of being the biggest one.

00:40:53.060 --> 00:40:55.540
So it's just 1 over
the degree of u

00:40:55.540 --> 00:40:59.180
plus the degree of
w, the probability

00:40:59.180 --> 00:41:01.320
that w will choose
a big enough value.
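
The "1 over the number of nodes" step is a symmetry argument: if k nodes hold distinct random values, each is equally likely to hold the largest. The sketch below checks this exactly by enumerating all orderings for small k; the function name is hypothetical, and distinct values are assumed, as in the lemma about no repeats.

```python
from fractions import Fraction
from itertools import permutations


def prob_first_is_largest(k):
    """Exact probability that item 0 is the largest among k distinct values."""
    # Each permutation assigns ranks 0..k-1 to items 0..k-1, all equally likely.
    orders = list(permutations(range(k)))
    wins = sum(1 for order in orders if order[0] == k - 1)
    return Fraction(wins, len(orders))


for k in range(1, 6):
    print(k, prob_first_is_largest(k))
```

Since w competes with at most degree u plus degree w nodes, the probability that w is the largest is at least 1 over that sum.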

00:41:06.580 --> 00:41:09.400
But you ask, this
is pessimistic.

00:41:09.400 --> 00:41:13.950
Why don't I just say that w
is bigger than its own neighbors?

00:41:13.950 --> 00:41:15.490
Because I want to
do this next step.

00:41:15.490 --> 00:41:19.180
I want to say the probability
that node u gets killed

00:41:19.180 --> 00:41:24.730
by one of its neighbors, any one
of its neighbors in this phase.

00:41:24.730 --> 00:41:26.640
I can calculate that as the sum.

00:41:29.540 --> 00:41:32.220
The probability that node
u is killed by a neighbor

00:41:32.220 --> 00:41:35.520
is at least the sum over
all of its neighbors.

00:41:35.520 --> 00:41:40.240
You look at all the vertices
in the neighbor set,

00:41:40.240 --> 00:41:43.990
and you add up this fraction.

00:41:43.990 --> 00:41:49.090
So why did I need to make that
additional assumption before?

00:41:49.090 --> 00:41:51.850
That w is greater than
all of u's neighbors,

00:41:51.850 --> 00:41:54.390
as well as all of
its own neighbors.

00:41:54.390 --> 00:41:55.078
Yeah?

00:41:55.078 --> 00:41:56.910
AUDIENCE: So you can
add the probabilities--

00:41:56.910 --> 00:41:58.618
NANCY LYNCH: Yeah
because otherwise these

00:41:58.618 --> 00:42:00.840
would be overlapping events.

00:42:00.840 --> 00:42:03.260
But this way I know they're
definitely disjoint events.

00:42:03.260 --> 00:42:06.991
We can't have-- if we
have w and w prime,

00:42:06.991 --> 00:42:08.490
you can't have both
of those holding

00:42:08.490 --> 00:42:12.590
because the requirement for
w is saying that its ID is

00:42:12.590 --> 00:42:16.540
bigger than w prime's ID.

00:42:16.540 --> 00:42:18.790
Because you have
these disjoint events,

00:42:18.790 --> 00:42:21.140
you can just add
the probabilities.

00:42:21.140 --> 00:42:22.520
And you know that
the probability

00:42:22.520 --> 00:42:25.460
that u gets killed
by some neighbor

00:42:25.460 --> 00:42:28.130
is at least this summation.

00:42:28.130 --> 00:42:29.491
OK so far?

00:42:29.491 --> 00:42:30.740
So now I'm going to calculate.

00:42:33.260 --> 00:42:34.870
But I wanted to
focus on the edges.

00:42:34.870 --> 00:42:38.560
So let's see, this tells us a
way that a node can get killed.

00:42:38.560 --> 00:42:42.990
But let's look at what happens
for an edge getting killed.

00:42:42.990 --> 00:42:48.230
This is the probability
that a node is killed.

00:42:48.230 --> 00:42:53.070
So the probability that
an edge dies at this phase

00:42:53.070 --> 00:42:56.180
is at least the maximum
of the probability

00:42:56.180 --> 00:43:02.766
that either of its
two endpoints die.

00:43:02.766 --> 00:43:04.390
And let's just write
it as the average.

00:43:04.390 --> 00:43:06.790
The probability
that an edge dies

00:43:06.790 --> 00:43:09.050
is at least the average
of the probability

00:43:09.050 --> 00:43:11.760
that its two endpoints
are killed, in this way.

00:43:15.200 --> 00:43:18.010
So for an edge, an edge is
definitely going to die,

00:43:18.010 --> 00:43:20.380
if one of its endpoints dies.

00:43:20.380 --> 00:43:23.130
And then the edge dies if it
dies in this particular way.

00:43:26.110 --> 00:43:29.050
So the probability an edge dies
is at least the probability

00:43:29.050 --> 00:43:32.390
that one of the-- half the
sum of the probabilities

00:43:32.390 --> 00:43:36.130
that the two end points die.

00:43:36.130 --> 00:43:38.490
It's the average probability.

00:43:38.490 --> 00:43:39.950
Makes sense?

00:43:39.950 --> 00:43:42.740
You might have to
read this later.

00:43:42.740 --> 00:43:45.770
So now we can go from that to
the expected number of edges

00:43:45.770 --> 00:43:47.809
that die.

00:43:47.809 --> 00:43:48.350
What is that?

00:43:48.350 --> 00:43:51.110
You just add up, over all,
the edges, the probability

00:43:51.110 --> 00:43:53.250
that the edge dies.

00:43:53.250 --> 00:43:55.710
The expected number
of edges that die

00:43:55.710 --> 00:44:00.910
is at least the sum over all
of the edges of the probability

00:44:00.910 --> 00:44:03.070
that the two endpoints die.

00:44:09.040 --> 00:44:12.170
So you have the sum,
over all of the edges.

00:44:12.170 --> 00:44:13.570
You add up for all the edges.

00:44:13.570 --> 00:44:16.300
The probability that
one endpoint is killed,

00:44:16.300 --> 00:44:20.360
and the probability the
other endpoint is killed.

00:44:20.360 --> 00:44:22.240
So what we have is this
great big summation

00:44:22.240 --> 00:44:26.810
involving now the kill
probabilities for vertices.

00:44:26.810 --> 00:44:29.050
So we have the kill
probability for each vertex.

00:44:29.050 --> 00:44:32.360
How many times does that occur?

00:44:32.360 --> 00:44:38.100
If you have a vertex, u, it
appears once for every edge

00:44:38.100 --> 00:44:39.912
that u is an endpoint of.

00:44:43.200 --> 00:44:47.610
So you have the kill probability
for each node occurring exactly

00:44:47.610 --> 00:44:50.500
its degree number of times.

00:44:50.500 --> 00:44:53.580
So that lets me rewrite
this in terms of vertices.

00:44:53.580 --> 00:44:58.420
This sum is just 1/2 the
sum over all the nodes

00:44:58.420 --> 00:45:02.580
of the probability that the node
gets killed times its degree.

00:45:05.200 --> 00:45:09.370
So I'm calculating by
replacing the description

00:45:09.370 --> 00:45:11.100
in terms of edges,
by description

00:45:11.100 --> 00:45:13.150
in terms of vertices.

00:45:13.150 --> 00:45:16.780
More or less OK so far?

00:45:16.780 --> 00:45:18.000
So now what do I do?

00:45:18.000 --> 00:45:21.040
Well, I know the probability
that u is killed.

00:45:21.040 --> 00:45:23.900
I have a bound for that
up on the first line.

00:45:23.900 --> 00:45:26.960
So I'm just going
to plug that in.

00:45:26.960 --> 00:45:29.460
So I get 1/2 the sum
over all the nodes,

00:45:29.460 --> 00:45:36.320
the degree of the node times
this summation that gives me

00:45:36.320 --> 00:45:39.092
the kill probability
for that node.

00:45:39.092 --> 00:45:40.550
And now I play
around with the sum.

00:45:40.550 --> 00:45:45.170
I can move the degree
inside the second summation,

00:45:45.170 --> 00:45:48.360
and I get this.

00:45:48.360 --> 00:45:49.750
So now let's stare
at this again.

00:45:49.750 --> 00:45:54.760
I have the sum over all
nodes of the sum over all

00:45:54.760 --> 00:45:58.400
of its neighbors
of some expression.

00:45:58.400 --> 00:46:01.702
But if I'm considering a
node, every node and every one

00:46:01.702 --> 00:46:03.410
of its neighbors,
that's like considering

00:46:03.410 --> 00:46:05.610
all the directed edges.

00:46:05.610 --> 00:46:08.760
I look at every u, and I
look at every edge that

00:46:08.760 --> 00:46:11.600
connects u to something else.

00:46:11.600 --> 00:46:14.890
So I can write it as the sum
over all the directed edges

00:46:14.890 --> 00:46:18.080
of this expression.

00:46:18.080 --> 00:46:20.160
So I get half of
the sum over all the

00:46:20.160 --> 00:46:23.454
directed edges of
this expression.

00:46:23.454 --> 00:46:25.245
But we were talking
about undirected edges.

00:46:28.380 --> 00:46:32.229
And the undirected edges
are being counted twice here, once

00:46:32.229 --> 00:46:33.020
for each direction.

00:46:35.950 --> 00:46:39.690
I can change this sum to a
sum over undirected edges.

00:46:39.690 --> 00:46:42.270
But now I have the two
endpoints to deal with.

00:46:42.270 --> 00:46:48.390
So I get the degree of u and
the degree of v in the numerator

00:46:48.390 --> 00:46:50.390
because I'm looking at
it from the point of view

00:46:50.390 --> 00:46:53.810
both of the endpoints
of each edge.

00:46:53.810 --> 00:46:55.780
Well something
drops out, so I have

00:46:55.780 --> 00:47:00.860
1/2 the sum over all the
undirected edges of 1.

00:47:00.860 --> 00:47:03.645
So that's 1/2 of the
number of undirected edges.
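
NOTE
Editorial sketch of the calculation just described, written out as one chain.
It assumes the bound "up on the first line" has the form
Pr[u killed] >= sum over neighbors v of 1/(d(u)+d(v)); that form is not restated
in this part of the lecture. The symbols d(u) (degree of u) and \Gamma(u)
(neighbors of u) are notation chosen here for illustration.
```latex
\begin{align*}
\mathbb{E}[\#\text{edges that die}]
  &= \sum_{\{u,v\} \in E} \Pr[\{u,v\} \text{ dies}] \\
  &\ge \sum_{\{u,v\} \in E} \tfrac{1}{2}\bigl(\Pr[u \text{ killed}] + \Pr[v \text{ killed}]\bigr) \\
  &= \tfrac{1}{2} \sum_{u \in V} d(u)\,\Pr[u \text{ killed}] \\
  &\ge \tfrac{1}{2} \sum_{u \in V} \sum_{v \in \Gamma(u)} \frac{d(u)}{d(u)+d(v)} \\
  &= \tfrac{1}{2} \sum_{\{u,v\} \in E} \frac{d(u)+d(v)}{d(u)+d(v)}
   = \frac{|E|}{2}.
\end{align*}
```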

00:47:06.550 --> 00:47:08.800
So I don't expect you to
get every step of this,

00:47:08.800 --> 00:47:10.850
but it's on three
slides, so you can

00:47:10.850 --> 00:47:14.250
stare at this when you go home
and make sure the steps work.

00:47:14.250 --> 00:47:16.070
But remember the
point of this is

00:47:16.070 --> 00:47:18.090
to show that you reduce
the number of edges

00:47:18.090 --> 00:47:21.640
by a factor of two, and it's
done in sort of a clever way

00:47:21.640 --> 00:47:24.185
by counting the kill
probabilities of vertices.

00:47:30.700 --> 00:47:34.020
So we get this, reducing
the number of edges.

00:47:34.020 --> 00:47:35.910
And now we can just
plug that back in

00:47:35.910 --> 00:47:41.072
to get our complexity bound
for the entire algorithm.

00:47:41.072 --> 00:47:43.280
Remember the original theorem
we were to prove

00:47:43.280 --> 00:47:47.740
is a probability bound for
deciding within log n phases.

00:47:47.740 --> 00:47:49.420
Well you should have
a pretty good idea

00:47:49.420 --> 00:47:51.370
of why that works
because if at each phase,

00:47:51.370 --> 00:47:53.120
you're going to reduce
the number of edges

00:47:53.120 --> 00:47:55.800
by around a factor
of two, then it's

00:47:55.800 --> 00:48:00.530
going to take something
like log n phases to finish.

00:48:00.530 --> 00:48:02.120
And I just put a proof sketch.

00:48:04.940 --> 00:48:07.710
The number of edges that are
still alive after four log n

00:48:07.710 --> 00:48:11.010
phases, well you divide
by 2 four log n times,

00:48:11.010 --> 00:48:13.310
so you get down to
practically nothing.

00:48:13.310 --> 00:48:19.690
The probability any edges are
alive at the end is very small.

00:48:19.690 --> 00:48:23.910
So you get a small probability
the algorithm doesn't terminate

00:48:23.910 --> 00:48:26.700
within four log n phases.
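
NOTE
Editorial sketch of the arithmetic behind the proof sketch. Assumptions made
here, not stated in the lecture: m_k denotes the number of edges alive after
k phases, logs are base 2, the graph starts with at most n^2/2 edges, and the
expected halving applies at every phase. The last step is Markov's inequality.
```latex
\begin{align*}
\mathbb{E}[m_k] &\le \frac{m_0}{2^{k}}, \qquad m_0 \le \frac{n^2}{2}, \\
\mathbb{E}[m_{4\log_2 n}] &\le \frac{n^2/2}{2^{4\log_2 n}} = \frac{n^2/2}{n^4} = \frac{1}{2n^2}, \\
\Pr[m_{4\log_2 n} \ge 1] &\le \mathbb{E}[m_{4\log_2 n}] \le \frac{1}{2n^2}.
\end{align*}
```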

00:48:26.700 --> 00:48:29.569
There's an extra little
term I threw in here.

00:48:29.569 --> 00:48:30.610
You might have forgotten.

00:48:30.610 --> 00:48:33.030
There was a term that I needed
for the small probability,

00:48:33.030 --> 00:48:36.090
that somebody chose
duplicate IDs.

00:48:36.090 --> 00:48:37.760
So I'm bringing them
back in at the end,

00:48:37.760 --> 00:48:40.430
in a little union bound.

00:48:40.430 --> 00:48:43.857
And we get our 1 over
n probability this way.

00:48:43.857 --> 00:48:45.940
But the key idea is you
reduce the number of edges

00:48:45.940 --> 00:48:49.150
by half at each stage.

00:48:49.150 --> 00:48:52.790
Enough for you to look at later,
I guess to figure this out

00:48:52.790 --> 00:48:55.670
or you have any
questions about this?

00:48:55.670 --> 00:49:00.190
So that's the last
equations and calculation.

00:49:00.190 --> 00:49:06.470
I'm going to go onto a new
idea, more conceptual stuff.

00:49:06.470 --> 00:49:09.800
Familiar problem,
breadth-first spanning trees,

00:49:09.800 --> 00:49:14.070
setting up breadth-first
paths to every node,

00:49:14.070 --> 00:49:19.130
but we're going to study
it in our new setting.

00:49:19.130 --> 00:49:21.080
We have a connected graph.

00:49:21.080 --> 00:49:24.390
This time, let's suppose that
it has a distinguished vertex,

00:49:24.390 --> 00:49:26.310
like it already has a leader.

00:49:26.310 --> 00:49:28.450
So it has a distinguished
vertex in the graph

00:49:28.450 --> 00:49:31.930
that's going to become
the root of the BFS tree.

00:49:34.920 --> 00:49:37.620
And the processes don't
need any knowledge

00:49:37.620 --> 00:49:39.220
about the graph for this one.

00:49:44.930 --> 00:49:48.490
For the rest of the
time today and Thursday,

00:49:48.490 --> 00:49:51.250
we'll assume the processes
have unique identifiers,

00:49:51.250 --> 00:49:53.460
and I don't think we're
using any probabilities.

00:49:53.460 --> 00:49:56.970
So this is just going to be
using the unique identifiers

00:49:56.970 --> 00:49:59.570
to solve our problems.

00:49:59.570 --> 00:50:02.700
So everybody knows its
own unique identifier.

00:50:02.700 --> 00:50:05.710
The root has a distinguished,
generally known,

00:50:05.710 --> 00:50:08.720
unique identifier say i0.

00:50:08.720 --> 00:50:10.880
And the process that
has i0 knows hey,

00:50:10.880 --> 00:50:13.060
I'm at the root of the graph.

00:50:13.060 --> 00:50:14.380
So does the setup make sense?

00:50:17.647 --> 00:50:19.230
We might as well
assume that everybody

00:50:19.230 --> 00:50:21.710
knows the unique identifiers
of their neighbors

00:50:21.710 --> 00:50:23.990
because they could easily
exchange information

00:50:23.990 --> 00:50:27.200
now, and match up who's
connected on which port

00:50:27.200 --> 00:50:28.385
by a unique identifier.

00:50:31.830 --> 00:50:33.499
We'll just do deterministic.

00:50:33.499 --> 00:50:35.540
There'll be a little bit
of non-determinism here.

00:50:35.540 --> 00:50:36.770
I'll say more about that.

00:50:36.770 --> 00:50:42.470
But I'm not going to worry
about probabilities for this.

00:50:42.470 --> 00:50:45.100
Well that told you
about the general setup.

00:50:45.100 --> 00:50:47.880
What are the processes
supposed to do?

00:50:47.880 --> 00:50:50.720
Well they're supposed to compute
a breadth-first spanning tree,

00:50:50.720 --> 00:50:53.210
rooted at vertex v0.

00:50:53.210 --> 00:50:56.140
The branches are
going to be directed

00:50:56.140 --> 00:51:00.040
paths in this undirected
graph, coming from v0.

00:51:00.040 --> 00:51:03.520
Spanning means they should
reach all the vertices.

00:51:03.520 --> 00:51:06.370
And breadth-first means that
if a vertex is at a distance

00:51:06.370 --> 00:51:12.600
d from v0, it will appear at
depth d in this spanning tree.

00:51:12.600 --> 00:51:17.910
So everybody should get a
shortest path from the root.

00:51:17.910 --> 00:51:20.610
Now how are we going to compute
this in a distributed setting?

00:51:20.610 --> 00:51:23.400
Well now the output
of a process is just

00:51:23.400 --> 00:51:26.850
going to be its
parent in the tree.

00:51:26.850 --> 00:51:29.590
So we're not actually going
to compute this tree anywhere

00:51:29.590 --> 00:51:30.680
as a whole.

00:51:30.680 --> 00:51:33.546
Everybody's just going to
know its parent in the tree.

00:51:37.810 --> 00:51:38.420
Questions?

00:51:38.420 --> 00:51:39.340
Problem make sense?

00:51:43.920 --> 00:51:47.694
So this is just an example
of a spanning tree,

00:51:47.694 --> 00:51:48.860
breadth-first spanning tree.

00:51:48.860 --> 00:51:53.000
This gives you shortest
paths to all of the nodes,

00:51:53.000 --> 00:51:55.200
shortest in terms of
the number of hops.

00:51:58.600 --> 00:52:01.740
So we can have a very,
very simple algorithm.

00:52:01.740 --> 00:52:06.270
We're going to let the processes
mark themselves as they

00:52:06.270 --> 00:52:08.520
get included in the tree.

00:52:08.520 --> 00:52:12.970
Starts out only the first
process, i0, is marked.

00:52:12.970 --> 00:52:17.470
So do you want to give an idea,
maybe, of how this might work?

00:52:17.470 --> 00:52:19.421
Sketch out-- yeah?

00:52:19.421 --> 00:52:21.504
AUDIENCE: The root will
send out to its neighbors.

00:52:21.504 --> 00:52:22.968
And they will then
mark themselves

00:52:22.968 --> 00:52:25.408
as the parent of
whoever they heard from.

00:52:25.408 --> 00:52:27.369
Then they will--

00:52:27.369 --> 00:52:28.910
NANCY LYNCH: This
is all synchronous.

00:52:28.910 --> 00:52:29.710
So that's great.

00:52:29.710 --> 00:52:31.720
They'll be doing this
in synchronous rounds.

00:52:31.720 --> 00:52:34.030
So everybody will, at
the certain distance,

00:52:34.030 --> 00:52:37.790
is going to get the message
at the right number of rounds

00:52:37.790 --> 00:52:40.320
to mark their distance.

00:52:40.320 --> 00:52:45.000
OK so in round one,
process i0 will

00:52:45.000 --> 00:52:48.550
send a special
message, say search,

00:52:48.550 --> 00:52:50.660
to all of its neighbors.

00:52:50.660 --> 00:52:52.950
And anybody who receives
a message in round one

00:52:52.950 --> 00:52:57.320
will mark itself,
decide i0 is its parent,

00:52:57.320 --> 00:53:01.850
could output that i0 is
my parent, parent i0.

00:53:01.850 --> 00:53:03.970
And then it can get
ready for the next round,

00:53:03.970 --> 00:53:09.210
when it's supposed to
send to continue this.

00:53:09.210 --> 00:53:13.000
So at later rounds, if you
decided you're going to send,

00:53:13.000 --> 00:53:16.070
if you know you're supposed to
send from the previous round,

00:53:16.070 --> 00:53:20.000
then you send a search message
to all of your neighbors.

00:53:20.000 --> 00:53:22.530
Now the process is
sitting there and it

00:53:22.530 --> 00:53:25.040
receives a search message.

00:53:25.040 --> 00:53:30.340
If he's already marked, then he
should just ignore the message.

00:53:30.340 --> 00:53:32.430
Once you're included
in the tree,

00:53:32.430 --> 00:53:35.160
you don't care if you
get other messages,

00:53:35.160 --> 00:53:37.840
search messages on other paths.

00:53:37.840 --> 00:53:41.170
So you only do anything
if you're not yet marked

00:53:41.170 --> 00:53:43.120
and you receive a message.

00:53:43.120 --> 00:53:44.855
And in that case, then
you mark yourself.

00:53:48.020 --> 00:53:49.970
Then you mark
yourself, and then you

00:53:49.970 --> 00:53:53.980
choose one of your neighbors
to be your parent.

00:53:53.980 --> 00:53:55.920
Now because this
is synchronous, you

00:53:55.920 --> 00:53:58.970
have several nodes that could
be sending at the same time.

00:53:58.970 --> 00:54:02.280
So one node could be
receiving search messages

00:54:02.280 --> 00:54:05.040
from several different
neighbors at once.

00:54:05.040 --> 00:54:07.660
Well, it wants to choose
one of them as its parent,

00:54:07.660 --> 00:54:10.380
doesn't matter which
one it chooses.

00:54:10.380 --> 00:54:13.000
So it can just choose
nondeterministically, just

00:54:13.000 --> 00:54:15.160
arbitrarily.

00:54:15.160 --> 00:54:19.932
And then it decides that it
will send the next round.

00:54:19.932 --> 00:54:21.170
Is the algorithm clear?
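
NOTE
Editorial sketch of the algorithm just described, as a single-machine
simulation of the synchronous rounds. The function name, the graph encoding
(a dict from process ID to a list of neighbor IDs) and the example graph are
assumptions made for illustration. Picking the smallest sender ID is just one
way to resolve the arbitrary choice of parent.
```python
# Hypothetical simulation of the synchronous BFS rounds; names are illustrative.
def sync_bfs(adj, root):
    """Return {process: parent} for a connected undirected graph {id: [neighbor ids]}."""
    marked = {root}
    parent = {root: None}  # the root is marked but has no parent
    senders = [root]  # processes that send "search" in the coming round
    while senders:
        # One synchronous round: all search messages are delivered together.
        inbox = {}
        for u in senders:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        senders = []
        for v, heard_from in inbox.items():
            if v in marked:
                continue  # already in the tree: ignore the message
            marked.add(v)
            parent[v] = min(heard_from)  # any sender would do; smallest ID is one choice
            senders.append(v)  # v sends search in the next round
    return parent
# Example graph (made up for illustration): edges 0-1, 0-2, 1-3, 2-3, 3-4.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(sync_bfs(EXAMPLE, 0))
```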

00:54:26.770 --> 00:54:29.380
So there's, I mentioned, a
little bit of nondeterminism

00:54:29.380 --> 00:54:31.970
here, only in that a process
can choose arbitrarily

00:54:31.970 --> 00:54:34.120
among several possible parents.

00:54:36.770 --> 00:54:38.490
And then we could
put in a default,

00:54:38.490 --> 00:54:40.830
saying that it chooses the
one with the smallest ID,

00:54:40.830 --> 00:54:43.170
if we really want to
make it deterministic.

00:54:43.170 --> 00:54:45.542
But it's also OK to leave
distributed algorithms

00:54:45.542 --> 00:54:46.250
nondeterministic.

00:54:49.690 --> 00:54:51.530
And here I should
make a remark that

00:54:51.530 --> 00:54:54.230
shows how differently
nondeterminism

00:54:54.230 --> 00:54:56.910
is regarded in the
distributed setting,

00:54:56.910 --> 00:55:00.520
from the way it is for
sequential algorithms.

00:55:00.520 --> 00:55:03.410
For distributed algorithms,
there can be many options.

00:55:03.410 --> 00:55:04.840
And maybe they're all OK.

00:55:04.840 --> 00:55:07.560
But the algorithm is
supposed to work correctly,

00:55:07.560 --> 00:55:12.960
no matter how you resolve
the nondeterministic choices.

00:55:12.960 --> 00:55:15.850
So think about like
NP, and the other ways

00:55:15.850 --> 00:55:18.160
that you've seen
nondeterminism so far.

00:55:18.160 --> 00:55:21.390
There you say you're lucky if
there is a path to a choice.

00:55:21.390 --> 00:55:24.200
Here when you make a
nondeterministic choice,

00:55:24.200 --> 00:55:26.590
or when the algorithm
behaves nondeterministically,

00:55:26.590 --> 00:55:28.330
all the choices are
supposed to work.

00:55:28.330 --> 00:55:30.890
It's like all the paths have to
come up with correct answers.

00:55:30.890 --> 00:55:32.259
Do you have a question?

00:55:32.259 --> 00:55:34.384
AUDIENCE: Yes, whenever
there's a sub- [INAUDIBLE],

00:55:34.384 --> 00:55:36.740
whenever there's
a race condition,

00:55:36.740 --> 00:55:38.740
we locally assume that
there wasn't a difference

00:55:38.740 --> 00:55:41.160
in local computation time.

00:55:41.160 --> 00:55:42.830
But if there is, even
in the slightest,

00:55:42.830 --> 00:55:45.330
then they would get a parent
[INAUDIBLE] before another one,

00:55:45.330 --> 00:55:47.129
it would still be a valid--

00:55:47.129 --> 00:55:48.670
NANCY LYNCH: So the
synchronous model

00:55:48.670 --> 00:55:50.030
is more abstract than that.

00:55:50.030 --> 00:55:52.540
You don't model the
local computation time.

00:55:52.540 --> 00:55:54.660
You're moving more toward
an asynchronous model,

00:55:54.660 --> 00:55:58.280
where the steps can take
differing amounts of time.

00:55:58.280 --> 00:56:01.250
Here we just assume you have
an abstract model, where

00:56:01.250 --> 00:56:04.280
everybody does stuff
at once, in each round.

00:56:04.280 --> 00:56:05.900
But you still have
nondeterminism

00:56:05.900 --> 00:56:11.560
because they can all arrive
at the same round somewhere.

00:56:11.560 --> 00:56:12.240
But it's OK.

00:56:12.240 --> 00:56:14.100
You can pick any one
and it still works.

00:56:17.560 --> 00:56:20.600
So it should be not hard to
see that this does give you

00:56:20.600 --> 00:56:23.830
a BFS tree because you're
creating all the branches

00:56:23.830 --> 00:56:25.040
synchronously.

00:56:25.040 --> 00:56:28.790
And you're growing
one hop at each round.

00:56:28.790 --> 00:56:30.690
It reaches all the
nodes eventually

00:56:30.690 --> 00:56:32.080
because the graph is connected.

00:56:32.080 --> 00:56:36.400
And everybody sends messages.
Once a node gets marked,

00:56:36.400 --> 00:56:38.520
it sends messages
to its neighbors.

00:56:38.520 --> 00:56:40.400
So eventually, the
markings are going

00:56:40.400 --> 00:56:46.640
to reach all the neighbors,
all the nodes in the graph.

00:56:46.640 --> 00:56:50.970
So here's how you get the
example I showed before,

00:56:50.970 --> 00:56:53.460
simple breadth-first search.

00:56:53.460 --> 00:56:57.270
That's a search message
sent by this guy.

00:56:57.270 --> 00:56:59.360
I put it to the
right of the edge

00:56:59.360 --> 00:57:02.990
to indicate-- it's kind
of hard to distinguish.

00:57:02.990 --> 00:57:04.880
But I put them on
the right of the edge

00:57:04.880 --> 00:57:06.750
from the point of
view of the sender.

00:57:06.750 --> 00:57:09.770
So he sends a search message.

00:57:09.770 --> 00:57:10.650
It gets there.

00:57:10.650 --> 00:57:13.720
This arrow just indicates
that it reached the other end.

00:57:13.720 --> 00:57:16.160
And this guy has
chosen the sender,

00:57:16.160 --> 00:57:19.790
which is the other direction
on the arrow, as its parent.

00:57:19.790 --> 00:57:25.540
Now the recipient is going
to send some search messages.

00:57:25.540 --> 00:57:28.370
So he sends four of them.

00:57:28.370 --> 00:57:29.820
They all get to the other end.

00:57:29.820 --> 00:57:32.770
And OK, so all these
guys now get marked.

00:57:32.770 --> 00:57:36.230
They're included
in the BFS tree.

00:57:36.230 --> 00:57:40.000
And now the next round,
they all send some messages.

00:57:40.000 --> 00:57:44.270
I'm not putting in the messages
where somebody would send back

00:57:44.270 --> 00:57:45.970
to a guy who sent to him.

00:57:45.970 --> 00:57:47.810
But I put in all the others.

00:57:47.810 --> 00:57:51.350
Some of them are
going to be ignored.

00:57:51.350 --> 00:57:53.820
But you do get to a
few new nodes this way.

00:57:53.820 --> 00:57:55.460
That's round three.

00:57:55.460 --> 00:57:57.900
Round four, everybody sends.

00:57:57.900 --> 00:58:00.490
And now you have all
the nodes included.

00:58:03.250 --> 00:58:05.540
So this gives you
the spanning tree

00:58:05.540 --> 00:58:07.970
that I showed at the
beginning of this topic.

00:58:12.450 --> 00:58:14.650
This is not a very
complicated algorithm.

00:58:14.650 --> 00:58:17.830
But I think you can see
that things can get worse.

00:58:17.830 --> 00:58:22.820
And you want to argue about why
the algorithms work correctly.

00:58:22.820 --> 00:58:25.970
So as I said before,
a popular method

00:58:25.970 --> 00:58:28.300
of reasoning about
the algorithms

00:58:28.300 --> 00:58:30.300
is to state invariants.

00:58:30.300 --> 00:58:32.010
So here, suppose
I want to describe

00:58:32.010 --> 00:58:35.525
the state of the entire
network, after some number, r,

00:58:35.525 --> 00:58:37.921
of rounds.

00:58:37.921 --> 00:58:39.170
What could you say about that?

00:58:39.170 --> 00:58:41.400
What's the case after r
rounds of this algorithm?

00:58:49.010 --> 00:58:50.483
Yeah.

00:58:50.483 --> 00:58:53.429
AUDIENCE: All nodes at
distance r from the root

00:58:53.429 --> 00:58:55.260
have been marked.

00:58:55.260 --> 00:58:58.280
NANCY LYNCH: All the nodes
at distance r from the root

00:58:58.280 --> 00:58:59.530
have been marked.

00:58:59.530 --> 00:59:03.000
In fact, only those by
round r, only the ones

00:59:03.000 --> 00:59:06.330
with distances up through
r have been marked.

00:59:06.330 --> 00:59:09.350
So to state the invariants, if
you want to state invariants,

00:59:09.350 --> 00:59:12.540
I have to say what's in
the state of the processes.

00:59:12.540 --> 00:59:14.770
So all right, what can we say?

00:59:14.770 --> 00:59:18.570
So the process has a Boolean
that says whether or not

00:59:18.570 --> 00:59:19.740
it's marked.

00:59:19.740 --> 00:59:23.570
It has a place to
record a parent.

00:59:23.570 --> 00:59:29.150
And it has someplace
where it puts information

00:59:29.150 --> 00:59:30.750
about whether it's
supposed to send

00:59:30.750 --> 00:59:33.100
a message at the next round.

00:59:33.100 --> 00:59:36.180
And we also should
know its UID, so I'll

00:59:36.180 --> 00:59:38.800
put that in another
state variable.

00:59:38.800 --> 00:59:43.570
So here is something I
can say in invariants.

00:59:43.570 --> 00:59:48.920
At the end of r rounds, as you
said, at the end of r rounds

00:59:48.920 --> 00:59:52.390
exactly the processes
at distance at most r

00:59:52.390 --> 00:59:57.511
from the source node, the
root node, are marked.

00:59:57.511 --> 00:59:58.510
I can say a little more.

00:59:58.510 --> 01:00:02.750
I can say a process has its
parent defined if and only

01:00:02.750 --> 01:00:04.390
if it's marked.

01:00:04.390 --> 01:00:05.640
So it doesn't just get marked.

01:00:05.640 --> 01:00:08.050
It also computes a
parent, and the parent

01:00:08.050 --> 01:00:13.030
gets computed at the point
where it gets marked.

01:00:13.030 --> 01:00:15.950
Then I should say that
the parent is correct.

01:00:15.950 --> 01:00:21.400
So for any process that's at
distance d from the source,

01:00:21.400 --> 01:00:23.340
if the parent is
defined, then it's

01:00:23.340 --> 01:00:26.410
in fact the UID of a process
at distance d minus 1

01:00:26.410 --> 01:00:28.670
from the source.

01:00:28.670 --> 01:00:30.220
So that says it's
actually getting

01:00:30.220 --> 01:00:33.590
a correct breadth-first tree.

01:00:33.590 --> 01:00:36.890
It's getting the parent
on a shortest path.

01:00:36.890 --> 01:00:37.566
Yeah?

01:00:37.566 --> 01:00:39.946
AUDIENCE: Do these invariants
[INAUDIBLE] for i0?

01:00:42.810 --> 01:00:44.460
NANCY LYNCH:
Distance 0 is marked.

01:00:47.090 --> 01:00:52.200
i0 doesn't ever-- I
see what you're saying.

01:00:52.200 --> 01:00:54.330
i0 doesn't have a parent.

01:00:54.330 --> 01:00:56.910
So I guess that we
should say for i

01:00:56.910 --> 01:01:01.200
not equal to i0 in this case.

01:01:01.200 --> 01:01:03.310
So this would be a
process other than i0.

01:01:03.310 --> 01:01:04.886
It would have its
parent defined,

01:01:04.886 --> 01:01:06.010
if and only if it's marked.

01:01:06.010 --> 01:01:09.180
Well as I think
you just noticed,

01:01:09.180 --> 01:01:12.000
the root node is marked but
it doesn't have a parent.

01:01:12.000 --> 01:01:15.240
So it's an exception.

01:01:15.240 --> 01:01:19.500
But this should be,
this doesn't involve i0.

01:01:19.500 --> 01:01:22.777
So the second one, I
can fix that a bit.
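
NOTE
Editorial sketch: the three invariants, with the exception for i0, checked
after every round of a simulated run. All names and the example graph are
assumptions made for illustration, and the graph is assumed to be connected.
This is only a runtime check on one execution, not the inductive proof.
```python
from collections import deque
def distances(adj, root):
    """Hop distance from root to every process, by ordinary sequential BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
def check_invariants(adj, root, marked, parent, r):
    dist = distances(adj, root)
    # 1. After r rounds, exactly the processes at distance at most r are marked.
    assert marked == {u for u in adj if dist[u] <= r}
    for u in adj:
        if u == root:
            continue  # i0 is marked but never has a parent
        # 2. For i != i0: parent is defined if and only if the process is marked.
        assert (u in parent) == (u in marked)
        # 3. A defined parent is a process at distance d - 1 from the root.
        if u in parent:
            assert dist[parent[u]] == dist[u] - 1
def run_and_check(adj, root):
    """Run the synchronous BFS, checking the invariants after each round."""
    marked, parent, senders, r = {root}, {}, [root], 0
    check_invariants(adj, root, marked, parent, r)
    while senders:
        r += 1
        inbox = {}
        for u in senders:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        senders = []
        for v, heard_from in inbox.items():
            if v not in marked:
                marked.add(v)
                parent[v] = min(heard_from)
                senders.append(v)
        check_invariants(adj, root, marked, parent, r)
    return r  # includes the last round, in which every message is ignored
# Example graph (made up for illustration): edges 0-1, 0-2, 1-3, 2-3, 3-4.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(run_and_check(EXAMPLE, 0))
```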

01:01:22.777 --> 01:01:23.860
Other comments, questions?

01:01:27.890 --> 01:01:31.000
So if somebody wanted to
do a formal correctness

01:01:31.000 --> 01:01:33.040
proof of an algorithm
like this one,

01:01:33.040 --> 01:01:34.637
you would use these invariants.

01:01:34.637 --> 01:01:35.720
You prove it by induction.

01:01:35.720 --> 01:01:37.510
In fact there's
quite a few people

01:01:37.510 --> 01:01:43.030
who use interactive theorem
provers to do proofs

01:01:43.030 --> 01:01:46.480
like this because the algorithms
can get pretty complicated,

01:01:46.480 --> 01:01:48.520
with a lot of variables.

01:01:48.520 --> 01:01:50.470
So you have to do
some bookkeeping.

01:01:50.470 --> 01:01:52.460
You keep track of
all these invariants,

01:01:52.460 --> 01:01:55.880
and then you want to prove that
they're all true by induction.

01:01:55.880 --> 01:01:58.440
They all hold through
an inductive step.

01:01:58.440 --> 01:02:00.780
So you can use an
interactive theorem prover

01:02:00.780 --> 01:02:03.540
to help you do the bookkeeping.

01:02:03.540 --> 01:02:06.390
But even a manual proof
in a research paper

01:02:06.390 --> 01:02:08.790
would use invariants
in this style.

01:02:12.000 --> 01:02:14.940
OK complexity.

01:02:14.940 --> 01:02:19.660
So the number of rounds until
everybody outputs their parent

01:02:19.660 --> 01:02:23.440
would be the maximum
distance of any node from v0.

01:02:23.440 --> 01:02:25.780
So we can say that's at most
the diameter of the graph.

01:02:25.780 --> 01:02:26.620
It could be less.

01:02:26.620 --> 01:02:28.030
It's just the maximum distance
maximum distance

01:02:28.030 --> 01:02:31.440
from this particular node.

01:02:31.440 --> 01:02:33.020
Message complexity?

01:02:33.020 --> 01:02:38.140
Well how many messages are
sent in this algorithm?

01:02:38.140 --> 01:02:40.290
So everybody is going
to send messages

01:02:40.290 --> 01:02:43.880
only once on all of its edges.

01:02:43.880 --> 01:02:47.320
So that means all the
edges get a message sent

01:02:47.320 --> 01:02:48.714
in each direction just once.

01:02:48.714 --> 01:02:50.255
So it's order of
the number of edges.
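
NOTE
Editorial sketch that counts the two costs just discussed on a simulated run.
The function name and the example graph are assumptions made for illustration.
Every process sends once on each of its edges, so the message count comes out
to twice the number of edges, and the round count is the maximum distance of
any node from the root.
```python
def bfs_cost(adj, root):
    """Return (rounds until everyone is marked, total search messages sent)."""
    marked = {root}
    senders = [root]
    rounds = 0  # counts only rounds in which some process gets marked
    messages = 0
    while senders:
        newly_marked = set()
        for u in senders:
            for v in adj[u]:
                messages += 1  # u sends one search message on the edge to v
                if v not in marked:
                    newly_marked.add(v)
        if newly_marked:
            rounds += 1
        marked |= newly_marked
        senders = sorted(newly_marked)
    return rounds, messages
# Example graph (made up for illustration): 5 edges, farthest node at distance 3.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_cost(EXAMPLE, 0))
```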

01:02:55.360 --> 01:02:58.720
All right, so we can
play around with this.

01:02:58.720 --> 01:03:01.560
So this algorithm just tells
everybody who his parent is.

01:03:01.560 --> 01:03:03.280
But maybe when you're
finished, you'd

01:03:03.280 --> 01:03:05.460
like to know who your
children are as well.

01:03:08.400 --> 01:03:10.380
For many uses of
these trees, you'd

01:03:10.380 --> 01:03:14.040
like to have a parent be
able to talk to its children

01:03:14.040 --> 01:03:15.250
in the tree.

01:03:15.250 --> 01:03:16.140
So how to do that?

01:03:16.140 --> 01:03:20.080
Well you can add a child
pointer because anybody

01:03:20.080 --> 01:03:22.520
who gets a search message
and selects its parent

01:03:22.520 --> 01:03:24.830
could send back a message
to that parent saying, hey,

01:03:24.830 --> 01:03:26.330
I'm your child.

01:03:26.330 --> 01:03:29.330
And if you get a search message,
and you decide that that's not

01:03:29.330 --> 01:03:31.760
your parent, you can
help that guy out

01:03:31.760 --> 01:03:34.407
by sending a message saying
you're not my parent.

01:03:34.407 --> 01:03:35.990
In the synchronous
case, he would just

01:03:35.990 --> 01:03:37.864
know that, if he didn't
get a parent message.

01:03:37.864 --> 01:03:40.970
But things are going to
get more complicated.

01:03:40.970 --> 01:03:43.517
So we'll send parent
or non-parent responses

01:03:43.517 --> 01:03:44.475
to the search messages.

01:03:49.770 --> 01:03:52.300
Suppose we want to compute
the distances from v0,

01:03:52.300 --> 01:03:53.630
not just who the parents are.

01:03:53.630 --> 01:03:55.310
Well that's easy.

01:03:55.310 --> 01:03:58.190
Everybody can just record
its distances, as well as

01:03:58.190 --> 01:04:01.670
its parent and the mark.

01:04:01.670 --> 01:04:04.670
And then you just include
your own distance value

01:04:04.670 --> 01:04:06.170
in your search message.

01:04:06.170 --> 01:04:09.340
And when somebody
receives a search message,

01:04:09.340 --> 01:04:13.820
it sets its own distance to
the received distance plus 1.

01:04:13.820 --> 01:04:17.750
So we can just keep track
and add one to the distance.

01:04:17.750 --> 01:04:20.380
It's easy to augment
this algorithm

01:04:20.380 --> 01:04:21.630
to get this extra information.
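
NOTE
Editorial sketch of the two augmentations: child pointers learned from
parent or non-parent responses, and distances carried in the search message.
The function name, the graph encoding and the example graph are assumptions
made for illustration, and the responses are modeled by updating the chosen
parent's child set directly rather than by separate messages.
```python
def bfs_with_children_and_distance(adj, root):
    """Return (parent, dist, children) for a connected graph {id: [neighbor ids]}."""
    parent = {root: None}
    dist = {root: 0}  # a process is marked exactly when it has a distance
    children = {u: set() for u in adj}
    senders = [root]
    while senders:
        inbox = {}
        for u in senders:
            for v in adj[u]:
                # The search message carries the sender's own distance.
                inbox.setdefault(v, []).append((u, dist[u]))
        senders = []
        for v, messages in inbox.items():
            if v in dist:
                continue  # already marked: v answers "non-parent" to every sender
            chosen, sender_distance = min(messages)  # smallest ID, an arbitrary choice
            parent[v] = chosen
            dist[v] = sender_distance + 1  # received distance plus 1
            children[chosen].add(v)  # effect of v's "parent" response
            senders.append(v)
    return parent, dist, children
# Example graph (made up for illustration): edges 0-1, 0-2, 1-3, 2-3, 3-4.
EXAMPLE = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_with_children_and_distance(EXAMPLE, 0))
```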

01:04:24.630 --> 01:04:26.380
All right, now how
do the processes know

01:04:26.380 --> 01:04:27.463
when this is all finished?

01:04:30.140 --> 01:04:32.011
So everybody was able
to output parent.

01:04:32.011 --> 01:04:33.010
I know who my parent is.

01:04:33.010 --> 01:04:36.870
But how does anybody
know when the entire tree

01:04:36.870 --> 01:04:39.770
has been produced?

01:04:39.770 --> 01:04:42.820
Not so obvious.

01:04:42.820 --> 01:04:46.270
So in some settings, you
might know an upper bound

01:04:46.270 --> 01:04:48.350
on the depth of the tree.

01:04:48.350 --> 01:04:51.477
And then you could just wait
for that number of rounds.

01:04:51.477 --> 01:04:52.810
But what if you don't know that?

01:04:52.810 --> 01:04:54.476
You don't know anything
about the graph.

01:04:54.476 --> 01:04:57.360
Nobody knows.

01:04:57.360 --> 01:04:59.460
So let's come up
with an algorithm

01:04:59.460 --> 01:05:04.660
for process i0, the root,
to know definitively

01:05:04.660 --> 01:05:07.700
that the tree has been
completely constructed.

01:05:07.700 --> 01:05:08.200
Ideas?

01:05:14.100 --> 01:05:15.890
You're creating this
by search messages.

01:05:15.890 --> 01:05:17.889
How is i0 going to
know when it's done?

01:05:25.872 --> 01:05:26.372
Yeah.

01:05:26.372 --> 01:05:27.869
AUDIENCE: Every time
you mark a node,

01:05:27.869 --> 01:05:29.865
the node can send a
message back to its parent,

01:05:29.865 --> 01:05:30.863
saying hi, I've been marked.

01:05:30.863 --> 01:05:33.154
Then you can probably get
all the way back to the root.

01:05:33.154 --> 01:05:36.601
And then the root can count
the number of-- actually,

01:05:36.601 --> 01:05:37.860
no if the root doesn't--

01:05:37.860 --> 01:05:39.985
NANCY LYNCH: Root doesn't
know the number of nodes.

01:05:39.985 --> 01:05:41.543
So that's a good idea.

01:05:41.543 --> 01:05:43.042
AUDIENCE: If you
don't have a child,

01:05:43.042 --> 01:05:45.507
you can tell your parent
that you don't have a child.

01:05:48.545 --> 01:05:49.920
NANCY LYNCH: That's
a good start.

01:05:49.920 --> 01:05:51.230
Was there another?

01:05:51.230 --> 01:05:51.737
Yeah.

01:05:51.737 --> 01:05:53.153
AUDIENCE: More
generally, you just

01:05:53.153 --> 01:05:55.885
send a signal when you
know your sub-tree is done.

01:05:55.885 --> 01:05:58.010
NANCY LYNCH: When you know
your sub-tree is done,

01:05:58.010 --> 01:06:00.770
so that means you're going
to be communicating something

01:06:00.770 --> 01:06:02.170
up the tree.

01:06:02.170 --> 01:06:05.640
Right, so that's the idea
that you're working toward.

01:06:05.640 --> 01:06:08.980
So a termination
algorithm to inform i0

01:06:08.980 --> 01:06:11.550
when the tree is
completely constructed.

01:06:11.550 --> 01:06:15.080
So let's say that the search
messages get their responses.

01:06:15.080 --> 01:06:17.700
So everybody knows
which nodes are their,

01:06:17.700 --> 01:06:22.290
which neighbors are its
children, and which are not.

01:06:22.290 --> 01:06:24.810
So suppose a node
has gotten responses

01:06:24.810 --> 01:06:30.830
to all of its search messages,
knows who all its children are.

01:06:30.830 --> 01:06:33.260
Now the leaves in
this tree are going

01:06:33.260 --> 01:06:34.730
to know that they're leaves.

01:06:34.730 --> 01:06:37.880
How do they know that?

01:06:37.880 --> 01:06:41.860
Propagating all these search
messages, and I'm a leaf.

01:06:41.860 --> 01:06:43.524
How do I know I'm a leaf?

01:06:43.524 --> 01:06:44.940
AUDIENCE: You can't
have children.

01:06:44.940 --> 01:06:47.410
NANCY LYNCH: Yeah, you send
all these search messages,

01:06:47.410 --> 01:06:51.810
and everybody says, sorry
you're not my parent.

01:06:51.810 --> 01:06:54.390
So you know you have no
children because of the kind

01:06:54.390 --> 01:06:57.140
of responses you get.

01:06:57.140 --> 01:06:58.540
So now we're going
to use what we

01:06:58.540 --> 01:07:01.300
call a convergecast strategy.

01:07:01.300 --> 01:07:03.160
Broadcast is sending things out.

01:07:03.160 --> 01:07:06.320
Convergecast is fanning
in information back

01:07:06.320 --> 01:07:09.560
to the top of the tree.

01:07:09.560 --> 01:07:11.800
So the convergecast
would say, all right,

01:07:11.800 --> 01:07:15.200
so the leaves would send
a message to their parents

01:07:15.200 --> 01:07:18.060
saying they're done.

01:07:18.060 --> 01:07:23.600
Now if I'm some node in
the middle of the tree,

01:07:23.600 --> 01:07:24.780
how do I know I'm done?

01:07:24.780 --> 01:07:28.420
Well it's what you said.

01:07:28.420 --> 01:07:31.750
You know that you
can figure out when

01:07:31.750 --> 01:07:33.990
your entire sub-tree is done.

01:07:33.990 --> 01:07:37.750
Well first of all, you have
to know who your children are.

01:07:37.750 --> 01:07:39.760
It's kind of a
two stage process.

01:07:39.760 --> 01:07:42.610
You have to know who
your children are,

01:07:42.610 --> 01:07:46.530
by having received responses
to all your search messages.

01:07:46.530 --> 01:07:49.410
And you wait to receive
done messages from all

01:07:49.410 --> 01:07:51.247
of your actual children.

01:07:51.247 --> 01:07:53.080
So if I'm sitting in
the middle of the tree,

01:07:53.080 --> 01:07:56.020
and I've got done messages
from all my children,

01:07:56.020 --> 01:07:57.850
I know my whole
sub-tree is done.

01:07:57.850 --> 01:08:02.140
Then I can send the done
message to my parent.

01:08:02.140 --> 01:08:04.180
Got that?

01:08:04.180 --> 01:08:05.870
That's how convergecast works.

01:08:05.870 --> 01:08:09.690
And when it reaches
the top, if i0

01:08:09.690 --> 01:08:12.580
knows who its children are,
and it receives done messages

01:08:12.580 --> 01:08:15.540
from all its children, it
knows the whole tree is done.

01:08:15.540 --> 01:08:20.100
So it can output that the
tree construction is complete.

01:08:20.100 --> 01:08:22.420
And it could tell
the others by sending

01:08:22.420 --> 01:08:25.859
a message down the tree,
so they all know as well.

01:08:25.859 --> 01:08:27.194
Questions?

01:08:27.194 --> 01:08:32.674
AUDIENCE: Wouldn't i0
be the last one to know?

01:08:32.674 --> 01:08:34.090
NANCY LYNCH: He'd
be the last one.

01:08:34.090 --> 01:08:37.390
No, he'd be the first one to
know that the whole tree is

01:08:37.390 --> 01:08:38.670
complete.

01:08:38.670 --> 01:08:41.450
Everybody else knows when
their sub-tree is complete.

01:08:41.450 --> 01:08:45.410
So i0 still has to now send
another message down the tree

01:08:45.410 --> 01:08:48.276
to tell everyone else the
entire tree is complete.

01:08:48.276 --> 01:08:49.359
Is there another question?

01:08:52.289 --> 01:08:53.830
All right so this
isn't showing that.

01:08:53.830 --> 01:08:56.279
This is just showing done
messages, which are actually

01:08:56.279 --> 01:08:58.560
going in the opposite
direction from these edges,

01:08:58.560 --> 01:08:59.390
going up the tree.

01:08:59.390 --> 01:09:02.060
But you can just see
how they propagate up

01:09:02.060 --> 01:09:04.670
until the root says done.

01:09:04.670 --> 01:09:05.180
No big deal.

01:09:08.149 --> 01:09:10.819
Complexity for termination.

01:09:10.819 --> 01:09:14.130
Well it just takes at most
diameter rounds and n messages

01:09:14.130 --> 01:09:16.880
for this done information
to come up to the top,

01:09:16.880 --> 01:09:19.229
once the tree
actually is finished.

01:09:19.229 --> 01:09:21.130
Because now you're
just sending messages

01:09:21.130 --> 01:09:25.130
on the paths in this
tree, which are only,

01:09:25.130 --> 01:09:29.029
at most, diameter in length.

01:09:29.029 --> 01:09:32.920
And then process i0 can just
tell everybody else.

01:09:32.920 --> 01:09:34.540
It doesn't take
very long either.

01:09:37.260 --> 01:09:41.149
Applications, well suppose you
construct a tree like this.

01:09:41.149 --> 01:09:44.460
And process i0 now wants
to use it to communicate.

01:09:44.460 --> 01:09:46.450
It wants to send a
whole batch of messages

01:09:46.450 --> 01:09:47.819
to all the other nodes.

01:09:47.819 --> 01:09:49.990
It can just send
them now on the tree.

01:09:49.990 --> 01:09:52.790
It's an easy way to
make sure messages reach

01:09:52.790 --> 01:09:54.240
everybody else in the network.

01:09:54.240 --> 01:09:57.310
Just send them on the edges
of the breadth-first spanning

01:09:57.310 --> 01:09:59.580
tree.

01:09:59.580 --> 01:10:03.610
So now the messages,
each individual message

01:10:03.610 --> 01:10:07.370
takes at most n
message instances

01:10:07.370 --> 01:10:09.650
along the edges of the
tree, because you only have

01:10:09.650 --> 01:10:11.570
to traverse the tree edges.

01:10:11.570 --> 01:10:15.920
No more dependence on the total
number of edges in the network.

01:10:15.920 --> 01:10:19.000
And in fact, you can
save time by pipelining

01:10:19.000 --> 01:10:20.280
a series of messages.

01:10:20.280 --> 01:10:23.410
So you can send them one
round after the other.
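
NOTE
Editor's sketch of why pipelining saves time. It assumes a simple
model that is not spelled out in the lecture: one hop per round, the
root injects one new message per round, and only a single
root-to-leaf path of the given depth (at least 1) is simulated.
```python
# Hypothetical model: function name and parameters are assumptions for illustration.
def pipelined_rounds(depth: int, num_messages: int) -> int:
    """Simulate a root-to-leaf path; return the round in which the last message reaches the leaf."""
    position = {}  # message index mapped to its current depth (the root is depth 0)
    delivered = 0
    rounds = 0
    while delivered < num_messages:
        rounds += 1
        for m in list(position):
            position[m] += 1  # each in-flight message advances one hop this round
        if rounds <= num_messages:
            position[rounds - 1] = 1  # the root sends the next message to depth 1
        for m in [m for m, d in position.items() if d == depth]:
            del position[m]  # message m has reached the leaf
            delivered += 1
    return rounds
#
print(pipelined_rounds(4, 10))
```
Under these assumptions the count comes out to num_messages + depth - 1
rounds, rather than num_messages * depth if each message had to finish
before the next one started.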

01:10:28.180 --> 01:10:31.740
The other way, suppose you want
to compute something globally.

01:10:31.740 --> 01:10:34.870
Suppose everybody starts
with some initial value.

01:10:34.870 --> 01:10:38.590
And process i0 is going
to try to determine

01:10:38.590 --> 01:10:42.650
the value of some function
of everybody's initial value,

01:10:42.650 --> 01:10:46.530
like the minimum or maximum
or the sum or anything.

01:10:46.530 --> 01:10:48.990
Well you can do this
while convergecasting

01:10:48.990 --> 01:10:52.910
on an already built BFS tree.

01:10:52.910 --> 01:10:56.470
So everybody can just send
their information up the tree,

01:10:56.470 --> 01:10:58.290
and i0 can collect it all.

01:10:58.290 --> 01:11:00.933
In general, you
can accumulate, you

01:11:00.933 --> 01:11:04.610
can do data aggregation as you
go up the paths of the tree.

01:11:04.610 --> 01:11:09.910
So the message size
doesn't blow up.

01:11:09.910 --> 01:11:13.520
So if you want, for example,
the sum of everybody's values,

01:11:13.520 --> 01:11:16.260
everybody just sends their
values up in a convergecast.

01:11:16.260 --> 01:11:18.890
And each node computes
the sum of all the values

01:11:18.890 --> 01:11:21.550
in its sub-tree.
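
NOTE
Editor's sketch of the sum aggregation just described. The tree, the
initial values, and the function name are assumptions; the recursion
stands in for messages travelling up an already built BFS tree.
```python
# Hypothetical example: each call models the single number a node sends to its parent.
def subtree_sum(children: dict[str, list[str]], value: dict[str, int], node: str) -> int:
    """Own value plus the partial sums received from each child."""
    return value[node] + sum(subtree_sum(children, value, c) for c in children[node])
#
tree = {"i0": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}
values = {"i0": 5, "a": 1, "b": 7, "c": 2, "d": 4}
total = subtree_sum(tree, values, "i0")
print(total)
```
Each node forwards one number rather than a list of values, which is
why the message size doesn't blow up.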

01:11:21.550 --> 01:11:24.100
So this is pretty efficient.

01:11:24.100 --> 01:11:26.722
Make sense?

01:11:26.722 --> 01:11:27.680
I'm going to skip this.

01:11:27.680 --> 01:11:30.110
But you could do leader
election in a general graph,

01:11:30.110 --> 01:11:32.470
if you don't already
have a leader i0,

01:11:32.470 --> 01:11:35.550
by having everybody
run a breadth-first search

01:11:35.550 --> 01:11:36.160
in parallel.

01:11:36.160 --> 01:11:37.359
But we'll skip that.

01:11:37.359 --> 01:11:39.400
Because I just wanted to
have a couple of minutes

01:11:39.400 --> 01:11:43.900
to start the last topic, and
we'll pick it up next time.

01:11:43.900 --> 01:11:47.060
So it's the obvious extension.

01:11:47.060 --> 01:11:49.350
Instead of just
breadth-first search trees,

01:11:49.350 --> 01:11:51.810
let's put weights
on the edges and try

01:11:51.810 --> 01:11:57.170
to compute shortest paths trees
in terms of the total weight

01:11:57.170 --> 01:11:58.631
of the path.

01:12:01.560 --> 01:12:04.350
So we're going to add weights.

01:12:04.350 --> 01:12:05.440
It's an undirected graph.

01:12:05.440 --> 01:12:08.160
So it's just a weight
for each undirected edge.

01:12:11.290 --> 01:12:19.160
I'll still have a starting
node, vertex v0 with process i0.

01:12:19.160 --> 01:12:22.110
Still have unique identifiers.

01:12:22.110 --> 01:12:24.670
And I'll assume the processes
know who their neighbors are.

01:12:24.670 --> 01:12:27.270
And they know the
weights of the incident

01:12:27.270 --> 01:12:29.659
edges, their adjacent edges.

01:12:29.659 --> 01:12:31.200
But otherwise they
don't need to know

01:12:31.200 --> 01:12:34.590
anything else about the graph.

01:12:34.590 --> 01:12:36.890
So again, this is
a familiar problem.

01:12:36.890 --> 01:12:38.990
But we're looking at it
in a very different way,

01:12:38.990 --> 01:12:40.160
by distributing it.

01:12:43.360 --> 01:12:47.640
So the processes are supposed to
compute a shortest paths tree,

01:12:47.640 --> 01:12:49.960
in the sense that
everybody should

01:12:49.960 --> 01:12:52.050
output its parent in the tree.

01:12:52.050 --> 01:12:55.440
And let's say they output
the distance as well,

01:12:55.440 --> 01:12:58.680
the weighted distance
from the root node.

01:13:03.540 --> 01:13:06.920
So this is called
Bellman-Ford's algorithm.

01:13:06.920 --> 01:13:11.970
Again it's got the same name
in the distributed setting.

01:13:11.970 --> 01:13:13.870
The Bellman-Ford
shortest paths algorithm.

01:13:17.230 --> 01:13:20.710
So everybody is keeping
track of their current best

01:13:20.710 --> 01:13:23.630
distance that they
know, and their parent.

01:13:23.630 --> 01:13:27.270
And they know their
unique identifier.

01:13:27.270 --> 01:13:29.040
And here's how the
algorithm works.

01:13:29.040 --> 01:13:31.170
This will look
familiar from when

01:13:31.170 --> 01:13:34.130
you had Bellman-Ford earlier.

01:13:34.130 --> 01:13:37.650
At every round,
everybody is going

01:13:37.650 --> 01:13:40.752
to send its distance
to its neighbors.

01:13:40.752 --> 01:13:42.460
Instead of just sending
a search message,

01:13:42.460 --> 01:13:46.450
now it will send its actual
distance information.

01:13:46.450 --> 01:13:50.720
And you receive the messages
from your neighbors.

01:13:50.720 --> 01:13:55.240
And now you do a relaxation
step, as you've seen before.

01:13:55.240 --> 01:13:56.990
You look at the current
distance you have.

01:13:56.990 --> 01:13:59.610
And you see if you've
gotten a new distance

01:13:59.610 --> 01:14:03.650
from a neighbor, such that if
you add the new distance you

01:14:03.650 --> 01:14:06.600
receive to the weight of
the edge between yourself

01:14:06.600 --> 01:14:08.690
and that neighbor, you
get something better

01:14:08.690 --> 01:14:10.350
than what you had before.

01:14:10.350 --> 01:14:14.220
If you get that, then you're
going to improve your distance.

01:14:14.220 --> 01:14:16.170
And if you improve
your distance,

01:14:16.170 --> 01:14:19.070
then you're going
to reset your parent

01:14:19.070 --> 01:14:24.720
to the sender of this new,
better distance information.

01:14:24.720 --> 01:14:26.610
So does this
algorithm make sense?

01:14:26.610 --> 01:14:28.580
It's like what you saw before.

01:14:28.580 --> 01:14:32.470
But there's no running
through all the nodes.

01:14:32.470 --> 01:14:34.310
Each node is doing
its own thing.

01:14:34.310 --> 01:14:37.040
It's waiting to get better
distance information

01:14:37.040 --> 01:14:39.100
and re-computing.

01:14:39.100 --> 01:14:41.750
And then it's going to
be sending out its better

01:14:41.750 --> 01:14:43.316
information at the next round.
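
NOTE
Editor's sketch of the synchronous rounds described above, simulated
in one program. The three-node graph, node names, and function name
are assumptions for illustration; only the send, receive, and relax
pattern comes from the lecture.
```python
import math
# Hypothetical simulation: every node sends its estimate, then relaxes on what it receives.
def sync_bellman_ford(weights: dict[tuple[str, str], int], source: str, rounds: int):
    """Return (dist, parent) after the given number of synchronous rounds."""
    nbrs: dict[str, dict[str, int]] = {}
    for (u, v), w in weights.items():
        nbrs.setdefault(u, {})[v] = w  # undirected edge: record both directions
        nbrs.setdefault(v, {})[u] = w
    dist = {u: math.inf for u in nbrs}
    parent = {u: None for u in nbrs}
    dist[source] = 0
    for _ in range(rounds):
        sent = dict(dist)  # estimates held at the start of the round are what gets sent
        for u in nbrs:
            for v, w in nbrs[u].items():
                if sent[v] + w < dist[u]:  # relaxation step
                    dist[u] = sent[v] + w
                    parent[u] = v  # the sender of the better distance becomes the parent
    return dist, parent
#
graph = {("v0", "a"): 16, ("v0", "b"): 1, ("b", "a"): 14}
d1, p1 = sync_bellman_ford(graph, "v0", 1)
d2, p2 = sync_bellman_ford(graph, "v0", 2)
print(d1, d2)
```
In this small graph, node a first estimates its distance over the
direct edge, then improves it in the next round when a two-hop path
through b turns out to be lighter, and its parent switches to b.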

01:14:46.100 --> 01:14:46.710
Question?

01:14:46.710 --> 01:14:49.660
So this is kind of a jump
in the way of thinking.

01:14:54.060 --> 01:14:56.990
All right, so now I'm just
going to end basically

01:14:56.990 --> 01:14:59.560
with an animation that'll
show you the kinds of things

01:14:59.560 --> 01:15:01.930
that happen here.

01:15:01.930 --> 01:15:07.100
All right so you start
out with the initial node.

01:15:07.100 --> 01:15:10.590
And what's recorded in the
circle is the best distances.

01:15:10.590 --> 01:15:14.522
The rest of these, the best
distance they know is infinity.

01:15:14.522 --> 01:15:15.480
So I didn't write that.

01:15:15.480 --> 01:15:23.470
So this guy knows 0. After one
round, he sent two messages.

01:15:23.470 --> 01:15:25.920
The best distance each
of these guys knows

01:15:25.920 --> 01:15:30.360
is just the weight of the
edge between v0 and itself.

01:15:30.360 --> 01:15:33.410
So this guy's now estimating
its distance at 16

01:15:33.410 --> 01:15:36.080
and this guy at 1.

01:15:36.080 --> 01:15:38.930
16 is not very good, because there are
actually very roundabout routes

01:15:38.930 --> 01:15:40.070
that can get there.

01:15:40.070 --> 01:15:45.310
But it's going to take us some
time to make that adjustment.

01:15:45.310 --> 01:15:50.240
After two rounds, everybody
is sending their distance

01:15:50.240 --> 01:15:50.880
information.

01:15:50.880 --> 01:15:54.700
But now we get a
correction here.

01:15:54.700 --> 01:15:57.110
This used to say 16.

01:15:57.110 --> 01:15:59.710
But now we have a
two hop path that

01:15:59.710 --> 01:16:02.170
gives you a better distance.

01:16:02.170 --> 01:16:04.000
So you get the 1 plus the 14.

01:16:04.000 --> 01:16:08.850
So he's going to hear
about the distance of 15

01:16:08.850 --> 01:16:11.390
as a result of what 1 sends.

01:16:11.390 --> 01:16:16.740
And some new guys get their
distances calculated.

01:16:16.740 --> 01:16:21.680
And then after three rounds, it
gets a little bit complicated.

01:16:21.680 --> 01:16:24.910
So maybe I'm just going to
flip through it quickly and let

01:16:24.910 --> 01:16:26.500
you study later.

01:16:26.500 --> 01:16:29.340
But you see that you keep
getting improvements,

01:16:29.340 --> 01:16:32.390
as you perform relaxation steps.

01:16:32.390 --> 01:16:36.270
As information gets to
somebody by better paths that

01:16:36.270 --> 01:16:38.680
happen to have
more hops, they're

01:16:38.680 --> 01:16:40.560
going to be reducing
their estimates.

01:16:40.560 --> 01:16:44.640
I'm going to flip, and you
see that this guy's estimate

01:16:44.640 --> 01:16:47.050
is going down.

01:16:47.050 --> 01:16:49.920
And in the end, after
eight rounds of this,

01:16:49.920 --> 01:16:52.180
you end up with a
very roundabout path

01:16:52.180 --> 01:16:56.430
that actually gives this
guy a much better estimate.

01:16:56.430 --> 01:16:58.640
So you can see how that works.

01:17:01.190 --> 01:17:03.660
So the claim is that
eventually, every process

01:17:03.660 --> 01:17:08.270
will have its distance being
a correct minimum weight

01:17:08.270 --> 01:17:12.710
of the path, and its
parent will be correct.

01:17:12.710 --> 01:17:14.710
I think maybe this is
a good place to stop.

01:17:14.710 --> 01:17:17.810
We'll pick up with this
algorithm and its analysis.

01:17:17.810 --> 01:17:19.910
Most of next time
is going to be spent

01:17:19.910 --> 01:17:22.440
on asynchronous
algorithms, which

01:17:22.440 --> 01:17:25.560
is a whole other
level of complication.

01:17:25.560 --> 01:17:27.820
So I'll see you on Thursday.