WEBVTT

00:00:00.080 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.019
Commons license.

00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.730
continue to offer high quality,
educational resources for free.

00:00:10.730 --> 00:00:13.340
To make a donation or
view additional materials

00:00:13.340 --> 00:00:17.236
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.236 --> 00:00:17.861
at ocw.mit.edu.

00:00:20.900 --> 00:00:22.400
PROFESSOR: Today,
what we want to do

00:00:22.400 --> 00:00:25.720
is talk about something at a
much higher scale than what

00:00:25.720 --> 00:00:28.100
we've thought about through
most of this semester.

00:00:28.100 --> 00:00:30.642
And that's probably by design.

00:00:30.642 --> 00:00:32.100
Over the course of
the semester, we

00:00:32.100 --> 00:00:35.300
started with kind
of enzyme kinetics

00:00:35.300 --> 00:00:37.220
or molecular binding
kind of events,

00:00:37.220 --> 00:00:40.680
and we slowly built our way up
to larger and larger scales.

00:00:40.680 --> 00:00:43.110
Now there's always this
question about whether we're

00:00:43.110 --> 00:00:46.820
claiming that we really
understand how the higher

00:00:46.820 --> 00:00:49.974
levels of organization
result from the lower level

00:00:49.974 --> 00:00:50.515
interactions.

00:00:50.515 --> 00:00:54.070
And I'd say, we definitely
don't understand all of it.

00:00:54.070 --> 00:00:56.990
So you shouldn't come away
with that as the notion.

00:00:56.990 --> 00:00:59.100
But at least one
thing that I think

00:00:59.100 --> 00:01:01.690
is fascinating about this
area of systems biology

00:01:01.690 --> 00:01:05.129
is that much of the framework
that we use to understand,

00:01:05.129 --> 00:01:08.220
let's say, molecular scale
interactions or stochastic gene

00:01:08.220 --> 00:01:11.420
expression, so these dynamics
at the smaller scale,

00:01:11.420 --> 00:01:14.304
much of those ideas and
such certainly transport up

00:01:14.304 --> 00:01:16.470
to these higher scales or
translate up to the higher

00:01:16.470 --> 00:01:17.990
scales, where, in
this case, we're

00:01:17.990 --> 00:01:19.900
using kind of master
equation type formulas

00:01:19.900 --> 00:01:22.314
to try to understand
relative species abundance.

00:01:22.314 --> 00:01:23.730
And so I think
part of what I like

00:01:23.730 --> 00:01:28.420
about this topic of neutral
theory versus niche theory

00:01:28.420 --> 00:01:30.780
and so forth in ecology
is that you can just

00:01:30.780 --> 00:01:34.530
see how very, very similar
ideas, that we applied

00:01:34.530 --> 00:01:36.400
for studying stochastic
gene expression,

00:01:36.400 --> 00:01:38.358
can also be used to try
to understand why it is

00:01:38.358 --> 00:01:41.800
that some species are more
common than others when you go

00:01:41.800 --> 00:01:46.610
and you count them, in this
case, on an island in Panama.

00:01:46.610 --> 00:01:50.230
Now, the subject
is, by its nature,

00:01:50.230 --> 00:01:52.840
less experimentally
focused than much

00:01:52.840 --> 00:01:54.840
of what we've done over
the course of the semester.

00:01:54.840 --> 00:01:56.680
And this is really
a topic that tends

00:01:56.680 --> 00:02:00.200
to be a combination
of mathematical theory

00:02:00.200 --> 00:02:04.680
with kind of careful counting of
species in some different areas

00:02:04.680 --> 00:02:07.199
and trying to understand
what that means.

00:02:07.199 --> 00:02:08.740
But it's an area
that there have been

00:02:08.740 --> 00:02:11.880
a number of physicists involved
in over the last 10 years.

00:02:11.880 --> 00:02:13.690
And I think that
it's fascinating,

00:02:13.690 --> 00:02:17.010
because it does get to the
heart of what we are looking

00:02:17.010 --> 00:02:19.530
for from a theory,
what kind of evidence

00:02:19.530 --> 00:02:23.040
do we use to support a
theory or to refute it.

00:02:23.040 --> 00:02:27.272
So I think there are a lot of
very basic issues about science

00:02:27.272 --> 00:02:28.730
that come up when
we start thinking

00:02:28.730 --> 00:02:31.630
about this question of
neutral theory in ecology.

00:02:31.630 --> 00:02:34.832
And since it's, for many
of us, a totally new area

00:02:34.832 --> 00:02:36.290
that we don't know
very much about,

00:02:36.290 --> 00:02:40.570
you can come to it
with maybe fresh eyes.

00:02:40.570 --> 00:02:42.320
And you don't have the
same preconceptions

00:02:42.320 --> 00:02:44.981
that you would have for many
other models that you might

00:02:44.981 --> 00:02:47.230
be more familiar with in the
context of molecular cell

00:02:47.230 --> 00:02:47.730
biology.

00:02:50.410 --> 00:02:53.640
So the basic question
that we're going

00:02:53.640 --> 00:02:56.610
to try to talk
about today is just

00:02:56.610 --> 00:02:59.950
the question of why is it that,
when you look out at the world,

00:02:59.950 --> 00:03:02.510
you see that there
are some species that

00:03:02.510 --> 00:03:06.850
seem to be abundant and
some that seem to be rare?

00:03:06.850 --> 00:03:10.350
Are there other patterns
that are somehow universal?

00:03:10.350 --> 00:03:14.010
And what kind of sort
of lower scale processes

00:03:14.010 --> 00:03:17.390
might lead to the
patterns that we observe?

00:03:17.390 --> 00:03:22.220
And I think that this paper
that we read is-- I mean,

00:03:22.220 --> 00:03:24.510
it's not that it's.

00:03:24.510 --> 00:03:30.930
Well, can somebody say what the
actual scientific contribution

00:03:30.930 --> 00:03:33.801
of this paper was?

00:03:33.801 --> 00:03:34.300
Yes?

00:03:34.300 --> 00:03:35.330
AUDIENCE: They
did a calculation.

00:03:35.330 --> 00:03:36.940
PROFESSOR: They
did a calculation.

00:03:36.940 --> 00:03:39.350
But it's a little bit
more specific than that.

00:03:39.350 --> 00:03:39.850
What is it?

00:03:39.850 --> 00:03:43.160
AUDIENCE: They came up with
the closed form equation?

00:03:43.160 --> 00:03:44.160
PROFESSOR: That's right.

00:03:44.160 --> 00:03:47.230
Basically, there was a model of
this neutral theory in ecology

00:03:47.230 --> 00:03:49.920
that we're going to explain
or try to understand.

00:03:49.920 --> 00:03:51.680
You can simulate the
model, but then there

00:03:51.680 --> 00:03:54.700
are possible issues associated
with convergence or something

00:03:54.700 --> 00:03:55.410
like that.

00:03:55.410 --> 00:03:56.826
Although it's hard
to believe that

00:03:56.826 --> 00:03:58.200
that's really such a concern.

00:03:58.200 --> 00:03:59.970
But you can simulate that model.

00:03:59.970 --> 00:04:01.680
What they did is
they just showed

00:04:01.680 --> 00:04:05.919
that you could get an analytic-y
kind of expression for it.

00:04:05.919 --> 00:04:07.460
It's not a super
analytic expression,

00:04:07.460 --> 00:04:09.920
but, at least, it's not
a straight up simulation.

00:04:09.920 --> 00:04:11.892
You kind of numerically
do something,

00:04:11.892 --> 00:04:13.600
integrate something,
as compared to doing

00:04:13.600 --> 00:04:14.683
the stochastic simulation.

00:04:17.844 --> 00:04:19.510
So it's not that that,
in and of itself,

00:04:19.510 --> 00:04:23.469
is what you feel like--
it's not what we necessarily

00:04:23.469 --> 00:04:24.260
care so much about.

00:04:24.260 --> 00:04:28.880
But I think that it's still
just a nice, short description

00:04:28.880 --> 00:04:31.681
of the model and the
assumptions that go into it.

00:04:31.681 --> 00:04:33.180
And you get a little
bit of a window

00:04:33.180 --> 00:04:34.990
into the debate that's
going on between these two

00:04:34.990 --> 00:04:36.910
communities of kind
of the neutral theory

00:04:36.910 --> 00:04:39.780
guys and the niche
theory community.

00:04:45.650 --> 00:04:48.080
So there's only one
figure in this paper.

00:04:48.080 --> 00:04:50.880
And it's an example
of the kind of data

00:04:50.880 --> 00:04:53.110
that we want to
try to understand.

00:04:53.110 --> 00:04:55.330
So there's a particular
pattern in terms

00:04:55.330 --> 00:04:57.400
of the relative
species abundance.

00:04:57.400 --> 00:05:00.150
And we want to understand
what kind of models

00:05:00.150 --> 00:05:03.280
might lead to that
observed pattern.

00:05:03.280 --> 00:05:05.940
But given that there's just
one figure in the paper,

00:05:05.940 --> 00:05:07.640
we have to make sure
that we understand

00:05:07.640 --> 00:05:09.560
exactly what is being plotted.

00:05:09.560 --> 00:05:12.210
And what I've found
from experience--

00:05:12.210 --> 00:05:17.000
and, actually, even the
answer to the email question

00:05:17.000 --> 00:05:19.514
that was sent out, I
think, was incorrect on one

00:05:19.514 --> 00:05:20.180
of these things.

00:05:20.180 --> 00:05:21.880
So we'll talk about
that some more.

00:05:21.880 --> 00:05:24.587
So beware.

00:05:24.587 --> 00:05:25.420
We'll figure it out.

00:05:25.420 --> 00:05:28.040
But I think it's actually
surprisingly tricky

00:05:28.040 --> 00:05:32.560
to understand what
this figure is saying.

00:05:32.560 --> 00:05:36.020
But first of all, can
somebody describe not

00:05:36.020 --> 00:05:38.620
what the figure is
saying but just what

00:05:38.620 --> 00:05:40.160
the data is supposed to be?

00:05:56.550 --> 00:05:59.640
Where do they get the data?

00:05:59.640 --> 00:06:02.896
Anything that's useful?

00:06:02.896 --> 00:06:05.692
AUDIENCE: They were on
an island ecosystem.

00:06:05.692 --> 00:06:06.900
PROFESSOR: There's an island.

00:06:06.900 --> 00:06:12.752
It's called BCI,
Barro Colorado Island.

00:06:12.752 --> 00:06:13.668
AUDIENCE: [INAUDIBLE].

00:06:25.080 --> 00:06:27.015
PROFESSOR: So it's
a 50 hectare plot.

00:06:31.380 --> 00:06:33.504
Does anybody know
what a hectare is?

00:06:33.504 --> 00:06:35.420
AUDIENCE: It's a lot
more than a square meter.

00:06:35.420 --> 00:06:39.550
PROFESSOR: It's a lot more than
a square meter, yes, indeed.

00:06:39.550 --> 00:06:40.200
Yeah.

00:06:40.200 --> 00:06:44.921
Is this an English
unit of measure?

00:06:44.921 --> 00:06:46.920
This is the kind of thing
that I have to Google.

00:06:46.920 --> 00:06:50.970
But it's one hectare is equal
to 10 to the 4 meters squared.

00:06:54.032 --> 00:06:55.365
That's a good thing to memorize.

00:06:59.041 --> 00:06:59.540
I

00:06:59.540 --> 00:07:02.180
AUDIENCE: Exactly
or approximate?

00:07:02.180 --> 00:07:03.430
PROFESSOR: I think it's exact.

00:07:03.430 --> 00:07:04.680
I think it's exact.

00:07:04.680 --> 00:07:06.805
AUDIENCE: Then
it's a metric unit.

00:07:06.805 --> 00:07:08.930
PROFESSOR: Yeah, so apparently
it is a metric unit.

00:07:08.930 --> 00:07:12.150
So the idea is that if you take
a 100 meters by 100 meters,

00:07:12.150 --> 00:07:13.070
this is a hectare.

00:07:13.070 --> 00:07:15.080
And there's 50 of them.

00:07:15.080 --> 00:07:17.435
It's about half
a square kilometer

00:07:17.435 --> 00:07:21.110
to give you a sense of
what we're talking about.
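
NOTE
A quick check of the plot-size arithmetic above, as a minimal Python sketch. The variable names are illustrative and not from the lecture.
```python
HECTARE_M2 = 100 * 100          # one hectare is 100 m by 100 m
plot_m2 = 50 * HECTARE_M2       # the 50 hectare BCI plot, in square meters
plot_km2 = plot_m2 / 1_000_000  # 1 km^2 is 10^6 m^2
print(plot_km2)
```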

00:07:21.110 --> 00:07:23.331
And what do they
do on this plot?

00:07:23.331 --> 00:07:28.930
AUDIENCE: They count a certain
number as canopy trees.

00:07:28.930 --> 00:07:32.209
So the trees that
are, like, really big.

00:07:32.209 --> 00:07:34.500
PROFESSOR: And how do they
decide which trees to count?

00:07:34.500 --> 00:07:36.190
Did they count every tree?

00:07:36.190 --> 00:07:40.890
AUDIENCE: No, just the ones
that like formed the top layer.

00:07:40.890 --> 00:07:44.987
PROFESSOR: I think that the
way that they decide-- OK.

00:07:44.987 --> 00:07:47.070
Does anybody remember how
many trees were counted?

00:07:50.283 --> 00:07:51.660
AUDIENCE: [INAUDIBLE].

00:07:51.660 --> 00:08:06.390
PROFESSOR: So there are 21,457
trees in this 50 hectare plot.

00:08:06.390 --> 00:08:10.770
They identify the species for
each one of these 21,000 trees.

00:08:10.770 --> 00:08:11.670
And they assign them.

00:08:11.670 --> 00:08:14.610
And they found that there
were 225 distinct species.

00:08:19.590 --> 00:08:23.552
So this is really quite
an amazing data set.

00:08:23.552 --> 00:08:27.960
Because I can tell you that I
would not be able to do this.

00:08:31.080 --> 00:08:35.559
These were highly
skilled biologists

00:08:35.559 --> 00:08:38.880
that can distinguish 225.

00:08:38.880 --> 00:08:40.834
If they can identify
these 225, that

00:08:40.834 --> 00:08:43.250
means they have to be able to
identify other ones as well.

00:08:43.250 --> 00:08:45.340
And they did it
for 20,000 trees.

00:08:45.340 --> 00:08:47.620
And indeed, Barro
Colorado Island

00:08:47.620 --> 00:08:54.560
is one of the major Smithsonian
research institutes,

00:08:54.560 --> 00:08:56.032
where they've been tracking.

00:08:56.032 --> 00:08:57.740
They do this like
every five years or so,

00:08:57.740 --> 00:09:00.630
where they do a census, where
they count all of the trees.

00:09:00.630 --> 00:09:05.929
And they're also tracking many
other-- it's not just trees.

00:09:05.929 --> 00:09:07.220
They're doing everything there.

00:09:07.220 --> 00:09:08.512
AUDIENCE: Is it only plants?

00:09:08.512 --> 00:09:09.470
PROFESSOR: What's that?

00:09:09.470 --> 00:09:10.890
AUDIENCE: Is it only plants?

00:09:10.890 --> 00:09:11.473
PROFESSOR: No.

00:09:16.090 --> 00:09:18.285
So actually, I
visited BCI, and it

00:09:18.285 --> 00:09:20.410
seemed like they were
studying all sorts of things.

00:09:20.410 --> 00:09:22.760
And there were nice
looking birds there.

00:09:22.760 --> 00:09:24.589
AUDIENCE: No, I
mean in this census.

00:09:24.589 --> 00:09:26.380
PROFESSOR: In this
census, it's only trees.

00:09:26.380 --> 00:09:32.150
And the way that they decide
which of the trees to do,

00:09:32.150 --> 00:09:36.880
it's the ones that are more
than 10 centimeters DBH.

00:09:36.880 --> 00:09:38.940
Can anybody guess
what DBH might mean?

00:09:50.050 --> 00:09:52.240
It's actually diameter
at breast height.

00:10:00.908 --> 00:10:05.925
So what they do is they walk
up to the tree with a ruler,

00:10:05.925 --> 00:10:07.800
and then, if it's larger
than 10 centimeters,

00:10:07.800 --> 00:10:11.020
then they count it.

00:10:11.020 --> 00:10:13.200
You need to have some
threshold at the lower end,

00:10:13.200 --> 00:10:17.140
otherwise you're
in trouble, right?

00:10:17.140 --> 00:10:20.131
And there were plenty of trees
that satisfied this requirement

00:10:20.131 --> 00:10:20.630
here.

00:10:27.380 --> 00:10:30.420
Then what they do, for
all of these trees,

00:10:30.420 --> 00:10:34.295
it's assigned to some species.

00:10:38.020 --> 00:10:42.350
The basic goal of this
branch of biology or ecology

00:10:42.350 --> 00:10:46.817
is to try to
understand the pattern,

00:10:46.817 --> 00:10:48.650
from this sort of data,
where it comes from.

00:10:48.650 --> 00:10:51.170
Or first describe
it, and then once you

00:10:51.170 --> 00:10:52.950
have a description
of it, then you

00:10:52.950 --> 00:10:56.377
can try to understand what
microscale processes might

00:10:56.377 --> 00:10:57.210
lead to the pattern.

00:10:57.210 --> 00:11:00.030
And the pattern is what's
plotted in figure 1.

00:11:00.030 --> 00:11:01.610
It's the only
figure in the paper.

00:11:01.610 --> 00:11:05.660
I have reconstructed
a rough version of it,

00:11:05.660 --> 00:11:07.120
here, for you on the board.

00:11:07.120 --> 00:11:08.960
But if you want a
more accurate version,

00:11:08.960 --> 00:11:12.600
you can look at your paper.

00:11:12.600 --> 00:11:15.120
Now, we want to make
sure that we understand

00:11:15.120 --> 00:11:16.430
what the figure is saying.

00:11:16.430 --> 00:11:21.880
So we will ask the
following question.

00:11:21.880 --> 00:11:23.860
What is the most common
number of individuals

00:11:23.860 --> 00:11:26.066
for a species in this data set?

00:11:30.040 --> 00:11:39.660
The most common/frequent number
of individuals for a species

00:11:39.660 --> 00:11:49.340
to have in this data set.

00:11:49.340 --> 00:11:51.594
Now, it's maybe
worth just saying

00:11:51.594 --> 00:11:52.760
something a little bit more.

00:11:52.760 --> 00:11:54.710
So you notice that
they were not trying

00:11:54.710 --> 00:11:58.780
to count the total number
of species, altogether.

00:11:58.780 --> 00:12:01.426
And in general, all of this
field of relative species

00:12:01.426 --> 00:12:03.467
abundance, to try to
understand them, what you do

00:12:03.467 --> 00:12:05.792
is typically take
one trophic level.

00:12:05.792 --> 00:12:07.250
So some of the
classic studies were

00:12:07.250 --> 00:12:11.000
of beetles in the Thames River.

00:12:11.000 --> 00:12:13.050
The idea is that it's
some set of species

00:12:13.050 --> 00:12:14.850
that you think are
going to be interacting,

00:12:14.850 --> 00:12:16.350
maybe competing,
with each other,

00:12:16.350 --> 00:12:17.390
in some way, in the
sense that they're

00:12:17.390 --> 00:12:20.180
maybe eating related things and
being eaten by related things.

00:12:20.180 --> 00:12:24.070
And so in this case,
these are the trees

00:12:24.070 --> 00:12:25.230
in Barro Colorado Island.

00:12:25.230 --> 00:12:30.537
And you can imagine
that this is useful.

00:12:30.537 --> 00:12:32.620
The fact that it's trees
instead of something else

00:12:32.620 --> 00:12:35.251
means that you can actually
track the individuals

00:12:35.251 --> 00:12:35.750
over time.

00:12:35.750 --> 00:12:37.166
And when you go
to the island what

00:12:37.166 --> 00:12:40.929
you see is that all the trees,
they're wrapped by some tag.

00:12:40.929 --> 00:12:42.470
And presumably, they
have some system

00:12:42.470 --> 00:12:45.590
to tell you which species
that is so that they

00:12:45.590 --> 00:12:47.880
keep records of everything.

00:12:47.880 --> 00:12:53.870
But the question is, what's
the most common number

00:12:53.870 --> 00:12:55.790
of individuals for
species in the data set?

00:12:55.790 --> 00:12:57.506
Do you understand what
I'm trying to ask?

00:13:03.960 --> 00:13:07.120
And we're going to
approximate, so we'll say.

00:13:20.230 --> 00:13:21.640
Or this, can't determine.

00:13:28.530 --> 00:13:32.600
We want to know, what is the
mode of this distribution

00:13:32.600 --> 00:13:36.360
of the number of individuals
for each of these species?

00:13:39.140 --> 00:13:41.540
Do you understand the question?

00:13:41.540 --> 00:13:45.220
I'm going to give you 20
seconds to look at this.

00:13:55.659 --> 00:13:58.110
AUDIENCE: Should we just
hold a blank piece of paper?

00:13:58.110 --> 00:14:00.516
PROFESSOR: Oh, we
don't have our-- ah.

00:14:00.516 --> 00:14:02.310
AUDIENCE: [INAUDIBLE]?

00:14:02.310 --> 00:14:05.530
PROFESSOR: You know, the
TA always lets me down.

00:14:05.530 --> 00:14:08.310
All right, yeah.

00:14:08.310 --> 00:14:13.634
So you can do A, B,
C, D, E. Are we ready?

00:14:13.634 --> 00:14:14.550
AUDIENCE: [INAUDIBLE]?

00:14:21.570 --> 00:14:23.680
PROFESSOR: You can just
do this if you're not.

00:14:23.680 --> 00:14:26.100
But given this was the
only figure in the paper,

00:14:26.100 --> 00:14:28.470
and that this is a basic
property of the distribution,

00:14:28.470 --> 00:14:31.407
I'm sure that you figured that
out last night, anyways, right?

00:14:31.407 --> 00:14:33.240
Especially since it was
one of the questions

00:14:33.240 --> 00:14:34.031
in the [INAUDIBLE].

00:14:34.031 --> 00:14:36.410
So you presumably already
thought about this question,

00:14:36.410 --> 00:14:36.940
right?

00:14:36.940 --> 00:14:38.430
OK.

00:14:38.430 --> 00:14:39.253
Yes?

00:14:39.253 --> 00:14:40.800
AUDIENCE: Yes.

00:14:40.800 --> 00:14:43.725
PROFESSOR: Ready,
three, two, one.

00:14:47.590 --> 00:14:50.060
I'd say we got a lot of B's.

00:14:50.060 --> 00:14:51.935
So it seems like B is the most.

00:14:55.284 --> 00:14:56.950
So this, we'll put a
question mark here.

00:15:00.860 --> 00:15:04.440
Can somebody verbally
say why their neighbor

00:15:04.440 --> 00:15:08.740
said that the mode of the
distribution is around 30?

00:15:11.810 --> 00:15:12.427
Yeah?

00:15:12.427 --> 00:15:13.510
AUDIENCE: The tallest bar.

00:15:13.510 --> 00:15:16.620
PROFESSOR: The tallest
bar there is around 30.

00:15:16.620 --> 00:15:18.270
That's a very
practical definition.

00:15:18.270 --> 00:15:21.240
So that's normally what
we mean by the mode.

00:15:21.240 --> 00:15:23.550
There is a slight
problem in all of this,

00:15:23.550 --> 00:15:25.520
which is that this
thing is plotted

00:15:25.520 --> 00:15:28.415
in a very kind of funny way.

00:15:28.415 --> 00:15:30.290
So if you look at the
figure, what you'll see

00:15:30.290 --> 00:15:31.748
is that it's number
of individuals.

00:15:31.748 --> 00:15:33.375
And down here, it
says, log2 scale.

00:15:40.610 --> 00:15:44.530
Now, when we say the mode,
what we're wondering about

00:15:44.530 --> 00:15:50.080
is that, if you just take the
most typical kind of species

00:15:50.080 --> 00:15:52.080
of tree that's there,
how many individuals do

00:15:52.080 --> 00:15:53.371
we think there should be there?

00:15:55.570 --> 00:15:57.560
Of course, typical
is hard to define.

00:15:57.560 --> 00:16:00.650
We can talk about mode,
median, mean, et cetera.

00:16:00.650 --> 00:16:03.210
But the most common
number of individuals

00:16:03.210 --> 00:16:06.890
for a species in the data
set ends up not being 30.

00:16:06.890 --> 00:16:08.190
It ends up being 1.

00:16:12.220 --> 00:16:16.430
And we will try to
reconstruct this right now.

00:16:16.430 --> 00:16:18.870
Because you have to do
a little bit of digging

00:16:18.870 --> 00:16:21.330
to figure out what is
being plotted here.

00:16:21.330 --> 00:16:23.340
But it's not the raw data.

00:16:23.340 --> 00:16:26.360
The problem here is that this
is on this log scale, where

00:16:26.360 --> 00:16:29.720
the bins here are growing
kind of geometrically

00:16:29.720 --> 00:16:33.310
or exponentially, whatever,
as you move to the right.

00:16:33.310 --> 00:16:39.975
So over here, this thing
only contains one real bin.

00:16:39.975 --> 00:16:41.350
And actually,
we're about to find

00:16:41.350 --> 00:16:44.040
it's half a bin,
which is even weirder.

00:16:44.040 --> 00:16:46.330
Whereas out here,
this is maybe 30 bins.

00:16:46.330 --> 00:16:50.550
So the number of species that
we're going to put in this bin

00:16:50.550 --> 00:16:56.990
is everything between around
20 something up to 50 or so.

00:16:56.990 --> 00:16:59.701
The number of kind
of true bins that

00:16:59.701 --> 00:17:01.200
end up in each of
these plotted bins

00:17:01.200 --> 00:17:05.480
is going to grow geometrically
as we move to the right.

00:17:05.480 --> 00:17:09.730
So this is a very funny
transform of the data.

00:17:09.730 --> 00:17:14.690
And indeed, I think it's
always nice to just, in life,

00:17:14.690 --> 00:17:17.890
you always plot
the raw data first.

00:17:17.890 --> 00:17:20.720
And then what you can do
is then you can do funny.

00:17:20.720 --> 00:17:23.510
There's a reason to
plot it this way.

00:17:23.510 --> 00:17:27.770
Because this is where they
get this idea that this might

00:17:27.770 --> 00:17:30.530
be described
as a log normal.

00:17:30.530 --> 00:17:32.550
The idea is, if you
take a log of the data,

00:17:32.550 --> 00:17:34.550
then you get something
that looks like a normal.

00:17:34.550 --> 00:17:38.105
But you always plot
the raw data first.

00:17:38.105 --> 00:17:40.480
So let's try to figure out
what the raw data looked like.

00:17:45.260 --> 00:17:46.680
And now what we're
going to do is

00:17:46.680 --> 00:17:49.141
we're going to have real
scalings, honest to goodness

00:17:49.141 --> 00:17:49.640
numbers.

00:17:53.130 --> 00:17:55.170
Now the number of
species you get still.

00:17:57.800 --> 00:18:00.580
So this is asking, how
many different species

00:18:00.580 --> 00:18:04.100
do we see with one member
or with two members

00:18:04.100 --> 00:18:05.980
or with three, four, et cetera?

00:18:11.715 --> 00:18:13.340
And I don't know how
far we're actually

00:18:13.340 --> 00:18:14.860
going to be able to get.

00:18:14.860 --> 00:18:21.440
But in this one
figure, in our paper,

00:18:21.440 --> 00:18:23.640
they tell us what
the histogram means.

00:18:23.640 --> 00:18:26.600
So the first histogram
bar represents what

00:18:26.600 --> 00:18:29.680
they call phi 1 divided by 2.

00:18:29.680 --> 00:18:36.920
Phi 1 was the number of species
observed with one member, which

00:18:36.920 --> 00:18:42.980
means that even this
first plot bar is not

00:18:42.980 --> 00:18:45.940
the number of species observed
with a single individual.

00:18:45.940 --> 00:18:48.810
It's half of that.

00:18:48.810 --> 00:18:50.350
You can argue about
the consistency

00:18:50.350 --> 00:18:52.230
of how these things
should be, but that's

00:18:52.230 --> 00:18:54.000
what this thing's plotted.

00:18:54.000 --> 00:18:58.000
And it looks like it was nine,
here, so this should be 18.

00:18:58.000 --> 00:19:02.500
So I'm going to put up here,
here's a 20 and here's a 10.

00:19:05.950 --> 00:19:07.140
Right, so here is an 18.

00:19:14.510 --> 00:19:17.670
Now, what do they say?

00:19:17.670 --> 00:19:23.270
This bin represents
phi 1 divided

00:19:23.270 --> 00:19:26.894
by 2 plus phi 2 divided by 2.

00:19:26.894 --> 00:19:28.310
So they took the
number of species

00:19:28.310 --> 00:19:30.940
where they saw just a single
individual plus the number

00:19:30.940 --> 00:19:32.690
of species where they
saw two individuals,

00:19:32.690 --> 00:19:35.829
and they added those
and they divided by 2.

00:19:35.829 --> 00:19:36.620
That's this number.
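
NOTE
A minimal Python sketch of the binning just described. Only the first two bars are spelled out in the lecture. The rule for later bars (each power-of-two abundance split half and half between neighboring bars) and the name preston_bins are assumptions. The counts in phi are illustrative, not the census data.
```python
def preston_bins(phi):
    # phi maps n (individuals) to the number of species with exactly n individuals
    bars = [phi.get(1, 0) / 2]            # first bar: phi_1 / 2
    lo = 1
    while lo <= max(phi):
        hi = 2 * lo
        # whole counts strictly between the two powers of two
        inner = sum(phi.get(n, 0) for n in range(lo + 1, hi))
        # half of each boundary count goes into this bar
        bars.append(phi.get(lo, 0) / 2 + inner + phi.get(hi, 0) / 2)
        lo = hi
    return bars
phi = {1: 18, 2: 19, 3: 13, 4: 9}         # illustrative raw counts
bars = preston_bins(phi)
print(bars)
```
Every species lands in exactly one bar or is split across two, so the bars add up to the total number of species.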

00:19:38.799 --> 00:19:40.840
We're not going to go
through this whole process,

00:19:40.840 --> 00:19:42.350
because it's a
little bit tiresome.

00:19:42.350 --> 00:19:46.410
But I've already
done it for you.

00:19:46.410 --> 00:19:50.670
So I'm going to plot a few
things to get you there.

00:19:58.240 --> 00:20:02.490
And so I calculated
it was 19, 13, 9, 6.

00:20:02.490 --> 00:20:05.696
It becomes ill-determined
once you get out here,

00:20:05.696 --> 00:20:07.320
in the sense that we
don't have enough.

00:20:07.320 --> 00:20:10.240
It's not uniquely specified
going from that to that

00:20:10.240 --> 00:20:12.980
as it has to be.

00:20:12.980 --> 00:20:13.970
But I calculated it.

00:20:13.970 --> 00:20:17.565
It's around 5, in
here, for a few.

00:20:17.565 --> 00:20:21.120
And somewhere in here,
it's going to go into 4.

00:20:21.120 --> 00:20:27.095
And then this might go down
to 3, and then deh, deh, deh.

00:20:32.550 --> 00:20:42.950
Now, if you look at this and
the rapid fall-off,

00:20:42.950 --> 00:20:46.665
do you think that you're
going to find any species that

00:20:46.665 --> 00:20:47.915
have more than 20 individuals?

00:20:51.440 --> 00:20:52.340
We're going to vote.

00:20:52.340 --> 00:20:53.857
So you see this falling-off?

00:20:53.857 --> 00:20:55.690
So let's say that I've
just showed you this,

00:20:55.690 --> 00:20:58.930
and I haven't yet
calculated the rest,

00:20:58.930 --> 00:21:01.540
do we think that there's going
to be any species with more

00:21:01.540 --> 00:21:04.230
than 20 individuals?

00:21:04.230 --> 00:21:08.430
Greater than 20
individuals, question mark?

00:21:08.430 --> 00:21:10.760
1 is yes.

00:21:10.760 --> 00:21:12.245
2 is no.

00:21:14.945 --> 00:21:16.230
It's going to be yes, no.

00:21:16.230 --> 00:21:18.835
Ready, three, two, one.

00:21:21.560 --> 00:21:25.190
So we got some 2s.

00:21:25.190 --> 00:21:27.300
So I'd say that most
people are saying, no.

00:21:27.300 --> 00:21:28.532
Look at this fall-off.

00:21:28.532 --> 00:21:29.990
There are not going
to be any species

00:21:29.990 --> 00:21:33.480
with more than 20 individuals.

00:21:33.480 --> 00:21:35.630
Although we already
know that there

00:21:35.630 --> 00:21:38.480
are many species with
more than 20 individuals.

00:21:38.480 --> 00:21:41.930
So this plot is
useful for something.

00:21:41.930 --> 00:21:43.560
You can see that there are.

00:21:43.560 --> 00:21:45.450
And we know exactly
the number of species

00:21:45.450 --> 00:21:47.930
that have more than 20
individuals, roughly.

00:21:47.930 --> 00:21:50.760
So those ones are all in these.

00:21:50.760 --> 00:21:52.985
So you can see that there
are hundreds of species

00:21:52.985 --> 00:21:54.235
with more than 20 individuals.

00:21:57.630 --> 00:22:00.830
And indeed, it looks like there
were two or three species that

00:22:00.830 --> 00:22:03.186
had more than 1,000
individuals or 1,500

00:22:03.186 --> 00:22:04.560
or whatever the
cutoff there was.

00:22:07.920 --> 00:22:14.050
So this distribution
starts out rather high

00:22:14.050 --> 00:22:15.590
but then falls quickly.

00:22:15.590 --> 00:22:19.019
And out here, it's going
to be very, very sparse.

00:22:19.019 --> 00:22:21.060
So there's going to be a
bunch of numbers in here

00:22:21.060 --> 00:22:24.160
where there's not any
species in the histogram.

00:22:24.160 --> 00:22:26.340
And then out there, there's
going to be one, right?

00:22:26.340 --> 00:22:28.680
And indeed, you have
to go really far out.

00:22:28.680 --> 00:22:30.490
Because there's one
species out there

00:22:30.490 --> 00:22:33.870
that has a couple thousand.

00:22:33.870 --> 00:22:40.660
And indeed, the mean number
of individuals per species

00:22:40.660 --> 00:22:44.880
has to be around 100.

00:22:44.880 --> 00:22:47.160
We know how to calculate a mean.

00:22:47.160 --> 00:22:50.810
This divided by this
is just short of 100.

00:22:50.810 --> 00:22:54.150
So the mean number of
individuals in a species

00:22:54.150 --> 00:22:56.280
is around 100.

00:22:56.280 --> 00:22:59.090
The mode is one.
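
NOTE
The mean quoted above follows directly from the two census totals given earlier in the lecture. A minimal Python check, with illustrative variable names:
```python
trees = 21457      # trees counted in the 50 hectare plot
species = 225      # distinct species identified
mean_abundance = trees / species
print(round(mean_abundance, 1))
```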

00:23:01.446 --> 00:23:02.070
And the median?

00:23:08.630 --> 00:23:11.200
Well, ready?

00:23:11.200 --> 00:23:14.387
We decided this was the mode.

00:23:14.387 --> 00:23:15.720
Where is the median going to be?

00:23:15.720 --> 00:23:18.260
Is it going to be A, B, C, D?

00:23:18.260 --> 00:23:20.935
Ready, three, two, one.

00:23:23.760 --> 00:23:27.060
Indeed, this tells you pretty
clearly where the median is.

00:23:27.060 --> 00:23:28.854
This thing is indeed
around the median.

00:23:28.854 --> 00:23:31.020
Because you can say, oh,
it's about the same numbers

00:23:31.020 --> 00:23:32.240
to either side.

00:23:32.240 --> 00:23:33.740
So the median is around here.

00:23:36.310 --> 00:23:39.370
And I told you where
the mean was, again.

00:23:39.370 --> 00:23:40.330
You guys remember?

00:23:40.330 --> 00:23:42.710
Ready, three, two, one.

00:23:42.710 --> 00:23:43.960
Mean, uno.

00:23:43.960 --> 00:23:44.460
Mean.

00:23:50.220 --> 00:23:53.070
So this is a very, very
funny distribution.

00:23:53.070 --> 00:23:54.730
I guess I want to
highlight that.

00:23:54.730 --> 00:23:57.530
And I think it's
not at all what you

00:23:57.530 --> 00:24:00.070
would have expected somehow.

00:24:00.070 --> 00:24:04.799
At least, if you had described
this measurement process to me,

00:24:04.799 --> 00:24:06.590
if you told me that
you went to this island

00:24:06.590 --> 00:24:10.545
and you counted 20,000 trees,
I don't know how many species

00:24:10.545 --> 00:24:11.420
I would have guessed.

00:24:11.420 --> 00:24:16.255
But OK, 220, it's reasonable.

00:24:16.255 --> 00:24:18.630
Well, I would have guessed it
would have looked something

00:24:18.630 --> 00:24:22.380
like this on a linear
scale, maybe, right?

00:24:22.380 --> 00:24:25.430
You know, that there would be a
bunch of them around 50 to 100

00:24:25.430 --> 00:24:28.340
and some would go to a couple
hundred, some of them.

00:24:28.340 --> 00:24:32.480
So I guess I would have
thought that the mean, mode,

00:24:32.480 --> 00:24:35.070
median would all be kind
of a more similar thing.

00:24:35.070 --> 00:24:38.130
But this is just not
the way the world is.

00:24:38.130 --> 00:24:40.230
It's not just on BCI.

00:24:40.230 --> 00:24:44.150
People, for hundreds
of years, have been

00:24:44.150 --> 00:24:45.520
studying these distributions.

00:24:45.520 --> 00:24:51.100
And things that look like this,
with extremely long tails,

00:24:51.100 --> 00:24:54.810
this is what people see.

00:24:54.810 --> 00:24:57.720
Now you can argue about
exactly how fast it falls off

00:24:57.720 --> 00:25:00.040
and whether it's different
on a mainland or an island.

00:25:00.040 --> 00:25:05.210
But this basic feature, that
rare species are common,

00:25:05.210 --> 00:25:09.140
this seems to be just
that's what you always see.

00:25:09.140 --> 00:25:11.220
This is the thing that
you have to remember,

00:25:11.220 --> 00:25:12.425
rare species are common.

00:25:24.520 --> 00:25:28.680
And I think that this is
the basic, surprising thing

00:25:28.680 --> 00:25:30.530
in this whole field.

00:25:30.530 --> 00:25:32.480
And the ironic
thing is that even

00:25:32.480 --> 00:25:35.580
after spending all this
time reading about theories

00:25:35.580 --> 00:25:39.140
to describe these distributions,
it's still very possible--

00:25:39.140 --> 00:25:41.780
and I would say, based on
the statistics, this year

00:25:41.780 --> 00:25:44.380
and past years, it's
not just possible,

00:25:44.380 --> 00:25:46.060
but it is the
standard outcome-- is

00:25:46.060 --> 00:25:47.476
that after reading
this paper, you

00:25:47.476 --> 00:25:51.230
do not realize that the
distribution looks like this.

00:25:51.230 --> 00:25:54.000
You somehow still think that
it looks-- you kind of still

00:25:54.000 --> 00:25:59.060
think it's like a linear scale,
where the typical species has

00:25:59.060 --> 00:26:01.130
this, where the mean,
median, mode are all

00:26:01.130 --> 00:26:02.260
about the same thing.

00:26:02.260 --> 00:26:08.661
So I guess always plot the raw
data in an untransformed way.

00:26:08.661 --> 00:26:10.410
There are theoretical
reasons why it might

00:26:10.410 --> 00:26:11.618
be nice to plot it like this.

00:26:11.618 --> 00:26:15.120
But be very careful
about what you're doing.

00:26:15.120 --> 00:26:17.965
Because then you're left with
a mental image of a histogram

00:26:17.965 --> 00:26:19.850
that looks like this.

00:26:19.850 --> 00:26:21.640
And that's very, very dangerous.

00:26:21.640 --> 00:26:22.140
Yeah?

00:26:22.140 --> 00:26:25.570
AUDIENCE: Why does it
matter [INAUDIBLE]?

00:26:25.570 --> 00:26:29.000
[INAUDIBLE] the aggregate
data in bins like that.

00:26:29.000 --> 00:26:33.150
And I mean, sure, exactly
one species is the mode,

00:26:33.150 --> 00:26:35.619
but do you really want the--?

00:26:35.619 --> 00:26:37.410
PROFESSOR: I understand
what you're saying.

00:26:42.240 --> 00:26:44.090
It's just that there's
a qualitative aspect

00:26:44.090 --> 00:26:49.510
to the data, which is that
most species are very rare.

00:26:49.510 --> 00:26:51.630
And this is something that
I think is surprising.

00:26:51.630 --> 00:26:53.410
I think it's deep.

00:26:53.410 --> 00:26:56.120
And it's something that
you do not realize.

00:26:56.120 --> 00:27:00.368
AUDIENCE: Most species
have more than 16.

00:27:00.368 --> 00:27:03.024
I mean, it depends
what you mean by rare.

00:27:03.024 --> 00:27:03.690
PROFESSOR: Yeah.

00:27:03.690 --> 00:27:06.108
AUDIENCE: Look at the way
that the distribution is away

00:27:06.108 --> 00:27:07.268
from trend.

00:27:07.268 --> 00:27:08.518
AUDIENCE: That's a good point.

00:27:08.518 --> 00:27:10.687
But the species
density is clustered

00:27:10.687 --> 00:27:11.962
around the low numbers.

00:27:11.962 --> 00:27:12.670
PROFESSOR: Right.

00:27:12.670 --> 00:27:14.920
AUDIENCE: But actually most
species have more than 30.

00:27:18.679 --> 00:27:20.220
PROFESSOR: Maybe
the surprising thing

00:27:20.220 --> 00:27:24.260
is that just if you
take-- the mean is 100.

00:27:24.260 --> 00:27:33.440
And so I would've thought that,
if you plot number of species

00:27:33.440 --> 00:27:41.650
as a function of the
number of individuals,

00:27:41.650 --> 00:27:44.290
given those numbers, I would
have guessed, OK, here's 100.

00:27:44.290 --> 00:27:46.480
I would have guessed--
here's 50, so just

00:27:46.480 --> 00:27:49.130
to highlight that this is 150.

00:27:49.130 --> 00:27:53.620
So linear scale, I
would have guessed

00:27:53.620 --> 00:27:56.820
it would look something like
that, maybe larger than Rudin

00:27:56.820 --> 00:27:57.889
or something.

00:27:57.889 --> 00:28:00.055
AUDIENCE: What would that
look like in a log2 scale?

00:28:00.055 --> 00:28:03.951
It would look like-- it's
like the log of [INAUDIBLE]?

00:28:03.951 --> 00:28:06.460
So it goes up really
fast and then--

00:28:06.460 --> 00:28:11.730
PROFESSOR: So this thing
would be kind of like shoom.

00:28:11.730 --> 00:28:14.070
I mean all the
weight would be in.

00:28:14.070 --> 00:28:16.988
It would be like all here plus
a little bit on each of these.

00:28:16.988 --> 00:28:18.884
AUDIENCE: But yeah.

00:28:18.884 --> 00:28:21.730
I don't think it's
actually that different.

00:28:21.730 --> 00:28:24.340
The only thing that's different
is the tail on the left.

00:28:24.340 --> 00:28:26.092
PROFESSOR: And the
tail on the right.

00:28:26.092 --> 00:28:27.800
AUDIENCE: Yeah, it's
a little bit longer.

00:28:27.800 --> 00:28:29.383
PROFESSOR: No, it's
a lot longer, right?

00:28:29.383 --> 00:28:33.530
Because this thing, all of the
weight is between 50 and 150,

00:28:33.530 --> 00:28:35.740
which means that
all of the counts

00:28:35.740 --> 00:28:41.124
are basically going to
be these two, basically.

00:28:41.124 --> 00:28:42.790
Because this thing
comes out either way.

00:28:42.790 --> 00:28:46.240
So in this case, if you
take that histogram put it

00:28:46.240 --> 00:28:48.560
on this kind of scale,
you end up with two bars

00:28:48.560 --> 00:28:50.717
up high, nothing outside.

00:28:50.717 --> 00:28:52.300
So it's a very
different distribution.
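
NOTE
Editorial sketch, not part of the lecture: a minimal illustration of what
happens when a narrow distribution is put into log2 abundance bins. The
bin convention (bin k holds abundances from 2**k up to 2**(k+1) - 1), the
function name, and the made-up community are assumptions; the figure
discussed in class may bin differently.
```python
from collections import Counter
def log2_bin_counts(abundances):
    """Count species per log2 bin; bin k holds integer abundances in [2**k, 2**(k+1))."""
    # For a positive integer n, n.bit_length() - 1 equals floor(log2(n)) exactly.
    counts = Counter(n.bit_length() - 1 for n in abundances if n >= 1)
    return dict(sorted(counts.items()))
# Hypothetical community: 200 species with abundances spread evenly over 50..149.
narrow = [50 + (i % 100) for i in range(200)]
print(log2_bin_counts(narrow))
```
With these made-up numbers every species lands in a few adjacent bins
(32-63, 64-127, 128-255) and none anywhere else, which is the contrast
with the long-tailed observed histogram being described here.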

00:28:52.300 --> 00:28:55.330
And it's not to say that this
is a ridiculous thing to do.

00:28:55.330 --> 00:28:57.230
It's just that.

00:28:57.230 --> 00:28:59.610
But the problem is
that your mental image

00:28:59.610 --> 00:29:01.930
of what the distribution
looks like ends up

00:29:01.930 --> 00:29:03.910
being incorrect, in
the sense that you

00:29:03.910 --> 00:29:07.040
have a qualitatively different
sense of what's

00:29:07.040 --> 00:29:08.690
going on.

00:29:08.690 --> 00:29:14.260
And if you go up to 10 species,
here, and 10 is way down here.

00:29:17.589 --> 00:29:19.130
If this is what it
looked like, there

00:29:19.130 --> 00:29:22.970
would be essentially no species
with fewer than 10 individuals.

00:29:22.970 --> 00:29:25.780
But if you come over here
and you add it up here.

00:29:25.780 --> 00:29:31.740
It's like a mean of 6
times 10 is 60 out of 200.

00:29:31.740 --> 00:29:35.830
A quarter or a third of the
species on this plot of land

00:29:35.830 --> 00:29:37.450
have fewer than 10 individuals.

00:29:37.450 --> 00:29:39.515
And 10 is really a
very small number.

00:29:43.810 --> 00:29:47.590
Well, rare species are common.

00:29:47.590 --> 00:29:52.170
I think it's a true description
of the observed distribution

00:29:52.170 --> 00:29:53.180
here and elsewhere.

00:29:53.180 --> 00:29:56.620
And it's not something
that you appreciate

00:29:56.620 --> 00:29:58.911
or realize when you
plot it in that way.

00:29:58.911 --> 00:30:01.522
AUDIENCE: But you can get this
information from that plot.

00:30:01.522 --> 00:30:02.480
PROFESSOR: No, I agree.

00:30:02.480 --> 00:30:03.440
You can get it.

00:30:03.440 --> 00:30:04.290
You can get it.

00:30:04.290 --> 00:30:07.280
But it was only 10%
of the group got it.

00:30:07.280 --> 00:30:10.430
Right, the fact that you can
get it-- right, it's possible.

00:30:10.430 --> 00:30:13.370
But you don't get it.

00:30:13.370 --> 00:30:15.680
That is a practical statement.

00:30:15.680 --> 00:30:18.150
Yeah, I'm not dead set
against this distribution.

00:30:18.150 --> 00:30:20.737
It's just that it
makes everybody think

00:30:20.737 --> 00:30:21.820
something that's not true.

00:30:21.820 --> 00:30:24.430
So if you think that that's
OK, then I can't help you.

00:30:27.670 --> 00:30:29.690
It's OK, but it's
just you have to be

00:30:29.690 --> 00:30:33.300
careful is my only statement.

00:30:33.300 --> 00:30:36.550
And I very much want
you to take away.

00:30:36.550 --> 00:30:38.800
Because I think this is an accurate
description of the data.

00:30:38.800 --> 00:30:40.270
Rare species are common.

00:30:40.270 --> 00:30:43.360
And one of the readings-- I
think it was in this paper,

00:30:43.360 --> 00:30:45.640
maybe it was a different
one that I was reading.

00:30:45.640 --> 00:30:49.300
Even Darwin, when talking about
this, commented on this fact

00:30:49.300 --> 00:30:54.820
that rarity of species is
somehow a typical event.

00:30:54.820 --> 00:30:57.700
AUDIENCE: And common
species are rare.

00:30:57.700 --> 00:31:01.160
PROFESSOR: And common species
are rare, that's right.

00:31:01.160 --> 00:31:04.720
This distribution is
hugely, hugely skewed.

00:31:14.986 --> 00:31:16.110
These are the measurements.

00:31:18.731 --> 00:31:20.730
It's good to look at them
in both of these ways.

00:31:20.730 --> 00:31:23.950
Because you can't even plot
the data on a linear scale.

00:31:23.950 --> 00:31:26.020
So that's a good
reason for doing it.

00:31:26.020 --> 00:31:29.350
But I think it's good to have
both of these pictures in mind.

00:31:38.680 --> 00:31:42.960
What we want to do is to talk
about two classes of models

00:31:42.960 --> 00:31:45.110
that give something
that's essentially

00:31:45.110 --> 00:31:46.620
this log normal distribution.

00:31:46.620 --> 00:31:50.350
So on a log scale it looks
normally distributed,

00:31:50.350 --> 00:31:53.130
approximately.
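
NOTE
Editorial sketch, not part of the lecture: for an exactly log-normal
variable the three summary statistics discussed earlier separate in a
fixed order. The symbols mu and sigma (mean and standard deviation of
ln X) are introduced here for illustration; the observed abundance
distribution is only approximately log-normal.
```latex
\ln X \sim \mathcal{N}(\mu, \sigma^2)
\;\Longrightarrow\;
\mathrm{mode}(X) = e^{\mu - \sigma^2}
\;<\;
\mathrm{median}(X) = e^{\mu}
\;<\;
\mathrm{mean}(X) = e^{\mu + \sigma^2/2}
```
The larger sigma is, the further apart the mode, median, and mean sit,
which is why they can differ so much on a linear scale.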

00:31:53.130 --> 00:31:54.640
And those two models
are going to be

00:31:54.640 --> 00:31:58.610
kind of a niche-based
model and a neutral model.

00:31:58.610 --> 00:32:01.460
Can somebody, in words,
explain what they maybe

00:32:01.460 --> 00:32:04.200
see as the difference
between this niche

00:32:04.200 --> 00:32:05.800
and a neutral kind of approach?

00:32:21.768 --> 00:32:22.349
Yeah?

00:32:22.349 --> 00:32:23.265
AUDIENCE: [INAUDIBLE].

00:32:26.255 --> 00:32:27.505
PROFESSOR: Every species is--?

00:32:27.505 --> 00:32:28.430
AUDIENCE: [INAUDIBLE].

00:32:28.430 --> 00:32:29.846
PROFESSOR: In which one?

00:32:29.846 --> 00:32:31.130
AUDIENCE: In niche.

00:32:31.130 --> 00:32:34.340
PROFESSOR: In the niche theory,
the species are different.

00:32:34.340 --> 00:32:39.450
So it seems like a
ridiculous statement.

00:32:39.450 --> 00:32:41.380
Do you believe that
species are different?

00:32:41.380 --> 00:32:43.200
We can vote, yes or no.

00:32:43.200 --> 00:32:46.150
Ready, three, two, one.

00:32:46.150 --> 00:32:47.800
Yeah.

00:32:47.800 --> 00:32:51.330
Well, somebody's been convinced
by the neutral theory.

00:32:51.330 --> 00:32:54.760
It's clear that
species are different.

00:32:54.760 --> 00:32:59.390
And the question is which
patterns in the data

00:32:59.390 --> 00:33:01.340
do you need to
invoke differences

00:33:01.340 --> 00:33:03.890
in order to explain?

00:33:03.890 --> 00:33:06.050
And I think that one,
maybe, theme that's

00:33:06.050 --> 00:33:08.640
come out of this relative
species abundance literature

00:33:08.640 --> 00:33:11.110
and the debates between the
neutral and the niche guys

00:33:11.110 --> 00:33:14.390
is just that this
distribution is

00:33:14.390 --> 00:33:18.310
less informative
of the micro scale

00:33:18.310 --> 00:33:22.240
or individual kind
of interactions

00:33:22.240 --> 00:33:25.640
than you might have thought.

00:33:25.640 --> 00:33:29.290
Because multiple
models can adequately

00:33:29.290 --> 00:33:32.050
explain such a pattern.

00:33:32.050 --> 00:33:35.220
In all areas, we
have to remember

00:33:35.220 --> 00:33:38.530
that you make an observation,
and you write down a model

00:33:38.530 --> 00:33:40.892
that explains that observation.

00:33:40.892 --> 00:33:42.600
So what you do is you
write down a model.

00:33:42.600 --> 00:33:44.040
And writing down a
model, what that means

00:33:44.040 --> 00:33:46.010
is that you make some
set of assumptions.

00:33:46.010 --> 00:33:48.176
And then you look to see
what happens in that model.

00:33:48.176 --> 00:33:53.555
And if the model is consistent
with the data, that's good.

00:33:53.555 --> 00:33:55.680
But it doesn't prove that
the assumptions that went

00:33:55.680 --> 00:33:56.804
into the model are correct.

00:33:56.804 --> 00:33:58.475
And this is a trivial statement.

00:33:58.475 --> 00:33:59.475
And I've said it before.

00:34:01.839 --> 00:34:03.880
You have to tell yourself
this or remind yourself

00:34:03.880 --> 00:34:05.580
of this kind of once a month.

00:34:05.580 --> 00:34:07.970
Because it's just such an
easy thing to forget about.

00:34:15.090 --> 00:34:18.420
Now, the niche
models indeed assume

00:34:18.420 --> 00:34:20.190
that the species are different.

00:34:20.190 --> 00:34:22.840
And that's reasonable.

00:34:22.840 --> 00:34:24.687
Because we think it's true.

00:34:24.687 --> 00:34:26.770
But then, of course, there
are many different ways

00:34:26.770 --> 00:34:28.550
of capturing those differences.

00:34:28.550 --> 00:34:32.600
And then you have to decide
whether the assumptions there

00:34:32.600 --> 00:34:37.030
are reasonable or whether
they're necessary, essential.

00:34:37.030 --> 00:34:39.170
In the context of
the niche models,

00:34:39.170 --> 00:34:42.500
we're going to think about the
so-called broken stick models.

00:34:52.510 --> 00:34:55.219
So basically, you get
log normal distributions

00:34:55.219 --> 00:34:58.320
when there's some sort of
multiplicative-type random

00:34:58.320 --> 00:35:01.260
process that's being
combined.

00:35:01.260 --> 00:35:02.930
You get normal
distributions when

00:35:02.930 --> 00:35:05.794
you have sums of random
things going together.

00:35:05.794 --> 00:35:07.210
This is the central
limit theorem.

00:35:07.210 --> 00:35:09.560
But when you have
multiplicative kind

00:35:09.560 --> 00:35:12.895
of errors or random
processes coming together,

00:35:12.895 --> 00:35:14.270
you get log normal
distributions.

00:35:14.270 --> 00:35:17.650
And I want to highlight that
that does not necessarily

00:35:17.650 --> 00:35:23.720
have to tell you so much
about the biology of it.
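
NOTE
Editorial sketch, not part of the lecture: a small simulation of the
multiplicative case. Each sample is a product of independent random
factors, so its logarithm is a sum and the central limit theorem
applies on the log scale. The factor range (0.5 to 1.5), the number of
factors, and the function name are arbitrary assumptions.
```python
import math
import random
import statistics
def multiplicative_sample(n_factors, rng):
    """Product of n_factors independent factors, each uniform on (0.5, 1.5)."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(0.5, 1.5)
    return value
rng = random.Random(0)
products = [multiplicative_sample(50, rng) for _ in range(5000)]
logs = [math.log(p) for p in products]
# Raw products are strongly right-skewed: the mean sits well above the median.
print(statistics.mean(products), statistics.median(products))
# Their logarithms are sums of 50 terms, so mean and median nearly coincide.
print(statistics.mean(logs), statistics.median(logs))
```
Nothing biological goes into this sketch, which is the point being made:
a log-normal shape by itself says little about the mechanism.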

00:35:23.720 --> 00:35:28.540
Because a classic
situation where you get log

00:35:28.540 --> 00:35:35.590
normal distributions is if you
take a stone and you crush it.

00:35:41.220 --> 00:35:43.350
You can do this
experiment at home.

00:35:43.350 --> 00:35:47.930
And then you measure the mass
distribution of the resulting

00:35:47.930 --> 00:35:48.430
fragments.

00:35:55.390 --> 00:36:00.050
And the distribution
of mass is log normal.

00:36:05.210 --> 00:36:13.280
Just take a stone, grind it
under your boot or hammer it,

00:36:13.280 --> 00:36:15.030
just kind of rub it right in.

00:36:15.030 --> 00:36:18.230
You'll get some
distribution of fragments.

00:36:18.230 --> 00:36:20.950
For each of the fragments,
measure the mass, and, indeed,

00:36:20.950 --> 00:36:22.930
you end up getting a
log normal distribution.

00:36:22.930 --> 00:36:24.930
Because there's some sense
that what's happening

00:36:24.930 --> 00:36:28.260
is that you take a larger
mass, you break it up randomly,

00:36:28.260 --> 00:36:30.490
and then the resulting
fragments, at some rate,

00:36:30.490 --> 00:36:32.270
each of them you
break up randomly.

00:36:32.270 --> 00:36:33.770
And the small ones
are maybe kind of

00:36:33.770 --> 00:36:35.644
less likely to get broken
up than the big ones,

00:36:35.644 --> 00:36:38.232
so then the small ones can
still get even smaller.

00:36:38.232 --> 00:36:40.690
But then there's going to be,
at some rate, some very large

00:36:40.690 --> 00:36:42.060
ones.

00:36:42.060 --> 00:36:47.880
So such a process ends up--
I mean it's not biology.

00:36:47.880 --> 00:36:51.660
This is just something about
the nature of the breaking

00:36:51.660 --> 00:36:53.990
up of this physical object.

00:36:53.990 --> 00:36:56.160
And indeed, the basic
idea behind many

00:36:56.160 --> 00:36:58.630
of the niche models that give
you a log normal distribution

00:36:58.630 --> 00:37:02.270
is equivalent to
crushing a stone

00:37:02.270 --> 00:37:05.910
and measuring the
resulting distribution.

00:37:05.910 --> 00:37:07.670
I'll describe what
I mean by that.

00:37:07.670 --> 00:37:11.120
Typically, the
broken stick models,

00:37:11.120 --> 00:37:16.140
they say there's
some resource axis.

00:37:16.140 --> 00:37:20.300
This is a resource axis.

00:37:20.300 --> 00:37:24.430
And this could be, for example,
where you're getting food from.

00:37:24.430 --> 00:37:26.570
Now, we're going to have
to divide up this resource

00:37:26.570 --> 00:37:28.615
axis among some number
of different species.

00:37:28.615 --> 00:37:29.990
And what we're
going to assume is

00:37:29.990 --> 00:37:32.120
that the number of
individuals in the species

00:37:32.120 --> 00:37:35.460
is proportional to the
length of the resource axis

00:37:35.460 --> 00:37:36.698
that it's able to capture.

00:37:39.450 --> 00:37:43.150
And I want to make
sure I find my notes.

00:37:43.150 --> 00:37:44.275
I want to highlight this.

00:37:44.275 --> 00:37:53.600
This comes from
MacArthur in the 1950s.

00:37:53.600 --> 00:37:58.480
MacArthur and it's 1957.

00:37:58.480 --> 00:38:01.780
So we imagine there's this
homogeneous resource axis.

00:38:01.780 --> 00:38:03.600
We're going to break
it up into N segments.

00:38:03.600 --> 00:38:08.860
And the abundances are
proportional to the length.

00:38:08.860 --> 00:38:13.690
And the idea is that, if you
just break this up randomly,

00:38:13.690 --> 00:38:17.150
so let's say you just draw
N minus 1 lines randomly,

00:38:17.150 --> 00:38:20.290
or N minus 1 points
randomly here.

00:38:20.290 --> 00:38:27.775
Now you have N species with
N different abundances.

00:38:30.480 --> 00:38:32.360
The question is does
that give a log normal?

00:38:35.160 --> 00:38:37.495
We'll say N minus
1 random points.

00:38:41.084 --> 00:38:42.377
Do you understand what I mean?

00:38:42.377 --> 00:38:44.460
You sample uniformly once,
sample uniformly twice.

00:38:44.460 --> 00:38:50.220
You do that N minus 1 times,
and now you have N and deh deh.

00:38:50.220 --> 00:38:52.130
And then we say, OK,
the first species

00:38:52.130 --> 00:38:53.230
has this many individuals.

00:38:53.230 --> 00:38:54.790
The second has this one.

00:38:54.790 --> 00:38:58.230
The third is this
one, et cetera.

00:38:58.230 --> 00:39:00.589
The question is
does random points,

00:39:00.589 --> 00:39:01.880
does that lead to a log normal?

00:39:07.400 --> 00:39:09.380
Yes and no.

00:39:09.380 --> 00:39:12.250
Let's think about
this for 10 seconds.

00:39:12.250 --> 00:39:20.030
N minus 1 random points,
log normal distribution,

00:39:20.030 --> 00:39:23.115
ready, three, two, one.

00:39:26.320 --> 00:39:30.200
So I'd say that we have
a majority are saying no.

00:39:30.200 --> 00:39:31.610
Can somebody say why that is?

00:39:31.610 --> 00:39:34.430
AUDIENCE: [INAUDIBLE].

00:39:34.430 --> 00:39:36.440
PROFESSOR: Because
it's something else.

00:39:36.440 --> 00:39:39.900
That's fair.

00:39:39.900 --> 00:39:41.865
But can you say
qualitatively why it is

00:39:41.865 --> 00:39:43.682
that this is not going to work?

00:39:43.682 --> 00:39:48.632
AUDIENCE: You can't
have very long gaps.

00:39:48.632 --> 00:39:49.340
PROFESSOR: Right.

00:39:49.340 --> 00:39:51.256
That is it's going to
be very unusual that you

00:39:51.256 --> 00:39:53.310
get a very long gap.

00:39:53.310 --> 00:39:55.510
What about the other end?

00:39:55.510 --> 00:39:58.064
AUDIENCE: Also a very long tail.

00:39:58.064 --> 00:39:59.730
PROFESSOR: Now I'm a
little bit worried.

00:40:03.040 --> 00:40:04.460
I think that that's true, right?

00:40:10.640 --> 00:40:12.910
Well, I'm going to say
that you're not going

00:40:12.910 --> 00:40:16.140
to get these super long ones.

00:40:16.140 --> 00:40:20.810
I think that the
distribution might still

00:40:20.810 --> 00:40:24.500
be peaked at short values.

00:40:24.500 --> 00:40:25.230
No?

00:40:25.230 --> 00:40:26.550
AUDIENCE: No.

00:40:26.550 --> 00:40:28.510
PROFESSOR: Random?

00:40:28.510 --> 00:40:32.236
If we were just traveling
along this resource axis,

00:40:32.236 --> 00:40:34.360
at a rate that's kind of
exponentially distributed,

00:40:34.360 --> 00:40:36.480
like Poisson rate,
we just dropped

00:40:36.480 --> 00:40:40.635
points, that's something
very similar to this random--

00:40:40.635 --> 00:40:43.167
AUDIENCE: It said we're
limited in the number--

00:40:43.167 --> 00:40:43.750
PROFESSOR: No.

00:40:43.750 --> 00:40:45.604
Is that not true?

00:40:45.604 --> 00:40:47.020
AUDIENCE: Your
sample [INAUDIBLE].

00:40:53.030 --> 00:40:57.310
PROFESSOR: I'm a little bit
worried that I might be-- now,

00:40:57.310 --> 00:40:58.600
I'm not 100% confident.

00:40:58.600 --> 00:41:01.361
Depending on how I look at this,
I get different distributions.

00:41:01.361 --> 00:41:01.860
Yeah?

00:41:01.860 --> 00:41:05.716
AUDIENCE: But I think the
first thing that he said,

00:41:05.716 --> 00:41:08.465
where you just say, I'm going
to pick N minus 1 points--

00:41:08.465 --> 00:41:09.090
PROFESSOR: Yes.

00:41:09.090 --> 00:41:11.940
AUDIENCE: --is a different
thing than going along the axis

00:41:11.940 --> 00:41:14.320
and exponentially
dropping ones along.

00:41:14.320 --> 00:41:17.021
PROFESSOR: I agree
it's different.

00:41:17.021 --> 00:41:19.395
AUDIENCE: I don't think that
would be the idea simulated,

00:41:19.395 --> 00:41:22.093
because you would be
very likely to just get

00:41:22.093 --> 00:41:24.134
this giant thing at the
end when you're finished.

00:41:24.134 --> 00:41:26.120
AUDIENCE: What you could
do, you could go on

00:41:26.120 --> 00:41:29.830
to draft N plus 2 points.

00:41:29.830 --> 00:41:30.830
PROFESSOR: No, I think--

00:41:30.830 --> 00:41:33.125
AUDIENCE: These scales that
are your two end points

00:41:33.125 --> 00:41:33.980
[? are doubled. ?]

00:41:33.980 --> 00:41:35.938
PROFESSOR: Because I
think that the probability

00:41:35.938 --> 00:41:37.310
distribution does grow.

00:41:37.310 --> 00:41:40.370
I think that I'm going
to side with you.

00:41:40.370 --> 00:41:42.190
So we've decided
that there are not

00:41:42.190 --> 00:41:44.315
going to be as
many short sticks,

00:41:44.315 --> 00:41:46.190
and there's not going
to be as many long sticks as

00:41:46.190 --> 00:41:47.231
compared to a log normal.

00:41:47.231 --> 00:41:48.226
Do we agree with that?

00:41:51.205 --> 00:41:53.580
At least we agree that it's
not going to be a log normal.

00:41:53.580 --> 00:41:55.496
So you're not going to
get this huge variation

00:41:55.496 --> 00:41:59.230
of some very long sticks
and some very short ones.
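
NOTE
Editorial sketch, not part of the lecture: the simultaneous broken-stick
construction (N - 1 uniform random cuts of a unit resource axis) is easy
to simulate. The function name, seed, and N = 200 are assumptions made
for illustration. A standard result for uniform cuts is that each
segment length L satisfies P(L > x) = (1 - x)**(N - 1), so segments much
longer than 1/N are exponentially unlikely.
```python
import random
def broken_stick(n_species, rng):
    """Cut the unit interval at n_species - 1 uniform random points; return segment lengths."""
    cuts = sorted(rng.random() for _ in range(n_species - 1))
    edges = [0.0] + cuts + [1.0]
    return [right - left for left, right in zip(edges, edges[1:])]
rng = random.Random(1)
lengths = broken_stick(200, rng)
# Abundances are assumed proportional to segment length, so these fractions
# times the total number of individuals would give the species abundances.
print(len(lengths), min(lengths), max(lengths))
```
Plotting a histogram of these lengths on both a linear and a log2 scale
is a quick way to compare the result with the observed distribution.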

00:41:59.230 --> 00:42:03.710
Now, the question is how would
you change this sort of model

00:42:03.710 --> 00:42:05.581
in order to generate
a log normal?

00:42:05.581 --> 00:42:07.330
And the answer is that
what you have to do

00:42:07.330 --> 00:42:10.980
is you have to have what is called
some niche hierarchy, or

00:42:10.980 --> 00:42:12.690
some hierarchical breaking.

00:42:12.690 --> 00:42:15.040
Just like what led to the
stone giving you a log normal

00:42:15.040 --> 00:42:17.100
is that you have to have
some successive process

00:42:17.100 --> 00:42:18.810
of breaking things.

00:42:18.810 --> 00:42:21.190
So this is what they call
some hierarchy model.

00:42:27.225 --> 00:42:29.225
And then the key thing
is that it's sequential.

00:42:33.260 --> 00:42:35.060
You have your resource axis.

00:42:35.060 --> 00:42:39.661
First, you have some
rule for breaking it up.

00:42:39.661 --> 00:42:41.410
It could be that you
just sample uniformly

00:42:41.410 --> 00:42:43.034
or some other
probability distribution.

00:42:45.089 --> 00:42:46.880
And the way that you
might think about this

00:42:46.880 --> 00:42:57.020
is via-- just everything
up on the board

00:42:57.020 --> 00:42:58.270
is so nice and useful.

00:42:58.270 --> 00:43:02.430
I feel bad getting rid of it.

00:43:02.430 --> 00:43:06.170
This thing is not true, so
I don't mind erasing it.

00:43:06.170 --> 00:43:10.560
So let's imagine some bird
community in the forest.

00:43:13.260 --> 00:43:14.770
And we're going to
think about where

00:43:14.770 --> 00:43:18.336
is it that the birds are
getting their grub or their food

00:43:18.336 --> 00:43:18.836
to eat.

00:43:21.530 --> 00:43:27.620
First, well, now the
axis is somehow vertical.

00:43:27.620 --> 00:43:35.210
You could divide them up
into the ground foragers

00:43:35.210 --> 00:43:39.730
as compared to the tree
foragers in terms of where

00:43:39.730 --> 00:43:40.971
they're getting their food.

00:43:40.971 --> 00:43:43.470
And you say, oh, well, how much
of the food is on each side?

00:43:43.470 --> 00:43:48.850
Oh, well, we'll say 30% is on
the ground, 70% is on the tree.

00:43:48.850 --> 00:43:50.889
This is along the stick.

00:43:50.889 --> 00:43:52.430
You cut the stick
in some way, or you

00:43:52.430 --> 00:43:54.530
break the stick in some way.

00:43:54.530 --> 00:43:56.060
But then within
the tree foragers,

00:43:56.060 --> 00:44:00.110
you'd say, well, the
resources might be separated.

00:44:00.110 --> 00:44:01.750
And this is really
like speciation,

00:44:01.750 --> 00:44:03.690
a species is in the
niche, the species

00:44:03.690 --> 00:44:06.300
are focusing on
different niches.

00:44:06.300 --> 00:44:09.030
So you'd say, oh, some are
going to focus on the trunk,

00:44:09.030 --> 00:44:12.330
some will focus on branches.

00:44:12.330 --> 00:44:13.950
And again, this
part of the stick

00:44:13.950 --> 00:44:17.720
is now broken or divided among
different resource locations

00:44:17.720 --> 00:44:18.885
with some amount.

00:44:18.885 --> 00:44:21.855
But then also, you're
going to get speciation

00:44:21.855 --> 00:44:23.730
in different directions
here, because there's

00:44:23.730 --> 00:44:27.590
both the surface-- I don't
know if you guys have ever

00:44:27.590 --> 00:44:30.200
eaten grubs-- but there's
the surface grubs,

00:44:30.200 --> 00:44:32.665
and then there's also
the sub-bark grubs.

00:44:38.100 --> 00:44:42.230
And so you kind of do this
process multiple times,

00:44:42.230 --> 00:44:44.650
where you kind of pick
different branches

00:44:44.650 --> 00:44:47.020
and break them to
divide up the niche.

00:44:47.020 --> 00:44:49.972
And then you end up with a
log normal type distribution.

00:44:49.972 --> 00:44:54.840
And this is a similar process
to the crushing of the stone,

00:44:54.840 --> 00:45:01.870
because the idea is that there's
sequential breaks of the stone.

00:45:01.870 --> 00:45:04.980
So the stone first breaks
into maybe simply two

00:45:04.980 --> 00:45:06.474
or it could be three.

00:45:06.474 --> 00:45:07.640
First, there's one breaking.

00:45:07.640 --> 00:45:09.400
And then one of
them is broken more.

00:45:09.400 --> 00:45:11.924
So given this
process, you end up

00:45:11.924 --> 00:45:13.340
getting a log
normal distribution.
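
NOTE
Editorial sketch, not part of the lecture: one possible version of the
sequential (hierarchical) breaking process. Two rules here are
assumptions chosen for simplicity: the segment to break is picked
uniformly at random, and it is split at a uniform random point. Other
rules are possible, and the names and seed are hypothetical.
```python
import random
def sequential_breakage(n_species, rng):
    """Start with one unit segment; repeatedly pick a segment at random and split it in two."""
    segments = [1.0]
    while len(segments) < n_species:
        index = rng.randrange(len(segments))
        piece = segments.pop(index)
        fraction = rng.random()
        # Each final length is a product of random fractions, one per split it went through.
        segments.append(piece * fraction)
        segments.append(piece * (1.0 - fraction))
    return segments
rng = random.Random(2)
segments = sequential_breakage(200, rng)
print(len(segments), min(segments), max(segments))
```
Because each final length is a product of several random fractions, its
logarithm is a sum, which is the link to the log-normal argument and to
the stone-crushing picture.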

00:45:13.340 --> 00:45:14.334
Yeah.

00:45:14.334 --> 00:45:18.640
AUDIENCE: But you also have a
distribution of like how far.

00:45:18.640 --> 00:45:20.340
Because I guess there
are two questions.

00:45:20.340 --> 00:45:23.620
Like when you break your
stick, you assume, somehow,

00:45:23.620 --> 00:45:25.154
that you uniformly break it.

00:45:25.154 --> 00:45:25.820
PROFESSOR: Yeah.

00:45:29.050 --> 00:45:31.290
A lot of work has gone
into the question of how it

00:45:31.290 --> 00:45:33.330
is you should break the stick.

00:45:33.330 --> 00:45:35.390
Given that you have this
tree foraging stick.

00:45:39.160 --> 00:45:40.840
On a practical level,
what they do is

00:45:40.840 --> 00:45:44.140
they ask, well, what probability
distribution gives you the best

00:45:44.140 --> 00:45:45.490
agreement with the data?

00:45:45.490 --> 00:45:46.860
Is it uniform?

00:45:46.860 --> 00:45:50.160
Or is it, oh, it's
broken like this?

00:45:50.160 --> 00:45:52.490
And in some cases
people say, well, it's

00:45:52.490 --> 00:45:54.055
actually tilted on one side.

00:46:00.900 --> 00:46:02.770
Well, in the context
of a succession

00:46:02.770 --> 00:46:04.350
and some other
environments, there's

00:46:04.350 --> 00:46:06.660
an idea that, if a species
first gets somewhere,

00:46:06.660 --> 00:46:08.790
they can kind of
monopolize a larger

00:46:08.790 --> 00:46:11.630
fraction of the resources
then if it's divided kind

00:46:11.630 --> 00:46:12.922
of equally at the beginning.

00:46:12.922 --> 00:46:15.505
And that's going to affect where
this probability distribution

00:46:15.505 --> 00:46:17.150
is going to break each one.

00:46:17.150 --> 00:46:23.580
But there's always this
question about how constrained

00:46:23.580 --> 00:46:25.080
are the notions and so forth.

00:46:25.080 --> 00:46:27.384
And I'm agnostic on that point.

00:46:27.384 --> 00:46:31.016
AUDIENCE: But you also need
distribution for how many times

00:46:31.016 --> 00:46:33.440
it breaks [INAUDIBLE].

00:46:33.440 --> 00:46:35.390
PROFESSOR: Yes.

00:46:35.390 --> 00:46:38.054
It's just that, if
you do this process,

00:46:38.054 --> 00:46:39.970
it's like a central limit
theorem type result.

00:46:39.970 --> 00:46:42.010
So you have to do it enough
times so that you get

00:46:42.010 --> 00:46:43.260
to some limiting distribution.

00:46:43.260 --> 00:46:45.120
And then you could
keep on doing it.

00:46:45.120 --> 00:46:47.120
In the end, we always say
that species abundance

00:46:47.120 --> 00:46:49.190
is proportional to the size.

00:46:49.190 --> 00:46:51.520
So we're going to
scale, ultimately,

00:46:51.520 --> 00:46:53.360
to get the correct
number of individuals.

00:46:53.360 --> 00:46:57.130
It's just that you have to do it
some reasonable number of times

00:46:57.130 --> 00:46:58.880
so that the randomness
kind of washes out,

00:46:58.880 --> 00:47:00.963
and you end up approaching
that limiting behavior.

00:47:04.095 --> 00:47:04.970
Does that make sense?

00:47:09.480 --> 00:47:14.690
And indeed I just want
to mention a major result

00:47:14.690 --> 00:47:15.840
in this field.

00:47:18.360 --> 00:47:23.760
These niche type
models successfully

00:47:23.760 --> 00:47:27.692
explained or predicted
another pattern

00:47:27.692 --> 00:47:30.150
that had been observed, which
is the so-called species area

00:47:30.150 --> 00:47:30.733
relationships.

00:47:39.440 --> 00:47:43.570
So this is just saying that,
here, we looked at 50 hectares,

00:47:43.570 --> 00:47:45.610
and we asked how many
species were there.

00:47:45.610 --> 00:47:47.550
225 species in 50 hectares.

00:47:47.550 --> 00:47:50.550
Now, the question is, if instead
of looking at 50 hectares,

00:47:50.550 --> 00:47:54.082
we instead looked
at 500, do you think

00:47:54.082 --> 00:47:55.790
that the number of
species we observed

00:47:55.790 --> 00:48:01.538
would have gone up, stayed
the same, or gone down?

00:48:01.538 --> 00:48:07.936
Up, same, down, ready,
three, two, one.

00:48:07.936 --> 00:48:08.880
Up.

00:48:08.880 --> 00:48:09.830
Up.

00:48:09.830 --> 00:48:13.660
If you look at a
larger area, you

00:48:13.660 --> 00:48:17.170
expect to see more
species in a larger area.

00:48:17.170 --> 00:48:18.890
And people really do this.

00:48:18.890 --> 00:48:23.070
They look in some area, going
from, say, they take a meter,

00:48:23.070 --> 00:48:24.450
and they count all the species.

00:48:24.450 --> 00:48:26.530
And then they go and
here is 100 meters,

00:48:26.530 --> 00:48:29.060
and they count all the species.

00:48:29.060 --> 00:48:31.190
And they ask, how
many species do you

00:48:31.190 --> 00:48:32.440
see as a function of the area?

00:48:32.440 --> 00:48:34.773
And what people have found
is that the number of species

00:48:34.773 --> 00:48:42.580
you observe is proportional
to the area to some power Z,

00:48:42.580 --> 00:48:44.790
where Z is around 1/4.

00:48:52.036 --> 00:48:55.584
And of course, the area
goes as some r squared.

00:48:55.584 --> 00:48:57.000
If you wanted to,
you could say it

00:48:57.000 --> 00:49:01.000
goes as the square root
of the radius, whatever.

00:49:01.000 --> 00:49:03.670
But the number of
species in some area,

00:49:03.670 --> 00:49:06.100
it grows, but it
grows in a manner

00:49:06.100 --> 00:49:08.870
that is less than linear.
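
NOTE
A quick numerical sketch of the scaling just described, assuming S = c * A**z
with z = 1/4 and calibrating c to the 225 species in 50 hectares quoted above.
The function name species_at_area is hypothetical, and real plots need not
follow a single power law at every scale.
```python
# Species-area scaling S = c * A**z, with z of about 1/4 as quoted in the lecture.
# The helper name and the calibration to a single plot are illustrative assumptions.
def species_at_area(area, ref_area=50.0, ref_species=225.0, z=0.25):
    """Extrapolate a species count from a reference plot using S proportional to A**z."""
    return ref_species * (area / ref_area) ** z
#
if __name__ == "__main__":
    for hectares in (50, 500, 5000):
        print(hectares, round(species_at_area(hectares)))
```
Under these assumptions, going from 50 to 500 hectares gives roughly 400
species: more than 225, but far short of a tenfold increase.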

00:49:08.870 --> 00:49:11.349
Does that make sense?

00:49:11.349 --> 00:49:13.390
It definitely makes sense
that it's less than linear.

00:49:13.390 --> 00:49:17.302
Because linear would be that you
sample a bunch of species here,

00:49:17.302 --> 00:49:19.135
and then you look at
another identical plot,

00:49:19.135 --> 00:49:20.125
you get some other species.

00:49:20.125 --> 00:49:22.583
And that would be saying that, oh,
you really don't expect

00:49:22.583 --> 00:49:24.400
any of those species to overlap.

00:49:24.400 --> 00:49:26.420
That would be a weird world.

00:49:26.420 --> 00:49:31.390
So it very much makes sense
that this is less than 1.

00:49:31.390 --> 00:49:34.550
Of course, it didn't have
to be this power law.

00:49:34.550 --> 00:49:39.010
But one thing that has been
discovered, around the world,

00:49:39.010 --> 00:49:42.050
is that power laws
are very interesting.

00:49:42.050 --> 00:49:45.810
But once again, many different
microscopic processes

00:49:45.810 --> 00:49:47.200
can lead to power laws.

00:49:47.200 --> 00:49:49.270
The niche models
have successfully

00:49:49.270 --> 00:49:53.290
predicted or explained why
it might have this scaling.

00:49:53.290 --> 00:49:56.400
But it turns out that neutral
models can also predict it.

00:49:56.400 --> 00:50:00.490
And it may just be that lots
of spatially explicit models

00:50:00.490 --> 00:50:02.520
will give you some power
law type scaling that

00:50:02.520 --> 00:50:04.589
looks kind of like this.

00:50:04.589 --> 00:50:06.130
So once again, it's
a question of how

00:50:06.130 --> 00:50:09.162
convinced you should be
about microscopic processes

00:50:09.162 --> 00:50:10.870
based on being able
to explain some data.

00:50:10.870 --> 00:50:17.880
And I think the best
cure for this danger,

00:50:17.880 --> 00:50:20.480
of assuming that the microscopic
assumptions are correct,

00:50:20.480 --> 00:50:22.438
because the model is able
to explain something,

00:50:22.438 --> 00:50:25.010
is that, if you find some
other very different set

00:50:25.010 --> 00:50:29.629
of microscopic assumptions that
also explain the patterns, then

00:50:29.629 --> 00:50:31.670
it becomes clear that you
have to take everything

00:50:31.670 --> 00:50:33.496
with a grain of salt.

00:50:33.496 --> 00:50:34.870
And that's I think
part of what's

00:50:34.870 --> 00:50:38.270
been very valuable about the
neutral theory contribution

00:50:38.270 --> 00:50:39.010
to this field.

00:50:39.010 --> 00:50:41.179
AUDIENCE: Does this
just come from-- you

00:50:41.179 --> 00:50:44.312
assume that all the individuals
are uniformly distributed

00:50:44.312 --> 00:50:45.758
and then [INAUDIBLE]?

00:50:50.997 --> 00:50:53.080
PROFESSOR: There are
multiple derivations of this,

00:50:53.080 --> 00:50:54.371
so it's a little bit confusing.

00:51:03.135 --> 00:51:05.500
The neutral models,
that I have seen,

00:51:05.500 --> 00:51:07.780
that lead to these
patterns, they basically

00:51:07.780 --> 00:51:11.510
have the individuals
randomly, either

00:51:11.510 --> 00:51:14.490
with sex or without sex,
kind of diffusing around,

00:51:14.490 --> 00:51:16.640
and then they divide, deh-deh.

00:51:16.640 --> 00:51:20.190
And then you can explicitly
just do the different spaces

00:51:20.190 --> 00:51:22.040
and see that you get a scaling.

00:51:22.040 --> 00:51:28.210
It seems to be a
surprisingly emergent feature

00:51:28.210 --> 00:51:31.190
of many of these models.

00:51:31.190 --> 00:51:32.740
And once again, it
may be something

00:51:32.740 --> 00:51:34.492
that tells us less
about biology than it

00:51:34.492 --> 00:51:35.700
does about math or something.

00:51:44.940 --> 00:51:47.890
Any other questions about
this, the base notion

00:51:47.890 --> 00:51:49.390
of this niche
hierarchy type models?

00:51:54.510 --> 00:51:56.060
So I want to spend
some time talking

00:51:56.060 --> 00:51:58.830
about this neutral
theory in ecology.

00:51:58.830 --> 00:52:00.550
The math, in particular
the derivation

00:52:00.550 --> 00:52:02.216
of this particular
closed form solution,

00:52:02.216 --> 00:52:04.775
is not really so
interesting or relevant.

00:52:04.775 --> 00:52:06.650
But I think it's very
important to understand

00:52:06.650 --> 00:52:09.635
what the assumptions are in the
model and maybe also something

00:52:09.635 --> 00:52:12.260
about the circumstances in which
we think that it should apply.

00:52:25.530 --> 00:52:29.370
So the basic idea is that
we have, what we hope,

00:52:29.370 --> 00:52:36.140
is some metacommunity
that is large.

00:52:39.530 --> 00:52:40.900
And then we have an island.

00:52:40.900 --> 00:52:44.670
So this has to do with this
theory of island biogeography.

00:52:44.670 --> 00:52:47.185
We have an island over here.

00:52:47.185 --> 00:52:51.540
And in the context of the
nomenclature of this paper,

00:52:51.540 --> 00:52:54.249
there is some community
size, size j here.

00:52:54.249 --> 00:52:56.165
This tells us about the
number of individuals.

00:53:01.360 --> 00:53:04.830
And they're distributed
across some number of species.

00:53:04.830 --> 00:53:08.710
Now, the neutral
theory, the key thing

00:53:08.710 --> 00:53:12.260
is that we assume that all
individuals are identical.

00:53:22.780 --> 00:53:26.490
And once again, it's not
that the neutral theorists

00:53:26.490 --> 00:53:28.310
believe that this is true.

00:53:28.310 --> 00:53:31.314
It's that they think that it
may be sufficient to explain

00:53:31.314 --> 00:53:32.605
the patterns that are observed.

00:53:40.755 --> 00:53:42.880
And when we say that all
individuals are identical,

00:53:42.880 --> 00:53:46.470
what we mean is that the
demographic parameters are

00:53:46.470 --> 00:53:49.586
the same, birth, death rates.

00:53:53.990 --> 00:53:56.460
And it's even a stronger
assumption, in some ways,

00:53:56.460 --> 00:53:57.010
than that.

00:53:57.010 --> 00:53:59.010
It's assuming that the
individuals are the same,

00:53:59.010 --> 00:54:02.360
the species are the same, and
that there are no interactions

00:54:02.360 --> 00:54:03.560
within the species as well.

00:54:03.560 --> 00:54:07.880
So there's no Allee effect,
or no specific competition.

00:54:07.880 --> 00:54:09.480
So the birth, death
rates are going

00:54:09.480 --> 00:54:16.120
to be independent of everything,
which is an amazingly

00:54:16.120 --> 00:54:16.917
parsimonious model.

00:54:16.917 --> 00:54:19.250
And it's kind of amazing you
can get anything out of it.

00:54:24.040 --> 00:54:28.150
And then we have a
migration rate m.

00:54:28.150 --> 00:54:29.740
It's either a rate
or a probability,

00:54:29.740 --> 00:54:32.230
depending on how
you think about it.

00:54:32.230 --> 00:54:35.490
Rate or probability m.

00:54:35.490 --> 00:54:42.330
And can somebody remind
us how we handle that?

00:54:48.713 --> 00:54:51.168
AUDIENCE: Both just
in a community?

00:54:51.168 --> 00:54:52.150
PROFESSOR: Yeah.

00:54:52.150 --> 00:54:54.114
AUDIENCE: At some
probability that

00:54:54.114 --> 00:54:57.167
is proportional to the
distribution of the species

00:54:57.167 --> 00:54:58.042
in the metacommunity?

00:55:00.790 --> 00:55:02.040
PROFESSOR: Yeah, that's right.

00:55:02.040 --> 00:55:03.531
AUDIENCE: --transfer
an individual

00:55:03.531 --> 00:55:05.919
from the metacommunity
to the island.

00:55:05.919 --> 00:55:06.710
PROFESSOR: Perfect.

00:55:06.710 --> 00:55:08.264
AUDIENCE: We do
stick to the island

00:55:08.264 --> 00:55:09.820
to make sure that
number of individuals.

00:55:09.820 --> 00:55:10.528
PROFESSOR: Right.

00:55:10.528 --> 00:55:15.150
So what we're going to
do is we're basically

00:55:15.150 --> 00:55:19.570
going to pick a random
individual, here, each cycle.

00:55:19.570 --> 00:55:21.440
This is kind of like
a Moran process.

00:55:21.440 --> 00:55:23.320
We're going to pick
an individual here.

00:55:23.320 --> 00:55:25.950
And we're going to kill him.

00:55:25.950 --> 00:55:30.740
And then what we're going to
do is, with probability m,

00:55:30.740 --> 00:55:32.830
replace that individual
with one member

00:55:32.830 --> 00:55:35.030
of the metacommunity at random.

00:55:35.030 --> 00:55:36.930
So the rate coming
from here will

00:55:36.930 --> 00:55:40.380
be proportional to the species
abundance in the metacommunity.

00:55:40.380 --> 00:55:43.430
And with a probability of 1
minus m, what we're going to do

00:55:43.430 --> 00:55:46.400
is we're going to
replace that individual

00:55:46.400 --> 00:55:51.960
with another individual
in the island.

00:55:51.960 --> 00:55:56.510
Now, the math kind of gets
hairy and complicated.

00:55:56.510 --> 00:56:01.547
But the basic notion
is really quite simple.

00:56:01.547 --> 00:56:03.130
You have a metacommunity
distribution,

00:56:03.130 --> 00:56:04.980
which is going to end up
being the so-called Fisher log

00:56:04.980 --> 00:56:06.030
series in this model.

00:56:13.780 --> 00:56:17.004
This describes the species
abundance on the metacommunity.

00:56:17.004 --> 00:56:18.420
But then on the
island, we're just

00:56:18.420 --> 00:56:21.100
going to assume that there's
birth, death that occurs over

00:56:21.100 --> 00:56:22.210
here at some rate.

00:56:22.210 --> 00:56:25.205
But we hardly even have
to think about that.

00:56:25.205 --> 00:56:27.330
From the standpoint of,
say, a simulation or model,

00:56:27.330 --> 00:56:29.070
we just run multiple
cycles of this,

00:56:29.070 --> 00:56:30.260
where we have j individuals.

00:56:30.260 --> 00:56:31.260
And we always have
j individuals,

00:56:31.260 --> 00:56:32.830
because it's like
the Moran process.

00:56:32.830 --> 00:56:35.066
At every time point,
we kill one individual,

00:56:35.066 --> 00:56:37.690
and we replace it, with somebody
either from the same community

00:56:37.690 --> 00:56:40.840
or from the metacommunity.

00:56:40.840 --> 00:56:47.539
And you can imagine that in
the limit of m going to zero,

00:56:47.539 --> 00:56:49.080
what's going to
happen on the island?

00:56:52.240 --> 00:56:54.120
Yeah, so you'll end up with
just one species, just

00:56:54.120 --> 00:56:56.980
because this is just
random, like genetic drift.

00:56:56.980 --> 00:57:00.410
It's ecological drift where
one species will take over.

00:57:00.410 --> 00:57:04.320
Whereas if m is large,
then somehow it's

00:57:04.320 --> 00:57:07.405
more of a reflection
of the metacommunity.
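
NOTE
The cycle described above can be sketched as a small simulation. This is a
minimal sketch under stated assumptions, not the paper's code: the
metacommunity is held fixed as a dict of relative abundances, the local parent
is drawn from the surviving J - 1 individuals, and the names simulate_island
and meta_abundance are hypothetical.
```python
import random
from collections import Counter
#
def simulate_island(meta_abundance, J=200, m=0.1, steps=20000, seed=0):
    """Moran-like neutral dynamics on an island of fixed size J.
    meta_abundance maps species to relative abundance in the metacommunity
    (held fixed here, which is an assumption of this sketch)."""
    rng = random.Random(seed)
    species = list(meta_abundance)
    weights = [meta_abundance[s] for s in species]
    # Start the island as a random sample of the metacommunity.
    island = rng.choices(species, weights=weights, k=J)
    for _ in range(steps):
        dead = rng.randrange(J)            # kill one random individual
        if rng.random() < m:
            # With probability m: immigrant drawn in proportion to
            # species abundance in the metacommunity.
            island[dead] = rng.choices(species, weights=weights, k=1)[0]
        else:
            # With probability 1 - m: offspring of another island individual.
            parent = rng.randrange(J - 1)
            if parent >= dead:
                parent += 1
            island[dead] = island[parent]
    return Counter(island)
```
With m = 0 and enough steps, the returned Counter collapses to a single
species (ecological drift). With m close to 1, the island looks like a
size-J sample of the metacommunity.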

00:57:13.680 --> 00:57:15.830
Are there any questions
about what this model

00:57:15.830 --> 00:57:18.098
is looking like for now?

00:57:18.098 --> 00:57:20.264
AUDIENCE: Could we talk
about the Fisher log series?

00:57:20.264 --> 00:57:20.732
PROFESSOR: Yeah.

00:57:20.732 --> 00:57:22.606
AUDIENCE: So we would
put it on the same axis

00:57:22.606 --> 00:57:23.540
as the [INAUDIBLE]?

00:57:23.540 --> 00:57:25.770
PROFESSOR: Yes, this is a
very, very good question.

00:57:25.770 --> 00:57:27.319
So we'll do this
in just a moment.

00:57:27.319 --> 00:57:28.610
Because this is very important.

00:57:32.210 --> 00:57:35.950
I want to say just a couple
things about this model.

00:57:35.950 --> 00:57:37.750
So when I read this
paper, what I imagined

00:57:37.750 --> 00:57:39.208
is that it really
looked like this.

00:57:39.208 --> 00:57:44.890
This was Panama, and that,
30 kilometers off the coast,

00:57:44.890 --> 00:57:48.610
there was this island,
BCI, Barro Colorado Island.

00:57:48.610 --> 00:57:51.560
But that's not maybe
an accurate description

00:57:51.560 --> 00:57:54.660
of what the real
system looks like.

00:57:54.660 --> 00:57:56.340
Does anybody know where BCI is?

00:57:56.340 --> 00:57:57.907
AUDIENCE: It's in Panama.

00:57:57.907 --> 00:57:58.490
PROFESSOR: Hm?

00:57:58.490 --> 00:57:59.350
AUDIENCE: Panama.

00:57:59.350 --> 00:58:00.600
PROFESSOR: So it is in Panama.

00:58:00.600 --> 00:58:02.430
But it's not off
the coast of Panama.

00:58:02.430 --> 00:58:03.820
I guess that was my original.

00:58:03.820 --> 00:58:05.450
AUDIENCE: It's in the canal.

00:58:05.450 --> 00:58:07.050
PROFESSOR: Yeah,
it's in the canal.

00:58:07.050 --> 00:58:09.400
So it's an island
that was created when

00:58:09.400 --> 00:58:11.540
they made the Panama Canal.

00:58:11.540 --> 00:58:13.520
So this thing was
not always an island.

00:58:13.520 --> 00:58:16.240
It's been an island
for 100 years.

00:58:16.240 --> 00:58:17.959
And it's in the
middle of a canal.

00:58:17.959 --> 00:58:20.250
And they actually have cougars
that swim back and forth

00:58:20.250 --> 00:58:22.980
from the mainland.

00:58:22.980 --> 00:58:25.952
But it does make you
wonder whether this

00:58:25.952 --> 00:58:32.390
is-- it's much more strongly
coupled to the mainland

00:58:32.390 --> 00:58:36.670
than I imagined when I
read this paper at first.

00:58:36.670 --> 00:58:39.390
I don't know what that
means for all this.

00:58:39.390 --> 00:58:42.680
But certainly, you
expect this to be

00:58:42.680 --> 00:58:44.880
a more or less appropriate
model depending on this.

00:58:44.880 --> 00:58:46.810
Because, of course,
if you went and you

00:58:46.810 --> 00:58:48.990
sampled 50 hectares
here, you wouldn't

00:58:48.990 --> 00:58:51.440
believe that it should
have the same distribution.

00:58:51.440 --> 00:58:54.706
You'd believe it should be more
like the Fisher log series.

00:58:54.706 --> 00:58:57.080
And there's some evidence that
things are tilted in a way

00:58:57.080 --> 00:58:57.920
that you would expect.

00:58:57.920 --> 00:58:59.003
And we'll talk about that.

00:59:01.890 --> 00:59:02.470
It's tricky.

00:59:02.470 --> 00:59:04.990
And of course, you have to
decide in all this stuff, oh,

00:59:04.990 --> 00:59:07.480
what do you mean
by free parameters?

00:59:07.480 --> 00:59:09.750
And actually, it seems
like people can't count.

00:59:09.750 --> 00:59:11.541
And we'll talk about
this in a moment, too.

00:59:14.680 --> 00:59:16.830
Because, of course,
constructing the model,

00:59:16.830 --> 00:59:19.450
there's some sense of free
parameters that you have there.

00:59:19.450 --> 00:59:21.360
Because we could
have said, oh, it's

00:59:21.360 --> 00:59:23.140
just going to be the Fisher log
series, or we could have said,

00:59:23.140 --> 00:59:23.880
oh, it's going to be island.

00:59:23.880 --> 00:59:25.296
Or we could have
said, oh, there's

00:59:25.296 --> 00:59:26.660
another island out here.

00:59:26.660 --> 00:59:28.760
And then that would be
another distribution.

00:59:28.760 --> 00:59:31.480
And not all of these things
introduce more free parameters,

00:59:31.480 --> 00:59:32.550
necessarily, because
you could say,

00:59:32.550 --> 00:59:34.049
oh, this is the
same migration rate,

00:59:34.049 --> 00:59:35.240
or you could do something.

00:59:35.240 --> 00:59:37.770
But they are going to lead
to different distributions,

00:59:37.770 --> 00:59:38.680
and you have that
freedom when you're

00:59:38.680 --> 00:59:39.804
trying to explain the data.

00:59:42.180 --> 00:59:46.156
There are a lot of judgment
calls in this business.

00:59:46.156 --> 00:59:47.780
But let's talk about
Fisher log series,

00:59:47.780 --> 00:59:48.995
because this is relevant.

00:59:54.550 --> 00:59:58.680
So the model is very similar
to what we did for the master

00:59:58.680 --> 01:00:01.290
equation in the context
of gene expression

01:00:01.290 --> 01:00:03.930
and the number of mRNA.

01:00:03.930 --> 01:00:06.610
So was the equilibrium
or steady state

01:00:06.610 --> 01:00:11.540
distribution of mRNA in a cell,
was that a Fisher log series?

01:00:11.540 --> 01:00:14.420
Yes or no, five seconds?

01:00:17.060 --> 01:00:20.970
Was the mRNA steady state
probability distribution

01:00:20.970 --> 01:00:21.990
a Fisher log series?

01:00:21.990 --> 01:00:24.550
Ready, three, two, one.

01:00:27.450 --> 01:00:28.030
No.

01:00:28.030 --> 01:00:28.529
No.

01:00:28.529 --> 01:00:29.415
What was it?

01:00:29.415 --> 01:00:30.870
It was a Poisson.

01:00:30.870 --> 01:00:34.420
And you guys should review what
all these distributions are,

01:00:34.420 --> 01:00:35.790
when you get them, and so forth.

01:00:35.790 --> 01:00:37.300
So what was the
difference? Why is it

01:00:37.300 --> 01:00:48.770
that we have some
probability, P0, P1, P2?

01:00:48.770 --> 01:00:50.490
This could be mRNA
or it could be

01:00:50.490 --> 01:00:53.590
number of individuals in
some species with some birth

01:00:53.590 --> 01:00:54.440
and death rates.

01:00:54.440 --> 01:00:58.070
What was the key difference
between the mRNA model, which

01:00:58.070 --> 01:01:01.670
led to this distribution
becoming Poisson,

01:01:01.670 --> 01:01:03.790
and the model that we
just studied here, where

01:01:03.790 --> 01:01:05.080
it became a Fisher log series?

01:01:09.200 --> 01:01:13.020
And I should maybe write down
what the Fisher log series is.

01:01:13.020 --> 01:01:17.740
So this is the expected number
of species with n individuals

01:01:17.740 --> 01:01:19.069
on the metacommunity.

01:01:19.069 --> 01:01:20.360
Here is the Fisher log series.

01:01:20.360 --> 01:01:23.970
It was some theta times x
to the n, divided by n.

01:01:28.782 --> 01:01:29.990
So what's the key difference?

01:01:29.990 --> 01:01:31.244
Yeah.

01:01:31.244 --> 01:01:36.711
AUDIENCE: I think that the
birth and death rates are

01:01:36.711 --> 01:01:38.202
both proportional [INAUDIBLE].

01:01:41.989 --> 01:01:43.780
PROFESSOR: Right, the
birth and death rates

01:01:43.780 --> 01:01:44.855
are both proportional.

01:01:44.855 --> 01:01:46.330
AUDIENCE: In the
Fisher log series.

01:01:46.330 --> 01:01:47.829
PROFESSOR: In the
Fisher log series.

01:01:47.829 --> 01:01:51.630
So what we have is that
b0-- and what should we

01:01:51.630 --> 01:01:54.544
call b0 in this model?

01:01:54.544 --> 01:01:56.055
AUDIENCE: [INAUDIBLE].

01:01:56.055 --> 01:01:57.430
PROFESSOR: Well,
right now, we're

01:01:57.430 --> 01:01:59.026
thinking about
the metacommunity.

01:01:59.026 --> 01:01:59.900
AUDIENCE: Speciation.

01:01:59.900 --> 01:02:00.816
PROFESSOR: Speciation.

01:02:00.816 --> 01:02:03.870
b0 is speciation, which
we're going to assume

01:02:03.870 --> 01:02:06.670
is going to be constant.

01:02:06.670 --> 01:02:11.070
In this model, do we have
speciation on the island?

01:02:11.070 --> 01:02:12.046
No.

01:02:12.046 --> 01:02:13.420
The assumption is
that the island

01:02:13.420 --> 01:02:16.280
is small enough that the rate of
speciation is just negligible.

01:02:16.280 --> 01:02:20.164
So speciation plays
a role in forming

01:02:20.164 --> 01:02:22.080
the metacommunity
distribution, but it doesn't

01:02:22.080 --> 01:02:25.210
play a role in the model.

01:02:25.210 --> 01:02:27.280
So this is speciation.

01:02:27.280 --> 01:02:30.610
But then what we assume
is that b1, here,

01:02:30.610 --> 01:02:35.180
is equal to some
fundamental rate b times n,

01:02:35.180 --> 01:02:36.870
but it's b times,
in this case, 1.

01:02:36.870 --> 01:02:43.840
So more broadly, bn is equal
to some birth rate times n.

01:02:43.840 --> 01:02:46.290
This is saying that the
individuals can give birth

01:02:46.290 --> 01:02:49.260
to other individuals.

01:02:49.260 --> 01:02:51.919
Now, we're not assuming anything
about sexual reproduction

01:02:51.919 --> 01:02:52.710
necessarily or not.

01:02:52.710 --> 01:02:55.320
We're just saying
that the kind of rates

01:02:55.320 --> 01:02:56.800
are proportional to the numbers.

01:02:56.800 --> 01:02:58.520
So if you have twice
as many individuals,

01:02:58.520 --> 01:03:00.209
the birth rate will
be twice as large.

01:03:00.209 --> 01:03:01.000
This is reasonable.

01:03:07.980 --> 01:03:12.250
This is Pn and
this is Pn plus 1.

01:03:12.250 --> 01:03:17.200
So this is d of n plus
1 is equal to some death

01:03:17.200 --> 01:03:19.330
rate times n plus 1.

01:03:19.330 --> 01:03:21.800
So each individual just
has some rate of dying.

01:03:21.800 --> 01:03:23.152
It's exponentially distributed.

01:03:23.152 --> 01:03:24.110
This again makes sense.

01:03:26.750 --> 01:03:28.990
What was the key difference
between our mRNA model,

01:03:28.990 --> 01:03:30.406
from before that
gave the Poisson,

01:03:30.406 --> 01:03:32.670
and this model that gives
the Fisher log series?

01:03:32.670 --> 01:03:36.992
AUDIENCE: So with the mRNA, it's
with a standard like a chemical

01:03:36.992 --> 01:03:41.952
equation where there's
some fixed external input.

01:03:41.952 --> 01:03:44.680
But then the
degradation is according

01:03:44.680 --> 01:03:45.920
to the amount that you have.

01:03:45.920 --> 01:03:48.420
So death is proportionate
[INAUDIBLE].

01:03:48.420 --> 01:03:49.790
PROFESSOR: Perfect.

01:03:49.790 --> 01:03:52.400
In both cases, the death rate
is proportional to the number

01:03:52.400 --> 01:03:55.140
of either mRNA or individuals.

01:03:55.140 --> 01:03:57.140
However, in the mRNA
model, what we assume

01:03:57.140 --> 01:03:59.840
is there's just some constant
rate of transcription,

01:03:59.840 --> 01:04:02.466
so a constant rate, per unit
time, of making more mRNA.

01:04:02.466 --> 01:04:03.840
So just because
there's more mRNA

01:04:03.840 --> 01:04:06.950
doesn't mean that you're
going to get more mRNA.

01:04:06.950 --> 01:04:10.080
But here, we assume
that the birth rate

01:04:10.080 --> 01:04:12.100
is proportional to the number.

01:04:12.100 --> 01:04:13.982
So that's what leads
to the difference.
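
NOTE
To make the contrast concrete, here is a minimal sketch that builds the
unnormalized steady state of a one-step birth-death chain from flux balance,
P[n] * birth(n) = P[n + 1] * death(n + 1), for both rate choices. The rate
values k, b0, b, d and the function name steady_state are illustrative
assumptions, not numbers from the lecture.
```python
import math
#
def steady_state(birth, death, n_max):
    """Unnormalized steady state via flux balance between neighboring states."""
    p = [1.0]
    for n in range(n_max):
        p.append(p[-1] * birth(n) / death(n + 1))
    return p
#
# Illustrative rates (assumed values).
k, b0, b, d = 5.0, 1.0, 0.9, 1.0
# mRNA: constant production k, degradation d * n.
mrna = steady_state(lambda n: k, lambda n: d * n, 30)
# Neutral species: speciation b0 out of n = 0, then birth b * n and death d * n.
neutral = steady_state(lambda n: b0 if n == 0 else b * n, lambda n: d * n, 30)
# Closed forms to compare against: Poisson shape and log-series shape.
poisson_shape = [(k / d) ** n / math.factorial(n) for n in range(31)]
log_series_shape = [(b0 / b) * (b / d) ** n / n for n in range(1, 31)]
```
The only change between the two calls is whether the birth rate depends on n.
That alone turns the n factorial in the denominator into a plain n.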

01:04:13.982 --> 01:04:15.690
And so this is one of
the few other cases

01:04:15.690 --> 01:04:18.060
that you can simply
solve the master equation

01:04:18.060 --> 01:04:19.692
and get an equilibrium
distribution.

01:04:19.692 --> 01:04:21.650
And it's the same thing
we always do,

01:04:21.650 --> 01:04:26.840
where we say, at steady
state, the probability fluxes

01:04:26.840 --> 01:04:28.800
or whatever are equal.

01:04:28.800 --> 01:04:32.510
So you get that P1
should be equal to P0,

01:04:32.510 --> 01:04:35.316
and then we have a
b0 divided by d1.

01:04:35.316 --> 01:04:39.130
And more broadly, we
just cycle through.

01:04:39.130 --> 01:04:41.400
The probability of
being in the nth state,

01:04:41.400 --> 01:04:43.060
it's going to be some P0.

01:04:43.060 --> 01:04:48.380
And then basically, it's
going to be b0 divided by d1,

01:04:48.380 --> 01:04:58.470
b1 divided by d2, b2 over d3, dot,
dot, dot, up to bn minus 1 over dn.

01:04:58.470 --> 01:05:01.780
And indeed, if we just plug in
what these things are equal to,

01:05:01.780 --> 01:05:06.650
we end up getting-- there's
P0, the fundamental birth

01:05:06.650 --> 01:05:08.191
over death to the nth power.

01:05:08.191 --> 01:05:09.940
And then we just are
left with a 1 over n.

01:05:09.940 --> 01:05:13.170
Because we're going to have a
2 here and a 2 here, and those

01:05:13.170 --> 01:05:13.670
cancel.

01:05:13.670 --> 01:05:15.370
A 2 here and 3 here,
and those cancel.

01:05:15.370 --> 01:05:20.130
And we're just left with
the n at the end, finally.
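
NOTE
Written out, the cancellation goes as follows, using the rates stated above
(b_0 for speciation, b_n = b n for n >= 1, d_n = d n). Identifying theta with
the prefactor P_0 b_0 / b, up to overall normalization, is an assumption of
this sketch rather than something stated in the lecture.
```latex
\begin{align*}
P_n &= P_0 \prod_{k=0}^{n-1} \frac{b_k}{d_{k+1}}
     = P_0 \, \frac{b_0}{d} \prod_{k=1}^{n-1} \frac{b\,k}{d\,(k+1)} \\
    &= P_0 \, \frac{b_0}{d} \left(\frac{b}{d}\right)^{n-1} \frac{1}{n}
     = P_0 \, \frac{b_0}{b} \, \frac{x^n}{n},
\qquad x \equiv \frac{b}{d}, \quad n \ge 1 .
\end{align*}
```
The product of k / (k + 1) from k = 1 to n - 1 telescopes to 1 / n, which is
the single factor of n left at the end.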

01:05:20.130 --> 01:05:24.470
So this x, over there,
is then, in this model,

01:05:24.470 --> 01:05:27.920
the ratio of the
birth and death rates.

01:05:27.920 --> 01:05:29.890
So which one is larger?

01:05:29.890 --> 01:05:32.330
Is it A slash 1?

01:05:32.330 --> 01:05:35.270
Is it b is greater than d?

01:05:35.270 --> 01:05:39.844
Or is it B slash 2,
that b is less than d?

01:05:39.844 --> 01:05:41.260
Think about this
for five seconds.

01:05:43.770 --> 01:05:47.002
Do you think that
birth rates should

01:05:47.002 --> 01:05:48.710
be larger than death
rates or death rates

01:05:48.710 --> 01:05:51.168
should be larger than birth
rates, or do they have to be equal?

01:05:54.990 --> 01:05:57.940
Ready, three, two, one.

01:06:00.460 --> 01:06:07.780
So we got a number of-- it's
kind of distributed, 1 and 2's.

01:06:07.780 --> 01:06:10.322
Well, it's maybe not that
deep, not deep enough.

01:06:10.322 --> 01:06:11.780
Can somebody say
why their neighbor

01:06:11.780 --> 01:06:12.988
thinks it's one or the other?

01:06:15.374 --> 01:06:17.290
People are actually
turning to their neighbor.

01:06:20.720 --> 01:06:22.336
A justification for
one or the other.

01:06:22.336 --> 01:06:26.144
AUDIENCE: So in this
problem, if b over d

01:06:26.144 --> 01:06:30.242
is greater than 1, then this
distribution is not normalized.

01:06:30.242 --> 01:06:30.950
PROFESSOR: Right.

01:06:30.950 --> 01:06:35.380
So if b over d is greater than
1, so if x is greater than 1,

01:06:35.380 --> 01:06:39.480
then this distribution blows up.

01:06:39.480 --> 01:06:41.230
Then it gets more and
more likely to have

01:06:41.230 --> 01:06:44.240
all these larger numbers.
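
NOTE
The normalization argument can be stated in one line. This uses the standard
power series of the logarithm; writing the total as S is notation introduced
here for illustration.
```latex
\begin{equation*}
S \;=\; \sum_{n=1}^{\infty} \theta \, \frac{x^n}{n} \;=\; -\theta \ln(1 - x)
\quad \text{for } 0 \le x < 1,
\qquad \text{and the sum diverges for } x \ge 1 .
\end{equation*}
```
So a finite expected number of species requires x = b / d below 1, and the
logarithm in the sum is where the name "log series" comes from.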

01:06:44.240 --> 01:06:48.510
But then if b is less than d,
shouldn't everybody be extinct?

01:06:48.510 --> 01:06:50.000
No.

01:06:50.000 --> 01:06:52.640
Can somebody else say
why it is that it's OK

01:06:52.640 --> 01:06:53.640
for b to be less than d?

01:06:53.640 --> 01:06:55.348
If birth rates are
less than death rates,

01:06:55.348 --> 01:06:56.681
shouldn't everyone be extinct?

01:06:56.681 --> 01:06:59.140
AUDIENCE: Because
there's a rate b0.

01:06:59.140 --> 01:07:01.350
PROFESSOR: Because there's
a rate b0, exactly.

01:07:01.350 --> 01:07:03.250
So there's a finite
rate of speciation.

01:07:03.250 --> 01:07:05.505
So it's true that every
species will go extinct.

01:07:08.070 --> 01:07:12.050
But because we have a constant
influx of new species,

01:07:12.050 --> 01:07:16.170
we end up with this distribution
that's this Fisher log series.

01:07:16.170 --> 01:07:21.230
Now, if you plot the
Fisher log series,

01:07:21.230 --> 01:07:23.690
it looks a bit like this.

01:07:23.690 --> 01:07:25.990
But let's think about
it a little bit.

01:07:25.990 --> 01:07:28.540
Does the Fisher log
series, does it fall off,

01:07:28.540 --> 01:07:32.200
A, faster or slower than this?

01:07:32.200 --> 01:07:41.580
Fisher falls, A, faster-- this
is in this direction-- or, B,

01:07:41.580 --> 01:07:42.080
slower?

01:07:48.519 --> 01:07:50.060
AUDIENCE: Faster or
slower than what?

01:07:50.060 --> 01:07:54.300
PROFESSOR: Than the
island distribution.

01:07:54.300 --> 01:07:56.640
Because you can see that this
falls off pretty rapidly.

01:08:01.180 --> 01:08:04.250
Ready, maybe?

01:08:04.250 --> 01:08:06.575
Three, two, one.

01:08:14.850 --> 01:08:18.452
I saw a fair number
of people that

01:08:18.452 --> 01:08:20.380
don't want to make a guess.

01:08:20.380 --> 01:08:21.810
Indeed, it's going to be faster.

01:08:21.810 --> 01:08:22.765
Can somebody say why?

01:08:26.639 --> 01:08:27.139
Yeah.

01:08:27.139 --> 01:08:28.055
AUDIENCE: [INAUDIBLE].

01:08:30.361 --> 01:08:32.569
PROFESSOR: Is it going to
be because of the 1 over n?

01:08:32.569 --> 01:08:34.460
I mean the 1 over n
is certainly relevant.

01:08:37.399 --> 01:08:38.899
Without the one
over n, then we just

01:08:38.899 --> 01:08:40.930
have sort of a geometric series.

01:08:40.930 --> 01:08:43.550
And the log normal is not just
a geometric series either.

01:08:47.382 --> 01:08:53.590
AUDIENCE: [INAUDIBLE] Whereas
this has a very long tail.

01:08:53.590 --> 01:08:54.590
PROFESSOR: That's right.

01:08:54.590 --> 01:08:55.649
So this falls off.

01:08:58.189 --> 01:08:59.689
This would be kind
of exponentially,

01:08:59.689 --> 01:09:01.272
and this is faster
than exponentially.

01:09:03.420 --> 01:09:07.600
And indeed, this makes
sense based on the model.

01:09:07.600 --> 01:09:10.840
Because this
community, the reason

01:09:10.840 --> 01:09:15.920
that it has some very,
very abundant species

01:09:15.920 --> 01:09:18.090
is partly because
it gets migration

01:09:18.090 --> 01:09:19.479
from the abundant species here.

01:09:22.680 --> 01:09:26.200
This falls off pretty quickly.

01:09:26.200 --> 01:09:30.450
But those frequent species still
can play a pretty important

01:09:30.450 --> 01:09:33.859
role in the island community,
because the migration rate

01:09:33.859 --> 01:09:38.924
is influenced by large numbers.

01:09:38.924 --> 01:09:40.340
And the other thing
is, of course,

01:09:40.340 --> 01:09:43.890
that the rare species are
going to often go extinct.

01:09:43.890 --> 01:09:45.880
I mean the distribution
on the island

01:09:45.880 --> 01:09:50.330
is some complicated process
of the dynamics going here,

01:09:50.330 --> 01:09:51.800
plus sampling from here.

01:09:51.800 --> 01:09:54.480
But there's a sense that it's
biased towards-- it's not just

01:09:54.480 --> 01:09:56.690
a reflection of
the metacommunity,

01:09:56.690 --> 01:09:58.430
because the migration
rate is sampled

01:09:58.430 --> 01:09:59.830
towards the abundant species.

01:09:59.830 --> 01:10:02.620
So the migration
of these species

01:10:02.620 --> 01:10:05.980
ends up playing a major role
in pushing the distribution

01:10:05.980 --> 01:10:08.390
to the right.

01:10:08.390 --> 01:10:14.500
So you have much more frequent,
abundant species on the island

01:10:14.500 --> 01:10:18.205
as compared to the mainland.

01:10:18.205 --> 01:10:19.167
AUDIENCE: [INAUDIBLE]?

01:10:19.167 --> 01:10:19.833
PROFESSOR: Yeah.

01:10:19.833 --> 01:10:23.015
AUDIENCE: [INAUDIBLE]
measurement of the distribution

01:10:23.015 --> 01:10:24.940
on the--

01:10:24.940 --> 01:10:26.500
PROFESSOR: Well,
I'm sure they have.

01:10:29.810 --> 01:10:32.450
I think the statement
that there's a faster fall

01:10:32.450 --> 01:10:34.120
off on mainlands
than on the islands

01:10:34.120 --> 01:10:37.580
I think is borne
out by the data.

01:10:37.580 --> 01:10:42.730
But I don't know if trees on
the Panama side of the canal

01:10:42.730 --> 01:10:45.050
are actually better described
by a Fisher log series

01:10:45.050 --> 01:10:47.215
as compared to this, though.

01:10:47.215 --> 01:10:50.185
AUDIENCE: I guess my question
was the abundant species

01:10:50.185 --> 01:10:52.907
that we see on the
island, is it just

01:10:52.907 --> 01:10:57.120
the result of diffusive drift?

01:10:57.120 --> 01:11:02.630
PROFESSOR: Well, this also
has the diffusive drift.

01:11:08.398 --> 01:11:12.462
AUDIENCE: But in the sense
that what really pushes.

01:11:12.462 --> 01:11:13.920
PROFESSOR: Well,
I mean I think you

01:11:13.920 --> 01:11:16.705
need both, the diffusive
drift and the migration.

01:11:16.705 --> 01:11:19.010
But I think that the fact
that the migration is

01:11:19.010 --> 01:11:20.720
from the mainland,
and it's biased

01:11:20.720 --> 01:11:24.190
towards those abundant
things, I think

01:11:24.190 --> 01:11:26.849
is necessary or important.

01:11:26.849 --> 01:11:28.890
AUDIENCE: I guess just in
terms of distinguishing

01:11:28.890 --> 01:11:33.133
between the niche and
the neutral models,

01:11:33.133 --> 01:11:39.733
as applied to the mainland, does
the niche model predict also

01:11:39.733 --> 01:11:40.274
a log normal?

01:11:40.274 --> 01:11:42.644
Because it seemed like,
in the discussion earlier,

01:11:42.644 --> 01:11:47.584
the neutral also predicted
log normal [INAUDIBLE].

01:11:47.584 --> 01:11:49.000
PROFESSOR: That's
a good question.

01:11:54.200 --> 01:11:58.560
In this whole area, I mean
it's a little bit empirical.

01:11:58.560 --> 01:12:01.490
The fact that the niche
model kind of predicts

01:12:01.490 --> 01:12:05.790
this, or this broken stick
thing predicts a log normal,

01:12:05.790 --> 01:12:08.540
they didn't say anything
about islands there, right?

01:12:08.540 --> 01:12:11.630
I guess even Fisher's
original log series,

01:12:11.630 --> 01:12:14.752
he used it to
describe-- I think maybe

01:12:14.752 --> 01:12:16.210
that was the beetles
on the Thames.

01:12:16.210 --> 01:12:19.790
But his original data set,
where the Fisher log series

01:12:19.790 --> 01:12:21.970
was supposed to
described it, as it

01:12:21.970 --> 01:12:24.060
was sampled better and
better, it eventually

01:12:24.060 --> 01:12:26.393
started looking more and more
like a log normal anyways.

01:12:29.316 --> 01:12:31.190
I mean it's easy to see
the frequent species,

01:12:31.190 --> 01:12:34.560
because you see them.

01:12:34.560 --> 01:12:36.930
This tail can actually
be very hard to see,

01:12:36.930 --> 01:12:40.145
because you have to
find the individuals.

01:12:43.770 --> 01:12:46.550
It's a good question of to
what degree each of the models

01:12:46.550 --> 01:12:49.580
really predicts one thing
on one place and another.

01:12:49.580 --> 01:12:53.450
There's always tweaks of each
model that adjust things.

01:12:53.450 --> 01:12:56.010
So I think it's a bit muddy.

01:13:01.070 --> 01:13:03.240
But the one thing that
I want to highlight.

01:13:03.240 --> 01:13:05.420
So there's a lot
of debates, then,

01:13:05.420 --> 01:13:07.150
between these different models.

01:13:07.150 --> 01:13:09.260
And each of the
models has some fit.

01:13:09.260 --> 01:13:12.180
They have red and black.

01:13:12.180 --> 01:13:13.860
There's one that kind
of goes like this.

01:13:13.860 --> 01:13:15.820
And another one that
kind of goes like that.

01:13:15.820 --> 01:13:20.410
And they're not labeled,
because they look the same.

01:13:20.410 --> 01:13:23.990
And you can argue about chi
squareds and everything,

01:13:23.990 --> 01:13:26.290
but I think it's irrelevant.

01:13:26.290 --> 01:13:28.490
They both fit the data fine.

01:13:28.490 --> 01:13:31.670
And the other thing,
just the sampling, kind of

01:13:31.670 --> 01:13:35.710
root n sampling, if you
expect to see 10 species,

01:13:35.710 --> 01:13:38.355
then if you go and you
actually do sampling,

01:13:38.355 --> 01:13:41.330
you expect to have kind
of a root n on each one.

01:13:41.330 --> 01:13:44.110
I mean the error bars,
I think, around this

01:13:44.110 --> 01:13:46.540
are consistent with both models.
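
NOTE
A minimal sketch of the root-n point above, not taken from the lecture or the papers.
It assumes the species count in each abundance bin behaves like a Poisson count,
so its error bar is roughly the square root of the expected count.
The function names and the example counts (10 and 12) are hypothetical.
```python
import math
def poisson_sigma(expected_count):
    """Standard deviation of a Poisson count with the given mean (root n)."""
    return math.sqrt(expected_count)
def within_error(expected_a, expected_b, n_sigma=1.0):
    """True if two expected counts differ by less than n_sigma error bars."""
    sigma = max(poisson_sigma(expected_a), poisson_sigma(expected_b))
    return abs(expected_a - expected_b) < n_sigma * sigma
# Hypothetical bin: one model expects 10 species, the other expects 12.
print(poisson_sigma(10))     # about 3.16
print(within_error(10, 12))  # True: the two predictions sit inside one error bar
```
With counts this small, a difference of two species between the model curves
is well inside the sampling noise, which is the sense in which both fits look fine.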

01:13:46.540 --> 01:13:49.259
So I'd say that the
exercise of trying

01:13:49.259 --> 01:13:51.550
to distinguish those models
based on fit to such a data

01:13:51.550 --> 01:13:55.760
set I think is hopeless
from the beginning.

01:13:55.760 --> 01:13:58.640
And then you can talk about
the number of parameters.

01:13:58.640 --> 01:14:00.350
And if you read
these two papers,

01:14:00.350 --> 01:14:02.124
they both say that
they have the smaller number

01:14:02.124 --> 01:14:02.915
of free parameters.

01:14:05.610 --> 01:14:07.862
And it is hard to
believe that there could

01:14:07.862 --> 01:14:09.070
be a disagreement about this.

01:14:12.534 --> 01:14:14.200
But then, you know,
it's like, oh, well,

01:14:14.200 --> 01:14:16.080
what do you call
a free parameter?

01:14:16.080 --> 01:14:19.950
And then what they say, any
given RSA data set contains

01:14:19.950 --> 01:14:22.110
information about the
local community size j.

01:14:24.790 --> 01:14:27.155
So they say, given that,
it's not a free parameter,

01:14:27.155 --> 01:14:28.705
because you put that in.

01:14:28.705 --> 01:14:30.080
That's the number
of individuals.

01:14:30.080 --> 01:14:32.093
And then out comes your
distribution, right?

01:14:32.093 --> 01:14:33.800
And you say, OK, well,
all right, that's

01:14:33.800 --> 01:14:36.008
fine if you don't want to
call that a free parameter.

01:14:36.008 --> 01:14:39.510
But then when you fit the log
normal to this distribution,

01:14:39.510 --> 01:14:42.720
the overall amplitude
is also going to give you

01:14:42.720 --> 01:14:46.280
the number of individuals
in the metacommunity.

01:14:46.280 --> 01:14:50.880
So if you don't call j a
free parameter in this model,

01:14:50.880 --> 01:14:53.840
then you can't call the
amplitude a free parameter when

01:14:53.840 --> 01:14:57.970
you fit the log normal,
at least in my opinion.

01:14:57.970 --> 01:15:01.540
I think that they
both have three.

01:15:01.540 --> 01:15:04.570
Because if you fit a
log normal to this,

01:15:04.570 --> 01:15:05.890
you have the overall amplitude.

01:15:05.890 --> 01:15:07.300
That's the number
of individuals.

01:15:07.300 --> 01:15:11.350
And then you have the mean
and the standard deviation

01:15:11.350 --> 01:15:14.089
or whatever.

01:15:14.089 --> 01:15:16.047
From that standpoint, I
think they're the same.

01:15:16.047 --> 01:15:16.547
Yeah.

01:15:16.547 --> 01:15:18.909
AUDIENCE: But I mean how
do you fit the log normal

01:15:18.909 --> 01:15:19.863
when you don't impose?

01:15:19.863 --> 01:15:21.294
Do they impose the amplitude?

01:15:24.530 --> 01:15:26.400
I mean it's still a parameter.

01:15:26.400 --> 01:15:26.890
PROFESSOR: No, that's
what I was saying.

01:15:26.890 --> 01:15:27.650
It's a parameter.

01:15:27.650 --> 01:15:30.302
I mean the normalized log
normal, you integrate,

01:15:30.302 --> 01:15:31.010
and it goes to 1.

01:15:31.010 --> 01:15:32.600
But then you have
some measured number

01:15:32.600 --> 01:15:35.020
of individuals in your
sample, and then you

01:15:35.020 --> 01:15:37.650
have to multiply by
that to give you.
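
NOTE
A rough sketch of the amplitude argument above, under stated assumptions.
The normalized log normal integrates to 1, so multiplying it by a measured total
fixes the overall amplitude, leaving the two shape parameters to be fitted.
The values mu = 2.0, sigma = 1.5 and measured_total = 500 are hypothetical,
and this is not the fitting procedure used in either paper.
```python
import math
def lognormal_pdf(x, mu, sigma):
    """Normalized log-normal density; integrates to 1 over x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))
def integrate_scaled(amplitude, mu, sigma, steps=20000):
    """Midpoint-rule integral of amplitude * pdf, on a grid uniform in log(x)."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(lo + (i + 0.5) * du)
        total += amplitude * lognormal_pdf(x, mu, sigma) * x * du  # dx = x du
    return total
mu, sigma = 2.0, 1.5   # hypothetical fitted shape parameters
measured_total = 500   # hypothetical measured total, put in by hand
print(integrate_scaled(1.0, mu, sigma))             # close to 1
print(integrate_scaled(measured_total, mu, sigma))  # close to 500
```
Whether measured_total counts as a free parameter is exactly the bookkeeping
question being argued here; the sketch only shows that it sets the amplitude.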

01:15:37.650 --> 01:15:40.337
AUDIENCE: But is that what
they do when they do their fit?

01:15:40.337 --> 01:15:41.003
PROFESSOR: Yeah.

01:15:41.003 --> 01:15:43.544
AUDIENCE: Or do they keep that
amplitude as also a parameter?

01:15:46.040 --> 01:15:48.320
PROFESSOR: I think that
you can argue whether this

01:15:48.320 --> 01:15:49.540
is a free parameter or not.

01:15:49.540 --> 01:15:51.440
But I think that
you can just put it

01:15:51.440 --> 01:15:54.149
as the number of
individuals, and it's not

01:15:54.149 --> 01:15:55.190
going to affect anything.

01:15:55.190 --> 01:15:58.330
You could actually
have it be a free parameter.

01:15:58.330 --> 01:16:00.790
But this gets into this
question about what constitutes

01:16:00.790 --> 01:16:02.230
a free parameter or not.

01:16:02.230 --> 01:16:04.740
And actually, there is
some subtlety to this.

01:16:04.740 --> 01:16:09.664
But I think, at
the end of the day,

01:16:09.664 --> 01:16:11.580
the log normal is not
going to look like this.

01:16:11.580 --> 01:16:12.840
You have to.

01:16:12.840 --> 01:16:14.840
You basically put in the
number of individuals

01:16:14.840 --> 01:16:15.880
that you measured.

01:16:15.880 --> 01:16:17.840
AUDIENCE: So when you
calculate [INAUDIBLE]?

01:16:23.750 --> 01:16:25.380
PROFESSOR: Huge
numbers of pages

01:16:25.380 --> 01:16:27.602
have been written about
comparing these things.

01:16:27.602 --> 01:16:30.060
At some point, it comes down
to this philosophical question

01:16:30.060 --> 01:16:32.300
about what you think
constitutes a null model.

01:16:32.300 --> 01:16:33.990
And this gets to be
much more subtle.

01:16:33.990 --> 01:16:37.200
And I think reasonable
people can disagree

01:16:37.200 --> 01:16:39.990
about whether the null model
that you need to reject

01:16:39.990 --> 01:16:41.800
should be this
neutral model or if it

01:16:41.800 --> 01:16:43.160
should be a niche-based model.

01:16:43.160 --> 01:16:45.997
Or maybe it's just that there's
some multiplicative type

01:16:45.997 --> 01:16:48.330
process that's going on and
gives you distributions that

01:16:48.330 --> 01:16:50.790
look like this, and you need
other kinds of information

01:16:50.790 --> 01:16:52.260
to try to distinguish
those things.

01:16:52.260 --> 01:16:54.051
And in particular, I'd
say that it's really

01:16:54.051 --> 01:16:56.940
the dynamic information in which
these models have strikingly

01:16:56.940 --> 01:16:58.398
different predictions,
and then you

01:16:58.398 --> 01:17:00.820
can reject neutral-type models.

01:17:00.820 --> 01:17:03.160
Because the neutral
models predict

01:17:03.160 --> 01:17:04.820
that these species
that are abundant

01:17:04.820 --> 01:17:07.380
are just transiently abundant,
and they should go way.

01:17:07.380 --> 01:17:08.630
Whereas the niche-based
models would say,

01:17:08.630 --> 01:17:09.760
oh, they're really fixed.

01:17:09.760 --> 01:17:13.324
And indeed, in many cases,
the abundant species kind of

01:17:13.324 --> 01:17:14.740
stick around longer
than you would

01:17:14.740 --> 01:17:16.320
expect from a neutral model.
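
NOTE
A toy sketch of the dynamic prediction above, under strong simplifying assumptions.
It is a death-birth neutral drift in a fixed-size community with no migration
and no speciation, so it is not the island model from the lecture.
The species labels, starting abundances and step counts are hypothetical.
```python
import random
from collections import Counter
def neutral_step(community, rng):
    """One event: a random individual dies and is replaced by the offspring
    of a random individual, chosen without regard to species."""
    dead = rng.randrange(len(community))
    parent = rng.randrange(len(community))
    community[dead] = community[parent]
def simulate(start, steps, seed=0):
    """Run the neutral drift for a number of events; return species counts."""
    rng = random.Random(seed)
    community = list(start)
    for _ in range(steps):
        neutral_step(community, rng)
    return Counter(community)
# Hypothetical start: species "A" holds 60 of 100 individuals.
start = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
for steps in (0, 1000, 10000):
    print(steps, dict(simulate(start, steps)))
```
Under neutrality nothing holds "A" at 60, so its abundance wanders like a random
walk. Comparing how long abundant species persist in real time series against
this kind of drift is the dynamic test the fits to a single snapshot cannot give.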

01:17:16.320 --> 01:17:19.692
Of course, the neutral model
is not true in the sense

01:17:19.692 --> 01:17:21.400
that different
individuals are different.

01:17:21.400 --> 01:17:24.500
But it's important to
highlight that even

01:17:24.500 --> 01:17:27.210
such a minimal
model can give you

01:17:27.210 --> 01:17:29.220
striking patterns that
are similar to what

01:17:29.220 --> 01:17:30.756
you observe in nature.

01:17:30.756 --> 01:17:32.130
And so I think
we're out of time.

01:17:32.130 --> 01:17:33.560
So with that, I
think we'll quit.

01:17:33.560 --> 01:17:36.250
But it's been a pleasure having
you guys for this semester.

01:17:36.250 --> 01:17:38.870
And if you have any
questions about any systems

01:17:38.870 --> 01:17:41.460
biology things in the
future, please, email me.

01:17:41.460 --> 01:17:42.460
I'm happy to meet up.

01:17:42.460 --> 01:17:44.203
Good luck on the final.