WEBVTT
00:00:01.030 --> 00:00:04.660
In this segment, we discuss the
so-called "random incidence"
00:00:04.660 --> 00:00:07.220
paradox for the Poisson process.
00:00:07.220 --> 00:00:09.170
It's a paradox
because it involves
00:00:09.170 --> 00:00:12.200
a somewhat counterintuitive
phenomenon.
00:00:12.200 --> 00:00:15.430
However, we will understand
exactly what's going on,
00:00:15.430 --> 00:00:18.450
and in the end, it will
cease to be a paradox
00:00:18.450 --> 00:00:20.700
and we will have an
intuitive understanding
00:00:20.700 --> 00:00:22.650
of what exactly is happening.
00:00:22.650 --> 00:00:26.250
So consider a Poisson process
that has been running forever,
00:00:26.250 --> 00:00:28.500
or think of it as a
Poisson process that
00:00:28.500 --> 00:00:33.100
started a very long
time ago.
00:00:33.100 --> 00:00:37.110
To make things concrete,
suppose that the arrival rate, lambda,
00:00:37.110 --> 00:00:42.900
is 4 arrivals per hour so that
the expected interarrival time
00:00:42.900 --> 00:00:47.690
is one fourth of an
hour, which is
00:00:47.690 --> 00:00:49.360
the same as 15 minutes.
00:00:51.960 --> 00:00:56.690
For example, suppose that
the bus company in your town
00:00:56.690 --> 00:01:00.690
claims that buses
arrive at your stop
00:01:00.690 --> 00:01:05.200
according to a Poisson process
with this particular rate.
00:01:05.200 --> 00:01:09.050
But you don't really believe
that your bus company is
00:01:09.050 --> 00:01:12.789
telling the truth, and you
decide to investigate.
00:01:12.789 --> 00:01:15.130
So what you do is the following.
00:01:15.130 --> 00:01:19.130
You show up at some
time at your bus stop
00:01:19.130 --> 00:01:24.370
and wait until the
next arrival comes
00:01:24.370 --> 00:01:27.630
and also ask someone
who lives near the bus
00:01:27.630 --> 00:01:30.770
stop, what time was
the last arrival?
00:01:30.770 --> 00:01:32.690
And they tell you
the last arrival
00:01:32.690 --> 00:01:35.350
happened at that time instant.
00:01:35.350 --> 00:01:37.960
And you measure
this amount of time,
00:01:37.960 --> 00:01:43.120
which is the interarrival
time, record what it is,
00:01:43.120 --> 00:01:49.340
repeat this experiment on many
days, and calculate an average.
00:01:49.340 --> 00:01:51.880
What you're likely
to see turns out
00:01:51.880 --> 00:01:56.320
to be something
around 30 minutes.
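NOTE
A minimal Python sketch of the experiment just described; it is an illustration, not part of the lecture. Assumptions: rate lambda = 4 arrivals per hour, one visit per simulated "day" at t_star = 10 hours after the process starts (a stand-in for a process that has been running forever), and the hypothetical helper names observed_gap and average_observed_gap. Each day it records the time from the last bus before you show up (U) to the next bus after you show up (V).
```python
import random
def observed_gap(rate, t_star, rng):
    """Simulate one day and return V - U, the gap containing t_star (hours)."""
    t = 0.0      # the process starts at time 0, long before t_star
    last = None  # most recent arrival at or before t_star, i.e. U
    while True:
        t += rng.expovariate(rate)  # next arrival: add an Exp(rate) gap
        if t > t_star:              # first arrival after you show up, i.e. V
            if last is None:
                raise RuntimeError("no arrival before t_star; increase t_star")
            return t - last
        last = t
def average_observed_gap(rate=4.0, t_star=10.0, days=20000, seed=1):
    """Repeat the visit on many independent days and average V - U."""
    rng = random.Random(seed)
    total = sum(observed_gap(rate, t_star, rng) for _ in range(days))
    return total / days
avg_hours = average_observed_gap()
print(f"average observed gap: {avg_hours * 60:.1f} minutes")
```
Under these assumptions the printed average lands near 30 minutes, not the 15 minutes the claimed rate suggests.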
00:01:56.320 --> 00:01:59.350
At this point, you could
go to the bus company
00:01:59.350 --> 00:02:01.160
and challenge them.
00:02:01.160 --> 00:02:05.100
You claim an arrival rate of
4 arrivals per hour, which
00:02:05.100 --> 00:02:08.310
would translate into average
interarrival times of 15 minutes,
00:02:08.310 --> 00:02:12.070
but every day I go and
check the interarrival time,
00:02:12.070 --> 00:02:15.800
and I find that on average
it is close to 30 minutes.
00:02:15.800 --> 00:02:17.030
What's the explanation?
00:02:17.030 --> 00:02:18.210
What's going on?
00:02:18.210 --> 00:02:21.820
Is it that the belief or
the claim of the bus company
00:02:21.820 --> 00:02:26.190
is incorrect, or is there
something more complicated?
00:02:26.190 --> 00:02:27.910
So let us try to
understand what's
00:02:27.910 --> 00:02:33.000
going on by being very
precise and careful.
00:02:33.000 --> 00:02:37.360
You show up at the bus
station at some time--
00:02:37.360 --> 00:02:41.530
let's call that time t star.
00:02:41.530 --> 00:02:45.110
You ask someone who has
been at the station,
00:02:45.110 --> 00:02:48.750
when was the last arrival
time, and they tell you,
00:02:48.750 --> 00:02:53.860
and it is some time, capital U. You wait until the next bus,
You wait until the next bus,
00:02:53.860 --> 00:02:57.950
and the next bus arrives
at some future time capital
00:02:57.950 --> 00:03:02.730
V. You are interested
in the interarrival time
00:03:02.730 --> 00:03:04.910
that you're observing,
which is the difference
00:03:04.910 --> 00:03:09.290
between these two random
variables, V minus U.
00:03:09.290 --> 00:03:12.890
Now this difference-- let
us split it into two pieces.
00:03:12.890 --> 00:03:16.336
There's one piece
from t star until V,
00:03:16.336 --> 00:03:20.270
which is V minus t star.
00:03:20.270 --> 00:03:23.610
And there's another piece,
which is the earlier interval, from U to t star,
00:03:23.610 --> 00:03:31.930
and this is t star minus U. Now
t star, the time at which you
00:03:31.930 --> 00:03:34.690
arrive, is just a constant.
00:03:34.690 --> 00:03:38.600
Suppose that you arrive at the
bus station at exactly 12 noon.
00:03:38.600 --> 00:03:40.430
There's nothing random about it.
00:03:40.430 --> 00:03:43.620
However, V and U are
random variables.
00:03:43.620 --> 00:03:46.460
What kind of random
variable is V minus t star?
00:03:46.460 --> 00:03:52.290
You show up at 12 noon and you
wait until the first arrival.
00:03:52.290 --> 00:03:56.920
Because a Poisson process
starts fresh at any given time--
00:03:56.920 --> 00:03:59.680
so after 12 noon it
starts fresh-- this
00:03:59.680 --> 00:04:02.940
is the time until the first
arrival in a Poisson process
00:04:02.940 --> 00:04:06.150
with rate lambda, so
this is a random variable
00:04:06.150 --> 00:04:10.750
which is exponential
with parameter lambda.
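NOTE
A quick numerical check of the fresh-start argument, written as a Python sketch under the same illustrative assumptions (lambda = 4 per hour, t_star = 10 hours after the process starts) and a hypothetical function residual_times. If V minus t star is exponential with parameter lambda, its sample mean should be near 1/lambda = 0.25 hours, and the fraction of waits longer than 15 minutes should be near e^(-1).
```python
import math
import random
def residual_times(rate=4.0, t_star=10.0, days=20000, seed=2):
    """Return samples of V - t_star: the wait from t_star to the next bus."""
    rng = random.Random(seed)
    samples = []
    for _ in range(days):
        t = 0.0
        while t <= t_star:          # walk the arrivals until one passes t_star
            t += rng.expovariate(rate)
        samples.append(t - t_star)  # this is V - t_star
    return samples
rate = 4.0
waits = residual_times(rate)
mean_wait = sum(waits) / len(waits)
tail = sum(w > 0.25 for w in waits) / len(waits)  # empirical P(V - t* > 15 min)
print(mean_wait, 1 / rate)            # both about 0.25 hours
print(tail, math.exp(-rate * 0.25))   # both about 0.37
```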
00:04:10.750 --> 00:04:14.060
Now let us understand what
the other random variable, t star minus U, is.
00:04:16.930 --> 00:04:19.899
One way of thinking
about it is to think
00:04:19.899 --> 00:04:23.720
of the Poisson process
running backwards in time,
00:04:23.720 --> 00:04:25.420
as if you were living time backwards.
00:04:25.420 --> 00:04:29.910
You show up at 12 noon, and
then time runs backwards,
00:04:29.910 --> 00:04:34.600
and you wait until you see
the first arrival coming
00:04:34.600 --> 00:04:38.650
in this backwards universe.
00:04:38.650 --> 00:04:41.250
So we're dealing
here with the time
00:04:41.250 --> 00:04:44.240
until an arrival in
a Poisson process
00:04:44.240 --> 00:04:46.840
that runs backwards in time.
00:04:46.840 --> 00:04:50.020
What kind of process is a
backwards Poisson process?
00:04:52.960 --> 00:04:56.260
If you take a Poisson
process in reverse time,
00:04:56.260 --> 00:04:59.590
the independence
assumption is not affected.
00:04:59.590 --> 00:05:02.380
The numbers of arrivals in disjoint
time intervals are independent.
00:05:02.380 --> 00:05:05.080
Even if you reverse time,
disjoint time intervals
00:05:05.080 --> 00:05:07.320
still remain independent.
00:05:07.320 --> 00:05:11.130
Any given time interval
of small length delta
00:05:11.130 --> 00:05:15.070
will have probability of roughly
lambda times delta of one arrival
00:05:15.070 --> 00:05:18.280
and a negligible probability of two or more, and
these probabilities will be the same
00:05:18.280 --> 00:05:22.120
whether time goes forward
or time goes backward.
00:05:22.120 --> 00:05:24.400
So the conclusion
from this discussion
00:05:24.400 --> 00:05:26.940
is that the backwards-running
Poisson process
00:05:26.940 --> 00:05:31.020
is also a Poisson
process, and so this time
00:05:31.020 --> 00:05:34.500
until the first arrival
in the backwards process
00:05:34.500 --> 00:05:38.030
is just like the time until
the first arrival in a Poisson
00:05:38.030 --> 00:05:38.830
process.
00:05:38.830 --> 00:05:42.870
So this also is an
exponential random variable
00:05:42.870 --> 00:05:46.510
with parameter lambda.
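NOTE
The time-reversal argument can be mirrored numerically. This Python sketch (assumptions: lambda = 4 per hour, t_star = 10 hours, hypothetical helpers poisson_arrivals and backward_waits) generates arrivals up to t_star, runs the clock backwards from t_star, and records the first backward arrival, which is exactly t star minus U. If the reversed process is Poisson with rate lambda, these waits should again have mean near 0.25 hours and tail P(wait > 15 min) near e^(-1).
```python
import math
import random
def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)
def backward_waits(rate=4.0, t_star=10.0, days=20000, seed=3):
    """Samples of t_star - U, read as a first-arrival time in reversed time."""
    rng = random.Random(seed)
    out = []
    for _ in range(days):
        arrivals = poisson_arrivals(rate, t_star, rng)
        # Run the clock backwards from t_star: an arrival at time s is seen
        # after a backward wait of t_star - s.
        backward = [t_star - s for s in arrivals]
        out.append(min(backward))  # first backward arrival, i.e. t_star - U
    return out
rate = 4.0
ages = backward_waits(rate)
mean_age = sum(ages) / len(ages)
tail = sum(a > 0.25 for a in ages) / len(ages)  # empirical P(t* - U > 15 min)
print(mean_age, 1 / rate)             # both about 0.25 hours
print(tail, math.exp(-rate * 0.25))   # both about 0.37
```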
00:05:46.510 --> 00:05:51.050
Even more than that,
these two random variables
00:05:51.050 --> 00:05:53.200
are independent of each other.
00:05:53.200 --> 00:05:55.180
Why are they independent?
00:05:55.180 --> 00:05:58.220
The length of the interval
from t star to V has
00:05:58.220 --> 00:06:00.830
to do with the history
of the Poisson process
00:06:00.830 --> 00:06:02.990
after time t star.
00:06:02.990 --> 00:06:04.790
The length of the interval
from U to t star has
00:06:04.790 --> 00:06:07.200
to do with the history
of the Poisson process
00:06:07.200 --> 00:06:10.810
before time t star. But in
a Poisson process, because
00:06:10.810 --> 00:06:14.800
of the independence property,
the past and the future
00:06:14.800 --> 00:06:17.640
are independent, and
therefore, V minus t star
00:06:17.640 --> 00:06:21.580
is independent of
t star minus U.
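NOTE
A rough empirical check of this independence, as a Python sketch with illustrative assumptions (lambda = 4 per hour, t_star = 10 hours, hypothetical function split_pieces). Independence implies that P(both pieces exceed 15 minutes) factors into the product of the two marginal probabilities, each near e^(-1). Matching numbers are consistent with independence; they do not prove it.
```python
import math
import random
def split_pieces(rate=4.0, t_star=10.0, days=20000, seed=4):
    """Return (t_star - U, V - t_star) pairs, one per simulated day."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(days):
        t = last = 0.0  # last ends as U (or 0.0 if no arrival, vanishingly rare)
        while t <= t_star:
            last = t
            t += rng.expovariate(rate)
        pairs.append((t_star - last, t - t_star))
    return pairs
pairs = split_pieces()
n = len(pairs)
x = 0.25  # threshold: 15 minutes, in hours
p_before = sum(a > x for a, _ in pairs) / n
p_after = sum(b > x for _, b in pairs) / n
joint = sum(a > x and b > x for a, b in pairs) / n
# Independence predicts joint close to p_before * p_after, both near exp(-2).
print(joint, p_before * p_after, math.exp(-2))
```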
00:06:21.580 --> 00:06:27.120
In any case, by linearity of expectation, the expected value
of the interarrival interval
00:06:27.120 --> 00:06:30.300
that you see, the expected
value of V minus U,
00:06:30.300 --> 00:06:34.510
is going to be the expected
value of one exponential, which
00:06:34.510 --> 00:06:37.240
is 1 over lambda,
plus the expected
00:06:37.240 --> 00:06:40.630
value of another exponential,
which is 1 over lambda,
00:06:40.630 --> 00:06:46.220
and we get a result
of 2 over lambda.
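NOTE
The computation just described, written out with the lecture's numbers (lambda = 4 arrivals per hour). Only linearity of expectation is used here; independence of the two pieces is not needed for the mean.
```latex
\begin{aligned}
\mathbf{E}[V-U] &= \mathbf{E}[V-t^*] + \mathbf{E}[t^*-U]
  = \frac{1}{\lambda} + \frac{1}{\lambda} = \frac{2}{\lambda}, \\
\lambda = 4 \text{ per hour} \quad &\Rightarrow \quad
  \frac{1}{\lambda} = 15 \text{ min}, \qquad \frac{2}{\lambda} = 30 \text{ min}.
\end{aligned}
```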
00:06:46.220 --> 00:06:50.630
And that's why when you actually
carried out the experiment,
00:06:50.630 --> 00:06:53.850
you saw interarrival
intervals that
00:06:53.850 --> 00:06:58.390
had an average length of 30 minutes
as opposed to the 15 minutes
00:06:58.390 --> 00:07:01.890
that you were expecting
in the first place.
00:07:01.890 --> 00:07:04.420
Now how can this be?
00:07:04.420 --> 00:07:08.280
Since the interarrival
times in a Poisson process
00:07:08.280 --> 00:07:11.410
have expected value
1 over lambda,
00:07:11.410 --> 00:07:14.160
how can it be that
the interarrival times
00:07:14.160 --> 00:07:16.730
that you see
have an expected length
00:07:16.730 --> 00:07:20.560
of 2 over lambda?
00:07:20.560 --> 00:07:23.360
Well, the resolution of this
paradox has to do with
00:07:23.360 --> 00:07:29.840
what exactly we mean when we use
the words "an interarrival time."
00:07:29.840 --> 00:07:34.520
Under one interpretation, it means
the first interarrival time,
00:07:34.520 --> 00:07:38.060
the second one, the
hundredth interarrival time--
00:07:38.060 --> 00:07:41.280
each one of these
actually has an expected
00:07:41.280 --> 00:07:45.100
value of 1 over lambda.
00:07:45.100 --> 00:07:47.980
But this is a different
kind of interarrival time.
00:07:47.980 --> 00:07:50.720
It's not the first
or the second or
00:07:50.720 --> 00:07:54.190
some specific k-th
interarrival time.
00:07:54.190 --> 00:08:00.570
It's the interarrival time
that you selected to watch.
00:08:00.570 --> 00:08:04.780
When you show up at a
certain time, like 12 noon,
00:08:04.780 --> 00:08:09.310
you're more likely to fall
inside a long interarrival
00:08:09.310 --> 00:08:13.740
interval than inside a short one,
because the chance of landing in an interval
is proportional to its length.
00:08:13.740 --> 00:08:16.420
So just the fact that
you're showing up
00:08:16.420 --> 00:08:18.640
at a certain time
that's uncoordinated
00:08:18.640 --> 00:08:20.630
with the rest of
the process makes
00:08:20.630 --> 00:08:24.500
your observations biased
towards longer
00:08:24.500 --> 00:08:26.740
rather than shorter intervals.
00:08:26.740 --> 00:08:31.220
And this bias is what
causes this factor of 2.
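NOTE
To see the sampling bias on its own, this Python sketch builds one long sequence of independent exponential interarrival times (lambda = 4 per hour assumed) and picks intervals in two ways: by index k, as in "the k-th interarrival time", and by dropping a uniformly random time onto the timeline and taking the interval it lands in. The second method hits an interval with probability proportional to its length; for exponential gaps the resulting mean is E[X^2]/E[X] = (2/lambda^2)/(1/lambda) = 2/lambda. The function name compare_sampling and all sample sizes are illustrative choices.
```python
import bisect
import itertools
import random
def compare_sampling(rate=4.0, n_intervals=200000, n_picks=20000, seed=6):
    """Return (mean gap picked by index, mean gap picked by a random time)."""
    rng = random.Random(seed)
    # One long realization: independent Exp(rate) interarrival times.
    gaps = [rng.expovariate(rate) for _ in range(n_intervals)]
    ends = list(itertools.accumulate(gaps))  # arrival times
    # Method 1: choose the k-th interarrival time for a uniformly random k.
    by_index = [gaps[rng.randrange(n_intervals)] for _ in range(n_picks)]
    # Method 2: show up at a uniformly random time and keep the gap you land in.
    by_time = []
    for _ in range(n_picks):
        t = rng.uniform(0.0, ends[-1])
        k = bisect.bisect_left(ends, t)  # index of the first arrival at or after t
        by_time.append(gaps[k])
    return sum(by_index) / n_picks, sum(by_time) / n_picks
mean_by_index, mean_by_time = compare_sampling()
print(f"by index: {mean_by_index * 60:.1f} min, by time: {mean_by_time * 60:.1f} min")
```
The same sequence of gaps yields an average near 15 minutes under the first method and near 30 minutes under the second, so the factor of 2 comes from how the interval is chosen, not from the process itself.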
00:08:34.030 --> 00:08:37.308
So it's an issue really
about how you sample
00:08:37.308 --> 00:08:40.250
or how you choose
the interarrival time
00:08:40.250 --> 00:08:43.419
that you're going to watch,
and this particular sampling
00:08:43.419 --> 00:08:46.990
method has a bias
towards longer intervals.
00:08:46.990 --> 00:08:49.420
As we will see, this
is not something
00:08:49.420 --> 00:08:52.330
that's specific to
the Poisson process.
00:08:52.330 --> 00:08:54.500
In general, on many
occasions there
00:08:54.500 --> 00:08:57.760
are different ways of
sampling which give you
00:08:57.760 --> 00:09:01.520
different answers, and we will
go through a number of examples
00:09:01.520 --> 00:09:04.280
that will give you some
intuition about the source
00:09:04.280 --> 00:09:07.660
of the discrepancy
between these two answers.