WEBVTT

00:00:01.640 --> 00:00:08.170
PROFESSOR: So the handouts will
be just the problem set 8

00:00:08.170 --> 00:00:11.120
solutions, of which you already
have the first two.

00:00:11.120 --> 00:00:15.840
I remind you that problem set 9 is
due on Friday, but we will

00:00:15.840 --> 00:00:17.840
accept it on Monday if that's
when you want to

00:00:17.840 --> 00:00:20.230
hand it in to Ashish.

00:00:20.230 --> 00:00:24.620
And problem set 10 I will hand
out next week, but you won't

00:00:24.620 --> 00:00:27.020
be responsible for it.

00:00:27.020 --> 00:00:32.490
You could try it if
you're so moved.

00:00:32.490 --> 00:00:33.530
OK.

00:00:33.530 --> 00:00:35.370
We're in the middle
of chapter 13.

00:00:35.370 --> 00:00:39.540
We've been talking about
capacity approaching codes.

00:00:39.540 --> 00:00:43.640
We've talked about a number of
classes of them, low density

00:00:43.640 --> 00:00:48.220
parity check, turbo, repeat
accumulate, and I've given you

00:00:48.220 --> 00:00:52.610
a general idea of how the sum
product decoding algorithm is

00:00:52.610 --> 00:00:54.580
applied to decode these codes.

00:00:54.580 --> 00:00:58.093
These are all defined on graphs
with cycles, in the

00:00:58.093 --> 00:01:02.330
middle of which is a large
pseudo random interleaver.

00:01:02.330 --> 00:01:06.740
The sum product algorithm is
therefore done iteratively.

00:01:06.740 --> 00:01:10.400
In general, the initial observed
information comes in

00:01:10.400 --> 00:01:13.880
on one side, the left side or
the right side, and the

00:01:13.880 --> 00:01:17.880
iterative schedule amounts to
doing first the left side,

00:01:17.880 --> 00:01:19.980
then the right side, then the
left side, then the right

00:01:19.980 --> 00:01:24.200
side, until you converge,
you hope.

00:01:24.200 --> 00:01:27.650
That was the original turbo
idea, and that continues to be

00:01:27.650 --> 00:01:30.290
the right way to do it.

00:01:30.290 --> 00:01:31.290
OK.

00:01:31.290 --> 00:01:33.800
Today, we're actually going to
try to do some analysis.

00:01:33.800 --> 00:01:38.300
To do the analysis, we're going
to focus on low density

00:01:38.300 --> 00:01:44.330
parity check codes, which are
certainly far easier than

00:01:44.330 --> 00:01:46.460
turbo codes to analyze,
because they

00:01:46.460 --> 00:01:47.760
have such simple elements.

00:01:47.760 --> 00:01:51.580
I guess the repeat accumulate
codes are equally easy to

00:01:51.580 --> 00:01:55.510
analyze, but maybe not as
good in performance.

00:01:55.510 --> 00:01:58.200
Maybe they're as good,
I don't know.

00:01:58.200 --> 00:02:00.390
No one has driven that
as far as low density

00:02:00.390 --> 00:02:01.640
parity check codes.

00:02:03.880 --> 00:02:07.340
Also, we're going to take
a very simple channel.

00:02:07.340 --> 00:02:11.110
It's actually the channel for
which most of the analysis has

00:02:11.110 --> 00:02:16.170
been done, which is the Binary
Erasure Channel, where

00:02:16.170 --> 00:02:19.670
everything reduces to a
one-dimensional problem, and

00:02:19.670 --> 00:02:22.980
therefore we can do things
quite precisely.

00:02:22.980 --> 00:02:26.850
But this will allow me to
introduce density evolution,

00:02:26.850 --> 00:02:30.320
which is the generalization of
this for more general channels

00:02:30.320 --> 00:02:33.910
like the binary-input additive
white Gaussian noise channel,

00:02:33.910 --> 00:02:36.400
if I manage to go fast enough.

00:02:36.400 --> 00:02:39.340
I apologize today, I
do feel in a hurry.

00:02:39.340 --> 00:02:43.670
Nonetheless, please ask
questions whenever you want to

00:02:43.670 --> 00:02:47.220
slow me down or just get some
more understanding.

00:02:47.220 --> 00:02:50.310
So, the Binary Erasure Channel
is one of the elementary

00:02:50.310 --> 00:02:53.260
channels, if you've ever taken
information theory,

00:02:53.260 --> 00:02:54.620
that you look at.

00:02:54.620 --> 00:02:59.010
It has two inputs and
three outputs.

00:02:59.010 --> 00:03:02.910
The two inputs are 0 and 1, the
outputs are 0 and 1 or an

00:03:02.910 --> 00:03:06.350
erasure, an ambiguous output.

00:03:06.350 --> 00:03:11.110
If you send a 0, you can either
get the 0 correctly, or

00:03:11.110 --> 00:03:13.030
you could get an erasure.

00:03:13.030 --> 00:03:15.540
Might be a deletion, you just
don't get anything.

00:03:15.540 --> 00:03:18.080
Similarly, if you send
a 1, you either

00:03:18.080 --> 00:03:19.290
get a 1 or an erasure.

00:03:19.290 --> 00:03:22.480
There's no possibility of
getting something incorrectly.

00:03:22.480 --> 00:03:26.870
That's the key thing
about this channel.

00:03:26.870 --> 00:03:30.050
The probability of an erasure
is p, regardless of whether

00:03:30.050 --> 00:03:31.980
you send 0 or 1.

00:03:31.980 --> 00:03:36.330
So there's a single parameter
that governs this channel.

00:03:36.330 --> 00:03:40.486
Now admittedly, this is not
a very realistic channel.

00:03:40.486 --> 00:03:43.580
It's a toy channel in
the binary case.

00:03:43.580 --> 00:03:50.330
However, actually some of the
impetus for this development

00:03:50.330 --> 00:03:53.080
was the people who were
considering packet

00:03:53.080 --> 00:03:55.830
transmission on the internet.

00:03:55.830 --> 00:03:58.110
And in the case of packet
transmission on the internet,

00:03:58.110 --> 00:04:02.660
of course, you have a long
packet, a very non-binary

00:04:02.660 --> 00:04:06.750
symbol if you like, but if you
consider these to be packets,

00:04:06.750 --> 00:04:08.880
then on the internet, you either
receive the packet

00:04:08.880 --> 00:04:11.950
correctly or you fail
to receive it.

00:04:11.950 --> 00:04:16.250
You don't receive it at all, and
you know it, because there

00:04:16.250 --> 00:04:19.240
is an internal parity check
in each packet.

00:04:19.240 --> 00:04:23.220
So the q-ary erasure channel is
in fact a realistic model,

00:04:23.220 --> 00:04:26.310
and in fact, there's a company,
Digital Fountain,

00:04:26.310 --> 00:04:31.310
that has been founded and is
still going strong as far as I

00:04:31.310 --> 00:04:36.510
know, which is specifically
devoted to solutions for this

00:04:36.510 --> 00:04:40.250
q-ary erasure channel for
particular kinds of scenarios

00:04:40.250 --> 00:04:43.140
on the internet where you want
to do forward error correction

00:04:43.140 --> 00:04:46.280
rather than repeat
transmission.

00:04:46.280 --> 00:04:51.370
And a lot of the early work on
the analysis of these guys--

00:04:51.370 --> 00:04:55.350
Luby, Shokrollahi,
other people--

00:04:55.350 --> 00:04:58.840
they were some of the people
who focused on low density

00:04:58.840 --> 00:05:03.730
parity check codes immediately,
following work of Spielman and

00:05:03.730 --> 00:05:10.110
Sipser here at MIT, and said,
OK, suppose we try this on our

00:05:10.110 --> 00:05:11.950
q-ary erasure channel.

00:05:11.950 --> 00:05:15.440
And they were able to get very
close to the capacity of the

00:05:15.440 --> 00:05:18.190
q-ary erasure channel, which
is also 1 minus p.

00:05:18.190 --> 00:05:21.230
This is the information
theoretic capacity of the

00:05:21.230 --> 00:05:22.110
binary channel.

00:05:22.110 --> 00:05:25.940
It's kind of obvious that it
should be 1 minus p, because

00:05:25.940 --> 00:05:29.410
on the average, you get 1 minus
p good bits out for

00:05:29.410 --> 00:05:30.800
every bit that you send in.

00:05:30.800 --> 00:05:34.090
So the maximum rate you could
expect to send over this

00:05:34.090 --> 00:05:38.100
channel is 1 minus p.

00:05:38.100 --> 00:05:39.350
OK.

00:05:41.060 --> 00:05:44.740
Let's first think about maximum
likelihood decoding on

00:05:44.740 --> 00:05:45.300
this channel.

00:05:45.300 --> 00:05:49.980
Suppose we take a word from a
code, from a binary code, and

00:05:49.980 --> 00:05:53.750
send it through this channel,
and we get some erasure

00:05:53.750 --> 00:05:55.880
pattern at the output.

00:05:55.880 --> 00:05:59.750
So we have a subset of the
bits that are good, and a

00:05:59.750 --> 00:06:03.310
subset that are erased
at the output.

00:06:03.310 --> 00:06:06.150
Now what does maximum likelihood
decoding amount to

00:06:06.150 --> 00:06:07.400
on this channel?

00:06:10.630 --> 00:06:15.780
Well, the code word that we sent
is going to match up with

00:06:15.780 --> 00:06:19.400
all the good bits
received, right?

00:06:19.400 --> 00:06:21.580
So we know that there's going to
be at least one word in the

00:06:21.580 --> 00:06:23.940
code that agrees with
the received

00:06:23.940 --> 00:06:25.690
sequence in the good places.

00:06:28.430 --> 00:06:30.970
If that's the only word in the
code that agrees with the

00:06:30.970 --> 00:06:33.310
received word in those places,
then we can declare it the

00:06:33.310 --> 00:06:34.900
winner, right?

00:06:34.900 --> 00:06:37.070
And maximum likelihood
decoding succeeds.

00:06:37.070 --> 00:06:40.130
We know what the channel is, so
we know that all the good

00:06:40.130 --> 00:06:43.070
bits have to match up
with the code word.

00:06:43.070 --> 00:06:46.120
But suppose there are 2 words
in the code that match up in

00:06:46.120 --> 00:06:48.960
all the good places?

00:06:48.960 --> 00:06:50.790
There's no way to decide
between them, right?

00:06:54.180 --> 00:06:56.900
So basically, that's what
maximum likelihood decoding

00:06:56.900 --> 00:06:58.630
amounts to.

00:06:58.630 --> 00:07:07.410
You simply check how many
code words match the

00:07:07.410 --> 00:07:09.120
received good bits.

00:07:09.120 --> 00:07:12.160
If there's only one,
you decode.

00:07:12.160 --> 00:07:15.480
If there's more than one, you
could flip a coin, but we'll

00:07:15.480 --> 00:07:18.290
consider that to be a
decoding failure.

00:07:18.290 --> 00:07:20.820
You just don't know, so you
throw up your hands, you have

00:07:20.820 --> 00:07:24.110
a detected decoding failure.

00:07:24.110 --> 00:07:28.860
So in the case of a linear code,
what are we doing here?

00:07:28.860 --> 00:07:31.950
In the case of a linear
code, consider the

00:07:31.950 --> 00:07:33.130
parity check equations.

00:07:33.130 --> 00:07:38.030
We basically have n minus k
parity check equations, and

00:07:38.030 --> 00:07:44.040
we're trying to find how many
code sequences solve

00:07:44.040 --> 00:07:45.780
those parity check equations.

00:07:45.780 --> 00:07:51.350
So we have n minus k equations,
n unknowns, and

00:07:51.350 --> 00:07:55.310
we're basically just trying
to solve linear equations.

00:07:55.310 --> 00:07:57.240
So that would be one
decoding method for

00:07:57.240 --> 00:07:59.250
maximum likelihood decoding.

00:07:59.250 --> 00:08:00.370
Solve the equations.

00:08:00.370 --> 00:08:02.450
If you get a unique solution,
you're finished.

00:08:02.450 --> 00:08:05.970
If you get a space of solutions,
so dimension one or

00:08:05.970 --> 00:08:09.270
more, you lose.

00:08:09.270 --> 00:08:09.730
OK?
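
In Python, a minimal sketch of this "solve the linear equations" view of ML decoding on the erasure channel. The names and data layout here are illustrative assumptions, not from the lecture or the notes; brute force over the erased positions is used for clarity, where Gaussian elimination over GF(2) would be the practical choice.

    import itertools

    def ml_erasure_decode(H, y):
        """Return the unique codeword agreeing with the unerased
        positions of y, or None on a detected decoding failure.
        H: parity check matrix (list of 0/1 rows); y: received word
        with None marking erasures."""
        erased = [i for i, v in enumerate(y) if v is None]
        solutions = []
        for fill in itertools.product([0, 1], repeat=len(erased)):
            x = list(y)
            for i, b in zip(erased, fill):
                x[i] = b
            # x is a codeword iff every parity check sums to 0 mod 2
            if all(sum(h * v for h, v in zip(row, x)) % 2 == 0 for row in H):
                solutions.append(x)
        return solutions[0] if len(solutions) == 1 else None

If exactly one completion satisfies all the checks, that is the ML decision; two or more completions is exactly the "space of solutions" failure just described.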

00:08:09.730 --> 00:08:14.330
So we know lots of ways of
solving linear equations like

00:08:14.330 --> 00:08:17.068
Gaussian elimination, back
propagation/back substitution

00:08:17.068 --> 00:08:18.318
(I'm not sure exactly
what it's called).

00:08:21.290 --> 00:08:24.140
That's actually what we will
be doing with low density

00:08:24.140 --> 00:08:26.630
parity check codes.

00:08:26.630 --> 00:08:30.780
And so, decoding for the binary
erasure channel you can

00:08:30.780 --> 00:08:33.740
think of as just trying to
solve linear equations.

00:08:33.740 --> 00:08:39.840
If you get a unique solution,
you win, otherwise, you fail.

00:08:39.840 --> 00:08:45.460
Another way of looking at it
in a linear code is what do

00:08:45.460 --> 00:08:48.920
the good bits have to form?

00:08:48.920 --> 00:08:52.920
The erased bits have to
be a function of the

00:08:52.920 --> 00:08:55.570
good bits, all right?

00:08:55.570 --> 00:08:59.450
In linear code, that's just
a function of where

00:08:59.450 --> 00:09:00.630
the good bits are.

00:09:00.630 --> 00:09:02.940
We've run into this
concept before.

00:09:02.940 --> 00:09:05.366
We called it an information
set.

00:09:05.366 --> 00:09:09.330
An information set is a subset
of the coordinates that

00:09:09.330 --> 00:09:11.760
basically determines the
rest of the coordinates.

00:09:11.760 --> 00:09:14.150
If you know the bits in
that subset, then you

00:09:14.150 --> 00:09:14.840
know the code word.

00:09:14.840 --> 00:09:17.870
You can fill out the rest of
the code word through some

00:09:17.870 --> 00:09:19.050
linear equation.

00:09:19.050 --> 00:09:24.500
So basically, we're going to
succeed if the good bits cover

00:09:24.500 --> 00:09:29.840
an information set, and we're
going to fail otherwise.

00:09:29.840 --> 00:09:35.200
So how many bits do we need to
cover an information set?

00:09:35.200 --> 00:09:37.690
We're certainly going
to need at least k.

00:09:37.690 --> 00:09:40.440
Now today, we're going to be
considering very long codes.

00:09:40.440 --> 00:09:45.550
So suppose I have a
long (n,k) code.

00:09:45.550 --> 00:09:50.050
I have an (n,k) code, and I
transmit it over this channel.

00:09:50.050 --> 00:09:52.915
About how many bits are
going to be erased?

00:09:55.700 --> 00:09:59.400
About pn bits are going
to be erased, leaving (1

00:09:59.400 --> 00:10:01.380
minus p) times n.

00:10:01.380 --> 00:10:03.510
We're going to get
approximately--

00:10:03.510 --> 00:10:05.120
law of large numbers--

00:10:05.120 --> 00:10:14.340
(1 minus p) times n unerased
bits, and this has to be

00:10:14.340 --> 00:10:18.400
clearly greater than k.

00:10:18.400 --> 00:10:22.620
OK, so with very high
probability, if we get more

00:10:22.620 --> 00:10:27.190
than that, we'll be able to
solve the equations, find a

00:10:27.190 --> 00:10:28.020
unique code word.

00:10:28.020 --> 00:10:31.390
If we get fewer than that,
there's no possible way we

00:10:31.390 --> 00:10:32.550
could solve the equations.

00:10:32.550 --> 00:10:34.980
We don't have enough left.

00:10:34.980 --> 00:10:37.310
So what does this say?

00:10:37.310 --> 00:10:44.600
This says that k over n, which
is the code rate, has to be

00:10:44.600 --> 00:10:49.260
less than 1 minus p in order
for this maximum likelihood

00:10:49.260 --> 00:10:52.480
decoding to work with a linear
code over the binary erasure

00:10:52.480 --> 00:10:57.370
channel, and that is consistent
with this.

00:10:57.370 --> 00:11:00.410
If the rate is less than 1 minus
p, then with very high

00:11:00.410 --> 00:11:02.480
probability you're going
to be successful.

00:11:02.480 --> 00:11:08.360
If it's greater than 1 minus p,
no chance, as n becomes large.

00:11:08.360 --> 00:11:08.890
OK?
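
A quick numeric check of this counting argument, with arbitrary illustrative numbers (a sketch, not from the notes):

    import random

    n, k, p = 100000, 50000, 0.4   # a rate-1/2 code, erasure probability 0.4
    unerased = sum(random.random() > p for _ in range(n))
    print(unerased, "unerased bits; need at least k =", k)
    # Typically about 60000 = (1 - p) n, comfortably more than k = 50000.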

00:11:08.890 --> 00:11:09.820
You with me?

00:11:09.820 --> 00:11:11.070
AUDIENCE: [UNINTELLIGIBLE]?

00:11:13.080 --> 00:11:17.090
PROFESSOR: Well, in general,
they're not, and the first

00:11:17.090 --> 00:11:21.630
exercise on the homework
says take the (8, 4, 4) code.

00:11:21.630 --> 00:11:26.070
There's certain places where if
you erase 4 bits, you lose,

00:11:26.070 --> 00:11:27.760
and there are other places
where if you

00:11:27.760 --> 00:11:31.100
erase 4 bits, you win.

00:11:31.100 --> 00:11:37.630
And that exercise also points
out that the low density

00:11:37.630 --> 00:11:39.880
parity check decoding that
we're going to do, the

00:11:39.880 --> 00:11:43.520
graphical decoding, may fail
in a case where maximum

00:11:43.520 --> 00:11:46.660
likelihood decoding
might work.

00:11:46.660 --> 00:11:48.770
But maximum likelihood decoding
is certainly the best

00:11:48.770 --> 00:11:53.040
we can do, so it's clear.

00:11:53.040 --> 00:11:56.090
You can't signal at a rate
greater than 1 minus p.

00:11:56.090 --> 00:11:59.480
You just don't get more than 1
minus p bits of information,

00:11:59.480 --> 00:12:03.920
or n times (1 minus p) bits of
information in a code word of

00:12:03.920 --> 00:12:08.030
length n, so you can't possibly
communicate more than

00:12:08.030 --> 00:12:13.000
n times (1 minus p)
bits in a block.

00:12:13.000 --> 00:12:14.250
OK.

00:12:16.820 --> 00:12:20.920
So what are we going
to try to do to

00:12:20.920 --> 00:12:24.640
signal over this channel?

00:12:24.640 --> 00:12:29.500
We're going to try using a low
density parity check code.

00:12:29.500 --> 00:12:31.330
Actually, I guess I did
want this first.

00:12:34.232 --> 00:12:37.790
Let me talk about both of
these back and forth.

00:12:37.790 --> 00:12:39.500
Sorry, Mr. TV guy.

00:12:42.000 --> 00:12:45.210
So we're going to use a low
density parity check code, and

00:12:45.210 --> 00:12:50.110
initially, we're going to assume
a regular code with,

00:12:50.110 --> 00:12:56.860
say, left degree is 3 over
here, right degree is 6.

00:12:56.860 --> 00:13:00.650
And we're going to try to decode
by using the iterative

00:13:00.650 --> 00:13:03.310
sum product algorithm with
a left right schedule.

00:13:07.340 --> 00:13:10.140
OK.

00:13:10.140 --> 00:13:13.550
I can work either
here or up here.

00:13:13.550 --> 00:13:20.470
What are the rules for sum
product update on a binary

00:13:20.470 --> 00:13:22.170
erasure channel?

00:13:22.170 --> 00:13:30.220
Let's just start out, and walk
through it a little bit, and

00:13:30.220 --> 00:13:33.195
then step back and develop
some more general rules.

00:13:35.700 --> 00:13:40.080
What is the message coming
in here that we

00:13:40.080 --> 00:13:41.680
receive from the channel?

00:13:41.680 --> 00:13:45.340
We're going to convert it
into an APP vector.

00:13:45.340 --> 00:13:48.480
What could the APP vector be?

00:13:48.480 --> 00:13:56.360
It's either, say, 0, 1 or 1,
0, if the bit is unerased.

00:13:56.360 --> 00:14:00.300
So this would be-- if we get
this, we know the a posteriori

00:14:00.300 --> 00:14:04.145
probability of a 0 is
1, and of a 1 is 0.

00:14:04.145 --> 00:14:06.240
No question, we have
certainty.

00:14:06.240 --> 00:14:10.980
Similarly down here, it's a 1.

00:14:10.980 --> 00:14:13.090
And in here, it's 1/2, 1/2.

00:14:15.610 --> 00:14:17.310
Complete uncertainty.

00:14:17.310 --> 00:14:18.560
No information.

00:14:20.610 --> 00:14:22.050
So, we can get--

00:14:22.050 --> 00:14:25.570
those are our three
possibilities off the channel.

00:14:25.570 --> 00:14:29.320
(0,1), (1,0), (1/2, 1/2).

00:14:29.320 --> 00:14:32.380
Now, if we get a certain bit
coming in, what are the

00:14:32.380 --> 00:14:36.520
messages going out on each
of these lines here?

00:14:36.520 --> 00:14:38.720
We actually only need
to know this one.

00:14:38.720 --> 00:14:42.660
Initially, everything inside
here is complete ignorance.

00:14:42.660 --> 00:14:44.120
1/2, 1/2 everywhere.

00:14:44.120 --> 00:14:45.730
You can consider everything
to be erased.

00:14:48.420 --> 00:14:48.750
All right.

00:14:48.750 --> 00:14:53.470
Well, if we got a known bit
coming in, a 0 or a 1, the

00:14:53.470 --> 00:14:55.780
repetition node simply
says propagate

00:14:55.780 --> 00:14:57.260
that through all here.

00:14:57.260 --> 00:15:01.950
So if you worked out the sum
product update rule for here,

00:15:01.950 --> 00:15:07.100
it would basically say, in this
message, if any of these

00:15:07.100 --> 00:15:11.980
lines is known, then this line
is known and we have a certain

00:15:11.980 --> 00:15:13.950
bit going out.

00:15:13.950 --> 00:15:14.370
All right?

00:15:14.370 --> 00:15:17.140
So if 0, 1 comes in,
we'll get 0, 1 out.

00:15:17.140 --> 00:15:20.210
It's certainly a 1.

00:15:20.210 --> 00:15:26.240
Only in the case where all of
these other lines are erased--

00:15:26.240 --> 00:15:28.600
all these other incoming
messages are erasures-- then

00:15:28.600 --> 00:15:30.180
we don't know anything,
then the output

00:15:30.180 --> 00:15:32.260
has to be an erasure.

00:15:32.260 --> 00:15:32.780
All right?

00:15:32.780 --> 00:15:38.465
So that's the sum product update
rule at an equals node.

00:15:38.465 --> 00:15:38.850
All right?

00:15:38.850 --> 00:15:44.050
If any of these d minus 1
incoming messages is known,

00:15:44.050 --> 00:15:45.820
then the output is known.

00:15:45.820 --> 00:15:48.280
If they're all unknown, then
the output is unknown.
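
A minimal sketch of this equals-node rule in Python, with None standing in for an erasure (my own encoding, not the notes'):

    def equals_node_out(incoming):
        """Outgoing message of a repetition (=) node on the BEC:
        known if ANY incoming message, the channel value included,
        is known. Known inputs can never disagree here, so the
        first one is returned."""
        known = [m for m in incoming if m is not None]
        return known[0] if known else None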

00:15:52.360 --> 00:15:55.490
You're going to find, in
general, these are the only

00:15:55.490 --> 00:15:58.010
kinds of messages we're ever
going to have to deal with.

00:15:58.010 --> 00:16:01.920
Either, we're basically going
to take known bits and

00:16:01.920 --> 00:16:04.235
propagate them through
the graph--

00:16:09.800 --> 00:16:12.770
so initially, everything is
erased, and after awhile, we

00:16:12.770 --> 00:16:14.000
start learning things.

00:16:14.000 --> 00:16:16.910
More and more things become
known, and we succeed if

00:16:16.910 --> 00:16:20.000
everything becomes known
inside the graph.

00:16:20.000 --> 00:16:20.380
All right?

00:16:20.380 --> 00:16:23.500
So it's just the propagation of
unerased variables through

00:16:23.500 --> 00:16:25.272
this graph.

00:16:25.272 --> 00:16:26.522
AUDIENCE: [UNINTELLIGIBLE]

00:16:29.240 --> 00:16:30.490
PROFESSOR: No.

00:16:32.712 --> 00:16:36.400
So they're not only known,
but they're correct.

00:16:36.400 --> 00:16:40.550
And like everything else, you
can prove that by induction.

00:16:40.550 --> 00:16:43.650
The bits that we receive from
the channel certainly have to

00:16:43.650 --> 00:16:46.430
be consistent with the
correct code word.

00:16:46.430 --> 00:16:49.110
All these internal constraints
are the constraints of the

00:16:49.110 --> 00:16:52.830
code, so we can never generate
an incorrect message.

00:16:52.830 --> 00:16:55.080
That's basically the hand
waving proof of that.

00:16:57.800 --> 00:16:59.050
OK.

00:17:02.510 --> 00:17:06.410
So we're going to propagate
either known bits or erasures

00:17:06.410 --> 00:17:08.920
in the first iteration.

00:17:08.920 --> 00:17:12.490
And what's the fraction of these
lines that's going to be

00:17:12.490 --> 00:17:16.003
erased in a very long code?

00:17:16.003 --> 00:17:16.470
AUDIENCE: [UNINTELLIGIBLE]

00:17:16.470 --> 00:17:17.579
PROFESSOR: It's going to be p.

00:17:17.579 --> 00:17:18.099
All right?

00:17:18.099 --> 00:17:25.520
So initially, we have fraction
p erased and fraction 1 minus

00:17:25.520 --> 00:17:28.790
p which are good.

00:17:28.790 --> 00:17:30.230
OK.

00:17:30.230 --> 00:17:33.750
And then, this, we'll take this
to be a perfectly random

00:17:33.750 --> 00:17:34.380
interleaver.

00:17:34.380 --> 00:17:39.730
So perfectly randomly,
this comes out there.

00:17:39.730 --> 00:17:40.980
OK?

00:17:44.900 --> 00:17:47.410
All right, so now we have
various messages

00:17:47.410 --> 00:17:49.850
coming in over here.

00:17:49.850 --> 00:17:54.250
Some are erased, some are known
and correct, and that's

00:17:54.250 --> 00:17:56.730
the only things they can be.

00:17:56.730 --> 00:17:59.030
All right, what can we do
on the right side now?

00:17:59.030 --> 00:18:02.460
On the right side, we have to
execute the sum product

00:18:02.460 --> 00:18:05.400
algorithm for a zero sum
node of this type.

00:18:07.960 --> 00:18:09.210
What is the rule here?

00:18:11.880 --> 00:18:16.710
Clearly, if we get good data
on all these input bits, we

00:18:16.710 --> 00:18:18.980
know what the output bit is.

00:18:18.980 --> 00:18:22.570
So if we get five good ones over
here, we can tell what

00:18:22.570 --> 00:18:23.820
the sixth one has to be.

00:18:26.790 --> 00:18:33.690
However, if any of these is
erased, then what's the

00:18:33.690 --> 00:18:36.105
probability this
is a 0 or a 1?

00:18:36.105 --> 00:18:38.550
It's 1/2, 1/2.

00:18:38.550 --> 00:18:42.230
So any erasure here
means we get no

00:18:42.230 --> 00:18:43.800
information out of this node.

00:18:43.800 --> 00:18:47.230
We get an erasure coming out.
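
And the matching sketch for the zero-sum (+) node, same None-for-erasure convention: the outgoing bit is known only when ALL the other incoming bits are known, in which case it is their mod-2 sum.

    def check_node_out(incoming):
        """Outgoing message of a zero-sum (+) node on the BEC."""
        if any(m is None for m in incoming):
            return None           # one erasure wipes out the check's info
        return sum(incoming) % 2  # the remaining bit is fixed by parity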

00:18:47.230 --> 00:18:56.960
All right, so we come in here,
and if p is some large number,

00:18:56.960 --> 00:19:01.200
the rate of this code is 1/2.

00:19:01.200 --> 00:19:04.460
So I'm going to do a simulation
for like p equals a

00:19:04.460 --> 00:19:06.290
little less--

00:19:06.290 --> 00:19:10.240
small enough so that this code
could succeed, 0.4--

00:19:10.240 --> 00:19:16.050
so the capacity is
0.6 bits per bit.

00:19:16.050 --> 00:19:21.140
But if this is 0.4, what's the
probability that any 5 of

00:19:21.140 --> 00:19:23.640
these are all going
to be unerased?

00:19:23.640 --> 00:19:24.890
It's pretty small.

00:19:28.850 --> 00:19:34.380
So you won't be surprised to
learn that the probability of

00:19:34.380 --> 00:19:36.900
an erasure of coming back--

00:19:36.900 --> 00:19:38.800
call that q--

00:19:38.800 --> 00:19:42.930
equals 0.9 or more,
greater than 0.9.

00:19:42.930 --> 00:19:45.330
But it's not 1.

00:19:45.330 --> 00:19:49.290
So for some small fraction of
these over here, we're going

00:19:49.290 --> 00:19:51.780
to get some information, some
additional information, that

00:19:51.780 --> 00:19:53.490
we didn't have before.

00:19:53.490 --> 00:19:57.140
And this is going to propagate
randomly back, and it may

00:19:57.140 --> 00:20:02.240
allow us to now know some of
these bits that were initially

00:20:02.240 --> 00:20:03.490
erased on the channel.

00:20:07.620 --> 00:20:10.080
So that's the idea.

00:20:10.080 --> 00:20:14.410
So to understand the
performance of

00:20:14.410 --> 00:20:18.850
this, we simply track--

00:20:18.850 --> 00:20:22.850
let me call this, in general,
the erasure probability going

00:20:22.850 --> 00:20:26.680
from left to right, and this,
in general, we'll call the

00:20:26.680 --> 00:20:29.770
erasure probability going
from right to left.

00:20:32.630 --> 00:20:36.520
And we can actually compute what
these probabilities are

00:20:36.520 --> 00:20:39.810
for each iteration under the
assumption that the code is

00:20:39.810 --> 00:20:44.090
very long and random so that
every time we make a

00:20:44.090 --> 00:20:46.690
computation, we're dealing
with completely fresh and

00:20:46.690 --> 00:20:48.230
independent information.

00:20:48.230 --> 00:20:49.590
And that's what we're
going to do.

00:20:49.590 --> 00:20:50.170
Yes?

00:20:50.170 --> 00:20:51.420
AUDIENCE: [UNINTELLIGIBLE]

00:20:58.410 --> 00:21:02.120
PROFESSOR: When they come from
the right side, they're either

00:21:02.120 --> 00:21:04.180
erased or they're consistent.

00:21:07.610 --> 00:21:11.670
I argued before, waving my
hands, that these messages

00:21:11.670 --> 00:21:13.400
could never be incorrect.

00:21:13.400 --> 00:21:16.680
So if you get 2 known messages,
they can't conflict

00:21:16.680 --> 00:21:18.220
with each other.

00:21:18.220 --> 00:21:20.302
Is that your concern?

00:21:20.302 --> 00:21:20.788
AUDIENCE: Yeah.

00:21:20.788 --> 00:21:23.218
Because you're randomly
connecting [UNINTELLIGIBLE],

00:21:23.218 --> 00:21:26.620
so it might be that one of the
plus signs gave you an

00:21:26.620 --> 00:21:29.174
[UNINTELLIGIBLE], whereas
another plus sign gave you a

00:21:29.174 --> 00:21:30.106
proper message.

00:21:30.106 --> 00:21:33.050
And they both run back
to the same equation.

00:21:33.050 --> 00:21:34.450
PROFESSOR: Well, OK.

00:21:34.450 --> 00:21:36.430
So this is pseudo random,
but is chosen for

00:21:36.430 --> 00:21:37.280
once and for all.

00:21:37.280 --> 00:21:38.340
It determines the code.

00:21:38.340 --> 00:21:42.900
I don't re-choose it every time,
but when I analyze it,

00:21:42.900 --> 00:21:46.140
I'll assume that it's random
enough so that the bits that

00:21:46.140 --> 00:21:48.630
enter into any one calculation
are bits that I've never seen

00:21:48.630 --> 00:21:51.690
before, and therefore can be
taken to be entirely random.

00:21:51.690 --> 00:21:55.180
But of course, in actual
practice, you've got a fixed

00:21:55.180 --> 00:21:57.050
interleaver here, and you
have to use it, in order

00:21:57.050 --> 00:21:59.590
to decode the code.

00:21:59.590 --> 00:22:04.010
But the other concern here
is if we actually had the

00:22:04.010 --> 00:22:07.670
possibility of errors, the pure
binary erasure channel

00:22:07.670 --> 00:22:08.680
never allows errors.

00:22:08.680 --> 00:22:12.450
If this actually allowed a 0 to
go to a 1 or a 1 to go to

00:22:12.450 --> 00:22:14.860
0, then we'd have an altogether
different situation

00:22:14.860 --> 00:22:18.920
over here, and we'd have to
simply honestly compute the

00:22:18.920 --> 00:22:22.110
sum product algorithm and what
is the APP if we have some

00:22:22.110 --> 00:22:23.580
probability of error.

00:22:23.580 --> 00:22:25.890
And they could conflict, and
we'd have to weigh the

00:22:25.890 --> 00:22:29.340
evidence, and take the
dominating evidence, or mix it

00:22:29.340 --> 00:22:31.640
all up into the single
parameter

00:22:31.640 --> 00:22:32.890
that we call the APP.

00:22:36.460 --> 00:22:37.710
All right.

00:22:39.710 --> 00:22:45.695
So let me now do a
little analysis.

00:22:45.695 --> 00:22:48.060
Actually, I've done this
a couple places.

00:22:50.760 --> 00:22:55.110
Suppose the probability
of erasure here--

00:22:55.110 --> 00:23:00.620
this is the q right
to left parameter.

00:23:00.620 --> 00:23:06.160
Suppose the erasure probability q
right to left is 0.9, or

00:23:06.160 --> 00:23:13.420
whatever, and this is the
original received message from

00:23:13.420 --> 00:23:16.685
the channel, which had an
erasure probability of p.

00:23:19.480 --> 00:23:22.530
What's the q left to right?

00:23:22.530 --> 00:23:28.036
What's the erasure probability
for the outgoing message?

00:23:28.036 --> 00:23:31.390
Well, the outgoing message is
erased only if all of these

00:23:31.390 --> 00:23:34.480
incoming messages are erased.

00:23:34.480 --> 00:23:38.450
All right, so this is simply
p times q right to

00:23:38.450 --> 00:23:42.960
left, to the d minus 1.

00:23:42.960 --> 00:23:44.045
OK?

00:23:44.045 --> 00:23:45.295
AUDIENCE: [UNINTELLIGIBLE]

00:23:47.910 --> 00:23:49.860
PROFESSOR: Assuming it's
a long random code, so

00:23:49.860 --> 00:23:52.580
everything here is
independent.

00:23:52.580 --> 00:23:55.360
I'll say something else about
this in just a second.

00:23:58.150 --> 00:24:01.495
But let's naively make that
assumption right now, and then

00:24:01.495 --> 00:24:04.830
see how best we can
justify it.

00:24:04.830 --> 00:24:06.020
What's the rule over here?

00:24:06.020 --> 00:24:11.770
Here, we're over on the right
side if we want to compute the

00:24:11.770 --> 00:24:13.430
right to left message.

00:24:13.430 --> 00:24:18.140
If these are all erased with
probability q left to right,

00:24:18.140 --> 00:24:24.944
what is the probability that
this one going out is erased?

00:24:24.944 --> 00:24:27.770
Well, it's easier to compute
here the probability of not

00:24:27.770 --> 00:24:29.050
being erased.

00:24:29.050 --> 00:24:34.230
This is not erased only if all
of these are not erased.

00:24:34.230 --> 00:24:38.800
So we get q right to left.

00:24:38.800 --> 00:24:43.780
One minus q right to left is
equal to 1 minus q left to

00:24:43.780 --> 00:24:48.070
right, to the d minus 1.

00:24:48.070 --> 00:24:51.630
And let's see, this is d right,
and this is d left.

00:24:54.830 --> 00:24:58.770
I'm doing it for the
specific context.

00:24:58.770 --> 00:25:05.320
OK, so under the independence
assumption, we can compute

00:25:05.320 --> 00:25:09.900
exactly what these evolving
erasure probabilities are as

00:25:09.900 --> 00:25:12.520
we go through this left
right iteration.

00:25:12.520 --> 00:25:16.660
This is what's so neat about
this whole thing.

00:25:16.660 --> 00:25:22.430
Now, here's the best argument
for why these are all

00:25:22.430 --> 00:25:24.440
independent.

00:25:24.440 --> 00:25:30.690
Let's look at the messages
that enter into, say, a

00:25:30.690 --> 00:25:31.950
particular--

00:25:31.950 --> 00:25:35.610
this is computing q left
to right down here.

00:25:35.610 --> 00:25:40.030
All right, we've got something
coming in, one bit here.

00:25:40.030 --> 00:25:45.860
We've got more bits coming in
up here, and here, which

00:25:45.860 --> 00:25:48.800
originally came from bits
coming in up here.

00:25:48.800 --> 00:25:50.870
We have a tree of computation.

00:25:50.870 --> 00:25:54.120
If we went back through this
pseudo random but fixed

00:25:54.120 --> 00:25:58.900
interleaver, we could actually
draw this tree for every

00:25:58.900 --> 00:26:04.790
instance of every computation,
and this would be q left to

00:26:04.790 --> 00:26:07.610
right at the nth iteration,
this is--

00:26:07.610 --> 00:26:08.882
I'm sorry.

00:26:08.882 --> 00:26:12.470
Yeah, this is q left to right at
the nth iteration, this is

00:26:12.470 --> 00:26:17.630
q right to left at the n minus
first iteration, this is q

00:26:17.630 --> 00:26:22.300
left to right at the n minus
first iteration, and so forth.

00:26:26.500 --> 00:26:31.190
Now, the argument is
that if I go back--

00:26:31.190 --> 00:26:36.050
let's fix the number of
iterations I go back here--

00:26:36.050 --> 00:26:40.690
m, let's say, and I want to do
an analysis of the first m

00:26:40.690 --> 00:26:43.400
iterations.

00:26:43.400 --> 00:26:48.380
I claim that as this code
becomes long, n goes to

00:26:48.380 --> 00:26:54.340
infinity with fixed d_lambda,
d_rho, that the probability

00:26:54.340 --> 00:26:58.220
you're ever going to run into
a repeated bit or message up

00:26:58.220 --> 00:27:00.780
here goes to 0.

00:27:00.780 --> 00:27:02.230
All right?

00:27:02.230 --> 00:27:04.510
So I fix the number
of iterations I'm

00:27:04.510 --> 00:27:05.220
going to look at.

00:27:05.220 --> 00:27:07.500
I let the length of the
code go to infinity.

00:27:07.500 --> 00:27:11.340
I let everything be chosen
pseudo randomly over here.

00:27:11.340 --> 00:27:17.570
Then the probability of seeing
the same message or bit twice

00:27:17.570 --> 00:27:19.680
in this tree goes to 0.

00:27:19.680 --> 00:27:22.740
And therefore, in that limit,
the independence assumption

00:27:22.740 --> 00:27:23.590
becomes valid.

00:27:23.590 --> 00:27:27.390
That is basically the
argument, all right?

00:27:27.390 --> 00:27:29.840
So I can analyze any
fixed number of

00:27:29.840 --> 00:27:31.190
iterations in this way.

00:27:39.142 --> 00:27:40.392
AUDIENCE: [UNINTELLIGIBLE]

00:27:42.640 --> 00:27:43.770
PROFESSOR: OK, yes.

00:27:43.770 --> 00:27:45.140
Good.

00:27:45.140 --> 00:27:51.010
So this is saying the girth
is probabilistically--

00:27:51.010 --> 00:27:59.860
the girth goes to infinity in
probability, or it's also

00:27:59.860 --> 00:28:02.110
referred to as the locally
tree-like assumption.

00:28:05.330 --> 00:28:09.260
OK, graph in the neighborhood
of any node--

00:28:09.260 --> 00:28:12.380
this is kind of a map of the
neighborhood back for a

00:28:12.380 --> 00:28:13.630
distance of m--

00:28:16.730 --> 00:28:18.640
we're not ever going to
run into any cycles.

00:28:23.530 --> 00:28:26.420
Good, thank you.

00:28:26.420 --> 00:28:29.270
OK, so under that assumption,
now we

00:28:29.270 --> 00:28:30.760
can do an exact analysis.

00:28:30.760 --> 00:28:32.010
This is what's amazing.

00:28:35.760 --> 00:28:36.760
And how do we do it?

00:28:36.760 --> 00:28:39.180
Here's a good way of doing it.

00:28:39.180 --> 00:28:42.120
We just draw the curves of these
2 equations, and we go

00:28:42.120 --> 00:28:43.480
back and forth between them.

00:28:46.180 --> 00:28:49.980
And this was actually a
technique invented earlier for

00:28:49.980 --> 00:28:53.130
turbo codes, but it works very
nicely for low density parity

00:28:53.130 --> 00:28:54.890
check code analysis.

00:28:54.890 --> 00:28:57.280
It's called the EXIT chart.

00:28:57.280 --> 00:29:01.570
I've drawn it in a somewhat
peculiar way, but it's so that

00:29:01.570 --> 00:29:04.360
it will look like the EXIT
charts you might see in the

00:29:04.360 --> 00:29:06.318
literature.

00:29:06.318 --> 00:29:10.100
So I'm just drawing q right to
left on this axis, and q left

00:29:10.100 --> 00:29:11.870
to right on this axis.

00:29:11.870 --> 00:29:15.250
I want to sort of start in the
lower left and work my way up

00:29:15.250 --> 00:29:18.030
to the upper right, which
is the way EXIT

00:29:18.030 --> 00:29:19.160
charts always work.

00:29:19.160 --> 00:29:23.610
So to do that, I basically
invert the axis and take it

00:29:23.610 --> 00:29:25.410
from 1 down to 0.

00:29:25.410 --> 00:29:27.640
Initially, both of these--

00:29:27.640 --> 00:29:30.350
the probability is one that
everything is erased

00:29:30.350 --> 00:29:34.830
internally on every edge, and if
things work out, we'll get

00:29:34.830 --> 00:29:38.390
up to the point where nothing
is erased with high

00:29:38.390 --> 00:29:39.640
probability.

00:29:41.500 --> 00:29:46.290
OK, these are our 2 equations
just copied from over there

00:29:46.290 --> 00:29:49.810
for the specific case of left
degree equals 3 and right

00:29:49.810 --> 00:29:52.900
degree equals 6.

00:29:52.900 --> 00:29:56.650
And so I just plot the curves
of these 2 equations.

00:29:56.650 --> 00:30:02.090
This is done in the notes, and
the important thing is that

00:30:02.090 --> 00:30:08.660
the curves don't cross, for
a value of p equal to 0.4.

00:30:08.660 --> 00:30:13.030
One of these curves depends on
p, the other one doesn't.

00:30:13.030 --> 00:30:16.390
So this is just a simple little
quadratic curve here,

00:30:16.390 --> 00:30:19.150
and this is a fifth order
curve, and they look

00:30:19.150 --> 00:30:22.000
something like this.

00:30:22.000 --> 00:30:22.870
What does this mean?

00:30:22.870 --> 00:30:25.400
Initially, the q right
to left is 1.

00:30:25.400 --> 00:30:30.800
If I go through one iteration,
using the fact that I get this

00:30:30.800 --> 00:30:34.010
external information--

00:30:34.010 --> 00:30:35.610
extrinsic information--

00:30:35.610 --> 00:30:39.570
then q left to right becomes
0.4, so we go to the outer curve.

00:30:39.570 --> 00:30:45.130
Now, I have q left to right
propagating to the right side,

00:30:45.130 --> 00:30:50.290
and at this point, I get
something like 0.922, I think

00:30:50.290 --> 00:30:52.700
is the first one.

00:30:52.700 --> 00:30:55.280
So the q right to left
has gone from

00:30:55.280 --> 00:30:58.210
1 down to 0.9 something.

00:30:58.210 --> 00:31:00.260
OK, but that's better.

00:31:00.260 --> 00:31:05.050
Now, with that value of q, of
course I get a much more

00:31:05.050 --> 00:31:07.510
favorable situation
on the left.

00:31:07.510 --> 00:31:10.730
I go over to the left
side, and now I get

00:31:10.730 --> 00:31:14.240
some p equal to--

00:31:14.240 --> 00:31:17.670
this is all done
in the notes--

00:31:17.670 --> 00:31:20.880
0.34.

00:31:20.880 --> 00:31:23.790
So I've reduced my erasure
probability going from left to

00:31:23.790 --> 00:31:33.000
right, which in turn, helps me
out as I go over here, 0.875,

00:31:33.000 --> 00:31:35.490
and so forth.
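
A short sketch reproducing this exact calculation for the regular (3, 6) code at p = 0.4, matching the numbers just quoted:

    p, d_left, d_right = 0.4, 3, 6
    q_rl = 1.0                                  # everything erased at the start
    for i in range(1, 4):
        q_lr = p * q_rl ** (d_left - 1)         # repetition-node update
        q_rl = 1 - (1 - q_lr) ** (d_right - 1)  # zero-sum-node update
        print(i, round(q_lr, 3), round(q_rl, 3))
    # 1 0.4   0.922
    # 2 0.34  0.875
    # 3 ...   and both keep shrinking toward (0, 0)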

00:31:35.490 --> 00:31:36.050
Are you with me?

00:31:36.050 --> 00:31:38.730
Does everyone see
what I'm doing?

00:31:38.730 --> 00:31:41.050
Any questions?

00:31:41.050 --> 00:31:44.250
Again, I'm claiming this is
an exact calculation--

00:31:44.250 --> 00:31:46.050
or I would call it
a simulation--

00:31:46.050 --> 00:31:48.830
of what the algorithm does
in each iteration.

00:31:48.830 --> 00:31:52.810
First iteration, first full,
left, right, right left, you

00:31:52.810 --> 00:31:53.490
get to here.

00:31:53.490 --> 00:31:56.840
Second one, you get to
here, and so forth.

00:31:56.840 --> 00:32:00.640
And I claim as n goes to
infinity, and everything is

00:32:00.640 --> 00:32:05.530
random, this is the
way the erasure

00:32:05.530 --> 00:32:06.800
probabilities will evolve.

00:32:09.440 --> 00:32:17.320
And it's clear visually that if
the curves don't cross, we

00:32:17.320 --> 00:32:21.060
get to the upper right corner,
which means decoding succeeds.

00:32:21.060 --> 00:32:26.640
There are no erasures anywhere
at the end of the day.

00:32:26.640 --> 00:32:29.790
And furthermore, you go and
you take a very long code,

00:32:29.790 --> 00:32:33.050
like 10 to the seventh bits,
and you simulate it on this

00:32:33.050 --> 00:32:36.120
channel, and it will behave
exactly like this.

00:32:36.120 --> 00:32:40.100
OK, so this is really a good
piece of analysis.

00:32:40.100 --> 00:32:43.810
So this reduces it to
very simple terms.

00:32:43.810 --> 00:32:48.760
We have 2 equations, and of
course they meet here at the

00:32:48.760 --> 00:32:50.700
(0,0) point.

00:32:50.700 --> 00:32:52.970
Substitute 0 in here,
you get 0 there.

00:32:52.970 --> 00:32:55.690
Substitute 0 here,
you get 0 there.

00:32:55.690 --> 00:32:59.490
But if they don't meet anywhere
else, if there's no

00:32:59.490 --> 00:33:05.860
fixed point to this iterative
convergence, then decoding is

00:33:05.860 --> 00:33:07.680
going to succeed.

00:33:07.680 --> 00:33:10.920
So this is the whole question:
can we design 2 curves that

00:33:10.920 --> 00:33:12.170
don't cross?

00:33:22.020 --> 00:33:23.270
OK.

00:33:24.980 --> 00:33:29.330
So what do we expect
now to happen?

00:33:29.330 --> 00:33:32.670
Suppose we increase p.

00:33:32.670 --> 00:33:37.540
Suppose we increase p to 0.45,
which is another case that's

00:33:37.540 --> 00:33:41.580
considered in the notes,
what's going to happen?

00:33:41.580 --> 00:33:43.990
This curve is just a simple
quadratic, it's going to be

00:33:43.990 --> 00:33:45.720
dragged down a little bit.

00:33:45.720 --> 00:33:52.570
We're going to get some
different curve, which is just

00:33:52.570 --> 00:33:57.220
this curve scaled by
0.45 over 0.4.

00:33:57.220 --> 00:33:59.790
It's going to start here,
and it's going to

00:33:59.790 --> 00:34:00.980
be this scaled curve.

00:34:00.980 --> 00:34:03.895
And unfortunately, those
2 curves cross.

00:34:07.550 --> 00:34:12.510
So that's the way it's going
to look, and now, again, we

00:34:12.510 --> 00:34:18.429
can simulate iterative decoding
for this case.

00:34:18.429 --> 00:34:20.290
Again, initially,
we'll start out.

00:34:20.290 --> 00:34:24.760
We'll go from 1, 0.45 will be
our right going erasure

00:34:24.760 --> 00:34:25.350
probability.

00:34:25.350 --> 00:34:28.980
We'll go over here, make
some progress, but

00:34:28.980 --> 00:34:30.690
what's going to happen?

00:34:30.690 --> 00:34:32.395
We're going to get stuck
right there.

00:34:36.260 --> 00:34:37.639
So we find the fixed point.

00:34:37.639 --> 00:34:40.480
In fact, this simulation is
a very efficient way of

00:34:40.480 --> 00:34:45.170
calculating what the fixed point
of these 2 curves are.

00:34:45.170 --> 00:34:47.770
Probably some of you are
analytical whizzes and can do

00:34:47.770 --> 00:34:50.350
it analytically, but
it's not that easy

00:34:50.350 --> 00:34:51.600
for a quintic equation.

00:34:55.699 --> 00:35:00.430
In any case, as far as decoding
is concerned--

00:35:00.430 --> 00:35:03.390
all right, this code doesn't
work on an erasure channel

00:35:03.390 --> 00:35:05.940
which has an erasure probability
of 0.45.

00:35:05.940 --> 00:35:10.120
It does work on one that has an
erasure probability of 0.4.

00:35:10.120 --> 00:35:16.770
That should suggest
to you-- yeah?

00:35:16.770 --> 00:35:18.020
AUDIENCE: [UNINTELLIGIBLE]

00:35:20.520 --> 00:35:23.520
PROFESSOR: Yes, so this code
doesn't get to capacity.

00:35:23.520 --> 00:35:24.770
Too bad.

00:35:27.690 --> 00:35:33.540
So I'm not claiming that a
regular d left equals 3, d

00:35:33.540 --> 00:35:37.010
right equals 6 LDPC code
can achieve capacity.

00:35:40.030 --> 00:35:44.030
There's some threshold for p,
below which it'll work, and

00:35:44.030 --> 00:35:45.830
above which it won't work.

00:35:45.830 --> 00:35:50.460
That threshold is somewhere
between 0.4 and 0.45.

00:35:50.460 --> 00:35:53.510
In fact, it's 0.429 something
or other.
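
A sketch of how that threshold can be pinned down numerically: bisect on p, running the same two update equations and asking whether the erasure probability is driven to 0. The iteration counts and tolerance here are my own choices:

    def converges(p, d_left=3, d_right=6, iters=2000, tol=1e-6):
        q_rl = 1.0
        for _ in range(iters):
            q_lr = p * q_rl ** (d_left - 1)
            q_rl = 1 - (1 - q_lr) ** (d_right - 1)
        return q_rl < tol

    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if converges(mid):
            lo = mid
        else:
            hi = mid
    print(lo)   # approximately 0.429, the threshold quoted above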

00:35:53.510 --> 00:36:00.470
So this design approach will
get near capacity,

00:36:00.470 --> 00:36:02.790
but I certainly don't
claim this is a

00:36:02.790 --> 00:36:04.350
capacity approaching code.

00:36:13.360 --> 00:36:17.920
I might mention now something
called the area theorem,

00:36:17.920 --> 00:36:20.310
because it's easy to do
now and it will be

00:36:20.310 --> 00:36:22.730
harder to do later.

00:36:22.730 --> 00:36:24.140
What is this area here?

00:36:28.340 --> 00:36:31.240
I'm saying the area above
this curve here.

00:36:35.344 --> 00:36:38.740
Well, you can do that simply
by integrating this.

00:36:38.740 --> 00:36:46.870
It's integral of p times
q-squared dq from 0 to 1, and

00:36:46.870 --> 00:36:49.840
it turns out to be p over 3.

00:36:49.840 --> 00:36:51.090
Believe me?

00:36:54.360 --> 00:37:00.130
Which happens to be p over
the left degree.

00:37:00.130 --> 00:37:06.230
Not fortuitously, because this
is the left degree minus 1.

00:37:06.230 --> 00:37:08.800
So you're always going to get
p over the left degree.

00:37:11.780 --> 00:37:14.900
And what's the area
under here?

00:37:14.900 --> 00:37:19.070
Well, I can compute--

00:37:19.070 --> 00:37:22.340
basically change variables to
1 minus q, q prime, and 1

00:37:22.340 --> 00:37:26.590
minus q is q prime over here,
and so I'll get the same kind

00:37:26.590 --> 00:37:31.590
of calculation, the integral from
0 to 1 of q prime to the fifth, dq prime,

00:37:31.590 --> 00:37:36.020
which is 1/6, which not

00:37:36.020 --> 00:37:39.370
fortuitously is 1 over d_rho.

00:37:39.370 --> 00:37:46.920
So the area here is p over
3, and the area here is--

00:37:46.920 --> 00:37:49.460
under this side of
the curve is--

00:37:49.460 --> 00:37:51.340
that must be 5/6.

00:37:51.340 --> 00:37:55.090
Sorry, so the area under
this side is 1/6 so

00:37:55.090 --> 00:37:56.400
it's 1 minus this.

00:38:07.490 --> 00:38:10.560
It's clearly the big part,
so this is 5/6.
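
A quick midpoint-rule check of both integrals (illustrative only):

    N, p = 10**5, 0.4
    qs = [(i + 0.5) / N for i in range(N)]
    print(sum(p * q**2 for q in qs) / N)   # about 0.1333 = p/3
    print(sum(q**5 for q in qs) / N)       # about 0.1667 = 1/6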

00:38:15.560 --> 00:38:15.980
All right.

00:38:15.980 --> 00:38:19.230
I've claimed my criterion for
successful decoding is that

00:38:19.230 --> 00:38:21.980
these curves not cross.

00:38:21.980 --> 00:38:31.010
All right, so for successful
decoding, clearly the sum of

00:38:31.010 --> 00:38:35.940
these 2 areas has to be
less than 1, right?

00:38:35.940 --> 00:38:50.210
So successful decoding: a
necessary condition is that p

00:38:50.210 --> 00:38:52.500
over d_lambda --

00:38:52.500 --> 00:38:56.510
let me just extend this
to any regular code--

00:38:56.510 --> 00:39:03.050
plus 1 minus 1 over d_rho
has to be less than 1.

00:39:09.490 --> 00:39:13.320
OK, what does this sum out to?

00:39:13.320 --> 00:39:27.530
This says that p has to be less
than d_lambda over d_rho,

00:39:27.530 --> 00:39:30.290
which happens to be 1
minus r, right?

00:39:32.960 --> 00:39:38.835
Or equivalently, r less than 1
minus p, which is capacity.

00:39:42.200 --> 00:39:44.150
So what did I just prove
very quickly?

00:39:44.150 --> 00:39:48.800
I proved that for a regular low
density parity check code,

00:39:48.800 --> 00:39:54.370
just considering the areas under
these 2 curves and the

00:39:54.370 --> 00:39:59.490
requirement that the 2 curves
must not cross, I find that

00:39:59.490 --> 00:40:04.200
regular codes can't possibly
work for a rate any greater

00:40:04.200 --> 00:40:06.520
than 1 minus p, which
is capacity.

00:40:06.520 --> 00:40:10.160
In fact, the rate has to be less
than 1 minus p, strictly

00:40:10.160 --> 00:40:13.020
less, in order for there to--

00:40:13.020 --> 00:40:16.870
unless we were lucky enough just
to get 2 curves that were

00:40:16.870 --> 00:40:18.620
right on top of each other.

00:40:18.620 --> 00:40:19.840
I don't know whether that
would work or not.

00:40:19.840 --> 00:40:21.330
I guess it doesn't work.

00:40:21.330 --> 00:40:23.735
But we'd need them to be
just a scooch apart.

00:40:27.010 --> 00:40:30.170
OK, so I can make an inequality
sign here.

00:40:30.170 --> 00:40:33.920
OK, well that's rather
gratifying.

00:40:38.160 --> 00:40:43.510
What do we do to improve
the situation?

00:40:43.510 --> 00:40:45.560
OK, one--

00:40:45.560 --> 00:40:47.570
it's probably the first thing
you would think of

00:40:47.570 --> 00:40:50.830
investigating maybe at this
point, why don't we look at an

00:40:50.830 --> 00:40:53.640
irregular LDPC code?

00:41:07.360 --> 00:41:11.320
And I'm going to characterize
such a code by--

00:41:11.320 --> 00:41:18.650
there's going to be some
distribution on the left side,

00:41:18.650 --> 00:41:22.580
which I might write
by lambda_d.

00:41:22.580 --> 00:41:26.920
This is going to be the
fraction of left

00:41:26.920 --> 00:41:33.910
nodes of degree d.

00:41:33.910 --> 00:41:36.290
All right, I'll simply let that
be some distribution.

00:41:36.290 --> 00:41:39.250
Some might have degree 2, some
might have degree 3.

00:41:39.250 --> 00:41:44.270
Some might have degree 500.

00:41:44.270 --> 00:41:51.000
And similarly, rho_d
is the fraction of

00:41:51.000 --> 00:41:54.170
right nodes, et cetera.

00:42:01.500 --> 00:42:06.430
And there's some average degree
here, and some average

00:42:06.430 --> 00:42:09.250
degree here.

00:42:09.250 --> 00:42:12.950
So this is the average degree,
or the typical degree.

00:42:16.500 --> 00:42:20.500
This is average left degree,
this is average right degree.

00:42:23.990 --> 00:42:26.470
If I do that, then
the calculations

00:42:26.470 --> 00:42:29.640
are done in the notes.

00:42:29.640 --> 00:42:32.070
I won't take the time to do them
here, but basically you

00:42:32.070 --> 00:42:38.190
find the rate of the code is 1
minus the average left degree

00:42:38.190 --> 00:42:39.760
over the average right degree.

00:42:42.860 --> 00:42:45.180
OK, so it reduces to
the previous case

00:42:45.180 --> 00:42:47.820
and the regular case.

00:42:47.820 --> 00:42:51.440
Regular case, this is 1 for one
particular degree and 0

00:42:51.440 --> 00:42:52.690
for everything else.

00:42:56.050 --> 00:42:59.160
It works out.

00:42:59.160 --> 00:43:02.910
If I do that and go through
exactly the same analysis with

00:43:02.910 --> 00:43:06.690
my computation tree, now I
simply have a distribution of

00:43:06.690 --> 00:43:11.710
degrees at each level of the
computation tree, and you will

00:43:11.710 --> 00:43:18.040
not be surprised to hear what I
get out as my left to right

00:43:18.040 --> 00:43:22.315
equations, is I get out
some average of this.

00:43:25.350 --> 00:43:39.090
In fact, what I get out now is
that q left to right is the

00:43:39.090 --> 00:43:43.520
sum over d of--

00:43:43.520 --> 00:43:55.590
this is going to be lambda_d
times p times q right to left

00:43:55.590 --> 00:43:58.810
to the d minus 1.

00:43:58.810 --> 00:44:02.140
Which again reduces to the
previous thing, if only one of

00:44:02.140 --> 00:44:06.250
these is 1 and the rest are 0.

00:44:06.250 --> 00:44:07.500
So I just get the--

00:44:09.920 --> 00:44:11.570
this is just an expectation.

00:44:11.570 --> 00:44:13.830
This is the fraction
of erasures.

00:44:13.830 --> 00:44:17.860
I just count the number of times
I go through a node of

00:44:17.860 --> 00:44:20.990
degree d, and for that fraction
of time, I'm going to

00:44:20.990 --> 00:44:25.340
get this relationship, and so
I just average over them.

00:44:25.340 --> 00:44:26.450
That's very quick.

00:44:26.450 --> 00:44:29.733
Look at the notes for a detailed
derivation, but I

00:44:29.733 --> 00:44:32.840
hope it's intuitively
plausible.

00:44:32.840 --> 00:44:39.812
And similarly, 1 minus q right
to left is the sum over d

00:44:39.812 --> 00:44:48.365
of rho_d, 1 minus q left to
right to the d minus 1.

00:44:52.510 --> 00:44:55.970
OK, this is elegantly
done if we

00:44:55.970 --> 00:44:59.710
define generating functions.

00:44:59.710 --> 00:45:03.490
We do that over here.

00:45:03.490 --> 00:45:05.600
I've lost it now so I'll
do it over here.

00:45:08.500 --> 00:45:11.230
So what you'll see in the
literature is generating

00:45:11.230 --> 00:45:15.920
functions, defined as lambda of x
equals sum over d of lambda_d

00:45:15.920 --> 00:45:18.740
x to the d minus 1.

00:45:18.740 --> 00:45:25.650
And rho of x equals sum over d,
rho_d, x to the d minus 1.

00:45:25.650 --> 00:45:28.360
And then these equations
are simply written as--

00:45:28.360 --> 00:45:35.410
this is p times lambda of q
right to left, and this is

00:45:35.410 --> 00:45:42.710
equal to rho of 1 minus
q left to right.

00:45:46.870 --> 00:45:49.630
OK, so we get nice, elegant
generating function

00:45:49.630 --> 00:45:50.880
representations.
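
A sketch of density evolution written with these generating functions, following the recursions q left-to-right = p lambda(q right-to-left) and 1 - q right-to-left = rho(1 - q left-to-right). The degree distributions below are arbitrary illustrative choices, not an optimized pair, and the node-versus-edge bookkeeping for lambda_d and rho_d is glossed over here just as it is above:

    lam = {2: 0.5, 3: 0.5}   # lambda_d, weight on left degree d
    rho = {6: 1.0}           # rho_d, weight on right degree d

    def gen(poly, x):
        # evaluate sum over d of poly[d] * x**(d - 1)
        return sum(c * x ** (d - 1) for d, c in poly.items())

    p, q_rl = 0.3, 1.0
    for _ in range(200):
        q_lr = p * gen(lam, q_rl)
        q_rl = 1 - gen(rho, 1 - q_lr)
    print(q_rl)   # near 0 when this (p, lambda, rho) pair converges

Searching over lam and rho to squeeze the two curves together is exactly the curve-fitting exercise described next.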

00:45:52.840 --> 00:45:56.350
But from the point of view of
the curves, we're basically

00:45:56.350 --> 00:45:58.110
just going to average
these curves.

00:45:58.110 --> 00:46:02.520
So we now replace these
equations up here by the

00:46:02.520 --> 00:46:03.770
average equations.

00:46:11.940 --> 00:46:18.100
This becomes p times lambda of
q right to left, and this

00:46:18.100 --> 00:46:25.950
becomes rho of 1 minus
q left to right.

00:46:25.950 --> 00:46:30.760
OK, but again, I'm going to
reduce all of this 2 curves,

00:46:30.760 --> 00:46:34.770
which again I can use
for a simulation.

00:46:34.770 --> 00:46:38.900
And now I have lots of
degrees of freedom.

00:46:38.900 --> 00:46:41.630
I could change all these lambdas
and all these rhos,

00:46:41.630 --> 00:46:45.180
and I can explore the space,
and that's what Sae-Young

00:46:45.180 --> 00:46:48.530
Chung did in his thesis, not
so much for this channel.

00:46:48.530 --> 00:46:53.260
He did do it for this channel,
but also for additive white

00:46:53.260 --> 00:46:54.610
Gaussian noise channels.

00:46:54.610 --> 00:47:00.980
And so the idea is you try to
make these 2 curves just as

00:47:00.980 --> 00:47:02.910
close together as you can.

00:47:08.880 --> 00:47:09.820
Something like that.

00:47:09.820 --> 00:47:11.730
Or, of course, you can
do other tricks.

00:47:11.730 --> 00:47:15.690
You can have some of these--

00:47:15.690 --> 00:47:17.300
you can have some bits
over here that go

00:47:17.300 --> 00:47:18.250
to the outside world.

00:47:18.250 --> 00:47:20.520
You can suppress some
of these bits here.

00:47:20.520 --> 00:47:23.800
You can play around
with the graph.

00:47:23.800 --> 00:47:25.580
No limit on invention.

00:47:25.580 --> 00:47:27.490
But you don't really have
to do any of that.

00:47:30.430 --> 00:47:37.790
So it becomes a curve fitting
exercise, and you can imagine

00:47:37.790 --> 00:47:39.630
doing this in your thesis,
except you were

00:47:39.630 --> 00:47:41.095
not born soon enough.

00:47:44.950 --> 00:47:48.090
The interesting point here is
that this now becomes--

00:47:48.090 --> 00:47:51.470
the area becomes p over
d_lambda-bar, again,

00:47:51.470 --> 00:47:54.320
proof in the notes.

00:47:54.320 --> 00:48:00.790
This becomes 1 minus
1 over d_rho-bar.

00:48:04.820 --> 00:48:06.720
And so again, the
area theorem--

00:48:12.540 --> 00:48:15.880
in order for these curves not to
cross, we've got to have p

00:48:15.880 --> 00:48:25.860
over d_lambda-bar plus 1 minus
1 over d_rho-bar, less than

00:48:25.860 --> 00:48:30.230
the area of the whole EXIT
chart, which is 1.

00:48:30.230 --> 00:48:33.870
We again find that--

00:48:33.870 --> 00:48:39.870
let me put it this way, 1
minus d_lambda-bar over

00:48:39.870 --> 00:48:46.450
d_rho-bar is less than 1 minus
p, which is equivalent to the

00:48:46.450 --> 00:48:50.660
rate must be less than the
capacity of the channel.

00:48:50.660 --> 00:48:52.855
So this is a very nice,
elegant result.

00:48:52.855 --> 00:48:56.530
The area theorem says that no
matter how you play with these

00:48:56.530 --> 00:48:59.430
degree distributions in an
irregular low-density parity

00:48:59.430 --> 00:49:04.790
check code, you of course can
never get above capacity.

00:49:04.790 --> 00:49:10.010
But, it certainly suggests that
you might be able to play

00:49:10.010 --> 00:49:13.060
around with these curves such
that they get as close as you

00:49:13.060 --> 00:49:14.170
might like.

00:49:14.170 --> 00:49:17.270
And the converse of this is
that if you can make these

00:49:17.270 --> 00:49:20.890
arbitrarily close to each other,
then you can achieve

00:49:20.890 --> 00:49:22.830
rates arbitrarily close
to capacity.

00:49:26.630 --> 00:49:29.850
And that, in fact, is true.

00:49:29.850 --> 00:49:33.060
So simply by going to irregular
low-density parity

00:49:33.060 --> 00:49:36.810
check codes, we can get as close
as we like, arbitrarily

00:49:36.810 --> 00:49:41.610
close, to the capacity of the
binary erasure channel with

00:49:41.610 --> 00:49:44.276
this kind of iterative
decoding.

00:49:44.276 --> 00:49:46.320
And you can see the kind
of trade you're

00:49:46.320 --> 00:49:47.010
going to have to make.

00:49:47.010 --> 00:49:51.160
Obviously, you're going to have
more iterations as these

00:49:51.160 --> 00:49:52.670
get very close.

00:49:52.670 --> 00:49:55.450
What is the decoding process
going to look like?

00:49:55.450 --> 00:49:59.960
It's going to look like very
fine grained steps here, lots

00:49:59.960 --> 00:50:02.650
of iterations, but--

00:50:02.650 --> 00:50:03.120
all right.

00:50:03.120 --> 00:50:04.360
So it's 100 iterations.

00:50:04.360 --> 00:50:07.880
So it's 200 iterations.

00:50:07.880 --> 00:50:10.100
These are not crazy numbers.

00:50:10.100 --> 00:50:12.570
These are quite feasible
numbers.

00:50:12.570 --> 00:50:16.450
And so if you're willing to
do a lot of computation--

00:50:16.450 --> 00:50:18.160
which is what you expect,
as you get close

00:50:18.160 --> 00:50:19.480
to capacity, right--

00:50:19.480 --> 00:50:23.520
you can get as close to capacity
as you like, at least

00:50:23.520 --> 00:50:26.270
on this channel.

00:50:26.270 --> 00:50:30.010
OK, isn't that great?

00:50:30.010 --> 00:50:35.010
It's an easy channel, I grant
you, but everything here is

00:50:35.010 --> 00:50:38.000
pretty simple.

00:50:38.000 --> 00:50:40.890
All these sum product
updates--

00:50:40.890 --> 00:50:45.750
for here, it's just a matter
of basically, you know,

00:50:45.750 --> 00:50:46.900
propagating erasures.

00:50:46.900 --> 00:50:49.870
You just take the
known variables.

00:50:49.870 --> 00:50:53.130
You keep computing as many
as you can of them.

00:50:53.130 --> 00:50:57.110
Basically, every time an edge
becomes known, you only have

00:50:57.110 --> 00:50:59.910
to visit each edge
once, actually.

00:50:59.910 --> 00:51:02.220
The first time it becomes known
is the only time you

00:51:02.220 --> 00:51:02.850
have to visit it.

00:51:02.850 --> 00:51:05.490
After that, you can just
leave it fixed.

00:51:05.490 --> 00:51:13.130
All right, so if this has a
linear number of edges, as it

00:51:13.130 --> 00:51:15.940
does, by construction, for
either the regular or

00:51:15.940 --> 00:51:18.580
irregular case, the complexity
is now going

00:51:18.580 --> 00:51:20.690
to be linear, right?

00:51:20.690 --> 00:51:22.190
We only have to visit
each edge once.

00:51:22.190 --> 00:51:27.640
There are only a number of
edges proportional to n.

00:51:27.640 --> 00:51:29.800
So the complexity of this whole
decoding algorithm-- all

00:51:29.800 --> 00:51:33.490
you do is, you fix as many edges
as you can, then you go

00:51:33.490 --> 00:51:36.670
over here and you try to fix as
many more edges as you can.

00:51:36.670 --> 00:51:39.520
You come back here, try to fix
as many more as you can.

00:51:39.520 --> 00:51:42.700
It will behave exactly as this
simulation shows it will

00:51:42.700 --> 00:51:49.220
behave, and after going back
and forth maybe 100 times--

00:51:49.220 --> 00:51:53.470
in more reasonable cases, it's
only 10 or 20 times, it's a

00:51:53.470 --> 00:51:56.970
very finite number of times--

00:51:56.970 --> 00:51:59.700
you'll be done.
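
A minimal Python sketch of that peeling schedule (my own construction, assuming a list-of-checks representation of the graph); each edge is visited exactly once, which is exactly where the linear complexity comes from:

    def peel(checks, received):
        # checks: list of checks, each a list of variable indices.
        # received: list of 0, 1, or None (None = erased).
        values = list(received)
        unknown = [set(v for v in c if values[v] is None) for c in checks]
        parity = [sum(values[v] for v in c if values[v] is not None) % 2
                  for c in checks]
        var_to_checks = {}
        for j, c in enumerate(checks):
            for v in c:
                var_to_checks.setdefault(v, []).append(j)
        stack = [j for j in range(len(checks)) if len(unknown[j]) == 1]
        while stack:
            j = stack.pop()
            if len(unknown[j]) != 1:
                continue                  # resolved meanwhile via another check
            v = unknown[j].pop()
            values[v] = parity[j]         # the unique value satisfying check j
            for j2 in var_to_checks[v]:   # each edge of v is visited only once
                if v in unknown[j2]:
                    unknown[j2].discard(v)
                    parity[j2] = (parity[j2] + values[v]) % 2
                    if len(unknown[j2]) == 1:
                        stack.append(j2)
        return values                     # leftover None's mean decoding stalled

    # Checks x0+x1+x2 = 0 and x1+x2+x3 = 0, with x1 and x3 erased:
    print(peel([[0, 1, 2], [1, 2, 3]], [0, None, 1, None]))   # [0, 1, 1, 0]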

00:51:59.700 --> 00:52:05.390
Another qualitative aspect of
this that you already see in

00:52:05.390 --> 00:52:08.150
the regular code case--

00:52:08.150 --> 00:52:10.760
in fact, you see it very
nicely there-- is that

00:52:10.760 --> 00:52:16.450
typically, very typically, you
have an initial period here

00:52:16.450 --> 00:52:18.680
where you make a rapid progress
because the curves

00:52:18.680 --> 00:52:22.740
are pretty far apart, then you
have some narrow little tunnel

00:52:22.740 --> 00:52:25.540
that you have to get through,
and then the

00:52:25.540 --> 00:52:27.010
curves widen up again.

00:52:27.010 --> 00:52:28.480
I've exaggerated it here.

00:52:32.860 --> 00:52:35.730
So OK, you're making great
progress, you're filling in,

00:52:35.730 --> 00:52:39.400
lots of edges become known, and
then for a while it seems

00:52:39.400 --> 00:52:43.020
like you're making no progress
at all, making very tiny

00:52:43.020 --> 00:52:46.630
progress on each iteration.

00:52:46.630 --> 00:52:50.730
But then, you get through
this tunnel, and boom!

00:52:50.730 --> 00:52:53.330
Things go very fast.

00:52:53.330 --> 00:52:55.920
And for this code,
it has a zero--

00:52:55.920 --> 00:52:59.670
the regular code has a zero
slope at this point, whereas

00:52:59.670 --> 00:53:03.540
this has a non-zero slope.

00:53:03.540 --> 00:53:05.600
So these things will go boom,
boom, boom, boom, boom as you

00:53:05.600 --> 00:53:08.200
go in there.

00:53:08.200 --> 00:53:11.660
So these guys at Digital
Fountain, they called their

00:53:11.660 --> 00:53:13.990
second class of codes,
[UNINTELLIGIBLE], tornado

00:53:13.990 --> 00:53:15.930
codes, because they
had this effect.

00:53:15.930 --> 00:53:18.370
You have to struggle and
struggle, but then when you

00:53:18.370 --> 00:53:22.440
finally get it, there's a
tornado, a blizzard, of known

00:53:22.440 --> 00:53:24.661
edges, and all of a sudden, all
the edges become known.

00:53:27.960 --> 00:53:31.760
Oh by the way, this could
be done for packets.

00:53:31.760 --> 00:53:32.870
There's nothing--

00:53:32.870 --> 00:53:36.380
you know, this is a repetition
for a packet, and this is a

00:53:36.380 --> 00:53:38.800
bit-wise parity check
for a packet.

00:53:38.800 --> 00:53:42.200
So the same diagram works
perfectly well for packet

00:53:42.200 --> 00:53:43.400
transmission.

00:53:43.400 --> 00:53:44.390
That's the way they use it.

00:53:44.390 --> 00:53:44.858
Yeah?

00:53:44.858 --> 00:53:46.108
AUDIENCE: [UNINTELLIGIBLE]

00:53:49.070 --> 00:53:49.760
PROFESSOR: Yeah.

00:53:49.760 --> 00:53:50.060
Right.

00:53:50.060 --> 00:53:53.670
So this chart makes
it very clear.

00:53:53.670 --> 00:53:55.740
If you're going to get this
tornado effect, it's because

00:53:55.740 --> 00:53:57.490
you have some gap in here.

00:53:57.490 --> 00:53:59.480
The bigger the gap, the further
away you are from

00:53:59.480 --> 00:54:01.630
capacity, quite quantitatively.

00:54:06.886 --> 00:54:08.782
So I just--

00:54:08.782 --> 00:54:11.380
this is the first year I've been
able to get this far in

00:54:11.380 --> 00:54:13.150
the course, and I think
this is very much

00:54:13.150 --> 00:54:17.530
worth presenting because--

00:54:17.530 --> 00:54:18.650
look at what's happened here.

00:54:18.650 --> 00:54:23.230
At least for one channel, after
50 years of work in

00:54:23.230 --> 00:54:29.040
trying to get to Shannon's
channel capacity, around 1995

00:54:29.040 --> 00:54:32.960
or so, people finally figured
out a way of constructing a

00:54:32.960 --> 00:54:35.990
code and a decoding algorithm
that in fact has linear

00:54:35.990 --> 00:54:40.630
complexity, and can get as close
to channel capacity as

00:54:40.630 --> 00:54:42.890
you like in a very
feasible way, at

00:54:42.890 --> 00:54:46.300
least for this channel.

00:54:46.300 --> 00:54:50.160
So that's really where we want
to end the story in this

00:54:50.160 --> 00:54:52.480
class, because the whole class
has been about getting to

00:54:52.480 --> 00:54:53.290
channel capacity.

00:54:53.290 --> 00:54:56.350
Well, what about
other channels?

00:54:56.350 --> 00:55:00.230
What about channels
with errors here?

00:55:00.230 --> 00:55:13.850
So let's go to the symmetric
input binary

00:55:13.850 --> 00:55:20.700
channel, which I--

00:55:23.490 --> 00:55:27.760
symmetric, sorry-- symmetric
binary input channel.

00:55:27.760 --> 00:55:30.160
This is not standardized.

00:55:30.160 --> 00:55:35.810
The problem is, what you really
want to say is the

00:55:35.810 --> 00:55:38.620
binary symmetric channel, except
that term is already

00:55:38.620 --> 00:55:41.940
taken, so you've got to
say something else.

00:55:41.940 --> 00:55:44.510
I say symmetric binary
input channel.

00:55:44.510 --> 00:55:45.775
You'll see other things
in the literature.

00:55:48.610 --> 00:55:54.840
This channel has 2 inputs: 0
and 1, and it has as many

00:55:54.840 --> 00:55:55.950
outputs as you like.

00:55:55.950 --> 00:55:59.290
It might have an
erasure output.

00:55:59.290 --> 00:56:02.120
And the key thing about the
erasure output is that the

00:56:02.120 --> 00:56:04.730
probability of getting there
from either 0 or 1 is the

00:56:04.730 --> 00:56:07.790
same, call it p again.

00:56:07.790 --> 00:56:11.610
And so the a posteriori
probability, let's write the

00:56:11.610 --> 00:56:14.830
APPs by each of these.

00:56:14.830 --> 00:56:16.950
The erasure output is always
going to be a state of

00:56:16.950 --> 00:56:19.510
complete ignorance,
you don't know.

00:56:19.510 --> 00:56:22.010
So there might be one output
like that, and then there will

00:56:22.010 --> 00:56:28.150
be other outputs here
that occur in pairs.

00:56:28.150 --> 00:56:31.650
And the pairs are always going
to have the character that

00:56:31.650 --> 00:56:35.730
their APP is going
to be 1 minus--

00:56:35.730 --> 00:56:37.280
I've used p excessively here.

00:56:37.280 --> 00:56:41.080
Let me take it off of here
and use it here--

00:56:41.080 --> 00:56:44.330
for a typical other pair, you're
going to have (1 minus p,

00:56:44.330 --> 00:56:48.570
p), or (p, 1 minus p).

00:56:48.570 --> 00:56:50.680
In other words, just looking
at these 2 outputs, it's a

00:56:50.680 --> 00:56:53.380
binary symmetric channel.

00:56:53.380 --> 00:56:56.080
A crossover probability
of p, and 1

00:56:56.080 --> 00:56:59.890
minus p of being correct.

00:56:59.890 --> 00:57:03.140
And we may have pairs that are
pretty unreliable where p is

00:57:03.140 --> 00:57:05.580
close to 1/2, and we
may have pairs that

00:57:05.580 --> 00:57:07.460
are extremely reliable.

00:57:07.460 --> 00:57:14.110
So this pair is (1 minus p prime,
p prime), where p prime might be

00:57:14.110 --> 00:57:17.550
very close to 0.

00:57:17.550 --> 00:57:20.820
But the point is, the outputs
always occur in these pairs.

00:57:20.820 --> 00:57:25.750
The output space can be
partitioned into pairs such

00:57:25.750 --> 00:57:28.780
that, for each pair, you have a
binary symmetric channel, or

00:57:28.780 --> 00:57:32.580
you might have this singleton,
which is an erasure.
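
A toy instance of the definition (the numbers are invented for illustration): the output alphabet partitions into one BSC pair plus a self-paired erasure.

    # Transition probabilities P[x][y] of a toy symmetric binary input channel.
    P = {
        0: {'a': 0.60, 'b': 0.10, 'e': 0.30},
        1: {'a': 0.10, 'b': 0.60, 'e': 0.30},
    }
    # The pairing that exhibits the symmetry: swapping inputs 0 <-> 1 and
    # outputs a <-> b (with the erasure 'e' fixed) leaves the channel unchanged.
    pair = {'a': 'b', 'b': 'a', 'e': 'e'}
    assert all(P[0][y] == P[1][pair[y]] for y in pair)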

00:57:32.580 --> 00:57:37.030
And this is, of course, what we
have for the binary input

00:57:37.030 --> 00:57:39.170
additive white Gaussian
noise channel.

00:57:39.170 --> 00:57:45.440
We have 2 inputs, and now we
have an output which is the

00:57:45.440 --> 00:57:49.760
complete real line, which has
a distribution like this.

00:57:49.760 --> 00:57:53.555
But in this case, 0
is the erasure.

00:57:53.555 --> 00:57:57.080
If we get a 0, then the
APP message

00:57:57.080 --> 00:57:59.140
is (1/2,1/2).

00:57:59.140 --> 00:58:02.620
And the pairs are
plus or minus y.

00:58:02.620 --> 00:58:11.090
If we get to see y, then the
probability of y given 0, or

00:58:11.090 --> 00:58:14.920
given one, that's the same pair
as the probability of

00:58:14.920 --> 00:58:16.910
minus y given--

00:58:16.910 --> 00:58:20.740
this is, of course, minus 1,
plus 1 for my 2 possible

00:58:20.740 --> 00:58:22.950
transmissions here.

00:58:22.950 --> 00:58:25.800
Point is, the binary input additive
white Gaussian noise channel

00:58:25.800 --> 00:58:27.120
is in this class.

00:58:27.120 --> 00:58:29.517
It has a continuous output
rather than a discrete output.
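
A minimal sketch of the APP pair for this case (assuming inputs plus and minus 1 and noise variance sigma squared, so the LLR is 2y over sigma squared):

    import math

    def app_pair(y, sigma):
        # LLR for inputs +1 / -1 with noise variance sigma^2.
        llr = 2.0 * y / sigma**2
        p_plus = 1.0 / (1.0 + math.exp(-llr))
        return (p_plus, 1.0 - p_plus)

    print(app_pair(0.0, 1.0))    # (0.5, 0.5): y = 0 acts as an erasure
    print(app_pair(1.3, 1.0))    # strongly favors +1
    print(app_pair(-1.3, 1.0))   # the paired output -y: components swapped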

00:58:32.200 --> 00:58:34.630
But there's a key symmetry
property here.

00:58:34.630 --> 00:58:39.720
Basically, if you exchange
0 for 1, nothing changes.

00:58:39.720 --> 00:58:42.070
All right, so the symmetry
between 0 and 1.

00:58:42.070 --> 00:58:44.830
That's why it's called
a symmetric channel.

00:58:44.830 --> 00:58:49.390
That means you can easily prove
that the capacity-

00:58:49.390 --> 00:58:53.630
achieving input distribution is
always (1/2,1/2), for any

00:58:53.630 --> 00:58:54.450
such channel.

00:58:54.450 --> 00:58:56.880
If you've taken information
theory, you've seen this

00:58:56.880 --> 00:58:59.050
demonstrated.

00:58:59.050 --> 00:59:04.370
And this has the important
implication that you can use

00:59:04.370 --> 00:59:08.320
linear codes on any symmetric
binary input channel without

00:59:08.320 --> 00:59:09.865
loss of channel capacity.

00:59:13.200 --> 00:59:15.020
Linear codes achieve capacity.

00:59:19.450 --> 00:59:22.640
OK, whereas, of course, if this
weren't (1/2,1/2), then

00:59:22.640 --> 00:59:24.480
linear codes couldn't possibly
achieve capacity.

00:59:32.190 --> 00:59:33.960
Suppose you have
such a channel.

00:59:33.960 --> 00:59:38.450
What are the sum product
updates?

00:59:38.450 --> 00:59:43.860
The sum product updates become
more complicated.

00:59:43.860 --> 00:59:46.400
They're really not hard
for the equality sign.

00:59:46.400 --> 00:59:50.560
You remember for a repetition
node, the sum product update

00:59:50.560 --> 00:59:54.610
is just the product of basically
the APPs coming in

00:59:54.610 --> 00:59:56.670
or the APPs going out.

00:59:56.670 --> 00:59:58.920
So all we've got to do
is take the product.

00:59:58.920 --> 01:00:02.410
It'll turn out the messages in
this case are always of the

01:00:02.410 --> 01:00:06.500
form (p, 1 minus p)--

01:00:06.500 --> 01:00:09.420
of course, because they're
binary, and so

01:00:09.420 --> 01:00:11.040
has to be like this--

01:00:11.040 --> 01:00:13.830
so we really just need
a single parameter p.

01:00:13.830 --> 01:00:18.120
We multiply all the p's and the
1 minus p's, normalize correctly, and

01:00:18.120 --> 01:00:19.660
that'll be the output.
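
A minimal sketch of this repetition-node update, parameterizing each message by p = Prob(bit = 1) (the function name is mine):

    def repetition_update(ps):
        # Componentwise product of the incoming pairs (1-p_i, p_i), then
        # normalize so the output is again of the form (1-p, p).
        prod1, prod0 = 1.0, 1.0
        for p in ps:
            prod1 *= p
            prod0 *= 1.0 - p
        return prod1 / (prod0 + prod1)

    # Two messages that both lean toward 1 reinforce each other:
    print(repetition_update([0.9, 0.8]))   # about 0.973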

01:00:22.410 --> 01:00:26.270
For the update here, I'm sorry
I don't have time to talk

01:00:26.270 --> 01:00:32.990
about it in class, but there's
a clever little procedure

01:00:32.990 --> 01:00:35.790
which basically says take
the Hadamard Transform

01:00:35.790 --> 01:00:37.330
of (p, 1 minus p).

01:00:37.330 --> 01:00:40.750
The Hadamard Transform in
general says, convert this to

01:00:40.750 --> 01:00:44.160
the pair of a plus
b, a minus b.

01:00:44.160 --> 01:00:50.110
So in this case, we convert it
to a plus b is always 1, and a

01:00:50.110 --> 01:00:56.990
minus b is, in this
case, 2p minus 1.

01:00:56.990 --> 01:00:59.440
It works out better; it turns
out this is actually

01:00:59.440 --> 01:01:02.400
a likelihood ratio.

01:01:02.400 --> 01:01:04.970
Take the Hadamard Transform,
then you can use the same

01:01:04.970 --> 01:01:07.250
product update rule as
you used up here.

01:01:10.220 --> 01:01:19.860
So do the repetition node
updates, which are easy--

01:01:19.860 --> 01:01:23.980
so it says just multiply all the
inputs component-wise in

01:01:23.980 --> 01:01:27.880
this vector, and then take the
Hadamard Transform again to

01:01:27.880 --> 01:01:33.400
get your time domain
or primal domain,

01:01:33.400 --> 01:01:34.820
rather than dual domain.

01:01:34.820 --> 01:01:36.920
So you work in the dual
domain, rather

01:01:36.920 --> 01:01:38.560
than the primal domain.

01:01:38.560 --> 01:01:40.630
Again, I'm sorry.

01:01:40.630 --> 01:01:42.390
You got a homework problem on
it, after you've done the

01:01:42.390 --> 01:01:45.590
homework problem, you'll
understand this.

01:01:45.590 --> 01:01:51.410
And this turns out to
involve hyperbolic

01:01:51.410 --> 01:01:53.860
tangents to do these.

01:01:53.860 --> 01:01:57.770
These Hadamard Transforms turn
out to be taking hyperbolic

01:01:57.770 --> 01:02:01.010
tangents, and this is called the
hyperbolic tangent rule,

01:02:01.010 --> 01:02:02.630
the tanh rule.

01:02:02.630 --> 01:02:05.790
So there's a simple way to do
updates in general for any of

01:02:05.790 --> 01:02:07.040
these channels.
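
And a minimal sketch of the check-node update in the dual domain (my pairing convention is (1 - p, p), so its Hadamard transform is (1, 1 - 2p)):

    def check_update(ps):
        # Dual (Hadamard) domain: the pair (1-p, p) transforms to (1, 1-2p).
        t = 1.0
        for p in ps:
            t *= 1.0 - 2.0 * p      # multiply the transformed components
        # Since 1 - 2p = tanh(LLR / 2), this is exactly the tanh rule.
        return (1.0 - t) / 2.0      # transform back to the p parameterization

    # Two incoming messages, each wrong with probability 0.1: the parity is
    # wrong with probability 2(0.1)(0.9) = 0.18.
    print(check_update([0.1, 0.1]))   # 0.18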

01:02:09.740 --> 01:02:13.400
Now, you can do the
same kind of

01:02:13.400 --> 01:02:17.970
analysis, but what's different?

01:02:17.970 --> 01:02:22.280
For the erasure channel, we only
had 2 types of messages,

01:02:22.280 --> 01:02:25.620
known or erased, and all we
really had to do is keep track

01:02:25.620 --> 01:02:28.980
of what's the probability of the
erasure type of message,

01:02:28.980 --> 01:02:32.690
or 1 minus this probability,
it doesn't matter.

01:02:32.690 --> 01:02:34.670
So that's why I said it
was one-dimensional.

01:02:37.430 --> 01:02:45.310
For the symmetric binary input
channel, in general, you can

01:02:45.310 --> 01:02:47.880
have any APP vector here.

01:02:47.880 --> 01:02:49.840
This is a single parameter
vector.

01:02:49.840 --> 01:02:54.780
It's parameterized by p, or by
the likelihood ratio, or by

01:02:54.780 --> 01:02:56.690
the log likelihood ratio.

01:02:56.690 --> 01:02:58.340
There are various ways
to parameterize it.

01:02:58.340 --> 01:03:01.750
But in any case, a single number
tells you what the APP

01:03:01.750 --> 01:03:04.220
message is.

01:03:04.220 --> 01:03:08.676
And so at this point-- or I
guess, better looking at it in

01:03:08.676 --> 01:03:10.310
the computation tree--

01:03:10.310 --> 01:03:12.790
at each point, instead of having
a single number, we

01:03:12.790 --> 01:03:16.600
have a probability distribution
on p.

01:03:16.600 --> 01:03:22.840
So we get some probability
distribution on p, pp of p,

01:03:22.840 --> 01:03:27.640
that characterizes where
you are at this time.

01:03:27.640 --> 01:03:31.490
Coming off the channel,
initially, the probability

01:03:31.490 --> 01:03:35.690
distribution on p might be equal
to y, I think it is,

01:03:35.690 --> 01:03:39.370
actually, or e to the minus y,
and you get some probability

01:03:39.370 --> 01:03:41.700
distribution on what p is.

01:03:45.070 --> 01:03:47.720
By the way, again because of
symmetry, you can always

01:03:47.720 --> 01:03:51.760
assume that the all-zero vector
was sent in your code.

01:03:51.760 --> 01:03:55.770
It doesn't matter which of your
code words is sent, since

01:03:55.770 --> 01:03:57.430
everything is symmetrical.

01:03:57.430 --> 01:04:00.120
So you can do all your analysis
assuming the all-zero

01:04:00.120 --> 01:04:01.620
code word was sent.

01:04:01.620 --> 01:04:04.060
This simplifies things
a lot, too.

01:04:04.060 --> 01:04:07.840
p then becomes the probability
which--

01:04:07.840 --> 01:04:10.270
well, I guess I've
got it backwards.

01:04:10.270 --> 01:04:15.870
Should be 1 minus pp, because
p then becomes the

01:04:15.870 --> 01:04:18.398
probability.

01:04:18.398 --> 01:04:22.140
If the assumed probability of
the input is a 1, in other

01:04:22.140 --> 01:04:24.640
words, the probability
that your current

01:04:24.640 --> 01:04:27.470
guess would be wrong--

01:04:27.470 --> 01:04:29.000
I'm not saying that well.

01:04:29.000 --> 01:04:31.640
Anyway, you get some
distribution of p.

01:04:31.640 --> 01:04:34.540
Let me just draw it like that.

01:04:34.540 --> 01:04:37.120
So here's pp of p.

01:04:37.120 --> 01:04:40.210
There's the probability
distribution.

01:04:40.210 --> 01:04:42.560
And again, we'll draw it
going from 1 to 0.

01:04:45.860 --> 01:04:47.180
So that doesn't go out here.

01:04:50.550 --> 01:04:56.700
OK, with more effort, you can
again see what the effect of

01:04:56.700 --> 01:04:58.480
the update rule is
going to be.

01:04:58.480 --> 01:05:02.400
For each iteration, you have a
certain input distribution on

01:05:02.400 --> 01:05:03.490
all these lines.

01:05:03.490 --> 01:05:06.190
Again, under the independence
assumption, you get

01:05:06.190 --> 01:05:08.580
independently--

01:05:08.580 --> 01:05:12.640
you get a distribution for the
APP parameter p on each of

01:05:12.640 --> 01:05:13.980
these lines.

01:05:13.980 --> 01:05:14.780
That leads--

01:05:14.780 --> 01:05:17.380
you can then calculate what the
distribution-- or simulate

01:05:17.380 --> 01:05:21.380
what it is on the output line,
just by seeing what's the

01:05:21.380 --> 01:05:24.480
effect of applying the
sum product rule.

01:05:24.480 --> 01:05:27.300
It's a much more elaborate
calculation, but you can do

01:05:27.300 --> 01:05:30.870
it, or you can do it up to
some degree of precision.

01:05:30.870 --> 01:05:35.950
This you can't do exactly, but
you can do it to 14 bits

01:05:35.950 --> 01:05:38.800
of precision if you like.

01:05:38.800 --> 01:05:44.290
And so again, you can work
through something that amounts

01:05:44.290 --> 01:05:51.580
to plotting the progress of the
iteration through here, up

01:05:51.580 --> 01:05:54.860
to any degree of precision
you want.

01:05:54.860 --> 01:05:59.980
So again, you can determine
whether it succeeds or fails,

01:05:59.980 --> 01:06:02.900
again, for regular or irregular
low-density parity

01:06:02.900 --> 01:06:05.590
check codes.
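
A rough Monte Carlo version of this (my construction; Chung's thesis used quantized density evolution, which is deterministic and far more precise), for a regular (dv, dc) code on the binary input additive white Gaussian noise channel, assuming the all-zero codeword and LLR messages:

    import numpy as np

    def de_monte_carlo(dv, dc, sigma, iters=30, n=100000, seed=0):
        rng = np.random.default_rng(seed)
        # LLRs off the channel, assuming the all-zero (all +1) codeword:
        # mean 2/sigma^2, standard deviation 2/sigma.
        chan = rng.normal(2.0 / sigma**2, 2.0 / sigma, size=n)
        msg = chan.copy()
        for _ in range(iters):
            # Check node: tanh rule over dc - 1 sampled incoming messages.
            inc = rng.choice(msg, size=(n, dc - 1))
            t = np.prod(np.tanh(inc / 2.0), axis=1)
            chk = 2.0 * np.arctanh(np.clip(t, -1 + 1e-12, 1 - 1e-12))
            # Variable node: channel LLR plus dv - 1 sampled check messages.
            msg = chan + rng.choice(chk, size=(n, dv - 1)).sum(axis=1)
        return float(np.mean(msg < 0))    # fraction of wrong-sign messages

    # Regular (3,6), rate 1/2: the threshold is near sigma = 0.88 (about
    # 1.1 dB Eb/N0), so sigma = 0.8 is comfortably decodable:
    print(de_monte_carlo(3, 6, sigma=0.8))   # tends to 0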

01:06:05.590 --> 01:06:07.340
In general, it's better
to make it irregular.

01:06:07.340 --> 01:06:11.580
You could make it as irregular
as you like.

01:06:11.580 --> 01:06:14.380
And so you can see that this
could involve a lot of

01:06:14.380 --> 01:06:21.000
computer time to optimize
everything, but at the end of

01:06:21.000 --> 01:06:27.080
the day, it's basically a
similar kind of hill climbing,

01:06:27.080 --> 01:06:31.950
curve fitting exercise, where
ultimately on any of these

01:06:31.950 --> 01:06:39.180
binary input symmetric channels,
you can get as close

01:06:39.180 --> 01:06:41.350
as you want to capacity.

01:06:41.350 --> 01:06:44.090
In the very first lecture, I
showed you what Sae-Young

01:06:44.090 --> 01:06:46.460
Chung achieved in his thesis.

01:06:46.460 --> 01:06:49.740
He took the binary input
additive white

01:06:49.740 --> 01:06:51.420
Gaussian noise channel.

01:06:51.420 --> 01:06:54.980
Under the assumption of

01:06:54.980 --> 01:06:56.990
asymptotically long random codes.

01:06:56.990 --> 01:07:02.290
He got within 0.0045 dB
of channel capacity.

01:07:02.290 --> 01:07:05.990
And then for a more reasonable
number, like a block length of

01:07:05.990 --> 01:07:10.970
10 to the seventh, he
got within 0.040

01:07:10.970 --> 01:07:13.500
dB of channel capacity.

01:07:13.500 --> 01:07:15.200
Now, that's still a
longer code than

01:07:15.200 --> 01:07:16.110
anybody's going to use.

01:07:16.110 --> 01:07:20.730
It's a little bit of a stunt,
but I think his work convinced

01:07:20.730 --> 01:07:25.070
everybody that we finally had
gotten to channel capacity.

01:07:25.070 --> 01:07:27.300
OK, the Eta Kappa Nu
person is here.

01:07:27.300 --> 01:07:30.910
Please help her out, and
we'll see you Monday.