PROFESSOR: We are still in chapter 12, on the sum-product algorithm. I hope to get through that today, fairly expeditiously, and to start on chapter 13. The handouts for today are chapter 13; as usual, problem set 9, into which problem 8.3 has been moved up; and the problem set 8 solutions for 8.1 and 8.2.

So I'm going to start off on a different tack in presenting the sum-product algorithm. Where is everybody? 1, 2, 3, 4, 5, 6, 7, 8, 9. Is there any competing event that I should know about? I know the traffic was bad and it's a rainy, late morning. I had a problem getting here myself. Well, this is improving.

The write-up of the sum-product algorithm in the notes is in terms of equations. And I've never been terribly satisfied with this write-up. It's concise, but we have to invent some new notation, which I don't consider to be completely transparent. I've never found it particularly easy to present in class, and it's been a little frustrating to me. So I'd like to try a different approach. I'd like in class just to go through a very explicit example, our favorite (8, 4, 4) code: what we're trying to achieve with the sum-product algorithm, and how the sum-product algorithm helps us to achieve it in an efficient way. So it's proof by example. It won't be very satisfying to those of you who like to see real proofs, but those you can get by going back and re-reading the notes. So I'm going to try doing it this way, and you will let me know by your looks of gladness or of frustration whether you like it or not.

So let's start at the beginning again. What are we trying to do? We're trying to compute the a posteriori probability (APP) of every possible value of every symbol, internal and external, that appears in a graphical realization of a code. And we will consider these to be APP vectors. In other words, if a symbol is eight-valued, there will be eight APP values in the vector.
And we're going to assume that we have a cycle-free graph realization, so that we can proceed by the cut-set idea to solve independent parts of the graph. At the very end, we'll relax that and see whether we can still do OK.

All right, one concept that seems very simple in theory, but that some people don't get until they really try to do it in practice, is that an APP vector is really a likelihood weight vector, normalized so that the sum of all the probabilities is equal to 1. For instance, suppose we send y = +1 or -1 through an additive white Gaussian noise channel with sigma squared equal to 1, or whatever, and call the received value r. Then the probability of r given y is 1 over the square root of 2 pi sigma squared, times e to the minus (y minus r) squared over 2 sigma squared. And so we can compute that for y equals 1 and y equals minus 1, and we'll get a vector of two things. The APP vector would then consist of this evaluated for y equals 1 and y equals minus 1. Let me just take the exponential part of it: e to the minus (r minus 1) squared over 2 sigma squared, and e to the minus (r plus 1) squared over 2 sigma squared. The APP vector is something proportional to these two terms, normalized -- and that's why I continue to use this "proportional to" symbol, which you may not have seen before. Or maybe you have. So the APP vector in this case would be proportional to -- well, actually, 1 over the square root of 2 pi sigma squared times each of these two things. But this would basically tell you how much weight to give to plus 1 and minus 1 as possible transmitted symbols.

And that's all that matters -- there's one extra degree of freedom in here. You can scale the whole vector by any scale factor you like, and it gives you the same information, because you can always normalize the two terms to sum to 1. So if this, for instance, was 0.12 and this was 0.03, that's equivalent to 0.8 and 0.2 -- two actual a posteriori probabilities that sum to 1.
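A minimal sketch of that intrinsic computation, assuming the BPSK mapping above and a known noise variance; the function and variable names here are illustrative, not from the notes:

    import math

    def intrinsic_app(r, sigma2=1.0):
        """Likelihood weights for y = +1 and y = -1 given received value r,
        normalized so they sum to 1.  Constant scale factors cancel, so the
        common 1/sqrt(2*pi*sigma2) factor is simply omitted."""
        w_plus = math.exp(-(r - 1.0) ** 2 / (2.0 * sigma2))
        w_minus = math.exp(-(r + 1.0) ** 2 / (2.0 * sigma2))
        total = w_plus + w_minus
        return (w_plus / total, w_minus / total)

    # For example, unnormalized weights 0.12 and 0.03 normalize to 0.8 and
    # 0.2, since 0.12/(0.12+0.03) = 0.8 and 0.03/(0.12+0.03) = 0.2.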
All you've got to do is keep the ratio right. This is four times as large as that, so what it really means is 0.8 and 0.2. And in implementations of the sum-product algorithm, we never really worry about constant scaling factors. As long as they appear in every term, we can just forget about scaling. I mean, we don't want things to get too large or too small, so we can apply an overall scaling factor. But as long as we've got the relative weights, then we have what we need to know.

Let's go to the (8, 4, 4) code, since we know that very well, and ask ourselves what it is we're trying to compute. We're given APP vectors for each of the eight received bits, let's say. We transmit y, we receive some r, perhaps over an additive white Gaussian noise channel. So we get a little two-term APP vector for each of these y's, telling us the relative probability that a 0 was sent or a 1 was sent. That's called the intrinsic information. So for each one we get -- let me call the two terms p0 and q0 for y0, p1 and q1 for y1, and so forth. So we have that information for each of the received symbols.

What's the likelihood of each of the code words, then, given these individual bitwise likelihoods? The likelihood of the all-zero code word, let's say, is proportional to p0 p1 p2 p3 p4 p5 p6 p7. Whereas the likelihood of (1, 1, 1, 1, 0, 0, 0, 0) is q0 q1 q2 q3 p4 p5 p6 p7, and so forth. Each one is a product of the eight corresponding terms. Yes?

AUDIENCE: You're assuming that the bits are independent.

PROFESSOR: Yes, it's a memoryless channel. So the noise in every symbol transmission is independent. And that's important, too.

AUDIENCE: So basically, once again, the second bit is independent of the first bit in transmission?

PROFESSOR: I'm sorry, I was writing and I missed what you said.
AUDIENCE: Well, I mean, if the first bit is 0, maybe it's a code where, if the first bit is 0, the second bit has to be 0.

PROFESSOR: Yeah, there are certainly dependencies among the code words. That's what we're trying to take into account, and this is how to take it into account. So I'm assuming a memoryless channel, but I'm, of course, assuming dependencies within the code.

How do I do that? One way is maximum likelihood decoding. What do we do when we do maximum likelihood decoding, in principle? We exhaustively compute each of these 16 products, each of these 16 likelihoods, for each of the 16 code words. We simply pick the one that's greatest, and that's the code word that is most likely to have been sent. So that's what exhaustive maximum likelihood decoding consists of: finding the maximum likelihood code word. And that clearly takes into account the dependencies. We're only considering valid code sequences when we do that.

In APP decoding we're doing something else, but it can be characterized quite simply. What's the a posteriori probability that, say, the first bit is equal to a 0 or is equal to a 1? We can get the APP by summing up the likelihoods of all the code words that are associated with the value of the bit that we want to see. So we take the eight code words that have a 0 in this place, and we sum up these likelihoods for those eight code words, and that would be the weight, the likelihood weight, of y0 being 0. And the likelihood weight of y0 being 1 would be the sum of the other eight. And again, you see that scale factors aren't going to matter here. All we want is the relative weights of these things.

So that's what we're trying to do in APP decoding, but we're trying to do it now for every single variable. At this point, I've only shown you the symbol variables, the external variables. So a brute-force way of doing that would be to fill out this table -- all 16 likelihoods -- do these sums, and then we'd have the APP vector for each of these.
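As a sketch of that brute-force computation -- my own illustration, not from the notes -- assuming the 16 code words are available as 8-tuples of bits, and the intrinsic weights are stored as pairs (p_i, q_i):

    def brute_force_app(codewords, weights):
        """Exhaustive APP decoding.
        codewords: list of 16 tuples of 8 bits each.
        weights: list of 8 pairs (p_i, q_i), the intrinsic weights for
                 bit i being 0 or 1.
        Returns a normalized APP pair for each bit position."""
        # Likelihood of each code word: product of the 8 matching weights.
        likes = []
        for cw in codewords:
            like = 1.0
            for j, bit in enumerate(cw):
                like *= weights[j][bit]   # p_j if the bit is 0, q_j if 1
            likes.append(like)
        # APP that bit i equals b: sum of likelihoods of the 8 code words
        # with a b in position i.
        apps = []
        for i in range(len(codewords[0])):
            bucket = [0.0, 0.0]
            for cw, like in zip(codewords, likes):
                bucket[cw[i]] += like
            total = bucket[0] + bucket[1]
            apps.append((bucket[0] / total, bucket[1] / total))
        return apps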
Now, we're looking for a more efficient way of doing this. The efficient way of doing this is going to be based on the fact that we have a cycle-free graph realization -- which, in this case, is a trellis realization. I'm going to use this two-section trellis, where each section is a four-tuple. We've drawn it two different ways. This is a very explicit way, showing the two parallel transitions that go from the initial node to the central state, and then two more down here. The set of code words corresponds to the set of all possible paths through this trellis. So, for instance, it includes the all-zero word, (0, 0, 0, 0, 1, 1, 1, 1), (1, 1, 1, 1, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1, 1, 1), and so forth. And these I happen to have listed as the first four here.

Or we now have this more abstract way of writing the same thing. This says there are four external variables and two state variables -- a vector space of dimension 2 here and a vector space of dimension 4 here -- and that the dimension of this constraint is 3. That means there are eight possible values for these combinations of 4 and 2. They're shown explicitly up here. And they form a linear vector space.

Either way, this is our trellis realization. Now, we know that to do maximum likelihood decoding, it's useful to have this trellis realization because, for instance, we can do Viterbi decoding. And that's a more efficient way of finding what the maximum likelihood sequence is. Basically, because we go through, and at this point we can make a decision between 0, 0, 0, 0 and 1, 1, 1, 1. We only have to do that based on this four-tuple, and then we can live with that decision regardless of what else we see in the other half. And likewise in the other direction.

So now we'd like to use this same structure to simplify the APP decoding calculation, which looks like a more complicated calculation. So how do we do that? We introduce two state bits here. All of these have state bits 0, 0. Sorry -- at this point, we've called this state 0, 1.
Just to keep it visually straight: 0, 1; 0, 1; 0, 1. And then there are eight more, corresponding to the other two possible values of the state bits over here.

This is a quaternary variable. We want to assume that it's a four-valued variable; to make the realization cycle-free, I don't want to consider it to be two independent bits. So there are four possible values for this state variable. So maybe I just ought to call this my state space -- which is a four-valued state space.

Now, what's the a posteriori probability of the state being, say, 0, 0? I compute that in the same way. I compute it simply by summing up the four likelihoods -- over the external variables -- of the code words that are associated with state value 0, 0, the first possible value of this quaternary variable. So it'd be the sum of these four things. Yes?

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: I've only listed half the code words. This is only 8 of the 16 code words. You want me to write the other 8?

And by the way, there's at least an implicit assumption here that the code sequence determines the state sequence. As we found when we did minimal trellis realizations, the code sequence always does determine the state sequence. So if I know the entire code word, that also tells me what the values of all the state variables are. There's a one-to-one correspondence between code words and trajectories in a minimal trellis. And I believe I'm sometimes implicitly using that assumption. But anyway, it holds whenever we have a cycle-free graph. In a cycle-free graph, we have a well-defined minimal realization, and the minimal realization does have this one-to-one property. So the state variables are determined by the symbol variables.

Suppose I want to compute the values of these state variables here. Let me write this out. This will be p0 p1 p2 p3 q4 q5 q6 q7. And then there's one that's all q's:
q0 q1 q2 q3 q4 q5 q6 q7. So basically, what I want to do is take the sum of these four things; that's just a sum of products, and I could just do it. But the Cartesian product lemma gives us a simpler way of doing it. We notice that I can have either of these two four-tuples as the first half, and either of these two four-tuples as the second half. In fact, I see over here, explicitly, that these four code words are the Cartesian product of two four-tuple codes -- the repetition code on 4. So I can use the Cartesian product lemma to write this as -- again, let me write it very explicitly -- p0 p1 p2 p3, which you recognize as the likelihood of this four-tuple, plus q0 q1 q2 q3, the likelihood of this four-tuple, times the quantity p4 p5 p6 p7 plus q4 q5 q6 q7.

Now, do you agree that the product of those two terms is equal to this, the sum of these four terms? Yes: there are four terms in this sum, and they're what we want. And it's because we have this Cartesian product structure, which we always have in a state space. This is the basic Markov property of a state variable. So that's a simpler calculation. The brute-force way requires four multiplications, each one a seven-fold multiplication. This way requires -- well, it requires four three-fold multiplications, plus one overall multiplication. And it's certainly a simpler way to compute that.

And I can similarly do that for each of these down here. What this leads me to is what I called last time the past-future decomposition. We have one APP vector over here, which corresponds to -- let's call it the APP vector of the state given the past received symbols. In other words, given what was received for y0, y1, y2, y3, what's the partial a posteriori probability for each of the four possible state values, given this past?
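Here is a minimal sketch of that past-message computation for the two-section trellis. The pairing of states to four-tuples below is an assumption of mine, consistent with the pairs read off the board for states 0, 0 and 0, 1 (and with the (6, 3) code listed later); the names are illustrative:

    # The two four-tuples in one trellis section that are consistent with
    # each value of the four-valued state variable (assumed labeling).
    STATE_HALVES = {
        (0, 0): [(0, 0, 0, 0), (1, 1, 1, 1)],
        (0, 1): [(0, 1, 0, 1), (1, 0, 1, 0)],
        (1, 0): [(0, 0, 1, 1), (1, 1, 0, 0)],
        (1, 1): [(0, 1, 1, 0), (1, 0, 0, 1)],
    }

    def half_message(weights4):
        """Message from one half of the trellis: for each state value, the
        sum over the four-tuples reaching it of the product of the four
        bit weights.  weights4 is a list of four (p_i, q_i) pairs."""
        msg = {}
        for state, halves in STATE_HALVES.items():
            msg[state] = sum(
                weights4[0][h[0]] * weights4[1][h[1]]
                * weights4[2][h[2]] * weights4[3][h[3]]
                for h in halves
            )
        return msg

    # The past message is half_message(weights[:4]); its (0, 0) entry is
    # p0*p1*p2*p3 + q0*q1*q2*q3, exactly the factor computed above.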
And similarly, we have another vector, which I will consider to be a message coming in this direction -- a message consisting of a vector of four values: what's the probability of the state given these four received symbols, computed in the same way? That's this here. And the past-future rule is that the overall probability for any of these internal state variables -- the overall APP vector -- is just obtained by a component-wise multiplication. This would be the APP of the state being equal to 0, 0 given the past; this would be the APP of the state being equal to 0, 0 given the future. We'd have the same thing for 0, 1; 1, 0; 1, 1. And to get the overall APP vector given all of r, the rule is just: multiply them component-wise. We take the likelihood weight for 0, 0 given the past and multiply it by the likelihood weight of 0, 0 given the future. And that gives us the total likelihood weight, up to a scale factor.

The notes derive this in equation terms. You see, it basically comes from this Cartesian product decomposition, which we always get for any internal state variable. The possible code words consistent with that state consist of a certain set of possible past code words consistent with that state, Cartesian product with a set of possible future code words consistent with that state. And as a result of that Cartesian product decomposition, we always get this APP vector decomposition. That's what the proof says in the notes.

Do you like this way of arguing? A couple of people are nodding, at least. It's an experiment.

All right, so that tells us what we're going to want to do, at least for the internal variables. And actually, the same is true up here. The messages going in are easy -- that's just p0, p1, p2, p3, or whatever. The ones coming out are -- the incoming one is called the intrinsic APP vector, the one that we get from direct observation. The extrinsic APP vector is the one that we get from all the rest of the inputs and symbols, the other parts of the graph. And we combine these component-wise to get the overall APP vector.
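Continuing the sketch above, the past-future combination rule for the state variable is just a component-wise product, normalized at the end. Again the names are mine, with half_message as defined before; using the same state pairing for the second half is an assumption consistent with the (6, 3) code listed below:

    def state_app(weights):
        """Overall APP vector for the four-valued state: component-wise
        product of the past message (from y0..y3) and the future message
        (from y4..y7), normalized to sum to 1.
        weights is the list of eight (p_i, q_i) intrinsic pairs."""
        past = half_message(weights[:4])
        future = half_message(weights[4:])   # same pairing for this code
        combined = {s: past[s] * future[s] for s in past}
        total = sum(combined.values())
        return {s: v / total for s, v in combined.items()}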
All right, but we now have to do this for every part of the graph. Let me go through another calculation. How do we, for instance, find the extrinsic APP vector for y7, given that we have a trellis like this? In general, for y7, we're going to be adding up all the code words with a 0 there -- these guys, and four more down here -- and we'll make that the extrinsic APP weight for 0. So we want to add up this and this and this and this, and so forth -- eight of those. We want to do that in an efficient way.

This defines a little constraint code with eight code words. Let me move this up now. So let's talk about this (6, 3) code. What does it look like? With state 0, 0, I can have (y4, y5, y6, y7) be 0, 0, 0, 0, or I can have it be 1, 1, 1, 1. With state 0, 1, I can have 0, 1, 0, 1, or I could have 1, 0, 1, 0. With state 1, 0, I can have 0, 0, 1, 1 or 1, 1, 0, 0. And with 1, 1, I can have 0, 1, 1, 0 or 1, 0, 0, 1. All right, so that's precisely the (6, 3) code I'm talking about.

Now, suppose I have the APP weight for each of these state values. For 0, 0, we've already calculated it: p0 p1 p2 p3 plus q0 q1 q2 q3. That's the likelihood weight for 0, 0 coming from the past. So that's one part of this message here. For 0, 1, I get p0 q1 p2 q3 plus q0 p1 q2 p3, reflecting the two possible ways I can get to this state value. So I'm going to get four terms that I've computed for the past message -- four likelihood weights, corresponding to each of the four values of this state variable. So I've got that vector. I already computed it as part of the computation to do the APP for here.
It's the past part of it. So now, I want to compute the likelihood weight that y7 is a 0. The rule here is to take every combination in this code and -- again, I'm going to get a -- well, let me write this part of it. This is going to be p4 p5 p6 p7; that's the weight of this four-tuple. This is q4 q5 q6 q7. This is p4 q5 p6 q7; q4 p5 q6 p7; and so forth. Those are the weights I want to assign to each of these. So now I get a weight for each of these six-tuples: I multiply this by this to get the likelihood weight for this code word, this by this to get the likelihood weight for this code word, and so forth. And I add these to buckets. I simply go through this, and I'll add this one to the bucket for y7 equals 0, and I'll add this to the bucket for y7 equals 1, and this to the bucket for y7 equals 1, and this to the bucket for y7 equals 0, and so forth.

It's clear that in this bucket I'm always going to get the same value for -- I'm always going to get a p7 for 0 and a q7 for 1. So, just looking at y7, I don't have to include the seventh term here. So if I treat y7 separately, as a little y7 variable, then the incoming message, the intrinsic information, would just be this (p7, q7) likelihood weight vector. And what I really want to compute is the outgoing part, which has to do with y4 through y6. So let me draw it that way. This, I don't think, is very good exposition -- bear with me. So now I'm going to put this times this in my bucket for 0, this times this in my bucket for 1, and so forth.

The question here is: does this method give me the same answer as my exhaustive method up here? And again, because of the Cartesian product character of the states, you can convince yourself that it does. In other words, I can use this as a summary of everything in the past that got to the state value 0, 0. Any time I got to state 0, 0, I got there through one of these two things.
And so, for any possible continuation over here, I could have got to it in one of these two ways. And I'm going to find terms -- e.g., these two terms. Let's see, what do I want? I want the two ways of getting to state 0, 0. I'm sorry, these two. I claim that this times these is equal to the sum of these two. And it is: I can get to the all-zero code word either through this or this. And I get to 1, 1, 1, 1 -- which is this one -- either through this or this. And again, it's a fundamental property of the state that whenever I get one of these, I'm going to get both of them in equal combination. So I can summarize things in this way, which leads to an efficiency of computation.

So what I'm basically explaining to you now is the sum-product update rule, which is how to propagate messages -- messages which are APP vectors. In general, my situation is going to be like this. I'm going to have some constraint code, an (n, k) code. I'm going to have some incoming messages over here, which tell me about everything that's happened in this past, p prime, and in this other past -- give me a summary of all the weights corresponding to that. I'm going to have some output symbol over here, and I want to compute now what the APP vector is for the possible values of this output symbol, which is going to be some aggregate of this down here. What I do is: I go through all 2 to the k code words -- call this the constraint code; sorry, I'm using k again; that's 2 to the k code words -- and compute the product of the input APP vectors, according to the possible combinations of the input variables and the output variables that are allowed by this constraint. That's explicitly what I'm doing here. Here are all the possible combinations of state variables and symbol variables that are allowed by this (6, 3) constraint code. There are eight of them. For each one, I'm going to compute the relevant product of APP values, and I'm going to add it to a bin.
Bins which correspond to -- draw it like that -- output variable values. OK, and that'll give me the summary APP vector of the outputs for the past, which consists of everything that could have happened on this side of the graph. Again, I'm using the cycle-free assumption in several ways. I'm saying that everything that happens up here is independent of everything that happens here. Everything that happens completely in this past is just the sum of what happens here. What happens here is subject to this constraint and has nothing to do with what happens over here in the future. Eventually, I'm going to do the same thing in the future. I'm going to compute a future component that's based on all this stuff. I'm going to get messages going in each direction, and I'm going to combine them.

OK, now I'm seeing a fair amount of puzzlement on people's faces. Do you get, abstractly, what I'm trying to do here? Does anyone want to ask a clarifying question?

So now --

AUDIENCE: [UNINTELLIGIBLE PHRASE] the bottom right, output variable?

PROFESSOR: Output variable values, I'm sorry. I think there's a little bit of confusion here, because sometimes I've considered this output to be a 16-valued output. Here, in fact -- well, it has eight actually valid values. And if I were considering it that way, then I would just have one bin for each of these likelihoods. But now, if I consider these as four binary variables, then I would aggregate these things according to the values of each of the binary variables. And I'd get four vectors of length 2 for the individual APPs, which is ultimately what I want here. So I'm sorry I've gone back and forth between subtly different realizations. I do find this very hard to explain -- perhaps there's someone else who can do a better job. You have a homework problem which asks you to do this, and I think, having done that, you'll then get the idea.
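Here is a minimal sketch of that update rule in general form -- my own framing, not the notation of the notes. The constraint code is a list of local code words over the node's variables, the incoming messages are weight vectors on every variable except the output one, and the outgoing message is accumulated in bins indexed by the output variable's value:

    def sum_product_update(constraint_code, in_msgs, out_index, out_size=2):
        """Sum-product update at one constraint node.
        constraint_code: list of local code words (tuples over the node's
                         variables, in a fixed order).
        in_msgs: dict mapping variable index -> incoming weight vector;
                 every variable except out_index must have an entry.
        out_index: which variable the outgoing message is for.
        Returns the outgoing (extrinsic) weight vector for that variable."""
        out = [0.0] * out_size
        for cw in constraint_code:
            prod = 1.0
            for i, value in enumerate(cw):
                if i != out_index:
                    prod *= in_msgs[i][value]
            out[cw[out_index]] += prod   # add to the bin for this value
        return out

As a usage sketch tying back to the example, assuming the (6, 3) constraint code with variables ordered (state, y4, y5, y6, y7), the state treated as one four-valued variable, and weights the list of eight intrinsic (p_i, q_i) pairs:

    C63 = [(0, 0, 0, 0, 0), (0, 1, 1, 1, 1), (1, 0, 1, 0, 1), (1, 1, 0, 1, 0),
           (2, 0, 0, 1, 1), (2, 1, 1, 0, 0), (3, 0, 1, 1, 0), (3, 1, 0, 0, 1)]

    pm = half_message(weights[:4])                    # past state message
    past = [pm[(0, 0)], pm[(0, 1)], pm[(1, 0)], pm[(1, 1)]]
    ext_y7 = sum_product_update(C63, {0: past, 1: weights[4],
                                      2: weights[5], 3: weights[6]}, 4)

which reproduces the two buckets for y7 = 0 and y7 = 1 computed on the board.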
It's a matter of going back and forth between the gross structure and the actual equations, and just working it out.

All right, so these are the key things in the sum-product algorithm; I've already explained them at some level. We have the sum-product update rule. That basically explains, given the messages coming into a node, how do we compute the message going out of the node? We take contributions -- products -- corresponding to what the code allows, we dump them into the appropriate bins in the output vector, and that's the sum-product update rule. We have the past-future decomposition, or combination rule. And that says that, at the end of the day, you're finally going to get messages going in both directions, and you just combine them component-wise, as we started out with. And finally, we need to talk about the overall schedule of the algorithm.

So let me draw an arbitrary cycle-free graph. OK, suppose I have an arbitrary cycle-free graph. It consists of external variables out here, internal variables, and constraint nodes. How am I going to schedule these computations in order to do APP decoding of this graph? Well, what do I have at the beginning? At the beginning, I measure the received variables corresponding to each symbol, and that gives me the intrinsic information, which is an incoming message, you can consider, from every symbol variable. Basically: what did I see on the channel? That's what that likelihood weight vector corresponds to.

All right. What do I need in order to do the sum-product update rule? I need the inputs. Think of this now as a directed graph. I need the incoming messages on all of the edges incident on this node, except for one. If I have all but one, I can compute the last one as an outgoing message. So that's the structure of that update. So at this point, can I compute anything here? No -- I would need the inputs. But I can compute this outgoing message. I can compute this outgoing message.
I can compute this outgoing message. I haven't drawn these with very high degree -- normally we have degree higher than two. If I had another input here, however, I could still get that output.

All right, so that's time one. Think of there being a clock: at time one, I can propagate messages into the graph to depth one, if you like. There's going to be a systematic way that we can assign depths. All right, now at time two, what can I compute? I can compute this guy, because I have this input. Now I have both inputs coming in here; I can compute this guy. Anything else? So maybe 0, 1, 2. I should put times on each of these, so I don't get confused: 0, 0, 1, 0. So this is 2 at the output.

Now I'm at time three. What can I compute? At time three, I now have two inputs coming in here; I can compute this output. Actually, from these two inputs, I can compute this output at time three. And from these two, I can compute this outgoing message at time three. Are there other things I can compute? Can I compute this? No, I don't have this yet. That's probably it. Does anyone see anything else I can compute at time three, based on the time zero, one, and two messages? No.

All right, but I'm making progress. And it should be clear that I'm always going to be able to make some progress. At time four, I can compute this message in that direction. Let's see, I now have everything I need to compute this message in that direction. And I'm almost done. And finally, I can compute this message in this direction at time four -- the extrinsic information for those variables. I guess I can do this at time four; I had that at time three. And now at time five, I can get everything else that I need. At time five I get that out. I get this out, based on this input and this input.
And I get this out based on this input and this input. So in a finite amount of time -- in this case, five computational cycles -- I can compute all of the messages in both directions for every variable in the graph. Now, as a final cleanup step, I simply component-wise multiply the vectors in the two directions, and that gives me the APPs for all the variables. This is a kind of parallel computation of all the APPs for all variables in the graph.

And I claim two things: it's finite and it's exact. The exactness is proved from the equations and the cycle-free assumption -- we can always decompose things as Cartesian products, independent components. The finiteness -- well, I assumed this is a finite graph. What is this number 5? It's really the maximum length of time it takes me to get from any of these external symbol variables to any other -- from leaf to leaf, in graph theory terms. And that's called the diameter of the graph. And again, some people count the last little symbol node and some people don't. But it's clear that a finite graph is going to have a finite diameter, and that the propagation time is basically going to be equal to the diameter. That's how long it takes to get across the graph. So each of these messages in each direction has a certain depth -- that's the distance in the graph to the most distant leaf node -- and the maximum sum of these is always going to be 5 in this case. Notice that it is. A tiny little graph theory theorem.
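A sketch of that schedule in code, in my own formulation: keep sweeping over all directed messages, and compute any message whose other inputs are already available. On a cycle-free graph, every message gets computed, in a number of sweeps on the order of the diameter; on a graph with cycles this loop would never terminate, which is why a different schedule is needed there.

    def cycle_free_schedule(nodes, edge_ends, intrinsic):
        """Tick at which each directed message (node, edge) can be computed.
        nodes: node -> list of incident edges.
        edge_ends: edge -> set of the (at most two) nodes it touches.
        intrinsic: set of symbol half-edges, whose inward messages are
                   available at time zero."""
        done = {}          # (node, edge) -> tick when that output is ready

        def ready(node, edge, tick):
            # An output on `edge` needs the incoming message on every other
            # incident edge: intrinsic half-edges count from time zero,
            # internal edges once the other endpoint has produced its
            # message at some strictly earlier tick.
            for e in nodes[node]:
                if e == edge or e in intrinsic:
                    continue
                (other,) = edge_ends[e] - {node}
                if done.get((other, e), tick) >= tick:
                    return False
            return True

        tick = 1
        total = sum(len(es) for es in nodes.values())
        while len(done) < total:
            for node, es in nodes.items():
                for e in es:
                    if (node, e) not in done and ready(node, e, tick):
                        done[(node, e)] = tick
            tick += 1
        return done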
AUDIENCE: [UNINTELLIGIBLE PHRASE] you see why, if you sum these messages flowing through the past and future of a single edge, you would get the overall APP, because you're summing the contribution from both the set of outputs and those [UNINTELLIGIBLE PHRASE]. How would you prove that that should be the overall APP of the symbols?

PROFESSOR: Recall the state space theorem. Let's take any edge in here, and suppose we think about cutting it -- a cut-set through that edge. Suppose we consider agglomerating everything back here and calling that the past, and agglomerating everything out here and calling that the future. This is past and future. The idea of the state space theorem was that this is exactly the same situation as a two-section trellis. Because everything is linear, you get the state space theorem, basically, which says that you get a Cartesian product of a certain past and a certain future. You get a minimal state space, which is kind of the total code modulo the part of the code that lives purely on the past and the part that lives purely on the future. That quotient space is really what it is; it corresponds to the state space at this time. Because every edge in a cycle-free graph is by itself a cut-set, you can do this for every edge in any cycle-free graph -- you can make this same argument. So you get a well-defined minimal state space. You get a Cartesian product decomposition, which has a number of terms equal to the size of the state space -- two to the dimension of the state space over the binary field. And so everything goes through just as it did in the trellis case. That was the argument.

So, mushing all this together, you get the sum-product algorithm. This is the fundamental reason why I took the time to go through minimal trellis realizations of block codes -- not because that's actually the way we decode block codes all by themselves, or even use block codes; we use convolutional codes. It's so that we could prove these very fundamental theorems about graph decoding. And we aren't even going to decode cycle-free graphs. But that's the intellectual line here. Thank you for that question; it was very well timed.

AUDIENCE: [UNINTELLIGIBLE PHRASE]

PROFESSOR: Let's take a look at the structure of this. Where do the computations occur? At every node, there's a computation. What's the complexity of that computation?
738 00:49:12,910 --> 00:49:17,100 It's somehow approximately equal to the size of the 739 00:49:17,100 --> 00:49:17,950 constraint code. 740 00:49:17,950 --> 00:49:21,220 We do one thing for every possible combination of 741 00:49:21,220 --> 00:49:24,410 legitimate combination of variables, which is what we 742 00:49:24,410 --> 00:49:26,710 mean by the size of the constraint code. 743 00:49:26,710 --> 00:49:30,780 So if this is a 6, 3 code, we need to do eight things here. 744 00:49:30,780 --> 00:49:35,070 Basically, eight multiplications. 745 00:49:35,070 --> 00:49:39,540 So this, in more general cycle-free graphs, that's what 746 00:49:39,540 --> 00:49:45,380 corresponds to branch complexity and trellis graphs. 747 00:49:45,380 --> 00:49:48,940 We found that the branch space corresponded to a constraint 748 00:49:48,940 --> 00:49:51,290 code in trellises. 749 00:49:51,290 --> 00:49:55,470 So the computation is basically determined by the 750 00:49:55,470 --> 00:49:58,970 constraint code complexity and really, overall what we'd like 751 00:49:58,970 --> 00:50:03,410 to do is minimize the constraint code, the maximum 752 00:50:03,410 --> 00:50:05,740 size of any constraint code because that's going to 753 00:50:05,740 --> 00:50:09,190 dominate computation. 754 00:50:09,190 --> 00:50:12,200 So we try to keep the k in these 755 00:50:12,200 --> 00:50:15,690 things as small as possible. 756 00:50:15,690 --> 00:50:16,070 Yes? 757 00:50:16,070 --> 00:50:17,320 AUDIENCE: [UNINTELLIGIBLE PHRASE] 758 00:50:21,650 --> 00:50:26,901 and you get larger dimension constraint for 759 00:50:26,901 --> 00:50:28,151 [UNINTELLIGIBLE PHRASE]. 760 00:50:31,691 --> 00:50:33,890 PROFESSOR: Yeah, we're actually not too concerned 761 00:50:33,890 --> 00:50:35,140 about diameter. 762 00:50:37,150 --> 00:50:39,750 Because things go up exponentially with constraint 763 00:50:39,750 --> 00:50:43,310 code dimension, just that single parameter tends to be 764 00:50:43,310 --> 00:50:46,230 the thing we want to focus on. 765 00:50:46,230 --> 00:50:50,160 But we remember that's hard to minimize in a trellis. 766 00:50:50,160 --> 00:50:55,240 We showed that basically our only trick there after 767 00:50:55,240 --> 00:50:58,990 ordering the coordinates was sectionalization and that we 768 00:50:58,990 --> 00:51:02,070 could not reduce the constraint code complexity, 769 00:51:02,070 --> 00:51:06,150 the branch complexity, by sectionalization. 770 00:51:06,150 --> 00:51:08,800 And then again, we build on that and we went through this 771 00:51:08,800 --> 00:51:13,460 argument that said, gee, this corresponds to some trellis. 772 00:51:13,460 --> 00:51:17,130 We could draw a trellis where this, literally was the past 773 00:51:17,130 --> 00:51:18,650 and this was the future. 774 00:51:18,650 --> 00:51:21,990 And the minimal size state space in this cycle-free graph 775 00:51:21,990 --> 00:51:24,165 would be the same as in that trellis. 776 00:51:26,710 --> 00:51:31,500 What I guess I didn't show is you can't reduce the 777 00:51:31,500 --> 00:51:37,320 constraint code complexity by significantly -- 778 00:51:37,320 --> 00:51:39,730 actually, I'm not sure there's a crisp theorem. 779 00:51:39,730 --> 00:51:44,620 But the fact that you now are constrained to size of state 780 00:51:44,620 --> 00:51:48,260 space, this is certainly going to be at least as large as 781 00:51:48,260 --> 00:51:49,610 state space sizes. 
782 00:51:49,610 --> 00:51:52,580 Tends to indicate that you're not going to be able to do a 783 00:51:52,580 --> 00:51:56,520 lot about constraint code complexity by going to more 784 00:51:56,520 --> 00:51:59,820 elaborate cycle-free graphs than just the chain trellis 785 00:51:59,820 --> 00:52:01,070 graph either. 786 00:52:03,610 --> 00:52:05,902 That's, I think, a missing theorem that 787 00:52:05,902 --> 00:52:08,530 would be nice to have. 788 00:52:08,530 --> 00:52:12,380 I had a paper about a year or two ago, I'm not sure it ever 789 00:52:12,380 --> 00:52:13,630 quite gets to that theorem. 790 00:52:13,630 --> 00:52:15,450 But that's the theorem you'd like to see. 791 00:52:15,450 --> 00:52:19,830 You really can't improve this. 792 00:52:19,830 --> 00:52:23,690 So ultimately, we're bound by what you can do in a trellis. 793 00:52:23,690 --> 00:52:27,520 And in a trellis you can't do too much about this branch 794 00:52:27,520 --> 00:52:30,690 complexity parameter once you've ordered the 795 00:52:30,690 --> 00:52:31,940 coordinates. 796 00:52:35,320 --> 00:52:37,810 A bit of hand-waving here, but we're going to need to go away 797 00:52:37,810 --> 00:52:39,060 from cycle-free graphs. 798 00:52:41,430 --> 00:52:44,720 One reason I like normal graphs is that you get this 799 00:52:44,720 --> 00:52:47,870 very nice separation of function. 800 00:52:47,870 --> 00:52:50,220 You get the idea that nodes 801 00:52:50,220 --> 00:52:52,380 correspond to little computers. 802 00:52:52,380 --> 00:52:55,650 You could actually realize this with little computers. 803 00:52:55,650 --> 00:52:57,410 Put a computer in here for each node. 804 00:52:57,410 --> 00:53:02,410 Its job is to do the sum-product update equation. 805 00:53:02,410 --> 00:53:05,880 And then, well, what are the edges? 806 00:53:05,880 --> 00:53:08,610 They're for communication. 807 00:53:08,610 --> 00:53:13,450 These are the wires that run around on your chip. 808 00:53:13,450 --> 00:53:16,280 And so when we talk about state space complexity, we're 809 00:53:16,280 --> 00:53:18,310 really talking about something like bandwidth. 810 00:53:18,310 --> 00:53:21,170 How wide do you have to make these wires? 811 00:53:21,170 --> 00:53:24,410 Do you have 6 bits, or 9 bits, or whatever? 812 00:53:24,410 --> 00:53:26,990 So state space really has to do with communication 813 00:53:26,990 --> 00:53:28,440 complexity. 814 00:53:28,440 --> 00:53:31,840 Constraint or branch has to do with the computational 815 00:53:31,840 --> 00:53:36,910 complexity, which sometimes one's more 816 00:53:36,910 --> 00:53:37,690 important than the other. 817 00:53:37,690 --> 00:53:40,100 Computational tends to be the thing we focus on. 818 00:53:40,100 --> 00:53:43,150 But we know more and more as we get to more complicated 819 00:53:43,150 --> 00:53:45,890 chips, communications complexity 820 00:53:45,890 --> 00:53:50,410 tends to become dominant. 821 00:53:50,410 --> 00:53:55,780 And this, of course, is the I/O. These are your paths 822 00:53:55,780 --> 00:53:59,230 which go to the outside world, your I/O paths. 823 00:53:59,230 --> 00:54:01,650 So it's really a good way to think about this. 824 00:54:01,650 --> 00:54:04,190 And in fact, some people like Andy [UNINTELLIGIBLE] 825 00:54:04,190 --> 00:54:08,290 and his students have experimented with building 826 00:54:08,290 --> 00:54:10,095 analog sum-product decoders. 
827 00:54:13,650 --> 00:54:17,420 The general idea here is that you can compute an output as 828 00:54:17,420 --> 00:54:21,510 soon as you have the two corresponding inputs. 829 00:54:21,510 --> 00:54:25,910 Well, imagine another schedule where this thing just computes 830 00:54:25,910 --> 00:54:28,725 all the time based on whatever inputs it has. 831 00:54:31,680 --> 00:54:36,500 In a cycle-free graph, say you're in the interior. 832 00:54:36,500 --> 00:54:40,420 Initially, what you have on your inputs is garbage. 833 00:54:40,420 --> 00:54:42,930 But eventually, you're going to get the right things on all 834 00:54:42,930 --> 00:54:43,580 your inputs. 835 00:54:43,580 --> 00:54:45,430 And therefore, you're going to put the right things on all 836 00:54:45,430 --> 00:54:46,810 your outputs. 837 00:54:46,810 --> 00:54:51,050 So again, up to some propagation times, just 838 00:54:51,050 --> 00:54:55,260 putting in non-clocked little computers here, whose function 839 00:54:55,260 --> 00:54:58,720 is to generate the right outputs given the inputs in 840 00:54:58,720 --> 00:55:02,480 each direction, is eventually going to be right in a 841 00:55:02,480 --> 00:55:05,330 cycle-free graph. 842 00:55:05,330 --> 00:55:08,180 And in fact, there are very nice little analog 843 00:55:08,180 --> 00:55:12,300 implementations of this where it turns out the sum-product 844 00:55:12,300 --> 00:55:15,700 update rule is extremely amenable to transistor 845 00:55:15,700 --> 00:55:20,540 implementation nonlinear rule. 846 00:55:20,540 --> 00:55:26,590 So you just put these in, little computing nodes, and 847 00:55:26,590 --> 00:55:30,090 you put whatever you received on here as the received 848 00:55:30,090 --> 00:55:31,900 symbols, and you let it go. 849 00:55:31,900 --> 00:55:36,860 And it computes incredibly quickly for 850 00:55:36,860 --> 00:55:38,260 a cycle-free graph. 851 00:55:41,451 --> 00:55:47,160 Where we're going to go is we're going to say, well, gee, 852 00:55:47,160 --> 00:55:49,790 we got this nice little local rule that we can implement 853 00:55:49,790 --> 00:55:53,160 with local computers for the sum-product algorithm. 854 00:55:53,160 --> 00:55:57,160 Maybe it'll work if we do it on a graph with cycles. 855 00:55:57,160 --> 00:56:00,240 And on a graph with cycles this kind of parallel or 856 00:56:00,240 --> 00:56:03,590 flooding schedule, where every little computer is computing 857 00:56:03,590 --> 00:56:07,520 something at every instant of time, is probably the most 858 00:56:07,520 --> 00:56:09,140 popular schedule. 859 00:56:09,140 --> 00:56:12,366 And again, you build an analog implementation of that and it 860 00:56:12,366 --> 00:56:16,270 goes extremely fast and it converges to something. 861 00:56:16,270 --> 00:56:20,740 It just relaxes to something, there's a local equilibrium in 862 00:56:20,740 --> 00:56:23,080 all these constraint nodes. 863 00:56:23,080 --> 00:56:27,880 And therefore, some global APP vector that satisfies the 864 00:56:27,880 --> 00:56:29,130 whole thing. 865 00:56:34,640 --> 00:56:37,820 I think this is a nice attribute of this kind of 866 00:56:37,820 --> 00:56:40,670 graphical realization. 867 00:56:40,670 --> 00:56:45,720 All right, a particular example of how 868 00:56:45,720 --> 00:56:49,010 this schedule works. 869 00:56:49,010 --> 00:56:51,570 The cycle-free realization that we care most 870 00:56:51,570 --> 00:56:55,050 about is the trellis. 
871 00:56:55,050 --> 00:56:59,920 The BCJR algorithm is simply the implementation of the 872 00:56:59,920 --> 00:57:03,616 sum-product algorithm on a trellis. 873 00:57:03,616 --> 00:57:07,310 SP on a trellis. 874 00:57:07,310 --> 00:57:13,010 So generically, what does a trellis look like? 875 00:57:13,010 --> 00:57:18,610 I guess I should allow for some nontrivial code here. 876 00:57:18,610 --> 00:57:21,250 So all trellises have this same boring graph. 877 00:57:33,330 --> 00:57:35,750 What is this schedule? 878 00:57:35,750 --> 00:57:39,350 It's a cycle-free graph, so we could operate according to 879 00:57:39,350 --> 00:57:44,670 this nice controlled schedule, compute everything just once. 880 00:57:44,670 --> 00:57:46,040 And how does that operate? 881 00:57:46,040 --> 00:57:51,320 We get intrinsic information that's measured at each of 882 00:57:51,320 --> 00:57:53,290 these symbols here. 883 00:57:53,290 --> 00:57:56,400 What did we receive on the channel? 884 00:57:56,400 --> 00:58:02,280 We put that through a constraint code and we get 885 00:58:02,280 --> 00:58:04,450 here, let's give this a name. 886 00:58:04,450 --> 00:58:06,870 This is called intrinsic 0. 887 00:58:06,870 --> 00:58:13,440 Here is alpha 1, which is just a function of the intrinsic 888 00:58:13,440 --> 00:58:15,350 information at time zero. 889 00:58:15,350 --> 00:58:17,960 And we combine that with the intrinsic information at time 890 00:58:17,960 --> 00:58:22,460 one and we get alpha 2, and so forth. 891 00:58:22,460 --> 00:58:26,810 So we get a forward-going path down the trellis. 892 00:58:26,810 --> 00:58:31,170 If any of you knows what a Kalman filter or a smoother 893 00:58:31,170 --> 00:58:34,310 does, this is really exactly the same thing. 894 00:58:34,310 --> 00:58:39,690 This is finding the conditional APP probabilities 895 00:58:39,690 --> 00:58:42,600 of this state vector given everything in the past. 896 00:58:45,335 --> 00:58:47,760 In Kalman filtering or smoothing, it's done for 897 00:58:47,760 --> 00:58:48,830 Gaussian vectors. 898 00:58:48,830 --> 00:58:51,480 Here we're talking about discrete vectors. 899 00:58:51,480 --> 00:58:54,630 But in principle, it's doing exactly the same thing. 900 00:58:54,630 --> 00:58:57,880 It's computing conditional probabilities given everything 901 00:58:57,880 --> 00:58:59,810 observed in the past. 902 00:58:59,810 --> 00:59:02,830 And so the diameter here is just the length 903 00:59:02,830 --> 00:59:04,080 of the trellis basically. 904 00:59:06,580 --> 00:59:10,740 So this is i n minus 1, i n. 905 00:59:10,740 --> 00:59:14,040 And here we have alpha n. 906 00:59:18,210 --> 00:59:21,490 When we have alpha n finally in here, we can compute the 907 00:59:21,490 --> 00:59:27,330 extrinsic information based on alpha n as just extrinsic n. 908 00:59:27,330 --> 00:59:33,310 We combine these two and we're done for that guy. 909 00:59:33,310 --> 00:59:38,600 Meanwhile, we have to compute the backward-going information 910 00:59:38,600 --> 00:59:40,270 on each of these branches. 911 00:59:40,270 --> 00:59:44,370 So we compute first b n based on i n. 912 00:59:44,370 --> 00:59:47,910 Then we get b n minus 1 based on i n minus 1 913 00:59:47,910 --> 00:59:50,910 and b n, and so forth. 914 00:59:50,910 --> 00:59:56,650 We get a backward-going propagation of conditional 915 00:59:56,650 --> 00:59:57,330 information. 
916 00:59:57,330 --> 01:00:01,300 Each of these betas represents the a-posteriori probability 917 01:00:01,300 --> 01:00:04,640 vector given the future from it. 918 01:00:04,640 --> 01:00:07,480 And finally, when we get down to here we can compute 919 01:00:07,480 --> 01:00:12,850 extrinsic 0 from data 1. 920 01:00:12,850 --> 01:00:16,800 And by this time, when we have both alpha 1 and beta 2 coming 921 01:00:16,800 --> 01:00:21,100 into here, we can, of course, get extrinsic 1. 922 01:00:21,100 --> 01:00:23,850 When we get both the alpha and beta coming in here, we can 923 01:00:23,850 --> 01:00:26,380 get extrinsic 2, and so forth. 924 01:00:32,800 --> 01:00:36,292 Of course, you can write this down as equations. 925 01:00:36,292 --> 01:00:39,060 If you want to read the equations, you look in the 926 01:00:39,060 --> 01:00:47,190 original Bahl, Cocke, Jelinek, and Raviv article in 1973. 927 01:00:47,190 --> 01:00:54,300 This was again, a kind of lost paper because just for 928 01:00:54,300 --> 01:00:57,180 decoding a trellis, the Viterbi algorithm 929 01:00:57,180 --> 01:01:00,640 is much less complex. 930 01:01:00,640 --> 01:01:05,240 And gives you approximately the same thing. 931 01:01:05,240 --> 01:01:09,590 I mean, I talked last time how probably what you really want 932 01:01:09,590 --> 01:01:12,680 to do in decoding a block code is maximum likelihood. 933 01:01:12,680 --> 01:01:16,090 That's minimum probability of error in a block-wise basis. 934 01:01:18,910 --> 01:01:22,650 What APP decoding gives you minimum probability of error. 935 01:01:22,650 --> 01:01:24,760 If you then make a decision based on each of these 936 01:01:24,760 --> 01:01:29,620 extrinsic information, or on the combination of these two 937 01:01:29,620 --> 01:01:33,000 to get the overall APP vector, that's called a map algorithm, 938 01:01:33,000 --> 01:01:35,600 maximum a-posteriori probability. 939 01:01:35,600 --> 01:01:37,460 If you make a decision based on that, that will give you 940 01:01:37,460 --> 01:01:39,650 the minimal bit error probability, but it may not 941 01:01:39,650 --> 01:01:42,840 necessarily give you a code word, and it will not minimize 942 01:01:42,840 --> 01:01:45,250 the probability of making any error, which is generally what 943 01:01:45,250 --> 01:01:47,690 you want to minimize. 944 01:01:47,690 --> 01:01:51,180 So for trellis decoding of block codes 945 01:01:51,180 --> 01:01:53,410 this was not favored. 946 01:01:53,410 --> 01:01:57,640 However, a block code as a component of a larger code, 947 01:01:57,640 --> 01:02:02,630 you want the APP vector to feed off to something else. 948 01:02:02,630 --> 01:02:10,350 And so with the advent of much larger codes, of which block 949 01:02:10,350 --> 01:02:14,040 codes are components or convolutional codes, BCJR is 950 01:02:14,040 --> 01:02:15,800 the way you want to go. 951 01:02:15,800 --> 01:02:18,660 And some people say it's about three times as complicated as 952 01:02:18,660 --> 01:02:19,910 the Viterbi algorithm. 953 01:02:22,450 --> 01:02:25,730 All right, so you'll have an exercise with doing that in 954 01:02:25,730 --> 01:02:26,980 the homework. 955 01:02:29,280 --> 01:02:33,880 There is a closely related algorithm called the min-sum 956 01:02:33,880 --> 01:02:38,420 algorithm, or the max-product algorithm. 
957 01:02:38,420 --> 01:02:45,880 And the idea of that is that you can really carry through 958 01:02:45,880 --> 01:02:52,010 the same logical computations except when you get to one of 959 01:02:52,010 --> 01:02:54,720 these combining operations, rather than doing a sum of 960 01:02:54,720 --> 01:03:00,750 products, you can do a max of products. 961 01:03:00,750 --> 01:03:04,540 And that the Cartesian product law still holds, the same kind 962 01:03:04,540 --> 01:03:07,950 of decompositions that we have still hold. 963 01:03:07,950 --> 01:03:21,330 For instance, in this example, to get the state 0, 0, 0 964 01:03:21,330 --> 01:03:24,990 before what we said we're going to get an APP vector 965 01:03:24,990 --> 01:03:36,490 where this is the past APP for the 0, 0 value of the middle 966 01:03:36,490 --> 01:03:37,270 state vector. 967 01:03:37,270 --> 01:03:42,950 We simply combine the likelihood weights of the two 968 01:03:42,950 --> 01:03:46,660 possible ways of getting to that state and we sum them. 969 01:03:49,270 --> 01:03:52,575 Instead of doing sum, let's just take max. 970 01:03:55,470 --> 01:03:59,044 The maximum probability of getting-- 971 01:03:59,044 --> 01:04:02,340 in other words, what's the best way of getting here? 972 01:04:02,340 --> 01:04:04,180 Same question as we ask in the Viterbi algorithm. 973 01:04:08,520 --> 01:04:13,610 We'll let that be the likelihood weight for the 974 01:04:13,610 --> 01:04:14,420 state vector. 975 01:04:14,420 --> 01:04:16,870 Otherwise, the logic is exactly the same. 976 01:04:16,870 --> 01:04:22,040 Similarly, coming this way, what's the max of these two? 977 01:04:22,040 --> 01:04:24,930 OK, what's the best way of getting to the state vector in 978 01:04:24,930 --> 01:04:27,970 terms of likelihood from the future? 979 01:04:27,970 --> 01:04:31,880 And what do you know, if we combine these things at this 980 01:04:31,880 --> 01:04:36,130 point, we now have a past vector and a future vector. 981 01:04:36,130 --> 01:04:42,890 Let's take the max of this times the max of that. 982 01:04:42,890 --> 01:04:45,490 That gives us the maximum likelihood way of going 983 01:04:45,490 --> 01:04:48,175 through the 0, 0 value of the state. 984 01:04:54,720 --> 01:04:58,370 If we do exactly the same logic for decoding this 985 01:04:58,370 --> 01:05:01,220 trellis, we get a two-way algorithm. 986 01:05:01,220 --> 01:05:02,060 It's like this. 987 01:05:02,060 --> 01:05:07,350 It's like the BCJR algorithm, where at every point we're 988 01:05:07,350 --> 01:05:11,060 computing the maximum likelihood. 989 01:05:11,060 --> 01:05:15,020 It's not hard to show that what this gives you is that 990 01:05:15,020 --> 01:05:17,640 each of these symbols out here, it gives you the symbol 991 01:05:17,640 --> 01:05:20,660 that belongs to the maximum likelihood code word. 992 01:05:24,990 --> 01:05:27,075 How does it differ from the Viterbi algorithm? 993 01:05:27,075 --> 01:05:28,800 It gives you the same answer. 994 01:05:28,800 --> 01:05:32,550 It gives you the symbols, one by one, of the maximum 995 01:05:32,550 --> 01:05:35,640 likelihood code word. 996 01:05:35,640 --> 01:05:37,760 It doesn't have to remember anything. 997 01:05:37,760 --> 01:05:41,210 It doesn't store survivors. 998 01:05:41,210 --> 01:05:42,520 That's its advantage. 999 01:05:42,520 --> 01:05:45,220 It's disadvantage is it's a two-way algorithm. 1000 01:05:45,220 --> 01:05:48,320 You have to start from both ends and propagate your 1001 01:05:48,320 --> 01:05:51,310 information in the same way. 
1002 01:05:51,310 --> 01:05:53,640 And for block codes, maybe that's OK. 1003 01:05:53,640 --> 01:05:55,870 For convolutional codes that's certainly not something you 1004 01:05:55,870 --> 01:05:59,770 want to do because the code word may go on indefinitely. 1005 01:05:59,770 --> 01:06:05,350 So the Viterbi algorithm gets rid of the backward step by 1006 01:06:05,350 --> 01:06:07,785 always remembering the survivor. 1007 01:06:07,785 --> 01:06:12,310 It remembers the history of how it got to where it is. 1008 01:06:12,310 --> 01:06:18,100 So that once you get the max, you don't have to-- 1009 01:06:18,100 --> 01:06:20,870 you simply read it out rather than having-- 1010 01:06:20,870 --> 01:06:27,270 this backward part amounts to a trace back operation. 1011 01:06:27,270 --> 01:06:30,980 That's probably much too quick to be absorbed. 1012 01:06:30,980 --> 01:06:36,460 But I just wanted to mention there is a max product 1013 01:06:36,460 --> 01:06:41,230 version, which if you take log likelihoods becomes a max-sum. 1014 01:06:41,230 --> 01:06:42,480 Or if you take negative log 1015 01:06:42,480 --> 01:06:44,220 likelihoods it becomes min-sum. 1016 01:06:44,220 --> 01:06:46,760 So this is called the min-sum algorithm. 1017 01:06:46,760 --> 01:06:51,030 And it does maximum likelihood decoding 1018 01:06:51,030 --> 01:06:52,280 rather than APP decoding. 1019 01:06:54,690 --> 01:06:59,000 And sometimes it's used as an approximation or very 1020 01:06:59,000 --> 01:07:04,378 intermediate approximations between these two. 1021 01:07:04,378 --> 01:07:05,628 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1022 01:07:11,100 --> 01:07:16,830 PROFESSOR: No, if you use the min-sum, basically everybody 1023 01:07:16,830 --> 01:07:20,980 will converge on the same maximum likelihood code word. 1024 01:07:20,980 --> 01:07:24,790 So all these constraints will be consistent with-- 1025 01:07:24,790 --> 01:07:27,110 you'd be taking a single max at each point. 1026 01:07:27,110 --> 01:07:30,130 But at each point you will solve for the maximum 1027 01:07:30,130 --> 01:07:31,150 likelihood code word. 1028 01:07:31,150 --> 01:07:34,530 And what you'll see up here is the symbol that belongs to the 1029 01:07:34,530 --> 01:07:36,330 maximum likelihood code word. 1030 01:07:36,330 --> 01:07:39,710 This is maybe a bit of a miracle when you first see it, 1031 01:07:39,710 --> 01:07:44,060 but you'll independently get each of the bits in the 1032 01:07:44,060 --> 01:07:46,770 maximum likelihood code word. 1033 01:07:46,770 --> 01:07:47,560 Check it out at home. 1034 01:07:47,560 --> 01:07:51,430 AUDIENCE: Why not always do this? 1035 01:07:51,430 --> 01:07:54,080 PROFESSOR: Why not always do that? 1036 01:07:54,080 --> 01:08:00,450 It turns out that we want softer decisions when this is 1037 01:08:00,450 --> 01:08:03,230 part of a big graph. 1038 01:08:03,230 --> 01:08:07,820 We actually want the APPs, the relative probabilities of each 1039 01:08:07,820 --> 01:08:10,010 of the values. 1040 01:08:10,010 --> 01:08:13,710 The max, in effect, is making a hard decision. 1041 01:08:13,710 --> 01:08:16,560 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1042 01:08:16,560 --> 01:08:17,149 PROFESSOR: Yeah, all right. 1043 01:08:17,149 --> 01:08:18,819 It does give reliability information. 1044 01:08:23,340 --> 01:08:25,620 I'm not sure I can give you a conclusive answer to that 1045 01:08:25,620 --> 01:08:28,390 question except APP works better. 1046 01:08:28,390 --> 01:08:31,569 That's a more empirical answer. 1047 01:08:31,569 --> 01:08:32,934 Yes? 
1048 01:08:32,934 --> 01:08:34,184 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1049 01:08:42,089 --> 01:08:46,520 Where does that flexibility of the input symbols being 1050 01:08:46,520 --> 01:08:47,770 [UNINTELLIGIBLE PHRASE]. 1051 01:08:52,259 --> 01:08:55,745 We should be able to take that into account 1052 01:08:55,745 --> 01:08:56,995 [UNINTELLIGIBLE PHRASE]. 1053 01:09:04,979 --> 01:09:06,374 PROFESSOR: Yeah. 1054 01:09:06,374 --> 01:09:18,090 OK, well, if the input symbols are independently not 1055 01:09:18,090 --> 01:09:26,819 equiprobable, you can feed that in as-- 1056 01:09:26,819 --> 01:09:31,569 you know, once they came out of the channel, once we see 1057 01:09:31,569 --> 01:09:35,399 the channel information, the two symbols are not 1058 01:09:35,399 --> 01:09:37,170 equiprobable anymore. 1059 01:09:37,170 --> 01:09:43,220 So if somehow they a priori independently were not equally 1060 01:09:43,220 --> 01:09:45,689 probable, you could just feed that in as part of this 1061 01:09:45,689 --> 01:09:48,880 intrinsic information vector. 1062 01:09:48,880 --> 01:09:54,720 That's almost never the case in coding. 1063 01:09:54,720 --> 01:09:58,970 The intrinsic information is sometimes regarded as a-priori 1064 01:09:58,970 --> 01:10:04,360 and this has a-posteriori, or the combination of the two of 1065 01:10:04,360 --> 01:10:08,690 them is then a-posteriori when this goes off somewhere else. 1066 01:10:11,510 --> 01:10:17,540 But it doesn't really mean that the symbol itself was-- 1067 01:10:17,540 --> 01:10:22,170 it means that the evidence is biased one way or the other. 1068 01:10:22,170 --> 01:10:25,980 And when this appears somewhere else in the graph, 1069 01:10:25,980 --> 01:10:29,860 this will now appear somewhere down in some other graph. 1070 01:10:29,860 --> 01:10:32,660 It's called the a-priori information, but what it is is 1071 01:10:32,660 --> 01:10:35,920 the a-posteriori information given all of the received 1072 01:10:35,920 --> 01:10:39,018 symbols in this part of the graph. 1073 01:10:39,018 --> 01:10:40,268 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1074 01:10:43,730 --> 01:10:49,320 Does the i0 of that other graph contain this table 1075 01:10:49,320 --> 01:10:53,140 sharing information added to it? 1076 01:10:53,140 --> 01:10:56,720 PROFESSOR: Yeah, the i0 for another graph is just the e0 1077 01:10:56,720 --> 01:10:57,970 of this graph. 1078 01:11:01,710 --> 01:11:03,970 I mean, if you consider it all as one big graph, 1079 01:11:03,970 --> 01:11:04,740 that's what you want. 1080 01:11:04,740 --> 01:11:07,560 You want the message that goes in this direction, which 1081 01:11:07,560 --> 01:11:11,210 basically summarizes all of the inputs from this graph. 1082 01:11:14,970 --> 01:11:18,460 Generally, you'd have a little equals node here. 1083 01:11:18,460 --> 01:11:23,160 You'd have the actual intrinsic information coming 1084 01:11:23,160 --> 01:11:25,100 from the outside world there. 1085 01:11:25,100 --> 01:11:28,520 You would compute the e0 from this graph. 1086 01:11:28,520 --> 01:11:32,975 And at this point, you would combine e0 and i0, and that 1087 01:11:32,975 --> 01:11:34,410 would go around here. 1088 01:11:40,400 --> 01:11:45,460 So the very last page of chapter 12 just says, OK, 1089 01:11:45,460 --> 01:11:48,420 we've got a nice, clean, finite exact algorithm that 1090 01:11:48,420 --> 01:11:52,120 will compute all these APPs on a cycle-free graph. 1091 01:11:52,120 --> 01:11:54,720 Now, suppose we're faced with a graph that has cycles. 
1092 01:11:57,650 --> 01:11:59,490 And here's the situation. 1093 01:11:59,490 --> 01:12:01,405 Suppose the graph has cycles in it. 1094 01:12:04,220 --> 01:12:10,290 Then first thing that might occur to you, at least after 1095 01:12:10,290 --> 01:12:14,390 this development is, well, we now have a local rule for 1096 01:12:14,390 --> 01:12:18,220 updating at each of these computational nodes, each of 1097 01:12:18,220 --> 01:12:21,280 these constraint nodes, in this representation. 1098 01:12:24,150 --> 01:12:26,960 Why don't we just let it rip? 1099 01:12:26,960 --> 01:12:28,210 See what happens. 1100 01:12:30,640 --> 01:12:34,630 We're going to put in some intrinsic information at the 1101 01:12:34,630 --> 01:12:37,360 outputs as before. 1102 01:12:37,360 --> 01:12:42,490 We can compute some output message here based on-- well, 1103 01:12:42,490 --> 01:12:47,420 eventually we're going to get some input message from here. 1104 01:12:47,420 --> 01:12:50,520 And we can compute some output message here based on all of 1105 01:12:50,520 --> 01:12:55,940 these and we'll pass that over. 1106 01:12:55,940 --> 01:12:59,760 Let's assume what's called a parallel schedule or a 1107 01:12:59,760 --> 01:13:03,770 flooding schedule, where in each cycle every one of these 1108 01:13:03,770 --> 01:13:05,210 little guys does its thing. 1109 01:13:05,210 --> 01:13:09,740 It takes all the inputs that it has available at that time 1110 01:13:09,740 --> 01:13:10,990 and it generates outputs. 1111 01:13:14,460 --> 01:13:19,160 Well, hope for the best. 1112 01:13:19,160 --> 01:13:21,540 It will compute something. 1113 01:13:21,540 --> 01:13:26,340 You can sort of intuitively see that it'll settle. 1114 01:13:26,340 --> 01:13:31,060 It'll converge, perhaps, into some kind of equilibrium. 1115 01:13:31,060 --> 01:13:33,685 Or hopefully it will converge into some kind of equilibrium. 1116 01:13:36,900 --> 01:13:41,800 There is a basic fallacy here, which is that the information 1117 01:13:41,800 --> 01:13:46,960 that comes in here is used to compute part of this message. 1118 01:13:46,960 --> 01:13:49,990 Which is used to compute part of this message. 1119 01:13:49,990 --> 01:13:53,540 Which ultimately comes back and is used to compute part of 1120 01:13:53,540 --> 01:13:54,190 this message. 1121 01:13:54,190 --> 01:14:00,400 And then part of this message again, so that information 1122 01:14:00,400 --> 01:14:06,590 goes around in cycles and it tends to be used again and 1123 01:14:06,590 --> 01:14:07,450 again and again. 1124 01:14:07,450 --> 01:14:10,320 Of course, this tends to reinforce 1125 01:14:10,320 --> 01:14:12,190 previously computed messages. 1126 01:14:12,190 --> 01:14:13,790 It tends to make you overconfident. 1127 01:14:16,330 --> 01:14:23,810 You heard some rumor and then the rumor goes all around the 1128 01:14:23,810 --> 01:14:26,260 class and it comes back and you hear it again. 1129 01:14:26,260 --> 01:14:30,010 And that tends to confirm the rumor. 1130 01:14:30,010 --> 01:14:34,500 So it's that kind of telephone situation, and it's not really 1131 01:14:34,500 --> 01:14:37,190 independent information. 1132 01:14:37,190 --> 01:14:40,360 Intuitively, makes sense that if all of the 1133 01:14:40,360 --> 01:14:42,870 cycles are very large-- 1134 01:14:46,230 --> 01:14:51,970 in other words, the rumor has to go to China and back before 1135 01:14:51,970 --> 01:14:52,970 you get it. 
1136 01:14:52,970 --> 01:14:56,610 That it probably is highly attenuated by 1137 01:14:56,610 --> 01:14:57,490 the time you get. 1138 01:14:57,490 --> 01:15:01,120 It's mixed up with all kinds of other information, so maybe 1139 01:15:01,120 --> 01:15:05,590 it doesn't hurt you too much if the cycles are very large. 1140 01:15:05,590 --> 01:15:12,100 And in fact, that's the way these capacity approaching 1141 01:15:12,100 --> 01:15:14,040 codes are designed. 1142 01:15:14,040 --> 01:15:17,630 Whereas, as I mentioned last time, if you're in a physical 1143 01:15:17,630 --> 01:15:23,010 situation where basically your graph looks something like 1144 01:15:23,010 --> 01:15:26,800 this, you don't have the freedom to make your cycles 1145 01:15:26,800 --> 01:15:30,520 large, then this is probably a bad idea. 1146 01:15:30,520 --> 01:15:33,660 And this isn't why in fields, like vision, for instance, 1147 01:15:33,660 --> 01:15:35,980 people try to use this-- 1148 01:15:35,980 --> 01:15:38,750 the sum-product algorithm is the belief propagation 1149 01:15:38,750 --> 01:15:40,810 algorithm, if you've ever heard of that. 1150 01:15:40,810 --> 01:15:44,070 Widely used in inference on Bayesian networks. 1151 01:15:44,070 --> 01:15:49,430 And the religion in belief propagation used to be that 1152 01:15:49,430 --> 01:15:52,260 you simply can't use belief propagation on graphs that 1153 01:15:52,260 --> 01:15:53,750 have cycles. 1154 01:15:53,750 --> 01:15:54,390 Why? 1155 01:15:54,390 --> 01:15:58,680 Because most of the graphs they dealt 1156 01:15:58,680 --> 01:15:59,910 with look like this. 1157 01:15:59,910 --> 01:16:01,880 And in fact, it works terribly on that. 1158 01:16:04,850 --> 01:16:08,090 It was a great shock on that field when the coding people 1159 01:16:08,090 --> 01:16:11,640 came along and said, well, we use belief propagation on 1160 01:16:11,640 --> 01:16:14,710 graphs with cycles and it works great. 1161 01:16:14,710 --> 01:16:19,580 And this, as I understand it, really had an enormous impact 1162 01:16:19,580 --> 01:16:23,020 on the belief propagation community. 1163 01:16:23,020 --> 01:16:26,150 But it was realized that gee, coding people have it nice. 1164 01:16:26,150 --> 01:16:27,400 You make your own graphs. 1165 01:16:29,706 --> 01:16:34,230 And you make them so this is going to be a very low order, 1166 01:16:34,230 --> 01:16:36,540 low impact event. 1167 01:16:36,540 --> 01:16:38,480 And that's right. 1168 01:16:38,480 --> 01:16:41,020 But coding we do have this freedom to 1169 01:16:41,020 --> 01:16:42,270 design our own graphs. 1170 01:16:44,860 --> 01:16:48,490 We realize from the basic arguments that I put up before 1171 01:16:48,490 --> 01:16:52,080 that we're really going to need to have cycles. 1172 01:16:52,080 --> 01:16:54,720 As long as we have cycle-free graphs, we're going to get 1173 01:16:54,720 --> 01:16:57,810 this kind of exponentially increasing complexity, which 1174 01:16:57,810 --> 01:17:00,250 is characteristic of trellis graphs. 1175 01:17:00,250 --> 01:17:04,480 So we're going to need cycles, but as long as the cycles are 1176 01:17:04,480 --> 01:17:07,610 very long and attenuated, as long as the girth of the graph 1177 01:17:07,610 --> 01:17:09,860 is great, maybe we can get away with it. 1178 01:17:09,860 --> 01:17:14,030 And that, in fact, is the way the story has developed. 1179 01:17:14,030 --> 01:17:18,960 And Gallager basically saw this back in 1961, but it took 1180 01:17:18,960 --> 01:17:22,400 a long time for it to catch. 
1181 01:17:22,400 --> 01:17:26,690 So I didn't even get into chapter 13. 1182 01:17:26,690 --> 01:17:31,220 Chapter 13 we're going to first just introduce several 1183 01:17:31,220 --> 01:17:34,450 classes, the most popular classes of capacity 1184 01:17:34,450 --> 01:17:36,140 approaching codes. 1185 01:17:36,140 --> 01:17:42,370 And then I'm going to give you some real performance 1186 01:17:42,370 --> 01:17:46,420 analysis, how you would design, 1187 01:17:46,420 --> 01:17:50,060 simulate, in these codes. 1188 01:17:50,060 --> 01:17:52,690 And it's what people actually use. 1189 01:17:52,690 --> 01:17:54,440 OK, see you next time.