PROFESSOR: I'm sorry. I have to give this to you. Basically OK, except you didn't understand it the first time.

OK. Good morning. Happy spring. Happy April. We're running a little late. I'll try to finish chapter nine today and start on chapter ten. Ashish may have chapter ten here later. If not, it'll be on the web.

So we've been talking about convolutional codes. Specifically, rate 1/n convolutional codes: one input, n outputs. As you may imagine, there are also rate k/n convolutional codes, the generalization to k inputs and n outputs. These have not been so much used in practice, because in general we're using these binary codes in the power-limited regime, and we want low rates. So 1/n is a good rate. 1/2, 1/3, 1/4, or down to 1/6 are common kinds of rates. So I haven't bothered to develop the more elaborate theory for rate k/n codes.

The rate 1/n codes, just to review.
The code is simply defined as the set of all possible output sequences of a convolutional encoder, whose n impulse responses are written as an n-tuple g(D), as the inputs range over all Laurent sequences -- bi-infinite sequences. And what we showed last time is that without loss of generality or optimality, we could always take this g(D) to be of a particular canonical form. Namely, g(D) could be taken to be a set of n polynomials, g_j(D), that were relatively prime. And for any code, you can find such a canonical encoder; it's unique up to units, and it clearly specifies the code. So that's a nice canonical encoder to take.

It furthermore has the property that there's an obvious realization of this encoder in shift register form with nu memory units, and therefore 2 to the nu states. And we will prove in chapter ten that this is the minimal possible encoder for this code. In other words, there are lots of different encoders that generate this code, but you can't possibly encode it with a state space of dimension less than nu. That's the way it's going to sound in chapter ten.
So we haven't quite got there yet, but it's a minimal encoder, therefore, in that sense.

All right. Today we're going to go on and exploit the finite-state structure of this code to get a very efficient maximum likelihood sequence decoding algorithm called the Viterbi algorithm, which I'm sure you've all heard of. It's become very famous, not only in this field, but really in any place where you're trying to observe a finite state machine in memoryless noise -- a finite-state hidden Markov model, if you like. And so now it's used, for instance, in the detection of genomes: where the exons end and the introns start and so forth, and where the garbage is. And people use it who have no idea that it was originally a digital communications algorithm. But it's a very obvious algorithm. It's come up in many different forms, and Viterbi, in a way, was just lucky to get his name on it, because it would come up very easily as soon as you posed the problem correctly.

Let's start out with terminated convolutional codes. I forget whether I started on this last time. I think I may have. But we'll start from scratch again.
When I say terminated, I mean that we're going to take only a subset of the code words that start at a certain time -- say, time 0 -- continue for some finite time, and then end. And in this setup, with this canonical encoder, which is polynomial, the easy way to specify that is to let u(D) be a finite sequence that starts at time 0. That's called a polynomial. And let's restrict its degree: degree of u(D) less than k. In other words, it looks like u_0 plus u_1 D plus so forth, up to u_{k-1} D^{k-1}. Therefore I've chosen k so that I really have k information bits or input bits. All right?

So I specify u_0 through u_{k-1}, and I use that as the input to my encoder. What is then my code, my truncated code -- let's call it C_k; that's not a very good notation, but we'll use it just for now? It will be the set of all u(D)g(D) such that u(D) is polynomial and the degree of u(D) is less than k.

OK. So we ask what that code is going to be. And let's take some simple examples.
It turns out that when we terminate codes, we always choose polynomial encoders, so the code will naturally collapse back to the 0 state. How long? Nu time units after the inputs stop coming in, the outputs will be all 0 at that point and forevermore. The shift register will clear itself out nu time units later, and then the outputs will be all 0. So the code really starts at time 0 and ends at time k plus nu.

So we aren't going to worry, in this case, whether the code is non-catastrophic or not. Earlier, one of the principal properties we had was that this relative primeness property guaranteed the encoder was non-catastrophic -- and in fact, it's necessary for non-catastrophicity. So we spent a lot of time talking about why it would be a bad thing to be catastrophic. OK.

But let's take a catastrophic rate 1/1 encoder, where we simply have g(D) = 1 + D. You remember this encoder has nu = 1. The input u_k goes into a single memory element holding u_{k-1}, and we simply add these to get y_k. So we get y(D) = (1 + D) u(D). OK.
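The g(D) = 1 + D encoder just described can be sketched in a few lines of code. This is an illustration added in editing, not part of the lecture; it assumes nothing beyond the single memory element and the mod-2 add described above.

```python
def encode_1_plus_d(u):
    """Rate-1/1 encoder with g(D) = 1 + D: y_k = u_k XOR u_{k-1}.

    u is a finite (polynomial) input sequence starting at time 0.
    One extra 0 is fed in to flush the single memory element (nu = 1),
    so the encoder returns to state 0.
    """
    y = []
    prev = 0  # contents of the shift register; the encoder starts in state 0
    for bit in list(u) + [0]:
        y.append(bit ^ prev)  # mod-2 sum of current and previous input
        prev = bit
    return y

# A weight-1 input gives the impulse response 1 + D, shifted:
print(encode_1_plus_d([0, 1, 0, 0]))  # [0, 1, 1, 0, 0]
```

Notice that every output word has even Hamming weight -- exactly the single parity-check structure derived next in the lecture.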
Now, if the input to this is a finite sequence, a polynomial, let's see what this code can possibly be -- what the code's going to look like. The code's going to start in state 0 at time 0. So here's the time axis. At the first time, we can either get a 0 or a 1 in, and accordingly, we will go to state 0 or state 1. At the second time, we can go to state 0 with another 0, or to state 1 with a 1. If we get a 1 in and we're in state 1, then the output is going to be 0, and we'll stay in state 1. If we get a 0 in, the output will be a 1, and we'll go back to state 0. So this is what a typical trellis section looks like here: we have all possible transitions between the two states. And so now it goes on for a while like this, time-invariantly, dot dot dot. And then finally, at some time -- say this is time k, or maybe it's time k minus 1; the time index is a little fuzzy here -- we don't put any more 1s in. We only put 0s in from that time forward. So we could have one more transition back to state 0, and then after that it just stays in state 0 forevermore.
And this isn't very interesting. It was in state 0 all the time before there and put out 0s. It was in state 0 all the time after here and put out 0s. That's clearly not conveying any information -- not an interesting part of the code. So we're going to consider the code just to be defined over these k plus 1 units of time. And so y(D) we're going to consider to be a polynomial of degree k.

Now we're going to assume that there are k bits in. Then we're going to wait one more time unit -- which is nu -- to let the shift register clear out. And at that time, we're going to take whatever y is, and then we're going to terminate it. So if we consider this now, we have a block code whose length is the number of non-trivial coefficients of y(D). So we have n = k + 1, we have k information bits by design, and what's the minimum non-zero weight of this? It's linear. What's its minimum non-zero weight, or its minimum distance?

AUDIENCE: [INAUDIBLE]

PROFESSOR: 1? Show me a code word of weight 1.
AUDIENCE: [INAUDIBLE]

PROFESSOR: I claim the minimum non-zero weight is 2, by the typical argument. I have the all-0 sequence. If I ever leave the all-0 sequence, I accumulate 1 unit of Hamming weight. And I need to ultimately come back to the all-0 sequence, because I've made everything finite here. So in this case, I can make the argument that I always have to come back to the all-0 sequence. Whenever I merge back into the all-0 sequence, I accumulate another unit of weight. So this is a code with minimum distance 2 -- minimum non-zero weight 2.

And in fact, it's just the single parity-check code of length n = k + 1. OK? And if you ask yourself, can you generate any -- this is supposedly the even-weight code. It contains all even-weight (k+1)-tuples. And I think you can convince yourself that you can generate any of these even-weight (k+1)-tuples. In fact, here's the generator matrix for this code. Look, here's a set of generators: 1 1 is in the code. 0 1 1 is in the code. 0 0 1 1 is in the code.
So here's a set of generators, and the generator matrix for the code looks like this: each row is 1 1, shifted over one position from the row above. We just go like that. And with these k generators, you can generate any even-weight code word. So we get a kind of convolutional form to the generator matrix -- a sliding parity-check form to the generator matrix. OK. So I assert that this is the single parity-check code. So here's a trellis representation for a block code. Yeah?

AUDIENCE: [INAUDIBLE]

PROFESSOR: I'm sure I've got the time indices screwed up. But I put in k bits. I wait one more unit of time -- nu equals 1 -- for this to clear out. So I get out k plus 1 bits. Think of it in terms of this here: I put in k bits, the last input you think of as necessarily being 0, and I take the outputs here for k plus 1 times. OK. So first of all, a couple points here.

AUDIENCE: In general, the length will be k plus nu?

PROFESSOR: In general, the length will be k plus nu. So let's write that down.
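The sliding generator matrix can be checked by brute force. This is a sketch added in editing; the function names and the k = 4 example are my own.

```python
import itertools

def spc_generator_matrix(k):
    """k x (k+1) generator matrix of the terminated 1 + D code:
    row j is the 2-tuple 1 1, slid over to positions j and j+1."""
    G = [[0] * (k + 1) for _ in range(k)]
    for j in range(k):
        G[j][j] = G[j][j + 1] = 1
    return G

def span(G):
    """All mod-2 linear combinations of the rows of G."""
    k, n = len(G), len(G[0])
    words = set()
    for u in itertools.product([0, 1], repeat=k):
        words.add(tuple(sum(u[j] * G[j][i] for j in range(k)) % 2
                        for i in range(n)))
    return words

C = span(spc_generator_matrix(4))
# 2^4 = 16 code words, every one of even weight: the single parity-check code.
print(len(C), all(sum(w) % 2 == 0 for w in C))  # 16 True
```

Since there are exactly 2^k even-weight (k+1)-tuples, hitting all 16 of them confirms the claim that these k generators span the whole even-weight code.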
So in general, by terminating a rate 1/n convolutional code -- nu is called the constraint length, so this is a constraint-length-nu, or 2-to-the-nu-state, code -- after k inputs, we get a binary linear block code whose parameters are these. We have k information bits. We have k + nu non-trivial output times, and at each output time I get n output bits, so n(k + nu) is the total effective length of the code. And for the distance, we would have to say that the distance of the block code is greater than or equal to the distance of the convolutional code. Why? Because all of the words in this block code are actually sequences in the convolutional code. I'm assuming here that the code is not catastrophic, so it's sufficient to look at all the finite sequences in the convolutional code.

All right. And in general, the block code distance is almost always going to be equal to the convolutional code distance. And this is just for finite code words.

So as another example, suppose we take our standard rate 1/2, nu = 2, distance 5 convolutional code.
Our example 1 that we've been using all along. Suppose I terminate with k = 4. Then I'm going to get what? The length is 2 times 6, so I'm going to get a (12, 4, 5) binary linear block code. That is not so bad, actually.

There's a point that I'm not going to make very strongly here, but terminated convolutional codes can be good block codes. And in fact, asymptotically, by choosing the parameters correctly, you can show that you can get a random ensemble of terminated convolutional codes that is as good as the best possible random ensemble of block codes. So terminating convolutional codes is one way of constructing optimal block codes in the asymptotic limit.

These parameters aren't bad. Here's a rate 1/3 code with distance 5 -- not too long. You know, you might compare it to what? The BCH code? There's a (15, 7, 5) BCH code, which is clearly better, because it's higher rate. But if you shorten that code -- you have to shorten the length by 3 and the dimension by 3 to keep the distance -- you would get the same code. So it's not optimum, but it's not too bad.
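The (12, 4, 5) claim is easy to verify by enumeration. The sketch below assumes the standard example generators g(D) = (1 + D + D^2, 1 + D^2) for the rate-1/2, nu = 2 code (the lecture's "example 1"); the function names are mine.

```python
import itertools

GENS = [[1, 1, 1], [1, 0, 1]]  # g(D) = (1 + D + D^2, 1 + D^2), low order first

def encode_terminated(u, nu=2):
    """Feed the input bits plus nu flushing zeros through the rate-1/2 encoder."""
    s1 = s2 = 0  # shift register: previous input, and the input before that
    out = []
    for b in list(u) + [0] * nu:
        window = [b, s1, s2]
        for g in GENS:  # one output bit per generator at each time
            out.append(sum(w * c for w, c in zip(window, g)) % 2)
        s2, s1 = s1, b
    return tuple(out)

words = [encode_terminated(u) for u in itertools.product([0, 1], repeat=4)]
n = len(words[0])                         # 2 outputs x (4 + 2) times
d = min(sum(w) for w in words if any(w))  # minimum non-zero weight
print(n, len(set(words)), d)  # 12 16 5
```

The minimum weight 5 is achieved by the impulse input u(D) = 1, whose output is just the generator pair itself, of total weight 3 + 2 = 5.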
And furthermore, from this construction, you get a trellis representation, which in this case would look like this. It looks like the trellis that we started to draw last time. Sorry -- it doesn't look like that. This guy goes down here, this guy comes here, this guy goes here. This continues for a while. And then finally, when we start to enforce all zeros, we only have one possible output branch from each of these states, and we get a trellis that looks like that. All right? Which is 1, 2, 3 -- I did it correctly for this code, actually. Each of these branches is going to have a 2-tuple output on it, and along the top path that looks like 00, 00, 00, 00, 00. Each of the information bits causes a two-way branch, no matter where you are in the trellis. So here's 1, here's 2, here's 3, here's 4. That's the end of them. We wait nu = 2 time units to let the states converge back to 0. So that's a trellis representation of this (12, 4, 5) block code. OK.
I want to talk about terminated convolutional codes -- terminated trellis codes, convolutional codes as block codes -- for two reasons. One is to say we can get block codes that way. But the other is to introduce the Viterbi algorithm for terminated convolutional codes. I think it's easier first to look at how the algorithm works when you have a finite trellis like this, and then we'll go on and say, well, how would this work if we let the trellis become very long, or in principle, infinite? Would the Viterbi algorithm still work? And it becomes obvious that it does.

So this is going to be a maximum likelihood sequence detection, or decoding, algorithm. We're going to assume that at each time here, we get to see two things, corresponding to the two bits we transmit. So we transmit y_k according to which trellis branch we're on -- y_0 at this point -- and we receive a 2-tuple r_0. If we were on a binary-input additive white Gaussian noise channel, this would actually go through the 2-PAM map, to two plus-or-minus alphas, and be received as two real numbers. And we do the same thing here.
For y_1, we receive r_1, and so forth, up to y_{k+nu-1} -- k plus nu minus 1, I think it is -- for which we receive r_{k+nu-1}. OK. So we have a transmitted 2-tuple and a received 2-tuple at every time. And we're going to assume that the channel is memoryless, so that the probabilities of receiving the various possible received values -- the likelihoods -- are independent from time unit to time unit. That's the only way this works. Then if I want to get the total probability of r given y, where now I'm talking about over the whole block, this factors into the product of the probabilities of r_k given y_k, yes? That's what I'm going to depend on. This is what's called the memoryless condition -- a memoryless channel. The transition probabilities do not depend on what's been sent or received in previous blocks. Or, it's more convenient to use the negative log likelihood, which is now the sum of minus log p(r_k | y_k). So maximum likelihood is equivalent to minimizing the negative log likelihood.

OK. I receive only one thing in each time unit. And each trellis branch corresponds to transmitting a specific thing.
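Numerically, the factorization just described is simply "the log of a product is the sum of the logs." A tiny sketch -- the crossover probability 0.1 and the four likelihood values are made up for illustration:

```python
import math

# Hypothetical per-symbol likelihoods p(r_k | y_k) on a memoryless channel,
# e.g. a BSC with crossover 0.1: 0.9 where received and sent bits agree.
likelihoods = [0.9, 0.1, 0.9, 0.9]

block_likelihood = math.prod(likelihoods)          # p(r | y) factors
neg_log = sum(-math.log(p) for p in likelihoods)   # additive branch costs

# Minimizing the sum of -log p(r_k | y_k) maximizes p(r | y):
assert math.isclose(neg_log, -math.log(block_likelihood))
print(round(neg_log, 3))  # 2.619
```

The additivity is what lets each branch carry its own cost, independent of the rest of the path.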
So I can label each trellis branch with the appropriate minus log p(r_k | y_k). Do you see that? In other words, for this trellis branch down here, which is associated with y_2 = (0, 1), or s(y_2) = (alpha, minus alpha), suppose I receive some 2-tuple r_2. I would label this branch by, say, the Euclidean distance between what I transmitted for that branch and what I received. For instance, on the additive white Gaussian noise channel, this would simply be the norm of r_k minus s(y_k), squared -- the Euclidean distance squared. Or equivalently, minus the inner product, the correlation between r_k and s(y_k). These are equivalent metrics. Or on a binary symmetric channel, it might be the Hamming distance between what I received and what I transmitted, both of which would be binary 2-tuples in this case. It's just some measure of distance -- log likelihood distance, Euclidean distance, Hamming distance -- between the particular thing I would have transmitted if I'd been on this branch and the particular thing that I actually did receive.
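The three branch metrics just mentioned can be written down directly. A sketch, with alpha = 1 and a made-up received 2-tuple:

```python
def sq_euclidean(r, s):
    """Squared Euclidean distance ||r - s||^2 between 2-tuples."""
    return sum((ri - si) ** 2 for ri, si in zip(r, s))

def neg_correlation(r, s):
    """Minus the inner product <r, s>; equivalent to squared Euclidean
    distance as a metric when ||s|| is the same on every branch."""
    return -sum(ri * si for ri, si in zip(r, s))

def hamming(r, s):
    """Hamming distance for hard decisions on a binary symmetric channel."""
    return sum(ri != si for ri, si in zip(r, s))

alpha = 1.0
s = (alpha, -alpha)   # 2-PAM image of the branch label y = (0, 1)
r = (0.8, -1.3)       # hypothetical noisy received 2-tuple
print(sq_euclidean(r, s), neg_correlation(r, s), hamming((0, 1), (1, 1)))
```

The equivalence follows from expanding ||r - s||^2 = ||r||^2 - 2<r, s> + ||s||^2: the first term is common to all branches at a given time and the last is constant, so only the correlation term distinguishes branches.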
So having received a whole block of data, I can now put a cost on each of these branches. What's the minus log likelihood cost if I say that the code word goes through that branch? What does it cost me? All right.

Now, what I'm basically depending on is this: note that there's a one-to-one map between code words y in C and trellis paths. For this specific trellis up here, how many paths are there through the trellis? Do you see that however I go through the trellis, I'm going to meet four two-way branches? All right? There are four yes-no questions -- it's a binary tree. So there are 16 possible ways to get through this trellis from start to finish, if I view it as a maze problem. And they correspond -- I haven't labeled all the branches -- but they do correspond to all 16 words in this block code.

So now what is maximum likelihood decoding? I want to find the one of those 16 words that has the greatest likelihood, or the least negative log likelihood, over all of these 12 received symbols, or 6 received times. So what will that be?
Once I've labeled each of these branches with a log likelihood cost, I simply want to find the least-cost path through this trellis. And that will correspond to the most likely code word. This is the essence of it. Do you all see that?

AUDIENCE: [INAUDIBLE]

PROFESSOR: The independent transmission allows me to break it up into a symbol-by-symbol or time-by-time expression. So I can simply accumulate over this sum right here. And I'm just looking for the minimum sum over all the possible ways I could get through this trellis. So now we've translated this into finding the minimum-cost path through a graph -- through a graph with a very special, nice, regular structure.

OK. Once you've got that, then I think it's very obvious how to come up with a nice recursive algorithm for doing that. Here's my recursive algorithm. I'm going to start computing weights. I'm going to start here, and I'm going to assign weight zero here. Right here I'm going to assign a weight which is equal to the cost of the best path to get to that node in the graph.
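Before the recursion, it's worth seeing the brute-force version of "find the least-cost path": enumerate every path (every code word), total its branch costs, and keep the cheapest. A sketch using the terminated 1 + D trellis with k = 4 and Hamming branch costs (the function names are mine):

```python
import itertools

def encode(u):
    """Terminated g(D) = 1 + D encoder: y_k = u_k XOR u_{k-1}, plus a flush bit."""
    prev, y = 0, []
    for b in list(u) + [0]:
        y.append(b ^ prev)
        prev = b
    return tuple(y)

def ml_decode_exhaustive(r, k=4):
    """Walk all 2^k paths through the trellis, sum the Hamming branch costs,
    and return the least-cost code word. Exponential in k -- exactly the
    blow-up the Viterbi recursion is about to remove."""
    return min((encode(u) for u in itertools.product([0, 1], repeat=k)),
               key=lambda y: sum(yi != ri for yi, ri in zip(y, r)))

# An odd-weight received word is not in the code; the decoder moves distance 1:
print(ml_decode_exhaustive((1, 1, 0, 0, 1)))
```

For k = 4 this is only 16 paths, but the count doubles with every input bit, which is why the survivor idea below matters.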
444 00:33:20,720 --> 00:33:24,340 In this case, it's just the cost of that branch. 445 00:33:24,340 --> 00:33:26,250 Similarly down here. 446 00:33:26,250 --> 00:33:28,770 Similarly I proceed to these four. 447 00:33:28,770 --> 00:33:32,040 There's only one way to get to each of these four, and I 448 00:33:32,040 --> 00:33:37,900 simply add up what the total weight is. 449 00:33:37,900 --> 00:33:40,750 Now here's the first point where it gets interesting, 450 00:33:40,750 --> 00:33:45,260 where I have two possible ways to get to this node. 451 00:33:45,260 --> 00:33:49,450 One way is via these three branches, and that has a 452 00:33:49,450 --> 00:33:50,390 certain cost. 453 00:33:50,390 --> 00:33:53,330 Another way is via these three branches, and that has a 454 00:33:53,330 --> 00:33:55,840 certain cost. 455 00:33:55,840 --> 00:34:00,680 Suppose the cost of these three branches is higher than 456 00:34:00,680 --> 00:34:02,540 the cost of these three branches. 457 00:34:02,540 --> 00:34:04,722 Just pair-wise. 458 00:34:04,722 --> 00:34:06,690 All right? 459 00:34:06,690 --> 00:34:12,710 Claim that I can pick the minimum cost path to this 460 00:34:12,710 --> 00:34:19,800 node, throw away the other possibility, and I can never 461 00:34:19,800 --> 00:34:22,489 have thrown away something which itself is part of the 462 00:34:22,489 --> 00:34:24,530 minimum cost path through the whole trellis. 463 00:34:27,610 --> 00:34:29,520 Clearly. 464 00:34:29,520 --> 00:34:34,480 Suppose this is the best path from here to here. 465 00:34:34,480 --> 00:34:36,015 This is worse. 466 00:34:36,015 --> 00:34:41,590 X, X, X, X X. And now I find the minimum cost path through 467 00:34:41,590 --> 00:34:44,679 the whole trellis goes through this node. 468 00:34:44,679 --> 00:34:48,900 Say it's like that. 469 00:34:48,900 --> 00:34:51,730 Could it have started with this? 470 00:34:51,730 --> 00:34:52,310 No. 
471 00:34:52,310 --> 00:34:55,870 Because I can always replace this with a path that starts 472 00:34:55,870 --> 00:34:59,130 with that and get a better path. 473 00:34:59,130 --> 00:35:02,880 So at this point, I can make a decision between all the paths 474 00:35:02,880 --> 00:35:05,400 that get to that node and pick just one. 475 00:35:05,400 --> 00:35:07,850 The best one that gets to that node. 476 00:35:07,850 --> 00:35:10,270 And that's called the survivor. 477 00:35:10,270 --> 00:35:14,793 This is really the key concept that Viterbi introduced. 478 00:35:17,620 --> 00:35:21,200 We only need to save one, the best up to that time. 479 00:35:21,200 --> 00:35:25,780 We need to do it for all the 2 to the nu possibilities. 480 00:35:25,780 --> 00:35:32,772 So we have 2 to the nu survivors at time k or time i, 481 00:35:32,772 --> 00:35:34,300 whatever your time index is. 482 00:35:34,300 --> 00:35:38,020 We used k for something else. 483 00:35:38,020 --> 00:35:38,490 OK. 484 00:35:38,490 --> 00:35:43,260 So I can throw away half the possibilities. 485 00:35:43,260 --> 00:35:47,010 Now each survivor, I remember what its past history is, and 486 00:35:47,010 --> 00:35:50,830 I remember what its cost is to that point. 487 00:35:50,830 --> 00:35:55,310 Now to proceed, the recursion is to proceed one time unit 488 00:35:55,310 --> 00:36:00,130 ahead, to add the incremental cost -- say, to make a 489 00:36:00,130 --> 00:36:02,830 decision here, I need to add the incremental cost to the 490 00:36:02,830 --> 00:36:05,830 two survivors that were here and here. 491 00:36:05,830 --> 00:36:08,670 So I add an incremental cost, according to what these 492 00:36:08,670 --> 00:36:10,500 branches are labeled with. 493 00:36:10,500 --> 00:36:15,890 And now I find the best of these two possibilities to get 494 00:36:15,890 --> 00:36:17,990 to this node.
495 00:36:17,990 --> 00:36:28,770 So the recursion is called an ACS operation, for add-compare-select: we add 496 00:36:28,770 --> 00:36:32,550 increments to each of these paths, we compare which is the 497 00:36:32,550 --> 00:36:35,040 best, and we select the best one. 498 00:36:35,040 --> 00:36:35,830 We keep that. 499 00:36:35,830 --> 00:36:36,930 We throw away the other. 500 00:36:36,930 --> 00:36:43,570 We now have a one unit longer best path to this node and its 501 00:36:43,570 --> 00:36:51,770 cost for all 2 to the nu survivors at time i plus 1. 502 00:36:51,770 --> 00:36:56,670 And just proceeding in that way, we go through until we 503 00:36:56,670 --> 00:36:59,210 finally get to the terminating node. 504 00:36:59,210 --> 00:37:03,410 And at this point, when we've found the best path to this 505 00:37:03,410 --> 00:37:05,000 node, we've found the best path 506 00:37:05,000 --> 00:37:08,320 through the whole trellis. 507 00:37:08,320 --> 00:37:08,610 OK? 508 00:37:08,610 --> 00:37:10,566 So that's all it is. 509 00:37:10,566 --> 00:37:15,890 I believe it's just totally obvious once you reduce it to 510 00:37:15,890 --> 00:37:22,260 a search for a minimum cost path through the trellis. 511 00:37:22,260 --> 00:37:26,125 It's very well suited to the application of convolutional 512 00:37:26,125 --> 00:37:29,390 codes, because we really are thinking of sending these bits 513 00:37:29,390 --> 00:37:31,190 as a stream in time. 514 00:37:31,190 --> 00:37:34,210 And at the receiver, you can think of just proceeding 515 00:37:34,210 --> 00:37:39,150 forward one unit of time, 2 to the nu add-compare-selects, 516 00:37:39,150 --> 00:37:40,680 and we've got a new set of survivors.
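[Editor's note: the add-compare-select recursion just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: it assumes hard decisions, so Hamming distance stands in for the minus log likelihood branch costs, and it uses the four-state example code g(D) = (1 + D^2, 1 + D + D^2) with the trellis terminated in the zero state.]

```python
# Minimal Viterbi sketch for the 4-state, rate-1/2 example code
# g(D) = (1 + D^2, 1 + D + D^2). Hard-decision Hamming distance is the
# branch cost, standing in for the -log likelihoods in the lecture.

NU = 2
STATES = 1 << NU  # 2^nu = 4 states

def step(state, bit):
    """One encoder transition: state holds (x[k-1], x[k-2]) as two bits."""
    x1, x2 = (state >> 1) & 1, state & 1
    out = (bit ^ x2, bit ^ x1 ^ x2)      # taps of 1+D^2 and 1+D+D^2
    return (bit << 1) | x1, out

def encode(bits):
    """Encode an input bit stream into a list of output 2-tuples."""
    state, channel = 0, []
    for b in bits:
        state, out = step(state, b)
        channel.append(out)
    return channel

def viterbi_decode(received):
    """Min-cost path through the terminated trellis (start and end in state 0)."""
    INF = float("inf")
    cost = [0] + [INF] * (STATES - 1)     # start in the zero state
    hist = [[] for _ in range(STATES)]    # survivor input histories
    for r in received:
        new_cost = [INF] * STATES
        new_hist = [None] * STATES
        for s in range(STATES):
            if cost[s] == INF:
                continue
            for bit in (0, 1):
                nxt, out = step(s, bit)
                c = cost[s] + (out[0] != r[0]) + (out[1] != r[1])  # add
                if c < new_cost[nxt]:                              # compare
                    new_cost[nxt] = c                              # select
                    new_hist[nxt] = hist[s] + [bit]
        cost, hist = new_cost, new_hist
    return hist[0]  # survivor at the terminating (zero) state

msg = [1, 0, 1, 1, 0, 0]           # last nu = 2 zeros terminate the trellis
rx = list(encode(msg))
rx[2] = (rx[2][0] ^ 1, rx[2][1])   # flip one channel bit
print(viterbi_decode(rx))          # -> [1, 0, 1, 1, 0, 0]
```

[Since the free distance of this code is 5, a single flipped channel bit still leaves the transmitted path strictly closest, and the decoder recovers the message.]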
517 00:37:40,680 --> 00:37:45,030 There are nice implementations of this: you can 518 00:37:45,030 --> 00:37:48,570 build a computer for every node or for every pair of 519 00:37:48,570 --> 00:37:53,080 nodes to do the ACS operation, and make a very fast recursion 520 00:37:53,080 --> 00:37:55,900 through here. 521 00:37:55,900 --> 00:37:56,710 So that's it. 522 00:37:56,710 --> 00:37:59,740 That's the Viterbi algorithm for terminated 523 00:37:59,740 --> 00:38:00,990 convolutional codes. 524 00:38:06,690 --> 00:38:10,175 Now let's ask about the Viterbi algorithm for 525 00:38:10,175 --> 00:38:12,270 unterminated convolutional codes. 526 00:38:21,500 --> 00:38:23,695 Suppose this trellis becomes very long. 527 00:38:26,680 --> 00:38:27,930 Across the whole page. 528 00:38:31,750 --> 00:38:33,760 What are these survivors going to look like? 529 00:38:33,760 --> 00:38:38,620 Suppose we start from a definite node here, and we've 530 00:38:38,620 --> 00:38:42,140 got a four state convolutional code, and we iterate and we 531 00:38:42,140 --> 00:38:45,730 iterate and we iterate with the Viterbi algorithm. 532 00:38:45,730 --> 00:38:48,800 After a long time, somewhere out here, we're going to have 533 00:38:48,800 --> 00:38:55,150 four survivors and their histories. 534 00:38:55,150 --> 00:38:58,860 And I've greatly simplified, but basically the histories 535 00:38:58,860 --> 00:39:03,960 are going to look schematically something -- 536 00:39:03,960 --> 00:39:06,300 I don't know -- could be anything. 537 00:39:06,300 --> 00:39:07,550 Look like this. 538 00:39:10,830 --> 00:39:14,770 The point I'm illustrating here is that the histories 539 00:39:14,770 --> 00:39:17,090 will be distinct right at this time. 540 00:39:17,090 --> 00:39:18,160 They have to be, because they're going to 541 00:39:18,160 --> 00:39:19,220 four distinct states.
542 00:39:19,220 --> 00:39:22,050 But as you go backwards in time, you will find they 543 00:39:22,050 --> 00:39:26,750 merge, any two histories will merge, at a certain time back. 544 00:39:26,750 --> 00:39:28,300 And this is a probabilistic thing. 545 00:39:28,300 --> 00:39:31,640 It depends on what's happening on the channel and so forth. 546 00:39:31,640 --> 00:39:36,240 But with high probability, they will have merged not too 547 00:39:36,240 --> 00:39:39,740 far back in time. 548 00:39:39,740 --> 00:39:44,120 So even if we never get to a final node, even if we just 549 00:39:44,120 --> 00:39:48,210 let this process continue forever, at this point we can 550 00:39:48,210 --> 00:39:53,440 say, we can make a definite decision at this time. 551 00:39:53,440 --> 00:39:53,720 Right? 552 00:39:53,720 --> 00:39:56,830 Because all survivors start with a common 553 00:39:56,830 --> 00:40:00,780 initial part, up to here. 554 00:40:00,780 --> 00:40:03,690 So one way to operate the Viterbi algorithm would be 555 00:40:03,690 --> 00:40:06,690 just to, at this point, say, OK. 556 00:40:06,690 --> 00:40:10,340 I'm going to put out everything before this time. 557 00:40:10,340 --> 00:40:12,890 No matter how long I run the decoder, the first part of it 558 00:40:12,890 --> 00:40:14,380 is always going to be this part. 559 00:40:14,380 --> 00:40:17,480 So these are definitely decided up to this time. 560 00:40:17,480 --> 00:40:19,700 And then I still don't know about here. 561 00:40:19,700 --> 00:40:22,120 I'm going to have to go a little bit further. 562 00:40:22,120 --> 00:40:27,540 You proceed further, and after a while, you find more that's 563 00:40:27,540 --> 00:40:29,120 definitely done. 564 00:40:29,120 --> 00:40:32,950 In practice, that would lead to a sporadic output rate. 565 00:40:32,950 --> 00:40:34,660 That isn't really what you want.
566 00:40:34,660 --> 00:40:40,230 So in practice what you do is you establish a decision 567 00:40:40,230 --> 00:40:48,500 delay, delta. 568 00:40:48,500 --> 00:40:55,290 And the hope is that 99.999% of the time, 569 00:40:55,290 --> 00:40:59,330 if you look back delta on all the survivors, they will 570 00:40:59,330 --> 00:41:02,430 all share a common path, delta back here. 571 00:41:02,430 --> 00:41:04,460 So there's a very high probability 572 00:41:04,460 --> 00:41:06,460 you will have converged. 573 00:41:06,460 --> 00:41:10,240 And so simply at this time, you put out what you decided 574 00:41:10,240 --> 00:41:11,800 on delta time units earlier. 575 00:41:11,800 --> 00:41:14,090 Next time you put out the next one. 576 00:41:14,090 --> 00:41:15,550 Next time you put out the next one. 577 00:41:15,550 --> 00:41:18,490 So you get a nice, regular, synchronous stream of data 578 00:41:18,490 --> 00:41:19,830 coming out of here. 579 00:41:19,830 --> 00:41:24,380 Every so often, it may happen that you get out to here and 580 00:41:24,380 --> 00:41:27,220 you still haven't made a decision. 581 00:41:27,220 --> 00:41:29,420 Then you have to do something. 582 00:41:29,420 --> 00:41:34,010 And it really doesn't matter terribly much what you do. 583 00:41:34,010 --> 00:41:38,430 You might pick the guy who has the best metric at this time, 584 00:41:38,430 --> 00:41:40,740 or you simply might say, well, I'm always going to pick the 585 00:41:40,740 --> 00:41:42,130 one that goes to all zeros. 586 00:41:42,130 --> 00:41:43,800 That wouldn't be symmetric. 587 00:41:43,800 --> 00:41:48,120 You can make any decision you like. 588 00:41:48,120 --> 00:41:50,640 And you could be wrong.
589 00:41:50,640 --> 00:41:53,780 But you pick delta large enough so that the probability 590 00:41:53,780 --> 00:41:57,340 of this happening is very small, and this just adds a 591 00:41:57,340 --> 00:42:00,340 little bit to your error probability, and as long as 592 00:42:00,340 --> 00:42:02,620 the probability of making an error because of this kind of 593 00:42:02,620 --> 00:42:06,270 operation is much lower than your probability of making an 594 00:42:06,270 --> 00:42:09,410 ordinary decoding error, then you're going to be OK. 595 00:42:12,050 --> 00:42:16,140 For convolutional codes way back at the beginning of time, 596 00:42:16,140 --> 00:42:21,370 people decided that a decision delay of 5 times nu, 5 times 597 00:42:21,370 --> 00:42:25,870 the constraint length, was the right rule of thumb. 598 00:42:25,870 --> 00:42:28,770 And that's been the rule of thumb for rate 1/n codes 599 00:42:28,770 --> 00:42:30,470 forever after. 600 00:42:30,470 --> 00:42:35,510 The point is, delta should be a lot more than nu. 601 00:42:35,510 --> 00:42:38,180 You know, after one constraint length, you certainly won't 602 00:42:38,180 --> 00:42:39,610 have converged. 603 00:42:39,610 --> 00:42:41,580 After five constraint lengths, you're highly 604 00:42:41,580 --> 00:42:43,410 likely to have converged. 605 00:42:43,410 --> 00:42:49,960 And theoretically, the probability of not converging 606 00:42:49,960 --> 00:42:51,960 goes down exponentially with delta. 607 00:42:51,960 --> 00:42:55,210 So big enough is going to work. 608 00:42:55,210 --> 00:42:58,460 And a final point is that sometimes, you really care 609 00:42:58,460 --> 00:43:01,930 that what you put out be a true code word.
610 00:43:01,930 --> 00:43:05,200 In that case, if you get to this situation, you have to 611 00:43:05,200 --> 00:43:09,500 make a choice, you make a choice here, then you have to 612 00:43:09,500 --> 00:43:13,340 actually eliminate all the survivors that are not 613 00:43:13,340 --> 00:43:15,980 consistent with that choice. 614 00:43:15,980 --> 00:43:20,490 And you can do that simply by putting an infinite metric on 615 00:43:20,490 --> 00:43:21,890 this guy here. 616 00:43:21,890 --> 00:43:25,270 Then he'll get wiped out as soon as he's compared with 617 00:43:25,270 --> 00:43:26,520 anybody else. 618 00:43:28,630 --> 00:43:30,830 And that will ensure that whatever you eventually put 619 00:43:30,830 --> 00:43:34,950 out, you keep the sequence being a 620 00:43:34,950 --> 00:43:36,070 legitimate code sequence. 621 00:43:36,070 --> 00:43:39,220 So that's a very fine point. 622 00:43:39,220 --> 00:43:39,680 OK. 623 00:43:39,680 --> 00:43:44,200 So there really isn't any serious problem with letting 624 00:43:44,200 --> 00:43:47,540 the Viterbi algorithm run indefinitely in time once 625 00:43:47,540 --> 00:43:49,870 you've got it started. 626 00:43:49,870 --> 00:43:50,930 How do you get it started? 627 00:43:50,930 --> 00:43:57,620 Suppose you came online and you simply had a stream of 628 00:43:57,620 --> 00:44:00,810 outputs, transmitted from a convolutional code over a 629 00:44:00,810 --> 00:44:03,930 channel, and you didn't know what state to start in. 630 00:44:03,930 --> 00:44:07,441 How do you synchronize to a starting state? 631 00:44:07,441 --> 00:44:10,400 Well, this is not hard either. 632 00:44:10,400 --> 00:44:14,350 Basically you start, you're in one of four states, or 2 to 633 00:44:14,350 --> 00:44:15,060 the nu states. 634 00:44:15,060 --> 00:44:16,840 You don't know which one. 635 00:44:16,840 --> 00:44:23,250 Let's just give them all cost 0 and start decoding, using 636 00:44:23,250 --> 00:44:26,960 the four state trellis.
637 00:44:26,960 --> 00:44:33,160 And we just start receiving 2-tuples from here on. 638 00:44:33,160 --> 00:44:36,190 We get rk, r k plus 1, and so forth. 639 00:44:36,190 --> 00:44:37,150 So how should we start? 640 00:44:37,150 --> 00:44:38,290 Well, we'll just start like that. 641 00:44:38,290 --> 00:44:41,180 And we get sort of the mirror image of this -- 642 00:44:41,180 --> 00:44:51,900 that after a time, these will all converge to a single path. 643 00:44:51,900 --> 00:44:55,310 Or after a time, when we're way down here, this is what 644 00:44:55,310 --> 00:44:58,300 the situation will look like. 645 00:44:58,300 --> 00:45:01,600 These things will each have a different route over here, but 646 00:45:01,600 --> 00:45:03,670 they will have converged in here. 647 00:45:03,670 --> 00:45:06,710 There will be a long path over which they're all converged, 648 00:45:06,710 --> 00:45:10,560 and then towards the end, they'll be unrooted again. 649 00:45:10,560 --> 00:45:10,890 OK. 650 00:45:10,890 --> 00:45:11,540 Well, that's fine. 651 00:45:11,540 --> 00:45:12,730 What does that mean in practice? 652 00:45:12,730 --> 00:45:18,190 That means we make errors during the synchronization. 653 00:45:21,810 --> 00:45:27,800 But we say we're synchronized when we get this case where 654 00:45:27,800 --> 00:45:30,990 all the paths going to all the current survivors have a 655 00:45:30,990 --> 00:45:33,820 common central stage. 656 00:45:33,820 --> 00:45:36,710 And how long does it take to synchronize? 657 00:45:36,710 --> 00:45:41,540 Again, by analysis, it's exactly this same delta again. 658 00:45:41,540 --> 00:45:45,720 The probability of not being synchronized after delta goes 659 00:45:45,720 --> 00:45:48,130 down exponentially with delta in exactly the same way. 660 00:45:48,130 --> 00:45:50,940 It's just a mirror image from one end to the other. 661 00:45:50,940 --> 00:45:54,930 So from a practical point of view, this is no problem. 
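[Editor's note: the unterminated mode of operation described above can also be sketched in Python. Again this is only an illustrative sketch under stated assumptions, not course code: the same four-state example code with hard-decision Hamming costs, all 2^nu states starting at cost 0 for self-synchronization, metrics renormalized so they stay bounded, and output bits released with a fixed decision delay delta from the currently best survivor; `stream_decode` and `delta` are names chosen here for illustration.]

```python
# Streaming Viterbi sketch: unknown starting state (all costs 0),
# renormalized metrics, and a fixed decision delay delta, for the
# 4-state code g(D) = (1 + D^2, 1 + D + D^2) with hard decisions.

STATES = 4

def step(state, bit):
    """One encoder transition; state holds (x[k-1], x[k-2]) as two bits."""
    x1, x2 = (state >> 1) & 1, state & 1
    return (bit << 1) | x1, (bit ^ x2, bit ^ x1 ^ x2)

def encode(bits):
    state, channel = 0, []
    for b in bits:
        state, out = step(state, b)
        channel.append(out)
    return channel

def stream_decode(received, delta):
    """Yield input bits delta time units behind the received stream."""
    cost = [0] * STATES                     # unknown start: every state cost 0
    hist = [[] for _ in range(STATES)]      # survivor input histories
    for r in received:
        new = [(float("inf"), None)] * STATES
        for s in range(STATES):
            for bit in (0, 1):              # add-compare-select
                nxt, out = step(s, bit)
                c = cost[s] + (out[0] != r[0]) + (out[1] != r[1])
                if c < new[nxt][0]:
                    new[nxt] = (c, hist[s] + [bit])
        cost = [c for c, _ in new]
        hist = [h for _, h in new]
        m = min(cost)
        cost = [c - m for c in cost]        # renormalize: subtract common cost
        if len(hist[0]) > delta:            # look back delta on the
            best = min(range(STATES), key=cost.__getitem__)  # best survivor
            yield hist[best][-delta - 1]

msg = [1, 0, 1, 1, 0, 0, 1, 0] * 5          # 40 input bits, never terminated
out = list(stream_decode(encode(msg), delta=10))
print(out == msg[:len(out)])
```

[On a clean channel this recovers the input stream, delta bits behind; the last delta bits are still pending when the stream stops.]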
662 00:45:54,930 --> 00:45:57,750 So just start the Viterbi decoder up with arbitrary 663 00:45:57,750 --> 00:46:02,330 metrics here, and after five constraint lengths, if you 664 00:46:02,330 --> 00:46:05,460 like, it's highly likely to have gotten synchronized. 665 00:46:05,460 --> 00:46:07,880 You'll make errors for five constraint lengths and after 666 00:46:07,880 --> 00:46:10,956 that, you'll be OK, as though you knew the starting state. 667 00:46:13,640 --> 00:46:21,030 So the moral is, no problem. 668 00:46:21,030 --> 00:46:23,510 You can just set up the Viterbi 669 00:46:23,510 --> 00:46:24,760 algorithm and let it run. 670 00:46:31,330 --> 00:46:36,110 The costs will all, of course, increase without bound, which 671 00:46:36,110 --> 00:46:36,570 is [UNINTELLIGIBLE]. 672 00:46:36,570 --> 00:46:40,110 You can always renormalize them, subtract a common cost 673 00:46:40,110 --> 00:46:40,810 from all of them. 674 00:46:40,810 --> 00:46:42,490 That won't change anything. 675 00:46:42,490 --> 00:46:43,820 That keeps them within range. 676 00:46:43,820 --> 00:46:49,410 So we don't need to terminate convolutional codes in order 677 00:46:49,410 --> 00:46:50,620 to run the Viterbi algorithm. 678 00:46:50,620 --> 00:46:54,470 We just let it self-synchronize and we make 679 00:46:54,470 --> 00:46:57,180 decisions with some decision delay. 680 00:46:57,180 --> 00:47:00,290 And the additional problems that we have 681 00:47:00,290 --> 00:47:03,220 are very, very small. 682 00:47:03,220 --> 00:47:05,070 There are no additional problems. 683 00:47:05,070 --> 00:47:05,561 Yeah? 684 00:47:05,561 --> 00:47:06,811 AUDIENCE: [INAUDIBLE] 685 00:47:13,910 --> 00:47:16,560 PROFESSOR: At this point?
686 00:47:16,560 --> 00:47:19,220 Well, notice that we don't know that we've synchronized 687 00:47:19,220 --> 00:47:21,905 until we've continued further, and we've got -- 688 00:47:21,905 --> 00:47:25,420 you know, where we've really synchronized is when we see 689 00:47:25,420 --> 00:47:28,730 that every survivor path has the common root. 690 00:47:31,840 --> 00:47:37,490 So at that point, there is really only one path here. 691 00:47:37,490 --> 00:47:40,930 And we can say that the synchronized part is 692 00:47:40,930 --> 00:47:42,180 definitely decoded. 693 00:47:44,450 --> 00:47:47,420 And we can't really say too much about this out here, 694 00:47:47,420 --> 00:47:50,660 because this depends on what's happened out in the past. 695 00:47:50,660 --> 00:47:56,080 So you say this is erasures, if you like. 696 00:47:56,080 --> 00:47:59,550 The stuff that we're pretty sure has a high probability of 697 00:47:59,550 --> 00:48:00,220 being wrong. 698 00:48:00,220 --> 00:48:01,470 AUDIENCE: [INAUDIBLE] 699 00:48:05,570 --> 00:48:08,740 PROFESSOR: There is only one decoded 700 00:48:08,740 --> 00:48:12,407 path during this interval. 701 00:48:12,407 --> 00:48:13,676 AUDIENCE: But before that interval there are branches 702 00:48:13,676 --> 00:48:14,926 and so on right. 703 00:48:17,310 --> 00:48:19,800 PROFESSOR: Well here, we don't know anything. 704 00:48:19,800 --> 00:48:23,930 Here we know the results of one computation. 705 00:48:23,930 --> 00:48:24,810 What are you suggesting? 706 00:48:24,810 --> 00:48:26,410 Just pick the best one at that point? 707 00:48:26,410 --> 00:48:29,254 AUDIENCE: And finally after you reach the [INAUDIBLE]? 708 00:48:33,280 --> 00:48:34,800 PROFESSOR: And finally? 709 00:48:34,800 --> 00:48:36,960 I'm just not sure exactly what the 710 00:48:36,960 --> 00:48:38,680 logic is of your algorithm. 
711 00:48:38,680 --> 00:48:40,730 It's clear for this, it wouldn't make any difference 712 00:48:40,730 --> 00:48:42,730 if we just started off arbitrarily so, we're going to 713 00:48:42,730 --> 00:48:44,820 start in the zero state. 714 00:48:44,820 --> 00:48:48,110 And we only allow things to start in the zero state. 715 00:48:48,110 --> 00:48:51,210 Well, we'll eventually get to this path anyway. 716 00:48:51,210 --> 00:48:53,820 So it really doesn't matter how you start. 717 00:48:53,820 --> 00:48:55,500 You're going to have garbage for a while, and then you're 718 00:48:55,500 --> 00:48:56,750 going to be OK. 719 00:48:58,510 --> 00:48:59,910 There's no point in doing anything more 720 00:48:59,910 --> 00:49:01,160 sophisticated than that. 721 00:49:08,610 --> 00:49:11,500 I don't want to discuss what you're suggesting, because I 722 00:49:11,500 --> 00:49:14,210 think there's a flaw in it. 723 00:49:14,210 --> 00:49:16,040 Try to figure out what time you're going to make this 724 00:49:16,040 --> 00:49:17,290 decision at. 725 00:49:19,720 --> 00:49:20,655 OK. 726 00:49:20,655 --> 00:49:24,720 Do we all understand the Viterbi algorithm? 727 00:49:24,720 --> 00:49:25,570 Yes? 728 00:49:25,570 --> 00:49:26,740 Good. 729 00:49:26,740 --> 00:49:28,720 We can easily program it up. 730 00:49:28,720 --> 00:49:31,776 There will be an exercise on the homework. 731 00:49:34,330 --> 00:49:37,110 But now you can all do the Viterbi algorithm. 732 00:49:39,800 --> 00:49:40,330 All right. 733 00:49:40,330 --> 00:49:40,980 So -- 734 00:49:40,980 --> 00:49:41,900 and oh. 735 00:49:41,900 --> 00:49:44,027 What's the complexity of the Viterbi algorithm? 736 00:49:51,190 --> 00:49:54,910 What is the complexity? 737 00:49:54,910 --> 00:49:57,530 We always want to be talking about performance versus 738 00:49:57,530 --> 00:49:58,780 complexity. 739 00:50:00,860 --> 00:50:03,520 So it's a recursive algorithm. 
740 00:50:03,520 --> 00:50:08,180 We do exactly the same operations every unit of time. 741 00:50:08,180 --> 00:50:09,990 What do the operations consist of? 742 00:50:09,990 --> 00:50:13,380 They consist of add, compare, select. 743 00:50:13,380 --> 00:50:16,260 How many additions do we have to make? 744 00:50:16,260 --> 00:50:19,270 Additions are basically equal to the number of branches in 745 00:50:19,270 --> 00:50:22,520 each unit of time in the trellis, which is 2 746 00:50:22,520 --> 00:50:23,565 to the nu plus one. 747 00:50:23,565 --> 00:50:24,485 Is that clear? 748 00:50:24,485 --> 00:50:26,280 We have 2 to the nu states. 749 00:50:26,280 --> 00:50:29,670 Two branches out of, two branches into each state, 750 00:50:29,670 --> 00:50:31,250 always for rate 1/n codes. 751 00:50:31,250 --> 00:50:38,100 So we have 2 to the nu plus 1 additions. 752 00:50:38,100 --> 00:50:43,520 We get 2 to the nu compares, one for each state. 753 00:50:43,520 --> 00:50:46,575 Which is really, you can consider the select to be part 754 00:50:46,575 --> 00:50:48,780 of the compare. 755 00:50:48,780 --> 00:50:55,510 Overall, you can say the complexity is of the order of 756 00:50:55,510 --> 00:50:58,420 2 to the nu or 2 to the nu plus 1. 757 00:51:02,030 --> 00:51:06,660 This is the number of states or the state complexity. 758 00:51:06,660 --> 00:51:07,950 This is the number of branches. 759 00:51:11,410 --> 00:51:14,490 I will argue a little bit later that the branch 760 00:51:14,490 --> 00:51:16,470 complexity is really more fundamental. 761 00:51:16,470 --> 00:51:20,610 You've got to do at least one thing for each branch. 762 00:51:20,610 --> 00:51:24,130 So in a different setup, it's the branch complexity that matters. 763 00:51:24,130 --> 00:51:28,870 But these are practically the same thing, and so we say that 764 00:51:28,870 --> 00:51:33,030 the complexity of the Viterbi algorithm is basically like 765 00:51:33,030 --> 00:51:33,950 the number of states.
766 00:51:33,950 --> 00:51:37,220 We have a four state encoder, the complexity's like four, it 767 00:51:37,220 --> 00:51:39,970 goes up exponentially with the constraint length. 768 00:51:39,970 --> 00:51:43,700 This says, this is going to be nice, as long as we have short 769 00:51:43,700 --> 00:51:45,010 constraint length. 770 00:51:45,010 --> 00:51:48,020 For longer constraint lengths, you know, a constraint length 771 00:51:48,020 --> 00:51:51,720 of 20, it's going to be a pretty horrible algorithm. 772 00:51:51,720 --> 00:51:55,390 So we can only use the Viterbi algorithm for relatively short 773 00:51:55,390 --> 00:51:58,390 constraint length codes, relatively 774 00:51:58,390 --> 00:52:01,080 small numbers of states. 775 00:52:01,080 --> 00:52:03,990 The biggest Viterbi algorithm that I'm aware of has ever been 776 00:52:03,990 --> 00:52:07,270 built is a 2 to the 14th state algorithm. 777 00:52:07,270 --> 00:52:10,360 The so-called big Viterbi decoder out at JPL. 778 00:52:10,360 --> 00:52:13,050 It was used for the Galileo space missions. 779 00:52:13,050 --> 00:52:14,440 It's in a rack that big. 780 00:52:14,440 --> 00:52:17,460 I'm sure nowadays you could practically get it on a chip, 781 00:52:17,460 --> 00:52:20,690 and you could maybe do 2 to the 20th states. 782 00:52:20,690 --> 00:52:28,510 So this exponential complexity is, in computer 783 00:52:28,510 --> 00:52:31,970 science terms, not really what we want. 784 00:52:31,970 --> 00:52:36,230 But when we're talking about moderate complexity decoders, 785 00:52:36,230 --> 00:52:40,290 these really have proved to be the most effective, and become 786 00:52:40,290 --> 00:52:43,590 the standard moderate complexity decoder.
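[Editor's note: the bookkeeping above is easy to check mechanically. A small sketch, assuming a binary rate 1/n code so that each of the 2^nu states has two branches in and two branches out; the function name is chosen here for illustration.]

```python
# Operation counts per trellis section for a rate-1/n code, as tallied
# in the lecture: one addition per branch (2^(nu+1) branches) and one
# compare/select per state (2^nu states).

def viterbi_ops_per_section(nu):
    states = 2 ** nu
    return {"adds": 2 * states, "compares": states}

print(viterbi_ops_per_section(2))   # the 4-state example
print(viterbi_ops_per_section(14))  # the scale of the JPL big Viterbi decoder
```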
787 00:52:43,590 --> 00:52:46,500 The advantage is that we can do true maximum likelihood 788 00:52:46,500 --> 00:52:52,370 decoding on a sequence basis, using soft decisions, using 789 00:52:52,370 --> 00:52:54,910 whatever reliability information the channel has, 790 00:52:54,910 --> 00:52:58,270 as long as it's memoryless. 791 00:52:58,270 --> 00:53:03,550 So last topic is to talk about performance. 792 00:53:03,550 --> 00:53:06,170 How are we going to evaluate the performance of 793 00:53:06,170 --> 00:53:08,200 convolutional codes? 794 00:53:08,200 --> 00:53:11,120 You remember what we did on block codes? 795 00:53:11,120 --> 00:53:15,650 We basically looked at the pairwise error probability 796 00:53:15,650 --> 00:53:19,120 between block code words, we then did the union bound, 797 00:53:19,120 --> 00:53:25,360 based on the pairwise error probabilities. 798 00:53:25,360 --> 00:53:28,590 And we observed that the union bound was typically dominated 799 00:53:28,590 --> 00:53:31,370 by the minimum distance error events, and so we get the 800 00:53:31,370 --> 00:53:34,700 union bound estimate, which was purely based on the 801 00:53:34,700 --> 00:53:38,620 minimum distance possible errors. 802 00:53:38,620 --> 00:53:41,050 And we can do exactly the same thing in the 803 00:53:41,050 --> 00:53:42,700 convolutional case. 804 00:53:42,700 --> 00:53:45,530 Convolutional case, again, is a linear code. 805 00:53:45,530 --> 00:53:49,310 That means it has the group property, has symmetry such 806 00:53:49,310 --> 00:53:53,180 that the distances from every possible code sequence to all 807 00:53:53,180 --> 00:53:57,390 other code sequences are going to be the same, since we're 808 00:53:57,390 --> 00:54:02,730 talking on a long or possibly infinite sequence basis. 809 00:54:02,730 --> 00:54:07,840 And we need to be just a little bit more careful about 810 00:54:07,840 --> 00:54:09,420 what an error consists of. 
811 00:54:09,420 --> 00:54:13,850 We need to talk about error events. 812 00:54:13,850 --> 00:54:16,622 And this is a simple concept. 813 00:54:16,622 --> 00:54:22,970 Let us again draw a path corresponding to the 814 00:54:22,970 --> 00:54:27,380 transmitted code word. 815 00:54:27,380 --> 00:54:32,460 A very long path, potentially infinite, but it is some 816 00:54:32,460 --> 00:54:34,930 definite sequence. 817 00:54:34,930 --> 00:54:38,520 And let's run it through a memoryless channel, use the 818 00:54:38,520 --> 00:54:42,270 Viterbi algorithm, and we're going to get some received 819 00:54:42,270 --> 00:54:45,336 code word, or decoded code word. 820 00:54:51,170 --> 00:54:54,200 This is one place where you might want to insist that the 821 00:54:54,200 --> 00:54:57,650 Viterbi algorithm actually put out a code word. 822 00:54:57,650 --> 00:55:00,170 What is that going to look like? 823 00:55:00,170 --> 00:55:02,660 Well, if you're running normally, the received code 824 00:55:02,660 --> 00:55:04,850 word is going to equal the transmitted code 825 00:55:04,850 --> 00:55:06,870 word most of the time. 826 00:55:06,870 --> 00:55:08,750 Except it's going to make errors. 827 00:55:08,750 --> 00:55:10,070 And what will the errors look like? 828 00:55:10,070 --> 00:55:14,090 They'll look like a branch off through the trellis, and then 829 00:55:14,090 --> 00:55:18,486 eventually a reemerging into the same state. 830 00:55:18,486 --> 00:55:21,110 And similarly, you go on longer, and you might have 831 00:55:21,110 --> 00:55:23,930 another error. 832 00:55:23,930 --> 00:55:26,276 And any place where there's a difference -- 833 00:55:28,786 --> 00:55:31,410 should have done a different color here -- 834 00:55:31,410 --> 00:55:33,470 any place where there's a difference, this is called an 835 00:55:33,470 --> 00:55:34,720 error event. 836 00:55:36,950 --> 00:55:40,720 So what I'm illustrating is a case when we had two disjoint 837 00:55:40,720 --> 00:55:42,740 error events.
838 00:55:42,740 --> 00:55:45,500 Is that concept clear? 839 00:55:45,500 --> 00:55:47,670 We're going to draw the trellis paths corresponding to 840 00:55:47,670 --> 00:55:52,100 the transmitted code word and the decoded code word. 841 00:55:52,100 --> 00:55:55,850 Wherever they diverge over a period of time, we're going to 842 00:55:55,850 --> 00:55:56,990 call it an error event. 843 00:55:56,990 --> 00:56:00,530 But eventually they will re-merge again. 844 00:56:00,530 --> 00:56:03,510 So it could be a short time. 845 00:56:03,510 --> 00:56:07,060 Can't be any shorter than the constraint length plus 1. 846 00:56:07,060 --> 00:56:09,040 That would be the minimum length error event. 847 00:56:09,040 --> 00:56:11,010 Could be a longer time. 848 00:56:11,010 --> 00:56:12,260 Unbounded, actually. 849 00:56:15,470 --> 00:56:16,870 OK. 850 00:56:16,870 --> 00:56:21,870 What is going to be the probability of an error event 851 00:56:21,870 --> 00:56:23,290 starting at some time? 852 00:56:37,340 --> 00:56:41,680 And when I say an error event starting at time k, let's 853 00:56:41,680 --> 00:56:44,180 suppose I've been going along on the transmitted path, and 854 00:56:44,180 --> 00:56:48,690 the decoder has still got that there. 855 00:56:48,690 --> 00:56:50,660 I'm asking, what is the probability that this code 856 00:56:50,660 --> 00:56:54,590 word is actually more likely on a maximum likelihood 857 00:56:54,590 --> 00:56:58,190 sequence detection basis than this one? 858 00:56:58,190 --> 00:57:00,980 Well, simply the probability that the received sequence is 859 00:57:00,980 --> 00:57:04,710 closer to this one than this one. 860 00:57:04,710 --> 00:57:09,530 We know how to analyze that for finite differences. 861 00:57:09,530 --> 00:57:12,850 What is the difference here? 862 00:57:12,850 --> 00:57:22,030 The difference, call this y of d and this y hat of d. 863 00:57:22,030 --> 00:57:26,240 What is y of d minus y hat of d?
864 00:57:26,240 --> 00:57:30,980 We'll call that e of d. 865 00:57:30,980 --> 00:57:32,390 This is a code word. 866 00:57:32,390 --> 00:57:34,300 This is a decoded code word. 867 00:57:34,300 --> 00:57:38,400 So the error event has to be a code word. 868 00:57:38,400 --> 00:57:39,650 Right? 869 00:57:41,210 --> 00:57:42,855 Decoder made a mistake. 870 00:57:42,855 --> 00:57:46,160 The mistake has to be a code word. 871 00:57:46,160 --> 00:57:53,840 So we're asking, what is the probability that y of d sent y 872 00:57:53,840 --> 00:58:01,300 hat of d decoded, where y hat of d equals y 873 00:58:01,300 --> 00:58:03,945 of d plus e of d? 874 00:58:03,945 --> 00:58:08,880 We'll just ask for that particular event. 875 00:58:08,880 --> 00:58:20,830 This is the probability that r of d is closer to y hat 876 00:58:20,830 --> 00:58:23,707 of d than y of d. 877 00:58:28,680 --> 00:58:30,260 Now, making a big leap. 878 00:58:34,030 --> 00:58:37,500 This is equal to -- 879 00:58:37,500 --> 00:58:41,790 we're just talking about two sequences in Euclidean space. 880 00:58:41,790 --> 00:58:43,130 All that matters is the Euclidean 881 00:58:43,130 --> 00:58:45,320 distance between them. 882 00:58:45,320 --> 00:58:47,870 What is the Euclidean distance between them? 883 00:58:47,870 --> 00:58:51,930 Its square is 4 alpha squared times the weight of this error event, 884 00:58:51,930 --> 00:58:53,180 the Hamming weight of this error event. 885 00:58:56,290 --> 00:59:00,050 And so if you remember how to calculate pairwise error 886 00:59:00,050 --> 00:59:05,540 probabilities, this is just Q of the square root of alpha squared D over sigma 887 00:59:05,540 --> 00:59:11,950 squared, where D is the distance of e of d. 888 00:59:11,950 --> 00:59:18,054 The weight, the Hamming weight of e of d. 889 00:59:18,054 --> 00:59:20,040 And sigma squared is the noise variance. 890 00:59:20,040 --> 00:59:21,390 Remember something that looked like that?
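[Editor's note: the compressed calculation above can be spelled out numerically. A sketch under the lecture's assumptions: 2-PAM with amplitude alpha and AWGN with per-dimension variance sigma squared, so an error event of Hamming weight d sits at squared Euclidean distance 4 alpha^2 d, and the noise only has to cross half that distance; function names are chosen here for illustration.]

```python
import math

# Pairwise error probability of an error event of Hamming weight d_H:
# the two signal sequences are sqrt(4 * alpha^2 * d_H) apart, the noise
# only has to reach the midpoint, so P = Q(alpha * sqrt(d_H) / sigma).
# That is where the factor of 4 disappears.

def Q(x):
    """Gaussian tail probability, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pairwise_error_prob(d_hamming, alpha=1.0, sigma=1.0):
    return Q(alpha * math.sqrt(d_hamming) / sigma)

# Heavier error events are exponentially less likely:
for d in (5, 6, 7):
    print(d, pairwise_error_prob(d, alpha=1.0, sigma=0.7))
```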
891 00:59:25,100 --> 00:59:28,550 So once again, we go from the Hamming weight of a possible 892 00:59:28,550 --> 00:59:32,450 error event to a Euclidean weight, which is 4 alpha 893 00:59:32,450 --> 00:59:35,940 squared times the Hamming weight. 894 00:59:35,940 --> 00:59:40,190 We actually take the square root of that and we only need 895 00:59:40,190 --> 00:59:45,190 to make an error, a noise of half of that length. 896 00:59:45,190 --> 00:59:48,250 So that's where the 4 disappears. 897 00:59:48,250 --> 00:59:53,040 And we just get Q of the square root of d squared over sigma squared, where d is 898 00:59:53,040 --> 00:59:56,550 the distance to the decision boundary. 899 00:59:56,550 --> 01:00:01,520 Compressing a lot of steps, but it's all something you 900 01:00:01,520 --> 01:00:06,110 felt you knew well a few chapters ago. 901 01:00:06,110 --> 01:00:16,680 So the union bound would simply be the sum over all 902 01:00:16,680 --> 01:00:22,920 error events in the code such that the start -- 903 01:00:22,920 --> 01:00:25,010 so e of d is -- 904 01:00:25,010 --> 01:00:27,330 you want them to start at time 0, say. 905 01:00:27,330 --> 01:00:29,970 Let's ask for the probability of an error event 906 01:00:29,970 --> 01:00:32,950 starting at time 0. 907 01:00:32,950 --> 01:00:37,590 So we want it to be polynomial, have no non-zero 908 01:00:37,590 --> 01:00:41,310 coefficients at negative times, and have the coefficient at time 0 909 01:00:41,310 --> 01:00:46,300 be 1, or not 0. 910 01:00:46,300 --> 01:00:49,346 I'm doing this very roughly. 911 01:00:49,346 --> 01:00:55,220 Of Q of the square root of alpha squared times the 912 01:00:55,220 --> 01:00:58,920 Hamming weight of e of d over sigma squared. 913 01:01:02,390 --> 01:01:03,350 Just as before. 914 01:01:03,350 --> 01:01:07,540 So to get the union bound, we sum up over the weights of all 915 01:01:07,540 --> 01:01:11,350 sequences that start at time 0. 916 01:01:11,350 --> 01:01:15,916 Let's do it for our favorite example.
917 01:01:15,916 --> 01:01:22,630 Suppose g of d is 1 plus d squared, 1 plus d plus d 918 01:01:22,630 --> 01:01:27,550 squared, then what are the possible e of d's that I'm 919 01:01:27,550 --> 01:01:28,380 talking about? 920 01:01:28,380 --> 01:01:31,370 I have this itself. 921 01:01:31,370 --> 01:01:34,390 1 plus d squared, 1 plus d plus d squared. 922 01:01:34,390 --> 01:01:38,850 This is weight equal to 5. 923 01:01:38,850 --> 01:01:41,000 What's my next possible error event? 924 01:01:41,000 --> 01:01:43,860 Would be 1 plus d times this. 925 01:01:43,860 --> 01:01:50,740 So it's going to be a table. 926 01:01:50,740 --> 01:01:56,570 We have g of d, 1 plus d times g of d, which is equal to 1 927 01:01:56,570 --> 01:02:02,320 plus d plus d squared plus d cubed, 1 plus d cubed, that 928 01:02:02,320 --> 01:02:05,290 has weight equals 6. 929 01:02:05,290 --> 01:02:07,520 What's my next longer one? 930 01:02:07,520 --> 01:02:16,010 1 plus d squared times g of d equals 1 plus d fourth, 1 plus 931 01:02:16,010 --> 01:02:22,380 d plus d cubed plus d fourth, that has weight 6. 932 01:02:22,380 --> 01:02:26,370 1 plus d plus d squared times g of d. 933 01:02:31,160 --> 01:02:32,420 This is 1 plus d plus d cubed 934 01:02:32,420 --> 01:02:35,180 plus d fourth, 935 01:02:42,370 --> 01:02:45,710 I may be making a mistake here -- 936 01:02:45,710 --> 01:02:47,620 1 plus d squared plus d fourth. 937 01:02:47,620 --> 01:02:51,890 That has weight 7. 938 01:02:51,890 --> 01:02:54,550 It's hard to do this by hand after a while. 939 01:02:54,550 --> 01:02:56,190 OK. 940 01:02:56,190 --> 01:02:59,450 Notice, we start tabulating all the error events, which I 941 01:02:59,450 --> 01:03:04,730 can do in order of the degree of u of d, always keeping the 942 01:03:04,730 --> 01:03:09,120 non-zero, the time 0 term equal to 1. 943 01:03:09,120 --> 01:03:10,910 So this is the only one of length 1.
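The table can be generated mechanically rather than by hand. A sketch in Python, with GF(2) polynomials bit-packed into integers (bit i holds the coefficient of d to the i); the helper names are mine:

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials packed into ints (bit i = coeff of d^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def event_weight(u: int, g: tuple) -> int:
    """Hamming weight of the error event u(d) * g(d) for an n-tuple g."""
    return sum(bin(gf2_mul(u, gj)).count("1") for gj in g)

g = (0b101, 0b111)  # g(d) = (1 + d^2, 1 + d + d^2)
# u(d) = 1, 1+d, 1+d^2, 1+d+d^2, in order of degree, constant term 1
weights = [event_weight(u, g) for u in (0b1, 0b11, 0b101, 0b111)]
print(weights)  # [5, 6, 6, 7]
```

This reproduces the table above, and confirms that the degree-2 row the lecture stumbled over is 1 plus d plus d cubed plus d fourth, 1 plus d squared plus d fourth, of weight 7.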
944 01:03:10,910 --> 01:03:12,035 This is [UNINTELLIGIBLE] 945 01:03:12,035 --> 01:03:14,270 of degree 0. 946 01:03:14,270 --> 01:03:15,730 This is the only one of degree 1. 947 01:03:15,730 --> 01:03:17,170 There are two of them of degree 2. 948 01:03:17,170 --> 01:03:21,330 There will be four of them of degree 3, and so forth. 949 01:03:21,330 --> 01:03:25,870 So I can lay out what all the error events could be, and I 950 01:03:25,870 --> 01:03:27,720 look at what their weights are. 951 01:03:27,720 --> 01:03:30,060 Here's my minimum weight error event. 952 01:03:30,060 --> 01:03:32,260 Then I have two of weight six and so forth. 953 01:03:32,260 --> 01:03:35,920 So the union bound, I'd be adding up this term. 954 01:03:35,920 --> 01:03:40,080 I'd get one term where the weight is 5, two where the 955 01:03:40,080 --> 01:03:44,200 weight is 6, I don't know how many where the weight is 7. 956 01:03:44,200 --> 01:03:46,380 I happen to know there were only two 957 01:03:46,380 --> 01:03:48,190 weight 6 error events. 958 01:03:48,190 --> 01:03:51,890 And you simply have to find out what the weight profile is 959 01:03:51,890 --> 01:03:53,840 and put it in the union bound. 960 01:03:53,840 --> 01:04:00,470 Or you can use the union bound estimate, which is simply -- 961 01:04:00,470 --> 01:04:09,870 let's just take nd, the number of error events of 962 01:04:09,870 --> 01:04:15,010 minimum weight d, times Q of the square root of alpha 963 01:04:15,010 --> 01:04:24,805 squared d over sigma squared, where d equals the minimum weight. 964 01:04:27,850 --> 01:04:28,280 OK. 965 01:04:28,280 --> 01:04:32,800 In this case, nd equals 1, d equals 5. 966 01:04:32,800 --> 01:04:35,270 Let me take this one step further. 967 01:04:35,270 --> 01:04:37,350 nd times Q of the square root of -- 968 01:04:40,150 --> 01:04:46,398 now, sigma squared equals N0 over 2. 969 01:04:46,398 --> 01:04:53,250 And Eb equals n times alpha squared.
970 01:04:53,250 --> 01:04:56,120 The energy per transmitted bit is alpha squared. 971 01:04:56,120 --> 01:05:00,760 We're going to transmit n bits for every information bit. 972 01:05:00,760 --> 01:05:11,460 So plugging those in, I get 2d over n times Eb over N0, which 973 01:05:11,460 --> 01:05:16,830 is again of the form nd times Q of the square root of 2 times the 974 01:05:16,830 --> 01:05:22,310 coding gain of the code times Eb over N0. 975 01:05:22,310 --> 01:05:27,190 Bottom line is I get precisely the same performance analysis, 976 01:05:27,190 --> 01:05:36,020 or I get a nominal coding gain of -- 977 01:05:38,840 --> 01:05:40,980 in general, it's kd over n. 978 01:05:40,980 --> 01:05:44,940 If it were a rate k over n code, since we're only 979 01:05:44,940 --> 01:05:49,140 considering rate 1 over n codes, it's just d over n. 980 01:05:49,140 --> 01:05:59,720 And I get an error coefficient which equals the number of 981 01:05:59,720 --> 01:06:05,690 weight d code words, starting at time 0. 982 01:06:09,950 --> 01:06:13,000 Of course an error event could start at any time. 983 01:06:13,000 --> 01:06:16,950 If there are nd starting at time 0, how many 984 01:06:16,950 --> 01:06:20,230 start at time 1? 985 01:06:20,230 --> 01:06:23,190 Time invariant code. 986 01:06:23,190 --> 01:06:26,620 So the same number could possibly 987 01:06:26,620 --> 01:06:27,720 start at time 1. 988 01:06:27,720 --> 01:06:30,330 So what I'm computing here is the probability of an error 989 01:06:30,330 --> 01:06:34,500 event starting at a particular time. 990 01:06:34,500 --> 01:06:39,380 For this particular code, what do I have? 991 01:06:39,380 --> 01:06:46,000 I have one event of weight 5. 992 01:06:46,000 --> 01:06:55,360 So the nominal coding gain is 5 over n, where n is 2. 993 01:06:55,360 --> 01:06:56,450 Which is -- 994 01:06:56,450 --> 01:07:03,920 5 is 7 dB, 2 is 3 dB, so this is 4 dB. 995 01:07:03,920 --> 01:07:06,300 That's pretty good.
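The substitution being compressed here, sigma squared equals N0 over 2 and Eb equals n times alpha squared, so that alpha squared d over sigma squared equals 2 times (d over n) times Eb over N0, can be sketched directly; the function names are mine:

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound_estimate(n_d: int, d: int, n: int, ebn0_db: float,
                         k: int = 1) -> float:
    """P(error event at a given time) ~ n_d * Q(sqrt(2 * (k*d/n) * Eb/N0)).

    Uses sigma^2 = N0/2 and Eb = (n/k) * alpha^2, so that
    alpha^2 * d / sigma^2 = 2 * (k*d/n) * Eb/N0; k = 1 for rate-1/n codes.
    """
    ebn0 = 10 ** (ebn0_db / 10)   # dB -> ratio
    gamma_c = k * d / n           # nominal coding gain
    return n_d * q_func(math.sqrt(2 * gamma_c * ebn0))

# Nominal coding gain of the (1 + d^2, 1 + d + d^2) code: d = 5, n = 2
gamma_c_db = 10 * math.log10(5 / 2)   # about 3.98 dB -- "4 dB"
```

With gamma_c equal to 1 (no coding) this reduces to the baseline Q of the square root of 2 Eb over N0 from earlier chapters.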
996 01:07:06,300 --> 01:07:10,370 Nominal coding gain of 4 dB with only a four state code. 997 01:07:10,370 --> 01:07:12,690 Obviously a very simple decoder for this code. 998 01:07:15,770 --> 01:07:20,040 And nd equals 1. 999 01:07:20,040 --> 01:07:21,400 That implies -- 1000 01:07:21,400 --> 01:07:26,380 again, we have the same argument about whatever the 1001 01:07:26,380 --> 01:07:28,300 error coefficient is. 1002 01:07:28,300 --> 01:07:30,590 You could plot this curve. 1003 01:07:30,590 --> 01:07:33,390 The larger the error coefficient is, the more the 1004 01:07:33,390 --> 01:07:35,950 curve moves up, and therefore over. 1005 01:07:35,950 --> 01:07:39,350 You get an effective coding gain which is less. 1006 01:07:39,350 --> 01:07:42,900 But this means since it's 1, you don't have to do that. 1007 01:07:42,900 --> 01:07:45,720 The effective coding gain is the same as the 1008 01:07:45,720 --> 01:07:47,580 nominal coding gain. 1009 01:07:47,580 --> 01:07:49,890 It's still 4 dB. 1010 01:07:49,890 --> 01:07:54,010 So it's a real 4 dB of coding gain for the simple little four 1011 01:07:54,010 --> 01:07:55,260 state code. 1012 01:07:59,220 --> 01:08:07,630 This code compares very directly with the 8 4 4 1013 01:08:07,630 --> 01:08:11,430 Reed-Muller code, block code. 1014 01:08:11,430 --> 01:08:13,550 This code also has rate 1/2. 1015 01:08:13,550 --> 01:08:14,460 It has the same rate. 1016 01:08:14,460 --> 01:08:18,950 We'll see that it also has a four-state trellis diagram. 1017 01:08:18,950 --> 01:08:23,090 But it only has distance four, which means its nominal coding 1018 01:08:23,090 --> 01:08:26,970 gain is only 2, or 3 dB. 1019 01:08:26,970 --> 01:08:34,370 And furthermore, it has 14 minimum weight words, which 1020 01:08:34,370 --> 01:08:37,939 even dividing by 4 to get the number per bit, there's still 1021 01:08:37,939 --> 01:08:41,779 a factor of 3, still going to cost us another couple of 1022 01:08:41,779 --> 01:08:42,710 tenths of a dB.
1023 01:08:42,710 --> 01:08:48,457 Its effective coding gain is only about 2.6 or 2.7 dB. 1024 01:08:48,457 --> 01:08:48,890 All right? 1025 01:08:48,890 --> 01:08:54,100 So this code has much better performance for about the same 1026 01:08:54,100 --> 01:08:59,920 complexity as this code, at the same rate. 1027 01:08:59,920 --> 01:09:01,520 And this tends to be typical. 1028 01:09:01,520 --> 01:09:05,939 Convolutional codes just beat block codes when you compare 1029 01:09:05,939 --> 01:09:07,640 them in this way. 1030 01:09:07,640 --> 01:09:11,620 Notice that we're assuming maximum likelihood decoding. 1031 01:09:11,620 --> 01:09:15,279 We don't yet have a maximum likelihood decoding algorithm 1032 01:09:15,279 --> 01:09:15,960 for this code. 1033 01:09:15,960 --> 01:09:19,510 We'll find that for this code, we can also decode it using the 1034 01:09:19,510 --> 01:09:22,529 Viterbi algorithm with a four state decoder 1035 01:09:22,529 --> 01:09:24,100 comparable to this one. 1036 01:09:24,100 --> 01:09:26,640 So it would be, I would say, for maximum likelihood 1037 01:09:26,640 --> 01:09:28,740 decoding -- 1038 01:09:28,740 --> 01:09:30,040 about the same complexity. 1039 01:09:30,040 --> 01:09:33,080 But we simply get much better performance with the 1040 01:09:33,080 --> 01:09:33,680 convolutional. 1041 01:09:33,680 --> 01:09:34,145 Yeah? 1042 01:09:34,145 --> 01:09:35,395 AUDIENCE: [INAUDIBLE] 1043 01:09:37,870 --> 01:09:38,500 PROFESSOR: Say again? 1044 01:09:38,500 --> 01:09:40,500 AUDIENCE: What happened to the k? 1045 01:09:40,500 --> 01:09:42,830 PROFESSOR: Why is k equal to 1? 1046 01:09:42,830 --> 01:09:43,660 Which k? 1047 01:09:43,660 --> 01:09:44,189 Over here? 1048 01:09:44,189 --> 01:09:47,130 AUDIENCE: Yeah. kd [INAUDIBLE]. 1049 01:09:47,130 --> 01:09:47,380 PROFESSOR: All right. 1050 01:09:47,380 --> 01:09:50,955 We're only considering rate 1/n codes.
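The effective coding gain figures quoted in this comparison follow the rule of thumb from the earlier performance-analysis chapters: each factor of 2 in the error coefficient per information bit costs roughly 0.2 dB at typical error rates. A sketch under that assumption (the function name is mine):

```python
import math

def effective_coding_gain_db(gamma_c: float, k_b: float) -> float:
    """Effective coding gain via the rule of thumb: each factor of 2
    in the error coefficient per information bit costs about 0.2 dB."""
    return 10 * math.log10(gamma_c) - 0.2 * math.log2(k_b)

conv = effective_coding_gain_db(5 / 2, 1)   # four-state convolutional code
rm = effective_coding_gain_db(2, 14 / 4)    # (8, 4, 4) Reed-Muller code
```

This reproduces the numbers in the lecture: about 3.98 dB for the convolutional code, and about 2.65 dB for the Reed-Muller code, a couple of tenths below its nominal 3 dB.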
1051 01:09:50,955 --> 01:09:55,850 If it were a rate k/n code, we would get k over n, because 1052 01:09:55,850 --> 01:10:00,000 this would be n over k times alpha squared. 1053 01:10:00,000 --> 01:10:02,440 Just the same as before. 1054 01:10:02,440 --> 01:10:05,800 In fact, I just want to wave my hand, say, everything goes 1055 01:10:05,800 --> 01:10:07,270 through as before. 1056 01:10:07,270 --> 01:10:10,830 As soon as you get the error event concept, you can reduce 1057 01:10:10,830 --> 01:10:13,870 it to the calculation of pairwise error probabilities, 1058 01:10:13,870 --> 01:10:16,160 and then the union bound estimate is as before. 1059 01:10:16,160 --> 01:10:17,410 AUDIENCE: [INAUDIBLE]? 1060 01:10:22,180 --> 01:10:27,600 PROFESSOR: Well, I had my little trellis here. 1061 01:10:27,600 --> 01:10:33,270 And how long in real time does it take for 1062 01:10:33,270 --> 01:10:34,565 two paths to merge? 1063 01:10:38,040 --> 01:10:41,560 An error event has got to take at least nu plus 1 time units 1064 01:10:41,560 --> 01:10:45,650 for two paths to diverge and then merge 1065 01:10:45,650 --> 01:10:46,900 again in the trellis. 1066 01:10:49,780 --> 01:10:51,030 Is that clear? 1067 01:10:57,850 --> 01:11:03,660 Or put another way, the lowest degree of any possible error 1068 01:11:03,660 --> 01:11:08,120 event is nu, which means it actually takes place over nu 1069 01:11:08,120 --> 01:11:09,575 plus 1 time units. 1070 01:11:12,400 --> 01:11:14,300 OK? 1071 01:11:14,300 --> 01:11:14,620 From this. 1072 01:11:14,620 --> 01:11:17,560 AUDIENCE: [INAUDIBLE] 1073 01:11:17,560 --> 01:11:18,810 PROFESSOR: Why is the lowest --? 1074 01:11:21,390 --> 01:11:23,760 I'm taking g of d -- 1075 01:11:23,760 --> 01:11:26,010 the definition of nu is what? 1076 01:11:26,010 --> 01:11:29,230 The maximum degree of g of d. 1077 01:11:29,230 --> 01:11:30,550 OK? 
1078 01:11:30,550 --> 01:11:35,430 So if that's so, then the shortest length error event is 1079 01:11:35,430 --> 01:11:41,460 1 times g of d, which takes nu plus 1 time units to run out. 1080 01:11:41,460 --> 01:11:45,000 So error events have to be at least this long, and then they 1081 01:11:45,000 --> 01:11:47,310 can be any integer length longer than that. 1082 01:11:53,350 --> 01:11:54,950 You don't look totally happy. 1083 01:11:54,950 --> 01:11:56,780 AUDIENCE: [INAUDIBLE] 1084 01:11:56,780 --> 01:11:57,615 PROFESSOR: You understand? 1085 01:11:57,615 --> 01:11:57,890 OK. 1086 01:11:57,890 --> 01:11:59,140 AUDIENCE: [INAUDIBLE] 1087 01:12:06,820 --> 01:12:10,050 PROFESSOR: How I see it from this diagram? 1088 01:12:10,050 --> 01:12:11,880 Where have I got a picture of a trellis? 1089 01:12:11,880 --> 01:12:14,640 Here I've got a picture of a trellis. 1090 01:12:14,640 --> 01:12:19,520 I've defined an error event by taking a 1091 01:12:19,520 --> 01:12:20,690 transmitted code word. 1092 01:12:20,690 --> 01:12:22,880 It's a code sequence. 1093 01:12:22,880 --> 01:12:25,320 This is supposed to represent some path through the trellis. 1094 01:12:25,320 --> 01:12:27,060 There's one-to-one correspondence between code 1095 01:12:27,060 --> 01:12:29,500 sequences and trellis paths. 1096 01:12:29,500 --> 01:12:32,890 Then I find another code sequence, which is the one I 1097 01:12:32,890 --> 01:12:34,080 actually decided on. 1098 01:12:34,080 --> 01:12:37,210 Call that the decoded code sequence. 1099 01:12:37,210 --> 01:12:40,310 And I say, what's the minimum length of time they could be 1100 01:12:40,310 --> 01:12:43,040 diverged from one another? 1101 01:12:43,040 --> 01:12:43,200 All right? 1102 01:12:43,200 --> 01:12:46,230 Let's take this particular trellis. 1103 01:12:46,230 --> 01:12:49,080 What's the minimum length of time any two paths -- 1104 01:12:49,080 --> 01:12:51,995 say, here's the transmitted path. 
1105 01:12:54,540 --> 01:12:58,690 Suppose I try to find another path that diverges from it? 1106 01:12:58,690 --> 01:12:59,935 Here's one. 1107 01:12:59,935 --> 01:13:01,600 Comes back to it. 1108 01:13:01,600 --> 01:13:02,950 Here's one. 1109 01:13:02,950 --> 01:13:03,470 Another one. 1110 01:13:03,470 --> 01:13:06,730 Comes back to it. 1111 01:13:06,730 --> 01:13:08,550 I say that the minimum length of time it could 1112 01:13:08,550 --> 01:13:10,030 take is nu plus 1. 1113 01:13:10,030 --> 01:13:10,910 Why? 1114 01:13:10,910 --> 01:13:14,940 Because the difference between these two paths is itself a 1115 01:13:14,940 --> 01:13:19,800 code sequence, is therefore a non-zero polynomial 1116 01:13:19,800 --> 01:13:23,310 multiple of g of d. 1117 01:13:23,310 --> 01:13:23,970 OK? 1118 01:13:23,970 --> 01:13:25,220 Same argument. 1119 01:13:30,210 --> 01:13:32,830 OK. 1120 01:13:32,830 --> 01:13:36,270 So I guess I'm not going to get into chapter 10, so I'll 1121 01:13:36,270 --> 01:13:41,240 discourse a little bit more about convolutional codes 1122 01:13:41,240 --> 01:13:42,490 versus block codes. 1123 01:13:54,940 --> 01:13:59,480 How do you construct convolutional codes, actually? 1124 01:13:59,480 --> 01:14:03,820 You see that really, what you want to do is to first of all, 1125 01:14:03,820 --> 01:14:07,250 maximize the minimum distance for a certain constraint length. 1126 01:14:07,250 --> 01:14:10,220 Subject to that, you want to minimize the number of minimum 1127 01:14:10,220 --> 01:14:13,220 distance words. 1128 01:14:13,220 --> 01:14:18,050 You want to, in fact, get the best distance profile. 1129 01:14:18,050 --> 01:14:20,620 What about block codes? 1130 01:14:20,620 --> 01:14:23,340 We had nice, algebraic ways of doing this. 1131 01:14:23,340 --> 01:14:25,830 Roots of polynomials, Reed-Solomon codes. 1132 01:14:25,830 --> 01:14:31,090 We could develop an algebraic formula which told us what the 1133 01:14:31,090 --> 01:14:32,380 minimum distance was.
1134 01:14:32,380 --> 01:14:34,600 Do we have any nice algebraic constructions like that for 1135 01:14:34,600 --> 01:14:35,830 convolutional codes? 1136 01:14:35,830 --> 01:14:38,300 No. 1137 01:14:38,300 --> 01:14:41,940 Basically, you've just got to search all the possible 1138 01:14:41,940 --> 01:14:45,360 polynomials where the maximum degree is nu. 1139 01:14:45,360 --> 01:14:46,405 Take pairs of polynomials. 1140 01:14:46,405 --> 01:14:50,690 If you want a rate 1/2 code of degree nu, there's not that 1141 01:14:50,690 --> 01:14:52,470 many things you have to search. 1142 01:14:52,470 --> 01:14:53,190 All right? 1143 01:14:53,190 --> 01:14:56,660 You just take all possible pairs of binary polynomials of 1144 01:14:56,660 --> 01:15:01,760 degree nu or less, making sure that you don't take any two 1145 01:15:01,760 --> 01:15:06,170 which have a common divisor, a nontrivial common divisor, 1146 01:15:06,170 --> 01:15:08,790 so you can wipe those out. 1147 01:15:08,790 --> 01:15:11,130 You want to make sure that the constant term is 1. 1148 01:15:11,130 --> 01:15:14,260 There's no point in sliding one over so that it starts 1149 01:15:14,260 --> 01:15:16,110 later than time 0. 1150 01:15:16,110 --> 01:15:20,750 But subject to those provisos, you simply try all pairs g1, 1151 01:15:20,750 --> 01:15:24,050 g2, and you just do -- 1152 01:15:24,050 --> 01:15:27,690 as soon as you've assured yourself they're not 1153 01:15:27,690 --> 01:15:30,740 catastrophic, they don't have a common divisor, then you can 1154 01:15:30,740 --> 01:15:33,600 just list, you know, the finite code words are going to 1155 01:15:33,600 --> 01:15:35,020 be the ones that are generated by 1156 01:15:35,020 --> 01:15:37,360 finite information sequences. 1157 01:15:37,360 --> 01:15:40,180 So you can just list all the code words, as I've started to 1158 01:15:40,180 --> 01:15:41,500 do up here.
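The exhaustive search just described fits in a few lines. A sketch, assuming bit-packed GF(2) polynomials and a fixed enumeration depth for the minimum-weight search, which is ample for small nu; all the names are mine:

```python
from itertools import product

def gf2_mul(a: int, b: int) -> int:
    """GF(2) polynomial product on bit-packed ints (bit i = coeff of d^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_gcd(a: int, b: int) -> int:
    """Euclidean algorithm in GF(2)[d]."""
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def min_weight(g: tuple, max_deg: int = 12) -> int:
    """Minimum weight over code words u(d) * g(d) with u(0) = 1 and
    deg u <= max_deg (a heuristic search depth, plenty for small nu)."""
    return min(
        sum(bin(gf2_mul(u, gj)).count("1") for gj in g)
        for u in range(1, 1 << (max_deg + 1), 2)  # odd ints: constant term 1
    )

def best_rate_half_code(nu: int) -> tuple:
    """Try all pairs (g1, g2) with constant term 1, memory exactly nu,
    and no nontrivial common divisor; keep the largest minimum weight."""
    best = (0, 0, 0)
    for g1, g2 in product(range(1, 1 << (nu + 1), 2), repeat=2):
        if max(g1.bit_length(), g2.bit_length()) - 1 != nu:
            continue  # memory less than nu
        if gf2_gcd(g1, g2) != 1:
            continue  # common divisor: catastrophic or reducible, wipe it out
        best = max(best, (min_weight((g1, g2)), g1, g2))
    return best
```

For nu = 2 the search returns minimum weight 5, achieved by 1 plus d squared and 1 plus d plus d squared in some order, the code used in the running example.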
1159 01:15:41,500 --> 01:15:47,550 Or since the trellis is, in fact, a way of listing all the 1160 01:15:47,550 --> 01:15:50,190 code words, there's a one-to-one correspondence 1161 01:15:50,190 --> 01:15:53,330 between code words and trellis paths, you can just start 1162 01:15:53,330 --> 01:15:57,430 searching through the trellis and you will quickly find all 1163 01:15:57,430 --> 01:16:01,020 the minimum weight code words, and thereby establish the 1164 01:16:01,020 --> 01:16:02,880 minimum distance, the weight profile as 1165 01:16:02,880 --> 01:16:04,550 far out as you like. 1166 01:16:04,550 --> 01:16:05,680 And you choose the best. 1167 01:16:05,680 --> 01:16:07,030 You try all possibilities. 1168 01:16:07,030 --> 01:16:11,980 So Joseph Odenwalder did this as soon as people recognized 1169 01:16:11,980 --> 01:16:15,130 that short convolutional codes could be practical, and he 1170 01:16:15,130 --> 01:16:20,140 published the tables back in his PhD thesis in '69, so he 1171 01:16:20,140 --> 01:16:22,546 got a PhD thesis out of this. 1172 01:16:22,546 --> 01:16:25,510 It wasn't that hard, it's done once and for all, and the 1173 01:16:25,510 --> 01:16:29,150 results are in the notes. 1174 01:16:29,150 --> 01:16:30,470 The tables. 1175 01:16:30,470 --> 01:16:36,750 And you can see from the tables that in terms of 1176 01:16:36,750 --> 01:16:39,480 performance versus moderate complexity, 1177 01:16:39,480 --> 01:16:41,360 things go pretty well. 1178 01:16:41,360 --> 01:16:43,660 Here's this four state code. 1179 01:16:43,660 --> 01:16:47,830 It already gets you 4 dB of effective coding gain. 1180 01:16:47,830 --> 01:16:51,610 To get to 6 dB of effective coding gain, you need to go up 1181 01:16:51,610 --> 01:16:54,810 to about 64 states. 1182 01:16:54,810 --> 01:16:57,850 First person to do this was Jerry Heller at Jet Propulsion 1183 01:16:57,850 --> 01:16:58,530 Laboratory. 
1184 01:16:58,530 --> 01:17:03,550 Again, about '68, immediately after Viterbi proposed his 1185 01:17:03,550 --> 01:17:08,640 algorithm in '67, Heller was the first to go out and say, 1186 01:17:08,640 --> 01:17:10,560 well, let's see how these perform. 1187 01:17:10,560 --> 01:17:15,550 So he found a good 64 state rate 1/2 code, and he did 1188 01:17:15,550 --> 01:17:19,030 probability of error versus Eb over N0, and he said, wow. 1189 01:17:19,030 --> 01:17:21,340 I get a 6 dB coding gain. 1190 01:17:21,340 --> 01:17:26,040 And Viterbi had no idea that his algorithm would actually 1191 01:17:26,040 --> 01:17:27,650 be useful in practice. 1192 01:17:27,650 --> 01:17:29,970 He was just using it to make a proof. 1193 01:17:29,970 --> 01:17:33,330 And he didn't even know it was optimum. 1194 01:17:33,330 --> 01:17:36,360 But he's always given the credit to Heller for realizing 1195 01:17:36,360 --> 01:17:37,960 it could be practical. 1196 01:17:37,960 --> 01:17:43,450 And so that 64 state rate 1/2 code with Viterbi algorithm 1197 01:17:43,450 --> 01:17:47,360 decoding became very popular in the '70s. 1198 01:17:47,360 --> 01:17:50,780 Heller and Viterbi and Jacobs went off to form a company 1199 01:17:50,780 --> 01:17:52,150 called Linkabit. 1200 01:17:52,150 --> 01:17:58,180 And for the technology of the time, that seemed to be a very 1201 01:17:58,180 --> 01:18:00,060 appropriate solution. 1202 01:18:00,060 --> 01:18:02,340 You see you get approximately the same thing as a 1203 01:18:02,340 --> 01:18:04,220 rate 1/3, rate 1/4. 1204 01:18:04,220 --> 01:18:06,995 If you go lower in rate you can do marginally better. 1205 01:18:06,995 --> 01:18:09,675 If you are after tenths of a dB, it's worthwhile. 1206 01:18:12,280 --> 01:18:17,280 Later in the decade, is the best way to get more gain to 1207 01:18:17,280 --> 01:18:20,320 go to more and more complicated 1208 01:18:20,320 --> 01:18:21,420 convolutional codes? 1209 01:18:21,420 --> 01:18:22,840 No. 
1210 01:18:22,840 --> 01:18:27,890 The best way is to use the concatenated idea. 1211 01:18:27,890 --> 01:18:32,950 Once you use these maximum likelihood decoders, the 1212 01:18:32,950 --> 01:18:37,020 Viterbi decoders, to get your error rate down to 10 to the 1213 01:18:37,020 --> 01:18:40,330 minus 3 or something, 1 in 1000, at that point you have 1214 01:18:40,330 --> 01:18:44,750 very few errors, and you can then apply an outer code to 1215 01:18:44,750 --> 01:18:47,550 clean up the error events that do occur. 1216 01:18:47,550 --> 01:18:51,190 You see, you're going to get bursts of errors in your 1217 01:18:51,190 --> 01:18:52,260 decoded sequence. 1218 01:18:52,260 --> 01:18:57,450 It's a very natural idea to have an outer code, which is 1219 01:18:57,450 --> 01:19:02,740 based on GF of 256, say, 8 bit bytes. 1220 01:19:02,740 --> 01:19:07,975 And so a Reed-Solomon code comes along and cleans up the 1221 01:19:07,975 --> 01:19:10,360 errors that do occur, and drives the error probability 1222 01:19:10,360 --> 01:19:13,580 down to 10 to the minus 12, or whatever you like, with 1223 01:19:13,580 --> 01:19:18,040 very little redundancy, very little additional cost. 1224 01:19:18,040 --> 01:19:21,930 So that became the standard approach for space 1225 01:19:21,930 --> 01:19:27,430 communications in the '70s and indeed '80s. 1226 01:19:27,430 --> 01:19:31,980 I've already mentioned that around '90, they went up to 1227 01:19:31,980 --> 01:19:35,820 this 2 to the fourteenth state Viterbi decoder. 1228 01:19:35,820 --> 01:19:38,350 They went to much more powerful 1229 01:19:38,350 --> 01:19:40,520 outer codes, much cleverer. 1230 01:19:40,520 --> 01:19:43,670 And they were able to get to within about 2 or 3 dB of the 1231 01:19:43,670 --> 01:19:45,020 Shannon limit.
1232 01:19:45,020 --> 01:19:47,340 And that was the state-of-the-art on the eve of 1233 01:19:47,340 --> 01:19:50,520 the discovery of turbo codes, which is where we're going in 1234 01:19:50,520 --> 01:19:52,220 all of this. 1235 01:19:52,220 --> 01:19:55,070 So from a practical point of view, in the moderate 1236 01:19:55,070 --> 01:19:58,660 complexity regime, simple convolutional codes with 1237 01:19:58,660 --> 01:20:03,230 moderate complexity Viterbi decoding are still the best 1238 01:20:03,230 --> 01:20:04,790 that anybody knows how to do. 1239 01:20:04,790 --> 01:20:06,690 They have all these system advantages. 1240 01:20:06,690 --> 01:20:09,820 They work with nice synchronous streams of 1241 01:20:09,820 --> 01:20:12,380 traffic, which is what you want for data transmission. 1242 01:20:12,380 --> 01:20:14,300 They use soft decisions. 1243 01:20:14,300 --> 01:20:15,750 They use any kind of reliability 1244 01:20:15,750 --> 01:20:16,710 information you have. 1245 01:20:16,710 --> 01:20:19,230 They're not limited by hard decisions. 1246 01:20:19,230 --> 01:20:22,880 They're not limited to bounded distance decoding as the algebraic 1247 01:20:22,880 --> 01:20:24,120 schemes are. 1248 01:20:24,120 --> 01:20:28,470 So it just proved to be a better way to go on channels 1249 01:20:28,470 --> 01:20:31,850 like the additive white Gaussian noise channel. 1250 01:20:31,850 --> 01:20:32,240 OK. 1251 01:20:32,240 --> 01:20:33,810 So we didn't get into chapter 10. 1252 01:20:33,810 --> 01:20:35,060 We'll start that next time.