PROFESSOR: I'm sorry. I have to give this to you. Basically OK, except you didn't understand it the first time.

OK. Good morning. Happy spring. Happy April. We're running a little late. I'll try to finish chapter nine today and start on chapter ten. Ashish may have chapter ten here later. If not, it'll be on the web.

So we've been talking about convolutional codes. Specifically, rate 1/n convolutional codes: one input, n outputs. As you may imagine, there are also rate k/n convolutional codes, the generalization to k inputs and n outputs. These have not been so much used in practice, because in general we're using these binary codes in the power-limited regime, and we want low rates. So 1/n is a good rate. 1/2, 1/3, 1/4, or down to 1/6 are common kinds of rates. So I haven't bothered to develop the more elaborate theory for rate k/n codes.

The rate 1/n codes, just to review.
The code is simply defined as the set of all possible output sequences of a convolutional encoder, whose n impulse responses are written as an n-tuple g(D), as the inputs range over all Laurent sequences -- bi-infinite sequences. And what we showed last time is that without loss of generality or optimality, we could always take this g(D) to be of a particular canonical form. Namely, g(D) could be taken to be a set of n polynomials, g_j(D), that were relatively prime. And for any code, you can find such a canonical encoder; it's unique up to units, and it clearly specifies the code. So that's a nice canonical encoder to take.

It furthermore has the property that there's an obvious realization of this encoder in shift register form with nu memory units, and therefore 2 to the nu states. And we will prove in chapter ten that this is the minimal possible encoder for this code. In other words, there are lots of different encoders that generate this code, but you can't possibly encode it with a state space of dimension less than nu. That's the way it's going to sound in chapter ten.
So we haven't quite got there yet, but it's a minimal encoder, therefore, in that sense.

All right. Today we're going to go on and exploit the finite-state structure of this code to get a very efficient maximum likelihood sequence decoding algorithm called the Viterbi algorithm, which I'm sure you've all heard of. It's become very famous, not only in this field, but really in any place where you're trying to observe a finite state machine in memoryless noise -- a finite-state hidden Markov model, if you like. And so now it's used, for instance, in the detection of genomes: where the exons end and the introns start and so forth, and where the garbage is. And people use it who have no idea that it was originally a digital communications algorithm. But it's a very obvious algorithm. It's come up in many different forms, and Viterbi, in a way, was just lucky to get his name on it, because it would come up very easily as soon as you posed the problem correctly.

Let's start out with terminated convolutional codes. I forget whether I started on this last time. I think I may have. But we'll start from scratch again.
When I say terminated, I mean that we're going to take only a subset of the code words that start at a certain time -- say, time 0 -- continue for some finite time, and then end. And in this setup, with this canonical encoder, which is polynomial, the easy way to specify that is to let u(D) be a finite sequence that starts at time 0. That's called a polynomial. And let's restrict its degree: degree of u(D) less than k. In other words, it looks like u_0 plus u_1 D plus so forth, up to u_{k-1} D^{k-1}. Therefore I've chosen k so that I really have k information bits or input bits. All right?

So I specify u_0 through u_{k-1}, and I use that as the input to my encoder. What is then my code, my truncated code -- let's call it C_k; that's not a very good notation, but we'll use it just for now? It will be the set of all u(D)g(D) such that u(D) is polynomial and the degree of u(D) is less than k.

OK. So we ask what that code is going to be. And let's take some simple examples.
It turns out that when we terminate codes, we always choose polynomial encoders, so the code will naturally collapse back to the 0 state. How long? Nu time units after the inputs stop coming in, the outputs will be all 0 at that point and forevermore. The shift register will clear itself out nu time units later, and then the outputs will be all 0. So the code really starts at time 0 and ends at time k plus nu.

So we aren't going to worry, in this case, whether the code is non-catastrophic or not. Earlier, one of the principal properties we had was that this relative primeness property guaranteed the encoder was non-catastrophic -- and in fact, it's necessary for non-catastrophicity. So we spent a lot of time talking about why it would be a bad thing to be catastrophic. OK.

But let's take a catastrophic rate 1/1 encoder, where we simply have g(D) = 1 + D. You remember this encoder has nu = 1. The input u_k goes into a single memory element holding u_{k-1}, and we simply add these to get y_k. So we get y(D) = (1 + D) u(D). OK.
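The g(D) = 1 + D encoder just described can be sketched in a few lines of code. This is an illustration added in editing, not part of the lecture; it assumes nothing beyond the single memory element and the mod-2 add described above.

```python
def encode_1_plus_d(u):
    """Rate-1/1 encoder with g(D) = 1 + D: y_k = u_k XOR u_{k-1}.

    u is a finite (polynomial) input sequence starting at time 0.
    One extra 0 is fed in to flush the single memory element (nu = 1),
    so the encoder returns to state 0.
    """
    y = []
    prev = 0  # contents of the shift register; the encoder starts in state 0
    for bit in list(u) + [0]:
        y.append(bit ^ prev)  # mod-2 sum of current and previous input
        prev = bit
    return y

# A weight-1 input gives the impulse response 1 + D, shifted:
print(encode_1_plus_d([0, 1, 0, 0]))  # [0, 1, 1, 0, 0]
```

Notice that every output word has even Hamming weight -- exactly the single parity-check structure derived next in the lecture.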
Now, if the input to this is a finite sequence, a polynomial, let's see what this code can possibly be -- what the code's going to look like. The code's going to start in state 0 at time 0. So here's the time axis. At the first time, we can either get a 0 or a 1 in, and accordingly, we will go to state 0 or state 1. At the second time, we can go to state 0 with another 0, or to state 1 with a 1. If we get a 1 in and we're in state 1, then the output is going to be 0, and we'll stay in state 1. If we get a 0 in, the output will be a 1, and we'll go back to state 0. So this is what a typical trellis section looks like here: we have all possible transitions between the two states. And so now it goes on for a while like this, time-invariantly, dot dot dot. And then finally, at some time -- say this is time k, or maybe it's time k minus 1; the time index is a little fuzzy here -- we don't put any more 1s in. We only put 0s in from that time forward. So we could have one more transition back to state 0, and then after that it just stays in state 0 forevermore.
And this isn't very interesting. It was in state 0 all the time before there and put out 0s. It was in state 0 all the time after here and put out 0s. That's clearly not conveying any information -- not an interesting part of the code. So we're going to consider the code just to be defined over these k plus 1 units of time. And so y(D) we're going to consider to be a polynomial of degree k.

Now we're going to assume that there are k bits in. Then we're going to wait one more time unit -- which is nu -- to let the shift register clear out. And at that time, we're going to take whatever y is, and then we're going to terminate it. So if we consider this now, we have a block code whose length is the number of non-trivial coefficients of y(D). So we have n = k + 1, we have k information bits by design, and what's the minimum non-zero weight of this? It's linear. What's its minimum non-zero weight, or its minimum distance?

AUDIENCE: [INAUDIBLE]

PROFESSOR: 1? Show me a code word of weight 1.
AUDIENCE: [INAUDIBLE]

PROFESSOR: I claim the minimum non-zero weight is 2, by the typical argument. I have the all-0 sequence. If I ever leave the all-0 sequence, I accumulate 1 unit of Hamming weight. And I need to ultimately come back to the all-0 sequence, because I've made everything finite here. So in this case, I can make the argument that I always have to come back to the all-0 sequence. Whenever I merge back into the all-0 sequence, I accumulate another unit of weight. So this is a code with minimum distance 2 -- minimum non-zero weight 2.

And in fact, it's just the single parity-check code of length n = k + 1. OK? And if you ask yourself, can you generate any -- this is supposedly the even-weight code. It contains all even-weight (k+1)-tuples. And I think you can convince yourself that you can generate any of these even-weight (k+1)-tuples. In fact, here's the generator matrix for this code. Look, here's a set of generators: 1 1 is in the code. 0 1 1 is in the code. 0 0 1 1 is in the code.
So here's a set of generators, and the generator matrix for the code looks like this: each row is 1 1, shifted over one position from the row above. We just go like that. And with these k generators, you can generate any even-weight code word. So we get a kind of convolutional form to the generator matrix -- a sliding parity-check form to the generator matrix. OK. So I assert that this is the single parity-check code. So here's a trellis representation for a block code. Yeah?

AUDIENCE: [INAUDIBLE]

PROFESSOR: I'm sure I've got the time indices screwed up. But I put in k bits. I wait one more unit of time -- nu equals 1 -- for this to clear out. So I get out k plus 1 bits. Think of it in terms of this here: I put in k bits, the last input you think of as necessarily being 0, and I take the outputs here for k plus 1 times. OK. So first of all, a couple points here.

AUDIENCE: In general, the length will be k plus nu?

PROFESSOR: In general, the length will be k plus nu. So let's write that down.
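The sliding generator matrix can be checked by brute force. This is a sketch added in editing; the function names and the k = 4 example are my own.

```python
import itertools

def spc_generator_matrix(k):
    """k x (k+1) generator matrix of the terminated 1 + D code:
    row j is the 2-tuple 1 1, slid over to positions j and j+1."""
    G = [[0] * (k + 1) for _ in range(k)]
    for j in range(k):
        G[j][j] = G[j][j + 1] = 1
    return G

def span(G):
    """All mod-2 linear combinations of the rows of G."""
    k, n = len(G), len(G[0])
    words = set()
    for u in itertools.product([0, 1], repeat=k):
        words.add(tuple(sum(u[j] * G[j][i] for j in range(k)) % 2
                        for i in range(n)))
    return words

C = span(spc_generator_matrix(4))
# 2^4 = 16 code words, every one of even weight: the single parity-check code.
print(len(C), all(sum(w) % 2 == 0 for w in C))  # 16 True
```

Since there are exactly 2^k even-weight (k+1)-tuples, hitting all 16 of them confirms the claim that these k generators span the whole even-weight code.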
So in general, by terminating a rate 1/n convolutional code -- nu is called the constraint length, so this is a constraint-length-nu, or 2-to-the-nu-state, code -- after k inputs, we get a binary linear block code whose parameters are these. We have k information bits. We have k + nu non-trivial output times, and at each output time I get n output bits, so n(k + nu) is the total effective length of the code. And for the distance, we would have to say that the distance of the block code is greater than or equal to the distance of the convolutional code. Why? Because all of the words in this block code are actually sequences in the convolutional code. I'm assuming here that the code is not catastrophic, so it's sufficient to look at all the finite sequences in the convolutional code.

All right. And in general, the block code distance is almost always going to be equal to the convolutional code distance. And this is just for finite code words.

So as another example, suppose we take our standard rate 1/2, nu = 2, distance 5 convolutional code.
Our example 1 that we've been using all along. Suppose I terminate with k = 4. Then I'm going to get what? The length is 2 times 6, so I'm going to get a (12, 4, 5) binary linear block code. That is not so bad, actually.

There's a point that I'm not going to make very strongly here, but terminated convolutional codes can be good block codes. And in fact, asymptotically, by choosing the parameters correctly, you can show that you can get a random ensemble of terminated convolutional codes that is as good as the best possible random ensemble of block codes. So terminating convolutional codes is one way of constructing optimal block codes in the asymptotic limit.

These parameters aren't bad. Here's a rate 1/3 code with distance 5 -- not too long. You know, you might compare it to what? The BCH code? There's a (15, 7, 5) BCH code, which is clearly better, because it's higher rate. But if you shorten that code -- you have to shorten the length by 3 and the dimension by 3 to keep the distance -- you would get the same code. So it's not optimum, but it's not too bad.
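The (12, 4, 5) claim is easy to verify by enumeration. The sketch below assumes the standard example generators g(D) = (1 + D + D^2, 1 + D^2) for the rate-1/2, nu = 2 code (the lecture's "example 1"); the function names are mine.

```python
import itertools

GENS = [[1, 1, 1], [1, 0, 1]]  # g(D) = (1 + D + D^2, 1 + D^2), low order first

def encode_terminated(u, nu=2):
    """Feed the input bits plus nu flushing zeros through the rate-1/2 encoder."""
    s1 = s2 = 0  # shift register: previous input, and the input before that
    out = []
    for b in list(u) + [0] * nu:
        window = [b, s1, s2]
        for g in GENS:  # one output bit per generator at each time
            out.append(sum(w * c for w, c in zip(window, g)) % 2)
        s2, s1 = s1, b
    return tuple(out)

words = [encode_terminated(u) for u in itertools.product([0, 1], repeat=4)]
n = len(words[0])                         # 2 outputs x (4 + 2) times
d = min(sum(w) for w in words if any(w))  # minimum non-zero weight
print(n, len(set(words)), d)  # 12 16 5
```

The minimum weight 5 is achieved by the impulse input u(D) = 1, whose output is just the generator pair itself, of total weight 3 + 2 = 5.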
And furthermore, from this construction, you get a trellis representation, which in this case would look like this. It looks like the trellis that we started to draw last time. Sorry -- it doesn't look like that. This guy goes down here, this guy comes here, this guy goes here. This continues for a while. And then finally, when we start to enforce all zeros, we only have one possible output branch from each of these states, and we get a trellis that looks like that. All right? Which is 1, 2, 3 -- I did it correctly for this code, actually. Each of these branches is going to have a 2-tuple output on it, and along the top path that looks like 00, 00, 00, 00, 00. Each of the information bits causes a two-way branch, no matter where you are in the trellis. So here's 1, here's 2, here's 3, here's 4. That's the end of them. We wait nu = 2 time units to let the states converge back to 0. So that's a trellis representation of this (12, 4, 5) block code. OK.
I want to talk about terminated convolutional codes -- terminated trellis codes, convolutional codes as block codes -- for two reasons. One is to say we can get block codes that way. But the other is to introduce the Viterbi algorithm for terminated convolutional codes. I think it's easier first to look at how the algorithm works when you have a finite trellis like this, and then we'll go on and say, well, how would this work if we let the trellis become very long, or in principle, infinite? Would the Viterbi algorithm still work? And it becomes obvious that it does.

So this is going to be a maximum likelihood sequence detection, or decoding, algorithm. We're going to assume that at each time here, we get to see two things, corresponding to the two bits we transmit. So we transmit y_k according to which trellis branch we're on -- y_0 at this point -- and we receive a 2-tuple r_0. If we were on a binary-input additive white Gaussian noise channel, this would actually go through the 2-PAM map, to two plus-or-minus alphas, and be received as two real numbers. And we do the same thing here.
For y_1, we receive r_1, and so forth, up to y_{k+nu-1} -- k plus nu minus 1, I think it is -- for which we receive r_{k+nu-1}. OK. So we have a transmitted 2-tuple and a received 2-tuple at every time. And we're going to assume that the channel is memoryless, so that the probabilities of receiving the various possible received values -- the likelihoods -- are independent from time unit to time unit. That's the only way this works. Then if I want to get the total probability of r given y, where now I'm talking about over the whole block, this factors into the product of the probabilities of r_k given y_k, yes? That's what I'm going to depend on. This is what's called the memoryless condition -- a memoryless channel. The transition probabilities do not depend on what's been sent or received in previous blocks. Or, it's more convenient to use the negative log likelihood, which is now the sum of minus log p(r_k | y_k). So maximum likelihood is equivalent to minimizing the negative log likelihood.

OK. I receive only one thing in each time unit. And each trellis branch corresponds to transmitting a specific thing.
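Numerically, the factorization just described is simply "the log of a product is the sum of the logs." A tiny sketch -- the crossover probability 0.1 and the four likelihood values are made up for illustration:

```python
import math

# Hypothetical per-symbol likelihoods p(r_k | y_k) on a memoryless channel,
# e.g. a BSC with crossover 0.1: 0.9 where received and sent bits agree.
likelihoods = [0.9, 0.1, 0.9, 0.9]

block_likelihood = math.prod(likelihoods)          # p(r | y) factors
neg_log = sum(-math.log(p) for p in likelihoods)   # additive branch costs

# Minimizing the sum of -log p(r_k | y_k) maximizes p(r | y):
assert math.isclose(neg_log, -math.log(block_likelihood))
print(round(neg_log, 3))  # 2.619
```

The additivity is what lets each branch carry its own cost, independent of the rest of the path.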
So I can label each trellis branch with the appropriate minus log p(r_k | y_k). Do you see that? In other words, for this trellis branch down here, which is associated with y_2 = (0, 1), or s(y_2) = (alpha, minus alpha), suppose I receive some 2-tuple r_2. I would label this branch by, say, the Euclidean distance between what I transmitted for that branch and what I received. For instance, on the additive white Gaussian noise channel, this would simply be the norm of r_k minus s(y_k), squared -- the Euclidean distance squared. Or equivalently, minus the inner product, the correlation between r_k and s(y_k). These are equivalent metrics. Or on a binary symmetric channel, it might be the Hamming distance between what I received and what I transmitted, both of which would be binary 2-tuples in this case. It's just some measure of distance -- log likelihood distance, Euclidean distance, Hamming distance -- between the particular thing I would have transmitted if I'd been on this branch and the particular thing that I actually did receive.
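The three branch metrics just mentioned can be written down directly. A sketch, with alpha = 1 and a made-up received 2-tuple:

```python
def sq_euclidean(r, s):
    """Squared Euclidean distance ||r - s||^2 between 2-tuples."""
    return sum((ri - si) ** 2 for ri, si in zip(r, s))

def neg_correlation(r, s):
    """Minus the inner product <r, s>; equivalent to squared Euclidean
    distance as a metric when ||s|| is the same on every branch."""
    return -sum(ri * si for ri, si in zip(r, s))

def hamming(r, s):
    """Hamming distance for hard decisions on a binary symmetric channel."""
    return sum(ri != si for ri, si in zip(r, s))

alpha = 1.0
s = (alpha, -alpha)   # 2-PAM image of the branch label y = (0, 1)
r = (0.8, -1.3)       # hypothetical noisy received 2-tuple
print(sq_euclidean(r, s), neg_correlation(r, s), hamming((0, 1), (1, 1)))
```

The equivalence follows from expanding ||r - s||^2 = ||r||^2 - 2<r, s> + ||s||^2: the first term is common to all branches at a given time and the last is constant, so only the correlation term distinguishes branches.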
So having received a whole block of data, I can now put a cost on each of these branches. What's the minus log likelihood cost if I say that the code word goes through that branch? What does it cost me? All right.

Now, what I'm basically depending on is this: note that there's a one-to-one map between code words y in C and trellis paths. For this specific trellis up here, how many paths are there through the trellis? Do you see that however I go through the trellis, I'm going to meet four two-way branches? All right? There are four yes-no questions -- it's a binary tree. So there are 16 possible ways to get through this trellis from start to finish, if I view it as a maze problem. And they correspond -- I haven't labeled all the branches -- but they do correspond to all 16 words in this block code.

So now what is maximum likelihood decoding? I want to find the one of those 16 words that has the greatest likelihood, or the least negative log likelihood, over all of these 12 received symbols, or 6 received times. So what will that be?
Once I've labeled each of these branches with a log likelihood cost, I simply want to find the least-cost path through this trellis. And that will correspond to the most likely code word. This is the essence of it. Do you all see that?

AUDIENCE: [INAUDIBLE]

PROFESSOR: The independent transmission allows me to break it up into a symbol-by-symbol or time-by-time expression. So I can simply accumulate over this sum right here. And I'm just looking for the minimum sum over all the possible ways I could get through this trellis. So now we've translated this into finding the minimum-cost path through a graph -- through a graph with a very special, nice, regular structure.

OK. Once you've got that, then I think it's very obvious how to come up with a nice recursive algorithm for doing that. Here's my recursive algorithm. I'm going to start computing weights. I'm going to start here, and I'm going to assign weight zero here. Right here I'm going to assign a weight which is equal to the cost of the best path to get to that node in the graph.
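Before the recursion, it's worth seeing the brute-force version of "find the least-cost path": enumerate every path (every code word), total its branch costs, and keep the cheapest. A sketch using the terminated 1 + D trellis with k = 4 and Hamming branch costs (the function names are mine):

```python
import itertools

def encode(u):
    """Terminated g(D) = 1 + D encoder: y_k = u_k XOR u_{k-1}, plus a flush bit."""
    prev, y = 0, []
    for b in list(u) + [0]:
        y.append(b ^ prev)
        prev = b
    return tuple(y)

def ml_decode_exhaustive(r, k=4):
    """Walk all 2^k paths through the trellis, sum the Hamming branch costs,
    and return the least-cost code word. Exponential in k -- exactly the
    blow-up the Viterbi recursion is about to remove."""
    return min((encode(u) for u in itertools.product([0, 1], repeat=k)),
               key=lambda y: sum(yi != ri for yi, ri in zip(y, r)))

# An odd-weight received word is not in the code; the decoder moves distance 1:
print(ml_decode_exhaustive((1, 1, 0, 0, 1)))
```

For k = 4 this is only 16 paths, but the count doubles with every input bit, which is why the survivor idea below matters.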
444 00:33:20,720 --> 00:33:24,340 In this case, it's just the cost of that branch. 445 00:33:24,340 --> 00:33:26,250 Similarly down here. 446 00:33:26,250 --> 00:33:28,770 Similarly I proceed to these four. 447 00:33:28,770 --> 00:33:32,040 There's only one way to get to each of these four, and I 448 00:33:32,040 --> 00:33:37,900 simply add up what the total weight is. 449 00:33:37,900 --> 00:33:40,750 Now here's the first point where it gets interesting, 450 00:33:40,750 --> 00:33:45,260 where I have two possible ways to get to this node. 451 00:33:45,260 --> 00:33:49,450 One way is via these three branches, and that has a 452 00:33:49,450 --> 00:33:50,390 certain cost. 453 00:33:50,390 --> 00:33:53,330 Another way is via these three branches, and that has a 454 00:33:53,330 --> 00:33:55,840 certain cost. 455 00:33:55,840 --> 00:34:00,680 Suppose the cost of these three branches is higher than 456 00:34:00,680 --> 00:34:02,540 the cost of these three branches. 457 00:34:02,540 --> 00:34:04,722 Just pair-wise. 458 00:34:04,722 --> 00:34:06,690 All right? 459 00:34:06,690 --> 00:34:12,710 Claim that I can pick the minimum cost path to this 460 00:34:12,710 --> 00:34:19,800 node, throw away the other possibility, and I can never 461 00:34:19,800 --> 00:34:22,489 have thrown away something which itself is part of the 462 00:34:22,489 --> 00:34:24,530 minimum cost path through the whole trellis. 463 00:34:27,610 --> 00:34:29,520 Clearly. 464 00:34:29,520 --> 00:34:34,480 Suppose this is the best path from here to here. 465 00:34:34,480 --> 00:34:36,015 This is worse. 466 00:34:36,015 --> 00:34:41,590 X, X, X, X X. And now I find the minimum cost path through 467 00:34:41,590 --> 00:34:44,679 the whole trellis goes through this node. 468 00:34:44,679 --> 00:34:48,900 Say it's like that. 469 00:34:48,900 --> 00:34:51,730 Could it have started with this? 470 00:34:51,730 --> 00:34:52,310 No. 
471 00:34:52,310 --> 00:34:55,870 Because I can always replace this with a path that starts 472 00:34:55,870 --> 00:34:59,130 with that and get a better path. 473 00:34:59,130 --> 00:35:02,880 So at this point, I can make a decision between all the paths 474 00:35:02,880 --> 00:35:05,400 that get to that node and pick just one. 475 00:35:05,400 --> 00:35:07,850 The best one that gets to that node. 476 00:35:07,850 --> 00:35:10,270 And that's called the survivor. 477 00:35:10,270 --> 00:35:14,793 This is really the key concept that Viterbi introduced. 478 00:35:17,620 --> 00:35:21,200 We only need to save one, the best up to that time. 479 00:35:21,200 --> 00:35:25,780 We need to do it for all the 2 to the nu possibilities. 480 00:35:25,780 --> 00:35:32,772 So we have 2 to the nu survivors at time k or time i, 481 00:35:32,772 --> 00:35:34,300 whatever your time index is. 482 00:35:34,300 --> 00:35:38,020 We used k for something else. 483 00:35:38,020 --> 00:35:38,490 OK. 484 00:35:38,490 --> 00:35:43,260 So I can throw away half the possibilities. 485 00:35:43,260 --> 00:35:47,010 Now each survivor, I remember what its past history is, and 486 00:35:47,010 --> 00:35:50,830 I remember what its cost is to that point. 487 00:35:50,830 --> 00:35:55,310 Now to proceed, the recursion is to proceed one time unit 488 00:35:55,310 --> 00:36:00,130 ahead, to add the incremental cost -- say, to make a 489 00:36:00,130 --> 00:36:02,830 decision here, I need to add the incremental cost to the 490 00:36:02,830 --> 00:36:05,830 two survivors that were here and here. 491 00:36:05,830 --> 00:36:08,670 So I add an incremental cost, according to what these 492 00:36:08,670 --> 00:36:10,500 branches are labeled with. 493 00:36:10,500 --> 00:36:15,890 And now I find the best of these two possibilities to get 494 00:36:15,890 --> 00:36:17,990 to this node.
495 00:36:17,990 --> 00:36:28,770 So the recursion is called an ACS operation, for add-compare-select: we add 496 00:36:28,770 --> 00:36:32,550 increments to each of these paths, we compare which is the 497 00:36:32,550 --> 00:36:35,040 best, and we select the best one. 498 00:36:35,040 --> 00:36:35,830 We keep that. 499 00:36:35,830 --> 00:36:36,930 We throw away the other. 500 00:36:36,930 --> 00:36:43,570 We now have a one unit longer best path to this node and its 501 00:36:43,570 --> 00:36:51,770 cost for all 2 to the nu survivors at time i plus 1. 502 00:36:51,770 --> 00:36:56,670 And just proceeding in that way, we go through until we 503 00:36:56,670 --> 00:36:59,210 finally get to the terminating node. 504 00:36:59,210 --> 00:37:03,410 And at this point, when we've found the best path to this 505 00:37:03,410 --> 00:37:05,000 node, we've found the best path 506 00:37:05,000 --> 00:37:08,320 through the whole trellis. 507 00:37:08,320 --> 00:37:08,610 OK? 508 00:37:08,610 --> 00:37:10,566 So that's all it is. 509 00:37:10,566 --> 00:37:15,890 I believe it's just totally obvious once you reduce it to 510 00:37:15,890 --> 00:37:22,260 a search for a minimum cost path through the trellis. 511 00:37:22,260 --> 00:37:26,125 It's very well suited to the application of convolutional 512 00:37:26,125 --> 00:37:29,390 codes, because we really are thinking of sending these bits 513 00:37:29,390 --> 00:37:31,190 as a stream in time. 514 00:37:31,190 --> 00:37:34,210 And at the receiver, you can think of just proceeding 515 00:37:34,210 --> 00:37:39,150 forward one unit of time, 2 to the nu add-compare-selects, 516 00:37:39,150 --> 00:37:40,680 and we've got a new set of survivors.
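[Editor's note: the add-compare-select recursion just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: it assumes hard decisions, so Hamming distance stands in for the minus log likelihood branch costs, and it uses the four-state example code g(D) = (1 + D^2, 1 + D + D^2) with the trellis terminated in the zero state.]

```python
# Minimal Viterbi sketch for the 4-state, rate-1/2 example code
# g(D) = (1 + D^2, 1 + D + D^2). Hard-decision Hamming distance is the
# branch cost, standing in for the -log likelihoods in the lecture.

NU = 2
STATES = 1 << NU  # 2^nu = 4 states

def step(state, bit):
    """One encoder transition: state holds (x[k-1], x[k-2]) as two bits."""
    x1, x2 = (state >> 1) & 1, state & 1
    out = (bit ^ x2, bit ^ x1 ^ x2)      # taps of 1+D^2 and 1+D+D^2
    return (bit << 1) | x1, out

def encode(bits):
    """Encode an input bit stream into a list of output 2-tuples."""
    state, channel = 0, []
    for b in bits:
        state, out = step(state, b)
        channel.append(out)
    return channel

def viterbi_decode(received):
    """Min-cost path through the terminated trellis (start and end in state 0)."""
    INF = float("inf")
    cost = [0] + [INF] * (STATES - 1)     # start in the zero state
    hist = [[] for _ in range(STATES)]    # survivor input histories
    for r in received:
        new_cost = [INF] * STATES
        new_hist = [None] * STATES
        for s in range(STATES):
            if cost[s] == INF:
                continue
            for bit in (0, 1):
                nxt, out = step(s, bit)
                c = cost[s] + (out[0] != r[0]) + (out[1] != r[1])  # add
                if c < new_cost[nxt]:                              # compare
                    new_cost[nxt] = c                              # select
                    new_hist[nxt] = hist[s] + [bit]
        cost, hist = new_cost, new_hist
    return hist[0]  # survivor at the terminating (zero) state

msg = [1, 0, 1, 1, 0, 0]           # last nu = 2 zeros terminate the trellis
rx = list(encode(msg))
rx[2] = (rx[2][0] ^ 1, rx[2][1])   # flip one channel bit
print(viterbi_decode(rx))          # -> [1, 0, 1, 1, 0, 0]
```

[Since the free distance of this code is 5, a single flipped channel bit still leaves the transmitted path strictly closest, and the decoder recovers the message.]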
517 00:37:40,680 --> 00:37:45,030 There are nice implementations of this: you can 518 00:37:45,030 --> 00:37:48,570 build a computer for every node or for every pair of 519 00:37:48,570 --> 00:37:53,080 nodes to do the ACS operation, and make a very fast recursion 520 00:37:53,080 --> 00:37:55,900 through here. 521 00:37:55,900 --> 00:37:56,710 So that's it. 522 00:37:56,710 --> 00:37:59,740 That's the Viterbi algorithm for terminated 523 00:37:59,740 --> 00:38:00,990 convolutional codes. 524 00:38:06,690 --> 00:38:10,175 Now let's ask about the Viterbi algorithm for 525 00:38:10,175 --> 00:38:12,270 unterminated convolutional codes. 526 00:38:21,500 --> 00:38:23,695 Suppose this trellis becomes very long. 527 00:38:26,680 --> 00:38:27,930 Across the whole page. 528 00:38:31,750 --> 00:38:33,760 What are these survivors going to look like? 529 00:38:33,760 --> 00:38:38,620 Suppose we start from a definite node here, and we've 530 00:38:38,620 --> 00:38:42,140 got a four state convolutional code, and we iterate and we 531 00:38:42,140 --> 00:38:45,730 iterate and we iterate with the Viterbi algorithm. 532 00:38:45,730 --> 00:38:48,800 After a long time, somewhere out here, we're going to have 533 00:38:48,800 --> 00:38:55,150 four survivors and their histories. 534 00:38:55,150 --> 00:38:58,860 And I've greatly simplified, but basically the histories 535 00:38:58,860 --> 00:39:03,960 are going to look schematically something -- 536 00:39:03,960 --> 00:39:06,300 I don't know -- could be anything. 537 00:39:06,300 --> 00:39:07,550 Look like this. 538 00:39:10,830 --> 00:39:14,770 The point I'm illustrating here is that the histories 539 00:39:14,770 --> 00:39:17,090 will be distinct right at this time. 540 00:39:17,090 --> 00:39:18,160 They have to be, because they're going to 541 00:39:18,160 --> 00:39:19,220 four distinct states.
542 00:39:19,220 --> 00:39:22,050 But as you go backwards in time, you will find they 543 00:39:22,050 --> 00:39:26,750 merge, any two histories will merge, at a certain time back. 544 00:39:26,750 --> 00:39:28,300 And this is a probabilistic thing. 545 00:39:28,300 --> 00:39:31,640 It depends on what's happening on the channel and so forth. 546 00:39:31,640 --> 00:39:36,240 But with high probability, they will have merged not too 547 00:39:36,240 --> 00:39:39,740 far back in time. 548 00:39:39,740 --> 00:39:44,120 So even if we never get to a final node, even if we just 549 00:39:44,120 --> 00:39:48,210 let this process continue forever, at this point we can 550 00:39:48,210 --> 00:39:53,440 say, we can make a definite decision at this time. 551 00:39:53,440 --> 00:39:53,720 Right? 552 00:39:53,720 --> 00:39:56,830 Because all survivors start with a common 553 00:39:56,830 --> 00:40:00,780 initial part, up to here. 554 00:40:00,780 --> 00:40:03,690 So one way to operate the Viterbi algorithm would be 555 00:40:03,690 --> 00:40:06,690 just to, at this point, say, OK. 556 00:40:06,690 --> 00:40:10,340 I'm going to put out everything before this time. 557 00:40:10,340 --> 00:40:12,890 No matter how long I run the decoder, the first part of it 558 00:40:12,890 --> 00:40:14,380 is always going to be this part. 559 00:40:14,380 --> 00:40:17,480 So these are definitely decided up to this time. 560 00:40:17,480 --> 00:40:19,700 And then I still don't know about here. 561 00:40:19,700 --> 00:40:22,120 I'm going to have to go a little bit further. 562 00:40:22,120 --> 00:40:27,540 You proceed further, and after a while, you find more that's 563 00:40:27,540 --> 00:40:29,120 definitely done. 564 00:40:29,120 --> 00:40:32,950 In practice, that would lead to a sporadic output rate. 565 00:40:32,950 --> 00:40:34,660 That isn't really what you want.
566 00:40:34,660 --> 00:40:40,230 So in practice what you do is you establish a decision 567 00:40:40,230 --> 00:40:48,500 delay, delta. 568 00:40:48,500 --> 00:40:55,290 And the hope is that 99.999% of the time, 569 00:40:55,290 --> 00:40:59,330 if you look back delta on all the survivors, they will 570 00:40:59,330 --> 00:41:02,430 all share a common path, delta back here. 571 00:41:02,430 --> 00:41:04,460 So there's a very high probability 572 00:41:04,460 --> 00:41:06,460 you will have converged. 573 00:41:06,460 --> 00:41:10,240 And so simply at this time, you put out what you decided 574 00:41:10,240 --> 00:41:11,800 on delta time units earlier. 575 00:41:11,800 --> 00:41:14,090 Next time you put out the next one. 576 00:41:14,090 --> 00:41:15,550 Next time you put out the next one. 577 00:41:15,550 --> 00:41:18,490 So you get a nice, regular, synchronous stream of data 578 00:41:18,490 --> 00:41:19,830 coming out of here. 579 00:41:19,830 --> 00:41:24,380 Every so often, it may happen that you get out to here and 580 00:41:24,380 --> 00:41:27,220 you still haven't made a decision. 581 00:41:27,220 --> 00:41:29,420 Then you have to do something. 582 00:41:29,420 --> 00:41:34,010 And it really doesn't matter terribly much what you do. 583 00:41:34,010 --> 00:41:38,430 You might pick the guy who has the best metric at this time, 584 00:41:38,430 --> 00:41:40,740 or you simply might say, well, I'm always going to pick the 585 00:41:40,740 --> 00:41:42,130 one that goes to all zeros. 586 00:41:42,130 --> 00:41:43,800 That wouldn't be symmetric. 587 00:41:43,800 --> 00:41:48,120 You can make any decision you like. 588 00:41:48,120 --> 00:41:50,640 And you could be wrong.
589 00:41:50,640 --> 00:41:53,780 But you pick delta large enough so that the probability 590 00:41:53,780 --> 00:41:57,340 of this happening is very small, and this just adds a 591 00:41:57,340 --> 00:42:00,340 little bit to your error probability, and as long as 592 00:42:00,340 --> 00:42:02,620 the probability of making an error because of this kind of 593 00:42:02,620 --> 00:42:06,270 operation is much lower than your probability of making an 594 00:42:06,270 --> 00:42:09,410 ordinary decoding error, then you're going to be OK. 595 00:42:12,050 --> 00:42:16,140 For convolutional codes way back at the beginning of time, 596 00:42:16,140 --> 00:42:21,370 people decided that a decision delay of 5 times nu, 5 times 597 00:42:21,370 --> 00:42:25,870 the constraint length, was the right rule of thumb. 598 00:42:25,870 --> 00:42:28,770 And that's been the rule of thumb for rate 1/n codes 599 00:42:28,770 --> 00:42:30,470 forever after. 600 00:42:30,470 --> 00:42:35,510 The point is, delta should be a lot more than nu. 601 00:42:35,510 --> 00:42:38,180 You know, after one constraint length, you certainly won't 602 00:42:38,180 --> 00:42:39,610 have converged. 603 00:42:39,610 --> 00:42:41,580 After five constraint lengths, you're highly 604 00:42:41,580 --> 00:42:43,410 likely to have converged. 605 00:42:43,410 --> 00:42:49,960 And theoretically, the probability of not converging 606 00:42:49,960 --> 00:42:51,960 goes down exponentially with delta. 607 00:42:51,960 --> 00:42:55,210 So big enough is going to work. 608 00:42:55,210 --> 00:42:58,460 And a final point is that sometimes, you really care 609 00:42:58,460 --> 00:43:01,930 that what you put out be a true code word.
610 00:43:01,930 --> 00:43:05,200 In that case, if you get to this situation, you have to 611 00:43:05,200 --> 00:43:09,500 make a choice, you make a choice here, then you have to 612 00:43:09,500 --> 00:43:13,340 actually eliminate all the survivors that are not 613 00:43:13,340 --> 00:43:15,980 consistent with that choice. 614 00:43:15,980 --> 00:43:20,490 And you can do that simply by putting an infinite metric on 615 00:43:20,490 --> 00:43:21,890 this guy here. 616 00:43:21,890 --> 00:43:25,270 Then he'll get wiped out as soon as he's compared with 617 00:43:25,270 --> 00:43:26,520 anybody else. 618 00:43:28,630 --> 00:43:30,830 And that will ensure that whatever you eventually put 619 00:43:30,830 --> 00:43:34,950 out, you keep the sequence being a 620 00:43:34,950 --> 00:43:36,070 legitimate code sequence. 621 00:43:36,070 --> 00:43:39,220 So that's a very fine point. 622 00:43:39,220 --> 00:43:39,680 OK. 623 00:43:39,680 --> 00:43:44,200 So there really isn't any serious problem with letting 624 00:43:44,200 --> 00:43:47,540 the Viterbi algorithm run indefinitely in time once 625 00:43:47,540 --> 00:43:49,870 you've got it started. 626 00:43:49,870 --> 00:43:50,930 How do you get it started? 627 00:43:50,930 --> 00:43:57,620 Suppose you came online and you simply had a stream of 628 00:43:57,620 --> 00:44:00,810 outputs, transmitted from a convolutional code over a 629 00:44:00,810 --> 00:44:03,930 channel, and you didn't know what state to start in. 630 00:44:03,930 --> 00:44:07,441 How do you synchronize to a starting state? 631 00:44:07,441 --> 00:44:10,400 Well, this is not hard either. 632 00:44:10,400 --> 00:44:14,350 Basically you start, you're in one of four states, or 2 to 633 00:44:14,350 --> 00:44:15,060 the nu states. 634 00:44:15,060 --> 00:44:16,840 You don't know which one. 635 00:44:16,840 --> 00:44:23,250 Let's just give them all cost 0 and start decoding, using 636 00:44:23,250 --> 00:44:26,960 the four state trellis.
637 00:44:26,960 --> 00:44:33,160 And we just start receiving 2-tuples from here on. 638 00:44:33,160 --> 00:44:36,190 We get rk, r k plus 1, and so forth. 639 00:44:36,190 --> 00:44:37,150 So how should we start? 640 00:44:37,150 --> 00:44:38,290 Well, we'll just start like that. 641 00:44:38,290 --> 00:44:41,180 And we get sort of the mirror image of this -- 642 00:44:41,180 --> 00:44:51,900 that after a time, these will all converge to a single path. 643 00:44:51,900 --> 00:44:55,310 Or after a time, when we're way down here, this is what 644 00:44:55,310 --> 00:44:58,300 the situation will look like. 645 00:44:58,300 --> 00:45:01,600 These things will each have a different route over here, but 646 00:45:01,600 --> 00:45:03,670 they will have converged in here. 647 00:45:03,670 --> 00:45:06,710 There will be a long path over which they're all converged, 648 00:45:06,710 --> 00:45:10,560 and then towards the end, they'll be unrooted again. 649 00:45:10,560 --> 00:45:10,890 OK. 650 00:45:10,890 --> 00:45:11,540 Well, that's fine. 651 00:45:11,540 --> 00:45:12,730 What does that mean in practice? 652 00:45:12,730 --> 00:45:18,190 That means we make errors during the synchronization. 653 00:45:21,810 --> 00:45:27,800 But we say we're synchronized when we get this case where 654 00:45:27,800 --> 00:45:30,990 all the paths going to all the current survivors have a 655 00:45:30,990 --> 00:45:33,820 common central stage. 656 00:45:33,820 --> 00:45:36,710 And how long does it take to synchronize? 657 00:45:36,710 --> 00:45:41,540 Again, by analysis, it's exactly this same delta again. 658 00:45:41,540 --> 00:45:45,720 The probability of not being synchronized after delta goes 659 00:45:45,720 --> 00:45:48,130 down exponentially with delta in exactly the same way. 660 00:45:48,130 --> 00:45:50,940 It's just a mirror image from one end to the other. 661 00:45:50,940 --> 00:45:54,930 So from a practical point of view, this is no problem. 
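[Editor's note: the unterminated mode of operation described above can also be sketched in Python. Again this is only an illustrative sketch under stated assumptions, not course code: the same four-state example code with hard-decision Hamming costs, all 2^nu states starting at cost 0 for self-synchronization, metrics renormalized so they stay bounded, and output bits released with a fixed decision delay delta from the currently best survivor; `stream_decode` and `delta` are names chosen here for illustration.]

```python
# Streaming Viterbi sketch: unknown starting state (all costs 0),
# renormalized metrics, and a fixed decision delay delta, for the
# 4-state code g(D) = (1 + D^2, 1 + D + D^2) with hard decisions.

STATES = 4

def step(state, bit):
    """One encoder transition; state holds (x[k-1], x[k-2]) as two bits."""
    x1, x2 = (state >> 1) & 1, state & 1
    return (bit << 1) | x1, (bit ^ x2, bit ^ x1 ^ x2)

def encode(bits):
    state, channel = 0, []
    for b in bits:
        state, out = step(state, b)
        channel.append(out)
    return channel

def stream_decode(received, delta):
    """Yield input bits delta time units behind the received stream."""
    cost = [0] * STATES                     # unknown start: every state cost 0
    hist = [[] for _ in range(STATES)]      # survivor input histories
    for r in received:
        new = [(float("inf"), None)] * STATES
        for s in range(STATES):
            for bit in (0, 1):              # add-compare-select
                nxt, out = step(s, bit)
                c = cost[s] + (out[0] != r[0]) + (out[1] != r[1])
                if c < new[nxt][0]:
                    new[nxt] = (c, hist[s] + [bit])
        cost = [c for c, _ in new]
        hist = [h for _, h in new]
        m = min(cost)
        cost = [c - m for c in cost]        # renormalize: subtract common cost
        if len(hist[0]) > delta:            # look back delta on the
            best = min(range(STATES), key=cost.__getitem__)  # best survivor
            yield hist[best][-delta - 1]

msg = [1, 0, 1, 1, 0, 0, 1, 0] * 5          # 40 input bits, never terminated
out = list(stream_decode(encode(msg), delta=10))
print(out == msg[:len(out)])
```

[On a clean channel this recovers the input stream, delta bits behind; the last delta bits are still pending when the stream stops.]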
662 00:45:54,930 --> 00:45:57,750 So just start the Viterbi decoder up with arbitrary 663 00:45:57,750 --> 00:46:02,330 metrics here, and after five constraint lengths, if you 664 00:46:02,330 --> 00:46:05,460 like, it's highly likely to have gotten synchronized. 665 00:46:05,460 --> 00:46:07,880 You'll make errors for five constraint lengths and after 666 00:46:07,880 --> 00:46:10,956 that, you'll be OK, as though you knew the starting state. 667 00:46:13,640 --> 00:46:21,030 So the moral is, no problem. 668 00:46:21,030 --> 00:46:23,510 You can just set up the Viterbi 669 00:46:23,510 --> 00:46:24,760 algorithm and let it run. 670 00:46:31,330 --> 00:46:36,110 The costs will all, of course, increase without bound, which 671 00:46:36,110 --> 00:46:36,570 is [UNINTELLIGIBLE]. 672 00:46:36,570 --> 00:46:40,110 You can always renormalize them, subtract a common cost 673 00:46:40,110 --> 00:46:40,810 from all of them. 674 00:46:40,810 --> 00:46:42,490 That won't change anything. 675 00:46:42,490 --> 00:46:43,820 That keeps them within range. 676 00:46:43,820 --> 00:46:49,410 So we don't need to terminate convolutional codes in order 677 00:46:49,410 --> 00:46:50,620 to run the Viterbi algorithm. 678 00:46:50,620 --> 00:46:54,470 We just let it self-synchronize and we make 679 00:46:54,470 --> 00:46:57,180 decisions with some decision delay. 680 00:46:57,180 --> 00:47:00,290 And the additional problems that we have 681 00:47:00,290 --> 00:47:03,220 are very, very small. 682 00:47:03,220 --> 00:47:05,070 There are no additional problems. 683 00:47:05,070 --> 00:47:05,561 Yeah? 684 00:47:05,561 --> 00:47:06,811 AUDIENCE: [INAUDIBLE] 685 00:47:13,910 --> 00:47:16,560 PROFESSOR: At this point?
686 00:47:16,560 --> 00:47:19,220 Well, notice that we don't know that we've synchronized 687 00:47:19,220 --> 00:47:21,905 until we've continued further, and we've got -- 688 00:47:21,905 --> 00:47:25,420 you know, where we've really synchronized is when we see 689 00:47:25,420 --> 00:47:28,730 that every survivor path has the common root. 690 00:47:31,840 --> 00:47:37,490 So at that point, there is really only one path here. 691 00:47:37,490 --> 00:47:40,930 And we can say that the synchronized part is 692 00:47:40,930 --> 00:47:42,180 definitely decoded. 693 00:47:44,450 --> 00:47:47,420 And we can't really say too much about this out here, 694 00:47:47,420 --> 00:47:50,660 because this depends on what's happened out in the past. 695 00:47:50,660 --> 00:47:56,080 So you say this is erasures, if you like. 696 00:47:56,080 --> 00:47:59,550 The stuff that we're pretty sure has a high probability of 697 00:47:59,550 --> 00:48:00,220 being wrong. 698 00:48:00,220 --> 00:48:01,470 AUDIENCE: [INAUDIBLE] 699 00:48:05,570 --> 00:48:08,740 PROFESSOR: There is only one decoded 700 00:48:08,740 --> 00:48:12,407 path during this interval. 701 00:48:12,407 --> 00:48:13,676 AUDIENCE: But before that interval there are branches 702 00:48:13,676 --> 00:48:14,926 and so on right. 703 00:48:17,310 --> 00:48:19,800 PROFESSOR: Well here, we don't know anything. 704 00:48:19,800 --> 00:48:23,930 Here we know the results of one computation. 705 00:48:23,930 --> 00:48:24,810 What are you suggesting? 706 00:48:24,810 --> 00:48:26,410 Just pick the best one at that point? 707 00:48:26,410 --> 00:48:29,254 AUDIENCE: And finally after you reach the [INAUDIBLE]? 708 00:48:33,280 --> 00:48:34,800 PROFESSOR: And finally? 709 00:48:34,800 --> 00:48:36,960 I'm just not sure exactly what the 710 00:48:36,960 --> 00:48:38,680 logic is of your algorithm. 
711 00:48:38,680 --> 00:48:40,730 It's clear for this, it wouldn't make any difference 712 00:48:40,730 --> 00:48:42,730 if we just started off arbitrarily so, we're going to 713 00:48:42,730 --> 00:48:44,820 start in the zero state. 714 00:48:44,820 --> 00:48:48,110 And we only allow things to start in the zero state. 715 00:48:48,110 --> 00:48:51,210 Well, we'll eventually get to this path anyway. 716 00:48:51,210 --> 00:48:53,820 So it really doesn't matter how you start. 717 00:48:53,820 --> 00:48:55,500 You're going to have garbage for a while, and then you're 718 00:48:55,500 --> 00:48:56,750 going to be OK. 719 00:48:58,510 --> 00:48:59,910 There's no point in doing anything more 720 00:48:59,910 --> 00:49:01,160 sophisticated than that. 721 00:49:08,610 --> 00:49:11,500 I don't want to discuss what you're suggesting, because I 722 00:49:11,500 --> 00:49:14,210 think there's a flaw in it. 723 00:49:14,210 --> 00:49:16,040 Try to figure out what time you're going to make this 724 00:49:16,040 --> 00:49:17,290 decision at. 725 00:49:19,720 --> 00:49:20,655 OK. 726 00:49:20,655 --> 00:49:24,720 Do we all understand the Viterbi algorithm? 727 00:49:24,720 --> 00:49:25,570 Yes? 728 00:49:25,570 --> 00:49:26,740 Good. 729 00:49:26,740 --> 00:49:28,720 We can easily program it up. 730 00:49:28,720 --> 00:49:31,776 There will be an exercise on the homework. 731 00:49:34,330 --> 00:49:37,110 But now you can all do the Viterbi algorithm. 732 00:49:39,800 --> 00:49:40,330 All right. 733 00:49:40,330 --> 00:49:40,980 So -- 734 00:49:40,980 --> 00:49:41,900 and oh. 735 00:49:41,900 --> 00:49:44,027 What's the complexity of the Viterbi algorithm? 736 00:49:51,190 --> 00:49:54,910 What is the complexity? 737 00:49:54,910 --> 00:49:57,530 We always want to be talking about performance versus 738 00:49:57,530 --> 00:49:58,780 complexity. 739 00:50:00,860 --> 00:50:03,520 So it's a recursive algorithm. 
740 00:50:03,520 --> 00:50:08,180 We do exactly the same operations every unit of time. 741 00:50:08,180 --> 00:50:09,990 What do the operations consist of? 742 00:50:09,990 --> 00:50:13,380 They consist of add, compare, select. 743 00:50:13,380 --> 00:50:16,260 How many additions do we have to make? 744 00:50:16,260 --> 00:50:19,270 Additions are basically equal to the number of branches in 745 00:50:19,270 --> 00:50:22,520 each unit of time in the trellis, which is 2 746 00:50:22,520 --> 00:50:23,565 to the nu plus one. 747 00:50:23,565 --> 00:50:24,485 Is that clear? 748 00:50:24,485 --> 00:50:26,280 We have 2 to the nu states. 749 00:50:26,280 --> 00:50:29,670 Two branches out of, two branches into each state, 750 00:50:29,670 --> 00:50:31,250 always for rate 1/n codes. 751 00:50:31,250 --> 00:50:38,100 So we have 2 to the nu plus 1 additions. 752 00:50:38,100 --> 00:50:43,520 We get 2 to the nu compares, one for each state. 753 00:50:43,520 --> 00:50:46,575 Which is really, you can consider the select to be part 754 00:50:46,575 --> 00:50:48,780 of the compare. 755 00:50:48,780 --> 00:50:55,510 Overall, you can say the complexity is of the order of 756 00:50:55,510 --> 00:50:58,420 2 to the nu or 2 to the nu plus 1. 757 00:51:02,030 --> 00:51:06,660 This is the number of states or the state complexity. 758 00:51:06,660 --> 00:51:07,950 This is the number of branches. 759 00:51:11,410 --> 00:51:14,490 I will argue a little bit later that the branch 760 00:51:14,490 --> 00:51:16,470 complexity is really more fundamental. 761 00:51:16,470 --> 00:51:20,610 You've got to do at least one thing for each branch. 762 00:51:20,610 --> 00:51:24,130 So in a different setup, it's the branch complexity that matters. 763 00:51:24,130 --> 00:51:28,870 But these are practically the same thing, and so we say that 764 00:51:28,870 --> 00:51:33,030 the complexity of the Viterbi algorithm is basically like 765 00:51:33,030 --> 00:51:33,950 the number of states.
766 00:51:33,950 --> 00:51:37,220 We have a four state encoder, the complexity's like four, it 767 00:51:37,220 --> 00:51:39,970 goes up exponentially with the constraint length. 768 00:51:39,970 --> 00:51:43,700 This says, this is going to be nice, as long as we have short 769 00:51:43,700 --> 00:51:45,010 constraint length. 770 00:51:45,010 --> 00:51:48,020 For longer constraint lengths, you know, a constraint length 771 00:51:48,020 --> 00:51:51,720 of 20, it's going to be a pretty horrible algorithm. 772 00:51:51,720 --> 00:51:55,390 So we can only use the Viterbi algorithm for relatively short 773 00:51:55,390 --> 00:51:58,390 constraint length codes, relatively 774 00:51:58,390 --> 00:52:01,080 small numbers of states. 775 00:52:01,080 --> 00:52:03,990 The biggest Viterbi algorithm that I'm aware of has ever been 776 00:52:03,990 --> 00:52:07,270 built is a 2 to the 14th state algorithm. 777 00:52:07,270 --> 00:52:10,360 The so-called big Viterbi decoder out at JPL. 778 00:52:10,360 --> 00:52:13,050 It was used for the Galileo space missions. 779 00:52:13,050 --> 00:52:14,440 It's in a rack that big. 780 00:52:14,440 --> 00:52:17,460 I'm sure nowadays you could practically get it on a chip, 781 00:52:17,460 --> 00:52:20,690 and you could maybe do 2 to the 20th states. 782 00:52:20,690 --> 00:52:28,510 So this exponential complexity is, in computer 783 00:52:28,510 --> 00:52:31,970 science terms, not really what we want. 784 00:52:31,970 --> 00:52:36,230 But when we're talking about moderate complexity decoders, 785 00:52:36,230 --> 00:52:40,290 these really have proved to be the most effective, and become 786 00:52:40,290 --> 00:52:43,590 the standard moderate complexity decoder.
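[Editor's note: the bookkeeping above is easy to check mechanically. A small sketch, assuming a binary rate 1/n code so that each of the 2^nu states has two branches in and two branches out; the function name is chosen here for illustration.]

```python
# Operation counts per trellis section for a rate-1/n code, as tallied
# in the lecture: one addition per branch (2^(nu+1) branches) and one
# compare/select per state (2^nu states).

def viterbi_ops_per_section(nu):
    states = 2 ** nu
    return {"adds": 2 * states, "compares": states}

print(viterbi_ops_per_section(2))   # the 4-state example
print(viterbi_ops_per_section(14))  # the scale of the JPL big Viterbi decoder
```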
787 00:52:43,590 --> 00:52:46,500 The advantage is that we can do true maximum likelihood 788 00:52:46,500 --> 00:52:52,370 decoding on a sequence basis, using soft decisions, using 789 00:52:52,370 --> 00:52:54,910 whatever reliability information the channel has, 790 00:52:54,910 --> 00:52:58,270 as long as it's memoryless. 791 00:52:58,270 --> 00:53:03,550 So last topic is to talk about performance. 792 00:53:03,550 --> 00:53:06,170 How are we going to evaluate the performance of 793 00:53:06,170 --> 00:53:08,200 convolutional codes? 794 00:53:08,200 --> 00:53:11,120 You remember what we did on block codes? 795 00:53:11,120 --> 00:53:15,650 We basically looked at the pairwise error probability 796 00:53:15,650 --> 00:53:19,120 between block code words, we then did the union bound, 797 00:53:19,120 --> 00:53:25,360 based on the pairwise error probabilities. 798 00:53:25,360 --> 00:53:28,590 And we observed that the union bound was typically dominated 799 00:53:28,590 --> 00:53:31,370 by the minimum distance error events, and so we get the 800 00:53:31,370 --> 00:53:34,700 union bound estimate, which was purely based on the 801 00:53:34,700 --> 00:53:38,620 minimum distance possible errors. 802 00:53:38,620 --> 00:53:41,050 And we can do exactly the same thing in the 803 00:53:41,050 --> 00:53:42,700 convolutional case. 804 00:53:42,700 --> 00:53:45,530 Convolutional case, again, is a linear code. 805 00:53:45,530 --> 00:53:49,310 That means it has the group property, has symmetry such 806 00:53:49,310 --> 00:53:53,180 that the distances from every possible code sequence to all 807 00:53:53,180 --> 00:53:57,390 other code sequences are going to be the same, since we're 808 00:53:57,390 --> 00:54:02,730 talking on a long or possibly infinite sequence basis. 809 00:54:02,730 --> 00:54:07,840 And we need to be just a little bit more careful about 810 00:54:07,840 --> 00:54:09,420 what an error consists of. 
811 00:54:09,420 --> 00:54:13,850 We need to talk about error events. 812 00:54:13,850 --> 00:54:16,622 And this is a simple concept. 813 00:54:16,622 --> 00:54:22,970 Let us again draw a path corresponding to the 814 00:54:22,970 --> 00:54:27,380 transmitted code word. 815 00:54:27,380 --> 00:54:32,460 A very long path, potentially infinite, but it is some 816 00:54:32,460 --> 00:54:34,930 definite sequence. 817 00:54:34,930 --> 00:54:38,520 And let's run it through a memoryless channel, use the 818 00:54:38,520 --> 00:54:42,270 Viterbi algorithm, and we're going to get some received 819 00:54:42,270 --> 00:54:45,336 code word, or decoded code word. 820 00:54:51,170 --> 00:54:54,200 This is one place where you might want to insist that the 821 00:54:54,200 --> 00:54:57,650 Viterbi algorithm actually put out a code word. 822 00:54:57,650 --> 00:55:00,170 What is that going to look like? 823 00:55:00,170 --> 00:55:02,660 Well, if you're running normally, the received code 824 00:55:02,660 --> 00:55:04,850 word is going to equal the transmitted code 825 00:55:04,850 --> 00:55:06,870 word most of the time. 826 00:55:06,870 --> 00:55:08,750 Except it's going to make errors. 827 00:55:08,750 --> 00:55:10,070 And what will the errors look like? 828 00:55:10,070 --> 00:55:14,090 They'll look like a branch off through the trellis, and then 829 00:55:14,090 --> 00:55:18,486 eventually a reemerging into the same state. 830 00:55:18,486 --> 00:55:21,110 And similarly, you go on longer, and you might have 831 00:55:21,110 --> 00:55:23,930 another error. 832 00:55:23,930 --> 00:55:26,276 And any place where there's a difference -- 833 00:55:28,786 --> 00:55:31,410 should have done a different color here -- 834 00:55:31,410 --> 00:55:33,470 any place where there's a difference, this is called an 835 00:55:33,470 --> 00:55:34,720 error event. 836 00:55:36,950 --> 00:55:40,720 So what I'm illustrating is a case when we had two disjoint 837 00:55:40,720 --> 00:55:42,740 error events.
838 00:55:42,740 --> 00:55:45,500 Is that concept clear? 839 00:55:45,500 --> 00:55:47,670 We're going to draw the trellis paths corresponding to 840 00:55:47,670 --> 00:55:52,100 the transmitted code word and the decoded code word. 841 00:55:52,100 --> 00:55:55,850 Wherever they diverge over a period of time, we're going to 842 00:55:55,850 --> 00:55:56,990 call it an error event. 843 00:55:56,990 --> 00:56:00,530 But eventually they will re-merge again. 844 00:56:00,530 --> 00:56:03,510 So it could be a short time. 845 00:56:03,510 --> 00:56:07,060 Can't be any shorter than the constraint length plus 1. 846 00:56:07,060 --> 00:56:09,040 That would be the minimum length error event. 847 00:56:09,040 --> 00:56:11,010 Could be a longer time. 848 00:56:11,010 --> 00:56:12,260 Unbounded, actually. 849 00:56:15,470 --> 00:56:16,870 OK. 850 00:56:16,870 --> 00:56:21,870 What is going to be the probability of an error event 851 00:56:21,870 --> 00:56:23,290 starting at some time? 852 00:56:37,340 --> 00:56:41,680 And when I say an error event starting at time k, let's 853 00:56:41,680 --> 00:56:44,180 suppose I've been going along on the transmitted path, and 854 00:56:44,180 --> 00:56:48,690 the decoder has still got that there. 855 00:56:48,690 --> 00:56:50,660 I'm asking, what is the probability that this code 856 00:56:50,660 --> 00:56:54,590 word is actually more likely on a maximum likelihood 857 00:56:54,590 --> 00:56:58,190 sequence detection basis than this one? 858 00:56:58,190 --> 00:57:00,980 Well, simply the probability that the received sequence is 859 00:57:00,980 --> 00:57:04,710 closer to this one than this one. 860 00:57:04,710 --> 00:57:09,530 We know how to analyze that for finite differences. 861 00:57:09,530 --> 00:57:12,850 What is the difference here? 862 00:57:12,850 --> 00:57:22,030 The difference, call this y of d and this y hat of d. 863 00:57:22,030 --> 00:57:26,240 What is y of d minus y hat of d?
864 00:57:26,240 --> 00:57:30,980 We'll call that e of d. 865 00:57:30,980 --> 00:57:32,390 This is a code word. 866 00:57:32,390 --> 00:57:34,300 This is a decoded code word. 867 00:57:34,300 --> 00:57:38,400 So the error event has to be a code word. 868 00:57:38,400 --> 00:57:39,650 Right? 869 00:57:41,210 --> 00:57:42,855 Decoder made a mistake. 870 00:57:42,855 --> 00:57:46,160 The mistake has to be a code word. 871 00:57:46,160 --> 00:57:53,840 So we're asking, what is the probability that y of d sent y 872 00:57:53,840 --> 00:58:01,300 hat of d decoded, where y hat of d equals y 873 00:58:01,300 --> 00:58:03,945 of d plus e of d? 874 00:58:03,945 --> 00:58:08,880 We'll just ask for that particular event. 875 00:58:08,880 --> 00:58:20,830 This is the probability that r of d is closer to y hat 876 00:58:20,830 --> 00:58:23,707 of d than y of d. 877 00:58:28,680 --> 00:58:30,260 Now, making a big leap. 878 00:58:34,030 --> 00:58:37,500 This is equal to -- 879 00:58:37,500 --> 00:58:41,790 we're just talking about two sequences in Euclidean space. 880 00:58:41,790 --> 00:58:43,130 All that matters is the Euclidean 881 00:58:43,130 --> 00:58:45,320 distance between them. 882 00:58:45,320 --> 00:58:47,870 What is the Euclidean distance between them? 883 00:58:47,870 --> 00:58:51,930 Its square is 4 alpha squared times the weight of this error event, 884 00:58:51,930 --> 00:58:53,180 the Hamming weight of this error event. 885 00:58:56,290 --> 00:59:00,050 And so if you remember how to calculate pairwise error 886 00:59:00,050 --> 00:59:05,540 probabilities, this is just Q of the square root of alpha squared D over sigma 887 00:59:05,540 --> 00:59:11,950 squared, where D is the distance of e of d. 888 00:59:11,950 --> 00:59:18,054 The weight, the Hamming weight of e of d. 889 00:59:18,054 --> 00:59:20,040 And sigma squared is the noise variance. 890 00:59:20,040 --> 00:59:21,390 Remember something that looked like that?
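[Editor's note: the compressed calculation above can be spelled out numerically. A sketch under the lecture's assumptions: 2-PAM with amplitude alpha and AWGN with per-dimension variance sigma squared, so an error event of Hamming weight d sits at squared Euclidean distance 4 alpha^2 d, and the noise only has to cross half that distance; function names are chosen here for illustration.]

```python
import math

# Pairwise error probability of an error event of Hamming weight d_H:
# the two signal sequences are sqrt(4 * alpha^2 * d_H) apart, the noise
# only has to reach the midpoint, so P = Q(alpha * sqrt(d_H) / sigma).
# That is where the factor of 4 disappears.

def Q(x):
    """Gaussian tail probability, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pairwise_error_prob(d_hamming, alpha=1.0, sigma=1.0):
    return Q(alpha * math.sqrt(d_hamming) / sigma)

# Heavier error events are exponentially less likely:
for d in (5, 6, 7):
    print(d, pairwise_error_prob(d, alpha=1.0, sigma=0.7))
```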
891 00:59:25,100 --> 00:59:28,550 So once again, we go from the Hamming weight of a possible 892 00:59:28,550 --> 00:59:32,450 error event to a Euclidean weight, which is 4 alpha 893 00:59:32,450 --> 00:59:35,940 squared times the Hamming weight. 894 00:59:35,940 --> 00:59:40,190 We actually take the square root of that and we only need 895 00:59:40,190 --> 00:59:45,190 to make an error, a noise of half of that length. 896 00:59:45,190 --> 00:59:48,250 So that's where the 4 disappears. 897 00:59:48,250 --> 00:59:53,040 And we just get Q of the square root of d squared over sigma squared, where d is 898 00:59:53,040 --> 00:59:56,550 the distance to the decision boundary. 899 00:59:56,550 --> 01:00:01,520 Compressing a lot of steps, but it's all something you 900 01:00:01,520 --> 01:00:06,110 felt you knew well a few chapters ago. 901 01:00:06,110 --> 01:00:16,680 So the union bound would simply be the sum over all 902 01:00:16,680 --> 01:00:22,920 error events in the code such that the start -- 903 01:00:22,920 --> 01:00:25,010 so e of d is -- 904 01:00:25,010 --> 01:00:27,330 you want them to start at time 0, say. 905 01:00:27,330 --> 01:00:29,970 Let's ask for the probability of an error event 906 01:00:29,970 --> 01:00:32,950 starting at time 0. 907 01:00:32,950 --> 01:00:37,590 So we want it to be polynomial, have no non-zero 908 01:00:37,590 --> 01:00:41,310 coefficients at negative times, and have the coefficient at time 0 909 01:00:41,310 --> 01:00:46,300 be 1, or not 0. 910 01:00:46,300 --> 01:00:49,346 I'm doing this very roughly. 911 01:00:49,346 --> 01:00:55,220 Of Q of the square root of alpha squared times the 912 01:00:55,220 --> 01:00:58,920 Hamming weight of e of d over sigma squared. 913 01:01:02,390 --> 01:01:03,350 Just as before. 914 01:01:03,350 --> 01:01:07,540 So to get the union bound, we sum up over the weights of all 915 01:01:07,540 --> 01:01:11,350 sequences that start at time 0. 916 01:01:11,350 --> 01:01:15,916 Let's do it for our favorite example.
917 01:01:15,916 --> 01:01:22,630 Suppose g of d is 1 plus d squared, 1 plus d plus d 918 01:01:22,630 --> 01:01:27,550 squared, then what are the possible e of d's that I'm 919 01:01:27,550 --> 01:01:28,380 talking about? 920 01:01:28,380 --> 01:01:31,370 I have this itself. 921 01:01:31,370 --> 01:01:34,390 1 plus d squared, 1 plus d plus d squared. 922 01:01:34,390 --> 01:01:38,850 This is weight equal to 5. 923 01:01:38,850 --> 01:01:41,000 What's my next possible error event? 924 01:01:41,000 --> 01:01:43,860 Would be 1 plus d times this. 925 01:01:43,860 --> 01:01:50,740 So it's going to be a table. 926 01:01:50,740 --> 01:01:56,570 We have g of d, 1 plus d times g of d, which is equal to 1 927 01:01:56,570 --> 01:02:02,320 plus d plus d squared plus d cubed, 1 plus d cubed, that 928 01:02:02,320 --> 01:02:05,290 has weight equals 6. 929 01:02:05,290 --> 01:02:07,520 What's my next longer one? 930 01:02:07,520 --> 01:02:16,010 1 plus d squared times g of d equals 1 plus d fourth, 1 plus 931 01:02:16,010 --> 01:02:22,380 d plus d cubed plus d fourth, that has weight 6. 932 01:02:22,380 --> 01:02:26,370 1 plus d plus d squared times g of d. 933 01:02:31,160 --> 01:02:32,420 This is 1 plus d plus d cubed 934 01:02:32,420 --> 01:02:35,180 plus d fourth, 935 01:02:42,370 --> 01:02:45,710 I may be making a mistake here -- 936 01:02:45,710 --> 01:02:47,620 1 plus d squared plus d fourth. 937 01:02:47,620 --> 01:02:51,890 That has weight 7. 938 01:02:51,890 --> 01:02:54,550 It's hard to do this by hand after a while. 939 01:02:54,550 --> 01:02:56,190 OK. 940 01:02:56,190 --> 01:02:59,450 Notice, we start tabulating all the error events, which I 941 01:02:59,450 --> 01:03:04,730 can do in order of the degree of u of d, always keeping the 942 01:03:04,730 --> 01:03:09,120 non-zero, the time 0 term equal to 1. 943 01:03:09,120 --> 01:03:10,910 So this is the only one of length 1.
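The table can be generated mechanically rather than by hand. A sketch in Python, with GF(2) polynomials bit-packed into integers (bit i holds the coefficient of d to the i); the helper names are mine:

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials packed into ints (bit i = coeff of d^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def event_weight(u: int, g: tuple) -> int:
    """Hamming weight of the error event u(d) * g(d) for an n-tuple g."""
    return sum(bin(gf2_mul(u, gj)).count("1") for gj in g)

g = (0b101, 0b111)  # g(d) = (1 + d^2, 1 + d + d^2)
# u(d) = 1, 1+d, 1+d^2, 1+d+d^2, in order of degree, constant term 1
weights = [event_weight(u, g) for u in (0b1, 0b11, 0b101, 0b111)]
print(weights)  # [5, 6, 6, 7]
```

This reproduces the table above, and confirms that the degree-2 row the lecture stumbled over is 1 plus d plus d cubed plus d fourth, 1 plus d squared plus d fourth, of weight 7.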
944 01:03:10,910 --> 01:03:12,035 This is [UNINTELLIGIBLE] 945 01:03:12,035 --> 01:03:14,270 of degree 0. 946 01:03:14,270 --> 01:03:15,730 This is the only one of degree 1. 947 01:03:15,730 --> 01:03:17,170 There are two of them of degree 2. 948 01:03:17,170 --> 01:03:21,330 There will be four of them of degree 3, and so forth. 949 01:03:21,330 --> 01:03:25,870 So I can lay out what all the error events could be, and I 950 01:03:25,870 --> 01:03:27,720 look at what their weights are. 951 01:03:27,720 --> 01:03:30,060 Here's my minimum weight error event. 952 01:03:30,060 --> 01:03:32,260 Then I have two of weight six and so forth. 953 01:03:32,260 --> 01:03:35,920 So the union bound, I'd be adding up this term. 954 01:03:35,920 --> 01:03:40,080 I'd get one term where the weight is 5, two where the 955 01:03:40,080 --> 01:03:44,200 weight is 6, I don't know how many where the weight is 7. 956 01:03:44,200 --> 01:03:46,380 I happen to know there were only two 957 01:03:46,380 --> 01:03:48,190 weight 6 error events. 958 01:03:48,190 --> 01:03:51,890 And you simply have to find out what the weight profile is 959 01:03:51,890 --> 01:03:53,840 and put it in the union bound. 960 01:03:53,840 --> 01:04:00,470 Or you can use the union bound estimate, which is simply -- 961 01:04:00,470 --> 01:04:09,870 let's just take nd, the number of error events of 962 01:04:09,870 --> 01:04:15,010 minimum weight d, times Q of the square root of alpha 963 01:04:15,010 --> 01:04:24,805 squared d over sigma squared, where d equals the minimum weight. 964 01:04:27,850 --> 01:04:28,280 OK. 965 01:04:28,280 --> 01:04:32,800 In this case, nd equals 1, d equals 5. 966 01:04:32,800 --> 01:04:35,270 Let me take this one step further. 967 01:04:35,270 --> 01:04:37,350 nd times Q of the square root of -- 968 01:04:40,150 --> 01:04:46,398 now, sigma squared equals N0 over 2. 969 01:04:46,398 --> 01:04:53,250 And Eb equals n times alpha squared.
970 01:04:53,250 --> 01:04:56,120 The energy per transmitted bit is alpha squared. 971 01:04:56,120 --> 01:05:00,760 We're going to transmit n bits for every information bit. 972 01:05:00,760 --> 01:05:11,460 So plugging those in, I get 2d over n times Eb over N0, which 973 01:05:11,460 --> 01:05:16,830 is again of the form nd times Q of the square root of 2 times the 974 01:05:16,830 --> 01:05:22,310 coding gain of the code times Eb over N0. 975 01:05:22,310 --> 01:05:27,190 Bottom line is I get precisely the same performance analysis, 976 01:05:27,190 --> 01:05:36,020 or I get a nominal coding gain of -- 977 01:05:38,840 --> 01:05:40,980 in general, it's kd over n. 978 01:05:40,980 --> 01:05:44,940 If it were a rate k over n code, since we're only 979 01:05:44,940 --> 01:05:49,140 considering rate 1 over n codes, it's just d over n. 980 01:05:49,140 --> 01:05:59,720 And I get an error coefficient which equals the number of 981 01:05:59,720 --> 01:06:05,690 weight d code words, starting at time 0. 982 01:06:09,950 --> 01:06:13,000 Of course an error event could start at any time. 983 01:06:13,000 --> 01:06:16,950 If there are nd starting at time 0, how many 984 01:06:16,950 --> 01:06:20,230 start at time 1? 985 01:06:20,230 --> 01:06:23,190 Time invariant code. 986 01:06:23,190 --> 01:06:26,620 So the same number could possibly 987 01:06:26,620 --> 01:06:27,720 start at time 1. 988 01:06:27,720 --> 01:06:30,330 So what I'm computing here is the probability of an error 989 01:06:30,330 --> 01:06:34,500 event starting at a particular time. 990 01:06:34,500 --> 01:06:39,380 For this particular code, what do I have? 991 01:06:39,380 --> 01:06:46,000 I have one event of weight 5. 992 01:06:46,000 --> 01:06:55,360 So the nominal coding gain is 5 over n, where n is 2. 993 01:06:55,360 --> 01:06:56,450 Which is -- 994 01:06:56,450 --> 01:07:03,920 5 is 7 dB, 2 is 3 dB, so this is 4 dB. 995 01:07:03,920 --> 01:07:06,300 That's pretty good.
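The substitution being compressed here, sigma squared equals N0 over 2 and Eb equals n times alpha squared, so that alpha squared d over sigma squared equals 2 times (d over n) times Eb over N0, can be sketched directly; the function names are mine:

```python
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound_estimate(n_d: int, d: int, n: int, ebn0_db: float,
                         k: int = 1) -> float:
    """P(error event at a given time) ~ n_d * Q(sqrt(2 * (k*d/n) * Eb/N0)).

    Uses sigma^2 = N0/2 and Eb = (n/k) * alpha^2, so that
    alpha^2 * d / sigma^2 = 2 * (k*d/n) * Eb/N0; k = 1 for rate-1/n codes.
    """
    ebn0 = 10 ** (ebn0_db / 10)   # dB -> ratio
    gamma_c = k * d / n           # nominal coding gain
    return n_d * q_func(math.sqrt(2 * gamma_c * ebn0))

# Nominal coding gain of the (1 + d^2, 1 + d + d^2) code: d = 5, n = 2
gamma_c_db = 10 * math.log10(5 / 2)   # about 3.98 dB -- "4 dB"
```

With gamma_c equal to 1 (no coding) this reduces to the baseline Q of the square root of 2 Eb over N0 from earlier chapters.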
996 01:07:06,300 --> 01:07:10,370 Nominal coding gain of 4 dB with only a four state code. 997 01:07:10,370 --> 01:07:12,690 Obviously a very simple decoder for this code. 998 01:07:15,770 --> 01:07:20,040 And nd equals 1. 999 01:07:20,040 --> 01:07:21,400 That implies -- 1000 01:07:21,400 --> 01:07:26,380 again, we have the same argument about whatever the 1001 01:07:26,380 --> 01:07:28,300 error coefficient is. 1002 01:07:28,300 --> 01:07:30,590 You could plot this curve. 1003 01:07:30,590 --> 01:07:33,390 The larger the error coefficient is, the more the 1004 01:07:33,390 --> 01:07:35,950 curve moves up, and therefore over. 1005 01:07:35,950 --> 01:07:39,350 You get an effective coding gain which is less. 1006 01:07:39,350 --> 01:07:42,900 But this means since it's 1, you don't have to do that. 1007 01:07:42,900 --> 01:07:45,720 The effective coding gain is the same as the 1008 01:07:45,720 --> 01:07:47,580 nominal coding gain. 1009 01:07:47,580 --> 01:07:49,890 It's still 4 dB. 1010 01:07:49,890 --> 01:07:54,010 So it's a real 4 dB of coding gain for the simple little four 1011 01:07:54,010 --> 01:07:55,260 state code. 1012 01:07:59,220 --> 01:08:07,630 This code compares very directly with the 8 4 4 1013 01:08:07,630 --> 01:08:11,430 Reed-Muller code, block code. 1014 01:08:11,430 --> 01:08:13,550 This code also has rate 1/2. 1015 01:08:13,550 --> 01:08:14,460 It has the same rate. 1016 01:08:14,460 --> 01:08:18,950 We'll see that it also has a four-state trellis diagram. 1017 01:08:18,950 --> 01:08:23,090 But it only has distance four, which means its nominal coding 1018 01:08:23,090 --> 01:08:26,970 gain is only 2, or 3 dB. 1019 01:08:26,970 --> 01:08:34,370 And furthermore, it has 14 minimum weight words, which 1020 01:08:34,370 --> 01:08:37,939 even dividing by 4 to get the number per bit, there's still 1021 01:08:37,939 --> 01:08:41,779 a factor of 3, still going to cost us another couple of 1022 01:08:41,779 --> 01:08:42,710 tenths of a dB.
1023 01:08:42,710 --> 01:08:48,457 Its effective coding gain is only about 2.6 or 2.7 dB. 1024 01:08:48,457 --> 01:08:48,890 All right? 1025 01:08:48,890 --> 01:08:54,100 So this code has much better performance for about the same 1026 01:08:54,100 --> 01:08:59,920 complexity as this code, at the same rate. 1027 01:08:59,920 --> 01:09:01,520 And this tends to be typical. 1028 01:09:01,520 --> 01:09:05,939 Convolutional codes just beat block codes when you compare 1029 01:09:05,939 --> 01:09:07,640 them in this way. 1030 01:09:07,640 --> 01:09:11,620 Notice that we're assuming maximum likelihood decoding. 1031 01:09:11,620 --> 01:09:15,279 We don't yet have a maximum likelihood decoding algorithm 1032 01:09:15,279 --> 01:09:15,960 for this code. 1033 01:09:15,960 --> 01:09:19,510 We'll find that for this code, we can also decode it using the 1034 01:09:19,510 --> 01:09:22,529 Viterbi algorithm with a four state decoder 1035 01:09:22,529 --> 01:09:24,100 comparable to this one. 1036 01:09:24,100 --> 01:09:26,640 So it would be, I would say, for maximum likelihood 1037 01:09:26,640 --> 01:09:28,740 decoding -- 1038 01:09:28,740 --> 01:09:30,040 about the same complexity. 1039 01:09:30,040 --> 01:09:33,080 But we simply get much better performance with the 1040 01:09:33,080 --> 01:09:33,680 convolutional. 1041 01:09:33,680 --> 01:09:34,145 Yeah? 1042 01:09:34,145 --> 01:09:35,395 AUDIENCE: [INAUDIBLE] 1043 01:09:37,870 --> 01:09:38,500 PROFESSOR: Say again? 1044 01:09:38,500 --> 01:09:40,500 AUDIENCE: What happened to the k? 1045 01:09:40,500 --> 01:09:42,830 PROFESSOR: Why is k equal to 1? 1046 01:09:42,830 --> 01:09:43,660 Which k? 1047 01:09:43,660 --> 01:09:44,189 Over here? 1048 01:09:44,189 --> 01:09:47,130 AUDIENCE: Yeah. kd [INAUDIBLE]. 1049 01:09:47,130 --> 01:09:47,380 PROFESSOR: All right. 1050 01:09:47,380 --> 01:09:50,955 We're only considering rate 1/n codes.
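The effective coding gain figures quoted in this comparison follow the rule of thumb from the earlier performance-analysis chapters: each factor of 2 in the error coefficient per information bit costs roughly 0.2 dB at typical error rates. A sketch under that assumption (the function name is mine):

```python
import math

def effective_coding_gain_db(gamma_c: float, k_b: float) -> float:
    """Effective coding gain via the rule of thumb: each factor of 2
    in the error coefficient per information bit costs about 0.2 dB."""
    return 10 * math.log10(gamma_c) - 0.2 * math.log2(k_b)

conv = effective_coding_gain_db(5 / 2, 1)   # four-state convolutional code
rm = effective_coding_gain_db(2, 14 / 4)    # (8, 4, 4) Reed-Muller code
```

This reproduces the numbers in the lecture: about 3.98 dB for the convolutional code, and about 2.65 dB for the Reed-Muller code, a couple of tenths below its nominal 3 dB.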
1051 01:09:50,955 --> 01:09:55,850 If it were a rate k/n code, we would get k over n, because 1052 01:09:55,850 --> 01:10:00,000 this would be n over k times alpha squared. 1053 01:10:00,000 --> 01:10:02,440 Just the same as before. 1054 01:10:02,440 --> 01:10:05,800 In fact, I just want to wave my hand, say, everything goes 1055 01:10:05,800 --> 01:10:07,270 through as before. 1056 01:10:07,270 --> 01:10:10,830 As soon as you get the error event concept, you can reduce 1057 01:10:10,830 --> 01:10:13,870 it to the calculation of pairwise error probabilities, 1058 01:10:13,870 --> 01:10:16,160 and then the union bound estimate is as before. 1059 01:10:16,160 --> 01:10:17,410 AUDIENCE: [INAUDIBLE]? 1060 01:10:22,180 --> 01:10:27,600 PROFESSOR: Well, I had my little trellis here. 1061 01:10:27,600 --> 01:10:33,270 And how long in real time does it take for 1062 01:10:33,270 --> 01:10:34,565 two paths to merge? 1063 01:10:38,040 --> 01:10:41,560 An error event has got to take at least nu plus 1 time units 1064 01:10:41,560 --> 01:10:45,650 for two paths to diverge and then merge 1065 01:10:45,650 --> 01:10:46,900 again in the trellis. 1066 01:10:49,780 --> 01:10:51,030 Is that clear? 1067 01:10:57,850 --> 01:11:03,660 Or put another way, the lowest degree of any possible error 1068 01:11:03,660 --> 01:11:08,120 event is nu, which means it actually takes place over nu 1069 01:11:08,120 --> 01:11:09,575 plus 1 time units. 1070 01:11:12,400 --> 01:11:14,300 OK? 1071 01:11:14,300 --> 01:11:14,620 From this. 1072 01:11:14,620 --> 01:11:17,560 AUDIENCE: [INAUDIBLE] 1073 01:11:17,560 --> 01:11:18,810 PROFESSOR: Why is the lowest --? 1074 01:11:21,390 --> 01:11:23,760 I'm taking g of d -- 1075 01:11:23,760 --> 01:11:26,010 the definition of nu is what? 1076 01:11:26,010 --> 01:11:29,230 The maximum degree of g of d. 1077 01:11:29,230 --> 01:11:30,550 OK? 
1078 01:11:30,550 --> 01:11:35,430 So if that's so, then the shortest length error event is 1079 01:11:35,430 --> 01:11:41,460 1 times g of d, which takes nu plus 1 time units to run out. 1080 01:11:41,460 --> 01:11:45,000 So error events have to be at least this long, and then they 1081 01:11:45,000 --> 01:11:47,310 can be any integer length longer than that. 1082 01:11:53,350 --> 01:11:54,950 You don't look totally happy. 1083 01:11:54,950 --> 01:11:56,780 AUDIENCE: [INAUDIBLE] 1084 01:11:56,780 --> 01:11:57,615 PROFESSOR: You understand? 1085 01:11:57,615 --> 01:11:57,890 OK. 1086 01:11:57,890 --> 01:11:59,140 AUDIENCE: [INAUDIBLE] 1087 01:12:06,820 --> 01:12:10,050 PROFESSOR: How I see it from this diagram? 1088 01:12:10,050 --> 01:12:11,880 Where have I got a picture of a trellis? 1089 01:12:11,880 --> 01:12:14,640 Here I've got a picture of a trellis. 1090 01:12:14,640 --> 01:12:19,520 I've defined an error event by taking a 1091 01:12:19,520 --> 01:12:20,690 transmitted code word. 1092 01:12:20,690 --> 01:12:22,880 It's a code sequence. 1093 01:12:22,880 --> 01:12:25,320 This is supposed to represent some path through the trellis. 1094 01:12:25,320 --> 01:12:27,060 There's one-to-one correspondence between code 1095 01:12:27,060 --> 01:12:29,500 sequences and trellis paths. 1096 01:12:29,500 --> 01:12:32,890 Then I find another code sequence, which is the one I 1097 01:12:32,890 --> 01:12:34,080 actually decided on. 1098 01:12:34,080 --> 01:12:37,210 Call that the decoded code sequence. 1099 01:12:37,210 --> 01:12:40,310 And I say, what's the minimum length of time they could be 1100 01:12:40,310 --> 01:12:43,040 diverged from one another? 1101 01:12:43,040 --> 01:12:43,200 All right? 1102 01:12:43,200 --> 01:12:46,230 Let's take this particular trellis. 1103 01:12:46,230 --> 01:12:49,080 What's the minimum length of time any two paths -- 1104 01:12:49,080 --> 01:12:51,995 say, here's the transmitted path. 
1105 01:12:54,540 --> 01:12:58,690 Suppose I try to find another path that diverges from it? 1106 01:12:58,690 --> 01:12:59,935 Here's one. 1107 01:12:59,935 --> 01:13:01,600 Comes back to it. 1108 01:13:01,600 --> 01:13:02,950 Here's one. 1109 01:13:02,950 --> 01:13:03,470 Another one. 1110 01:13:03,470 --> 01:13:06,730 Comes back to it. 1111 01:13:06,730 --> 01:13:08,550 I say that the minimum length of time it could 1112 01:13:08,550 --> 01:13:10,030 take is nu plus 1. 1113 01:13:10,030 --> 01:13:10,910 Why? 1114 01:13:10,910 --> 01:13:14,940 Because the difference between these two paths is itself a 1115 01:13:14,940 --> 01:13:19,800 code sequence, is therefore a non-zero polynomial 1116 01:13:19,800 --> 01:13:23,310 multiple of g of d. 1117 01:13:23,310 --> 01:13:23,970 OK? 1118 01:13:23,970 --> 01:13:25,220 Same argument. 1119 01:13:30,210 --> 01:13:32,830 OK. 1120 01:13:32,830 --> 01:13:36,270 So I guess I'm not going to get into chapter 10, so I'll 1121 01:13:36,270 --> 01:13:41,240 discourse a little bit more about convolutional codes 1122 01:13:41,240 --> 01:13:42,490 versus block codes. 1123 01:13:54,940 --> 01:13:59,480 How do you construct convolutional codes, actually? 1124 01:13:59,480 --> 01:14:03,820 You see that really, what you want to do is to first of all, 1125 01:14:03,820 --> 01:14:07,250 maximize the minimum distance for a certain constraint length. 1126 01:14:07,250 --> 01:14:10,220 Subject to that, you want to minimize the number of minimum 1127 01:14:10,220 --> 01:14:13,220 distance words. 1128 01:14:13,220 --> 01:14:18,050 You want to, in fact, get the best distance profile. 1129 01:14:18,050 --> 01:14:20,620 What about block codes? 1130 01:14:20,620 --> 01:14:23,340 We had nice, algebraic ways of doing this. 1131 01:14:23,340 --> 01:14:25,830 Roots of polynomials, Reed-Solomon codes. 1132 01:14:25,830 --> 01:14:31,090 We could develop an algebraic formula which told us what the 1133 01:14:31,090 --> 01:14:32,380 minimum distance was.
1134 01:14:32,380 --> 01:14:34,600 Do we have any nice algebraic constructions like that for 1135 01:14:34,600 --> 01:14:35,830 convolutional codes? 1136 01:14:35,830 --> 01:14:38,300 No. 1137 01:14:38,300 --> 01:14:41,940 Basically, you've just got to search all the possible 1138 01:14:41,940 --> 01:14:45,360 polynomials where the maximum degree is nu. 1139 01:14:45,360 --> 01:14:46,405 Take pairs of polynomials. 1140 01:14:46,405 --> 01:14:50,690 If you want a rate 1/2 code of degree nu, there's not that 1141 01:14:50,690 --> 01:14:52,470 many things you have to search. 1142 01:14:52,470 --> 01:14:53,190 All right? 1143 01:14:53,190 --> 01:14:56,660 You just take all possible pairs of binary polynomials of 1144 01:14:56,660 --> 01:15:01,760 degree nu or less, making sure that you don't take any two 1145 01:15:01,760 --> 01:15:06,170 which have a common divisor, a nontrivial common divisor, 1146 01:15:06,170 --> 01:15:08,790 so you can wipe those out. 1147 01:15:08,790 --> 01:15:11,130 You want to make sure that the constant term is 1. 1148 01:15:11,130 --> 01:15:14,260 There's no point in sliding one over so that it starts 1149 01:15:14,260 --> 01:15:16,110 later than time 0. 1150 01:15:16,110 --> 01:15:20,750 But subject to those provisos, you simply try all pairs g1, 1151 01:15:20,750 --> 01:15:24,050 g2, and you just do -- 1152 01:15:24,050 --> 01:15:27,690 as soon as you've assured yourself they're not 1153 01:15:27,690 --> 01:15:30,740 catastrophic, they don't have a common divisor, then you can 1154 01:15:30,740 --> 01:15:33,600 just list, you know, the finite code words are going to 1155 01:15:33,600 --> 01:15:35,020 be the ones that are generated by 1156 01:15:35,020 --> 01:15:37,360 finite information sequences. 1157 01:15:37,360 --> 01:15:40,180 So you can just list all the code words, as I've started to 1158 01:15:40,180 --> 01:15:41,500 do up here.
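The exhaustive search just described fits in a few lines. A sketch, assuming bit-packed GF(2) polynomials and a fixed enumeration depth for the minimum-weight search, which is ample for small nu; all the names are mine:

```python
from itertools import product

def gf2_mul(a: int, b: int) -> int:
    """GF(2) polynomial product on bit-packed ints (bit i = coeff of d^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_gcd(a: int, b: int) -> int:
    """Euclidean algorithm in GF(2)[d]."""
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def min_weight(g: tuple, max_deg: int = 12) -> int:
    """Minimum weight over code words u(d) * g(d) with u(0) = 1 and
    deg u <= max_deg (a heuristic search depth, plenty for small nu)."""
    return min(
        sum(bin(gf2_mul(u, gj)).count("1") for gj in g)
        for u in range(1, 1 << (max_deg + 1), 2)  # odd ints: constant term 1
    )

def best_rate_half_code(nu: int) -> tuple:
    """Try all pairs (g1, g2) with constant term 1, memory exactly nu,
    and no nontrivial common divisor; keep the largest minimum weight."""
    best = (0, 0, 0)
    for g1, g2 in product(range(1, 1 << (nu + 1), 2), repeat=2):
        if max(g1.bit_length(), g2.bit_length()) - 1 != nu:
            continue  # memory less than nu
        if gf2_gcd(g1, g2) != 1:
            continue  # common divisor: catastrophic or reducible, wipe it out
        best = max(best, (min_weight((g1, g2)), g1, g2))
    return best
```

For nu = 2 the search returns minimum weight 5, achieved by 1 plus d squared and 1 plus d plus d squared in some order, the code used in the running example.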
1159 01:15:41,500 --> 01:15:47,550 Or since the trellis is, in fact, a way of listing all the 1160 01:15:47,550 --> 01:15:50,190 code words, there's a one-to-one correspondence 1161 01:15:50,190 --> 01:15:53,330 between code words and trellis paths, you can just start 1162 01:15:53,330 --> 01:15:57,430 searching through the trellis and you will quickly find all 1163 01:15:57,430 --> 01:16:01,020 the minimum weight code words, and thereby establish the 1164 01:16:01,020 --> 01:16:02,880 minimum distance, the weight profile as 1165 01:16:02,880 --> 01:16:04,550 far out as you like. 1166 01:16:04,550 --> 01:16:05,680 And you choose the best. 1167 01:16:05,680 --> 01:16:07,030 You try all possibilities. 1168 01:16:07,030 --> 01:16:11,980 So Joseph Odenwalder did this as soon as people recognized 1169 01:16:11,980 --> 01:16:15,130 that short convolutional codes could be practical, and he 1170 01:16:15,130 --> 01:16:20,140 published the tables back in his PhD thesis in '69, so he 1171 01:16:20,140 --> 01:16:22,546 got a PhD thesis out of this. 1172 01:16:22,546 --> 01:16:25,510 It wasn't that hard, it's done once and for all, and the 1173 01:16:25,510 --> 01:16:29,150 results are in the notes. 1174 01:16:29,150 --> 01:16:30,470 The tables. 1175 01:16:30,470 --> 01:16:36,750 And you can see from the tables that in terms of 1176 01:16:36,750 --> 01:16:39,480 performance versus moderate complexity, 1177 01:16:39,480 --> 01:16:41,360 things go pretty well. 1178 01:16:41,360 --> 01:16:43,660 Here's this four state code. 1179 01:16:43,660 --> 01:16:47,830 It already gets you 4 dB of effective coding gain. 1180 01:16:47,830 --> 01:16:51,610 To get to 6 dB of effective coding gain, you need to go up 1181 01:16:51,610 --> 01:16:54,810 to about 64 states. 1182 01:16:54,810 --> 01:16:57,850 First person to do this was Jerry Heller at Jet Propulsion 1183 01:16:57,850 --> 01:16:58,530 Laboratory. 
1184 01:16:58,530 --> 01:17:03,550 Again, about '68, immediately after Viterbi proposed his 1185 01:17:03,550 --> 01:17:08,640 algorithm in '67, Heller was the first to go out and say, 1186 01:17:08,640 --> 01:17:10,560 well, let's see how these perform. 1187 01:17:10,560 --> 01:17:15,550 So he found a good 64 state rate 1/2 code, and he did 1188 01:17:15,550 --> 01:17:19,030 probability of error versus Eb over N0, and he said, wow. 1189 01:17:19,030 --> 01:17:21,340 I get a 6 dB coding gain. 1190 01:17:21,340 --> 01:17:26,040 And Viterbi had no idea that his algorithm would actually 1191 01:17:26,040 --> 01:17:27,650 be useful in practice. 1192 01:17:27,650 --> 01:17:29,970 He was just using it to make a proof. 1193 01:17:29,970 --> 01:17:33,330 And he didn't even know it was optimum. 1194 01:17:33,330 --> 01:17:36,360 But he's always given the credit to Heller for realizing 1195 01:17:36,360 --> 01:17:37,960 it could be practical. 1196 01:17:37,960 --> 01:17:43,450 And so that 64 state rate 1/2 code with Viterbi algorithm 1197 01:17:43,450 --> 01:17:47,360 decoding became very popular in the '70s. 1198 01:17:47,360 --> 01:17:50,780 Heller and Viterbi and Jacobs went off to form a company 1199 01:17:50,780 --> 01:17:52,150 called Linkabit. 1200 01:17:52,150 --> 01:17:58,180 And for the technology of the time, that seemed to be a very 1201 01:17:58,180 --> 01:18:00,060 appropriate solution. 1202 01:18:00,060 --> 01:18:02,340 You see you get approximately the same thing as a 1203 01:18:02,340 --> 01:18:04,220 rate 1/3, rate 1/4. 1204 01:18:04,220 --> 01:18:06,995 If you go lower in rate you can do marginally better. 1205 01:18:06,995 --> 01:18:09,675 If you are after tenths of a dB, it's worthwhile. 1206 01:18:12,280 --> 01:18:17,280 Later in the decade, is the best way to get more gain to 1207 01:18:17,280 --> 01:18:20,320 go to more and more complicated 1208 01:18:20,320 --> 01:18:21,420 convolutional codes? 1209 01:18:21,420 --> 01:18:22,840 No. 
1210 01:18:22,840 --> 01:18:27,890 The best way is to use the concatenated idea. 1211 01:18:27,890 --> 01:18:32,950 Once you use these maximum likelihood decoders, the 1212 01:18:32,950 --> 01:18:37,020 Viterbi decoders, to get your error rate down to 10 to the 1213 01:18:37,020 --> 01:18:40,330 minus 3 or something, 1 in 1000, at that point you have 1214 01:18:40,330 --> 01:18:44,750 very few errors, and you can then apply an outer code to 1215 01:18:44,750 --> 01:18:47,550 clean up the error events that do occur. 1216 01:18:47,550 --> 01:18:51,190 You see, you're going to get bursts of errors in your 1217 01:18:51,190 --> 01:18:52,260 decoded sequence. 1218 01:18:52,260 --> 01:18:57,450 It's a very natural idea to have an outer code, which is 1219 01:18:57,450 --> 01:19:02,740 based on GF of 256, say, 8 bit bytes. 1220 01:19:02,740 --> 01:19:07,975 And so a Reed-Solomon code comes along and cleans up the 1221 01:19:07,975 --> 01:19:10,360 errors that do occur, and drives the error probability 1222 01:19:10,360 --> 01:19:13,580 down to 10 to the minus 12, or whatever you like, with 1223 01:19:13,580 --> 01:19:18,040 very little redundancy, very little additional cost. 1224 01:19:18,040 --> 01:19:21,930 So that became the standard approach for space 1225 01:19:21,930 --> 01:19:27,430 communications in the '70s and indeed '80s. 1226 01:19:27,430 --> 01:19:31,980 I've already mentioned that around '90, they went up to 1227 01:19:31,980 --> 01:19:35,820 this 2 to the fourteenth state Viterbi decoder. 1228 01:19:35,820 --> 01:19:38,350 They went to much more powerful 1229 01:19:38,350 --> 01:19:40,520 outer codes, much cleverer. 1230 01:19:40,520 --> 01:19:43,670 And they were able to get to within about 2 or 3 dB of the 1231 01:19:43,670 --> 01:19:45,020 Shannon limit.
1232 01:19:45,020 --> 01:19:47,340 And that was the state-of-the-art on the eve of 1233 01:19:47,340 --> 01:19:50,520 the discovery of turbo codes, which is where we're going in 1234 01:19:50,520 --> 01:19:52,220 all of this. 1235 01:19:52,220 --> 01:19:55,070 So from a practical point of view, in the moderate 1236 01:19:55,070 --> 01:19:58,660 complexity regime, simple convolutional codes with 1237 01:19:58,660 --> 01:20:03,230 moderate complexity Viterbi decoding are still the best 1238 01:20:03,230 --> 01:20:04,790 that anybody knows how to do. 1239 01:20:04,790 --> 01:20:06,690 They have all these system advantages. 1240 01:20:06,690 --> 01:20:09,820 They work with nice synchronous streams of 1241 01:20:09,820 --> 01:20:12,380 traffic, which is what you want for data transmission. 1242 01:20:12,380 --> 01:20:14,300 They use soft decisions. 1243 01:20:14,300 --> 01:20:15,750 They use any kind of reliability 1244 01:20:15,750 --> 01:20:16,710 information you have. 1245 01:20:16,710 --> 01:20:19,230 They're not limited by hard decisions. 1246 01:20:19,230 --> 01:20:22,880 They're not limited to bounded distance decoding as the algebraic 1247 01:20:22,880 --> 01:20:24,120 schemes are. 1248 01:20:24,120 --> 01:20:28,470 So it just proved to be a better way to go on channels 1249 01:20:28,470 --> 01:20:31,850 like the additive white Gaussian noise channel. 1250 01:20:31,850 --> 01:20:32,240 OK. 1251 01:20:32,240 --> 01:20:33,810 So we didn't get into chapter 10. 1252 01:20:33,810 --> 01:20:35,060 We'll start that next time.