The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GEORGE VERGHESE: The course is called Digital Communication Systems, so I wanted to say a bit about what that means. And the easiest way to do that is to contrast it with analog. What's analog communication? Well, in analog communication you're typically focused on communicating some kind of waveform. You've got some continuous waveform, typically an x(t), maybe the voltage picked up at a microphone at the source, and you want to get it across to a receiver. And it's under this umbrella that you have things like amplitude modulation, frequency modulation, and so on. These are all schemes aimed at transmitting a continuous waveform of this type.

In amplitude modulation, for instance, what you'll do is take a sinusoidal carrier. The carrier carries the information about the analog waveform; basically, it's a high-frequency sinusoid whose amplitude is varied in proportion to the signal. I haven't drawn it too well. It's supposed to be constant frequency, with just the amplitude varying. So this is something of the form x(t) cos(2 pi f_c t), for instance: a sinusoid at a fixed carrier frequency with its amplitude varying slowly.

In FM, what you do is have a fixed-amplitude waveform, but you vary the frequency. So you might have high frequency in this part, then when the signal goes low, the frequency gets lower, and then it gets higher where the signal is high. So there's a modulation of the frequency, but the amplitude stays fixed. The good thing about this is that you can be transmitting at full power all the time, and the information is coded onto the frequency, whereas AM can tend to be more susceptible to noise.
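To make the two schemes concrete, here is a minimal numerical sketch, not taken from the lecture; the message, carrier frequency, sample rate, and FM deviation are all made-up values chosen only for illustration.

```python
import numpy as np

fs = 100_000          # sample rate in Hz (illustrative choice)
fc = 10_000           # carrier frequency f_c in Hz (illustrative choice)
t = np.arange(0, 0.01, 1 / fs)

# A slowly varying "message" waveform x(t), here a 200 Hz tone.
x = np.sin(2 * np.pi * 200 * t)

# Amplitude modulation: the carrier amplitude tracks the message.
am = (1 + 0.5 * x) * np.cos(2 * np.pi * fc * t)

# Frequency modulation: constant amplitude, instantaneous frequency shifted in
# proportion to the message; summing frequency samples approximates the phase integral.
kf = 2_000            # frequency deviation per unit of x, in Hz (illustrative)
phase = 2 * np.pi * np.cumsum(fc + kf * x) / fs
fm = np.cos(phase)

print(am[:3], fm[:3])
```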
But the focus is on an analog waveform and transmitting that. Now, in digital communication the focus changes. In digital, we think in terms of sources with messages. So we have messages. There's a source of some kind that puts out a stream of symbols. At time 1, there's some symbol emitted; at time 2, there's some other symbol; symbol, symbol. And these are all heading to the receiver.

So already we're thinking of a clocked kind of system. We're thinking of symbols being transmitted at a particular rate. We're thinking of these discrete objects rather than continuous waveforms. And the focus is then on getting a message across, as opposed to getting a waveform across with high fidelity. That turns out to actually be a big shift in perspective.

These symbols will often get coded. For instance, if the symbols originally were, let's say, A, B, C, D, coding the grades in a class, you might want, when you're transmitting them, to adapt those symbols to what your channel is able to take. Maybe your channel is one that's able to distinguish between two states, but not between four states, so you might want to code these onto strings of 0's and 1's, so that you can impress them on a channel that can respond to just two states. So you might have a coding step that takes the original symbols and puts out a stream of 0's and 1's. And then you've got the task of decoding the message. The channel might corrupt these streams, and that's another thing that you have to deal with.

So what's made digital explode is the fact that it's really well matched to computation, memory, storage, all the stuff that's advancing rapidly. In the world of analog, you're talking about analog electronics, which is also advancing greatly but doesn't have the same flexibility. Here, you can do all sorts of things with digital and with the computation that's available, and that's growing in power all the time, to do more and more fancy things.
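Here is a minimal sketch of that coding step, not from the lecture: four grade symbols mapped onto two-binary-digit codewords for a two-state channel, with the matching decoding step at the receiver. The particular codewords are an arbitrary illustrative choice.

```python
# Map each source symbol onto a fixed-length string of 0's and 1's.
encode_map = {"A": "00", "B": "01", "C": "10", "D": "11"}
decode_map = {bits: sym for sym, bits in encode_map.items()}

def encode(message):
    return "".join(encode_map[s] for s in message)

def decode(bitstream):
    # Fixed-length codewords, so read two binary digits at a time.
    return "".join(decode_map[bitstream[i:i + 2]] for i in range(0, len(bitstream), 2))

bits = encode("ABAD")
print(bits)            # 00010011
print(decode(bits))    # ABAD
```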
So digital communication is really most of what you see around you. When you talk on the phone, or do computer-to-computer communication, or browse the web, and so on, it's really digital communication that you're talking about, with one little caveat, I guess. When you get down to the details of how you get a 0 and a 1 across on a channel, you're back in the analog, physical world, and you tend to be doing things that are much closer to what you worry about on analog channels. And we'll see that in this course. For most of what we talk about, we'll be working at the level of the digital abstraction. But when we come to talking about transmission on a link, and modulation, and demodulation, and the like, we're back in the analog world. And you'll actually get a good feel for some of this in the course of our treatment of digital communication.

So, to give you a sense of how the course is structured: we'll spend some time, first of all, talking about information, the information in a message, how you measure it, and how you code it up. This is sort of the bits piece of the course. Then we'll talk about how to get these messages across single channels. So this is a single link: a source at one end, a receiver at the other end. And we'll focus on how you get the data across. That brings us to the analog world and to the world of signals, so we'll spend time on that. That's sort of the second third of the course.

And then the last third of the course, which Harry will actually be lecturing, focuses on what happens when you have interconnected networks. You've got multiple links, so you might want to communicate from some source here to a receiver that's way across the network, going through multiple links and multiple nodes. There are all sorts of issues there. And there, we're thinking in terms of a packet kind of abstraction.
It's packets that we ship around the network, with associated logic, and mechanisms for failure detection in the network, and coping with all of that. So there are these three sort of chunks to the course, and you'll see that in the notes as well. We'll start off with the bits piece; that's more or less Quiz 1. Then we'll get to the signals piece; that's more or less Quiz 2. And then we'll get to the packets piece, and that's more or less Quiz 3. These will be relatively modular, so you get a chance to make a fresh start on each of them. But you'll find us reaching back to build on ideas developed earlier in the course.

Now, as I think about where digital communication originated, it actually turns out to be largely due to the person who painted this painting. This is called "The Gallery of the Louvre." It was painted around 1830 by an American painter. He was actually born in Charlestown, close to here, and studied at Yale. He made enough of a name for himself that he had commissions to paint portraits and the like. He was actually called to Washington, DC to paint a portrait of the Marquis de Lafayette. While he was there, he got a letter from his father saying that his wife in New Haven was convalescing. He abandoned the painting he was doing and left for New Haven as soon as he could, but by the time he got there, he found that she had actually died and been buried. And that sort of fortified him for what he decided was his life's work, which was to find better ways to communicate, faster ways to communicate. He didn't want to have to depend on horse riders to carry messages, or on ships across the ocean.

He's actually painted himself into the middle of that painting. These were some friends that he made in Paris. That's actually the author James Fenimore Cooper, of The Last of the Mohicans fame.
He was actually hoping to sell this painting to Fenimore Cooper, but things didn't work out that way. In any case, this is a huge painting, about six feet by nine feet. He wrapped it up to bring it back to the States. It wasn't quite finished. This was about 1831 or so. And on the boat, he met a person who had a little electromagnet that he was playing with. They had various discussions, and he got the idea for a telegraph. Anyone with a guess as to the name? Morse. Now we think of Morse as the Morse code guy, but it turns out that he actually did hugely more than the code. So that's Samuel Morse; he looks pretty imposing. He didn't just come up with the code, he actually invented the whole system.

Now, he didn't work in a vacuum. There were people doing related things in different places, but his was the first practical, essentially single-wire system. If you look at his patent documents, he's got all the little pieces that it takes to make the system. A key piece was actually the relay. Working with a colleague back in New York, he figured out that with a little battery you could power an electromagnet at some distance, but you couldn't have that wire be too long. So what he arranged was a relay, where that electromagnet pulls another piece of metal, which then closes another switch on a separate circuit, so you can start to propagate the signal over very large distances. That was really a key part of his invention.

As for Morse code, there's actually some discussion as to whether he invented it or whether it was actually his assistant Vail, but it's called "Morse code" anyway. The other staggering thing about this story is how soon things happened after the invention. I mean, his patent was, let's see, 1840, very early in the days of the Patent Office, as you can see from the numbers assigned to the patent.
About 15 years later, there were people raising money to lay cable across the Atlantic to carry telegrams. Can you imagine, for one thing, the bravery of these people? I mean, it's hard enough to think of laying cable across Boston Harbor, and they were prepared to design this cable, load it on a ship, and lay it across the entire Atlantic. They made an attempt in 1857. It actually turned out to work for about three weeks. That was long enough for Queen Victoria to congratulate President Buchanan, except it took almost all of a day to get the 98 words across from one side to the other. And the reason is that when you put a little pulse on one end of a very long cable, it distorts like mad by the time it gets to the other end. If you put a sharp change in at one end of a long, poorly designed cable, it takes a long time to detect the rise at the other end, if you detect it at all.

It turns out the person at the American end was the man who would later become Lord Kelvin. He was called plain old William Thomson at that point. He had designed a very sensitive way to measure these changes in voltage at the ends of cables. But the person at the British end was actually a surgeon, a self-taught electrical engineer, who was convinced that the problem was that there was not enough voltage on the cable. So he kept cranking up the voltage. When he got to 2,000 volts, the cable failed.

There had been celebrations in the street, and there had been fireworks, and all of that. And then people got very angry, and thought this was a scam, and a way to raise money, and all of that. Despite all of the negative press, a year later here was this man again, with enough funding from governments and private sources to make another attempt at the cable.

Anyway, it took a while. It took a good nine years to finally lay a good cable. They'd gone out about 1,200 miles with a cable in 1865 before it broke. They had to start again in 1866.
They managed to lay an entire cable, and then they came back and found the broken end of the 1865 cable, picked it up, and continued it. So in 1866, they managed to get two cables working. And now it was a lot faster: eight words a minute. It was digital communication. It's got all the ingredients of what we see in digital communication today.

A little while later, there was a transcontinental line, which marked the end of, essentially, the Pony Express trying to carry mail across the continent; now much more was going to happen on telegraph lines. There was a transpacific line in 1902, so at that point you could encircle the globe with telegraph. It was really a transformative technology. And it was a digital technology, because all you were trying to figure out at the other end was whether something was a dot or a dash. It was basically just two states that you were trying to distinguish.

Those are his Patent Office documents. They're actually interesting to read, but let's just see here: "Be it known that I, the undersigned, Samuel F. B. Morse, have invented a new and useful machine and system of signs for transmitting intelligence between distant points by the means of a new application and effect of electromagnetism." And then he goes on to describe the equipment and the code itself. This is just a map to show you the kind of distance that they had to lay that first cable over.

Morse code you've all seen. It's gone through some evolution, actually. Morse originally thought of just a code for numbers; he imagined a dictionary at the two ends, and you would just send the number for the word in the dictionary, and someone would look it up at the other end. But then, with Vail, they developed this scheme. You notice that the most frequently used letter has the shortest symbol here. It's just a dot.
And then if you go to an A, it's a dot, dash, and so on. The T, I think, is a dash. Yeah. So the choice of symbols sort of matches the expected symbol frequencies in English text. You want the more frequently used letters to have the shorter symbols, because there are going to be many of them, and you don't want to be sending long code words for them. But this was Morse code.

Here's another way to represent it. Going to the left is a dot, going to the right is a dash. So a single dot brings you to an E. Dot, dot brings you to an I; dot, dot, dot to an S. Dash, dot brings you to an N, and so on. So you can display this code on a graph.

One thing you see right away from this display, and it was clear from the code itself, is that you're not going to be able to get away with just two symbols. Because if you're trying to get to an A on this path, you hit an E on the way, and you need something to tell you that you aren't done yet. So there is a third symbol, and that's the space. Morse code has a dot, a dash, and a space. It's really a three-symbol alphabet, and the space is critical. If you want a code where you can deduce instantly that you've hit the letter that the sender intends, you need all the letters to be at the ends here, at the leaves of the tree. If you have all the code words at the leaves of the tree, right at the ends, then you're not going to encounter any other code words along the way. So you just keep going down the tree, dot to the left, dash to the right, till you hit the code word at the end, and then you're done. But in this kind of arrangement, you need a third symbol to demarcate where one letter ends and the next begins.

So this made Morse a very celebrated man. He got patents in various places. He got medals from all over the world, including from the Sultan of Turkey; I believe this one is a diamond medal from the Sultan of Turkey. He was celebrated on postage stamps, and all deservedly so.
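To make that prefix point concrete, here is a small sketch, not from the lecture, using just a handful of letters: it checks that dots and dashes alone do not form a prefix-free code, and shows how adding a space between letters makes decoding unambiguous.

```python
# A few Morse code letters, written with "." for dot and "-" for dash.
morse = {"E": ".", "I": "..", "S": "...", "A": ".-", "N": "-.", "T": "-"}

def is_prefix_free(code):
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_free(morse))   # False: "." (E) is a prefix of ".-" (A), so a decoder can't tell when a letter ends

# With a third symbol, a space, marking the end of each letter, decoding is straightforward.
def decode(signal, code):
    lookup = {v: k for k, v in code.items()}
    return "".join(lookup[chunk] for chunk in signal.split(" "))

print(decode(". - .-", morse))  # ETA
```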
He really made a huge difference to communication, and set digital communication on its current path. So I've got to skip forward very quickly now to bring me to the part of the story I want to continue with. We're going to hopscotch over a whole bunch of, again, transformative inventions. There was the telephone in '76, again with Boston connections. It was really thought of as speech telegraphy at that time, so it wasn't the telephone yet; Bell's patent is titled "Improvement in Telegraphy." There was wireless telegraphy, which was Marconi sending signals from Europe to, actually, Cape Cod, but it was not voice. It was Morse code, basically, dots and dashes.

And then came analog communication. This was exactly what I talked about, AM and FM, and then later video images, and so on. There's a lot of this going on during this period. A big player in the theory was this company; this was actually Bell Labs. Bell Labs is really full of people who made a huge difference to the development of all this. In fact, I had some names listed on a previous slide; let me see, I passed over them without mention. In the development of the telegraph, I've mentioned Lord Kelvin already. He did a lot to model transmission lines and to show how to design better ones. The design of magnets and the invention of the relay, that was actually Joseph Henry, a professor at Princeton, after whom the unit of inductance is named: that's the henry. And there were various other people. So this technology really was a very fertile kind of ground for people to develop things in.

And if you take other courses in the department, these are names you will encounter all over the place: Nyquist, Bode, and others. But the one I want to focus on is Claude Shannon, who's sort of the patron saint of this subject, I would say. Shannon did his Master's degree here at MIT.
It's been called one of the most influential Master's theses ever, because in it he showed how Boolean algebra could be used to design logic circuits. The logic circuits he was talking about were the relay circuits of the time, but this was very quickly picked up, and quoted, and applied. Then he moved on to something else for his PhD, and I don't know the extent to which that's been influential in genetics. But then he joined Bell Labs just about at the start of the war years. He did a lot of work on cryptography during that time, initially classified, but then a declassified version was published. During that time, he also had interaction with Alan Turing, who was working on cryptography in England but had been sent over to Bell Labs to share ideas.

And then in 1948 came a groundbreaking paper that really is the basis for information theory today. It was this that developed a mathematical basis for digital communications. And the impact has been just incredible since then. So that's what we want to talk about a little bit.

Now, I have here a checklist that I don't want you to look at now. The theory that Shannon developed is actually a mathematical theory; it's a probabilistic theory. And if you're going to be doing calculations with probability, you need to know some basics. What I put down there is a checklist. We don't have a probability prerequisite for this course. We assume you've seen some in high school, some in 6.01, some elsewhere.

By the way, I should say that all these slides are going to be on the web, or maybe they're on the web already, so you don't have to scramble to copy all this. We'll put the lecture slides up on the web. We may not have them exactly before lecture, because I'm often working right up to the bell, but we'll have them after the lecture. So take down whatever notes you want, but you don't have to scramble to get every word here.
By the way, the other thing I should say is that your contract in this course is not with whatever material is on the web or what you find from past terms and so on. It's really with us, in lecture and in recitation. So we urge you to come to lecture even though all of this will be posted, because there's other learning that happens, and you have a chance to bring up questions, and hear other people's questions, and so on.

I'm not going to read through that checklist, but I want to give you a little picture that you can carry away in your mind for what we think of when we think of a probabilistic model. We've got a universe of possibilities. We've got outcomes; these are what are called elementary outcomes. Think of it as rolling a die, for instance, and getting one of six numbers. Each of those is an outcome. So here's an elementary outcome. I could number them s1 to sN, and there don't have to be finitely many; it could be an infinite number, a continuum. We'll see examples of that later. But if you're thinking of a source emitting symbols to construct a message, then at every instant of time, the source is picking one of these symbols with some probability. So that's the kind of model we're thinking of. So here are the elementary outcomes: s1, s13, and so on.

You've got events, and events are just collections of outcomes. So events are sets. This is the event, or set, A: just a collection of elementary outcomes. I say that the event has occurred if the outcome of the experiment is one of the dots in here. If the dot is out here, if this is what you got when you ran the experiment, the event didn't occur. So an event is just a set, a subset of these outcomes, and we say the event has occurred if the outcome that actually occurs is sitting inside that set. And then we can talk about intersections of events.
We say that if this event is A and this one is B, the event "A and B" corresponds to outcomes that live in both sets. So if I roll a die and I get a number that is even and a prime, that tells me what that number is. If this is the event of getting an even number on rolling a die, and this is the event of getting a prime number on rolling a die, what number do I get when both events have occurred? So I can identify different events, and then I assign probabilities to them. So I can talk about the probability of an event. And then you can combine probabilities in useful ways.

So let's see, there's a lot on here, because I wanted, in principle, to fit it all on one slide that you could carry around with you. Probabilities live between 0 and 1. The probability of the universal set is 1, meaning that when you do the experiment, something happens. It's guaranteed that something happens, and therefore U always happens, so the probability of U is 1.

And then the probability of A or B happening is the probability of A plus the probability of B, if A and B have no intersection, that is, if they're mutually exclusive. We say that two events are mutually exclusive when there's no outcome that's common to the two. Actually, let me draw a C over here: A and C in this picture are mutually exclusive, because there's no outcome that's common to the two events. So if one event occurs, you know the other one didn't occur. And so if I now ask what's the probability of A or C occurring, it's the probability of A plus the probability of C. You'll be doing this all the time in this course. You'll be adding probabilities, but you've got to think: am I looking at mutually exclusive events? If you've got mutually exclusive events, then the probability of one or the other happening is the sum of the individual probabilities. If they're not mutually exclusive, then there's a little correction you have to make: the probability of A or B happening is the probability of A plus the probability of B, minus the probability of both happening. All of this is quite intuitive.
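Here is a minimal sketch, not from the lecture, that checks these rules on the die example by counting equally likely outcomes; the particular choice of events A, B, and C is just for illustration.

```python
from fractions import Fraction

universe = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {2, 3, 5}   # prime
C = {1}         # no outcome in common with A, so A and C are mutually exclusive

def P(event):
    # Each of the six outcomes is equally likely.
    return Fraction(len(event), len(universe))

print(A & B)                                # {2}: "even and prime" pins down the number
print(P(A | C) == P(A) + P(C))              # True: mutually exclusive probabilities add
print(P(A | B) == P(A) + P(B) - P(A & B))   # True: the correction for overlapping events
```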
Another notion that's important is independence. We've seen that mutual exclusivity allows you to add probabilities; independence allows you to multiply probabilities. We say that a set of events, A, B, C, D, E, for instance, is mutually independent if the probability of A and B and C and D happening is the product of the individual probabilities, and similarly for any subcollection. So you're going to call a collection of events independent if the joint probability of their happening, in any combination, factors into the product of the individual probabilities. And again, this is a computation you'll be doing all the time in different settings, but you've got to think to yourself: am I applying it to things that are independent? Because if not, then it's not clear you can do this factorization.

We'll come later to talk about conditional probabilities. But the probability of A given that B has occurred, we can actually sort of see it here: it's this area as a fraction of the whole area there. Sorry, the probability of A given that B has occurred is the probability of A and B over the probability of B. Given that B has occurred, you know that you're somewhere in here. And what's the probability that A has occurred, given that you're somewhere in here? It's the probability associated with that intersection.

One last thing: expectation. We talk about the expected value of a random variable as being basically the average value it takes over a typical experiment, let's say. And the way you compute that is by averaging the values it can take, weighted by the associated probabilities. We'll see an example of that.

I didn't feel right just jumping into Shannon's definition of information without saying a little bit about how you set up a probabilistic model. But with all that said, here's what Shannon had as the core of his story, building on earlier work by other people.
So if you're thinking of a source that's putting out symbols, where the symbols can be s1 up to sN, the information in being told that the symbol s_i was emitted is defined as log to the base 2 of 1 over the probability of that symbol: log2(1/p_i). What you're trying to come up with is actually a measure of surprise. Maybe "surprise" is a better word than "information"; "information" is a very loaded word. But what you're trying to measure here is how probable the thing is that I am just seeing. If it's a highly improbable event, I gain a lot of information by being told that it's occurred. If it's a high-probability event, I don't get much information by being told that it's occurred. So you want something that depends reciprocally on the probability.

The log is useful because it allows the information given to you by two independent events to be the sum of the information in each of them. And the calculation is just this: if A and B are independent events, then the information I get on being told that both of them occurred is log2(1/(p_A p_B)). But that then just becomes the sum of the individual ones, log2(1/p_A) + log2(1/p_B). So the advantage of having a log in that definition is that, for independent events, the information adds. I should perhaps have written one more line here. Here's the information in being told that both events have occurred. Because they're independent, that joint probability factors into the product of the individual ones, which then turns into the sum of these two logarithms. So here's the information in being told A and B, here's the information in being told that just A occurred, and here's the information in being told that just B occurred. So the log allows things to be additive over independent events.

Now, the base 2 was a matter of choice. Hartley chose base 10; Shannon chose base 2. And he called the unit the "bit."
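Here is a minimal numerical sketch, not from the lecture, of this definition and of its additivity over independent events; the two probabilities are made-up values for illustration.

```python
from math import log2

def info_bits(p):
    """Information, in bits, in being told that an event of probability p occurred."""
    return log2(1 / p)

p_a, p_b = 0.25, 0.125             # probabilities of two independent events
print(info_bits(p_a))              # 2.0 bits
print(info_bits(p_b))              # 3.0 bits
print(info_bits(p_a * p_b))        # 5.0 bits: the sum, because the joint probability factors
```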
So when you measure information according to this formula, with the log taken to base 2, you call the resulting number the number of "bits" of information in that revelation, in being told that that's the output. Now, for this lecture, and probably only this lecture, I'm going to try to maintain a distinction between the bit as a unit of information and our everyday use of the word "bit" to mean a binary digit. It's unfortunate that they both have the same name, because they actually refer to slightly different things. A binary digit is just a 0 or a 1 sitting in a register in your electronics, whereas this is a unit of measurement, and the two are not necessarily the same thing. So I'll try to catch myself and say "binary digit" when I mean something that can be a 0 or a 1, and "bit" when I'm talking about a measure of information.

But here, for instance, is a case where the two coincide. If I'm tossing a fair coin, so there's a probability 1/2 that it comes up heads and 1/2 that it comes up tails, then log to the base 2 of 1 over 1/2 gives me 1. So there's one bit of information in being told what the outcome is on the toss of a fair coin. And that sort of aligns with our notion of a binary digit as being something that can be either 0 or 1. We don't usually associate probabilities with "binary digit," but with "bit," we do.

So Shannon has a measure of information. There are examples we can talk about in the notes, so I won't go through them. And I think I've said this already, so I'll pass through that and get to his second important notion, which is the notion of entropy. The entropy is the expected information from a source. So what we have is the expected information from a source, or from the output of an experiment. If you're thinking of a source emitting symbols, this source can emit symbols s1 all the way up to sN, let's say, with probabilities p1 up to pN, and the sum of those probabilities is going to be 1. If I tell you that s1 was emitted by the source, I've given you information log2(1/p1).
If I tell you s1 was emitted, that's the information I've given you. But if I ask you, before you see anything coming out of the source, "What's the expected information, what information do you expect to get when I run the experiment, when I produce a symbol?", then you've got to average this quantity over all possible symbols that you might get, weighted by the probability with which you're going to see each symbol. So this is exactly what I had defined earlier as an expected value. The entropy of the source is the expected information, or let's say the expected value of the information you get when you're told the output of the source. If the emission is s1, then the information is log2(1/p1), and that happens with probability p1. If the emission is s2, that carries information log2(1/p2), and that happens with probability p2, and so on. So the entropy is

H = p1 log2(1/p1) + p2 log2(1/p2) + ... + pN log2(1/pN).

Shannon is borrowing here from ideas developed in thermodynamics and statistical physics. People like Gibbs at Yale in 1900 already had notions of this type. His innovation is in actually applying this to communications, and he has several constructs beyond this; we'll come to some of them later. But up to this point, he's making a connection with what they do in statistical physics, except they're usually not thinking in terms of information there; they're thinking in terms of uncertainty. And they're not thinking of sources emitting symbols. So this is the entropy.

So for instance, if you've got a case where you have capital N symbols and they're all equally likely, then the probability of any one of them is 1/N. So what is the entropy? Well, it's going to be the summation from i equals 1 to N of (1/N) log2(1/(1/N)). What does that end up being? I can take the log2(N) out, and then I've got the sum of N terms each equal to 1/N, which is 1. And the result is log2(N).
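Here is a small sketch of that calculation, not from the lecture; the distributions are made up, and the uniform case is checked against the log2(N) result just derived.

```python
from math import log2

def entropy(probs):
    """Expected information, in bits, of a source with the given symbol probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

N = 8
print(entropy([1 / N] * N))         # 3.0, i.e. log2(8), for equally likely symbols
print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits, less than log2(3) ≈ 1.585 for three equally likely symbols
```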
So if I've got equally likely symbols, N of them, then the entropy, the expected information from being told what the outcome is, is log2(N). It turns out that this is the best possible case in the sense of maximum uncertainty. If you're looking for a source that's maximally uncertain, one that's going to surprise you the most when it emits a symbol, it's a source in which all the probabilities of the symbols are equal. Symbols equally likely: that's when you're going to be surprised the most.

Now, you can see this in a particular example here. Let's look at the case of capital N equals 2, so we're just talking about a coin toss. I toss a coin. I get heads with some probability p, and tails with probability 1 minus p. Instead of saying "heads" or "tails," I could make it look a little more numerical: I could say C equals 1 for a head and C equals 0 for a tail. That's sort of coding the output of the coin toss. And now I can evaluate the entropy for any value of p you give me. So if you've got a fair coin, with p of 0.5, I evaluate the entropy, and I find, indeed, that it's one bit. So the average information conveyed to me by telling me the output of the toss of a fair coin is one bit of information. But if the coin is heavily biased, then the average information, or the expected information, can be a lot less.

This turns out to have a very tight connection to this idea of coding. So let's actually take an extreme example. I've taken the case now where you've got a terribly biased coin. It's not p equals 0.5; it's p equal to 1 over 1,024. I picked 1,024 because log to the base 2 of that is easy. So it's a very small probability of getting a head.
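Here is a quick sketch, not from the lecture, of the binary entropy just described, the expected information per toss of a coin with bias p; the biased value of p below is just an illustrative choice.

```python
from math import log2

def binary_entropy(p):
    q = 1 - p
    return p * log2(1 / p) + q * log2(1 / q)

print(binary_entropy(0.5))   # 1.0 bit per toss for a fair coin
print(binary_entropy(0.9))   # about 0.47 bits per toss for a coin that lands heads 90% of the time
```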
753 00:39:15,430 --> 00:39:20,080 In fact, if you were to run 1,024 trials, the law 754 00:39:20,080 --> 00:39:22,440 of large numbers, which I haven't put on that one sheet, 755 00:39:22,440 --> 00:39:24,190 but you probably believe this-- 756 00:39:24,190 --> 00:39:28,000 if I had a coin that had a 1 in 1,024 probability of coming up 757 00:39:28,000 --> 00:39:31,810 heads, and I threw a coin 1,024 times, 758 00:39:31,810 --> 00:39:36,400 I'm more likely to get heads exactly once than anything else. 759 00:39:36,400 --> 00:39:38,650 And actually in a very long stretch, 760 00:39:38,650 --> 00:39:41,560 that's just about exactly the fraction of heads that you get. 761 00:39:41,560 --> 00:39:44,590 That's the law of large numbers. 762 00:39:44,590 --> 00:39:49,810 In that case, what is the entropy? 763 00:39:49,810 --> 00:39:56,800 So I've got p times log 2 of 1 over p, plus 1 minus p times log 2 764 00:39:56,800 --> 00:39:57,380 of 1 over 1 minus p. 765 00:39:57,380 --> 00:39:59,820 I'm just evaluating that parabolic-looking function. 766 00:39:59,820 --> 00:40:01,632 It's not quite a parabola. 767 00:40:01,632 --> 00:40:07,660 And I see that I've got just 0.0112 bits of information 768 00:40:07,660 --> 00:40:08,680 per trial. 769 00:40:08,680 --> 00:40:10,375 So unlike the case of a fair coin-- 770 00:40:13,120 --> 00:40:17,200 remember, in the case of a fair coin with p equal to 0.5, 771 00:40:17,200 --> 00:40:19,600 I have an entropy of one bit. 772 00:40:19,600 --> 00:40:22,450 That's the average information revealed by a single toss. 773 00:40:22,450 --> 00:40:24,520 Now I'm down to much less. 774 00:40:27,850 --> 00:40:33,440 I'm down to 0.0112 bits per trial. 775 00:40:33,440 --> 00:40:35,870 And the reason is that this coin is almost certainly 776 00:40:35,870 --> 00:40:38,540 going to come up tails, because the probability of heads 777 00:40:38,540 --> 00:40:40,190 is so small. 778 00:40:40,190 --> 00:40:41,840 So for almost every trial, you'll 779 00:40:41,840 --> 00:40:44,677 tell me, "Oh, it came up tails." 780 00:40:44,677 --> 00:40:46,010 And there's no surprise in that. 781 00:40:46,010 --> 00:40:48,170 There's no information. 782 00:40:48,170 --> 00:40:51,800 There's just the occasional heads in that pile. 783 00:40:51,800 --> 00:40:54,800 And when you tell me that it came up heads, I'll be surprised. 784 00:40:54,800 --> 00:40:55,970 I get a lot of information. 785 00:40:55,970 --> 00:40:58,370 But not when I average it out over all experiments. 786 00:40:58,370 --> 00:41:00,850 It's actually low average information there. 787 00:41:04,050 --> 00:41:07,830 So if you wanted to tell me the results of a series of coin 788 00:41:07,830 --> 00:41:11,462 tosses with this coin, you toss it 1,024 times, 789 00:41:11,462 --> 00:41:13,920 and you want to tell me what the result of that set of coin 790 00:41:13,920 --> 00:41:19,650 tosses is, it would seem to be very inefficient to give me 791 00:41:19,650 --> 00:41:26,640 1,024 0's and 1's, saying, it was 0, 0, 0, 0, all 792 00:41:26,640 --> 00:41:27,600 the way along here. 793 00:41:31,380 --> 00:41:33,510 Let me say it this way-- 794 00:41:33,510 --> 00:41:41,680 here's one way to code it that would tell me 795 00:41:41,680 --> 00:41:45,220 what you got in 1,024 trials. 796 00:41:45,220 --> 00:41:48,310 You could say, well, it was tails, tails, tails, tails, 797 00:41:48,310 --> 00:41:52,600 tails, tails, tails, oops, head, tails, tails, tails.
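The two entropy values quoted above, one bit for the fair coin and 0.0112 bits per trial for the heavily biased one, come straight out of the two-symbol entropy formula; here is a minimal sketch of that evaluation (assuming nothing beyond the formula itself):

import math

def binary_entropy(p):
    # H(p) = p * log2(1/p) + (1 - p) * log2(1/(1 - p)), in bits per toss.
    if p == 0.0 or p == 1.0:
        return 0.0   # a certain outcome carries no information
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

print(binary_entropy(0.5))          # 1.0 bit for the fair coin
print(binary_entropy(1.0 / 1024))   # about 0.0112 bits for the heavily biased coin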
798 00:41:52,600 --> 00:41:58,180 So you could give me those 1,024 binary digits with a 1 799 00:41:58,180 --> 00:42:01,990 to tell me exactly where you got the heads. 800 00:42:01,990 --> 00:42:06,000 It seems a very inefficient use of binary digits. 801 00:42:06,000 --> 00:42:09,410 A binary digit can actually reveal a bit of information, 802 00:42:09,410 --> 00:42:12,820 and here you are using 1,024 binary digits 803 00:42:12,820 --> 00:42:15,320 to reveal much less information. 804 00:42:15,320 --> 00:42:16,480 In fact, let's see-- 805 00:42:16,480 --> 00:42:31,712 0.0112 bits per toss times 1,024, about 11 and a half bits, is really 806 00:42:31,712 --> 00:42:32,920 all the information there is. 807 00:42:32,920 --> 00:42:35,470 And you shouldn't be using 1,024 binary digits 808 00:42:35,470 --> 00:42:36,970 to convey that information. 809 00:42:36,970 --> 00:42:38,887 If you're sending it over a transmission line, 810 00:42:38,887 --> 00:42:42,130 it's a very inefficient use. 811 00:42:42,130 --> 00:42:44,200 So can you think of a way to communicate 812 00:42:44,200 --> 00:42:48,280 the outcome of this experiment with something 813 00:42:48,280 --> 00:42:50,728 that's much more efficient? 814 00:42:57,698 --> 00:42:58,198 Yeah. 815 00:42:58,198 --> 00:43:01,690 AUDIENCE: [INAUDIBLE] 816 00:43:01,690 --> 00:43:03,740 GEORGE VERGHESE: Yeah, just since you're 817 00:43:03,740 --> 00:43:07,760 expecting in 1,024 tosses that there'll typically 818 00:43:07,760 --> 00:43:10,610 be just a single one, just encode the position 819 00:43:10,610 --> 00:43:12,080 where that one occurs. 820 00:43:12,080 --> 00:43:17,180 How many binary digits does it take to do that? 821 00:43:17,180 --> 00:43:19,160 10, right? 822 00:43:19,160 --> 00:43:22,430 1,024-- you've got to tell me, is it in position 1, 2, 3, 4? 823 00:43:22,430 --> 00:43:25,520 You've just got to be able to count up to 1,024. 824 00:43:25,520 --> 00:43:28,213 So if you send me 10 binary digits 825 00:43:28,213 --> 00:43:29,630 to tell me where that 1 is, you'll 826 00:43:29,630 --> 00:43:32,180 have revealed what the outcome of the sequence of experiments 827 00:43:32,180 --> 00:43:32,900 is. 828 00:43:32,900 --> 00:43:37,160 So 10 binary digits over 1,024 trials, 829 00:43:37,160 --> 00:43:41,375 so here's the average usage of binary digits-- 830 00:43:45,670 --> 00:43:53,490 10 over 1,024 binary digits per trial if I use your scheme. 831 00:43:53,490 --> 00:43:54,950 And that's much more efficient. 832 00:43:54,950 --> 00:43:58,610 That's much closer to the actual bits per outcome. 833 00:43:58,610 --> 00:44:00,378 And somebody had a question on that? 834 00:44:00,378 --> 00:44:01,334 AUDIENCE: Yeah. 835 00:44:01,334 --> 00:44:03,724 Is it actually less than [INAUDIBLE]? 836 00:44:06,683 --> 00:44:08,100 GEORGE VERGHESE: It better not be. 837 00:44:08,100 --> 00:44:11,580 And part of it might be that I've rounded this here. 838 00:44:14,840 --> 00:44:16,310 Is it a small rounding difference? 839 00:44:16,310 --> 00:44:18,147 Did you actually compute something there? 840 00:44:18,147 --> 00:44:19,002 AUDIENCE: 0.0097. 841 00:44:19,002 --> 00:44:19,960 GEORGE VERGHESE: Sorry. 842 00:44:19,960 --> 00:44:22,870 AUDIENCE: 0.0097 [INAUDIBLE]. 843 00:44:22,870 --> 00:44:25,270 GEORGE VERGHESE: Oh, is it not 0.0112? 844 00:44:25,270 --> 00:44:28,540 OK, good, I'm glad somebody computed that. 845 00:44:28,540 --> 00:44:29,710 How did I get that? 846 00:44:29,710 --> 00:44:30,210 Sorry.
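The numbers being compared in this exchange can be laid out side by side; this is a sketch of the arithmetic only, not of a complete code:

import math

n_tosses = 1024
p = 1.0 / n_tosses

# Entropy per toss: the Shannon lower bound on average binary digits per toss.
h = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
print(round(h, 4))             # about 0.0112 bits per toss
print(round(h * n_tosses, 1))  # about 11.4 bits of information in the whole run

# The suggested scheme: 10 binary digits giving the position of the single head.
print(10 / n_tosses)           # about 0.0098 binary digits per toss

# This comes out below the entropy only because the scheme covers just the
# single-head outcome; runs with no heads, or with two or more heads, need
# their own (longer) code words, and accounting for them pushes the true
# average back up above the entropy.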
847 00:44:30,210 --> 00:44:32,252 AUDIENCE: This is also only because a possibility 848 00:44:32,252 --> 00:44:34,685 of 1 [INAUDIBLE]. 849 00:44:34,685 --> 00:44:35,810 GEORGE VERGHESE: Oh, I see. 850 00:44:35,810 --> 00:44:38,020 What you're saying is that this was-- 851 00:44:38,020 --> 00:44:40,800 right, you're saying this is 0.99 something. 852 00:44:40,800 --> 00:44:42,940 OK, I'm just saying that we're in the ballpark 853 00:44:42,940 --> 00:44:45,570 if we try to code just for the single 1. 854 00:44:45,570 --> 00:44:47,960 But there will be cases in my experiments 855 00:44:47,960 --> 00:44:49,518 where there might be two of these, 856 00:44:49,518 --> 00:44:51,560 and then I've got to use a more elaborate coding. 857 00:44:51,560 --> 00:44:53,618 I'll use a longer code word. 858 00:44:53,618 --> 00:44:55,160 Those are less likely events, so I've 859 00:44:55,160 --> 00:44:56,868 got to factor in all those probabilities. 860 00:44:56,868 --> 00:44:57,680 Yeah, good. 861 00:44:57,680 --> 00:44:59,000 I'm glad you caught that. 862 00:44:59,000 --> 00:45:02,240 I don't want to get too sunk in this 863 00:45:02,240 --> 00:45:04,440 because I just want to convey the idea. 864 00:45:04,440 --> 00:45:11,570 The idea is that the Shannon entropy actually 865 00:45:11,570 --> 00:45:14,300 sets a lower limit to the average length of a code word. 866 00:45:23,270 --> 00:45:26,250 And so when you're trying to do design of codes, 867 00:45:26,250 --> 00:45:32,260 you're actually trying to find codes 868 00:45:32,260 --> 00:45:35,720 that will get you close to the Shannon entropy limit. 869 00:45:35,720 --> 00:45:39,580 So what I want to just briefly mention, 870 00:45:39,580 --> 00:45:41,740 and you'll follow up in recitation, 871 00:45:41,740 --> 00:45:51,640 is something called Huffman coding, which you might 872 00:45:51,640 --> 00:45:53,062 apply to a situation like this. 873 00:45:53,062 --> 00:45:54,520 So you're coding, let's say, grades 874 00:45:54,520 --> 00:45:56,530 to send to the registrar. 875 00:45:56,530 --> 00:46:01,990 A's occur with probability 1/3, B's with 1/2, and so on. 876 00:46:01,990 --> 00:46:05,050 You want a coding whose expected length will come 877 00:46:05,050 --> 00:46:06,560 close to the Shannon entropy. 878 00:46:06,560 --> 00:46:10,690 So the question is, what's the Shannon entropy? 879 00:46:10,690 --> 00:46:13,940 I hope I haven't jumped over too many slides. 880 00:46:13,940 --> 00:46:15,550 I have jumped over too many slides. 881 00:46:15,550 --> 00:46:20,230 Let's go back and find the Shannon entropy here. 882 00:46:20,230 --> 00:46:24,490 For that particular case, if we compute the entropy, 883 00:46:24,490 --> 00:46:29,560 we get 1.626 bits. 884 00:46:29,560 --> 00:46:34,840 If you are communicating four possible grades for 1,000 885 00:46:34,840 --> 00:46:37,870 students to the registrar, one way to do it 886 00:46:37,870 --> 00:46:39,940 would be to use two binary digits. 887 00:46:39,940 --> 00:46:42,700 You can cover all four grades, send 2,000 bits 888 00:46:42,700 --> 00:46:44,590 to the registrar. 889 00:46:44,590 --> 00:46:50,110 The entropy says that you've got 1.626 bits per grade 890 00:46:50,110 --> 00:46:50,750 on average. 891 00:46:50,750 --> 00:46:53,590 So for 1,000 grades, you should be able to get something 892 00:46:53,590 --> 00:46:55,810 closer to 1626 bits. 
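That 1.626-bit figure is easy to check. In the sketch below, the 1/3 for A and 1/2 for B are as stated; taking 1/12 each for C and D is an assumption, chosen to be consistent with the C-or-D total of 1/6 that shows up in the construction described next:

import math

# A and B probabilities as stated in the lecture; C and D at 1/12 each is an
# assumption consistent with a combined C-or-D probability of 1/6.
probs = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}

H = sum(p * math.log2(1 / p) for p in probs.values())
print(round(H, 3))        # about 1.626 bits per grade
print(round(1000 * H))    # about 1626 bits for 1,000 grades, versus 2,000 with two binary digits each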
893 00:46:55,810 --> 00:47:00,550 So can you communicate a set of 1,000 grades occurring 894 00:47:00,550 --> 00:47:04,900 with these probabilities with a code whose expected length is 895 00:47:04,900 --> 00:47:08,290 closer to the 1626? 896 00:47:08,290 --> 00:47:10,840 That's the task for designing a variable-length code. 897 00:47:10,840 --> 00:47:15,010 Now, it turns out that this task was set by Professor Fano, who 898 00:47:15,010 --> 00:47:16,780 was a professor here and is retired, but still 899 00:47:16,780 --> 00:47:20,440 comes to our weekly lunches. He set it as a term paper 900 00:47:20,440 --> 00:47:23,230 in the course he taught on information theory, actually 901 00:47:23,230 --> 00:47:26,830 just three years after Shannon's paper appeared. 902 00:47:26,830 --> 00:47:28,870 He posed the problem of designing 903 00:47:28,870 --> 00:47:31,930 a variable-length code whose expected length came 904 00:47:31,930 --> 00:47:36,040 as close as possible to the Shannon limit. 905 00:47:36,040 --> 00:47:38,920 Huffman struggled with that almost to the end. 906 00:47:38,920 --> 00:47:41,140 Fano offered the option of doing a final exam if you 907 00:47:41,140 --> 00:47:42,422 didn't have a term paper. 908 00:47:42,422 --> 00:47:44,380 Huffman was about to give up on it, and then came up 909 00:47:44,380 --> 00:47:46,360 with an idea that turns out to be 910 00:47:46,360 --> 00:47:48,370 the optimal variable-length coding 911 00:47:48,370 --> 00:47:50,630 scheme for this scenario. 912 00:47:50,630 --> 00:47:56,870 So what he does is, just to very quickly finish with that, 913 00:47:56,870 --> 00:48:01,720 he takes the two lowest probability events, 914 00:48:01,720 --> 00:48:04,430 groups them together to make a single event that 915 00:48:04,430 --> 00:48:08,740 is C or D with probability 1/6. 916 00:48:08,740 --> 00:48:10,430 Then in that resulting reduced set, 917 00:48:10,430 --> 00:48:12,830 he looks at the two lowest probability events, 918 00:48:12,830 --> 00:48:15,110 combines them to make a meta-event 919 00:48:15,110 --> 00:48:17,840 with a probability that's the sum of the individual ones, 920 00:48:17,840 --> 00:48:18,360 and so on. 921 00:48:18,360 --> 00:48:20,480 So he chases this procedure up-- 922 00:48:20,480 --> 00:48:24,290 take the two lowest probability events, combine them 923 00:48:24,290 --> 00:48:26,770 into a single one with a probability that's 924 00:48:26,770 --> 00:48:30,230 the sum of the individual ones, in the resulting reduced set 925 00:48:30,230 --> 00:48:32,960 look for the two lowest probability events, and so on. 926 00:48:32,960 --> 00:48:34,670 Build up a tree. 927 00:48:34,670 --> 00:48:37,970 The resulting tree then reveals the Huffman code. 928 00:48:37,970 --> 00:48:42,950 The Huffman code is guaranteed to have an expected length that 929 00:48:42,950 --> 00:48:47,730 satisfies this constraint, but actually has an upper bound, 930 00:48:47,730 --> 00:48:48,230 too. 931 00:48:48,230 --> 00:48:51,530 It's within entropy plus 1 on the upper side. 932 00:48:51,530 --> 00:48:55,310 We'll talk next time about how to improve this, 933 00:48:55,310 --> 00:48:57,110 but in recitation tomorrow, you'll 934 00:48:57,110 --> 00:49:01,160 get practice at a little bit more leisurely pace 935 00:49:01,160 --> 00:49:04,820 than I did here with constructing Huffman codes.
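One compact way to carry out the merge-the-two-least-likely procedure just described is with a priority queue. The sketch below uses Python's heapq and the same assumed grade probabilities as above; it builds the code and compares its expected length with the entropy:

import heapq
import itertools
import math

def huffman_code(probs):
    # Repeatedly merge the two lowest-probability entries; each merge prepends
    # one more binary digit to every code word in the two groups being merged.
    tie = itertools.count()   # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # lowest probability
        p2, _, group2 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in group1.items()}
        merged.update({s: "1" + c for s, c in group2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}   # assumed distribution, as above
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
entropy = sum(p * math.log2(1 / p) for p in probs.values())
print(code)               # code word lengths of 1, 2, 3, 3 for B, A, C, D
print(round(avg_len, 3))  # about 1.667 binary digits per grade on average
print(round(entropy, 3))  # about 1.626 bits: the lower bound; the upper bound is entropy + 1

Because C and D tie, different but equally good trees can come out of this procedure; the expected length, about 1.667 binary digits per grade against the 1.626-bit entropy, is the same either way.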