1 00:00:00,500 --> 00:00:02,730 Mathematicians like to model uncertainty 2 00:00:02,730 --> 00:00:06,130 about a particular circumstance by introducing the concept 3 00:00:06,130 --> 00:00:07,830 of a random variable. 4 00:00:07,830 --> 00:00:09,600 For our application, we'll always 5 00:00:09,600 --> 00:00:12,380 be dealing with circumstances where there are a finite number 6 00:00:12,380 --> 00:00:14,750 N of distinct choices, so we'll be 7 00:00:14,750 --> 00:00:16,470 using a discrete random variable that 8 00:00:16,470 --> 00:00:18,630 can take on one of N possible values: 9 00:00:18,630 --> 00:00:22,980 x_1, x_2, and so on up to x_N. 10 00:00:22,980 --> 00:00:25,410 The probability that X will take on the value x_1 11 00:00:25,410 --> 00:00:28,590 is given by the probability p_1, the value 12 00:00:28,590 --> 00:00:33,010 x_2 by probability p_2, and so on. 13 00:00:33,010 --> 00:00:35,490 The smaller the probability, the more uncertain 14 00:00:35,490 --> 00:00:39,410 it is that X will take on that particular value. 15 00:00:39,410 --> 00:00:41,690 Claude Shannon, in his seminal work on the theory 16 00:00:41,690 --> 00:00:44,300 of information, defined the information received when 17 00:00:44,300 --> 00:00:49,480 learning that X had taken on the value x_i as the log-base-2 18 00:00:49,480 --> 00:00:51,790 of 1/p_i. 19 00:00:51,790 --> 00:00:53,520 Note that uncertainty of a choice 20 00:00:53,520 --> 00:00:56,060 is inversely proportional to its probability, 21 00:00:56,060 --> 00:00:59,390 so the term inside of the log is basically the uncertainty 22 00:00:59,390 --> 00:01:01,700 of that particular choice. 
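Written out as an equation, the definition just described looks like the following. The notation I(x_i) for the information content is an assumption introduced here for illustration; the transcript only describes the quantity in words. The second form simply uses the identity log(1/p) = -log(p), and the sum reflects that the N probabilities must add up to 1.

```latex
I(x_i) \;=\; \log_2\!\left(\frac{1}{p_i}\right) \;=\; -\log_2 p_i \quad \text{bits},
\qquad \sum_{i=1}^{N} p_i = 1 .
```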
23 00:01:01,700 --> 00:01:04,040 We use the log-base-2 to measure the magnitude 24 00:01:04,040 --> 00:01:07,120 of the uncertainty in bits -- where a bit is a quantity that 25 00:01:07,120 --> 00:01:10,770 can take on the value 0 or 1. Think of the information 26 00:01:10,770 --> 00:01:14,380 content as the number of bits we would require to encode this 27 00:01:14,380 --> 00:01:16,490 choice. 28 00:01:16,490 --> 00:01:18,540 Suppose the data we receive doesn't 29 00:01:18,540 --> 00:01:20,280 resolve all the uncertainty. 30 00:01:20,280 --> 00:01:22,980 For example, when earlier we received the data 31 00:01:22,980 --> 00:01:25,560 that the card was a heart: some of the uncertainty 32 00:01:25,560 --> 00:01:27,850 has been resolved since we know more about the card 33 00:01:27,850 --> 00:01:29,810 than we did before receiving the data, 34 00:01:29,810 --> 00:01:32,100 but we don't yet know the exact card, 35 00:01:32,100 --> 00:01:34,990 so some uncertainty still remains. 36 00:01:34,990 --> 00:01:37,320 We can still use the formula for information content 37 00:01:37,320 --> 00:01:40,210 from the previous slide, using the probability 38 00:01:40,210 --> 00:01:43,910 of the data we received to compute the information content. 39 00:01:43,910 --> 00:01:46,230 In our example the probability of learning 40 00:01:46,230 --> 00:01:49,860 that a card chosen randomly from a 52-card deck is a heart 41 00:01:49,860 --> 00:01:54,670 is 13/52, the number of hearts over the total number 42 00:01:54,670 --> 00:01:55,980 of choices. 43 00:01:55,980 --> 00:02:01,070 So p_data is 13/52, or 1/4, and the information content is 44 00:02:01,070 --> 00:02:03,030 computed as log-base-2 of 1/(1/4), 45 00:02:03,030 --> 00:02:07,730 which figures out to be 2 bits. 
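The heart calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name information_bits and the variable names are hypothetical, chosen for illustration, and are not part of the lecture.

```python
from fractions import Fraction
from math import log2


def information_bits(p):
    """Information, in bits, received on learning an event of probability p.

    Hypothetical helper: computes log-base-2 of 1/p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be greater than 0 and at most 1")
    return log2(1 / float(p))


# 13 hearts out of 52 equally likely cards; Fraction reduces this to 1/4.
p_heart = Fraction(13, 52)
heart_bits = information_bits(p_heart)

print(f"p_data = {p_heart}, information = {heart_bits} bits")
```

Using Fraction keeps 13/52 exact until the final logarithm, which makes the reduction to 1/4 visible when printed.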
46 00:02:07,730 --> 00:02:10,280 This example is one we encounter often -- 47 00:02:10,280 --> 00:02:13,850 we receive partial information about N equally-probable 48 00:02:13,850 --> 00:02:18,220 choices (each choice has probability 1/N) that narrows 49 00:02:18,220 --> 00:02:21,000 the number of choices down to M. 50 00:02:21,000 --> 00:02:24,740 The probability of receiving such information is M*(1/N), 51 00:02:24,740 --> 00:02:31,740 so information received is log-base-2 of N/M bits. 52 00:02:31,740 --> 00:02:33,600 Let's look at some examples. 53 00:02:33,600 --> 00:02:36,160 If we learn the result (HEADS or TAILS) 54 00:02:36,160 --> 00:02:39,370 of a flip of a fair coin, we go from 2 choices 55 00:02:39,370 --> 00:02:40,600 to a single choice. 56 00:02:40,600 --> 00:02:43,100 So the information received is log-base-2 of 2/1, 57 00:02:43,100 --> 00:02:45,780 or a single bit. 58 00:02:45,780 --> 00:02:47,330 This makes sense: it would take us 59 00:02:47,330 --> 00:02:49,880 one bit to encode which of the two possibilities 60 00:02:49,880 --> 00:02:54,920 actually happened, say, "1" for heads and "0" for tails. 61 00:02:54,920 --> 00:02:57,160 Reviewing the example from the previous slide: 62 00:02:57,160 --> 00:02:59,970 learning that a card drawn from a fresh deck is a heart 63 00:02:59,970 --> 00:03:05,590 gives us log-base-2 of 52/13, or 2 bits of information. 64 00:03:05,590 --> 00:03:07,290 Again this makes sense: it would take us 65 00:03:07,290 --> 00:03:10,200 two bits to encode which of four possible card suits 66 00:03:10,200 --> 00:03:12,410 had turned up. 67 00:03:12,410 --> 00:03:14,190 Finally consider what information 68 00:03:14,190 --> 00:03:18,220 we get from rolling two dice, one red and one green. 69 00:03:18,220 --> 00:03:23,200 Each die has six faces, so there are 36 possible combinations. 
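The narrowing rule (N equally probable choices reduced to M) can be sketched in Python as follows. The name bits_from_narrowing is a hypothetical helper introduced here, not something from the lecture. The sketch also confirms that log-base-2 of N/M agrees with plugging the probability M*(1/N) into the general formula.

```python
from math import isclose, log2


def bits_from_narrowing(n, m):
    """Bits received when n equally probable choices are narrowed to m.

    Hypothetical helper: computes log-base-2 of n/m.
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return log2(n / m)


# Fair coin: 2 choices narrowed to 1.
coin_bits = bits_from_narrowing(2, 1)

# Card from a 52-card deck known to be a heart: 52 choices narrowed to 13.
heart_bits = bits_from_narrowing(52, 13)

# Same answer from the general formula with p = M * (1/N).
p_heart = 13 * (1 / 52)
same_as_general = isclose(heart_bits, log2(1 / p_heart))

print(coin_bits, heart_bits, same_as_general)
```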
70 00:03:23,200 --> 00:03:25,580 Once we learn the exact outcome of the roll, 71 00:03:25,580 --> 00:03:30,700 we've received log-base-2 of 36/1 or 5.17 bits 72 00:03:30,700 --> 00:03:32,320 of information. 73 00:03:32,320 --> 00:03:32,960 Hmm. 74 00:03:32,960 --> 00:03:35,470 What do those fractional bits mean? 75 00:03:35,470 --> 00:03:38,740 Our circuitry will only deal with whole bits! 76 00:03:38,740 --> 00:03:42,360 So to encode a single outcome, we'd need to use six bits. 77 00:03:42,360 --> 00:03:44,670 But suppose we wanted to record the outcome of 10 78 00:03:44,670 --> 00:03:46,550 successive rolls. 79 00:03:46,550 --> 00:03:50,760 At 6 bits per roll, we would need a total of 60 bits. 80 00:03:50,760 --> 00:03:52,690 What this formula is telling us is 81 00:03:52,690 --> 00:03:55,720 that we would need not 60 bits, but only 82 00:03:55,720 --> 00:03:59,620 52 bits to unambiguously encode the results. 83 00:03:59,620 --> 00:04:01,560 Whether we can come up with an encoding that 84 00:04:01,560 --> 00:04:04,310 achieves this lower bound is an interesting question, 85 00:04:04,310 --> 00:04:07,850 which we will take up later in the chapter. 86 00:04:07,850 --> 00:04:10,850 To wrap up, let's return to our initial example. 87 00:04:10,850 --> 00:04:13,430 Here's a table showing the different choices for the data 88 00:04:13,430 --> 00:04:15,630 received, along with the probability 89 00:04:15,630 --> 00:04:19,662 of that event and the computed information content. 90 00:04:19,662 --> 00:04:21,120 We've already talked about learning 91 00:04:21,120 --> 00:04:22,560 that the card was a heart. 92 00:04:22,560 --> 00:04:26,340 The probability of this event is 13/52 with an information 93 00:04:26,340 --> 00:04:28,820 content of 2 bits. 94 00:04:28,820 --> 00:04:31,250 Learning that a card is not the Ace of spades 95 00:04:31,250 --> 00:04:33,870 is quite likely, since there's only one chance in 52 96 00:04:33,870 --> 00:04:36,600 that it is the Ace of spades. 
97 00:04:36,600 --> 00:04:40,930 So we only get a small amount of information from this event -- 98 00:04:40,930 --> 00:04:42,790 0.028 bits. 99 00:04:42,790 --> 00:04:44,990 There are twelve face cards in a card deck, 100 00:04:44,990 --> 00:04:47,360 so the probability of learning that the card is a face card is 12/52 101 00:04:47,360 --> 00:04:52,290 and we would receive 2.115 bits. 102 00:04:52,290 --> 00:04:55,040 That's a bit more information than learning about the card's suit 103 00:04:55,040 --> 00:04:59,200 since there's slightly less residual uncertainty. 104 00:04:59,200 --> 00:05:01,880 Finally, we get the most information when all 105 00:05:01,880 --> 00:05:04,910 uncertainty is eliminated -- a bit more than 5.7 bits. 106 00:05:04,910 --> 00:05:11,120 The results line up nicely with our and Mr. Blue's intuition: 107 00:05:11,120 --> 00:05:14,310 the more uncertainty is resolved, the more information 108 00:05:14,310 --> 00:05:16,090 we have received. 109 00:05:16,090 --> 00:05:19,040 Now try your hand at computing the information for a few 110 00:05:19,040 --> 00:05:22,070 more examples in the following exercises.
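As a starting point for the exercises, here is a minimal Python sketch that recomputes the figures quoted for the card table and for the ten dice rolls. The helper name, the event labels, and the table layout are assumptions made for illustration; the slide's actual table is not reproduced in the transcript.

```python
from math import ceil, log2


def bits_from_narrowing(n, m):
    """Bits received when n equally probable choices are narrowed to m (hypothetical helper)."""
    return log2(n / m)


DECK = 52

# (illustrative label, number of cards still possible after receiving the data)
events = [
    ("a heart", 13),
    ("not the Ace of spades", 51),
    ("a face card", 12),
    ("the exact card", 1),
]

# Each row: label, remaining choices, information in bits.
table = [(label, m, bits_from_narrowing(DECK, m)) for label, m in events]

for label, m, bits in table:
    print(f"{label:<22} p = {m:>2}/{DECK}   {bits:.3f} bits")

# Dice example: 36 equally likely outcomes per roll of two dice, 10 rolls.
bits_per_roll = log2(36)
rolls = 10
one_roll_at_a_time = rolls * ceil(bits_per_roll)  # whole bits for every roll
all_rolls_together = ceil(rolls * bits_per_roll)  # whole bits for the full sequence

print(f"{bits_per_roll:.2f} bits per roll: "
      f"{one_roll_at_a_time} bits roll by roll, "
      f"{all_rolls_together} bits as a lower bound for all {rolls} rolls")
```

Rounding up once for the whole sequence, rather than once per roll, is what accounts for the gap between 60 and 52 bits. The sketch only computes the bound; it does not construct an encoding that achieves it.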