1
00:00:00,500 --> 00:00:02,730
Mathematicians like
to model uncertainty
2
00:00:02,730 --> 00:00:06,130
about a particular circumstance
by introducing the concept
3
00:00:06,130 --> 00:00:07,830
of a random variable.
4
00:00:07,830 --> 00:00:09,600
For our application,
we'll always
5
00:00:09,600 --> 00:00:12,380
be dealing with circumstances
where there are a finite number
6
00:00:12,380 --> 00:00:14,750
N of distinct
choices, so we'll be
7
00:00:14,750 --> 00:00:16,470
using a discrete
random variable that
8
00:00:16,470 --> 00:00:18,630
can take on one of
N possible values:
9
00:00:18,630 --> 00:00:22,980
x_1, x_2, and so on up to x_N.
10
00:00:22,980 --> 00:00:25,410
The probability that X
will take on the value x_1
11
00:00:25,410 --> 00:00:28,590
is given by the
probability p_1, the value
12
00:00:28,590 --> 00:00:33,010
x_2 by probability
p_2, and so on.
13
00:00:33,010 --> 00:00:35,490
The smaller the probability,
the more uncertain
14
00:00:35,490 --> 00:00:39,410
it is that X will take
on that particular value.
15
00:00:39,410 --> 00:00:41,690
Claude Shannon, in his
seminal work on the theory
16
00:00:41,690 --> 00:00:44,300
of information, defined the
information received when
17
00:00:44,300 --> 00:00:49,480
learning that X had taken on
the value x_i as the log-base-2
18
00:00:49,480 --> 00:00:51,790
of 1/p_i.
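[Not part of the lecture audio: Shannon's definition above can be sketched as a small Python helper. The function name is our own choice, not from the lecture.]

```python
import math

def information_bits(p):
    """Bits of information received when an event of probability p occurs.

    Shannon's definition: log2(1/p). Smaller p (more uncertainty)
    means more information when the outcome is learned.
    """
    return math.log2(1 / p)

# A fair coin flip (p = 1/2) resolves to exactly 1 bit of information.
print(information_bits(1 / 2))
```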
19
00:00:51,790 --> 00:00:53,520
Note that uncertainty
of a choice
20
00:00:53,520 --> 00:00:56,060
is inversely proportional
to its probability,
21
00:00:56,060 --> 00:00:59,390
so the term inside of the log
is basically the uncertainty
22
00:00:59,390 --> 00:01:01,700
of that particular choice.
23
00:01:01,700 --> 00:01:04,040
We use the log-base-2
to measure the magnitude
24
00:01:04,040 --> 00:01:07,120
of the uncertainty in bits --
where a bit is a quantity that
25
00:01:07,120 --> 00:01:10,770
can take on the value 0 or 1
-- think of the information
26
00:01:10,770 --> 00:01:14,380
content as the number of bits
we would require to encode this
27
00:01:14,380 --> 00:01:16,490
choice.
28
00:01:16,490 --> 00:01:18,540
Suppose the data
we receive doesn't
29
00:01:18,540 --> 00:01:20,280
resolve all the uncertainty.
30
00:01:20,280 --> 00:01:22,980
For example, when earlier
we received the data
31
00:01:22,980 --> 00:01:25,560
that the card was a
heart: some of the uncertainty
32
00:01:25,560 --> 00:01:27,850
has been resolved since we
know more about the card
33
00:01:27,850 --> 00:01:29,810
than we did before
receiving the data,
34
00:01:29,810 --> 00:01:32,100
but we don't yet
know the exact card,
35
00:01:32,100 --> 00:01:34,990
so some uncertainty
still remains.
36
00:01:34,990 --> 00:01:37,320
We can still use the formula
for information content
37
00:01:37,320 --> 00:01:40,210
from the previous slide,
using the probability of the data
38
00:01:40,210 --> 00:01:43,910
we received to compute
the information content.
39
00:01:43,910 --> 00:01:46,230
In our example the
probability of learning
40
00:01:46,230 --> 00:01:49,860
that a card chosen randomly
from a 52-card deck is a heart
41
00:01:49,860 --> 00:01:54,670
is 13/52, the number of
hearts over the total number
42
00:01:54,670 --> 00:01:55,980
of choices.
43
00:01:55,980 --> 00:02:01,070
So p_data is 13/52, or 1/4,
and the information content is
44
00:02:01,070 --> 00:02:03,030
computed as
log-base-2 of 1/(1/4),
45
00:02:03,030 --> 00:02:07,730
which figures out to be 2 bits.
46
00:02:07,730 --> 00:02:10,280
This example is one
we encounter often --
47
00:02:10,280 --> 00:02:13,850
we receive partial information
about N equally-probable
48
00:02:13,850 --> 00:02:18,220
choices (each choice has
probability 1/N) that narrows
49
00:02:18,220 --> 00:02:21,000
the number of choices down to M.
50
00:02:21,000 --> 00:02:24,740
The probability of receiving
such information is M*(1/N),
51
00:02:24,740 --> 00:02:31,740
so information received
is log-base-2 of N/M bits.
52
00:02:31,740 --> 00:02:33,600
Let's look at some examples.
53
00:02:33,600 --> 00:02:36,160
If we learn the result
(HEADS or TAILS)
54
00:02:36,160 --> 00:02:39,370
of a flip of a fair coin,
we go from 2 choices
55
00:02:39,370 --> 00:02:40,600
to a single choice.
56
00:02:40,600 --> 00:02:43,100
So the information received
is log-base-2 of 2/1,
57
00:02:43,100 --> 00:02:45,780
or a single bit.
58
00:02:45,780 --> 00:02:47,330
This makes sense:
it would take us
59
00:02:47,330 --> 00:02:49,880
one bit to encode which
of the two possibilities
60
00:02:49,880 --> 00:02:54,920
actually happened, say, "1"
for heads and "0" for tails.
61
00:02:54,920 --> 00:02:57,160
Reviewing the example
from the previous slide:
62
00:02:57,160 --> 00:02:59,970
learning that a card drawn
from a fresh deck is a heart
63
00:02:59,970 --> 00:03:05,590
gives us log-base-2 of 52/13,
or 2 bits of information.
64
00:03:05,590 --> 00:03:07,290
Again this makes
sense: it would take us
65
00:03:07,290 --> 00:03:10,200
two bits to encode which
of four possible card suits
66
00:03:10,200 --> 00:03:12,410
had turned up.
67
00:03:12,410 --> 00:03:14,190
Finally consider
what information
68
00:03:14,190 --> 00:03:18,220
we get from rolling two
dice, one red and one green.
69
00:03:18,220 --> 00:03:23,200
Each die has six faces, so there
are 36 possible combinations.
70
00:03:23,200 --> 00:03:25,580
Once we learn the exact
outcome of the roll,
71
00:03:25,580 --> 00:03:30,700
we've received log-base-2
of 36/1 or 5.17 bits
72
00:03:30,700 --> 00:03:32,320
of information.
73
00:03:32,320 --> 00:03:32,960
Hmm.
74
00:03:32,960 --> 00:03:35,470
What do those
fractional bits mean?
75
00:03:35,470 --> 00:03:38,740
Our circuitry will only
deal with whole bits!
76
00:03:38,740 --> 00:03:42,360
So to encode a single outcome,
we'd need to use six bits.
77
00:03:42,360 --> 00:03:44,670
But suppose we wanted to
record the outcome of 10
78
00:03:44,670 --> 00:03:46,550
successive rolls.
79
00:03:46,550 --> 00:03:50,760
At 6 bits per roll, we would
need a total of 60 bits.
80
00:03:50,760 --> 00:03:52,690
What this formula
is telling us is
81
00:03:52,690 --> 00:03:55,720
that we would need
not 60 bits, but only
82
00:03:55,720 --> 00:03:59,620
52 bits to unambiguously
encode the results.
83
00:03:59,620 --> 00:04:01,560
Whether we can come up
with an encoding that
84
00:04:01,560 --> 00:04:04,310
achieves this lower bound
is an interesting question,
85
00:04:04,310 --> 00:04:07,850
which we will take up
later in the chapter.
86
00:04:07,850 --> 00:04:10,850
To wrap up, let's return
to our initial example.
87
00:04:10,850 --> 00:04:13,430
Here's a table showing the
different choices for the data
88
00:04:13,430 --> 00:04:15,630
received, along
with the probability
89
00:04:15,630 --> 00:04:19,662
of that event and the
computed information content.
90
00:04:19,662 --> 00:04:21,120
We've already talked
about learning
91
00:04:21,120 --> 00:04:22,560
that the card was a heart.
92
00:04:22,560 --> 00:04:26,340
The probability of this event
is 13/52 with an information
93
00:04:26,340 --> 00:04:28,820
content of 2 bits.
94
00:04:28,820 --> 00:04:31,250
Learning that a card is
not the Ace of spades
95
00:04:31,250 --> 00:04:33,870
is quite likely, since
there's only one chance in 52
96
00:04:33,870 --> 00:04:36,600
that it is the Ace of spades.
97
00:04:36,600 --> 00:04:40,930
So we only get a small amount of
information from this event --
98
00:04:40,930 --> 00:04:42,790
.028 bits.
99
00:04:42,790 --> 00:04:44,990
There are twelve face
cards in a card deck,
100
00:04:44,990 --> 00:04:47,360
so the probability of
this event is 12/52
101
00:04:47,360 --> 00:04:52,290
and we would receive 2.115 bits.
102
00:04:52,290 --> 00:04:55,040
A bit more information than
learning about the card's suit
103
00:04:55,040 --> 00:04:59,200
since there's slightly
less residual uncertainty.
104
00:04:59,200 --> 00:05:01,880
Finally, we get the most
information when all
105
00:05:01,880 --> 00:05:04,910
uncertainty is eliminated
-- a bit more than 5.7 bits.
106
00:05:04,910 --> 00:05:11,120
The results line up nicely with
our and Mr. Blue's intuition:
107
00:05:11,120 --> 00:05:14,310
the more uncertainty is
resolved, the more information
108
00:05:14,310 --> 00:05:16,090
we have received.
109
00:05:16,090 --> 00:05:19,040
Now try your hand at computing
the information for a few
110
00:05:19,040 --> 00:05:22,070
more examples in the
following exercises.