1
00:00:01,069 --> 00:00:05,050
In the next section we're going to start our
discussion on how to actually engineer the
2
00:00:05,050 --> 00:00:09,660
bit encodings we'll use in our circuitry,
but first we'll need a way to evaluate the
3
00:00:09,660 --> 00:00:12,120
efficacy of an encoding.
4
00:00:12,120 --> 00:00:16,980
The entropy of a random variable is the average
amount of information received when learning
5
00:00:16,980 --> 00:00:19,500
the value of the random variable.
6
00:00:19,500 --> 00:00:23,439
The mathematician's name for "average" is
"expected value"; that's what the capital
7
00:00:23,439 --> 00:00:25,050
E means.
8
00:00:25,050 --> 00:00:30,560
We compute the average in the obvious way:
we take the weighted sum, where the amount
9
00:00:30,560 --> 00:00:37,329
of information received when learning of a particular
choice i -- that's the log-base-2 of 1/p_i
10
00:00:37,329 --> 00:00:41,160
-- is weighted by the probability of that
choice actually happening.
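In symbols, assuming N possible choices where choice i occurs with probability p_i, the definition just described is:

    H(X) \;=\; E\!\left[\log_2\frac{1}{p_i}\right] \;=\; \sum_{i=1}^{N} p_i \log_2\frac{1}{p_i}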
11
00:00:41,160 --> 00:00:43,489
Here's an example.
12
00:00:43,489 --> 00:00:49,149
We have a random variable that can take on
one of four values: A, B, C or D.
13
00:00:49,149 --> 00:00:53,739
The probabilities of each choice are shown
in the table, along with the associated information
14
00:00:53,739 --> 00:00:56,280
content.
15
00:00:56,280 --> 00:01:00,949
Now we'll compute the entropy using the probabilities
and information content.
16
00:01:00,949 --> 00:01:06,770
So we have the probability of A (1/3) times
its associated information content (1.58 bits),
17
00:01:06,770 --> 00:01:13,360
plus the probability of B times its associated
information content, and so on.
18
00:01:13,360 --> 00:01:17,079
The result is 1.626 bits.
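Here's a minimal Python sketch of that computation. The table itself isn't reproduced in this transcript, so the probabilities for B, C and D below are assumptions chosen to be consistent with the quoted numbers (p(A) = 1/3, information 1.58 bits, entropy 1.626 bits):

    from math import log2

    # Assumed table values (consistent with the numbers quoted in the lecture).
    probs = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}

    # Information content of each choice: log2(1/p) bits.
    for symbol, p in probs.items():
        print(f"{symbol}: p = {p:.4f}, info = {log2(1/p):.2f} bits")

    # Entropy: the probability-weighted sum of the information contents.
    H = sum(p * log2(1/p) for p in probs.values())
    print(f"H(X) = {H:.3f} bits")  # prints H(X) = 1.626 bits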
19
00:01:17,079 --> 00:01:21,330
This is telling us that a clever encoding
scheme should be able to do better than simply
20
00:01:21,330 --> 00:01:27,100
encoding each symbol using 2 bits to represent
which of the four possible values is next.
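For instance, a sketch under the assumed probabilities above: a variable-length prefix code that gives the likeliest value the shortest codeword (a hypothetical assignment, B=0, A=10, C=110, D=111) already averages under 2 bits per symbol:

    # Hypothetical variable-length prefix code for the assumed probabilities.
    probs = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}
    code = {"B": "0", "A": "10", "C": "110", "D": "111"}

    # Expected bits per symbol: weight each codeword length by its probability.
    avg_len = sum(p * len(code[s]) for s, p in probs.items())
    print(f"average length = {avg_len:.3f} bits/symbol")
    # prints ~1.667: better than 2, but still above the entropy, ~1.626.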
21
00:01:27,100 --> 00:01:29,020
Food for thought!
22
00:01:29,020 --> 00:01:33,399
We'll discuss this further in the third section
of this chapter.
23
00:01:33,399 --> 00:01:37,820
So, what is the entropy telling us?
24
00:01:37,820 --> 00:01:42,490
Suppose we have a sequence of data describing
a sequence of values of the random variable
25
00:01:42,490 --> 00:01:43,490
X.
26
00:01:43,490 --> 00:01:49,860
If, on the average, we use less than H(X)
bits to transmit each piece of information
27
00:01:49,860 --> 00:01:54,340
in the sequence, we will not be sending enough
information to resolve the uncertainty about
28
00:01:54,340 --> 00:01:55,840
the values.
29
00:01:55,840 --> 00:02:01,649
In other words, the entropy is a lower bound
on the average number of bits we need to transmit
for each value in the sequence.
30
00:02:01,649 --> 00:02:06,579
Receiving fewer than this number of bits wouldn't
be good if the goal is to unambiguously describe
31
00:02:06,579 --> 00:02:10,970
the sequence of values -- we'd have failed
at our job!
32
00:02:10,970 --> 00:02:16,140
On the other hand, if we send, on the average,
more than H(X) bits to describe the sequence
33
00:02:16,140 --> 00:02:20,970
of values, we will not be making the most
effective use of our resources, since the
34
00:02:20,970 --> 00:02:25,050
same information could have been represented
with fewer bits.
35
00:02:25,050 --> 00:02:29,560
This is okay, but perhaps with some insights
we could do better.
36
00:02:29,560 --> 00:02:35,970
Finally, if we send, on the average, exactly
H(X) bits, then we'd have the perfect encoding.
37
00:02:35,970 --> 00:02:40,660
Alas, perfection is, as always, a tough goal,
so most of the time we'll have to settle for
38
00:02:40,660 --> 00:02:42,040
getting close.
39
00:02:42,040 --> 00:02:48,300
In the final set of exercises for this section,
try computing the entropy for various scenarios.