WEBVTT
00:00:01.069 --> 00:00:05.050
In the next section we're going to start our
discussion on how to actually engineer the
00:00:05.050 --> 00:00:09.660
bit encodings we'll use in our circuitry,
but first we'll need a way to evaluate the
00:00:09.660 --> 00:00:12.120
efficacy of an encoding.
00:00:12.120 --> 00:00:16.980
The entropy of a random variable is the average
amount of information received when learning
00:00:16.980 --> 00:00:19.500
the value of the random variable.
00:00:19.500 --> 00:00:23.439
The mathematician's name for "average" is
"expected value"; that's what the capital
00:00:23.439 --> 00:00:25.050
E means.
00:00:25.050 --> 00:00:30.560
We compute the average in the obvious way:
we take the weighted sum, where the amount
00:00:30.560 --> 00:00:37.329
of information received when learning of a particular
choice i -- that's the log-base-2 of 1/p_i
00:00:37.329 --> 00:00:41.160
-- is weighted by the probability of that
choice actually happening.
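Written out, the weighted sum just described looks like the following. This is a sketch of the formula as narrated, writing H(X) for the entropy of the random variable X and p_i for the probability of choice i; the exact typesetting on the slide is an assumption.

```latex
H(X) = E\left[\log_2 \frac{1}{p_i}\right] = \sum_{i} p_i \, \log_2\!\left(\frac{1}{p_i}\right)
```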
00:00:41.160 --> 00:00:43.489
Here's an example.
00:00:43.489 --> 00:00:49.149
We have a random variable that can take on
one of four values: A, B, C or D.
00:00:49.149 --> 00:00:53.739
The probabilities of each choice are shown
in the table, along with the associated information
00:00:53.739 --> 00:00:56.280
content.
00:00:56.280 --> 00:01:00.949
Now we'll compute the entropy using the probabilities
and information content.
00:01:00.949 --> 00:01:06.770
So we have the probability of A (1/3) times
its associated information content (1.58 bits),
00:01:06.770 --> 00:01:13.360
plus the probability of B times its associated
information content, and so on.
00:01:13.360 --> 00:01:17.079
The result is 1.626 bits.
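The arithmetic can be checked with a few lines of Python. This is a minimal sketch: only the probability of A (1/3) is stated in the narration, so the values used here for B, C and D are assumptions, chosen because they sum to 1 together with A and reproduce the 1.626-bit result. The function name `entropy` is ours.

```python
import math

def entropy(probs):
    """Return the entropy, in bits, of a discrete probability distribution."""
    # Weighted sum: each choice contributes p * log2(1/p).
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# p(A) = 1/3 comes from the narration; B, C and D are ASSUMED values that
# are consistent with the stated 1.626-bit result.
probs = {"A": 1 / 3, "B": 1 / 2, "C": 1 / 12, "D": 1 / 12}

for symbol, p in probs.items():
    print(f"{symbol}: p = {p:.3f}, information = {math.log2(1 / p):.3f} bits")

h = entropy(probs.values())
print(f"entropy = {h:.3f} bits")
```

Under these assumed probabilities the information content of A comes out at about 1.58 bits and the entropy at about 1.626 bits, matching the figures quoted above.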
00:01:17.079 --> 00:01:21.330
This is telling us that a clever encoding
scheme should be able to do better than simply
00:01:21.330 --> 00:01:27.100
encoding each symbol using 2 bits to represent
which of the four possible values is next.
00:01:27.100 --> 00:01:29.020
Food for thought!
00:01:29.020 --> 00:01:33.399
We'll discuss this further in the third section
of this chapter.
00:01:33.399 --> 00:01:37.820
So, what is the entropy telling us?
00:01:37.820 --> 00:01:42.490
Suppose we have a sequence of data describing
a sequence of values of the random variable
00:01:42.490 --> 00:01:43.490
X.
00:01:43.490 --> 00:01:49.860
If, on the average, we use fewer than H(X)
bits to transmit each piece of information
00:01:49.860 --> 00:01:54.340
in the sequence, we will not be sending enough
information to resolve the uncertainty about
00:01:54.340 --> 00:01:55.840
the values.
00:01:55.840 --> 00:02:01.649
In other words, the entropy is a lower bound
on the number of bits we need to transmit.
00:02:01.649 --> 00:02:06.579
Getting fewer than this number of bits wouldn't
be good if the goal was to unambiguously describe
00:02:06.579 --> 00:02:10.970
the sequence of values -- we'd have failed
at our job!
00:02:10.970 --> 00:02:16.140
On the other hand, if we send, on the average,
more than H(X) bits to describe the sequence
00:02:16.140 --> 00:02:20.970
of values, we will not be making the most
effective use of our resources, since the
00:02:20.970 --> 00:02:25.050
same information could have been
represented with fewer bits.
00:02:25.050 --> 00:02:29.560
This is okay, but perhaps with some insights
we could do better.
00:02:29.560 --> 00:02:35.970
Finally, if we send, on the average, exactly
H(X) bits, then we'd have the perfect encoding.
00:02:35.970 --> 00:02:40.660
Alas, perfection is, as always, a tough goal,
so most of the time we'll have to settle for
00:02:40.660 --> 00:02:42.040
getting close.
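To make "getting close" concrete, we can compare the average number of bits per symbol for two encodings against the entropy. This is a sketch under stated assumptions: the probabilities for B, C and D are assumed (only A = 1/3 is given in the narration), and both code tables below are hypothetical examples, not encodings taken from the lecture.

```python
import math

# p(A) = 1/3 is from the narration; the other probabilities are ASSUMED.
probs = {"A": 1 / 3, "B": 1 / 2, "C": 1 / 12, "D": 1 / 12}

# Hypothetical fixed-length code: 2 bits for every symbol.
fixed_code = {"A": "00", "B": "01", "C": "10", "D": "11"}

# Hypothetical variable-length prefix code: likelier symbols get shorter codewords.
variable_code = {"B": "0", "A": "10", "C": "110", "D": "111"}

def average_length(code, probs):
    """Expected number of bits per symbol for the given code table."""
    return sum(probs[s] * len(code[s]) for s in probs)

h = sum(p * math.log2(1 / p) for p in probs.values())
fixed_len = average_length(fixed_code, probs)
variable_len = average_length(variable_code, probs)

print(f"entropy (lower bound): {h:.3f} bits/symbol")
print(f"fixed-length code:     {fixed_len:.3f} bits/symbol")
print(f"variable-length code:  {variable_len:.3f} bits/symbol")
```

With these assumed numbers the variable-length code averages about 1.667 bits per symbol: better than the 2 bits of the fixed-length code, but still slightly above the entropy, which is the "close but not perfect" situation described above.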
00:02:42.040 --> 00:02:48.300
In the final set of exercises for this section,
try computing the entropy for various scenarios.