WEBVTT
00:00:00.500 --> 00:00:03.370
Given a set of symbols
and their probabilities,
00:00:03.370 --> 00:00:06.960
Huffman’s Algorithm tells us
how to construct an optimal
00:00:06.960 --> 00:00:08.750
variable-length encoding.
00:00:08.750 --> 00:00:11.570
By “optimal” we mean that,
assuming we’re encoding each
00:00:11.570 --> 00:00:15.000
symbol one-at-a-time, no other
variable-length code will have
00:00:15.000 --> 00:00:17.370
a shorter expected length.
00:00:17.370 --> 00:00:20.000
The algorithm builds the
binary tree for the encoding
00:00:20.000 --> 00:00:21.340
from the bottom up.
00:00:21.340 --> 00:00:23.830
Start by choosing the two
symbols with the smallest
00:00:23.830 --> 00:00:25.550
probability (which
means they have
00:00:25.550 --> 00:00:28.010
highest information content
and should have the longest
00:00:28.010 --> 00:00:29.750
encoding).
00:00:29.750 --> 00:00:31.440
If anywhere along
the way, two symbols
00:00:31.440 --> 00:00:35.010
have the same probability,
simply choose one arbitrarily.
00:00:35.010 --> 00:00:37.340
In our running example, the
two symbols with the lowest
00:00:37.340 --> 00:00:40.310
probability are C and D.
00:00:40.310 --> 00:00:42.580
Combine the symbols
as a binary subtree,
00:00:42.580 --> 00:00:44.770
with one branch labeled
“0” and the other “1”.
00:00:44.770 --> 00:00:48.530
It doesn’t matter which
labels go with which branch.
00:00:48.530 --> 00:00:51.050
Remove C and D from
our list of symbols,
00:00:51.050 --> 00:00:54.840
and replace them with the newly
constructed subtree, whose root
00:00:54.840 --> 00:00:57.660
has the associated
probability 1/6,
00:00:57.660 --> 00:01:01.640
the sum of the probabilities
of its two branches.
00:01:01.640 --> 00:01:04.620
Now continue, at each step
choosing the two symbols
00:01:04.620 --> 00:01:07.050
and/or subtrees with the
lowest probabilities,
00:01:07.050 --> 00:01:10.620
combining the choices
into a new subtree.
00:01:10.620 --> 00:01:12.650
At this point in our
example, the symbol A
00:01:12.650 --> 00:01:16.880
has the probability 1/3, the
symbol B the probability 1/2
00:01:16.880 --> 00:01:19.610
and the C/D subtree
probability 1/6.
00:01:19.610 --> 00:01:24.290
So we’ll combine A
with the C/D subtree.
00:01:24.290 --> 00:01:26.880
On the final step we only
have two choices left:
00:01:26.880 --> 00:01:29.810
B and the A/C/D
subtree, which we
00:01:29.810 --> 00:01:32.350
combine in a new subtree,
whose root then becomes
00:01:32.350 --> 00:01:33.910
the root of the
tree representing
00:01:33.910 --> 00:01:36.440
the optimal
variable-length code.
00:01:36.440 --> 00:01:40.060
Happily, this is the code
we’ve been using all along!
00:01:40.060 --> 00:01:42.730
As mentioned above, we can
produce a number of different
00:01:42.730 --> 00:01:45.890
variable-length codes by
swapping the “0” and “1” labels
00:01:45.890 --> 00:01:48.290
on any of the subtree branches.
00:01:48.290 --> 00:01:51.120
But all those encodings would
have the same expected length,
00:01:51.120 --> 00:01:53.770
which is determined by the
distance of each symbol
00:01:53.770 --> 00:01:56.640
from the root of the tree,
not the labels along the path
00:01:56.640 --> 00:01:58.560
from root to leaf.
00:01:58.560 --> 00:02:00.250
So all these different
encodings are
00:02:00.250 --> 00:02:03.110
equivalent in terms
of their efficiency.
00:02:03.110 --> 00:02:05.450
Now try your hand at
using Huffman’s Algorithm
00:02:05.450 --> 00:02:08.960
to construct optimal
variable-length encodings.