6.441 | Spring 2016 | Graduate

Information Theory

Lecture Notes

The following lecture notes were written for 6.441 by Professors Yury Polyanskiy of MIT and Yihong Wu of the University of Illinois at Urbana-Champaign. A complete copy of the notes is available for download (PDF - 7.6MB).

Part I: Information Measures
Chapter 1: Information measures: Entropy and divergence (PDF)

1.1 Entropy

1.2 Divergence

1.3 Differential entropy

Chapter 2: Information measures: Mutual information (PDF)

2.1 Divergence: Main inequality

2.2 Conditional divergence

2.3 Mutual information

2.4 Conditional mutual information and conditional independence

2.5 Strong data-processing inequalities

2.6 How to avoid measurability problems?

Chapter 3: Sufficient statistic. Continuity of divergence and mutual information. (PDF)

3.1 Sufficient statistics and data-processing

3.2 Geometric interpretation of mutual information

3.3 Variational characterizations of divergence: Donsker-Varadhan

3.4 Variational characterizations of divergence: Gelfand-Yaglom-Perez

3.5 Continuity of divergence. Dependence on sigma-algebra

3.6 Variational characterizations and continuity of mutual information

Chapter 4: Extremization of mutual information: Capacity saddle point (PDF)

4.1 Convexity of information measures

4.2 Local behavior of divergence

4.3 Local behavior of divergence and Fisher information

4.4 Extremization of mutual information

4.5 Capacity = information radius

4.6 Existence of caod (general case)

4.7 Gaussian saddle point

Chapter 5: Single-letterization. Probability of error. Entropy rate. (PDF)

5.1 Extremization of mutual information for memoryless sources and channels

5.2 Gaussian capacity via orthogonal symmetry

5.3 Information measures and probability of error

5.4 Fano, LeCam and minimax risks

5.5 Entropy rate

5.6 Entropy and symbol (bit) error rate

5.7 Mutual information rate

5.8 Toeplitz matrices and Szegő’s theorem

Part II: Lossless Data Compression
Chapter 6: Variable-length Lossless Compression (PDF - 1.1MB)

6.1 Variable-length, lossless, optimal compressor

6.2 Uniquely decodable codes, prefix codes and Huffman codes

Chapter 7: Fixed-length (almost lossless) compression. Slepian-Wolf problem. (PDF)

7.1 Fixed-length code, almost lossless

7.2 Linear Compression

7.3 Compression with Side Information at both compressor and decompressor

7.4 Slepian-Wolf (Compression with Side Information at Decompressor only)

7.5 Multi-terminal Slepian-Wolf

7.6 Source-coding with a helper (Ahlswede-Körner-Wyner)

Chapter 8: Compressing stationary ergodic sources (PDF)

8.1 Bits of ergodic theory

8.2 Proof of Shannon-McMillan

8.3 Proof of Birkhoff-Khintchine

8.4 Sinai’s generator theorem

Chapter 9: Universal compression (PDF)

9.1 Arithmetic coding

9.2 Combinatorial construction of Fitingof

9.3 Optimal compressors for a class of sources. Redundancy

9.4 Approximate minimax solution: Jeffreys prior

9.5 Sequential probability assignment: Krichevsky-Trofimov

9.6 Lempel-Ziv compressor

Part III: Binary Hypothesis Testing
Chapter 10: Binary hypothesis testing (PDF)

10.1 Binary Hypothesis Testing

10.2 Neyman-Pearson formulation

10.3 Likelihood ratio tests

10.4 Converse bounds on R(P, Q)

10.5 Achievability bounds on R(P, Q)

10.6 Asymptotics

Chapter 11: Hypothesis testing asymptotics I (PDF)

11.1 Stein’s regime

11.2 Chernoff regime

11.3 Basics of Large deviation theory

Chapter 12: Information projection and Large deviation (PDF)

12.1 Large-deviation exponents

12.2 Information Projection

12.3 Interpretation of Information Projection

12.4 Generalization: Sanov’s theorem

Chapter 13: Hypothesis testing asymptotics II (PDF - 2.0MB)

13.1 (E_0, E_1)-Tradeoff

13.2 Equivalent forms of Theorem 13.1

13.3 Sequential Hypothesis Testing

Part IV: Channel Coding
Chapter 14: Channel coding (PDF)

14.1 Channel Coding

14.2 Basic Results

14.3 General (Weak) Converse Bounds

14.4 General achievability bounds: Preview

Chapter 15: Channel coding: Achievability bounds (PDF)

15.1 Information density

15.2 Shannon’s achievability bound

15.3 Dependence-testing bound

15.4 Feinstein’s Lemma

Chapter 16: Linear codes. Channel capacity. (PDF)

16.1 Linear coding

16.2 Channels and channel capacity

16.3 Bounds on C_e; Capacity of Stationary Memoryless Channels

16.4 Examples of DMC

16.5 Information Stability

Chapter 17: Channels with input constraints. Gaussian channels. (PDF)

17.1 Channel coding with input constraints

17.2 Capacity under input constraint C(P) ?= C_i(P)

17.3 Applications

17.4 Non-stationary AWGN

17.5 Stationary Additive Colored Gaussian noise channel

17.6 Additive White Gaussian Noise channel with Intersymbol Interference

17.7 Gaussian channels with amplitude constraints

17.8 Gaussian channels with fading

Chapter 18: Lattice codes (by O. Ordentlich) (PDF)

18.1 Lattice Definitions

18.2 First Attempt at AWGN Capacity

18.3 Nested Lattice Codes/Voronoi Constellations

18.4 Dirty Paper Coding

18.5 Construction of Good Nested Lattice Pairs

Chapter 19: Channel coding: Energy-per-bit, continuous-time channels (PDF - 1.1MB)

19.1 Energy per bit

19.2 What is N_0?

19.3 Capacity of the continuous-time band-limited AWGN channel

19.4 Capacity of the continuous-time band-unlimited AWGN channel

19.5 Capacity per unit cost

Chapter 20: Advanced channel coding. Source-Channel separation. (PDF)

20.1 Strong Converse

20.2 Stationary memoryless channel without strong converse

20.3 Channel Dispersion

20.4 Normalized Rate

20.5 Joint Source Channel Coding

Chapter 21: Channel coding with feedback (PDF - 1.2MB)

21.1 Feedback does not increase capacity for stationary memoryless channels

21.2 Alternative proof of Theorem 21.1 and Massey’s directed information

21.3 When is feedback really useful?

Chapter 22: Capacity-achieving codes via Forney concatenation (PDF)

22.1 Error exponents

22.2 Achieving polynomially small error probability

22.3 Concatenated codes

22.4 Achieving exponentially small error probability

Part V: Lossy Data Compression
Chapter 23: Rate-distortion theory (PDF)

23.1 Scalar quantization

23.2 Information-theoretic vector quantization

23.3 Converting excess distortion to average

Chapter 24: Rate distortion: Achievability bounds (PDF)

24.1 Recap

24.2 Shannon’s rate-distortion theorem

24.3 Covering lemma

Chapter 25: Evaluating R(D). Lossy Source-Channel separation. (PDF)

25.1 Evaluation of R(D)

25.2 Analog of saddle-point property in rate-distortion

25.3 Lossy joint source-channel coding

25.4 What is lacking in classical lossy compression?

Part VI: Advanced Topics
Chapter 26: Multiple-access channel (PDF)

26.1 Problem motivation and main results

26.2 MAC achievability bound

26.3 MAC capacity region proof

Chapter 27: Examples of MACs. Maximal Pe and zero-error capacity. (PDF)

27.1 Recap

27.2 Orthogonal MAC

27.3 BSC MAC

27.4 Adder MAC

27.5 Multiplier MAC

27.6 Contraction MAC

27.7 Gaussian MAC

27.8 MAC Peculiarities

Chapter 28: Random number generators (PDF)

28.1 Setup

28.2 Converse

28.3 Elias’ construction of RNG from lossless compressors

28.4 Peres’ iterated von Neumann’s scheme

28.5 Bernoulli factory

28.6 Related problems
