6.231 | Fall 2015 | Graduate

Dynamic Programming and Stochastic Control

Related Video Lectures

Summer 2014

These videos and lecture notes are from a 6-lecture, 12-hour short course on Approximate Dynamic Programming, taught by Professor Dimitri P. Bertsekas at Tsinghua University in Beijing, China, in June 2014. They focus primarily on the advanced, research-oriented issues of large-scale infinite horizon dynamic programming, which correspond to Lectures 11-23 of the MIT 6.231 course.

The complete set of lecture notes is available here: Complete Slides (PDF - 1.6MB). The notes are also divided by lecture below. Additional supporting material can be obtained on Prof. Bertsekas’ website.

Note to OCW Users: All videos are from Shuvomoy Das Gupta on YouTube and are not provided under our Creative Commons License.

Each topic below is listed with its video lectures and lecture notes.

Introduction to Dynamic Programming (DP)

  • Approximate DP
  • Finite Horizon Problems
  • DP Algorithm for Finite Horizon Problems
  • Infinite Horizon Problems
  • Basic Theory of Discounted Infinite Horizon Problems

Video lectures:
  • Approximate Dynamic Programming, Lecture 1, Part 1
  • Approximate Dynamic Programming, Lecture 1, Part 2
  • Approximate Dynamic Programming, Lecture 1, Part 3

Lecture notes: Lecture 1 (PDF)

Review of Discounted Problem Theory, Shorthand Notation

  • Algorithms for Discounted DP
  • Value Iteration (VI)
  • Policy Iteration (PI)
  • Q-Factors and Q-Learning
  • DP Models
  • Asynchronous Algorithms

Video lectures:
  • Approximate Dynamic Programming, Lecture 2, Part 1
  • Approximate Dynamic Programming, Lecture 2, Part 2
  • Approximate Dynamic Programming, Lecture 2, Part 3

Lecture notes: Lecture 2 (PDF)

General Issues of Approximation and Simulation for Large-Scale Problems

  • Introduction to Approximate DP
  • Approximation Architectures
  • Simulation-Based Approximate Policy Evaluation
  • General Issues Regarding Approximation and Simulation

Video lectures:
  • Approximate Dynamic Programming, Lecture 3, Part 1
  • Approximate Dynamic Programming, Lecture 3, Part 2

Lecture notes: Lecture 3 (PDF)

Approximate Policy Iteration Based on Temporal Differences, Projected Equations, Galerkin Approximation

  • Approximation in Value Space
  • Approximate VI and PI
  • Projected Bellman Equations
  • Matrix Form of the Projected Equation
  • Simulation-Based Implementation
  • LSTD and LSPE Methods
  • Bias-Variance Tradeoff

Video lectures:
  • Approximate Dynamic Programming, Lecture 4, Part 1
  • Approximate Dynamic Programming, Lecture 4, Part 2

Lecture notes: Lecture 4 (PDF)

Aggregation Methods

  • Review of Approximate PI Based on Projected Bellman Equations
  • Issues of Policy Improvement
  • Exploration Enhancement in Policy Evaluation
  • Oscillations in Approximate PI
  • Aggregation: Examples, Simulation-Based, Relation with Projected Equations

Video lectures:
  • Approximate Dynamic Programming, Lecture 5, Part 1
  • Approximate Dynamic Programming, Lecture 5, Part 2
  • Approximate Dynamic Programming, Lecture 5, Part 3

Lecture notes: Lecture 5 (PDF)

Q-Learning, Approximation in Policy Space

  • Review of Q-Factors and Bellman Equations for Q-Factors
  • VI and PI for Q-Factors
  • Q-Learning: Combination of VI and Sampling
  • Q-Learning and Cost Function Approximation
  • Adaptive Dynamic Programming
  • Approximation in Policy Space
  • Additional Topics

Video lectures:
  • Approximate Dynamic Programming, Lecture 6, Part 1
  • Approximate Dynamic Programming, Lecture 6, Part 2

Lecture notes: Lecture 6 (PDF)

Summer 2012

These notes are from a condensed, more research-oriented version of the course, given by Prof. Bertsekas in Summer 2012.

Short Course Notes (PDF)

LEC # LECTURE NOTES
1 Exact DP: Infinite Horizon Problems (PDF)
2 Exact DP: Large-scale Computational Methods (PDF)
3 General Issues of Approximation and Simulation (PDF)
4 Temporal Differences (TD), Projected Equations, Galerkin Approximation (PDF)
5 Aggregation Methods (PDF)
6 Stochastic Approximation, Q-learning, and Other Methods (PDF)
7 Monte Carlo Methods (PDF)
