| LEC # | TOPICS | LECTURE NOTES |
|---|---|---|
| 1 | Markov Decision Processes; Finite-Horizon Problems: Backwards Induction; Discounted-Cost Problems: Cost-to-Go Function, Bellman’s Equation | (PDF) |
| 2 | Value Iteration; Existence and Uniqueness of Bellman’s Equation Solution; Gauss-Seidel Value Iteration | (PDF) |
| 3 | Optimality of Policies derived from the Cost-to-Go Function; Policy Iteration; Asynchronous Policy Iteration | (PDF) |
| 4 | Average-Cost Problems; Relationship with Discounted-Cost Problems; Bellman’s Equation; Blackwell Optimality | (PDF) |
| 5 | Average-Cost Problems; Computational Methods | (PDF) |
| 6 | Application of Value Iteration to Optimization of Multiclass Queueing Networks; Introduction to Simulation-based Methods; Real-Time Value Iteration | (PDF) |
| 7 | Q-Learning; Stochastic Approximations | (PDF) |
| 8 | Stochastic Approximations: Lyapunov Function Analysis; The ODE Method; Convergence of Q-Learning | (PDF) |
| 9 | Exploration versus Exploitation: The Complexity of Reinforcement Learning | (PDF) |
| 10 | Introduction to Value Function Approximation; Curse of Dimensionality; Approximation Architectures | (PDF) |
| 11 | Model Selection and Complexity | (PDF) |
| 12 | Introduction to Value Function Approximation Algorithms; Performance Bounds | (PDF) |
| 13 | Temporal-Difference Learning with Value Function Approximation | (PDF) |
| 14 | Temporal-Difference Learning with Value Function Approximation (cont.) | (PDF) |
| 15 | Temporal-Difference Learning with Value Function Approximation (cont.); Optimal Stopping Problems; General Control Problems | (PDF) |
| 16 | Approximate Linear Programming | (PDF) |
| 17 | Approximate Linear Programming (cont.) | (PDF) |
| 18 | Efficient Solutions for Approximate Linear Programming | (PDF) |
| 19 | Efficient Solutions for Approximate Linear Programming: Factored MDPs | (PDF) |
| 20 | Policy Search Methods | (PDF) |
| 21 | Policy Search Methods (cont.) | (PDF) |
| 22 | Policy Search Methods for POMDPs; Application: Call Admission Control; Actor-Critic Methods | |
| 23 | Approximate POMDP Compression | |
| 24 | Policy Search Methods: PEGASUS; Application: Helicopter Control | |
Lecture Notes
As Taught In: Spring 2004
Learning Resource Types: Lecture Notes, Projects with Examples, Problem Sets