Syllabus

Course Meeting Times

Lectures: 2 sessions / week, 1.5 hours / session

Topic Coverage

Topic coverage will be adapted to students' interests. Some or all of the following topics will be covered:

Markov Decision Processes and Dynamic Programming (2-3 weeks)

Stochastic Models, Dynamic Programming Theory, Value and Policy Iteration.

Simulation-Based Methods (2 weeks)

Asynchronous Value and Policy Iteration, Q-Learning, Complexity of Reinforcement Learning. Review of underlying tools, such as Lyapunov Function Analysis and the ODE Approach.

Value Function Approximation (4 weeks)

TD-Learning, Approximate Linear Programming, Performance Bounds, Theory of Function Approximation.

Policy Search Methods (2-3 weeks)

Policy Gradient and Actor-Critic Methods. Complexity of Policy Search.

Online Learning and Games (2 weeks)

Experts Algorithms, Regret Minimization and Calibration.

We will see applications throughout the course, including dynamic resource allocation, finance and queuing networks, among others.

Textbooks

Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.

Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.

Individual papers are also assigned for many class sessions, as listed in the readings section.

Grading Policy

Activity	Percentage of Grade
Weekly or Biweekly Problem Sets (2 or 3 questions each)	40%
Final Project	60%

Term Project

Students may choose a project focused on theory, algorithms, and/or applications. Project proposals are due midway through the term, and the final project is due at the end of the term.

A 10-15 page project report and a 15-20 minute presentation are required.

Decision Making in Large Scale Systems