The course is roughly divided into three interconnected parts:
- Part I will focus on the characterization of optimal points from an analytic point of view. In particular, we will develop analytic techniques to find and certify optimal points in convex and nonconvex problems.
- Part II will focus on gradient-based optimization methods in convex and nonconvex problems.
- Part III will focus on preconditioned algorithms, second-order algorithms, or more generally algorithms that exploit the geometry of the objective function beyond its mere gradient.
Lecture 1: Introduction
Topics: Overview of the course material; goals and applications; general form of an optimization problem; Weierstrass’s theorem; computational considerations.
Part I: Nonlinear Optimization Theory
Lecture 2: First-order optimality conditions
Topics: First-order necessary optimality conditions; the unconstrained case; halfspace constraints.
Key Dates: Homework 1 out
Lecture 3: More on normal cones, and a first taste of duality
Topics: The connection between normal cones and Lagrange multipliers; linear programming duality as a straightforward consequence of normal cones.
Lecture 4: Convex functions and sufficiency of normal cones
Topics: Definition of convexity for functions; local optimality implies global optimality; sufficiency of first-order optimality conditions; how to recognize a convex function.
Lecture 5: The driver of duality: separation
Topics: Feasibility as minimization of the distance from the feasible set; how separation yields the normal cone to an intersection of halfspaces.
Lecture 6: Separation as a proof system
Topics: Primer on computational complexity; the classes NP and coNP and their significance; certificates of infeasibility and Farkas’s lemma.
Key Dates: Homework 2 out
Lecture 7: Functional constraints, Lagrange multipliers, and KKT conditions
Topics: Optimization problems with functional constraints; Lagrangian function and Lagrange multipliers; constraint qualifications (linear independence of constraint gradients, Slater’s condition).
Lecture 8: Lagrangian duality
Topics: Lagrangian function; connections with KKT conditions; dual problem; examples.
Lecture 9: Conic constraints
Topics: Conic programs and notable cases: linear programs, second-order cone programs, semidefinite programs; selected applications.
Lecture 10: Polynomial optimization
Topics: Polynomial optimization problems; connection between polynomial optimization and semidefinite programming; the sum of squares decomposition.
Key Dates: Homework 3 out
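The sum-of-squares idea from Lecture 10 admits a one-line worked instance (an illustrative example, not taken from the course materials): to certify a lower bound on a polynomial, exhibit its excess over the bound as a sum of squares.

```latex
% Toy sum-of-squares certificate: to show p(x) = x^2 - 4x + 5 \ge 1
% for all real x, write
\[
  p(x) - 1 = x^2 - 4x + 4 = (x-2)^2
  = \begin{pmatrix} 1 & x \end{pmatrix}
    \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}
    \begin{pmatrix} 1 \\ x \end{pmatrix},
\]
% where the Gram matrix is positive semidefinite. Searching over
% positive semidefinite Gram matrices is a semidefinite program,
% which is the connection the lecture develops.
```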
Lecture 11: Polarity and oracle equivalence
Topics: Duality between optimization and separation; polar and bipolar.
Session: Midterm review
Session: In-class midterm
Key Dates: Midterm exam
Part II: Algorithms based on first-order information
Lecture 12: Gradient descent and descent lemmas
Topics: Gradient descent algorithm; smoothness and the gradient descent lemma; computation of a point with small gradient in general (non-convex) functions; descent in function value for convex functions: the Euclidean mirror descent lemma.
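The basic template of Lecture 12 can be sketched in a few lines (an illustrative sketch, not the course's reference implementation; the quadratic test problem and step count are hypothetical). The 1/L step size is the one suggested by L-smoothness in the descent lemma.

```python
import numpy as np

def gradient_descent(grad, x0, L, iters=200):
    """Plain gradient descent with the 1/L step size suggested by
    L-smoothness (the setting of the gradient descent lemma)."""
    x = x0.astype(float)
    for _ in range(iters):
        x = x - (1.0 / L) * grad(x)  # move against the gradient
    return x

# Example: f(x) = 0.5 * ||A x - b||^2, whose gradient is A^T (A x - b)
# and whose smoothness constant is the largest eigenvalue of A^T A.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()
x_star = gradient_descent(grad, np.zeros(2), L)
```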
Lecture 13: Acceleration and momentum
Topics: The idea of momentum; Nesterov’s accelerated gradient descent; Allen-Zhu and Orecchia’s accelerated gradient descent; practical considerations.
Key Dates: Homework 4 out
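One common form of Nesterov's method from Lecture 13 can be sketched as follows (illustrative only; the lecture may present a different but equivalent variant, and the test problem is hypothetical). The key change from plain gradient descent is the extrapolated "look-ahead" point.

```python
import numpy as np

def nesterov_agd(grad, x0, L, iters=500):
    """Nesterov's accelerated gradient method for an L-smooth convex
    function, with the classic t-sequence momentum weights."""
    x = x0.astype(float)
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        x_next = y - (1.0 / L) * grad(y)  # gradient step at the look-ahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example on the quadratic f(x) = 0.5 * ||A x - b||^2 (minimizer [2, 1]).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()
x_agd = nesterov_agd(grad, np.zeros(2), L)
```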
Lecture 14: Projected gradient descent and mirror descent (Part I)
Topics: The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent.
Lecture 15: Projected gradient descent and mirror descent (Part II)
Topics: The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent.
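For the special case of the probability simplex with the negative-entropy distance-generating function, the mirror descent proximal step has a closed form (an illustrative sketch under that specific choice; the linear objective below is hypothetical).

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta, iters=200):
    """Mirror descent on the probability simplex with the negative-entropy
    distance-generating function; the Bregman proximal step reduces to a
    multiplicative-weights update followed by renormalization."""
    x = x0.astype(float)
    for _ in range(iters):
        x = x * np.exp(-eta * grad(x))  # multiplicative update from the entropic prox
        x = x / x.sum()                 # renormalize onto the simplex
    return x

# Example: minimize the linear function <c, x> over the simplex;
# the minimizer puts all mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, eta=0.5)
```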
Lecture 16: Stochastic gradient descent and empirical risk minimization
Topics: Practical importance of stochastic gradient descent with momentum in training deep learning models; empirical risk minimization problems; minibatches; stochastic gradient descent lemmas; the effect of variance.
Key Dates: Homework 5 out
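The empirical-risk-minimization setting of Lecture 16 can be illustrated on least squares (a sketch with hypothetical parameter choices; the noiseless synthetic data makes exact recovery possible so the effect of minibatching is easy to see).

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=8, lr=0.1, epochs=50, seed=0):
    """Minibatch SGD for least-squares empirical risk minimization,
    min_w (1/n) * sum_i 0.5 * (x_i . w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One pass over the data in shuffled minibatches
        # (assumes n is divisible by batch_size).
        for idx in rng.permutation(n).reshape(-1, batch_size):
            Xb, yb = X[idx], y[idx]
            w -= lr * Xb.T @ (Xb @ w - yb) / batch_size  # minibatch gradient step
    return w

# Noiseless synthetic data: y = X @ w_true, so SGD can recover w_true.
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 2))
w_true = np.array([1.0, -2.0])
w = minibatch_sgd(X, X @ w_true)
```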
Lecture 17: Distributed optimization and ADMM [Optional]
Topics: The setting of distributed optimization; the alternating direction method of multipliers (ADMM) algorithm; convergence analysis of ADMM in the case of convex functions.
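The structure of ADMM is visible even in a scalar toy problem where both subproblem minimizations have closed forms (an illustrative sketch; the lecture's setting and notation may differ).

```python
def admm_scalar(a, b, rho=1.0, iters=100):
    """ADMM on the toy consensus problem
        min 0.5*(x-a)^2 + 0.5*(z-b)^2  s.t.  x = z,
    whose solution is x = z = (a+b)/2."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # x-minimization (closed form)
        z = (b + rho * (x + u)) / (1.0 + rho)  # z-minimization (closed form)
        u = u + x - z                          # scaled dual (multiplier) update
    return x, z

x, z = admm_scalar(2.0, 4.0)
```

In the distributed setting, the point is that the x- and z-updates decouple across machines while the dual update enforces consensus.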
Part III: Preconditioning and higher-order information
Lecture 18: Hessians, preconditioning, and Newton’s method
Topics: Local quadratic approximation of objectives; Newton’s method; local quadratic convergence rate; global convergence for certain classes of functions; practical considerations.
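The pure Newton iteration of Lecture 18 is short to state (an illustrative sketch on a hypothetical test function; a practical implementation would add a line search or damping, as the lecture's "practical considerations" suggest).

```python
import numpy as np

def newton(grad, hess, x0, iters=10):
    """Newton's method: precondition the gradient step by the inverse
    Hessian of the local quadratic model."""
    x = x0.astype(float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))  # solve the quadratic model
    return x

# Example: f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0, where the
# local quadratic convergence of Newton's method is easy to observe.
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
x = newton(grad, hess, np.array([1.0, -0.5]))
```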
Lecture 19: Adaptive preconditioning: AdaGrad and ADAM
Topics: Adaptive construction of diagonal preconditioning matrices; the AdaGrad algorithm; the convergence rate of AdaGrad; proof sketch; AdaGrad with momentum: the popular ADAM algorithm.
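The diagonal preconditioner built by AdaGrad can be sketched as follows (illustrative only; the step size and test problem are hypothetical, and replacing the running sum with an exponential average plus momentum gives the ADAM variant the lecture discusses).

```python
import numpy as np

def adagrad(grad, x0, eta=1.0, eps=1e-8, iters=500):
    """Diagonal AdaGrad: each coordinate's step is scaled by the root of
    its own accumulated squared gradients."""
    x = x0.astype(float)
    s = np.zeros_like(x)                   # per-coordinate sum of squared gradients
    for _ in range(iters):
        g = grad(x)
        s += g * g
        x -= eta * g / (np.sqrt(s) + eps)  # adaptively preconditioned step
    return x

# Badly scaled quadratic f(x) = 0.5 * (100*x1^2 + x2^2): AdaGrad adapts
# a separate effective step size to each coordinate.
grad = lambda x: np.array([100.0 * x[0], x[1]])
x = adagrad(grad, np.array([3.0, -2.0]))
```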
Lecture 20: Self-concordant functions (Part I)
Topics: Limitations of the classic analysis of Newton’s method; desiderata and self-concordant functions; properties and examples.
Key Dates: Homework 6 out
Lecture 21: Self-concordant functions (Part II)
Topics: Limitations of the classic analysis of Newton’s method; desiderata and self-concordant functions; properties and examples.
Lecture 22: Central path and interior-point methods (Part I)
Topics: Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch.
Lecture 23: Central path and interior-point methods (Part II)
Topics: Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch.
Session: Final exam review
Session: In-class final
Key Dates: Final exam