Course Meeting Times
Lectures: 3 sessions / week, 1 hour / session
Prerequisites
Permission of the instructor is required. The following courses are helpful but not required: Theory of Probability (18.175) and either Statistical Learning Theory and Applications (9.520) or Machine Learning (6.867).
Description
The main goal of this course is to study the generalization ability of a number of popular machine learning algorithms, such as boosting, support vector machines, and neural networks. We will develop the technical tools that allow us to give qualitative explanations of why these learning algorithms work so well on many classification problems.
Topics of the course include Vapnik-Chervonenkis theory, concentration inequalities in product spaces, and other elements of empirical process theory.
Grading
The grade is based upon two problem sets and class attendance.
Course Outline
Introduction
- Classification problem set-up (formalized in the sketch after this list)
- Examples of learning algorithms: Voting algorithms (boosting), support vector machines, neural networks
- Analyzing generalization ability
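As a point of reference for the items above, here is one standard formalization of the classification set-up (a sketch of common conventions, not necessarily the exact notation used in lecture). We observe an i.i.d. sample \((X_1, Y_1), \dots, (X_n, Y_n)\) from an unknown distribution \(P\) on \(\mathcal{X} \times \{-1, +1\}\), a learning algorithm outputs a classifier \(f\) from some class \(\mathcal{F}\), and we compare its generalization error with its training error:

\[
L(f) = P\big(f(X) \neq Y\big), \qquad \hat{L}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{f(X_i) \neq Y_i\}.
\]

Analyzing generalization ability then amounts to bounding the gap \(L(f) - \hat{L}_n(f)\) for the classifier actually chosen by the algorithm, uniformly over the class it searches.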
Technical Tools: Elements of Empirical Process Theory
One-dimensional Concentration Inequalities
- Chebyshev (Markov), Rademacher, Hoeffding, Bernstein, Bennett (see the example after this list)
- Toward uniform bounds: Union bound, clustering
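For orientation, a representative member of this family is Hoeffding's inequality, stated here in one standard form: if \(X_1, \dots, X_n\) are independent random variables with \(a_i \le X_i \le b_i\), then for every \(t > 0\)

\[
\mathbb{P}\Big(\sum_{i=1}^{n} (X_i - \mathbb{E} X_i) \ge t\Big) \le \exp\Big(-\frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\Big).
\]

Such a bound controls the deviation of the empirical error of a single fixed classifier; the union bound extends it to finitely many classifiers at the cost of a multiplicative factor, which is the starting point for the uniform bounds developed next.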
Vapnik-Chervonenkis Theory and More
- VC classes of sets and functions
- Shattering numbers, growth function, covering numbers (see the example after this list)
- Examples of VC classes, properties
- Uniform deviation bounds
- Symmetrization
- Kolmogorov’s chaining technique
- Dudley’s entropy integral
- Contraction principles
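As an example of the combinatorial quantities appearing in this part of the course, the Sauer-Shelah lemma bounds the growth function of a class \(\mathcal{C}\) of sets with VC dimension \(d\): for every \(n \ge d\),

\[
\Pi_{\mathcal{C}}(n) = \max_{x_1, \dots, x_n} \big|\{ C \cap \{x_1, \dots, x_n\} : C \in \mathcal{C} \}\big| \le \sum_{i=0}^{d} \binom{n}{i} \le \Big(\frac{en}{d}\Big)^{d},
\]

so a VC class can realize only polynomially many labelings of \(n\) points. Combined with symmetrization and chaining (for instance through Dudley's entropy integral), this leads to uniform deviation bounds of order roughly \(\sqrt{d \log n / n}\) over the whole class; the exact constants and logarithmic factors depend on the argument used.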
Concentration Inequalities
- Talagrand’s concentration inequality on the cube
- Symmetrization
- Talagrand’s concentration inequality for empirical processes
- Vapnik-Chervonenkis type inequalities
- Martingale-difference inequalities
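A convenient example from the last item is McDiarmid's bounded differences inequality (one common formulation): if \(X_1, \dots, X_n\) are independent and \(g\) satisfies \(|g(x_1, \dots, x_i, \dots, x_n) - g(x_1, \dots, x_i', \dots, x_n)| \le c_i\) for every coordinate \(i\), then for every \(t > 0\)

\[
\mathbb{P}\big(g(X_1, \dots, X_n) - \mathbb{E}\, g(X_1, \dots, X_n) \ge t\big) \le \exp\Big(-\frac{2 t^2}{\sum_{i=1}^{n} c_i^2}\Big).
\]

Applied to \(g = \sup_{f \in \mathcal{F}} (Pf - P_n f)\) with bounded functions, it shows that the supremum of an empirical process concentrates around its mean; Talagrand's inequalities sharpen this picture by taking variance information into account.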
Applications
- Generalization ability of voting classifiers, neural networks, support vector machines
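To indicate the kind of statement pursued here, consider a representative margin bound for voting classifiers, in the spirit of Schapire, Freund, Bartlett, and Lee (stated schematically; exact constants and logarithmic factors depend on the proof): if \(f\) is any convex combination of base classifiers from a class of VC dimension \(d\), then with probability at least \(1 - \delta\), simultaneously for all such \(f\) and all margins \(\theta > 0\),

\[
P\big(y f(x) \le 0\big) \le P_n\big(y f(x) \le \theta\big) + O\!\left(\sqrt{\frac{d \log^2(n/d)}{n \theta^2}} + \sqrt{\frac{\log(1/\delta)}{n}}\right).
\]

Bounds of this type help explain why boosting can continue to improve test error after the training error reaches zero, and analogous margin-based arguments apply to support vector machines and neural networks.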