## Course Meeting Times

Lectures: 3 sessions / week, 1 hour / session

## Prerequisites

Permission of the instructor is required. The following courses are helpful but not required: Theory of Probability (18.175), and either Statistical Learning Theory and Applications (9.520) or Machine Learning (6.867).

## Description

The main goal of this course is to study the generalization ability of a number of popular machine learning algorithms, such as boosting, support vector machines, and neural networks. We will develop technical tools that allow us to give qualitative explanations of why these learning algorithms work so well in many classification problems.

Topics of the course include Vapnik-Chervonenkis theory, concentration inequalities in product spaces, and other elements of empirical process theory.

## Grading

The grade is based upon two problem sets and class attendance.

## Course Outline

### Introduction

- Classification problem set-up
- Examples of learning algorithms: voting algorithms (boosting), support vector machines, and neural networks
- Analyzing generalization ability (formalized in the sketch after this list)
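
As a hedged illustration of this set-up (the notation below is mine, not taken from the course materials): the data are i.i.d. labeled pairs, a classifier is chosen from a class, and "generalization ability" refers to the gap between the true and empirical misclassification errors.

```latex
% Classification set-up (illustrative notation):
% i.i.d. data (X_1, Y_1), \dots, (X_n, Y_n) drawn from a distribution P on \mathcal{X} \times \{-1, +1\},
% and a classifier f : \mathcal{X} \to \{-1, +1\} chosen from a class \mathcal{F}.
L(f)   = \mathbb{P}\bigl( f(X) \neq Y \bigr)                          % generalization (true) error
\qquad
L_n(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ f(X_i) \neq Y_i \}   % empirical (training) error
% Analyzing generalization ability amounts to bounding the uniform deviation
\sup_{f \in \mathcal{F}} \bigl( L(f) - L_n(f) \bigr).
```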

### Technical Tools: Elements of Empirical Process Theory

#### One-dimensional Concentration Inequalities

- Chebyshev (Markov), Rademacher, Hoeffding, Bernstein, Bennett
- Toward uniform bounds: Union bound, clustering (illustrated in the sketch after this list)
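
For concreteness, here is Hoeffding's inequality together with the union-bound step toward uniform bounds; these are standard statements, not quotations from the lecture notes.

```latex
% Hoeffding's inequality: X_1, \dots, X_n independent with a_i \le X_i \le b_i. For every t > 0,
\mathbb{P}\Bigl( \sum_{i=1}^{n} \bigl( X_i - \mathbb{E} X_i \bigr) \ge t \Bigr)
  \le \exp\Bigl( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \Bigr).
% Toward uniform bounds: for a finite class \mathcal{F}, the union bound gives
\mathbb{P}\Bigl( \max_{f \in \mathcal{F}} \bigl( L(f) - L_n(f) \bigr) \ge t \Bigr)
  \le |\mathcal{F}| \, \exp\bigl( - 2 n t^2 \bigr),
% since each L_n(f) is an average of n independent [0, 1]-valued terms.
```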

#### Vapnik-Chervonenkis Theory and More

- VC classes of sets and functions
- Shattering numbers, growth function, covering numbers
- Examples of VC classes, properties
- Uniform deviation bounds (see the sketch after this list)
- Symmetrization
- Kolmogorov’s chaining technique
- Dudley’s entropy integral
- Contraction principles
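
As a hedged sketch of the kind of statements developed in this part (standard formulations with unspecified absolute constants, not quotations from the course):

```latex
% Sauer's lemma: if \mathcal{F} has VC dimension d, its growth function satisfies, for n \ge d,
\Pi_{\mathcal{F}}(n) \le \sum_{i=0}^{d} \binom{n}{i} \le \Bigl( \frac{en}{d} \Bigr)^{d}.
% VC-type uniform deviation bound (c_1, c_2 absolute constants whose values depend on the
% version of the symmetrization argument used):
\mathbb{P}\Bigl( \sup_{f \in \mathcal{F}} \bigl| L(f) - L_n(f) \bigr| \ge t \Bigr)
  \le c_1 \, \Pi_{\mathcal{F}}(2n) \, e^{- c_2 n t^2}.
% Dudley's entropy integral (chaining): conditionally on the data, the Rademacher average is
% bounded via covering numbers (the integrand vanishes beyond the diameter of \mathcal{F}):
\mathbb{E}_{\varepsilon} \sup_{f \in \mathcal{F}}
  \Bigl| \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f(X_i) \Bigr|
  \le \frac{C}{\sqrt{n}} \int_{0}^{\infty}
      \sqrt{ \log N\bigl( \mathcal{F}, L_2(P_n), \varepsilon \bigr) } \, d\varepsilon.
```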

#### Concentration Inequalities

- Talagrand’s concentration inequality on the cube
- Symmetrization
- Talagrand’s concentration inequality for empirical processes
- Vapnik-Chervonenkis type inequalities
- Martingale-difference inequalities (a bounded differences example follows this list)
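
One representative martingale-difference result is the bounded differences (McDiarmid) inequality, stated below in its standard form; it is included here as an illustration, not quoted from the lecture notes.

```latex
% McDiarmid's (bounded differences) inequality: X_1, \dots, X_n independent, and g satisfies
% |g(x_1, \dots, x_i, \dots, x_n) - g(x_1, \dots, x_i', \dots, x_n)| \le c_i whenever the two
% argument vectors differ only in coordinate i. Then, for every t > 0,
\mathbb{P}\bigl( g(X_1, \dots, X_n) - \mathbb{E}\, g(X_1, \dots, X_n) \ge t \bigr)
  \le \exp\Bigl( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \Bigr).
% Applied to g = \sup_{f \in \mathcal{F}} ( L(f) - L_n(f) ), each c_i \le 1/n, which is one
% standard route to concentration of the uniform deviation around its mean.
```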

### Applications

- Generalization ability of voting classifiers, neural networks, and support vector machines (a representative margin bound is sketched below)
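
As a hedged indication of the flavor of results in this part, here is a margin-based bound for voting classifiers in the style of Schapire, Freund, Bartlett, and Lee (1998), stated informally with unspecified constants; it is representative of the applications covered, not quoted from the course.

```latex
% Margin bound for a voting classifier f(x) = \sum_j \alpha_j h_j(x) with \sum_j \alpha_j = 1,
% over a base class of VC dimension d. With probability at least 1 - \delta, for all \theta > 0,
\mathbb{P}\bigl( Y f(X) \le 0 \bigr)
  \le \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ Y_i f(X_i) \le \theta \}
    + O\!\left( \sqrt{ \frac{d \log^2 (n/d)}{n \, \theta^2} + \frac{\log(1/\delta)}{n} } \right).
```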