18.S096 | Fall 2015 | Undergraduate

Topics in Mathematics of Data Science

Syllabus

Course Meeting Times

Lectures: 2 sessions / week, 1.5 hours / session

Prerequisites

Working knowledge of 18.06SC Linear Algebra and 18.05 Introduction to Probability and Statistics is required. Some familiarity with the basics of optimization and algorithms is also recommended.

Description

This is a mostly self-contained, research-oriented course designed for undergraduate students (though graduate students are also very welcome) who are interested in research on the theoretical aspects of algorithms that extract information from data. These questions often lie at the intersection of two or more of the following fields: Mathematics, Applied Mathematics, Computer Science, Electrical Engineering, Statistics, and/or Operations Research.

The topics covered include:

  1. Principal Component Analysis (PCA) and some random matrix theory that will be used to understand the performance of PCA in high dimensions, through spike models.
  2. Manifold Learning and Diffusion Maps: a nonlinear dimension reduction tool and an alternative to PCA. Semi-supervised Learning and its relation to the Sobolev Embedding Theorem.
  3. Spectral Clustering and a guarantee for its performance: Cheeger’s Inequality.
  4. Concentration of Measure and tail bounds in probability, both for scalar variables and matrix variables.
  5. Dimension reduction through the Johnson-Lindenstrauss Lemma and Gordon’s Escape Through a Mesh Theorem.
  6. Compressed Sensing / Sparse Recovery, Matrix Completion, etc. If time permits, I will present number-theory-inspired constructions of measurement matrices.
  7. Group Testing. Here we will use combinatorial tools to establish lower bounds on testing procedures and, if there is time, I might give a crash course on error-correcting codes and show how they can be used in group testing.
  8. Approximation algorithms in Theoretical Computer Science and the Max-Cut problem.
  9. Clustering on random graphs: Stochastic Block Model. Basics of duality in optimization.
  10. Synchronization, inverse problems on graphs, and estimation of unknown variables from pairwise ratios on compact groups.

Grading

ACTIVITIES      PERCENTAGES
Assignments     40%
Project         60%

40% of the grade is based on a handful of homework problem sets (handed out roughly every two weeks). You are welcome to work on the problem sets in groups, but you must write up your own solutions.

60% of the grade is based on a project. The project (which can be done individually or in groups of two) can be a literature review, but I would recommend attempting original research, either by trying to make partial progress on (or completely solve!) one of the open problems, or by pursuing another research direction. The project report is due during the last week of classes. A preliminary abstract will be due roughly a month before the project due date, and each group is expected to give a 5-minute presentation in class about their project before the due date.
