LEC #  TOPICS  READINGS 

1  Introduction, linear classification, perceptron update rule  
2  Perceptron convergence, generalization  
3  Maximum margin classification 
Optional: Cristianini, Nello, and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, UK: Cambridge University Press, 2000. ISBN: 9780521780193. Burges, Christopher. “A Tutorial on Support Vector Machines for Pattern Recognition.” Data Mining and Knowledge Discovery 2, no. 2 (June 1998): 121-167. 
4  Classification errors, regularization, logistic regression  
5  Linear regression, estimator bias and variance, active learning  
6  Active learning (cont.), nonlinear predictions, kernels  
7  Kernel regression, kernels  
8  Support vector machine (SVM) and kernels, kernel optimization 
Short tutorial on Lagrange multipliers (PDF). Optional: Stephen Boyd’s course notes on convex optimization; Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge, UK: Cambridge University Press, 2004. ISBN: 9780521833783. 
9  Model selection  
10  Model selection criteria  
Midterm  
11  Description length, feature selection  
12  Combining classifiers, boosting  
13  Boosting, margin, and complexity 
Optional: Schapire, Robert. “A Brief Introduction to Boosting.” Proceedings of the 16th International Joint Conference on Artificial Intelligence, 1999, pp. 1401-1406. 
14  Margin and generalization, mixture models 
Optional: Bartlett, Peter, Yoav Freund, Wee Sun Lee, and Robert E. Schapire. “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods.” Annals of Statistics 26, no. 5 (1998): 1651-1686. 
15  Mixtures and the expectation maximization (EM) algorithm  
16  EM, regularization, clustering  
17  Clustering  
18  Spectral clustering, Markov models 
Optional: Shi, Jianbo, and Jitendra Malik. “Normalized Cuts and Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22, no. 8 (2000): 888-905. 
19  Hidden Markov models (HMMs) 
Optional: Rabiner, Lawrence R. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77, no. 2 (1989): 257-286. 
20  HMMs (cont.)  
21  Bayesian networks 
Optional: Heckerman, David. “A Tutorial on Learning with Bayesian Networks.” In Learning in Graphical Models, edited by Michael I. Jordan. Cambridge, MA: MIT Press, 1998. ISBN: 9780262600323. 
22  Learning Bayesian networks  
23  Probabilistic inference; guest lecture on collaborative filtering  

Final  
24  Current problems in machine learning, wrap up 

References
Bishop, Christopher. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1995. ISBN: 9780198538646.
Duda, Richard, Peter Hart, and David Stork. Pattern Classification. 2nd ed. New York, NY: Wiley-Interscience, 2000. ISBN: 9780471056690.
Hastie, T., R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York, NY: Springer, 2001. ISBN: 9780387952840.
MacKay, David. Information Theory, Inference, and Learning Algorithms. Cambridge, UK: Cambridge University Press, 2003. ISBN: 9780521642989. Available online.
Mitchell, Tom. Machine Learning. New York, NY: McGraw-Hill, 1997. ISBN: 9780070428072.
Cover, Thomas M., and Joy A. Thomas. Elements of Information Theory. New York, NY: Wiley-Interscience, 1991. ISBN: 9780471062592.