9.520-A | Spring 2001 | Graduate

Networks for Learning: Regression and Classification

Readings

The readings listed below are arranged by topic and provide the foundation of this course. The instructor has annotated many of the readings. Click on the links to view abstracts for the journal articles, where available.

The Learning Problem in Perspective

Bertero, M., T. Poggio, and V. Torre. “Ill-posed Problems in Early Vision.” Proc. of the IEEE 76 (1988): 869-889.

Though restricted to early vision, this paper contains an easy-to-read introduction to ill-posed problems and regularization methods.

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

A thorough introduction to the connection between learning and Regularization Theory. We will refer to this paper often in this and the next few classes.

Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1995.

Chapter 1 is a readable first-hand introduction to the subject.

Further Readings:

Bertero, M. “Regularization Methods for Linear Inverse Problems.” In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

Still a very good survey of the subject.

Tikhonov, A. N. and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.

Everybody’s first book on Regularization Theory.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Browse the first chapters of this book if you want to go deeper into the foundations of SLT.

Regularized Solutions

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

A thorough introduction to the connection between learning and Regularization Theory. We will refer to this paper often in this and the next few classes.

Kolmogorov, A. N., and S. V. Fomine. Elements of the Theory of Functions and Functional Analysis. Dover, 1975.

A classic. Though you should be able to follow the class without it, go through Sec. 5.1, 6.4, and 6.5 of Ch. 2 and Sec. 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 of Ch. 4, paying particular attention to everything concerning function spaces.

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

Further Readings:

Bertero, M. “Regularization Methods for Linear Inverse Problems.” In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

Still a very good survey of the subject.

Tikhonov, A. N., and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.

Everybody’s first book on Regularization Theory.

Reproducing Kernel Hilbert Spaces

Kolmogorov, A. N., and S. V. Fomine. Elements of the Theory of Functions and Functional Analysis. Dover, 1975.

A classic. Though you should be able to follow the class without it, go through Sec. 5.1, 6.4, and 6.5 of Ch. 2 and Sec. 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 of Ch. 4, paying particular attention to everything concerning function spaces.

Strang, G. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1993.

Chapter 6 contains the matrix algebra used in this class (and more!).

Further Readings:

Aronszajn, N. “Theory of Reproducing Kernels.” Trans. Amer. Math. Soc. 68 (1950): 337-404.

RKHS the hard way.

Girosi, F. “An Equivalence Between Sparse Approximation and Support Vector Machines.” Neural Computation 10 (1998): 1455-1480.

Appendix A of this paper gives a smooth introduction to RKHS.

Wahba, G. Spline Models for Observational Data. SIAM, 1990.

Chapter 1 introduces you to the world of RKHS.

Classic Approximation Schemes

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

A thorough introduction to the connection between Learning and Regularization Theory. Most of this class can be found in this paper.

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

Nonparametric Techniques and Regularization Theory

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

Part of this paper is good for this class too!

Further Readings:

Vapnik, V. N. Estimation of Dependences Based on Empirical Data. Springer, 1982.

Chapter 9 contains a discussion of the Parzen windows method within the framework of Regularization Theory.

Ridge Approximation Techniques

Bishop, C. M. Neural Networks for Pattern Recognition. Clarendon, 1995.

Chapters 3 and 4 discuss single and multi-layer perceptrons at length.

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

Once more a very good source of information about connections between different approximation techniques.
 
Further Readings:

Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.

A good book on Neural Networks from a physicist’s perspective.

Regularization Networks and Beyond

Girosi, F., M. Jones, and T. Poggio. “Regularization Theory and Neural Network Architectures.” Neural Computation 7 (1995): 219-269.

This is really the last time you have to go through it!

Further Readings:

Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.

A good book on Neural Networks from a physicist’s perspective.

Introduction to Statistical Learning Theory

Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1995.

Chapter 1 is a readable first-hand introduction to the subject.

Further Readings:

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Browse the first chapters of this book if you want to go deeper into the foundations of SLT.

Consistency of the Empirical Risk Minimization Principle

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapter 3 contains all the material covered in this class (and much more!). Several parts of Chapter 2 give you the perspective behind the theory, but if you want to appreciate the difference between stating a result and proving it, browse Chapter 14…

VC-Dimension and VC-bounds

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapter 4 contains all the material covered in this class (and much more!)

VC Theory for Regression and Structural Risk Minimization

Alon, N., et al. “Scale-Sensitive Dimensions, Uniform Convergence, and Learnability.” Symposium on Foundations of Computer Science (1993).

This paper gives necessary and sufficient conditions for distribution-independent uniform convergence for real-valued functions.

Evgeniou, T., M. Pontil, and T. Poggio. “Regularization Networks and Support Vector Machines.” Advances in Computational Mathematics 13 (2000): 1-50.

Most of this class can be found in this paper.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapters 5 and 6 tell you most but not the whole story about the results discussed in this class.

Support Vector Machines for Classification

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

This class will cover part of Chapter 10. You may want to go through Chapter 8 to put SVMs in perspective with respect to other techniques.

Support Vector Machines for Regression

Evgeniou, T., M. Pontil, and T. Poggio. “Regularization Networks and Support Vector Machines.” Advances in Computational Mathematics 13 (2000): 1-50.

The discussion on the Bayesian interpretation of RN and SVM can be found in this paper.

Girosi, F. “An Equivalence between Sparse Approximation and Support Vector Machines.” Neural Computation 10 (1998): 1455-1480.

This is the paper in which the relation between SVM and BPD is studied.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

This class will cover part of Chapters 11 and 13.

Further Readings:

Chen, S., D. Donoho, and M. Saunders. “Atomic Decomposition by Basis Pursuit.” Tech. Rep. 479, Dept. of Statistics, Stanford University, 1995.

Daubechies, I. “Time Frequency Localization Operators: a Geometric Phase Space Approach.” IEEE Trans. on Information Theory 34 (1988): 605-612.

Mallat, S., and Z. Zhang. “Matching Pursuits with Time-Frequency Dictionaries.” IEEE Trans. on Signal Proc. 41 (1993): 3397-3415.

Pontil, M., S. Mukherjee, and F. Girosi. “On the Noise Model of Support Vector Machine Regression.” CBCL Paper #168, AI Memo #1651, Massachusetts Institute of Technology, Cambridge, MA (1998).

Current Topics of Research I: Kernel Engineering

Cristianini, N., and J. Shawe-Taylor. Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.

Chapter 3 of this book covers kernels in depth.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

You’ll find kernels and ideas on kernels throughout Chapters 10, 11, and 12.

Further Readings:

Berg, C., J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.

The title is intimidating, but Chapter 3 is easy to read and contains a lucid introduction to positive definite functions.

Jaakkola, T., and D. Haussler. “Exploiting Generative Models in Discriminative Classifiers.” NIPS (1998).

Niyogi, P., T. Poggio, and F. Girosi. “Incorporating Prior Information in Machine Learning by Creating Virtual Examples.” IEEE Proceedings on Intelligent Signal Processing 86 (1998): 2196-2209.

Neuroscience II

Logothetis, N. K., J. Pauls, and T. Poggio. “Viewer-Centered Object Recognition in Monkeys.” AI Memo 1472, CBCL Paper 95 (1994).

Logothetis, N. K., T. Vetter, A. Hulbert, and T. Poggio. “View-Based Models of 3D Object Recognition and Class-Specific Invariances.” AI Memo 1473, CBCL Paper 94 (1994).

Riesenhuber, M., and T. Poggio. “Hierarchical Models of Object Recognition in Cortex.” Nature Neuroscience 2 (1999): 1019-1025.

Current Topics of Research II: Approximation Error and Approximation Theory

Lorentz, G. G. Approximation of Functions. Chelsea Publishing Co., 1986.

A compact and rigorous, though not easy, introduction to the subject (Chapters 1, 5, and 8-10 in particular).

Niyogi, P., and F. Girosi. “On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions.” Neural Computation 8 (1996): 819-842.

Here you find the material for the discussion on the various types of error.

Current Topics of Research III: Theory and Implementation of Support Vector Machines

Bazaraa, M. S., H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, 1993.
A textbook on Optimization Theory.

Is the SVM solution unique?
Burges, C. J. C., and D. J. Crisp. “Uniqueness of the SVM Solution.” NIPS 12 (1999).

The Decomposition Method for SVMs:
Osuna, Edgar. Support Vector Machines: Training and Applications. Ph.D. Thesis (1998).

Optimizing over 2 variables at a time:
Platt, John C. “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines.” Microsoft Research Tech. Rep. MSR-TR-98-14 (1998).

Analysis of the Decomposition Method:
Chang, Chih-Chung, Chih-Wei Hsu, and Chih-Jen Lin. “The Analysis of Decomposition Methods for Support Vector Machines.” Proceedings of IJCAI 99, SVM Workshop (1999).

Keerthi, S. S., and E. G. Gilbert. “Convergence of a Generalized SMO Algorithm for SVM Classifier Design.” Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore, Tech. Rep. CD-00-01 (2000).

Keerthi, S. S., S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. “Improvements to Platt’s SMO Algorithm for SVM Classifier Design.” Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore, Tech. Rep. CD-99-14 (1999).

Sparsity Control:
Osuna, E., R. Freund, and F. Girosi. “Reducing Run-time Complexity in SVMs.” Proceedings of the 14th International Conference on Pattern Recognition (1998).

Current Topics of Research V: Bagging and Boosting

Breiman, L. “Bagging Predictors.” Machine Learning 26 (1996): 123-140.

Schapire, R. E., Y. Freund, P. Bartlett, and W. S. Lee. “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods.” The Annals of Statistics 26 (1998): 1651-1686.

Selected Topic: Wavelets and Frames

Stollnitz, E. J., T. D. DeRose, and D. H. Salesin. “Wavelets for Computer Graphics: A Primer.” Department of Computer Science and Engineering, University of Washington, Tech. Rep. 94-09-11 (1994).

A readable introduction to wavelets.

Daubechies, I. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1992.

More advanced, but it also contains the basic theoretical results on frames.
