Abstract
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular, the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
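The regression setting the abstract describes — fitting a multivariate function to sparse, noisy samples by penalized least squares in a reproducing kernel Hilbert space — can be sketched in a few lines with a Gaussian radial basis function kernel. This is an illustrative sketch only, not code from the paper: the helper names, the kernel width `sigma`, and the regularization weight `lam` are arbitrary choices made here for demonstration.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian radial basis function kernel K(x, x') = exp(-||x - x'||^2 / 2 sigma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_regularization_network(X, y, lam=1e-3, sigma=0.5):
    # Minimize (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_K^2 over the RKHS of K.
    # By the representer theorem the minimizer is f(x) = sum_i c_i K(x, x_i),
    # with coefficients c = (K + lam * n * I)^{-1} y.
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, c, X_new, sigma=0.5):
    # Evaluate f at new points as a kernel expansion around the training data.
    return rbf_kernel(X_new, X_train, sigma) @ c

# Sparse noisy samples of a smooth multivariate function
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1] + 0.05 * rng.normal(size=30)

c = fit_regularization_network(X, y)
pred = predict(X, c, X)
```

With the squared loss used above this is a Regularization Network (equivalently, kernel ridge regression); swapping the loss for Vapnik's epsilon-insensitive loss would yield Support Vector regression, which is the correspondence the paper develops.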
Evgeniou, T., Pontil, M. & Poggio, T. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics 13, 1–50 (2000). https://doi.org/10.1023/A:1018946025316