2011 | OriginalPaper | Book chapter

Supervised Learning by Support Vector Machines

Written by: Gabriele Steidl

Published in: Handbook of Mathematical Methods in Imaging

Publisher: Springer New York

Abstract

During the last two decades, support vector machine learning has become a very active field of research with a large number of both sophisticated theoretical results and exciting real-world applications. This chapter gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.

Title: Supervised Learning by Support Vector Machines
Author: Gabriele Steidl
Publisher: Springer New York
Book: Handbook of Mathematical Methods in Imaging
Print ISBN: 978-0-387-92919-4
Electronic ISBN: 978-0-387-92920-0
Copyright year: 2011
DOI: https://doi.org/10.1007/978-0-387-92920-0_22
