
2013 | OriginalPaper | Chapter

13. Nonlinear Classification Models

Authors: Max Kuhn, Kjell Johnson

Published in: Applied Predictive Modeling

Publisher: Springer New York


Abstract

Chapter 12 discussed classification models that define linear classification boundaries. In this chapter we present models that generate nonlinear boundaries. We begin by explaining several generalizations of the linear discriminant analysis framework, such as quadratic discriminant analysis, regularized discriminant analysis, and mixture discriminant analysis (Section 13.1). Other nonlinear classification models include neural networks (Section 13.2), flexible discriminant analysis (Section 13.3), support vector machines (Section 13.4), K-nearest neighbors (Section 13.5), and naive Bayes (Section 13.6). In the Computing Section (13.7) we demonstrate how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
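
As a brief, hedged preview of the Computing Section (this is an illustrative sketch, not the chapter's own code), the snippet below simulates a two-class problem with a curved boundary and fits two of the models discussed here: quadratic discriminant analysis via MASS::qda and a radial basis function SVM via kernlab::ksvm. The simulated data and tuning values (e.g., C = 1) are assumptions made for illustration.

    library(MASS)     # qda()
    library(kernlab)  # ksvm()

    ## Simulate a two-class problem whose true boundary is a circle,
    ## so no linear classifier can separate the classes well.
    set.seed(975)
    x <- matrix(rnorm(400), ncol = 2)
    y <- factor(ifelse(x[, 1]^2 + x[, 2]^2 > 1.5, "outer", "inner"))
    dat <- data.frame(x1 = x[, 1], x2 = x[, 2], class = y)

    ## Quadratic discriminant analysis (Section 13.1): class-specific
    ## covariance matrices yield quadratic decision boundaries.
    qdaFit <- qda(class ~ x1 + x2, data = dat)

    ## Radial basis function SVM (Section 13.4); prob.model = TRUE
    ## additionally fits a sigmoid model for class probabilities.
    svmFit <- ksvm(class ~ x1 + x2, data = dat,
                   kernel = "rbfdot", C = 1, prob.model = TRUE)

    predict(qdaFit, dat[1:3, ])$class   # QDA class predictions
    predict(svmFit, dat[1:3, ])         # SVM class predictions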


Footnotes
1
However, MARS and FDA models tend to be more stable than tree-based models since they use linear regression to estimate the model parameters.
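
A minimal sketch of this point (assuming the mda package and simulated data, not the chapter's own example): fda() with method = mars expands the predictors into MARS hinge functions, but the discriminant coefficients are still estimated by a linear least squares fit via optimal scoring, which is the source of the stability noted above.

    library(mda)   # fda() and mars()

    ## Simulated two-class data with an interaction-driven boundary.
    set.seed(101)
    dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
    dat$class <- factor(ifelse(dat$x1 * dat$x2 > 0, "A", "B"))

    ## FDA with a MARS basis: the hinge features are nonlinear, but
    ## the model parameters come from a linear regression.
    fdaFit <- fda(class ~ x1 + x2, data = dat, method = mars)
    confusion(fdaFit, dat)   # apparent (training set) confusion matrix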
 
2
Recall a similar situation with support vector regression models where the prediction function was determined by the samples with the largest residuals.
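
A small sketch of this analogy (simulated data; the accessor functions are kernlab's, not the text's): after an SVM fit, only the support vectors enter the prediction function, and kernlab exposes which training samples those are.

    library(kernlab)

    ## Simulated two-class data.
    set.seed(202)
    dat <- data.frame(x1 = rnorm(150), x2 = rnorm(150))
    dat$class <- factor(ifelse(dat$x1^2 + dat$x2 > 0, "A", "B"))

    fit <- ksvm(class ~ x1 + x2, data = dat, kernel = "rbfdot", C = 1)

    ## The prediction equation uses only these samples; points well
    ## outside the margin get zero weight, mirroring how SVR is
    ## driven by the samples with the largest residuals.
    nSV(fit)             # number of support vectors
    head(SVindex(fit))   # row indices of the support vectors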
 
Metadata
Title
Nonlinear Classification Models
Authors
Max Kuhn
Kjell Johnson
Copyright Year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-6849-3_13
