2016 | OriginalPaper | Book chapter

8. Classification Models

Written by: Tilo Wendler, Sören Gröttrup

Published in: Data Mining with SPSS Modeler

Publisher: Springer International Publishing

Abstract

One of the main problems that often occurs in data analytics is assigning a category to each data record. Problems of this kind are common in many fields, such as economics, medicine, and computer science. For example, a classic use case in online marketing is deciding whether a customer should receive a certain e-mail promotion because he or she is likely to respond to it. Classification models are the mathematical tools for tackling these problems. In this chapter, we introduce the best-known classification methods provided by the IBM SPSS Modeler. We explain how these classifiers are trained and validated with the IBM SPSS Modeler and describe their usage and interpretation using example data.
After finishing this chapter, the reader …
1. is familiar with the most common challenges of a classification problem and knows how to handle them.
2. possesses a large toolbox of different classification methods and knows their advantages and disadvantages.
3. is able to build various classification models with the SPSS Modeler and to apply them to new data for prediction.
4. knows various validation methods and criteria and can evaluate the quality of the trained classification models within the SPSS Modeler stream.

Metadata
Title
Classification Models
Written by
Tilo Wendler
Sören Gröttrup
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-28709-6_8