Published in: Pattern Analysis and Applications 4/2017

17.03.2016 | Theoretical Advances

A framework to induce more stable decision trees for pattern classification

Authors: Zahra Mirzamomen, Mohammad Reza Kangavari


Abstract

Decision tree learning algorithms are known to be unstable: small changes in the training data can produce very different output models. Instability is an important issue in machine learning that is usually overlooked. In this paper, we illustrate and discuss the instability of decision tree induction algorithms and propose a framework for inducing more stable decision trees. In the proposed framework, the split test has two advantageous properties: first, it can incorporate multiple attributes; second, it has a polylithic structure. The first property alleviates the race between competing attributes to be installed at an internal node, which is the major cause of instability. The second property can further improve stability by localizing the effect of individual instances on the split test. We demonstrate the effectiveness of the proposed framework by providing a complying decision tree learning algorithm and conducting several experiments, evaluating the structural stability of the algorithms with three measures. The experimental results reveal that the decision trees induced by the proposed framework exhibit high stability and competitive accuracy in comparison with several well-known decision tree learning algorithms.
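The instability the abstract describes, where a single training instance decides which attribute wins the split at a node, can be sketched with a toy information-gain computation. The dataset, attribute layout, and helper functions below are illustrative assumptions for this sketch, not material from the paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Information gain of splitting `rows` (last column = class) on `attr`."""
    base = entropy([r[-1] for r in rows])
    n = len(rows)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        part = [r[-1] for r in rows if r[attr] == v]
        remainder += len(part) / n * entropy(part)
    return base - remainder

# Toy dataset: columns are attribute A, attribute B, class label.
data = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0)]

gains_full = [info_gain(data, a) for a in (0, 1)]
perturbed = data[:3] + data[4:]            # drop a single training instance
gains_pert = [info_gain(perturbed, a) for a in (0, 1)]

print("gains on full data:     ", gains_full)   # A and B are (near-)tied
print("gains after one removal:", gains_pert)   # B now wins decisively
```

On the full data the two attributes are tied, so an arbitrary tie-break decides the root split; after removing one instance, B wins by a wide margin, which changes the entire subtree below the root. This is the "race" between competing attributes that a multi-attribute split test is meant to defuse.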


Metadata
Title
A framework to induce more stable decision trees for pattern classification
Authors
Zahra Mirzamomen
Mohammad Reza Kangavari
Publication date
17.03.2016
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2017
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-016-0542-2
