2015 | Original Paper | Book chapter

A New Proposal for Tree Model Selection and Visualization

Written by: Carmela Iorio, Massimo Aria, Antonio D’Ambrosio

Published in: Advances in Statistical Models for Data Analysis

Publisher: Springer International Publishing

Abstract

The most common approach to building a decision tree is a two-step procedure: growing a full tree and then pruning it back. The goal is to identify the tree with the lowest error rate. Alternative pruning criteria have been proposed in the literature. Within the framework of recursive partitioning by tree-based methods, this paper contributes to both the visual representation of the data partition in a geometrical space and the selection of the decision tree. In our visual approach, the best tree and the weakest links can be identified immediately by graphical analysis of the tree structure, without considering the pruning sequence. The resulting error rates are very similar to those returned by the classification and regression trees (CART) procedure, which suggests that this new way of selecting the best tree is a valid alternative to the well-known cost-complexity pruning.
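For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of weakest-link (cost-complexity) pruning as it is usually described for CART. It is not the authors' visual method, which the abstract does not specify. The `Node` class, the function names, and the error values in `example_tree` are hypothetical and chosen only for illustration. Each internal node t gets a link strength g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), where R(t) is the error if t were a leaf and R(T_t) is the error of the subtree rooted at t; the node with the smallest g(t) is the weakest link and is collapsed first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `error` is R(t), the error if t were a leaf."""
    name: str
    error: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(t: Node) -> List[Node]:
    """Terminal nodes of the subtree rooted at t."""
    return [t] if t.is_leaf() else leaves(t.left) + leaves(t.right)


def internal_nodes(t: Node) -> List[Node]:
    """Non-terminal nodes of the subtree rooted at t."""
    if t.is_leaf():
        return []
    return [t] + internal_nodes(t.left) + internal_nodes(t.right)


def subtree_error(t: Node) -> float:
    """R(T_t): sum of the leaf errors of the subtree rooted at t."""
    return sum(leaf.error for leaf in leaves(t))


def link_strength(t: Node) -> float:
    """g(t): error increase per removed leaf if the subtree at t is collapsed."""
    return (t.error - subtree_error(t)) / (len(leaves(t)) - 1)


def pruning_sequence(root: Node) -> List[Tuple[float, int, float]]:
    """Collapse the weakest link until only the root remains.

    Returns one (alpha, number of leaves, tree error) triple per tree in
    the nested sequence. Note: this mutates the tree passed in.
    """
    seq = [(0.0, len(leaves(root)), subtree_error(root))]
    while not root.is_leaf():
        weakest = min(internal_nodes(root), key=link_strength)
        alpha = link_strength(weakest)
        weakest.left = weakest.right = None  # turn the node into a leaf
        seq.append((alpha, len(leaves(root)), subtree_error(root)))
    return seq


def example_tree() -> Node:
    """Toy tree with made-up error values (assumption, not from the paper)."""
    return Node("root", 0.40,
                left=Node("A", 0.10),
                right=Node("B", 0.25,
                           left=Node("C", 0.10),
                           right=Node("D", 0.12)))


for alpha, n_leaves, err in pruning_sequence(example_tree()):
    print(f"alpha={alpha:.3f}  leaves={n_leaves}  error={err:.3f}")
```

In the CART procedure the final tree is then chosen from this nested sequence, typically using a test sample or cross-validation estimate of the error rate; the sketch stops at producing the sequence.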

Metadata
Title
A New Proposal for Tree Model Selection and Visualization
Written by
Carmela Iorio
Massimo Aria
Antonio D’Ambrosio
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-17377-1_16