Published in: Evolutionary Intelligence 1/2022

03.01.2021 | Research Paper

A novel approach to build accurate and diverse decision tree forest

Authors: Archana R. Panhalkar, Dharmpal D. Doye

Abstract

The decision tree is one of the most expressive classifiers in data mining, popular for its simplicity and its straightforward visualization of all types of datasets. A decision tree forest is an ensemble of decision trees, and its prediction accuracy exceeds that of a single decision tree. Ongoing research aims to create accurate and diverse trees within the forest. In this paper, we propose the Tangent Weighted Decision Tree Forest (TWDForest), which is more accurate and diverse than random forest. The strength of this technique is that it uses a more accurate and uniform tangent weighting function to create a weighted decision tree forest. It also improves performance by using the structure of previous trees to guide each successor tree, which avoids toggling of the root node. As a result, the decision trees in the forest are more accurate and diverse than those produced by other decision forest algorithms. Experiments with this method were performed on 15 well-known, publicly available datasets of various sizes from the UCI machine learning repository. The results demonstrate that both the full forest and the individual trees produced by TWDForest achieve prediction accuracy 1–7% higher than existing methods, and that TWDForest creates more diverse trees than other forest algorithms.
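The abstract does not give the exact form of the tangent weighting function, but the general idea it describes, down-weighting attributes that earlier trees already used near their roots so that successor trees pick different splits, can be sketched as follows. The function `tangent_weight` and the depth-based penalty are illustrative assumptions, not the paper's actual formula:

```python
import math
import random

def tangent_weight(level, max_level):
    """Hypothetical tangent-based weight: attributes used near the root
    (low level) of earlier trees receive a small weight, steering later
    trees toward different root attributes.  tan() rises smoothly on
    [0, pi/4], so weights fall in (0, 1]."""
    return math.tan((math.pi / 4) * (level + 1) / (max_level + 1))

def weighted_feature_sample(features, usage_level, max_level, k, rng):
    """Sample k candidate split features, biased by tangent weights.
    usage_level maps a feature to the shallowest depth at which any
    earlier tree used it (max_level if it was never used)."""
    weights = [tangent_weight(usage_level.get(f, max_level), max_level)
               for f in features]
    return rng.choices(features, weights=weights, k=k)

rng = random.Random(0)
features = ["age", "income", "height"]
usage = {"age": 0}  # "age" was the root attribute of a previous tree
sample = weighted_feature_sample(features, usage, max_level=5, k=2, rng=rng)
```

Under this sketch, "age" is still eligible as a split candidate but is sampled far less often than the unused attributes, which is one way the forest could keep accuracy while increasing tree diversity.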


Metadata
Title
A novel approach to build accurate and diverse decision tree forest
Authors
Archana R. Panhalkar
Dharmpal D. Doye
Publication date
03.01.2021
Publisher
Springer Berlin Heidelberg
Published in
Evolutionary Intelligence / Issue 1/2022
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-020-00519-0
