
2020 | OriginalPaper | Chapter

Random Forests with a Steepend Gini-Index Split Function and Feature Coherence Injection

Authors : Mandlenkosi Victor Gwetu, Jules-Raymond Tapamo, Serestina Viriri

Published in: Machine Learning for Networking

Publisher: Springer International Publishing


Abstract

Although Random Forests (RFs) are an effective and scalable ensemble machine learning approach, they depend heavily on the discriminative ability of the available individual features. Since most data mining problems arise in the context of pre-existing data, there is little room to choose the original input features. Individual RF decision trees follow a greedy algorithm that iteratively selects the feature with the highest potential for achieving subsample purity. Common heuristics for ranking this potential include the Gini index and information gain metrics. This study seeks to improve the effectiveness of RFs through an adapted Gini-index splitting function and a feature engineering technique. Using a structured framework for comparative evaluation of RFs, the study demonstrates that the effectiveness of the proposed methods is comparable to that of conventional Gini-index based RFs. Improvements in the minimum accuracy recorded over some UCI data sets demonstrate the potential for a hybrid set of splitting functions.
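As context for the greedy split selection the abstract describes, a minimal sketch of conventional Gini-index splitting might look as follows. This illustrates the baseline heuristic only, not the authors' steepened variant or coherence injection, which are defined in the chapter body; all function and variable names here are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(X, y, feature, threshold):
    """Weighted Gini impurity after splitting on X[feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    n = len(y)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_split(X, y):
    """Greedy step: the (impurity, feature, threshold) triple with the
    lowest post-split weighted impurity over all candidate thresholds."""
    candidates = [
        (split_gini(X, y, f, t), f, t)
        for f in range(len(X[0]))
        for t in sorted({xi[f] for xi in X})
    ]
    return min(candidates)

# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [(0, 5), (1, 3), (2, 4), (3, 6)]
y = ['a', 'a', 'b', 'b']
score, feature, threshold = best_split(X, y)
# feature 0 at threshold 1 yields two pure subsamples (impurity 0.0)
```

A tree grows by applying this greedy step recursively to each resulting subsample; an RF repeats the process over bootstrapped samples and random feature subsets.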


Footnotes
1
The few misclassifications that individual classifiers make occur in different contexts.
 
2
Higher scores are achieved on impure data sets, so the metric can be seen as measuring impurity.
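A quick numeric check of this behaviour: the Gini index of a pure set is 0, while an evenly mixed two-class set scores the two-class maximum of 0.5.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

pure = ['a'] * 4              # a single class: no impurity
mixed = ['a', 'a', 'b', 'b']  # an even two-class split: maximal impurity

print(gini(pure))   # 0.0
print(gini(mixed))  # 0.5
```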
 
3
Since this data set was used for parameter tuning, final evaluation is mainly based on other data sets to ensure an unbiased experimental context.
 
4
Each node has at most two child nodes and at most two classes. Configurations of nodes with more than two classes were not explored due to the computational overhead of enumerating them.
 
5
The sonar data set is used for parameter tuning.
 
6
M is the number of attributes used to represent each instance in the data set.
 
Metadata
Title
Random Forests with a Steepend Gini-Index Split Function and Feature Coherence Injection
Authors
Mandlenkosi Victor Gwetu
Jules-Raymond Tapamo
Serestina Viriri
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45778-5_17
