
2020 | OriginalPaper | Chapter

Random Forests with a Steepend Gini-Index Split Function and Feature Coherence Injection

Authors : Mandlenkosi Victor Gwetu, Jules-Raymond Tapamo, Serestina Viriri

Published in: Machine Learning for Networking

Publisher: Springer International Publishing


Abstract

Although Random Forests (RFs) are an effective and scalable ensemble machine learning approach, they depend heavily on the discriminative ability of the available individual features. Since most data mining problems arise in the context of pre-existing data, there is little room to choose the original input features. Individual RF decision trees follow a greedy algorithm that iteratively selects the feature with the highest potential for achieving subsample purity. Common heuristics for ranking this potential include the Gini index and information gain metrics. This study seeks to improve the effectiveness of RFs through an adapted Gini-index splitting function and a feature engineering technique. Using a structured framework for comparative evaluation of RFs, the study demonstrates that the effectiveness of the proposed methods is comparable to that of conventional Gini-index based RFs. Improvements in the minimum accuracy recorded over some UCI data sets demonstrate the potential for a hybrid set of splitting functions.
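As context for the greedy split selection the abstract describes, a minimal sketch of conventional Gini-index splitting might look as follows. This illustrates the baseline heuristic only, not the authors' steepened variant or coherence injection, which are defined in the chapter body; all function and variable names here are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(X, y, feature, threshold):
    """Weighted Gini impurity after splitting on X[feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    n = len(y)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_split(X, y):
    """Greedy step: the (impurity, feature, threshold) triple with the
    lowest post-split weighted impurity over all candidate thresholds."""
    candidates = [
        (split_gini(X, y, f, t), f, t)
        for f in range(len(X[0]))
        for t in sorted({xi[f] for xi in X})
    ]
    return min(candidates)

# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [(0, 5), (1, 3), (2, 4), (3, 6)]
y = ['a', 'a', 'b', 'b']
score, feature, threshold = best_split(X, y)
# feature 0 at threshold 1 yields two pure subsamples (impurity 0.0)
```

A tree grows by applying this greedy step recursively to each resulting subsample; an RF repeats the process over bootstrapped samples and random feature subsets.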


Footnotes
1
The few misclassifications that individual classifiers make occur in different contexts.
 
2
Higher scores are achieved on impure data sets, so the metric can be seen as measuring impurity.
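A quick numeric check of this behaviour: the Gini index of a pure set is 0, while an evenly mixed two-class set scores the two-class maximum of 0.5.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

pure = ['a'] * 4              # a single class: no impurity
mixed = ['a', 'a', 'b', 'b']  # an even two-class split: maximal impurity

print(gini(pure))   # 0.0
print(gini(mixed))  # 0.5
```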
 
3
Since this data set was used for parameter tuning, final evaluation is mainly based on other data sets to ensure an unbiased experimental context.
 
4
Each node has at most two child nodes and at most two classes. Configurations of nodes with more than two classes were not explored due to the computational overhead of enumerating them.
 
5
The sonar data set is used for parameter tuning.
 
6
M is the number of attributes used to represent each instance in the data set.
 
Metadata
Title
Random Forests with a Steepend Gini-Index Split Function and Feature Coherence Injection
Authors
Mandlenkosi Victor Gwetu
Jules-Raymond Tapamo
Serestina Viriri
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45778-5_17
