
2016 | Original Paper | Book Chapter

An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

Authors: Khaled Fawagreh, Mohamed Medhat Gaber, Eyad Elyan

Published in: Engineering Applications of Neural Networks

Publisher: Springer International Publishing


Abstract

Random Forest (RF) is an ensemble classification technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has demonstrated superior accuracy. Many researchers, however, believe there is still room to improve its predictive accuracy. This explains why, over the past decade, many extensions of RF have been proposed, each employing different techniques and strategies to improve certain aspects of RF. Since it has been shown empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how an unsupervised learning technique, namely Local Outlier Factor (LOF), can be used to identify diverse trees in the RF. Second, trees with the highest LOF scores are used to create a new RF, termed LOFB-DRF, that is much smaller than the original RF yet performs at least as well as RF and, in most cases, achieves higher accuracy. The latter step is an instance of a known technique called ensemble pruning. Experimental results on 10 real datasets demonstrate the superiority of the proposed method over the traditional RF. Unprecedented pruning levels of up to 99 % have been achieved while simultaneously improving the predictive accuracy of the ensemble. Such extreme pruning makes the technique a good candidate for real-time applications.
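
To make the idea concrete, the sketch below shows one way such LOF-based tree selection could look in practice. It is a minimal illustration, not the authors' implementation: it assumes each tree is represented by its vector of class predictions on the training data, uses scikit-learn's RandomForestClassifier and LocalOutlierFactor, and combines the selected trees by simple majority voting. The helper names prune_forest_with_lof and majority_vote, the choice of tree representation, and the use of the breast cancer dataset are all illustrative assumptions.

```python
# Hedged sketch of LOF-based forest pruning; representation and helpers are
# assumptions, not the paper's exact LOFB-DRF procedure.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor


def prune_forest_with_lof(rf, X_ref, n_keep=5, n_neighbors=20):
    """Keep the n_keep trees whose prediction vectors get the highest LOF scores."""
    # One row per tree: its class predictions on the reference data.
    preds = np.array([tree.predict(X_ref) for tree in rf.estimators_])
    lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(preds) - 1))
    lof.fit(preds)
    # negative_outlier_factor_ stores -LOF, so the smallest (most negative)
    # values are the strongest outliers, i.e. the most atypical and, by the
    # paper's premise, most diverse trees.
    ranked = np.argsort(lof.negative_outlier_factor_)
    return [rf.estimators_[i] for i in ranked[:n_keep]]


def majority_vote(trees, X):
    """Combine the selected trees by majority voting (binary labels here)."""
    votes = np.array([tree.predict(X) for tree in trees])
    return np.round(votes.mean(axis=0))


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pruned = prune_forest_with_lof(rf, X_tr, n_keep=5)  # keeps 5 of 500 trees, ~99 % pruning

print("full RF accuracy  :", rf.score(X_te, y_te))
print("pruned RF accuracy:", np.mean(majority_vote(pruned, X_te) == y_te))
```

Under this reading, trees whose prediction vectors are flagged as local outliers are those that disagree most with their neighbours, which is the kind of diversity the paper aims to retain while discarding the bulk of the forest.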


Metadata
Title
An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests
Authors
Khaled Fawagreh
Mohamed Medhat Gaber
Eyad Elyan
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-44188-7_20
