
2016 | Original Paper | Book Chapter

An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

Authors: Khaled Fawagreh, Mohamed Medhat Gaber, Eyad Elyan

Published in: Engineering Applications of Neural Networks

Publisher: Springer International Publishing


Abstract

Random Forest (RF) is an ensemble classification technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has demonstrated superior accuracy. Many researchers, however, believe there is still room to improve its predictive accuracy. This explains why, over the past decade, many extensions of RF have been proposed, each employing different techniques and strategies to improve certain aspects of RF. Since it has been shown empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how an unsupervised learning technique, namely Local Outlier Factor (LOF), can be used to identify diverse trees in the RF. Second, trees with the highest LOF scores are used to create a new RF, termed LOFB-DRF, that is much smaller than the original RF yet performs at least as well as RF and, in most cases, achieves higher accuracy. The latter step is an instance of a known technique called ensemble pruning. Experimental results on 10 real datasets demonstrate the superiority of the proposed method over the traditional RF. Unprecedented pruning levels of up to 99 % have been achieved while simultaneously improving the predictive accuracy of the ensemble. Such extreme pruning makes the technique a good candidate for real-time applications.
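
To make the idea concrete, the sketch below shows one way such LOF-based tree selection could look in practice. It is a minimal illustration, not the authors' implementation: it assumes each tree is represented by its vector of class predictions on the training data, uses scikit-learn's RandomForestClassifier and LocalOutlierFactor, and combines the selected trees by simple majority voting. The helper names prune_forest_with_lof and majority_vote, the choice of tree representation, and the use of the breast cancer dataset are all illustrative assumptions.

```python
# Hedged sketch of LOF-based forest pruning; representation and helpers are
# assumptions, not the paper's exact LOFB-DRF procedure.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor


def prune_forest_with_lof(rf, X_ref, n_keep=5, n_neighbors=20):
    """Keep the n_keep trees whose prediction vectors get the highest LOF scores."""
    # One row per tree: its class predictions on the reference data.
    preds = np.array([tree.predict(X_ref) for tree in rf.estimators_])
    lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(preds) - 1))
    lof.fit(preds)
    # negative_outlier_factor_ stores -LOF, so the smallest (most negative)
    # values are the strongest outliers, i.e. the most atypical and, by the
    # paper's premise, most diverse trees.
    ranked = np.argsort(lof.negative_outlier_factor_)
    return [rf.estimators_[i] for i in ranked[:n_keep]]


def majority_vote(trees, X):
    """Combine the selected trees by majority voting (binary labels here)."""
    votes = np.array([tree.predict(X) for tree in trees])
    return np.round(votes.mean(axis=0))


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pruned = prune_forest_with_lof(rf, X_tr, n_keep=5)  # keeps 5 of 500 trees, ~99 % pruning

print("full RF accuracy  :", rf.score(X_te, y_te))
print("pruned RF accuracy:", np.mean(majority_vote(pruned, X_te) == y_te))
```

Under this reading, trees whose prediction vectors are flagged as local outliers are those that disagree most with their neighbours, which is the kind of diversity the paper aims to retain while discarding the bulk of the forest.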


Metadata
Title
An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests
Authors
Khaled Fawagreh
Mohamed Medhat Gaber
Eyad Elyan
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-44188-7_20
