
2016 | Original Paper | Book Chapter

On Improving Random Forest for Hard-to-Classify Records

Authors: Md Nasim Adnan, Md Zahidul Islam

Published in: Advanced Data Mining and Applications

Publisher: Springer International Publishing


Abstract

Random Forest draws much interest from the research community because of its simplicity and excellent performance. The splitting attribute at each node of a decision tree in a Random Forest is determined from a randomly selected subset of the entire attribute set, whose size is fixed in advance. The choice of this subset size is one of the most debated aspects of Random Forest and has motivated many contributions. However, little attention has been given to improving Random Forest specifically for records that are hard to classify. In this paper, we propose a novel technique for detecting hard-to-classify records and increasing their weights in the training data set. We then build a Random Forest from the weighted training data set. The experimental results presented in this paper indicate that the ensemble accuracy of Random Forest can be improved when it is applied to weighted training data sets that place more emphasis on hard-to-classify records.
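The weighting scheme described in the abstract can be illustrated with a short sketch. This is not the authors' exact detection method; as an assumption, it flags a record as hard to classify when an initial forest's out-of-bag prediction disagrees with the record's true label, then upweights such records by a hypothetical `boost_factor` before retraining:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def weighted_random_forest(X, y, boost_factor=2.0, n_trees=100, seed=0):
    """Upweight hard-to-classify records, then rebuild the forest.

    "Hard to classify" is approximated here via out-of-bag (OOB)
    disagreement; the paper's own detection technique may differ.
    """
    # Probe forest: OOB predictions give an unbiased guess at which
    # training records the ensemble struggles with.
    probe = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                   random_state=seed)
    probe.fit(X, y)
    oob_pred = probe.classes_[probe.oob_decision_function_.argmax(axis=1)]
    hard = oob_pred != y  # records misclassified out-of-bag

    # Emphasize the hard records via per-sample weights and retrain.
    weights = np.where(hard, boost_factor, 1.0)
    final = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    final.fit(X, y, sample_weight=weights)
    return final, hard

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
model, hard = weighted_random_forest(X, y)
print(f"{hard.sum()} hard records; training accuracy {model.score(X, y):.3f}")
```

The `sample_weight` argument makes each weighted record count proportionally more in both the bootstrap resampling and the impurity calculations at each split, which is one plausible way to realize "more emphasis on hard-to-classify records."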


Metadata
Title: On Improving Random Forest for Hard-to-Classify Records
Authors: Md Nasim Adnan, Md Zahidul Islam
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-49586-6_39