2018 | OriginalPaper | Chapter

A Novel Random Forest Approach Using Specific Under Sampling Strategy

Authors: L. Surya Prasanthi, R. Kiran Kumar, Kudipudi Srinivas

Published in: Data Engineering and Intelligent Computing

Publisher: Springer Singapore

Abstract

Data mining discovers knowledge from existing real-world datasets. In real-world scenarios, the categories of datasets vary dynamically, and one emerging category is class-imbalanced data, in which one class holds a far greater percentage of instances than the other. Traditional data mining algorithms work well for knowledge discovery from balanced datasets, but their effectiveness is hampered on class-imbalanced datasets. In this paper, we propose a novel approach, Under Sampling using Random Forest (USRF), for efficient knowledge discovery from imbalanced datasets. The proposed USRF approach is evaluated on 11 benchmark datasets from the UCI repository. The experimental results show that USRF achieves improved accuracy and AUC together with a good reduction in RMS error.
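
To make the idea of class imbalance and under sampling concrete, here is a minimal sketch in Python. The abstract does not describe the specific under sampling strategy used by USRF, so this is plain random under sampling of the majority class and not the authors' method. The function names `imbalance_ratio` and `random_undersample`, the toy data, and the 90/10 class split are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(rows, labels, seed=0):
    """Randomly drop instances so every class keeps only as many
    instances as the smallest class (generic technique, not USRF itself)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, target)  # sample without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 90 majority-class and 10 minority-class instances.
rows = [[i] for i in range(100)]
labels = ["neg"] * 90 + ["pos"] * 10

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(imbalance_ratio(labels))   # ratio before under sampling
print(Counter(balanced_labels))  # class counts after under sampling
```

In a pipeline of the kind the abstract outlines, the balanced sample would then be used to train a random forest classifier; that step is omitted here because the abstract gives no details about it.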

Metadata
Title
A Novel Random Forest Approach Using Specific Under Sampling Strategy
Authors
L. Surya Prasanthi
R. Kiran Kumar
Kudipudi Srinivas
Copyright Year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-3223-3_24
