2023 | OriginalPaper | Book chapter

Empirical Analysis of Preprocessing Techniques for Imbalanced Dataset Using Logistic Regression

Written by: M. Revathi, D. Ramyachitra

Published in: Proceedings of the 6th International Conference on Advance Computing and Intelligent Engineering

Publisher: Springer Nature Singapore

Abstract

This paper examines the performance of preprocessing strategies combined with a logistic regression classifier. The goal is to determine whether there is a feasible and efficient strategy for improving classification performance on imbalanced datasets across different training dataset percentages. The experiments were conducted on the binary-class Cleveland dataset. Several data preprocessing methods, namely SMOTE, Borderline-SMOTE, and ADASYN, were applied to the data before classification at various training dataset percentages, in order to ascertain how the training percentage affected the final classification for each preprocessing method. The experimental results showed that the algorithms achieved better accuracy with a 70:30 training-to-testing ratio than with the other ratios tested.
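As a rough illustration of the kind of pipeline the abstract describes, the following sketch splits a binary dataset at several training percentages and balances the training part with a basic SMOTE-style interpolation. It is not the authors' implementation: the data is synthetic rather than the Cleveland dataset, the helper names `stratified_split` and `smote` are hypothetical, the choice to oversample only the training part is an assumption, and the logistic regression step and the Borderline-SMOTE and ADASYN variants are omitted for brevity.

```python
import math
import random


def stratified_split(X, y, train_fraction, rng):
    """Split per class so train and test keep roughly the original class ratio."""
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(X, y, minority, k, rng):
    """Basic SMOTE for a binary problem: add points interpolated between a
    minority sample and one of its k nearest minority neighbours until the
    two classes have the same size."""
    minority_pts = [x for x, lab in zip(X, y) if lab == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        neighbours = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Synthetic, imbalanced two-feature data (assumption: 90 majority, 30 minority).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(30)]
y = [0] * 90 + [1] * 30

for train_fraction in (0.6, 0.7, 0.8):
    (X_tr, y_tr), (X_te, y_te) = stratified_split(X, y, train_fraction, rng)
    # Oversample the training part only; the test part stays untouched.
    X_bal, y_bal = smote(X_tr, y_tr, minority=1, k=5, rng=rng)
    print(train_fraction, len(y_tr), y_bal.count(0), y_bal.count(1), len(y_te))
```

A classifier would then be fitted on `X_bal`, `y_bal` and scored on the untouched test part for each ratio, which is the comparison the abstract reports.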

Metadata
Title: Empirical Analysis of Preprocessing Techniques for Imbalanced Dataset Using Logistic Regression
Written by: M. Revathi, D. Ramyachitra
Copyright year: 2023
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-19-2225-1_30
