2016 | OriginalPaper | Book chapter

A Swarm Intelligence Approach in Undersampling Majority Class

Authors: Haya Abdullah Alhakbani, Mohammad Majid al-Rifaie

Published in: Swarm Intelligence

Publisher: Springer International Publishing

Abstract

Machine learning has long faced the problem of imbalanced datasets, which arise when the instances of one class significantly outnumber those of the other class. This study investigates a new approach to balancing such a dataset: using a swarm intelligence technique, Stochastic Diffusion Search (SDS), to undersample the majority class of a direct marketing dataset. This novel application of the algorithm yields promising results, suggesting that a majority class can be undersampled by removing redundant data whilst preserving the useful data in the dataset. The paper details the behaviour of the proposed algorithm on this problem and compares its results against those of other techniques.
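The abstract does not specify how SDS is mapped onto undersampling, so the following is only a minimal Python sketch under one assumed mapping. Each agent's hypothesis is the index of a majority-class instance; the partial test compares that instance with one randomly chosen majority peer on one randomly chosen feature; and the instance that attracts the largest cluster of active agents is treated as redundant and removed. The tolerance `tol`, the agent and iteration counts, the function names `sds_most_redundant` and `sds_undersample`, and the one-removal-per-search rule are hypothetical choices for illustration, not the authors' published method.

```python
import random
from collections import Counter


def sds_most_redundant(majority, n_agents=50, n_iters=30, tol=0.1, rng=None):
    """Run one SDS search over majority-class instances and return the index of
    the instance supported by the largest cluster of active agents.

    Hypothetical mapping (an assumption, not taken from the paper): a hypothesis
    is an instance index, and an agent is active if its instance lies within
    `tol` of a randomly chosen peer on one randomly chosen feature, i.e. it sits
    in a dense and therefore possibly redundant region. Needs >= 2 instances.
    """
    if rng is None:
        rng = random.Random()
    n, d = len(majority), len(majority[0])
    # Initialisation: each agent starts on a random hypothesis.
    hyps = [rng.randrange(n) for _ in range(n_agents)]
    active = [False] * n_agents
    for _ in range(n_iters):
        # Test phase: partial evaluation against one random peer on one random feature.
        for a in range(n_agents):
            i = hyps[a]
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # choose a peer different from i
            f = rng.randrange(d)
            active[a] = abs(majority[i][f] - majority[j][f]) <= tol
        # Diffusion phase: each inactive agent polls a random agent and copies its
        # hypothesis if that agent is active; otherwise it restarts at random.
        for a in range(n_agents):
            if not active[a]:
                b = rng.randrange(n_agents)
                hyps[a] = hyps[b] if active[b] else rng.randrange(n)
    # Count how many active agents support each hypothesis.
    support = Counter(h for h, is_active in zip(hyps, active) if is_active)
    if not support:
        # No agent passed its last test: fall back to a random instance.
        return rng.randrange(n)
    return support.most_common(1)[0][0]


def sds_undersample(majority, target_size, seed=0, **sds_kwargs):
    """Remove SDS-selected 'redundant' instances until `target_size` remain."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    rng = random.Random(seed)
    kept = list(majority)  # shallow copy; the caller's list is not modified
    while len(kept) > target_size:
        kept.pop(sds_most_redundant(kept, rng=rng, **sds_kwargs))
    return kept


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic majority class: a dense cluster of near-duplicates plus scattered points.
    dense = [[0.5 + data_rng.uniform(-0.02, 0.02), 0.5 + data_rng.uniform(-0.02, 0.02)]
             for _ in range(30)]
    sparse = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(10)]
    majority = dense + sparse
    minority_size = 12  # hypothetical minority-class size to balance against
    balanced_majority = sds_undersample(majority, target_size=minority_size, seed=1)
    print(len(majority), "->", len(balanced_majority))  # prints: 40 -> 12
```

Removing one instance per SDS run keeps the sketch simple but costs a full search per removed instance; a batch removal per run would be cheaper. The sketch also ignores the minority class entirely, so it approximates "protecting useful data" only through density and does not by itself preserve majority instances near the class boundary.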

Metadata
Title: A Swarm Intelligence Approach in Undersampling Majority Class
Authors: Haya Abdullah Alhakbani, Mohammad Majid al-Rifaie
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-44427-7_19