Published in: Cognitive Neurodynamics 6/2015

1 December 2015 | Research Article

Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach

Written by: Danaipong Chetchotsak, Sirorat Pattanapairoj, Banchar Arnonkijpanich


Abstract

To deal with imbalanced data in a classification problem, this paper proposes a data balancing technique to be used in conjunction with a committee network. The proposed technique is based on the growing ring self-organizing map (GRSOM), an unsupervised learning algorithm. GRSOM balances the data by growing new data points on a well-defined ring structure, which is developed iteratively based on the winning node near the samples. As a result, the balanced data still preserve the topology of the original data. The method is evaluated on four real data sets from the UCI Machine Learning Repository, with classification performance measured by fivefold cross-validation. Classifiers combined with the most common data balancing techniques, namely the Synthetic Minority Over-sampling Technique (SMOTE) and the random under-sampling technique (RT), serve as the baseline methods. The results show that a committee of classifiers constructed using GRSOM performs at least as well as the baseline methods. They also suggest that classifiers built on neural networks trained with the backpropagation algorithm are more robust than those built on the support vector machine.
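The abstract does not spell out GRSOM's update rules, so the following Python sketch is only an illustration of the general idea of growing synthetic minority samples on a closed ring. Everything specific in it is an assumption made for the example: the function names (`nearest`, `grow_ring`, `balance`), the three-node starting ring, the learning rate, the half-rate neighbour update, and the rule of inserting a midpoint node next to the node that wins most often. It should not be read as the authors' algorithm.

```python
import math
import random


def nearest(nodes, x):
    """Return the index of the ring node closest to sample x (the winner)."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))


def grow_ring(samples, n_nodes, epochs_per_growth=5, lr=0.3, seed=0):
    """Grow a closed ring of n_nodes prototype vectors over `samples`.

    Assumed, simplified rules (not taken from the paper):
    - the winner moves toward each sample by `lr`, its two ring
      neighbours by `lr / 2`;
    - after every training stage, a midpoint node is inserted between
      the most frequent winner and its successor on the ring.
    """
    if n_nodes < 3 or len(samples) < 3:
        raise ValueError("this sketch needs at least 3 samples and 3 nodes")
    rng = random.Random(seed)
    nodes = [list(s) for s in rng.sample(samples, 3)]  # initial 3-node ring
    while True:
        wins = [0] * len(nodes)
        for _ in range(epochs_per_growth):
            for x in rng.sample(samples, len(samples)):  # shuffled pass
                w = nearest(nodes, x)
                wins[w] += 1
                for offset, rate in ((0, lr), (-1, lr / 2), (1, lr / 2)):
                    i = (w + offset) % len(nodes)  # wrap around the ring
                    nodes[i] = [n + rate * (xi - n)
                                for n, xi in zip(nodes[i], x)]
        if len(nodes) >= n_nodes:
            return nodes
        busiest = max(range(len(nodes)), key=wins.__getitem__)
        successor = (busiest + 1) % len(nodes)
        midpoint = [(a + b) / 2
                    for a, b in zip(nodes[busiest], nodes[successor])]
        nodes.insert(busiest + 1, midpoint)


def balance(majority, minority, **kwargs):
    """Pad the minority class with ring nodes until both classes match in size."""
    deficit = len(majority) - len(minority)
    synthetic = grow_ring(minority, deficit, **kwargs)
    return list(minority) + [tuple(node) for node in synthetic]
```

Calling `balance(majority, minority)` on two lists of equal-length numeric tuples returns the original minority samples followed by the ring nodes. Because every update in this sketch is a convex combination of an existing node and a sample, the synthetic points stay inside the region spanned by the minority class, which is one simple way to keep new data close to the original distribution.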


Metadata
Title
Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach
Authors
Danaipong Chetchotsak
Sirorat Pattanapairoj
Banchar Arnonkijpanich
Publication date
1 December 2015
Publisher
Springer Netherlands
Published in
Cognitive Neurodynamics / Issue 6/2015
Print ISSN: 1871-4080
Electronic ISSN: 1871-4099
DOI
https://doi.org/10.1007/s11571-015-9350-4
