Published in: Cognitive Neurodynamics 6/2015

01-12-2015 | Research Article

Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach

Authors: Danaipong Chetchotsak, Sirorat Pattanapairoj, Banchar Arnonkijpanich


Abstract

To deal with imbalanced data in classification problems, this paper proposes a data balancing technique to be used in conjunction with a committee network. The proposed technique is based on the growing ring self-organizing map (GRSOM), an unsupervised learning algorithm. GRSOM balances the data by growing new data on a well-defined ring structure, which is developed iteratively around the winning node nearest to each sample. As a result, the newly balanced data preserve the topology of the original data. The performance of the proposed method is evaluated on four real data sets from the UCI Machine Learning Repository, and classification performance is measured using fivefold cross-validation. Classifiers paired with the most common data balancing techniques, namely the Synthetic Minority Over-sampling Technique (SMOTE) and the Random under-sampling Technique (RT), serve as the baseline methods in this study. The results reveal that a committee of classifiers constructed using GRSOM performs at least as well as the baseline methods. The results also suggest that classifiers built from neural networks trained with the backpropagation algorithm are more robust than those built on support vector machines.
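The abstract describes GRSOM balancing only at a high level. The following is a minimal sketch of one possible reading and is not the authors' algorithm. It rests on several assumptions: the problem is binary; the synthetic minority samples are simply the weight vectors of a ring grown on the minority class; and the growth rule inserts one node midway between the most frequent winner and its farther ring neighbour. The learning rate `lr`, the `epochs_per_growth` schedule, and the names `grow_ring_som` and `grsom_balance` are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def grow_ring_som(samples, n_nodes, epochs_per_growth=20, lr=0.3, seed=0):
    """Hypothetical GRSOM-style sketch: fit a closed ring of nodes to `samples`,
    inserting one node after each training phase until the ring has `n_nodes` nodes."""
    rng = random.Random(seed)
    # Start with a minimal ring of 3 nodes placed on randomly chosen samples.
    ring = [list(rng.choice(samples)) for _ in range(3)]
    while True:
        wins = [0] * len(ring)
        for _ in range(epochs_per_growth):
            for x in rng.sample(samples, len(samples)):  # one shuffled pass
                # Winning node: the ring node closest to the sample.
                w = min(range(len(ring)), key=lambda i: sq_dist(ring[i], x))
                wins[w] += 1
                left, right = (w - 1) % len(ring), (w + 1) % len(ring)
                # Pull the winner (rate lr) and its two ring neighbours (rate lr/2) toward x.
                for j, rate in ((w, lr), (left, lr / 2), (right, lr / 2)):
                    ring[j] = [r + rate * (xi - r) for r, xi in zip(ring[j], x)]
        if len(ring) >= n_nodes:
            return ring[:n_nodes]
        # Grow (assumed rule): insert a node midway between the most frequent
        # winner and whichever of its two ring neighbours is farther away.
        w = max(range(len(ring)), key=lambda i: wins[i])
        left, right = (w - 1) % len(ring), (w + 1) % len(ring)
        nb = left if sq_dist(ring[w], ring[left]) > sq_dist(ring[w], ring[right]) else right
        mid = [(a + b) / 2 for a, b in zip(ring[w], ring[nb])]
        # Inserting before w (left neighbour) or after w (right neighbour) keeps the ring order.
        ring.insert(w if nb == left else w + 1, mid)


def grsom_balance(X, y, minority_label, seed=0):
    """Hypothetical: oversample the minority class of a binary problem with ring
    nodes until both classes have the same number of samples."""
    minority = [list(x) for x, c in zip(X, y) if c == minority_label]
    deficit = (len(y) - len(minority)) - len(minority)  # majority size minus minority size
    if deficit <= 0:
        return [list(x) for x in X], list(y)
    synthetic = grow_ring_som(minority, deficit, seed=seed)
    return [list(x) for x in X] + synthetic, list(y) + [minority_label] * len(synthetic)
```

Because every update and every inserted node is a convex combination of existing points, the synthetic samples in this sketch stay inside the region already occupied by the minority class. How the balanced set is then shared among committee members is not specified in the abstract.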


Metadata
Title: Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach
Authors: Danaipong Chetchotsak, Sirorat Pattanapairoj, Banchar Arnonkijpanich
Publication date: 01-12-2015
Publisher: Springer Netherlands
Published in: Cognitive Neurodynamics, Issue 6/2015
Print ISSN: 1871-4080
Electronic ISSN: 1871-4099
DOI: https://doi.org/10.1007/s11571-015-9350-4