Published in: Knowledge and Information Systems 1/2018

06-10-2017 | Regular Paper

A hybrid approach for classification of rare class data

Authors: Kapil Keshao Wankhade, Kalpana C. Jondhale, Vijaya R. Thool



Abstract

Learning from rare class data is a challenging problem in classification. Rare class, or imbalanced class, learning is a common problem in many real-world applications, and much research has therefore focused on it. Classifiers trained on rare class data often give misleading results because accuracy on the majority class overwhelms accuracy on the minority class. Many methods have been proposed to handle the imbalanced, rare, or skewed class problem. This paper proposes a hybrid method, i.e. a classification- and clustering-based method, for solving the rare class problem. The proposed hybrid method uses k-means, ensemble, and divide-and-merge methods. It aims to improve the detection rate of every class. The proposed method is tested on real datasets. The experimental results show that the proposed method works well compared with other algorithms.
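The point about majority-class accuracy masking poor minority-class performance can be illustrated with a minimal sketch. It is not the method proposed in the paper. The toy labels, the 95/5 class split, and the helper names `accuracy` and `per_class_detection_rate` are assumptions made only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_detection_rate(y_true, y_pred):
    """Map each true label to the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: 95 majority-class samples and 5 rare-class samples.
y_true = ["normal"] * 95 + ["rare"] * 5

# A trivial classifier that always predicts the majority class.
y_pred = ["normal"] * 100

overall = accuracy(y_true, y_pred)
rates = per_class_detection_rate(y_true, y_pred)

print(overall)  # 0.95
print(rates)    # {'normal': 1.0, 'rare': 0.0}
```

The overall accuracy looks high even though no rare-class sample is detected, which is why a per-class detection rate is a more informative measure for rare class problems.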


Literature
1.
go back to reference Gudadhe M, Prakash P, Wankhade K (2010) A new data mining based network intrusion detection model. In: The proceedings of international conference on computer and communication technology (IEEE), Allahabad, India, pp 731–735 Gudadhe M, Prakash P, Wankhade K (2010) A new data mining based network intrusion detection model. In: The proceedings of international conference on computer and communication technology (IEEE), Allahabad, India, pp 731–735
2.
go back to reference Medioni G, Cohen I, Brémond F, Hongeng S, Nevatia R (2001) Event detection and analysis from video streams. IEEE Trans Pattern Anal Mach Intell 2001 23(8):873–889CrossRef Medioni G, Cohen I, Brémond F, Hongeng S, Nevatia R (2001) Event detection and analysis from video streams. IEEE Trans Pattern Anal Mach Intell 2001 23(8):873–889CrossRef
3.
go back to reference Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: The proceeding of the IEEE computer society conference on computer vision and pattern recognition (CVPR’04), 2004, Washington, DC, 2:819–826 Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: The proceeding of the IEEE computer society conference on computer vision and pattern recognition (CVPR’04), 2004, Washington, DC, 2:819–826
4.
go back to reference Huda S, Yearwood J, Jelinek HF, Hassan MM, Fortino G, Buckland M (2016) A hybrid feature selection with ensemble classification for imbalanced healthcare data: a case study for brain tumor diagnosis. IEEE Access 4:9145–9154CrossRef Huda S, Yearwood J, Jelinek HF, Hassan MM, Fortino G, Buckland M (2016) A hybrid feature selection with ensemble classification for imbalanced healthcare data: a case study for brain tumor diagnosis. IEEE Access 4:9145–9154CrossRef
5.
go back to reference Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: The proceedings of 17th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’94), Dublin, Ireland, pp 3–12 Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: The proceedings of 17th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’94), Dublin, Ireland, pp 3–12
6.
go back to reference Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. J Mach Learn 30(2):195–215CrossRef Kubat M, Holte RC, Matwin S (1998) Machine learning for the detection of oil spills in satellite radar images. J Mach Learn 30(2):195–215CrossRef
7.
go back to reference Phua C, Alahakoon D, Lee V (2004) Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explor Newslett 6(1):50–59 (Special Issue on Learning from Imbalanced Datasets)CrossRef Phua C, Alahakoon D, Lee V (2004) Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explor Newslett 6(1):50–59 (Special Issue on Learning from Imbalanced Datasets)CrossRef
8.
go back to reference Sit WY, Mao KZ (2013) Learning imbalanced classes in the presence of concept growth. In: The proceeding of IEEE conference on evolving and adaptive intelligent systems (EAIS), 2013, pp 62–69 Sit WY, Mao KZ (2013) Learning imbalanced classes in the presence of concept growth. In: The proceeding of IEEE conference on evolving and adaptive intelligent systems (EAIS), 2013, pp 62–69
9.
go back to reference Lin SC, Chang CYI, Yang WN (2009) Meta-learning for imbalanced data and classification ensemble in binary classification. J Neurocomput 73(1–3):484–494CrossRef Lin SC, Chang CYI, Yang WN (2009) Meta-learning for imbalanced data and classification ensemble in binary classification. J Neurocomput 73(1–3):484–494CrossRef
10.
go back to reference Khoshgoftaar TM, Seiffert C, Hulse JV, Napolitano A, Folleco A (2007) Learning with limited minority class data. In: The proceeding of 6th international conference on machine learning and applications (IEEE), pp 348–353 Khoshgoftaar TM, Seiffert C, Hulse JV, Napolitano A, Folleco A (2007) Learning with limited minority class data. In: The proceeding of 6th international conference on machine learning and applications (IEEE), pp 348–353
11.
go back to reference Wang S, Yao X (2013) Relationships between diversity of classification ensembles and single-class performance measures. IEEE Trans Knowl Data Eng 25(1):206–219CrossRef Wang S, Yao X (2013) Relationships between diversity of classification ensembles and single-class performance measures. IEEE Trans Knowl Data Eng 25(1):206–219CrossRef
12.
go back to reference He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284CrossRef He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284CrossRef
13.
go back to reference Seiffert C, Khoshgoftaar TM, Hulse JV, Napolitano A (2007) Mining data with rare events: a case study. In: The proceeding of the 19th IEEE international conference on tools with artificial intelligence, pp 132–139 Seiffert C, Khoshgoftaar TM, Hulse JV, Napolitano A (2007) Mining data with rare events: a case study. In: The proceeding of the 19th IEEE international conference on tools with artificial intelligence, pp 132–139
14.
go back to reference Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging–boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern-Part C: Appl Rev 42(4):463–484CrossRef Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging–boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern-Part C: Appl Rev 42(4):463–484CrossRef
15.
go back to reference Krawczyk B, Schaefer G, Wozniak M (2013) An evaluation of classifier ensembles for class imbalance problems. In: The proceeding of international conference on informatics, electronics and vision (ICIEV-IEEE), pp 1–4 Krawczyk B, Schaefer G, Wozniak M (2013) An evaluation of classifier ensembles for class imbalance problems. In: The proceeding of international conference on informatics, electronics and vision (ICIEV-IEEE), pp 1–4
16.
go back to reference Wang S, Yao X (2012) Multiclass imbalance problems: analysis and potential solutions. IEEE Trans Syst Man Cybern-Part B: Cybern 42(4):1119–1130CrossRef Wang S, Yao X (2012) Multiclass imbalance problems: analysis and potential solutions. IEEE Trans Syst Man Cybern-Part B: Cybern 42(4):1119–1130CrossRef
17.
go back to reference Liu N, Woon WL, Aung Z, Afshari A (2014) Handling class imbalance in customer behavior prediction. In: The proceedings of international conference on collaboration technologies and systems (CTS-IEEE), pp 100–103 Liu N, Woon WL, Aung Z, Afshari A (2014) Handling class imbalance in customer behavior prediction. In: The proceedings of international conference on collaboration technologies and systems (CTS-IEEE), pp 100–103
18.
go back to reference Yang Z, Tang W, Shintemirov A, Wu Q (2009) Association rule mining-based dissolved gas analysis for fault diagnosis of power transformers. IEEE Trans Syst Man Cybern—Part C Appl Rev 39(6):597–610CrossRef Yang Z, Tang W, Shintemirov A, Wu Q (2009) Association rule mining-based dissolved gas analysis for fault diagnosis of power transformers. IEEE Trans Syst Man Cybern—Part C Appl Rev 39(6):597–610CrossRef
19.
go back to reference Zhu ZB, Song ZH (2010) Fault diagnosis based on imbalance modified kernel fisher discriminant analysis. J Chem Eng Res Des 88(8):936–951CrossRef Zhu ZB, Song ZH (2010) Fault diagnosis based on imbalance modified kernel fisher discriminant analysis. J Chem Eng Res Des 88(8):936–951CrossRef
20.
go back to reference Khreich W, Granger E, Miri A, Sabourin R (2010) Iterative boolean combination of classifiers in the roc space: an application to anomaly detection with HMMs. J Pattern Recognit 43(8):2732–2752CrossRefMATH Khreich W, Granger E, Miri A, Sabourin R (2010) Iterative boolean combination of classifiers in the roc space: an application to anomaly detection with HMMs. J Pattern Recognit 43(8):2732–2752CrossRefMATH
21.
go back to reference Tavallaee M, Stakhanova N, Ghorbani A (2010) Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans Syst Cybern: Part C Appl Rev 40(5):516–524 Tavallaee M, Stakhanova N, Ghorbani A (2010) Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans Syst Cybern: Part C Appl Rev 40(5):516–524
22.
go back to reference Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2–3):427–436CrossRef Mazurowski MA, Habas PA, Zurada JM, Lo JY, Baker JA, Tourassi GD (2008) Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance. Neural Netw 21(2–3):427–436CrossRef
23.
go back to reference del Castillo MD, Serrano JI (2004) A multi strategy approach for digital text categorization from imbalanced documents. ACM SIGKDD Explor Newslett 6(1):70–79 (Special Issue on Learning from Imbalanced Datasets)CrossRef del Castillo MD, Serrano JI (2004) A multi strategy approach for digital text categorization from imbalanced documents. ACM SIGKDD Explor Newslett 6(1):70–79 (Special Issue on Learning from Imbalanced Datasets)CrossRef
24.
go back to reference Turney PD (2000) Learning algorithms for key phrase extraction. J Inf Retr 2(4):303–336CrossRef Turney PD (2000) Learning algorithms for key phrase extraction. J Inf Retr 2(4):303–336CrossRef
25.
go back to reference Ling CX, Li C (1998) Data mining for direct marketing: problems and solutions. In: The proceedings of 4th international conference on knowledge discovery and data mining (KDD), pp 73–79 Ling CX, Li C (1998) Data mining for direct marketing: problems and solutions. In: The proceedings of 4th international conference on knowledge discovery and data mining (KDD), pp 73–79
26.
go back to reference Bermejo P, Gamez JA, Puerta JM (2011) Improving the performance of naive bayes multinomial in e-mail foldering by introducing distribution based balance of datasets. J Expert Syst Appl 38(3):2072–2080CrossRef Bermejo P, Gamez JA, Puerta JM (2011) Improving the performance of naive bayes multinomial in e-mail foldering by introducing distribution based balance of datasets. J Expert Syst Appl 38(3):2072–2080CrossRef
27.
go back to reference Liu YH, Chen YT (2005) Total margin-based adaptive fuzzy support vector machines for multiview face recognition. In: The proceeding IEEE international conference on system, man and cybernetics 2:1704–1711 Liu YH, Chen YT (2005) Total margin-based adaptive fuzzy support vector machines for multiview face recognition. In: The proceeding IEEE international conference on system, man and cybernetics 2:1704–1711
28.
go back to reference Breiman L (1996) Bagging predictors. J Mach Learn 24(2):123–140MATH Breiman L (1996) Bagging predictors. J Mach Learn 24(2):123–140MATH
29.
go back to reference Freund Y, Schapire RE (1997) A decision-theoretic generalization of online learning and an application to boosting. J Comput Syst Sci 55(1):119–139CrossRefMATH Freund Y, Schapire RE (1997) A decision-theoretic generalization of online learning and an application to boosting. J Comput Syst Sci 55(1):119–139CrossRefMATH
30.
go back to reference Lin S, Wang C, Wu Z, Chung Y (2013) Detect rare events via MICE algorithm with optimal threshold. In: The proceeding of 7th international conference on innovative mobile and internet services in ubiquitous computing (IEEE), pp 70–75 Lin S, Wang C, Wu Z, Chung Y (2013) Detect rare events via MICE algorithm with optimal threshold. In: The proceeding of 7th international conference on innovative mobile and internet services in ubiquitous computing (IEEE), pp 70–75
31.
go back to reference Seiffert C, Khoshgoftaar TM, Hulse JV, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern-Part A: Syst Hum 40(1):185–197CrossRef Seiffert C, Khoshgoftaar TM, Hulse JV, Napolitano A (2010) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern-Part A: Syst Hum 40(1):185–197CrossRef
32.
go back to reference Oh S, Lee MS, Zhang B (2011) Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinf 8(2):316–325CrossRef Oh S, Lee MS, Zhang B (2011) Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinf 8(2):316–325CrossRef
33.
go back to reference Yang P, Yoo PD, Fernando J, Zhou BB, Zhang Z, Zomaya AY (2014) Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Trans Cybern 44(3):445–455CrossRef Yang P, Yoo PD, Fernando J, Zhou BB, Zhang Z, Zomaya AY (2014) Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Trans Cybern 44(3):445–455CrossRef
34.
go back to reference Ditzler G, Polikar R (2013) Incremental learning of concept drift from streaming imbalanced data. IEEE Trans Knowl Data Eng 25(10):2283–2301CrossRef Ditzler G, Polikar R (2013) Incremental learning of concept drift from streaming imbalanced data. IEEE Trans Knowl Data Eng 25(10):2283–2301CrossRef
35.
go back to reference Sun Y, Kamel MS, Wang Y (2006) Boosting for learning multiple classes with imbalanced class distribution. In: The proceeding of 6th international conference on data mining (ICDM), pp 592–602 Sun Y, Kamel MS, Wang Y (2006) Boosting for learning multiple classes with imbalanced class distribution. In: The proceeding of 6th international conference on data mining (ICDM), pp 592–602
36.
go back to reference Krawczyk B, Schaefer G, Wozniak M (2013) Combining one-class classifiers for imbalanced classification of breast thermogram features. In: The proceeding of the IEEE 4th international workshop on computational intelligence in medical imaging (CIMI), 2013, pp 36–41 Krawczyk B, Schaefer G, Wozniak M (2013) Combining one-class classifiers for imbalanced classification of breast thermogram features. In: The proceeding of the IEEE 4th international workshop on computational intelligence in medical imaging (CIMI), 2013, pp 36–41
37.
go back to reference Wang S, Minku LL, Yao X (2015) Resampling-based ensemble methods for online class imbalance learning. IEEE Trans Knowl Data Eng 27(5):1356–1368CrossRef Wang S, Minku LL, Yao X (2015) Resampling-based ensemble methods for online class imbalance learning. IEEE Trans Knowl Data Eng 27(5):1356–1368CrossRef
38.
go back to reference Ahumada H, Grinblat GL, Uzal LC, Granitto PM, Ceccatto A (2008) REPMAC: A new hybrid approach to highly imbalanced classification problems. In: The proceeding of 8th international conference on hybrid intelligent systems (IEEE) pp 386–391 Ahumada H, Grinblat GL, Uzal LC, Granitto PM, Ceccatto A (2008) REPMAC: A new hybrid approach to highly imbalanced classification problems. In: The proceeding of 8th international conference on hybrid intelligent systems (IEEE) pp 386–391
39.
go back to reference Jeatrakul P, Wong KW (2012) Enhancing classification performance of multi-class imbalanced data using the OAA-DB algorithm. In: The proceeding of IEEE world congress on computational intelligence (WCCI), pp 1–8 Jeatrakul P, Wong KW (2012) Enhancing classification performance of multi-class imbalanced data using the OAA-DB algorithm. In: The proceeding of IEEE world congress on computational intelligence (WCCI), pp 1–8
40.
go back to reference Tan SC, Watada J, Ibrahim Z, Khalid M, Jau LW, Chew LC (2011), Learning with imbalanced datasets using fuzzy ARTMAP-based neural network models. In: The proceeding of IEEE international conference on fuzzy systems, 2011, Taiwan, pp 1084–1089 Tan SC, Watada J, Ibrahim Z, Khalid M, Jau LW, Chew LC (2011), Learning with imbalanced datasets using fuzzy ARTMAP-based neural network models. In: The proceeding of IEEE international conference on fuzzy systems, 2011, Taiwan, pp 1084–1089
41.
go back to reference Cao P, Li B, Zhao D, Zaiane O (2013) A novel cost sensitive neural network ensemble for multiclass imbalance data learning. In: The proceeding of international joint conference on neural networks (IJCNN- IEEE) pp 1–8 Cao P, Li B, Zhao D, Zaiane O (2013) A novel cost sensitive neural network ensemble for multiclass imbalance data learning. In: The proceeding of international joint conference on neural networks (IJCNN- IEEE) pp 1–8
42.
go back to reference Fu J, Lee S (2011) Certainty-enhanced active learning for improving imbalanced data classification. In: The proceeding of 11th IEEE international conference on data mining workshops, IEEE, pp 405–412 Fu J, Lee S (2011) Certainty-enhanced active learning for improving imbalanced data classification. In: The proceeding of 11th IEEE international conference on data mining workshops, IEEE, pp 405–412
43.
go back to reference Antwi DK, Viktor HL, Japkowicz N (2012) The PerfSim algorithm for concept drift detection in imbalanced data. In: The proceeding of 12th IEEE international conference on data mining workshops, pp 619–628 Antwi DK, Viktor HL, Japkowicz N (2012) The PerfSim algorithm for concept drift detection in imbalanced data. In: The proceeding of 12th IEEE international conference on data mining workshops, pp 619–628
44.
go back to reference Alhammady H, Ramamohanarao K (2004) Using emerging patterns and decision trees in rare-class classification. In: The proceedings of the 4th IEEE international conference on data mining (ICDM’04), pp 315–318 Alhammady H, Ramamohanarao K (2004) Using emerging patterns and decision trees in rare-class classification. In: The proceedings of the 4th IEEE international conference on data mining (ICDM’04), pp 315–318
45.
go back to reference Wang P, Wang H, Wu X, Wang W, Shi B (2007) A low-granularity classifier for data streams with concept drifts and biased class distribution. IEEE Trans Knowl Data Eng 19(9):1202–1213CrossRef Wang P, Wang H, Wu X, Wang W, Shi B (2007) A low-granularity classifier for data streams with concept drifts and biased class distribution. IEEE Trans Knowl Data Eng 19(9):1202–1213CrossRef
46.
go back to reference Thach NH, Rojanavasu P, Pinngern O (2008) Cost-sensitive XCS classifier system addressing imbalance problems. In: The proceeding of 5th international conference on fuzzy systems and knowledge discovery, pp 132–136 Thach NH, Rojanavasu P, Pinngern O (2008) Cost-sensitive XCS classifier system addressing imbalance problems. In: The proceeding of 5th international conference on fuzzy systems and knowledge discovery, pp 132–136
47.
go back to reference Orriols-Puig A, Bernadó-Mansilla E, Goldberg DE, Sastry K, Lanzi PL (2009) Facetwise analysis of XCS for problems with class imbalances. IEEE Trans Evol Comput 13(5):1093–1119CrossRef Orriols-Puig A, Bernadó-Mansilla E, Goldberg DE, Sastry K, Lanzi PL (2009) Facetwise analysis of XCS for problems with class imbalances. IEEE Trans Evol Comput 13(5):1093–1119CrossRef
48.
go back to reference He J, Tong H, Carbonell J (2010) Rare category characterization. In: The proceeding of IEEE international conference on data mining, pp 226–235 He J, Tong H, Carbonell J (2010) Rare category characterization. In: The proceeding of IEEE international conference on data mining, pp 226–235
49.
go back to reference Wallace BC, Dahabreh IJ (2012) Class probability estimates are unreliable for imbalanced data (and how to fix them). In: The proceeding of IEEE 12th international conference on data mining, pp 695–704 Wallace BC, Dahabreh IJ (2012) Class probability estimates are unreliable for imbalanced data (and how to fix them). In: The proceeding of IEEE 12th international conference on data mining, pp 695–704
50.
go back to reference Hospedales TM, Gong S, Xiang T (2013) Finding rare classes: active learning with generative and discriminative models. IEEE Trans Knowl Data Eng 25(2):374–386CrossRef Hospedales TM, Gong S, Xiang T (2013) Finding rare classes: active learning with generative and discriminative models. IEEE Trans Knowl Data Eng 25(2):374–386CrossRef
51.
go back to reference Own HS, AAl NAA, Abraham A (2010) A new weighted rough set framework for imbalance class distribution. In: The proceeding of international conference of soft computing and pattern recognition (IEEE), pp 29–34 Own HS, AAl NAA, Abraham A (2010) A new weighted rough set framework for imbalance class distribution. In: The proceeding of international conference of soft computing and pattern recognition (IEEE), pp 29–34
52.
go back to reference Huang K, Yang H, King I, Lyu MR (2006) Imbalanced learning with a biased minimax probability machine. IEEE Trans Syst Man Cybern-Part B: Cybern 36(4):913–923CrossRef Huang K, Yang H, King I, Lyu MR (2006) Imbalanced learning with a biased minimax probability machine. IEEE Trans Syst Man Cybern-Part B: Cybern 36(4):913–923CrossRef
53.
go back to reference Huang K, Yang H, King I, Lyu MR (2004) Learning classifiers from imbalanced data based on biased minimax probability machine. In: The proceeding of the IEEE computer society conference on computer vision and pattern recognition (CVPR’04), 2004, pp 558–563 Huang K, Yang H, King I, Lyu MR (2004) Learning classifiers from imbalanced data based on biased minimax probability machine. In: The proceeding of the IEEE computer society conference on computer vision and pattern recognition (CVPR’04), 2004, pp 558–563
54.
go back to reference Su C, Hsiao Y (2007) An evaluation of the robustness of MTS for imbalanced data. IEEE Trans Knowl Data Eng 19(10):1321–1332CrossRef Su C, Hsiao Y (2007) An evaluation of the robustness of MTS for imbalanced data. IEEE Trans Knowl Data Eng 19(10):1321–1332CrossRef
55.
go back to reference Diamantini C, Potena D (2009) Bayes vector quantizer for class-imbalance problem. IEEE Trans Knowl Data Eng 21(5):638–651CrossRef Diamantini C, Potena D (2009) Bayes vector quantizer for class-imbalance problem. IEEE Trans Knowl Data Eng 21(5):638–651CrossRef
56.
go back to reference Williams DP, Myers V, Silvious MS (2009) Mine classification with imbalanced data. IEEE Geosci Remote Sens Lett 6(3):528–532CrossRef Williams DP, Myers V, Silvious MS (2009) Mine classification with imbalanced data. IEEE Geosci Remote Sens Lett 6(3):528–532CrossRef
57.
go back to reference Castro CL, Braga AP (2013) Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Trans Neural Netw Learn Syst 24(6):888–899CrossRef Castro CL, Braga AP (2013) Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data. IEEE Trans Neural Netw Learn Syst 24(6):888–899CrossRef
58.
go back to reference Wu G, Chang EY (2005) KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans Knowl Data Eng 17(6):786–795CrossRef Wu G, Chang EY (2005) KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans Knowl Data Eng 17(6):786–795CrossRef
59.
go back to reference Chen S, He H (2009) SERA: selectively recursive approach towards nonstationary imbalanced stream data mining. In: The proceeding of international joint conference on neural networks (IEEE) USA, pp 522–529 Chen S, He H (2009) SERA: selectively recursive approach towards nonstationary imbalanced stream data mining. In: The proceeding of international joint conference on neural networks (IEEE) USA, pp 522–529
60.
go back to reference Fu J, Lee S (2011) Certainty-enhanced active learning for improving imbalanced data classification. In: The proceeding of the 11th IEEE international conference on data mining workshops, 2011, pp 405–412 Fu J, Lee S (2011) Certainty-enhanced active learning for improving imbalanced data classification. In: The proceeding of the 11th IEEE international conference on data mining workshops, 2011, pp 405–412
61.
go back to reference Yang Z, Gao D (2012) An active under-sampling approach for imbalanced data classification. In: The proceeding of the 5th international symposium on computational intelligence and design (IEEE), pp 270–273 Yang Z, Gao D (2012) An active under-sampling approach for imbalanced data classification. In: The proceeding of the 5th international symposium on computational intelligence and design (IEEE), pp 270–273
62.
go back to reference Kwak J, Lee T, Kim CO (2015) An incremental clustering-based fault detection algorithm for class-imbalanced process data. IEEE Trans Semicond Manuf 28(3):1–11CrossRef Kwak J, Lee T, Kim CO (2015) An incremental clustering-based fault detection algorithm for class-imbalanced process data. IEEE Trans Semicond Manuf 28(3):1–11CrossRef
63.
go back to reference Zhang X, Hu B (2014) A new strategy of cost-free learning in the class imbalance problem. IEEE Trans Knowl Data Eng 26(12):2872–2885CrossRef Zhang X, Hu B (2014) A new strategy of cost-free learning in the class imbalance problem. IEEE Trans Knowl Data Eng 26(12):2872–2885CrossRef
64.
go back to reference Park S, Ha Y (2014) Large imbalance data classification based on mapreduce for traffic accident prediction. In: The proceeding of 8th international conference on innovative mobile and internet services in ubiquitous computing, pp. 45–49 Park S, Ha Y (2014) Large imbalance data classification based on mapreduce for traffic accident prediction. In: The proceeding of 8th international conference on innovative mobile and internet services in ubiquitous computing, pp. 45–49
65.
go back to reference Das B, Krishnan NC, Cook DJ (2015) RACOG and wRACOG: two probabilistic oversampling techniques. IEEE Trans Knowl Data Eng 27(1):222–234CrossRef Das B, Krishnan NC, Cook DJ (2015) RACOG and wRACOG: two probabilistic oversampling techniques. IEEE Trans Knowl Data Eng 27(1):222–234CrossRef
66.
go back to reference Yu X, Zhang X (2012) Imbalanced data classification algorithm based on hybrid model. In: The proceeding of international conference on machine learning and cybernetics (IEEE) pp 735–740 Yu X, Zhang X (2012) Imbalanced data classification algorithm based on hybrid model. In: The proceeding of international conference on machine learning and cybernetics (IEEE) pp 735–740
67.
go back to reference Tang Y, Zhang Y, Chawla NV, Krasser S (2009) SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man and Cybern-Part B: Cybern 39(1):281–288CrossRef Tang Y, Zhang Y, Chawla NV, Krasser S (2009) SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man and Cybern-Part B: Cybern 39(1):281–288CrossRef
68.
go back to reference Phoungphol P, Zhang Y, Zhao Y, Srichandan B (2012) Multiclass SVM with ramp loss for imbalanced data classification. In: The proceeding of the IEEE international conference on granular computing, 2012, pp 376–381 Phoungphol P, Zhang Y, Zhao Y, Srichandan B (2012) Multiclass SVM with ramp loss for imbalanced data classification. In: The proceeding of the IEEE international conference on granular computing, 2012, pp 376–381
69.
go back to reference Zhou X, Lu S, Hu L, Zhang M (2012) Imbalanced extreme support vector machine. In: The proceeding of the international conference on machine learning and cybernetics (IEEE), 2012, pp 483–489 Zhou X, Lu S, Hu L, Zhang M (2012) Imbalanced extreme support vector machine. In: The proceeding of the international conference on machine learning and cybernetics (IEEE), 2012, pp 483–489
70.
go back to reference Anand R, Mehrotra KG, Mohan KC, Ranka S (1993) An improved algorithm for neural network classification of imbalanced training sets. IEEE Trans Neural Netw 4(6):962–969CrossRef Anand R, Mehrotra KG, Mohan KC, Ranka S (1993) An improved algorithm for neural network classification of imbalanced training sets. IEEE Trans Neural Netw 4(6):962–969CrossRef
71.
go back to reference Lin M, Tang K, Yao X (2013) Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Trans Neural Netw Learn Syst 24(4):647–660CrossRef Lin M, Tang K, Yao X (2013) Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Trans Neural Netw Learn Syst 24(4):647–660CrossRef
72.
go back to reference Vorraboot P, Rasmequan S, Lursinsap C, Chinnasarn K (2012) A modified error function for Imbalanced dataset classification problem. In: The proceeding of 7th international conference on computing and convergence technology (ICCCT-IEEE), pp 854–859 Vorraboot P, Rasmequan S, Lursinsap C, Chinnasarn K (2012) A modified error function for Imbalanced dataset classification problem. In: The proceeding of 7th international conference on computing and convergence technology (ICCCT-IEEE), pp 854–859
73.
go back to reference Lee MS, Oh S, Zhang B (2009) Ensemble learning based on active example selection for solving imbalanced data problem in biomedical data. In: The proceeding of IEEE international conference on bioinformatics and biomedicine, pp 350–355 Lee MS, Oh S, Zhang B (2009) Ensemble learning based on active example selection for solving imbalanced data problem in biomedical data. In: The proceeding of IEEE international conference on bioinformatics and biomedicine, pp 350–355
74.
Murphey YL, Wang H, Ou G, Feldkamp LA (2007) OAHO: an effective algorithm for multi-class learning from imbalanced data. In: The proceeding of international joint conference on neural networks (IEEE) USA, pp 406–411
75.
Nguyen HM, Cooper EW, Kamei K (2011) Online learning from imbalanced data streams. In: The proceeding of international conference of soft computing and pattern recognition (SoCPaR-IEEE), pp 347–352
76.
Koknar-Tezel S, Latecki LJ (2009) Improving SVM classification on imbalanced data sets in distance spaces. In: The proceeding of 9th IEEE international conference on data mining, pp 259–267
77.
Zhou B, Yang C, Guo H, Hu J (2013) A Quasi-linear SVM combined with assembled SMOTE for imbalanced data classification. In: The proceeding of international joint conference on neural networks (IJCNN-IEEE), 2013, pp 1–7
78.
Pengfei J, Chunkai Z, Zhenyu H (2014) A new sampling approach for classification of imbalanced data sets with high density. In: The proceeding of international conference on big data and smart computing (BigComp-IEEE), pp 217–222
79.
Huang H, Lin Y, Chen Y, Lu H (2012) Imbalanced data classification using random subspace method and SMOTE. In: The proceeding of joint 6th international conference on soft computing and intelligent systems (SCIS) and 13th international symposium on advanced intelligent systems (ISIS), 2012, Japan, pp 817–820
80.
Rashu RI, Haq N, Rahman RM (2014) Data mining approaches to predict final grade by overcoming class imbalance problem. In: The proceeding of 17th international conference on computer and information technology (ICCIT), pp 14–19
81.
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann Publishers, Burlington
82.
Muda Z, Yassin W, Sulaiman MN, Udzir NI (2011) Intrusion detection based on K-means clustering and Naïve Bayes classification. In: Proceedings of 7th international conference on IT in Asia (CITA-IEEE), pp 1–6
83.
Attar V, Sinha P, Wankhade K (2010) A fast and light classifier for data streams. Spring Evolv Syst 1(3):199–207
84.
Cheng D, Kannan R, Vempala S, Wang G (2006) A divide-and-merge methodology for clustering. ACM Trans Database Syst 31(4):1499–1525
86.
Oza N, Russell S (2001) Online bagging and boosting. In: Artificial intelligence and statistics, Morgan Kaufmann, pp 105–112
88.
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavalda R (2009) New ensemble methods for evolving data streams. In: KDD, pp 139–148
89.
Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, Morgan Kaufmann series in data management systems, 2nd edn, pp 1–525
Metadata
Title
A hybrid approach for classification of rare class data
Authors
Kapil Keshao Wankhade
Kalpana C. Jondhale
Vijaya R. Thool
Publication date
06-10-2017
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2018
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1114-5