Published in: Soft Computing 22/2020

13.05.2020 | Methodologies and Application

A design of information granule-based under-sampling method in imbalanced data classification

Authors: Tianyu Liu, Xiubin Zhu, Witold Pedrycz, Zhiwu Li


Abstract

In numerous real-world problems, we are faced with difficulties in learning from imbalanced data. The classification performance of a "standard" classifier (learning algorithm) is evidently hindered by the imbalanced distribution of data. Over-sampling and under-sampling methods have been researched extensively with the aim of increasing the prediction accuracy over the minority class. However, traditional under-sampling methods tend to ignore important characteristics pertinent to the majority class. In this paper, a novel under-sampling method based on information granules is proposed. The method exploits the concepts and algorithms of granular computing. First, information granules are built around selected patterns coming from the majority class to capture the essence of the data belonging to this class. Subsequently, the resultant information granules are evaluated in terms of their quality, and those with the highest specificity values are selected. Next, the selected numeric data are augmented by weights implied by the size of the information granules. Finally, a support vector machine and a K-nearest-neighbor classifier, both regarded here as representative classifiers, are built on the weighted data. Experimental studies are carried out on synthetic data as well as a suite of imbalanced data sets coming from public machine learning repositories. The experimental results quantify the performance of the support vector machine and K-nearest-neighbor classifiers combined with the information granule-based under-sampling method, and demonstrate that these classifiers outperform their counterparts built with a conventional under-sampling method. In general, the improvement in performance expressed in terms of G-means exceeds 10% when applying information granule-based under-sampling instead of random under-sampling.
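The granulate-evaluate-select-weight pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' exact formulation: the granules are balls around majority-class patterns, the radius is chosen by maximizing a coverage-times-specificity product (a simplified reading of the principle of justifiable granularity), and the function names `build_granule` and `granular_undersample` are hypothetical.

```python
import numpy as np

def build_granule(center, data, r_max):
    """Choose the radius maximizing coverage * specificity
    (a simplified principle of justifiable granularity)."""
    dists = np.sort(np.linalg.norm(data - center, axis=1))
    best_r, best_q = 0.0, -1.0
    for i, r in enumerate(dists):
        coverage = (i + 1) / len(data)            # fraction of majority points enclosed
        specificity = max(0.0, 1.0 - r / r_max)   # smaller granules are more specific
        q = coverage * specificity
        if q > best_q:
            best_q, best_r = q, r
    return best_r

def granular_undersample(majority, n_keep):
    """Keep the n_keep majority patterns whose granules are most specific;
    weight each retained pattern by the size (point count) of its granule."""
    r_max = np.linalg.norm(majority.max(0) - majority.min(0))  # bounding-box diagonal
    radii = np.array([build_granule(x, majority, r_max) for x in majority])
    spec = 1.0 - radii / r_max
    keep = np.argsort(-spec)[:n_keep]             # highest-specificity granules first
    weights = np.array([(np.linalg.norm(majority - majority[k], axis=1)
                         <= radii[k]).sum() for k in keep], dtype=float)
    return majority[keep], weights / weights.sum()

rng = np.random.default_rng(0)
X_maj = rng.normal(size=(200, 2))                 # synthetic majority class
X_keep, w = granular_undersample(X_maj, n_keep=40)
```

The returned weights could then be passed to any classifier accepting per-sample weights (e.g. the `sample_weight` argument of a weighted SVM fit), matching the final step the abstract describes.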


Metadata
Title
A design of information granule-based under-sampling method in imbalanced data classification
Authors
Tianyu Liu
Xiubin Zhu
Witold Pedrycz
Zhiwu Li
Publication date
13.05.2020
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 22/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05023-2
