Published in: Granular Computing 3/2019

25.06.2018 | Original Paper

Neighborhood attribute reduction for imbalanced data

Authors: Wendong Zhang, Xun Wang, Xibei Yang, Xiangjian Chen, Pingxin Wang

Abstract

From the viewpoint of rough granular computing, neighborhood decision error rate-based attribute reduction aims to improve the classification performance of the neighborhood classifier. Nevertheless, for imbalanced data, which is ubiquitous in real-world applications, such reduction pays little attention to how samples in the minority class are classified. Therefore, a new attribute reduction strategy is proposed that embeds a preprocessing step for imbalanced data. First, the widely used SMOTE and K-means algorithms are applied for oversampling and undersampling, respectively. Second, neighborhood decision error rate-based attribute reduction is performed on the resampled data. Finally, the neighborhood classifier is tested with the attributes in the obtained reducts. Experimental results on several UCI and PROMISE data sets show that our approach is superior to traditional attribute reduction in terms of F-measure and G-mean. The contribution of this paper is thus an attribute reduction strategy for imbalanced data that selects attributes useful for improving classification performance on such data.
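The abstract evaluates classifiers with F-measure and G-mean, the standard choices for imbalanced data because plain accuracy can look high while the minority class is entirely misclassified. The sketch below is not taken from the paper; it assumes the usual definitions, with the minority class treated as the positive class: F-measure is the harmonic mean of precision and recall, and G-mean is the geometric mean of the true positive rate (recall) and the true negative rate (specificity).

```python
import math

def f_measure_g_mean(y_true, y_pred, minority=1):
    """Compute F-measure and G-mean, treating `minority` as the positive class.

    These are the standard definitions commonly used for imbalanced
    classification; the paper's exact evaluation protocol may differ.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p != minority)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate

    f = (2 * precision * recall / (precision + recall)
         if (precision + recall) else 0.0)
    g = math.sqrt(recall * specificity)
    return f, g

# Example: 3 minority (1) and 7 majority (0) samples; one miss, one false alarm.
f, g = f_measure_g_mean([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                        [1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
```

Note that a classifier predicting only the majority class would score 90% accuracy on this example but F = 0 and G = 0, which is why these two measures are used in the paper's comparisons.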


Metadata
Title
Neighborhood attribute reduction for imbalanced data
Authors
Wendong Zhang
Xun Wang
Xibei Yang
Xiangjian Chen
Pingxin Wang
Publication date
25.06.2018
Publisher
Springer International Publishing
Published in
Granular Computing / Issue 3/2019
Print ISSN: 2364-4966
Electronic ISSN: 2364-4974
DOI
https://doi.org/10.1007/s41066-018-0105-6
