
2016 | Original Paper | Book chapter

Dealing with Data Difficulty Factors While Learning from Imbalanced Data


Abstract

Learning from imbalanced data is still a challenging task in machine learning and data mining. We discuss the following data difficulty factors that deteriorate classification performance: decomposition of the minority class into rare sub-concepts, overlapping of classes, and distinguishing different types of examples. New experimental studies showing the influence of these factors on classifiers are presented. The paper also includes a critical discussion of methods for identifying these factors in real-world data. Finally, open research issues are stated.
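One way to make "distinguishing different types of examples" concrete, in the spirit of the neighbourhood-based analysis used by Napierala and Stefanowski, is to look at the k = 5 nearest neighbours of each minority example and count how many belong to the minority class. The sketch below assumes numeric attributes with plain Euclidean distance (mixed attributes would need a heterogeneous metric) and the thresholds 4 to 5 same-class neighbours for safe, 2 to 3 for borderline, 1 for rare and 0 for outlier. The function name `minority_example_types` is hypothetical; this is an illustration, not the chapter's implementation.

```python
import math

K = 5  # neighbourhood size assumed by the thresholds below


def euclidean(a, b):
    """Plain Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minority_example_types(X, y, minority):
    """Label each minority example as safe, borderline, rare or outlier.

    Hypothetical helper: counts minority-class examples among the K = 5
    nearest neighbours (Euclidean distance, the example itself excluded).
    """
    types = {}
    for i, xi in enumerate(X):
        if y[i] != minority:
            continue
        # Rank all other examples by distance to xi; ties are broken by index.
        ranked = sorted((euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i)
        same = sum(1 for _, j in ranked[:K] if y[j] == minority)
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types


# A compact minority cluster, a separate majority cluster, and one minority
# example placed inside the majority region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 0),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.5, 10),
     (10.5, 10.8)]
y = [1] * 6 + [0] * 6 + [1]
print(minority_example_types(X, y, minority=1))
# {0: 'safe', 1: 'safe', 2: 'safe', 3: 'safe', 4: 'safe', 5: 'safe', 12: 'outlier'}
```

Counting only the label distribution in a small neighbourhood keeps the analysis local, which is why such a rule can separate examples in class-overlap regions (borderline) from small isolated groups (rare) and single examples surrounded by the other class (outliers).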


Metadata
Title
Dealing with Data Difficulty Factors While Learning from Imbalanced Data
Author
Jerzy Stefanowski
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-18781-5_17