Published in: Soft Computing 11/2015

1 November 2015 | Methodologies and Application

NEATER: filtering of over-sampled data using non-cooperative game theory

Authors: B. A. Almogahed, I. A. Kakadiaris

Abstract

In this paper, we present a method for the filteriNg of ovEr-sampled dAta using non-cooperaTive gamE theoRy (NEATER) to address the imbalanced data problem. Specifically, the problem is formulated as a non-cooperative game where all the data are players and the goal is to uniformly and consistently label all of the synthetic data created by any over-sampling technique. The proposed algorithm does not require any prior assumptions and selects representative synthetic instances while generating a very small number of noisy data. We present extensive experimental results over a large collection of datasets using three different classifiers to demonstrate the advantages of our method.
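The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of labeling synthetic points through a game among data points. It is not the authors' algorithm. The function name `filter_synthetic`, the k-nearest-neighbour interaction structure, the `exp(-distance)` similarity payoff, and the discrete replicator update are all assumptions chosen for illustration.

```python
import math


def filter_synthetic(originals, labels, synthetic, k=3, iters=50, target=1):
    """Keep synthetic points whose game-derived label is `target`.

    Hypothetical illustration only: payoffs, neighbourhood size and the
    update rule are assumptions, not taken from the paper.
    """
    points = list(originals) + list(synthetic)
    n_orig = len(originals)
    classes = sorted(set(labels))

    # Each player holds a mixed strategy over the classes.
    # Original points play a fixed pure strategy (their known label);
    # synthetic points start undecided (uniform).
    strat = [[1.0 if c == y else 0.0 for c in classes] for y in labels]
    strat += [[1.0 / len(classes)] * len(classes) for _ in synthetic]

    def neighbours(i):
        # k nearest players, weighted by an assumed similarity exp(-distance).
        dists = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(len(points))
            if j != i
        )
        return [(math.exp(-d), j) for d, j in dists[:k]]

    # Only synthetic players update, so only they need neighbour lists.
    nbrs = {i: neighbours(i) for i in range(n_orig, len(points))}

    for _ in range(iters):
        new = [s[:] for s in strat]
        for i, nb in nbrs.items():
            # Payoff of choosing class c: similarity-weighted support
            # that c currently receives from the neighbouring players.
            payoff = [
                sum(w * strat[j][c] for w, j in nb)
                for c in range(len(classes))
            ]
            avg = sum(p * u for p, u in zip(strat[i], payoff))
            if avg > 0:
                # Discrete replicator update: classes with above-average
                # payoff gain probability mass; the result still sums to 1.
                new[i] = [p * u / avg for p, u in zip(strat[i], payoff)]
        strat = new

    t = classes.index(target)
    return [p for p, s in zip(synthetic, strat[n_orig:]) if s[t] == max(s)]
```

In this sketch a synthetic point that lands among majority-class neighbours converges to the majority label and is discarded, while one surrounded by minority points is kept. The synthetic points themselves could come from any over-sampling technique, which matches the abstract's claim that the filter is independent of the over-sampler.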

Metadata
Title
NEATER: filtering of over-sampled data using non-cooperative game theory
Authors
B. A. Almogahed
I. A. Kakadiaris
Publication date
1 November 2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing, Issue 11/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1484-5
