Published in: Soft Computing 5/2012

1 May 2012 | Focus

Missing data imputation for fuzzy rule-based classification systems

Authors: Julián Luengo, José A. Sáez, Francisco Herrera

Abstract

Fuzzy rule-based classification systems (FRBCSs) are known for their ability to deal with low-quality data and to obtain good results in such scenarios. However, they are rarely applied to problems with missing data, even though real-life data in data mining are frequently incomplete because of missing attribute values. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is based on preprocessing, formally known as imputation. In this work, we focus on FRBCSs and present and analyze 14 different approaches to the treatment of missing attribute values. The analysis involves three different methods, among which we distinguish between Mamdani and TSK models. The results obtained support the convenience of using imputation methods for FRBCSs with missing values. The analysis suggests that each type of FRBCS behaves differently, and that the use of specific missing-value imputation methods could improve the accuracy obtained by these methods. Thus, particular imputation methods, conditioned on the type of FRBCS, are required.

Metadata
Title
Missing data imputation for fuzzy rule-based classification systems
Authors
Julián Luengo
José A. Sáez
Francisco Herrera
Publication date
1 May 2012
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 5/2012
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-011-0774-4
