Published in: International Journal of Machine Learning and Cybernetics 1/2018

29.03.2015 | Original Article

Information-decomposition-model-based missing value estimation for not missing at random dataset

Authors: Shigang Liu, Honghua Dai, Min Gan

Abstract

Missing data estimation is an important strategy for improving performance when learning from incomplete data, especially when records with missing values cannot be discarded. However, most existing algorithms focus on data that are missing at random (MAR) or missing completely at random (MCAR), and less attention has been paid to data not missing at random (NMAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed to address the missing value problem under NMAR. First, the proposed IDIM algorithm is presented with detailed examples. Then, the approach is evaluated in extensive experiments against several typical algorithms. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than the existing imputation approaches in terms of normalized root mean square error (NRMSE) and TP+TN evaluation under different missing strategies.
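
The abstract names the missing-data mechanisms and the NRMSE metric without defining them. As a rough illustration only, the Python sketch below contrasts an MCAR mask with one simple NMAR mask and scores a mean-imputation baseline with NRMSE. It is not the IDIM algorithm. The function names, the NMAR rule (the largest values are the ones dropped) and the normalisation (RMSE over the missing entries divided by the standard deviation of the full true column) are all assumptions made for this example and may differ from the definitions used in the paper.

```python
import math
import random
import statistics


def mask_mcar(values, rate, rng):
    """MCAR: drop a uniformly random subset, independent of the values."""
    n_drop = int(round(rate * len(values)))
    drop = set(rng.sample(range(len(values)), n_drop))
    return [None if i in drop else v for i, v in enumerate(values)]


def mask_nmar(values, rate):
    """NMAR (one illustrative form): drop the largest values, so whether
    an entry is missing depends on the unobserved value itself."""
    n_drop = int(round(rate * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    drop = set(order[:n_drop])
    return [None if i in drop else v for i, v in enumerate(values)]


def impute_mean(masked):
    """Baseline stand-in (NOT IDIM): fill each gap with the observed mean."""
    fill = statistics.mean(v for v in masked if v is not None)
    return [fill if v is None else v for v in masked]


def nrmse(truth, imputed, masked):
    """RMSE over the missing entries, divided by the population standard
    deviation of the full true column (assumed normalisation)."""
    idx = [i for i, v in enumerate(masked) if v is None]
    mse = sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx)
    return math.sqrt(mse) / statistics.pstdev(truth)


truth = [float(v) for v in range(1, 11)]
for name, masked in [
    ("MCAR", mask_mcar(truth, 0.2, random.Random(0))),
    ("NMAR", mask_nmar(truth, 0.2)),
]:
    print(name, round(nrmse(truth, impute_mean(masked), masked), 3))
```

Under this NMAR mask the observed mean is necessarily lower than the values that were removed, so mean imputation is biased by construction. That kind of systematic error is why methods designed for MAR or MCAR data can perform poorly when the data are NMAR.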

Metadata
Title
Information-decomposition-model-based missing value estimation for not missing at random dataset
Authors
Shigang Liu
Honghua Dai
Min Gan
Publication date
29.03.2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-015-0354-5
