International Journal of Machine Learning and Cybernetics

29.03.2015 | Original Article

Information-decomposition-model-based missing value estimation for not missing at random dataset

Authors: Shigang Liu, Honghua Dai, Min Gan

Published in: International Journal of Machine Learning and Cybernetics | Issue 1/2018

Abstract

Missing data estimation is an important strategy for improving learning performance when learning from incomplete data, especially when records with missing values cannot be discarded. However, most existing algorithms focus on data missing at random (MAR) or missing completely at random (MCAR), and less attention has been paid to data not missing at random (NMAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed to address the missing value problem under NMAR. First, the proposed IDIM algorithm is presented with detailed examples. Then, the approach is evaluated in extensive experiments against several typical algorithms. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than the existing imputation approaches in terms of normalized root mean square error (NRMSE) and TP+TN evaluation under different missing strategies.
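
To make the terms in the abstract concrete, the following is a minimal Python sketch of how MCAR and NMAR missingness differ and how an NRMSE score can be computed for an imputation. It does not implement IDIM. The threshold rule used to create NMAR missingness, the mean-imputation baseline, the function names, and the choice to normalize the RMSE by the standard deviation of the true values at the missing positions are all assumptions made for illustration; the paper may define these differently.

```python
import math
import random
import statistics


def mcar_mask(values, rate, rng):
    """MCAR: every entry is missing with the same probability `rate`."""
    return [rng.random() < rate for _ in values]


def nmar_mask(values, threshold):
    """NMAR (hypothetical rule): an entry is missing when its own value
    exceeds `threshold`, so missingness depends on the unobserved value."""
    return [v > threshold for v in values]


def mean_impute(values, mask):
    """Baseline imputation: fill missing entries with the observed mean."""
    observed = [v for v, missing in zip(values, mask) if not missing]
    fill = statistics.fmean(observed)
    return [fill if missing else v for v, missing in zip(values, mask)]


def nrmse(truth, imputed, mask):
    """RMSE over the missing entries divided by the population standard
    deviation of their true values (assumed normalization)."""
    pairs = [(t, x) for t, x, missing in zip(truth, imputed, mask) if missing]
    mse = statistics.fmean((t - x) ** 2 for t, x in pairs)
    spread = statistics.pstdev([t for t, _ in pairs])
    return math.sqrt(mse) / spread


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    for name, mask in [("MCAR", mcar_mask(data, 0.2, rng)),
                       ("NMAR", nmar_mask(data, 1.0))]:
        score = nrmse(data, mean_impute(data, mask), mask)
        print(f"{name}: NRMSE of mean imputation = {score:.3f}")
```

Under the NMAR rule the observed values are a biased sample of the full column, so a baseline such as mean imputation is pulled away from the missing values. That bias is the kind of difficulty an NMAR-oriented method is meant to address.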

Title: Information-decomposition-model-based missing value estimation for not missing at random dataset
Authors: Shigang Liu, Honghua Dai, Min Gan
Publication date: 29.03.2015
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 1/2018
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-015-0354-5
