2018 | OriginalPaper | Book chapter

MIDA: Multiple Imputation Using Denoising Autoencoders

Written by: Lovedeep Gondara, Ke Wang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing-data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions, and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics.
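
The general idea of imputing with an overcomplete denoising autoencoder can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `dae_impute`, the single hidden layer, the mean placeholder fill, the loss restricted to observed entries, and every hyperparameter value are assumptions made for this example. Multiple imputations are produced here simply by retraining from different random initializations.

```python
import numpy as np

def dae_impute(X, n_imputations=5, hidden_extra=8, drop_p=0.5,
               epochs=200, lr=0.01, seed=0):
    """Return a list of completed copies of X, where NaN marks a missing value.

    Hypothetical sketch: one tanh hidden layer, linear output, plain
    gradient descent. All defaults are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Placeholder fill with column means of the observed values (assumption).
    col_mean = np.nanmean(X, axis=0)
    X_filled = np.where(observed, X, col_mean)

    # Standardize columns so one learning rate suits all features.
    mu = X_filled.mean(axis=0)
    sd = X_filled.std(axis=0) + 1e-8
    Z = (X_filled - mu) / sd

    n, d = Z.shape
    h = d + hidden_extra  # overcomplete: hidden layer wider than the input
    n_obs = observed.sum()
    completed = []

    for m in range(n_imputations):
        # A different seed per run yields a different imputed dataset.
        rng = np.random.default_rng(seed + m)
        W1 = rng.normal(0.0, 0.1, (d, h))
        b1 = np.zeros(h)
        W2 = rng.normal(0.0, 0.1, (h, d))
        b2 = np.zeros(d)

        for _ in range(epochs):
            # Denoising step: randomly zero out inputs, then reconstruct.
            keep = rng.random(Z.shape) >= drop_p
            Zc = Z * keep
            H = np.tanh(Zc @ W1 + b1)
            out = H @ W2 + b2

            # Squared error on observed entries only (assumption).
            err = (out - Z) * observed
            g_out = 2.0 * err / n_obs
            g_H = (g_out @ W2.T) * (1.0 - H ** 2)

            W2 -= lr * (H.T @ g_out)
            b2 -= lr * g_out.sum(axis=0)
            W1 -= lr * (Zc.T @ g_H)
            b1 -= lr * g_H.sum(axis=0)

        # Reconstruct from the uncorrupted input and undo the scaling.
        recon = np.tanh(Z @ W1 + b1) @ W2 + b2
        X_hat = recon * sd + mu

        # Keep observed values untouched; fill only the missing cells.
        completed.append(np.where(observed, X, X_hat))

    return completed
```

In a multiple imputation workflow, each returned dataset would be analyzed separately and the results pooled afterwards; the sketch stops at producing the completed datasets.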

Metadata
Title
MIDA: Multiple Imputation Using Denoising Autoencoders
Authors
Lovedeep Gondara
Ke Wang
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-93040-4_21