Skip to main content
Top

2018 | OriginalPaper | Chapter

MIDA: Multiple Imputation Using Denoising Autoencoders

Authors : Lovedeep Gondara, Ke Wang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Missing data is a significant problem impacting all domains. State-of-the-art framework for minimizing missing data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions and distributions. Evaluation on several real life datasets show our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end of the line analytics.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Beaulieu-Jones, B.K., Moore, J.H.: The pooled resource open-access ALS, and clinical trials consortium. Missing data imputation in the electronic health record using deeply learned autoencoders. In: Pacific Symposium on Biocomputing, vol. 22, pp. 207. NIH Public Access (2016) Beaulieu-Jones, B.K., Moore, J.H.: The pooled resource open-access ALS, and clinical trials consortium. Missing data imputation in the electronic health record using deeply learned autoencoders. In: Pacific Symposium on Biocomputing, vol. 22, pp. 207. NIH Public Access (2016)
2.
go back to reference Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative models. In: Advances in Neural Information Processing Systems, pp. 899–907 (2013) Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative models. In: Advances in Neural Information Processing Systems, pp. 899–907 (2013)
3.
go back to reference Buuren, S., Groothuis-Oudshoorn, K.: MICE: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–68 (2011)CrossRef Buuren, S., Groothuis-Oudshoorn, K.: MICE: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–68 (2011)CrossRef
4.
go back to reference Chen, P.: Optimization algorithms on subspaces: revisiting missing data problem in low-rank matrix. Int. J. Comput. Vis. 80(1), 125–142 (2008)CrossRef Chen, P.: Optimization algorithms on subspaces: revisiting missing data problem in low-rank matrix. Int. J. Comput. Vis. 80(1), 125–142 (2008)CrossRef
5.
go back to reference Duan, Y., Lv, Y., Kang, W., Zhao, Y.: A deep learning based approach for traffic data imputation. In: 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 912–917. IEEE (2014) Duan, Y., Lv, Y., Kang, W., Zhao, Y.: A deep learning based approach for traffic data imputation. In: 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 912–917. IEEE (2014)
6.
go back to reference LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)CrossRef LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)CrossRef
7.
go back to reference Leisch, F., Dimitriadou, E.: Machine learning benchmark problems (2010) Leisch, F., Dimitriadou, E.: Machine learning benchmark problems (2010)
8.
go back to reference Li, S., Kawale, J., Fu, Y.: Deep collaborative filtering via marginalized denoising auto-encoder. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 811–820. ACM (2015) Li, S., Kawale, J., Fu, Y.: Deep collaborative filtering via marginalized denoising auto-encoder. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 811–820. ACM (2015)
9.
go back to reference Little, R.J.A.: Missing-data adjustments in large surveys. J. Bus. Econ. Stat. 6(3), 287–296 (1988) Little, R.J.A.: Missing-data adjustments in large surveys. J. Bus. Econ. Stat. 6(3), 287–296 (1988)
10.
go back to reference Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, Hoboken (2014)MATH Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, Hoboken (2014)MATH
11.
go back to reference Morris, T.P., White, I.R., Royston, P.: Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med. Res. Methodol. 14(1), 75 (2014)CrossRef Morris, T.P., White, I.R., Royston, P.: Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med. Res. Methodol. 14(1), 75 (2014)CrossRef
12.
go back to reference Nelwamondo, F.V., Mohamed, S., Marwala, T.: Missing data: A comparison of neural network and expectation maximisation techniques. arXiv preprint arXiv:0704.3474 (2007) Nelwamondo, F.V., Mohamed, S., Marwala, T.: Missing data: A comparison of neural network and expectation maximisation techniques. arXiv preprint arXiv:​0704.​3474 (2007)
13.
go back to reference Nesterov, Y.: A method of solving a convex programming problem with convergence rate O (1/k2) (1983) Nesterov, Y.: A method of solving a convex programming problem with convergence rate O (1/k2) (1983)
16.
go back to reference Shah, A.D., Bartlett, J.W., Carpenter, J., Nicholas, O., Hemingway, H.: Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am. J. Epidemiol. 179(6), 764–774 (2014)CrossRef Shah, A.D., Bartlett, J.W., Carpenter, J., Nicholas, O., Hemingway, H.: Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am. J. Epidemiol. 179(6), 764–774 (2014)CrossRef
17.
go back to reference Sterne, J.A.C., White, I.R., Carlin, J.B., Spratt, M., Royston, P., Kenward, M.G., Wood, A.M., Carpenter, J.R.: Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 (2009)CrossRef Sterne, J.A.C., White, I.R., Carlin, J.B., Spratt, M., Royston, P., Kenward, M.G., Wood, A.M., Carpenter, J.R.: Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 (2009)CrossRef
18.
go back to reference Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008)
Metadata
Title
MIDA: Multiple Imputation Using Denoising Autoencoders
Authors
Lovedeep Gondara
Ke Wang
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-93040-4_21

Premium Partner