2021 | OriginalPaper | Book chapter

ImputeRNN: Imputing Missing Values in Electronic Medical Records

Authors: Jiawei Ouyang, Yuhao Zhang, Xiangrui Cai, Ying Zhang, Xiaojie Yuan

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

Electronic Medical Records (EMRs), which record patients' hospital visits, are the main resource for medical data analysis. However, the many missing values in EMRs limit the capability of models used in healthcare research. Many imputation methods have recently been proposed to address this problem, but they fail to take medical bias into account. Medical bias is the ubiquitous phenomenon that medical data are missing not at random, because doctors tend to measure the features related to a patient's disease. The pattern of missingness therefore reflects the physical condition of the patient, which helps impute missing data with accurate and practical values. In this paper, we propose a novel joint recurrent neural network (RNN) model called ImputeRNN, which considers medical bias for EMR imputation. We model the medical bias with an additional RNN that operates on a mask matrix (indicating whether each value is missing or not); its hidden vectors are incorporated into the model as context by a fusion layer. Extensive experiments on two real-world EMR datasets demonstrate that ImputeRNN outperforms state-of-the-art methods on imputation and downstream prediction tasks.
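To make the two ingredients named in the abstract concrete, the sketch below builds a mask matrix from toy records and blends two hidden vectors with a simple elementwise gate. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the mask convention or the form of the fusion layer, so the choice of 1 for "observed", the gated blend, and the names `build_mask`, `gated_fusion`, and `gate_logits` are all hypothetical, and the record values are invented.

```python
import math


def build_mask(records):
    """Return a 0/1 matrix: 1 where a value was observed, 0 where it is missing.

    Assumption: missing measurements are encoded as None, and 1 means "observed".
    """
    return [[0 if value is None else 1 for value in row] for row in records]


def gated_fusion(h_value, h_mask, gate_logits):
    """Blend two hidden vectors elementwise.

    Computes z * h_value + (1 - z) * h_mask with z = sigmoid(gate_logits).
    In a trained model the gate logits would be learned; here they are
    passed in directly (a hypothetical simplification).
    """
    fused = []
    for hv, hm, g in zip(h_value, h_mask, gate_logits):
        z = 1.0 / (1.0 + math.exp(-g))
        fused.append(z * hv + (1.0 - z) * hm)
    return fused


# Toy visit-by-feature records (invented values); None marks a missing measurement.
records = [
    [36.8, None, 120.0],
    [None, None, 118.0],
    [37.2, 7.4, None],
]
mask = build_mask(records)

# Toy hidden vectors standing in for the value RNN and the mask RNN outputs.
fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

With zero gate logits the gate equals 0.5, so each fused element is the plain average of the two hidden vectors. In a real model, the mask rows would be fed step by step to the second RNN, and the gate would be computed from learned parameters.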

Metadata
Title
ImputeRNN: Imputing Missing Values in Electronic Medical Records
Authors
Jiawei Ouyang
Yuhao Zhang
Xiangrui Cai
Ying Zhang
Xiaojie Yuan
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-73200-4_28