2020 | OriginalPaper | Chapter

Hybrid Text Feature Modeling for Disease Group Prediction Using Unstructured Physician Notes

Authors: Gokul S. Krishnan, S. Sowmya Kamath

Published in: Computational Science – ICCS 2020

Publisher: Springer International Publishing

Abstract

Existing Clinical Decision Support Systems (CDSSs) largely depend on the availability of structured patient data and Electronic Health Records (EHRs) to aid caregivers. However, hospitals in developing countries have not widely adopted structured patient data formats, and medical professionals there still rely on clinical notes in the form of unstructured text. Such unstructured clinical notes recorded by medical personnel can also be a rich source of patient-specific information that can be leveraged to build CDSSs, even for hospitals in developing countries. If such unstructured clinical text can be used, the manual and time-consuming process of EHR generation will no longer be required, saving substantial person-hours and cost. In this article, we propose a generic ICD9 disease group prediction CDSS built on unstructured physician notes modeled using hybrid word embeddings. These word embeddings are used to train a deep neural network to predict ICD9 disease groups. Experimental evaluation showed that the proposed approach outperformed the state-of-the-art disease group prediction model built on structured EHRs by 15% in terms of AUROC and 40% in terms of AUPRC, supporting our hypothesis and eliminating the dependency on the availability of structured patient data.
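
The abstract reports results in terms of AUROC and AUPRC. The following is a minimal sketch of how these two metrics can be computed for a multi-label disease group task. It is not the authors' evaluation code. It assumes macro-averaging over disease groups and uses average precision as the AUPRC estimate, neither of which is specified in the abstract. The labels and scores are made-up toy values, and all function names are hypothetical.

```python
# Illustrative sketch only: toy data, hypothetical helper names.
# Assumptions (not stated in the abstract): macro-averaging over disease
# groups, and average precision as the estimator of AUPRC.


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true, y_score):
    """Average precision: mean precision at the rank of each positive.

    Tied scores keep their input order, a simplification for this sketch.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("AUPRC needs at least one positive")
    return total / hits


def macro_average(metric, Y_true, Y_score):
    """Apply a per-label metric to each column and average the results."""
    columns = zip(zip(*Y_true), zip(*Y_score))
    per_label = [metric(col_true, col_score) for col_true, col_score in columns]
    return sum(per_label) / len(per_label)


# Toy example: 4 notes (rows), 2 disease groups (columns).
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.5], [0.4, 0.7], [0.6, 0.4], [0.1, 0.3]]

macro_auroc = macro_average(auroc, Y_true, Y_score)
macro_auprc = macro_average(auprc, Y_true, Y_score)
print(f"macro AUROC = {macro_auroc:.3f}, macro AUPRC = {macro_auprc:.3f}")
```

AUPRC is usually the more informative of the two when a disease group is rare, because it ignores true negatives. This may be why the reported AUPRC gain differs so much from the AUROC gain, although the abstract does not say so.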
Footnotes
1. ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification.
Metadata
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-50423-6_24