ABSTRACT
Accurate knowledge of a patient's disease state and trajectory is critical in a clinical setting. Modern electronic healthcare records contain an increasingly large amount of data, and the ability to automatically identify the factors that influence patient outcomes stands to greatly improve the efficiency and quality of care.
We examined the use of latent variable models (specifically, Latent Dirichlet Allocation) to decompose free-text hospital notes into meaningful features, and the predictive power of these features for patient mortality. We considered three prediction regimes: (1) baseline prediction, (2) dynamic (time-varying) outcome prediction, and (3) retrospective outcome prediction. In each regime, our prediction task differs from the familiar time-varying situation in which data simply accumulate: because fewer patients have long ICU stays, fewer patients are available as we move forward in time, and the prediction task becomes increasingly difficult.
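The shrinking cohort in the dynamic regime can be illustrated with a minimal sketch. The `Stay` record, the toy stays, and the horizon values below are all hypothetical and chosen only for illustration; the abstract does not specify the actual time windows or data layout.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    """Hypothetical ICU stay record (not the paper's data format)."""
    patient_id: str
    icu_hours: float  # total length of the ICU stay in hours
    died: bool        # outcome label


def cohort_at(stays, horizon_hours):
    """Return the stays that are still ongoing at the given horizon."""
    return [s for s in stays if s.icu_hours >= horizon_hours]


# Toy data, invented for this example.
stays = [
    Stay("a", 10, False),
    Stay("b", 30, True),
    Stay("c", 50, False),
    Stay("d", 80, True),
]

# Assumed evaluation horizons in hours (illustrative only).
HORIZONS = (12, 24, 48, 72)

# Number of patients available for prediction at each horizon.
cohort_sizes = {h: len(cohort_at(stays, h)) for h in HORIZONS}

for h, n in cohort_sizes.items():
    print(f"{h:>3} h: {n} patients available")
```

Each later horizon has more notes per remaining patient but fewer patients overall, which is why prediction becomes harder rather than easier as time advances.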
We found that latent topic-derived features were effective in determining patient mortality under three timelines: in-hospital, 30-day post-discharge, and 1-year post-discharge mortality. Our results demonstrated that the latent topic features important in predicting in-hospital mortality are very different from those that are important in predicting post-discharge mortality. In general, latent topic features were more predictive than structured features, and a combination of the two performed best.
The time-varying models that combined latent topic features and baseline features had AUCs that reached 0.85, 0.80, and 0.77 for in-hospital, 30-day post-discharge, and 1-year post-discharge mortality, respectively. Our results agreed with other work suggesting that the first 24 hours of patient information are often the most predictive of hospital mortality. Retrospective models that used a combination of latent topic features and structured features achieved AUCs of 0.96, 0.82, and 0.81 for in-hospital, 30-day, and 1-year mortality prediction, respectively.
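For readers unfamiliar with the metric, the AUC values quoted above can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition; the function name and the toy labels and scores are assumptions for illustration and do not come from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = died).
    scores: iterable of predicted risk scores, same length as labels.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large cohorts.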
Our work focuses on the dynamic (time-varying) setting because models from this regime could facilitate an on-going severity stratification system that helps direct care-staff resources and inform treatment strategies.
Index Terms
- Unfolding physiological state: mortality modelling in intensive care units