skip to main content
10.1145/3097983.3098126acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

GRAM: Graph-based Attention Model for Healthcare Representation Learning

Published:04 August 2017Publication History

ABSTRACT

Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: -

  • Data insufficiency: Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results.

  • Interpretation: The representations learned by deep learning methods should align with medical knowledge.

To address these challenges, we propose GRaph-based Attention Model (GRAM) that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies. Based on the data volume and the ontology structure, GRAM represents a medical concept as a combination of its ancestors in the ontology via an attention mechanism.

We compared predictive performance (i.e. accuracy, data needs, interpretability) of GRAM to various methods including the recurrent neural network (RNN) in two sequential diagnoses prediction tasks and one heart failure prediction task. Compared to the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely observed in the training data and 3% improved area under the ROC curve for predicting heart failure using an order of magnitude less training data. Additionally, unlike other methods, the medical concept representations learned by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits intuitive attention behaviors by adaptively generalizing to higher level concepts when facing data insufficiency at the lower level concepts.

References

  1. Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. 2014. Multiple object recognition with visual attention. arXiv:1412.7755 (2014).Google ScholarGoogle Scholar
  2. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 (2014).Google ScholarGoogle Scholar
  3. Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (1994). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS.Google ScholarGoogle Scholar
  6. Zhengping Che, David Kale, Wenzhe Li, Mohammad Taha Bahadori, and Yan Liu. 2015. Deep Computational Phenotyping. In SIGKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. 2016. Recurrent Neural Networks for Multivariate Time Series with Missing Values. arXiv:1606.01865 (2016).Google ScholarGoogle Scholar
  8. Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP. Google ScholarGoogle ScholarCross RefCross Ref
  9. Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. In MLHC.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism. In NIPS.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier T Sojo, and Jimeng Sun. 2016. Multi-layer Representation Learning for Medical Concepts. In SIGKDD.Google ScholarGoogle Scholar
  12. Edward Choi, Andy Schuetz, Walter F Stewart, and Jimeng Sun. 2016. Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset. JAMIA (2016).Google ScholarGoogle Scholar
  13. Youngduck Choi, Chill Yi-I Chiu, and David Sontag. 2016. Learning Low-Dimensional Representations of Medical Concepts. (2016). AMIA CRI.Google ScholarGoogle Scholar
  14. Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. End-to-end continuous speech recognition using attention-based recurrent NN: First results. arXiv:1412.1602 (2014).Google ScholarGoogle Scholar
  15. Ary Goldberger and others. 2000. Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals. Circulation (2000).Google ScholarGoogle Scholar
  16. Aditya Grover and Jure Leskovec. 2016. Node2Vec: Scalable Feature Learning for Networks. In SIGKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jerry Gurwitz, David Magid, David Smith, Robert Goldberg, David McManus, Larry Allen, Jane Saczynski, Micah Thorp, Grace Hsu, Sue Hee Sung, and others. 2013. Contemporary prevalence and correlates of incident heart failure with preserved ejection fraction. The American journal of medicine 126, 5 (2013). Google ScholarGoogle ScholarCross RefCross Ref
  18. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Alistair Johnson and others. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3 (2016). Google ScholarGoogle ScholarCross RefCross Ref
  20. Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 (2016).Google ScholarGoogle Scholar
  21. Quoc V Le, Navdeep Jaitly, and Geoffrey E Hinton. 2015. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv:1504.00941 (2015).Google ScholarGoogle Scholar
  22. Yuezhang Li, Ronghuo Zheng, Tian Tian, Zhiting Hu, Rahul Iyer, and Katia Sycara. 2016. Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification. (2016).Google ScholarGoogle Scholar
  23. Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In AAAI.Google ScholarGoogle Scholar
  24. Zachary C Lipton, David C Kale, Charles Elkan, and Randall Wetzell. 2015. Learning to Diagnose with LSTM Recurrent Neural Networks. arXiv:1511.03677 (2015).Google ScholarGoogle Scholar
  25. Zachary C Lipton, David C Kale, and Randall Wetzel. 2016. Modeling Missing Data in Clinical Time Series with RNNs. In MLHC.Google ScholarGoogle Scholar
  26. Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, Nov (2008).Google ScholarGoogle Scholar
  27. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.Google ScholarGoogle Scholar
  28. George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995).Google ScholarGoogle Scholar
  29. Riccardo Miotto, Li Li, Brian A Kidd, and Joel T Dudley. 2016. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Scientific Reports 6 (2016).Google ScholarGoogle Scholar
  30. Phuoc Nguyen, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. 2016. Deepr: A Convolutional Net for Medical Records. arXiv:1607.07519 (2016).Google ScholarGoogle Scholar
  31. Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP.Google ScholarGoogle Scholar
  32. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In SIGKDD.Google ScholarGoogle Scholar
  33. Healthcare Cost & Utilization Project and others. 2010. Clinical classifications software (CCS) for ICD-9-CM. Rockville, MD: Agency for Healthcare Research and Quality (2010).Google ScholarGoogle Scholar
  34. Narges Razavian, Jake Marcus, and David Sontag. 2016. Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests. In MLHC.Google ScholarGoogle Scholar
  35. Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS.Google ScholarGoogle Scholar
  36. Michael Q Stearns, Colin Price, Kent A Spackman, and Amy Y Wang. 2001. SNOMED clinical terms: overview of the development process and project status. In AMIA.Google ScholarGoogle Scholar
  37. Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In WWW.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. The Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688 (2016).Google ScholarGoogle Scholar
  39. Rajakrishnan Vijayakrishnan, Steven Steinhubl, Kenney Ng, Jimeng Sun, Roy Byrd, Zahra Daar, Brent Williams, Shahram Ebadollahi, Walter Stewart, and others. 2014. Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record. Journal of cardiac failure 20, 7 (2014). Google ScholarGoogle ScholarCross RefCross Ref
  40. Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI .Google ScholarGoogle Scholar
  41. Kilian Q Weinberger, Fei Sha, Qihui Zhu, and Lawrence K Saul. 2006. Graph Laplacian Regularization for Large-Scale Semidefinite Programming. In NIPS.Google ScholarGoogle Scholar
  42. Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2016. Representation Learning of Knowledge Graphs with Hierarchical Types. In IJCAI.Google ScholarGoogle Scholar
  43. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.. In ICML.Google ScholarGoogle Scholar
  44. Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. 2016. Revisiting Semi- Supervised Learning with Graph Embeddings. arXiv:1603.08861 (2016).Google ScholarGoogle Scholar
  45. Matthew D Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)Google ScholarGoogle Scholar

Index Terms

  1. GRAM: Graph-based Attention Model for Healthcare Representation Learning

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2017
          2240 pages
          ISBN:9781450348874
          DOI:10.1145/3097983

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 August 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '17 Paper Acceptance Rate64of748submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader