DOI: 10.1145/3308558.3313630
research-article

Evaluating Neural Text Simplification in the Medical Domain

Published: 13 May 2019

ABSTRACT

Health literacy, i.e., the ability to read and understand medical text, is an important component of public health. Unfortunately, many medical texts are hard for the general population to grasp, because they are targeted at highly skilled professionals and use complex language and domain-specific terms. Automatic text simplification that makes such text commonly understandable would therefore be very beneficial. However, research and development in medical text simplification is hindered by the lack of openly available training and test corpora that contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences from an existing aligned corpus using expert knowledge, and by applying a novel, simple, language-independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model and compare it to a model trained on a general simplification dataset, using both an automatic evaluation and an extensive human-expert evaluation.
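The abstract does not describe how the monolingual alignment method works, so the following is only an illustrative sketch of the general idea of pairing complex sentences with simplified ones by similarity. It is not the method proposed in the paper. The token-overlap (Jaccard) score, the function names `tokens`, `jaccard`, and `align`, and the threshold of 0.3 are all assumptions made for this example.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Token-set overlap between two sentences, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align(complex_sents, simple_sents, threshold=0.3):
    """Pair each complex sentence with its most similar simple sentence.

    A pair is kept only if its score reaches the (assumed) threshold.
    Returns a list of (complex, simple, score) tuples.
    """
    pairs = []
    for c in complex_sents:
        # Pick the best-scoring candidate; None if there are no candidates.
        best = max(simple_sents, key=lambda s: jaccard(c, s), default=None)
        if best is None:
            continue
        score = jaccard(c, best)
        if score >= threshold:
            pairs.append((c, best, score))
    return pairs


complex_sents = [
    "Hypertension is a chronic medical condition in which blood pressure is elevated."
]
simple_sents = [
    "Hypertension is a condition in which blood pressure is high.",
    "The heart pumps blood.",
]

for c, s, score in align(complex_sents, simple_sents):
    print(f"{score:.2f}\t{c}\t{s}")
```

Because the score relies only on surface tokens, a sketch like this needs no language-specific resources, but it would miss pairs in which the simplified sentence replaces most of the medical terms with lay vocabulary.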

  • Published in

    ACM Other conferences
    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%