ABSTRACT
Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.
- Emil Abrahamsson, Timothy Forni, Maria Skeppstedt, and Maria Kvist. 2014. Medical text simplification using synonym replacement: Adapting assessment of word difficulty to a compounding language. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR). 57-65.Google ScholarCross Ref
- Viraj Adduru, Sadid Hasan, Joey Liu, Yuan Ling, Vivek Datla, and Kathy Lee. 2018. Towards dataset creation and establishing baselines for sentence-level neural clinical paraphrase generation and simplification. In The 3rd International Workshop on Knowledge Discovery in Healthcare Data.Google Scholar
- Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In The semantic web. Springer, 722-735. Google ScholarDigital Library
- Olivier Bodenreider. 2004. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research32, suppl_1 (2004), D267-D270.Google Scholar
- Jinying Chen, Emily Druhl, Balaji Polepalli Ramesh, Thomas K Houston, Cynthia A Brandt, Donna M Zulman, Varsha G Vimalananda, Samir Malkani, and Hong Yu. 2018. A Natural Language Processing System That Links Medical Terms in Electronic Health Record Notes to Lay Definitions: System Development Using Physician Reviews. Journal of Medical Internet Research20, 1 (2018), e26.Google ScholarCross Ref
- Jinying Chen, Abhyuday N Jagannatha, Samah J Fodeh, and Hong Yu. 2017. Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach. JMIR medical informatics5, 4 (2017).Google Scholar
- Jinying Chen and Hong Yu. 2017. Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients. Journal of biomedical informatics68 (2017), 121-131. Google ScholarDigital Library
- Jinying Chen, Jiaping Zheng, and Hong Yu. 2016. Finding important terms for patients in their electronic health records: a learning-to-rank approach using expert annotations. JMIR medical informatics4, 4 (2016).Google Scholar
- Kevin Donnelly. 2006. SNOMED-CT: The advanced terminology and coding system for eHealth. Studies in health technology and informatics121 (2006), 279.Google Scholar
- Goran Glavaš and Sanja Štajner. 2015. Simplifying lexical simplification: do we need simplified corpora?. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Vol. 2. 63-68.Google ScholarCross Ref
- Zhe He, Zhiwei Chen, Sanghee Oh, Jinghui Hou, and Jiang Bian. 2017. Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach. Journal of biomedical informatics69 (2017), 75-85. Google ScholarDigital Library
- William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu. 2015. Aligning sentences from standard wikipedia to simple wikipedia. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 211-217.Google ScholarCross Ref
- Ling Jiang and Christopher C Yang. 2015. Expanding consumer health vocabularies by learning consumer health expressions from online health social media. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer, 314-320.Google ScholarCross Ref
- Tomoyuki Kajiwara and Mamoru Komachi. 2016. Building a monolingual parallel corpus for text simplification using sentence similarity based on alignment between word embeddings. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 1147-1158.Google Scholar
- Aris Kosmopoulos, Ion Androutsopoulos, and Georgios Paliouras. 2015. Biomedical semantic indexing using dense word vectors in bioasq. J BioMed Semant Suppl BioMedl Inf Retr3410 (2015), 959136040-1510456246.Google Scholar
- Poorna Kushalnagar, Scott Smith, Melinda Hopper, Claire Ryan, Micah Rinkevich, and Raja Kushalnagar. 2018. Making cancer health text on the Internet easier to read for deaf people who use American Sign Language. Journal of Cancer Education33, 1 (2018), 134-140.Google ScholarCross Ref
- Gondy Leroy, David Kauchak, and Alan Hogue. 2016. Effects on text simplification: Evaluation of splitting up noun phrases. Journal of health communication21, sup1 (2016), 18-26.Google Scholar
- Carolyn E Lipscomb. 2000. Medical subject headings (MeSH). Bulletin of the Medical Library Association88, 3(2000), 265.Google Scholar
- Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025(2015).Google Scholar
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781(2013).Google Scholar
- Partha Mukherjee, Gondy Leroy, David Kauchak, Srinidhi Rajanarayanan, Damian Y Romero Diaz, Nicole P Yuan, T Gail Pritchard, and Sonia Colina. 2017. NegAIT: A new parser for medical text simplification using morphological, sentential and double negation. Journal of biomedical informatics69 (2017), 55-62. Google ScholarDigital Library
- Sergiu Nisioi, Sanja Štajner, Simone Paolo Ponzetto, and Liviu P Dinu. 2017. Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vol. 2. 85-91.Google ScholarCross Ref
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311-318. Google ScholarDigital Library
- Basel Qenam, Tae Youn Kim, Mark J Carroll, and Michael Hogarth. 2017. Text simplification using consumer health vocabulary to generate patient-centered radiology reporting: translation and evaluation. Journal of medical Internet research19, 12 (2017).Google Scholar
- Isabel Segura-Bedmar and Paloma Martínez. 2017. Simplifying drug package leaflets written in Spanish by using word embedding. Journal of biomedical semantics8, 1 (2017), 45.Google ScholarCross Ref
- Luca Soldaini and Nazli Goharian. 2016. Quickumls: a fast, unsupervised approach for medical concept extraction. In MedIR workshop, sigir.Google Scholar
- Sanja Štajner and Goran Glavaš. 2017. Leveraging event-based semantics for automated text simplification. Expert systems with applications82 (2017), 383-395. Google ScholarDigital Library
- Elior Sulem, Omri Abend, and Ari Rappoport. 2018. Simple and Effective Text Simplification Using Semantic and Neural Methods. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. 162-173.Google ScholarCross Ref
- Sharon Swee-Lin Tan and Nadee Goonawardene. 2017. Internet health information seeking and the patient-physician relationship: a systematic review. Journal of medical Internet research19, 1 (2017).Google Scholar
- VG Vinod Vydiswaran, Qiaozhu Mei, David A Hanauer, and Kai Zheng. 2014. Mining consumer health vocabulary from community-generated text. In AMIA Annual Symposium Proceedings, Vol. 2014. American Medical Informatics Association, 1150.Google Scholar
- World Health Organization (WHO and others. 2018. Health literacy. The solid facts. Self (2018).Google Scholar
- Sander Wubben, Antal Van Den Bosch, and Emiel Krahmer. 2012. Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 1015-1024. Google ScholarDigital Library
- Deborah X Xie, Ray Y Wang, and Sivakumar Chinnadurai. 2018. Readability of online patient education materials for velopharyngeal insufficiency. International journal of pediatric otorhinolaryngology104 (2018), 113-119.Google ScholarCross Ref
- Wei Xu, Chris Callison-Burch, and Courtney Napoles. 2015. Problems in current text simplification research: New data can help. Transactions of the Association of Computational Linguistics3, 1(2015), 283-297.Google ScholarCross Ref
- Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing statistical machine translation for text simplification. Transactions of the Association for Computational Linguistics4 (2016), 401-415.Google Scholar
- Ming Yang and Melody Kiang. 2015. Extracting Consumer Health Expressions of Drug Safety from Web Forum. In System Sciences (HICSS), 2015 48th Hawaii International Conference on. IEEE, 2896-2905. Google ScholarDigital Library
- Qing T Zeng and Tony Tse. 2006. Exploring and developing consumer health vocabularies. Journal of the American Medical Informatics Association13, 1(2006), 24-29.Google ScholarCross Ref
- Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, 1353-1361. Google ScholarDigital Library
Recommendations
Leveraging Social Media for Medical Text Simplification
SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information RetrievalPatients are increasingly using the web for understanding medical information, making health decisions, and validating physicians' advice. However, most of this content is tailored to an expert audience, due to which people with inadequate health ...
Text simplification using Neural Machine Translation
AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial IntelligenceText simplification (TS) is the technique of reducing the lexical, syntactical complexity of text. Existing automatic TS systems can simplify text only by lexical simplification or by manually defined rules. Neural Machine Translation (NMT) is a recently ...
Quadrilateral mesh simplification
We introduce a simplification algorithm for meshes composed of quadrilateral elements. It is reminiscent of edge-collapse based methods for triangle meshes, but takes a novel approach to the challenging problem of maintaining the quadrilateral ...
Comments