
2021 | OriginalPaper | Chapter

Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer

Authors: Zulfat Miftahutdinov, Artur Kadurin, Roman Kudrin, Elena Tutubalina

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

Concept normalization in free-form texts is a crucial step in every text-mining pipeline. Neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art results in the biomedical domain. In the context of drug discovery and development, clinical trials are necessary to establish the efficacy and safety of drugs. We investigate the effectiveness of transferring concept normalization from the general biomedical domain to the clinical trials domain in a zero-shot setting, where no labeled data is available. We propose a simple and effective two-stage neural approach based on fine-tuned BERT architectures. In the first stage, we train a metric learning model that optimizes the relative similarity of mentions and concepts via triplet loss. The model is trained on available labeled corpora of scientific abstracts to obtain vector embeddings of concept names and entity mentions from texts. In the second stage, we find the concept name representation that is closest in the embedding space to a given clinical mention. We evaluated several models, including state-of-the-art architectures, on a dataset of abstracts and a real-world dataset of trial records with interventions and conditions mapped to drug and disease terminologies. Extensive experiments validate the effectiveness of our approach in knowledge transfer from the scientific literature to clinical trials.
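The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-d vectors stand in for BERT embeddings, the concept identifiers are hypothetical, and both the Euclidean distance and the margin of 1.0 are assumptions chosen only to make the example concrete.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Stage 1 objective (sketch): hinge-style triplet loss.

    The loss is zero once the mention (anchor) is closer to the correct
    concept name (positive) than to a wrong one (negative) by at least
    `margin`. Distance metric and margin value are assumptions.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def nearest_concept(mention_vec, concept_vecs):
    """Stage 2 (sketch): id of the concept whose name embedding is closest."""
    return min(concept_vecs, key=lambda cid: math.dist(mention_vec, concept_vecs[cid]))

# Hypothetical embeddings; a real system would obtain these from an encoder.
concept_vecs = {
    "CONCEPT_A": [1.0, 0.0],
    "CONCEPT_B": [0.0, 1.0],
}
mention_vec = [0.9, 0.1]

# A well-separated triplet contributes no loss.
loss_ok = triplet_loss(mention_vec, concept_vecs["CONCEPT_A"], concept_vecs["CONCEPT_B"])
# Swapping positive and negative yields a positive loss to be minimized.
loss_bad = triplet_loss(mention_vec, concept_vecs["CONCEPT_B"], concept_vecs["CONCEPT_A"])
best = nearest_concept(mention_vec, concept_vecs)
```

Because the second stage is only a nearest-neighbor lookup over concept name embeddings, a new terminology could in principle be added by embedding its names, without retraining on labeled mentions, which is what makes a zero-shot setting plausible in this kind of design.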



Title: Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer
Authors: Zulfat Miftahutdinov, Artur Kadurin, Roman Kudrin, Elena Tutubalina
Publisher: Springer International Publishing
Book: Advances in Information Retrieval
Print ISBN: 978-3-030-72112-1

Electronic ISBN: 978-3-030-72113-8

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-72113-8_30
