2022 | Original Paper | Chapter

Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations

Authors: Yijie Ren, Fei He, Jing Qu, Yifan Li, Joshua Thompson, Mark Hannink, Mihail Popescu, Dong Xu

Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics

Publisher: Springer International Publishing

Abstract

The biological literature has been growing rapidly, and with it the number of biological pathway figures published in these papers. Each pathway figure contains rich biological information, consisting of gene names and gene relations. However, manual curation of pathway figures requires tremendous time and labor. While advanced image understanding models may accelerate curation, their accuracy still needs improvement. Since each pathway figure is associated with a paper, most of the gene names and gene relations in a figure also appear in the paper's text, where text mining can be used to improve the image recognition results. In this paper, we applied a fuzzy match method to detect gene names with different “gene dictionaries,” and used gene co-occurrence in the plain text to suggest gene relations. We demonstrate that text mining methods can improve the performance of image understanding for both gene name recognition and gene relation extraction. All the data and code are available at GitHub (https://github.com/lyfer233/Text-Mining-Enhancements-for-Image-Recognition-of-Gene-Names-and-Gene-Relations).
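The two text-mining steps named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the GitHub repository above): it assumes Levenshtein edit distance as the fuzzy-match criterion with a hypothetical threshold of one edit, uses a tiny hard-coded gene list as a stand-in for the "gene dictionaries", and treats two genes as co-occurring when both are mentioned in the same sentence. The function names `levenshtein`, `fuzzy_match` and `cooccurring_pairs` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(ocr_token: str, gene_dictionary: Iterable[str],
                max_distance: int = 1) -> Optional[str]:
    """Map a possibly misread OCR token to the closest dictionary gene name.

    Assumption: max_distance=1 is an illustrative threshold, not the chapter's value.
    Returns None when even the closest entry is more than max_distance edits away.
    """
    token = ocr_token.upper()  # compare gene symbols case-insensitively
    best, best_dist = None, None
    for gene in gene_dictionary:
        dist = levenshtein(token, gene.upper())
        if best_dist is None or dist < best_dist:
            best, best_dist = gene, dist
    if best_dist is not None and best_dist <= max_distance:
        return best
    return None


def cooccurring_pairs(text: str, genes: Iterable[str]) -> Counter:
    """Count how often each pair of genes is mentioned in the same sentence."""
    gene_set = set(genes)
    counts: Counter = Counter()
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        present = sorted(gene_set & words)  # exact, case-sensitive mentions
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical stand-in for a real gene dictionary.
    dictionary = ["MTOR", "IRE1", "XBP1", "ATF6"]
    print(fuzzy_match("MT0R", dictionary))  # OCR read a zero for the letter O; prints MTOR
    text = ("IRE1 splices XBP1. XBP1 and ATF6 regulate chaperones. "
            "IRE1 activates XBP1 under stress.")
    print(cooccurring_pairs(text, dictionary))
    # prints Counter({('IRE1', 'XBP1'): 2, ('ATF6', 'XBP1'): 1})
```

A fixed threshold of one edit is only illustrative: for short symbols it can also map a misread token onto a different real gene (ATF4 is one edit away from ATF6), so in practice the threshold and the dictionaries would need tuning. Likewise, a sentence-level co-occurrence count only suggests that a relation detected in the figure is supported by the text; it says nothing about the relation's direction or type.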

Metadata
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-20837-9_11