
2022 | Original Paper | Book Chapter

Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations

Authors: Yijie Ren, Fei He, Jing Qu, Yifan Li, Joshua Thompson, Mark Hannink, Mihail Popescu, Dong Xu

Published in: Computational Intelligence Methods for Bioinformatics and Biostatistics

Publisher: Springer International Publishing


Abstract

The biological literature is growing rapidly, and with it the number of biological pathway figures published in papers. Each pathway figure carries rich biological information in the form of gene names and gene relations. However, manually curating pathway figures requires tremendous time and labor. Advanced image understanding models can accelerate curation, but their accuracy still needs improvement. Because each pathway figure is associated with a paper, most of the gene names and gene relations in the figure also appear in the paper's text, so text mining can be used to improve the image recognition results. In this paper, we apply a fuzzy match method to detect gene names using different “gene dictionaries,” and we use gene co-occurrence in the plain text to suggest gene relations. We demonstrate that text mining improves the performance of image understanding for both gene name recognition and gene relation extraction. All data and code are available on GitHub (https://github.com/lyfer233/Text-Mining-Enhancements-for-Image-Recognition-of-Gene-Names-and-Gene-Relations).
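The two text mining steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which is in the linked repository. It assumes that fuzzy matching means accepting the dictionary entry with the smallest edit distance to an OCR token, up to a threshold, and that co-occurrence is counted per sentence. The function names, the `max_dist` threshold, and the three-entry dictionary are all hypothetical.

```python
import re
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_match(ocr_token, dictionary, max_dist=1):
    """Return the closest dictionary entry within max_dist edits, else None.

    Assumption: comparison is case-insensitive and ties go to the entry
    that appears first in the dictionary.
    """
    token = ocr_token.upper()
    best, best_dist = None, max_dist + 1
    for name in dictionary:
        d = edit_distance(token, name.upper())
        if d < best_dist:
            best, best_dist = name, d
    return best


def cooccurring_pairs(text, genes):
    """Count the sentences in which two genes from `genes` both appear.

    Assumption: sentences end at '.', '!' or '?' followed by whitespace,
    and a gene is present when it matches a whole alphanumeric word.
    """
    counts = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[A-Za-z0-9]+", sentence.upper()))
        present = sorted(g for g in genes if g.upper() in words)
        for pair in combinations(present, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical dictionary used only for illustration.
GENE_DICTIONARY = ["MTOR", "IRE1", "FN1"]
```

With this sketch, an OCR token such as "MT0R" (digit zero read in place of the letter O) maps back to "MTOR", and a gene pair mentioned in the same sentence of the paper text becomes a candidate relation for checking against the relations detected in the figure. A per-sentence window is only one possible choice; a paragraph-level or document-level window would trade precision for recall.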


Metadata
Title
Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations
Authors
Yijie Ren
Fei He
Jing Qu
Yifan Li
Joshua Thompson
Mark Hannink
Mihail Popescu
Dong Xu
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-20837-9_11
