2021 | OriginalPaper | Book chapter

A Multi-instance Multi-label Weakly Supervised Approach for Dealing with Emerging MeSH Descriptors

Written by: Nikolaos Mylonas, Stamatis Karlos, Grigorios Tsoumakas

Published in: Artificial Intelligence in Medicine

Publisher: Springer International Publishing

Abstract

The constant evolution of the Medical Subject Headings (MeSH) vocabulary, and specifically the changes in its descriptors, raises a number of issues that call for automation. The main one is that changed descriptors often lack proper ground-truth articles. Learning models that demand strong supervision are therefore not directly applicable, which makes prediction for such changes far from straightforward. The importance of this problem is reinforced by its multi-label nature and by the fine-grained character of the examined class descriptors, factors that demand considerable human resources. In this work, we alleviate these issues by retrieving insights from a source of information about those descriptors that is present in MeSH, in order to create a weakly labeled training set. Furthermore, we exploit short-text information per article: we apply an averaging transformation to the corresponding sentence embeddings, followed by a similarity mechanism that assigns weak labels to our formatted data set. We name our approach WeakMeSH. The benefits of the proposed end-to-end approach are examined on a large-scale subset of the BioASQ 2018 data set consisting of 900 thousand instances, investigating two separate groups of MeSH changes: brand-new and complex changes. Tested on the BioASQ 2020 data set, our approach proved highly competitive against several other approaches that either distill weak information on their own or apply alternative transformations to the proposed one.
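
The weak-labeling idea in the abstract (average an article's sentence embeddings, then compare the result with each descriptor) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model, the similarity measure, or the decision rule, so cosine similarity, the threshold of 0.5, the function names, and the toy 3-dimensional vectors below are all assumptions made for illustration.

```python
import numpy as np


def average_embedding(sentence_embeddings):
    """Collapse a (n_sentences, dim) array into one (dim,) article vector."""
    return np.asarray(sentence_embeddings, dtype=float).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity of two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def weak_labels(sentence_embeddings, descriptor_vectors, threshold=0.5):
    """Return every descriptor whose similarity to the article meets the threshold.

    The result is a set, so an article may receive several weak labels
    (multi-label) or none. The threshold value is an assumed hyperparameter.
    """
    article_vec = average_embedding(sentence_embeddings)
    return {
        name
        for name, vec in descriptor_vectors.items()
        if cosine_similarity(article_vec, vec) >= threshold
    }


# Toy data: two sentence embeddings for one article and two hypothetical
# descriptor embeddings. Real embeddings would come from a sentence encoder.
article = np.array([[1.0, 0.0, 0.0],
                    [0.8, 0.2, 0.0]])
descriptors = {
    "descriptor_a": np.array([1.0, 0.1, 0.0]),
    "descriptor_b": np.array([0.0, 0.0, 1.0]),
}

labels = weak_labels(article, descriptors, threshold=0.5)
print(labels)
```

In this toy case only `descriptor_a` points in roughly the same direction as the averaged article vector, so it is the only weak label assigned. Raising or lowering the assumed threshold trades label precision against coverage.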

Metadata
Title
A Multi-instance Multi-label Weakly Supervised Approach for Dealing with Emerging MeSH Descriptors
Written by
Nikolaos Mylonas
Stamatis Karlos
Grigorios Tsoumakas
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-77211-6_47