2014 | Original Paper | Book Chapter

A Hybrid Approach to Compiling Bilingual Dictionaries of Medical Terms from Parallel Corpora

Authors: Georgios Kontonatsios, Claudiu Mihăilă, Ioannis Korkontzelos, Paul Thompson, Sophia Ananiadou

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

Existing bilingual dictionaries of technical terms suffer from limited coverage and are available for only a small number of language pairs. In response to these problems, we present a method for automatically constructing and updating bilingual dictionaries of medical terms by exploiting parallel corpora. We focus on the extraction of multi-word terms, which pose a challenging problem for term alignment algorithms. We apply our method to two low-resource language pairs, namely English-Greek and English-Romanian, for which such resources did not previously exist in the medical domain. Our approach combines two term alignment models to improve the accuracy of the extracted medical term translations. Evaluation results show that, when only the highest-ranked candidate translation is considered, the precision of our method is 86% for English-Greek and 81% for English-Romanian.
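The evaluation measure mentioned in the abstract, precision over the highest-ranked candidate translation, can be illustrated with a minimal sketch. The function name `precision_at_1`, the data layout, and the placeholder terms below are assumptions made for illustration only. They are not taken from the paper, and the toy data does not reproduce its results or its term alignment models.

```python
from typing import Dict, List, Set


def precision_at_1(ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Fraction of source terms whose highest-ranked candidate is a gold translation.

    `ranked` maps each source term to its candidate translations, best first.
    `gold` maps each source term to its set of accepted translations.
    Source terms with no candidates are skipped (an assumption of this sketch).
    """
    evaluated = [term for term, candidates in ranked.items() if candidates]
    if not evaluated:
        return 0.0
    correct = sum(1 for term in evaluated if ranked[term][0] in gold.get(term, set()))
    return correct / len(evaluated)


# Hypothetical placeholder data, not drawn from the paper's corpora.
ranked_candidates = {
    "src_term_1": ["tgt_a", "tgt_b"],  # top candidate is correct
    "src_term_2": ["tgt_c", "tgt_d"],  # correct translation is ranked second
    "src_term_3": ["tgt_e"],           # top candidate is correct
    "src_term_4": ["tgt_x", "tgt_f"],  # correct translation is ranked second
}
gold_translations = {
    "src_term_1": {"tgt_a"},
    "src_term_2": {"tgt_d"},
    "src_term_3": {"tgt_e"},
    "src_term_4": {"tgt_f"},
}

score = precision_at_1(ranked_candidates, gold_translations)
print(f"Top-1 precision: {score:.2f}")
```

Only the first candidate in each ranked list counts toward the score, so a correct translation placed second or lower is treated as a miss.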

Metadata
Copyright year: 2014
DOI: https://doi.org/10.1007/978-3-319-11397-5_4
