
2020 | OriginalPaper | Book chapter

Query or Document Translation for Academic Search – What’s the Real Difference?

Written by: Vivien Petras, Andreas Lüschow, Roland Ramthun, Juliane Stiller, Cristina España-Bonet, Sophie Henning

Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Publisher: Springer International Publishing


Abstract

We compare query and document translation from and to English, French, German and Spanish for multilingual retrieval in an academic search portal, PubPsych. Both translation approaches improve the retrieval performance of the system, with document translation providing better results. Performance gains correlate inversely with the number of documents available in the original language: the more documents already available in a language, the smaller the improvement observed. Retrieval performance with English as a source language does not improve with translation, as most documents in our text collection already contained English-language content. The large-scale evaluation study is based on a corpus of more than 1M metadata documents and 50 real queries taken from the query log files of the portal.


Footnotes
4
A reviewer of this paper pointed out that recall-oriented searches for systematic reviews are another important use case for academic search portals. This use case was not addressed in this study.
 
Metadata
Title
Query or Document Translation for Academic Search – What’s the Real Difference?
Written by
Vivien Petras
Andreas Lüschow
Roland Ramthun
Juliane Stiller
Cristina España-Bonet
Sophie Henning
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-58219-7_3