DOI: 10.1145/1835449.1835558
research-article

To translate or not to translate?

Published: 19 July 2010

ABSTRACT

Query translation is an important task in cross-language information retrieval (CLIR), which aims to translate queries into the languages used in the documents. The purpose of this paper is to investigate whether query terms need to be translated, a necessity that may differ from one term to another. Leaving some terms untranslated causes an irreparable performance drop, while leaving others untranslated does not. We propose an approach to estimate the translation probability of a query term, which helps decide whether it should be translated. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on the NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis is also provided of the impact on CLIR performance of untranslated out-of-vocabulary (OOV) query terms and of the translation quality of non-OOV query terms.
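To make the "translate or not" decision concrete, here is a minimal sketch that learns a per-term model from term properties and translates a term only when its estimated probability clears a threshold. It assumes a plain logistic-regression classifier trained by batch gradient descent, swapped in for illustration; the paper's actual models and features may differ. The feature set (`idf`, `is_noun`, `dict_coverage`), the labelling scheme, the 0.5 threshold, the helper names, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-term feature vector (NOT the paper's feature set):
#   [bias, idf_normalized, is_noun, dict_coverage]
Features = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], x: Features) -> float:
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(data: List[Tuple[Features, int]],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent.

    Label 1 means "translating this term helped retrieval", 0 means it did
    not; this labelling scheme is an assumption for illustration.
    """
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in data:
            err = sigmoid(score(w, x)) - y  # d(log-loss)/d(score)
            for j in range(n):
                grad[j] += err * x[j]
        w = [wj - lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w


def should_translate(w: Sequence[float], x: Features,
                     threshold: float = 0.5) -> bool:
    """Translate the term only if its estimated probability clears the threshold."""
    return sigmoid(score(w, x)) >= threshold


# Toy, made-up training terms: [bias, idf, is_noun, dict_coverage] -> label
toy = [
    ([1.0, 0.9, 1.0, 0.8], 1),
    ([1.0, 0.8, 1.0, 0.6], 1),
    ([1.0, 0.7, 0.0, 0.9], 1),
    ([1.0, 0.2, 0.0, 0.1], 0),
    ([1.0, 0.1, 0.0, 0.3], 0),
    ([1.0, 0.3, 0.0, 0.0], 0),
]
weights = train_logistic(toy)
print(should_translate(weights, [1.0, 0.9, 1.0, 0.9]))
print(should_translate(weights, [1.0, 0.1, 0.0, 0.05]))
```

In a real setting the toy rows would be replaced by query terms whose labels come from measured retrieval effectiveness with and without translation, and a regression model over the same features could output a graded estimate instead of a hard yes/no. Raising the threshold leaves more terms untranslated; lowering it translates more aggressively.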


Published in

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '10 Paper Acceptance Rate: 87 of 520 submissions, 17%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
