ABSTRACT
Query translation is an important task in cross-language information retrieval (CLIR); it aims to translate queries into the languages used in the documents. This paper investigates how necessary it is to translate each query term, which may differ from one term to another: leaving some terms untranslated causes an irreparable drop in performance, while leaving others untranslated does not. We propose an approach that estimates the translation probability of a query term, which helps decide whether the term should be translated. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on the NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis also discusses how untranslated out-of-vocabulary (OOV) query terms and the translation quality of non-OOV query terms affect CLIR performance.
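As a rough illustration of the idea of learning a per-term translate-or-not decision, the sketch below trains a tiny logistic regression classifier on term features. It is not the paper's model: the three features (an inverse document frequency score, a named-entity flag, an OOV flag), the training values, the labels, and the function names are all assumptions invented for this example, and plain logistic regression stands in for whatever regression and classification models the authors actually used.

```python
import math

# Hypothetical per-term features: (idf, is_named_entity, is_oov).
# All values and labels below are invented for illustration only.
# Label 1 = the term should be translated, 0 = it can be left untranslated.
TRAIN = [
    ((5.2, 1.0, 1.0), 1),
    ((4.8, 1.0, 0.0), 1),
    ((4.1, 0.0, 1.0), 1),
    ((1.1, 0.0, 0.0), 0),
    ((0.9, 0.0, 0.0), 0),
    ((1.5, 0.0, 0.0), 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a term with these features needs translation."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(data, epochs=2000, lr=0.1):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def should_translate(weights, bias, features, threshold=0.5):
    """Translate the term only if the estimated probability reaches the threshold."""
    return predict(weights, bias, features) >= threshold


weights, bias = train(TRAIN)
```

Under these assumptions, a call such as `should_translate(weights, bias, (5.0, 1.0, 1.0))` asks whether a rare, out-of-vocabulary named entity should be sent to the translator. The threshold is a tunable choice, not a value taken from the paper.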
Index Terms
- To translate or not to translate?