ABSTRACT
Query translation in Cross-Language Information Retrieval (CLIR) can draw on multiple resources. Previous attempts to combine translation resources have relied on simple methods such as linear combination. These approaches are poorly suited to combining different types of resources, such as bilingual dictionaries and statistical translation models, whose scores are not of the same nature. In this paper, we use confidence measures to perform this combination for English-Arabic CLIR. The confidence measure adjusts the original translation scores and produces weights of the same nature for translations obtained from different resources. We tested this technique on two TREC CLIR test collections and obtained encouraging improvements over linear combination.
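The abstract does not say how the confidence scores are computed, so the following Python snippet is only a minimal sketch of the general idea under stated assumptions. It contrasts the linear-combination baseline, which adds raw dictionary and statistical-model scores as if they were comparable, with a confidence-based combination, which first maps each resource's raw score onto a common "probability that this translation is correct" scale. The logistic confidence model, its parameters `(a, b)`, the noisy-OR merge, the resource names and the placeholder translations `ar_a`, `ar_b`, `ar_c` are all hypothetical choices made for illustration. They are not the authors' method.

```python
import math
from collections import defaultdict

# Raw translation scores for ONE English query term, keyed by resource.
# "ar_a", "ar_b", "ar_c" are hypothetical placeholder Arabic translations and
# the numbers are illustrative only (not taken from the paper's experiments).
RESOURCES = {
    "dictionary": {"ar_a": 0.5, "ar_b": 0.5},  # uniform split over dictionary entries
    "stat_model": {"ar_a": 0.7, "ar_c": 0.3},  # e.g. translation probabilities
}


def linear_combination(resources, weights):
    """Baseline: weighted sum of raw scores, treating them as comparable."""
    combined = defaultdict(float)
    for name, candidates in resources.items():
        for translation, score in candidates.items():
            combined[translation] += weights[name] * score
    return dict(combined)


def confidence(score, a, b):
    """Hypothetical per-resource confidence model: a logistic mapping from a raw
    score to an estimate of P(translation is correct). A real system would learn
    (a, b), or a richer feature-based model, from labelled translation pairs."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def confidence_combination(resources, params):
    """Map every raw score to a confidence, merge resources with noisy-OR,
    then renormalise so the term's translation weights sum to 1."""
    miss = defaultdict(lambda: 1.0)  # running product of (1 - confidence)
    for name, candidates in resources.items():
        a, b = params[name]
        for translation, score in candidates.items():
            miss[translation] *= 1.0 - confidence(score, a, b)
    merged = {t: 1.0 - m for t, m in miss.items()}
    total = sum(merged.values())
    return {t: c / total for t, c in merged.items()}


if __name__ == "__main__":
    linear = linear_combination(RESOURCES, {"dictionary": 0.5, "stat_model": 0.5})
    conf = confidence_combination(
        RESOURCES, {"dictionary": (4.0, -1.0), "stat_model": (6.0, -2.0)}
    )
    for t in sorted(linear):
        print(f"{t}: linear={linear[t]:.3f}  confidence={conf[t]:.3f}")
```

Noisy-OR is used here only because it treats each resource as independent evidence that a translation is correct. Any combination rule that operates on confidences rather than raw scores would illustrate the same point: once every resource's output is on the same probabilistic scale, the weights can be merged and renormalised without hand-tuning a linear weight per resource.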
Index Terms
- Combining resources with confidence measures for cross language information retrieval