Research article. DOI: 10.1145/1316874.1316896

Combining resources with confidence measures for cross language information retrieval

Published: 09 November 2007

ABSTRACT

Query translation in Cross Language Information Retrieval (CLIR) can be performed using multiple resources. Previous attempts to combine different translation resources use simple methods such as linear combination. Unfortunately, these approaches are insufficient for combining resources of different types, such as bilingual dictionaries and statistical translation models. In this paper, we use confidence measures to perform this combination for English-Arabic CLIR. A confidence measure is used to adjust the original translation scores and to produce weights of the same nature for translations drawn from different resources. We tested this technique on two CLIR test collections from TREC and obtained encouraging improvements over linear combination.
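The contrast drawn in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the confidence model, so the logistic scorer, its hand-picked coefficients, the agreement feature, the resource names, and the placeholder translations `t1`..`t3` are all assumptions made purely for illustration. The sketch only shows the general idea of replacing raw, resource-specific scores with estimates on a common scale before summing them.

```python
import math
from collections import defaultdict


def linear_combination(resources, weights):
    """Baseline: combined(t) = sum_i w_i * score_i(t), one fixed weight per resource."""
    combined = defaultdict(float)
    for name, translations in resources.items():
        for target, score in translations.items():
            combined[target] += weights[name] * score
    return dict(combined)


def confidence_combination(resources, confidence):
    """Map every raw score to a confidence value, sum per translation, normalise."""
    combined = defaultdict(float)
    for name, translations in resources.items():
        for target, score in translations.items():
            combined[target] += confidence(name, target, score)
    total = sum(combined.values())
    return {t: v / total for t, v in combined.items()} if total else {}


def make_toy_confidence(resources):
    """Hypothetical confidence model: a logistic function over two features.

    The coefficients are invented; a real system would learn such a model
    from translations labelled as correct or incorrect.
    """
    def confidence(name, target, score):
        # Number of *other* resources that propose the same translation.
        agree = sum(1 for r in resources.values() if target in r) - 1
        z = -1.0 + 2.0 * score + 1.5 * agree
        return 1.0 / (1.0 + math.exp(-z))
    return confidence


# Toy candidate translations for one source query term (placeholder strings).
resources = {
    "dictionary": {"t1": 0.5, "t2": 0.5},    # uniform scores over dictionary entries
    "stat_model": {"t1": 0.25, "t3": 0.75},  # translation-model probabilities
}
weights = {"dictionary": 0.5, "stat_model": 0.5}

linear = linear_combination(resources, weights)
conf = confidence_combination(resources, make_toy_confidence(resources))
```

In this toy setup the linear baseline cannot separate `t1` from `t3`, because it only rescales each resource's raw scores by a fixed weight. The confidence-based variant can use per-translation evidence (here, agreement between resources) and ranks `t1` first. Whether the paper uses features of this kind is not stated in the abstract.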


Published in

PIKM '07: Proceedings of the ACM first Ph.D. workshop in CIKM
November 2007
184 pages
ISBN: 9781595938329
DOI: 10.1145/1316874

        Copyright © 2007 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 25 of 62 submissions, 40%
