skip to main content
10.1145/1835449.1835490acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Information-based models for ad hoc IR

Authors Info & Claims
Published:19 July 2010Publication History

ABSTRACT

We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.

References

  1. E. M. Airoldi, W. W. Cohen, and S. E. Fienberg. Bayesian methods for frequent terms in text: Models of contagion and the S2 statistic.Google ScholarGoogle Scholar
  2. G. Amati, C. Carpineto, G. Romano, and F. U. Bordoni. Fondazione Ugo Bordoni at TREC 2003: robust and web track, 2003.Google ScholarGoogle Scholar
  3. G. Amati and C. J. V. Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst., 20(4):357--389, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.Google ScholarGoogle ScholarCross RefCross Ref
  5. D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv., 38(1):2, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. K. W. Church and W. A. Gale. Poisson mixtures. Natural Language Engineering, 1:163--190, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  7. S. Clinchant and É. Gaussier. The BNB distribution for text modeling. In Macdonald et al. {12}, pages 150--161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Elkan. Clustering documents with an exponential-family approximation of the dirichlet compound multinomial distribution. In W. W. Cohen and A. Moore, editors, ICML, volume 148 of ACM International Conference Proceeding Series, pages 289--296. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Fang, T. Tao, and C. Zhai. A formal study of information retrieval heuristics. In SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Feller. An Introduction to Probability Theory and Its Applications, Vol. I. Wiley, New York, 1968.Google ScholarGoogle Scholar
  11. S. P. Harter. A probabilistic approach to automatic keyword indexing. Journal of the American Society for Information Science, 26, 1975.Google ScholarGoogle Scholar
  12. C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, and R. W. White, editors. Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceedings, volume 4956 of Lecture Notes in Computer Science. Springer, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. E. Madsen, D. Kauchak, and C. Elkan. Modeling word burstiness using the dirichlet distribution. In L. D. Raedt and S. Wrobel, editors, ICML, volume 119 of ACM International Conference Proceeding Series, pages 545--552. ACM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S.-H. Na, I.-S. Kang, and J.-H. Lee. Improving term frequency normalization for multi-topical documents and application to language modeling approaches. In Macdonald et al. {12}, pages 382--393. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. E. Robertson and S. Walker. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In SIGIR '94: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 232--241, New York, NY, USA, 1994. Springer-Verlag New York, Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Singhal, C. Buckley, and M. Mitra. Pivoted document length normalization. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 21--29, New York, NY, USA, 1996. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. I. O. V. Plachouras, B. He. University of Glasgow at TREC 2004: Experiments in web, robust and terabyte tracks with terrier, 2004.Google ScholarGoogle Scholar
  19. Z. Xu and R. Akella. A new probabilistic retrieval model based on the dirichlet compound multinomial distribution. In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 427--434, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Zhai and J. Lafferty. Model-based feedback in the language modeling approach to information retrieval. In CIKM '01: Proceedings of the tenth international conference on Information and knowledge management, pages 403--410, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Information-based models for ad hoc IR

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
      July 2010
      944 pages
      ISBN:9781450301534
      DOI:10.1145/1835449

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 July 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGIR '10 Paper Acceptance Rate87of520submissions,17%Overall Acceptance Rate792of3,983submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader