Published in: Discover Computing 6/2008

01-12-2008

On event space and rank equivalence between probabilistic retrieval models

Author: Robert W. P. Luk


Abstract

This paper discusses several issues concerning the rank equivalence of Lafferty and Zhai between the log-odds ratio and the query likelihood of probabilistic retrieval models. It highlights that Robertson’s concerns about this equivalence may arise when multiple probability distributions are assumed to be uniform, once the marginal probability is taken to follow logically from Kolmogorov’s probability axioms. It also clarifies that there are two types of rank equivalence relations between probabilistic models, namely strict and weak rank equivalence. This paper focuses on strict rank equivalence, which requires the event spaces of the participating probabilistic models to be identical. Two probabilistic models may be strictly rank equivalent even when they use different probability estimation methods. This paper shows that the query likelihood, p(q|d, r), is strictly rank equivalent to p(q|d) of the language model of Ponte and Croft by applying assumptions 1 and 2 of Lafferty and Zhai. In addition, some statistical component language models may be strictly rank equivalent to the log-odds ratio, and some statistical component models using the log-odds ratio may be strictly rank equivalent to the query likelihood. Finally, we suggest adding a random variable for the user information need to probabilistic retrieval models, to clarify how these models deal with multiple requests.
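
For orientation, the rank equivalence between the log-odds ratio and the query likelihood mentioned in the abstract can be sketched as follows. This is a sketch of the derivation as it is commonly presented, not a reproduction of the paper's own equations. Here d is a document, q a query, r the relevance event and \bar{r} its complement. The labels (A) and (B) are introduced only for this illustration and are assumed, not confirmed, to correspond to the "assumptions 1 and 2" of Lafferty and Zhai named above.

```latex
% Sketch only: labels (A) and (B) are illustrative, not the paper's numbering.
% Requires amsmath for the align environment and \text.
\begin{align}
\log O(r \mid d, q)
  &= \log \frac{p(r \mid d, q)}{p(\bar{r} \mid d, q)} \\
  &= \log \frac{p(q \mid d, r)\, p(r \mid d)}{p(q \mid d, \bar{r})\, p(\bar{r} \mid d)}
     && \text{(Bayes' rule; $p(q \mid d)$ cancels)} \\
  &= \log \frac{p(q \mid d, r)}{p(q \mid d, \bar{r})}
     + \log \frac{p(r \mid d)}{p(\bar{r} \mid d)} \\
  &= \log p(q \mid d, r) - \log p(q \mid \bar{r})
     + \log \frac{p(r \mid d)}{p(\bar{r} \mid d)}
     && \text{(A): } p(q \mid d, \bar{r}) = p(q \mid \bar{r}) \\
  &= \log p(q \mid d, r) - \log p(q \mid \bar{r})
     + \log \frac{p(r)}{p(\bar{r})}
     && \text{(B): } p(r \mid d) = p(r) \\
  &\stackrel{\text{rank}}{=} \log p(q \mid d, r)
\end{align}
```

The last step holds because, for a fixed query q, the two dropped terms do not depend on d, so they shift every document's score by the same constant and leave the ordering of documents unchanged.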

Literature
Berger, A., & Lafferty, J. (1999). Information retrieval as statistical translation. In F. Gey, M. Hearst, & R. Tong (Eds.), Proceedings of SIGIR ’99: The 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 222–229). Berkeley, CA, USA. New York: ACM Press.
Cooper, W. S. (1995). Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Transactions on Information Systems, 13(1), 100–111.
Croft, W. B., & Lafferty, J. (Eds.). (2003). Language modeling for information retrieval. Kluwer Academic Publishers.
Fuhr, N. (1992). Probabilistic models in information retrieval. The Computer Journal, 35(3), 243–255.
Fuhr, N. (2008). A probability ranking principle for interactive information retrieval. Information Retrieval, 11(3), 251–265.
Gao, J., Nie, J.-Y., Wu, G., & Cao, G. (2004). Dependence language model for information retrieval. In M. Sanderson, K. Järvelin, J. Allan, & P. Bruza (Eds.), Proceedings of SIGIR ’04: The 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 170–177). Sheffield, UK. New York: ACM Press.
Hiemstra, D. (1998). A linguistically motivated probabilistic model of information retrieval. In C. Nicolaou & C. Stephanidis (Eds.), Lecture notes in computer science: Research and advanced technology for digital libraries (Vol. 1513, pp. 569–584). Greece: Springer-Verlag.
Hiemstra, D. (2000). A probabilistic justification for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2), 131–139.
Hiemstra, D. (2002). Term-specific smoothing for the language modeling approach to information retrieval: The importance of a query term. In K. Järvelin, M. Beaulieu, R. Baeza-Yates, & S. H. Myaeng (Eds.), Proceedings of SIGIR ’02: The 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 35–41). Tampere, Finland. New York: ACM Press.
Lafferty, J., & Zhai, C. (2003). Probabilistic relevance models based on document and query generation. In W. B. Croft & J. Lafferty (Eds.), Language modeling for information retrieval (pp. 1–10). Kluwer Academic Publishers.
Lavrenko, V., & Croft, W. B. (2001). Relevance-based language models. In D. H. Kraft, W. B. Croft, D. J. Harper, & J. Zobel (Eds.), Proceedings of SIGIR ’01: The 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 120–127). New Orleans, Louisiana, USA. New York: ACM Press.
Lidstone, G. J. (1920). Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8, 182–192.
Miller, D. H., Leek, T., & Schwartz, R. (1999). A hidden Markov model information retrieval system. In F. Gey, M. Hearst, & R. Tong (Eds.), Proceedings of SIGIR ’99: The 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 214–221). Berkeley, CA, USA. New York: ACM Press.
Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependencies in stochastic language modeling. Information Technology: Research and Development, 3, 33–42.
Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, & J. Zobel (Eds.), Proceedings of SIGIR ’98: The 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 275–281). Melbourne, Australia. New York: ACM Press.
Robertson, S. E. (1974). Specificity and weighted retrieval. Journal of Documentation, 30(1), 41–46.
Robertson, S. E. (1977). The probability ranking principle in IR. Journal of Documentation, 33(4), 294–304.
Robertson, S. E. (2003). The unified model revisited. In S. Dominich, K. van Rijsbergen, & M. Lalmas (Eds.), ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval. Toronto, Canada.
Robertson, S. E. (2005). On event spaces and probabilistic models in information retrieval. Information Retrieval, 8(2), 319–329.
Robertson, S. E., & Spärck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129–146.
Robertson, S. E., Maron, M., & Cooper, W. S. (1982a). Probability of relevance: A unification of two competing models for document retrieval. Information Technology-Research and Development, 1, 1–21.
Robertson, S. E., Maron, M., & Cooper, W. S. (1982b). The Unified Probabilistic Model for IR. In H.-J. Schneider & G. Salton (Eds.), Proceedings of SIGIR ’82: Proceedings of the 5th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 108–117). West Berlin, Germany. New York: ACM Press.
Roelleke, T., & Wang, J. (2006). A parallel derivation of probabilistic information retrieval models. In S. Dumais, E. N. Efthimiadis, D. Hawking, & K. Järvelin (Eds.), Proceedings of SIGIR ’06: The 29th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 107–114). Seattle, Washington, USA. New York: ACM Press.
Song, F., & Croft, W. B. (1999). A general language model for information retrieval. In F. Gey, M. Hearst, & R. Tong (Eds.), Proceedings of SIGIR ’99: The 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 279–280). Berkeley, CA, USA. New York: ACM Press.
Spärck Jones, K., Walker, S., & Robertson, S. E. (2000). A probabilistic model of information retrieval: Development and comparative experiments, part 1. Information Processing and Management, 36(6), 779–808.
Spärck Jones, K., Robertson, S. E., Hiemstra, D., & Zaragoza, H. (2003). Language modelling and relevance. In W. B. Croft & J. Lafferty (Eds.), Language modeling for information retrieval (pp. 57–73). Kluwer Academic Publishers.
Wu, H. C., Luk, R. W. P., & Wong, K. F. (2007). Probability ranking principle via optimal expected rank. In W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, & N. Kando (Eds.), Proceedings of SIGIR ’07: The 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 713–714). Amsterdam, The Netherlands. New York: ACM Press.
Wu, H. C., Luk, R. W. P., Wong, K. F., & Kwok, K. L. (in press). Interpreting TF-IDF term weights as making relevance decisions. To appear in ACM Transactions on Information Systems.
Zhai, C., & Lafferty, J. (2004). A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, 22(2), 179–214.
Zhai, C., & Lafferty, J. (2006). A risk minimization framework for information retrieval. Information Processing and Management, 42(1), 31–55.
Metadata
Title
On event space and rank equivalence between probabilistic retrieval models
Author
Robert W. P. Luk
Publication date
01-12-2008
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 6/2008
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-008-9062-z