Published in: Discover Computing 6/2008

1 December 2008

Probabilistic relevance ranking for collaborative filtering

Authors: Jun Wang, Stephen Robertson, Arjen P. de Vries, Marcel J. T. Reinders


Abstract

Collaborative filtering is concerned with recommending items to users. Most formulations of the problem are designed specifically for predicting user ratings and assume that past explicit ratings are available. In practice, however, we may have only implicit evidence of user preference; moreover, the task is better viewed as generating a top-N list of the items the user is most likely to like. We therefore argue that collaborative filtering can be cast directly as a relevance ranking problem. Starting from the classic Probability Ranking Principle of information retrieval, we propose a probabilistic item ranking framework. Within this framework we derive two ranking models, showing that, despite their common origin, different factorizations reflect two distinct ways of approaching item ranking. For model estimation, we restrict our discussion to implicit user preference data and adopt an approximation method introduced in a classic text retrieval model (the Okapi BM25 formula) to decouple frequency counts from presence/absence counts in the preference data. We further extend the basic formula by using Bayesian inference to estimate the probabilities of relevance and non-relevance, which largely alleviates the data sparsity problem. Besides this theoretical contribution, our experiments on real data sets show that the proposed methods perform significantly better than other strong baselines.
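The abstract names two estimation ingredients borrowed from text retrieval: BM25-style saturation of frequency counts, kept separate from presence/absence evidence, and Bayesian estimation of the relevance and non-relevance probabilities. The sketch below illustrates only these generic ingredients; it is not the authors' item ranking model, whose exact formulas are not reproduced on this page. It assumes a Beta(alpha, beta) prior on each presence probability, and the function names (`beta_posterior_mean`, `relevance_weight`, `saturate`, `score`) and the default `k1 = 1.2` are illustrative choices. With alpha = beta = 0.5, the log-odds of the posterior means reproduce the familiar 0.5-corrected Robertson/Sparck Jones weight, and with no relevance information (R = r = 0) the weight reduces to the BM25-style IDF term log((N - n + 0.5) / (n + 0.5)).

```python
import math


def beta_posterior_mean(successes: int, trials: int,
                        alpha: float = 0.5, beta: float = 0.5) -> float:
    """Posterior mean of a Bernoulli 'presence' probability under a Beta(alpha, beta) prior.

    Assumption: alpha = beta = 0.5 is an illustrative default, not a value from the paper.
    """
    return (successes + alpha) / (trials + alpha + beta)


def log_odds(p: float) -> float:
    """Natural-log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def relevance_weight(r: int, R: int, n: int, N: int,
                     alpha: float = 0.5, beta: float = 0.5) -> float:
    """Presence/absence weight: log-odds under relevance minus log-odds under non-relevance.

    r: relevant examples in which the feature is present
    R: all relevant examples
    n: all examples in which the feature is present
    N: all examples
    """
    p = beta_posterior_mean(r, R, alpha, beta)          # estimate of P(present | relevant)
    q = beta_posterior_mean(n - r, N - R, alpha, beta)  # estimate of P(present | non-relevant)
    return log_odds(p) - log_odds(q)


def saturate(tf: float, k1: float = 1.2) -> float:
    """BM25-style saturation: 0 at tf = 0, 1 at tf = 1, approaching k1 + 1 as tf grows."""
    return tf * (k1 + 1.0) / (tf + k1)


def score(freqs: dict[str, float], weights: dict[str, float], k1: float = 1.2) -> float:
    """Sum over features of (saturated frequency) x (presence/absence weight).

    Features missing from `weights` contribute nothing (hypothetical convention).
    """
    return sum(saturate(tf, k1) * weights.get(f, 0.0) for f, tf in freqs.items())


if __name__ == "__main__":
    # Toy counts, made up purely for illustration.
    weights = {
        "a": relevance_weight(r=3, R=4, n=10, N=100),
        "b": relevance_weight(r=0, R=4, n=50, N=100),
    }
    print(score({"a": 5.0, "b": 1.0}, weights))
```

The frequency count enters only through `saturate`, while presence/absence counts enter only through `relevance_weight`, which is one way to read "decoupling" the two kinds of evidence. In a collaborative filtering setting one might, for example, treat items in a user's profile as the features and implicit counts such as play counts as the frequencies; that mapping is a hypothetical illustration rather than the paper's derivation.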

Footnotes
1
The underlying modeling assumption could be made weaker, and thus more plausible, by adopting Cooper's linked dependence assumption instead of conditional independence (Cooper 1995).
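The distinction drawn in this footnote can be written out explicitly. In the notation below, which is ours rather than the paper's, x = (x_1, ..., x_M) is the observed evidence about an item and r and r̄ denote relevance and non-relevance. Conditional independence makes two separate factorization assumptions, whereas Cooper's linked dependence assumes only that their ratio factorizes:

```latex
% Conditional independence: two separate assumptions
P(\mathbf{x} \mid r) = \prod_{i=1}^{M} P(x_i \mid r), \qquad
P(\mathbf{x} \mid \bar{r}) = \prod_{i=1}^{M} P(x_i \mid \bar{r})

% Linked dependence: one assumption on the ratio only
\frac{P(\mathbf{x} \mid r)}{P(\mathbf{x} \mid \bar{r})}
  = \prod_{i=1}^{M} \frac{P(x_i \mid r)}{P(x_i \mid \bar{r})}
```

Dividing the first pair of equations yields the second, so conditional independence implies linked dependence but not conversely. Both give the same rank-equivalent scoring function, the sum over i of log P(x_i | r) / P(x_i | r̄), which is why the weaker assumption is enough to justify it.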
 
References
Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison Wesley.
Belkin, N. J., & Croft, W. B. (1992). Information filtering and information retrieval: Two sides of the same coin? Communications of the ACM, 35(12), 29–38.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Breese, J., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98) (pp. 43–52). San Francisco, CA: Morgan Kaufmann.
Canny, J. (2002). Collaborative filtering with privacy via factor analysis. In SIGIR ’02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 238–245). New York, NY: ACM Press.
Claypool, M., Le, P., Wased, M., & Brown, D. (2001). Implicit interest indicators. In IUI ’01: Proceedings of the 6th International Conference on Intelligent User Interfaces (pp. 33–40). New York, NY: ACM.
Cooper, W. S. (1995). Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Transactions on Information Systems, 13(1), 100–111.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Deshpande, M., & Karypis, G. (2004). Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1), 143–177.
Eyheramendy, S., Lewis, D., & Madigan, D. (2003). On the naive Bayes model for text categorization. In Proceedings of Artificial Intelligence and Statistics.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis. Chapman and Hall.
Harter, S. (1975). A probabilistic approach to automatic keyword indexing. Journal of the American Society for Information Science, 26, 197–206, 280–289.
Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In SIGIR ’99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 230–237). New York, NY: ACM Press.
Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1), 89–115.
Hull, D. (1993). Using statistical testing in the evaluation of retrieval experiments. In SIGIR ’93: Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 329–338). New York, NY: ACM Press.
Jin, R., Si, L., & Zhai, C. (2006). A study of mixture models for collaborative filtering. Information Retrieval, 9(3), 357–382.
Jordan, M. (1999). Learning in graphical models. MIT Press.
Lafferty, J., & Zhai, C. (2003). Probabilistic relevance models based on document and query generation. In Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, 13, 1–10.
Marlin, B. (2004). Collaborative filtering: A machine learning perspective. Master’s thesis, Department of Computer Science, University of Toronto.
McLaughlin, M. R., & Herlocker, J. L. (2004). A collaborative filtering algorithm and evaluation metric that accurately model the user experience. In SIGIR ’04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 329–336). New York, NY: ACM Press.
Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory and model-based approach. In UAI ’00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (pp. 473–480). San Francisco, CA: Morgan Kaufmann.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In CSCW ’94: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (pp. 175–186). New York, NY: ACM Press.
Robertson, S. E. (1997). The probability ranking principle in IR. In Readings in information retrieval (pp. 281–286).
Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129–146.
Robertson, S. E., Maron, M. E., & Cooper, W. (1982). Probability of relevance: A unification of two competing models for document retrieval. Information Technology: Research and Development, 1(1), 1–21.
Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR ’94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 232–241). New York, NY: Springer-Verlag New York, Inc.
Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In WWW ’01: Proceedings of the 10th International Conference on World Wide Web (pp. 285–295). New York, NY: ACM Press.
Sparck Jones, K., Walker, S., & Robertson, S. E. (2000). A probabilistic model of information retrieval: Development and comparative experiments, part 1. Information Processing and Management, 36(6), 779–808.
Sparck Jones, K., Walker, S., & Robertson, S. E. (2000). A probabilistic model of information retrieval: Development and comparative experiments, part 2. Information Processing and Management, 36(6), 809–840.
Taylor, M. J., Zaragoza, H., & Robertson, S. E. (2003). Ranking classes: Finding similar authors. Technical Report, Microsoft Research, Cambridge.
van Rijsbergen, C. J. (1979). Information retrieval. London, UK: Butterworths.
Wang, J., de Vries, A. P., & Reinders, M. J. T. (2006). A user-item relevance model for log-based collaborative filtering. In Proceedings of ECIR ’06, London, UK (pp. 37–48). Berlin, Germany: Springer.
Wang, J., de Vries, A. P., & Reinders, M. J. T. (2008). Unified relevance models for rating prediction in collaborative filtering. ACM Transactions on Information Systems (TOIS) (to appear).
Wang, J., de Vries, A. P., & Reinders, M. J. T. (2006). Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In SIGIR ’06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 501–508). New York, NY: ACM Press.
Wang, J., Yang, J., Clements, M., de Vries, A. P., & Reinders, M. J. T. (2007). Personalized collaborative tagging. Technical Report, University College London.
Xue, G.-R., Lin, C., Yang, Q., Xi, W., Zeng, H.-J., Yu, Y., & Chen, Z. (2005). Scalable collaborative filtering using cluster-based smoothing. In SIGIR ’05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 114–121). New York, NY: ACM Press.
Zaragoza, H., Hiemstra, D., & Tipping, M. (2003). Bayesian extension to the language model for ad hoc information retrieval. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 4–9). New York, NY: ACM Press.
Zhai, C., & Lafferty, J. (2001). A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR ’01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 334–342). New York, NY: ACM Press.
Metadata
Title
Probabilistic relevance ranking for collaborative filtering
Authors
Jun Wang
Stephen Robertson
Arjen P. de Vries
Marcel J. T. Reinders
Publication date
1 December 2008
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 6/2008
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-008-9060-1
