Skip to main content
Erschienen in: Discover Computing 1/2011

01.02.2011 | The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

An analysis of NP-completeness in novelty and diversity ranking

verfasst von: Ben Carterette

Erschienen in: Discover Computing | Ausgabe 1/2011

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A useful ability for search engines is to be able to rank objects with novelty and diversity: the top k documents retrieved should cover possible intents of a query with some distribution, or should contain a diverse set of subtopics related to the user’s information need, or contain nuggets of information with little redundancy. Evaluation measures have been introduced to measure the effectiveness of systems at this task, but these measures have worst-case NP-hard computation time. The primary consequence of this is that there is no ranking principle akin to the Probability Ranking Principle for document relevance that provides uniform instruction on how to rank documents for novelty and diversity. We use simulation to investigate the practical implications of this for optimization and evaluation of retrieval systems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Note that while Zhai et al. defined this quantity in terms of a recall value, we define it in terms of the number of subtopics. The definitions are functionally equivalent.
 
2
This example is derived from Wikipedia’s page on Set Cover (http://​en.​wikipedia.​org/​wiki/​Set_​cover_​problem).
 
3
A simple “hack” for this case might be to redefine S-precision and α-nDCG to have maximum values of 1, but this seems unfair to a system that uses alternatives to the greedy ranking.
 
4
The link to the data provided in this reference no longer works, but the data can be obtained by contacting the author of this work.
 
5
Though this is a relatively small data set, exhaustive search still took a very long time in the most extreme cases, even when parallelized across 64 cores.
 
6
We did not do an optimal ranking, since there are too many documents to be able to do exhaustive search over all subsets.
 
Literatur
Zurück zum Zitat Agrawal, R., Gollapudi, S., Halverson, H., & Ieong, S. (2009). Diversifying search results. In Proceedings of the 2nd ACM international conference on web search and data mining (pp. 5–14). Agrawal, R., Gollapudi, S., Halverson, H., & Ieong, S. (2009). Diversifying search results. In Proceedings of the 2nd ACM international conference on web search and data mining (pp. 5–14).
Zurück zum Zitat Allan, J., Carterette, B., & Lewis, J. (2005). When will information retrieval be ’good enough?’. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 433–440). Allan, J., Carterette, B., & Lewis, J. (2005). When will information retrieval be ’good enough?’. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 433–440).
Zurück zum Zitat Bogdanov, A., & Trevisan, L. (2006). Average-case complexity. Hanover, MA: Now Publishers Inc.MATH Bogdanov, A., & Trevisan, L. (2006). Average-case complexity. Hanover, MA: Now Publishers Inc.MATH
Zurück zum Zitat Carbonell, J. G., & Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval (pp. 335–336). Carbonell, J. G., & Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval (pp. 335–336).
Zurück zum Zitat Carterette, B., & Chandar, P. (2009). Probabilistic models of novel document rankings for faceted topic retrieval. In Proceedings of the 18th ACM international conference on information and knowledge management. Carterette, B., & Chandar, P. (2009). Probabilistic models of novel document rankings for faceted topic retrieval. In Proceedings of the 18th ACM international conference on information and knowledge management.
Zurück zum Zitat Chen, H., & Karger, D. R. (2006). Less is more: Probabilistic models for retrieving fewer relevant documents. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (pp. 429–436). Chen, H., & Karger, D. R. (2006). Less is more: Probabilistic models for retrieving fewer relevant documents. In Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (pp. 429–436).
Zurück zum Zitat Clarke, C. L., Craswell, N., & Soboroff, I. (2009a). Overview of the TREC 2009 web track. In Proceedings of the 18th text retrieval conference (TREC). Clarke, C. L., Craswell, N., & Soboroff, I. (2009a). Overview of the TREC 2009 web track. In Proceedings of the 18th text retrieval conference (TREC).
Zurück zum Zitat Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., et al. (2008). Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 659–666). Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., et al. (2008). Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 659–666).
Zurück zum Zitat Clarke, C. L., Kolla, M., & Vechtomova, O. (2009b). An effectiveness measure for ambiguous and underspecified queries. In Advances in information retrieval theory: Proceedings of the 2nd international conference on the theory of information retrieval (pp. 188–199). Clarke, C. L., Kolla, M., & Vechtomova, O. (2009b). An effectiveness measure for ambiguous and underspecified queries. In Advances in information retrieval theory: Proceedings of the 2nd international conference on the theory of information retrieval (pp. 188–199).
Zurück zum Zitat Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information and Systems, 20(4), 422–446.CrossRef Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information and Systems, 20(4), 422–446.CrossRef
Zurück zum Zitat Lenstra, J. K., Kan, A. H. G. R., & Brucker, P. (1977). Complexity of machine scheduling problems. In P. L. Hammer (Ed.), Studies in integer programming (Vol. 1). North Holland: Addison-Wesley. Lenstra, J. K., Kan, A. H. G. R., & Brucker, P. (1977). Complexity of machine scheduling problems. In P. L. Hammer (Ed.), Studies in integer programming (Vol. 1). North Holland: Addison-Wesley.
Zurück zum Zitat Li, P., Burges, C. J., & Wu, Q. (2008). McRank: Learning to rank using multiple classification and gradient boosting. In Advances in neural information processing systems (Vol. 20, pp. 897–904). Li, P., Burges, C. J., & Wu, Q. (2008). McRank: Learning to rank using multiple classification and gradient boosting. In Advances in neural information processing systems (Vol. 20, pp. 897–904).
Zurück zum Zitat Moffat, A., & Zobel, J. (2008). Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information and Systems, 27, 1–27.CrossRef Moffat, A., & Zobel, J. (2008). Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information and Systems, 27, 1–27.CrossRef
Zurück zum Zitat Radlinski, F., Kleinberg, R., & Joachims, T. (2008). Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th international conference on machine learning (pp. 784–791). Radlinski, F., Kleinberg, R., & Joachims, T. (2008). Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th international conference on machine learning (pp. 784–791).
Zurück zum Zitat Robertson, S. E. (1977). The probability ranking principle in information retrieval. Journal of Documentation, 33, 294–304.CrossRef Robertson, S. E. (1977). The probability ranking principle in information retrieval. Journal of Documentation, 33, 294–304.CrossRef
Zurück zum Zitat Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., & Amer-Yahia, S. (2008). Efficient computation of diverse query results. In Proceedings of the 24th international conference on data engineering (pp. 228–236). Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., & Amer-Yahia, S. (2008). Efficient computation of diverse query results. In Proceedings of the 24th international conference on data engineering (pp. 228–236).
Zurück zum Zitat Zaman, A., & Simberloff, D. (2002). Random binary matrices in biogeographical ecology—instituting a good neighbor policy. Environmental and Ecological Statistics, 9, 405–421.CrossRefMathSciNet Zaman, A., & Simberloff, D. (2002). Random binary matrices in biogeographical ecology—instituting a good neighbor policy. Environmental and Ecological Statistics, 9, 405–421.CrossRefMathSciNet
Zurück zum Zitat Zhai, C., Cohen, W. W., & Lafferty, J. D. (2003). Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 10–17). Zhai, C., Cohen, W. W., & Lafferty, J. D. (2003). Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 10–17).
Zurück zum Zitat Zipf, G. K. (1949). Human behavior and the principle of least-effort. Cambridge, MA: Addison-Wesley. Zipf, G. K. (1949). Human behavior and the principle of least-effort. Cambridge, MA: Addison-Wesley.
Metadaten
Titel
An analysis of NP-completeness in novelty and diversity ranking
verfasst von
Ben Carterette
Publikationsdatum
01.02.2011
Verlag
Springer Netherlands
Erschienen in
Discover Computing / Ausgabe 1/2011
Print ISSN: 2948-2984
Elektronische ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-010-9157-1

Weitere Artikel der Ausgabe 1/2011

Discover Computing 1/2011 Zur Ausgabe

The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

Retrieval constraints and word frequency distributions a log-logistic model for IR

The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

Introduction to special issue on the second international conference on the theory of information retrieval

The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

Specificity aboutness in XML retrieval

The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

Modeling score distributions in information retrieval

The Second International Conference on the Theory of Information Retrieval (ICTIR2009)

Variational bayes for modeling score distributions