Published in: Discover Computing 4/2016

01.08.2016

Assessing efficiency–effectiveness tradeoffs in multi-stage retrieval systems without using relevance judgments

Authors: Charles L. A. Clarke, J. Shane Culpepper, Alistair Moffat

Abstract

Large-scale retrieval systems are often implemented as a cascading sequence of phases—a first filtering step, in which a large set of candidate documents are extracted using a simple technique such as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents, rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed based on the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.
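To make the cascade and the judgment-free quality idea concrete, here is a minimal sketch in Python. All names and the scoring function are illustrative assumptions; in particular, the simple top-k overlap used here is only a stand-in for the paper's discrepancy-based quality score, not its actual formulation.

```python
def boolean_filter(query_terms, inverted_index):
    """Stage 1: candidate set = documents containing every query term."""
    postings = [set(inverted_index.get(t, ())) for t in query_terms]
    return set.intersection(*postings) if postings else set()

def rank(candidates, score):
    """Stage 2: order the filtered pool with a (possibly expensive) scorer."""
    return sorted(candidates, key=score, reverse=True)

def overlap_at_k(filtered_run, full_run, k):
    """Judgment-free proxy: fraction of the full-evaluation top-k that
    survives filtering. No relevance judgments are consulted."""
    return len(set(filtered_run[:k]) & set(full_run[:k])) / k

# Toy example with a four-document collection.
index = {"cat": [1, 2, 3], "hat": [2, 3, 4]}
score = lambda d: -d  # hypothetical stand-in for a feature-based scorer
filtered = rank(boolean_filter(["cat", "hat"], index), score)
full = rank({1, 2, 3, 4}, score)
print(overlap_at_k(filtered, full, 2))  # → 0.5
```

A score near 1.0 suggests the filter rarely discards documents the full system would have ranked highly; queries with low scores are candidates for the per-query failure analysis the abstract describes.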


Metadata
Title: Assessing efficiency–effectiveness tradeoffs in multi-stage retrieval systems without using relevance judgments
Authors: Charles L. A. Clarke, J. Shane Culpepper, Alistair Moffat
Publication date: 01.08.2016
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 4/2016
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-016-9279-1