Discover Computing

Publication date: 01-12-2006

Precision prediction based on ranked list coherence

Authors: Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft

Published in: Discover Computing | Issue 6/2006

Abstract

We introduce the clarity score, a statistical measure of the coherence of a list of documents. Starting with a document list ranked by the query-likelihood retrieval model, we demonstrate the score's relationship to query ambiguity with respect to the collection. We also show that the clarity score is correlated with the average precision of a query, and we lay the groundwork for useful predictions by discussing a method for setting decision thresholds automatically. We then extend the basic method to passage-based systems, showing that passage-based clarity scores correlate with average-precision measures of ranked lists of passages, where a passage is judged relevant if it contains correct answer text. Next, we introduce variants of the document-based clarity score that improve its robustness, applicability, and predictive ability. In particular, we introduce the ranked list clarity score, which can be computed from a ranked list of documents alone, and the weighted clarity score, in which query terms contribute more than other terms. Finally, building on these ideas, we present an approach to predicting which queries will perform poorly under query expansion.
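
The abstract does not state the formula. The following is a minimal sketch that assumes the document-based formulation from the authors' earlier SIGIR 2002 paper "Predicting query performance": the clarity score is the relative entropy (in bits) between a query language model, estimated from the ranked documents weighted by their query-likelihood posteriors, and the collection language model. The function name `clarity_score`, the token-list inputs, and the Jelinek-Mercer smoothing weight `lam=0.6` are illustrative assumptions, not the paper's code. The ranked list and weighted variants described above are not shown.

```python
import math
from collections import Counter


def clarity_score(query, ranked_docs, collection_docs, lam=0.6):
    """Hypothetical sketch: KL divergence (bits) between a query model
    built from ranked_docs and the collection model.

    query: list of query terms; ranked_docs / collection_docs: lists of
    token lists (ranked_docs are assumed to come from the collection).
    lam: assumed Jelinek-Mercer smoothing weight for document models.
    """
    # Collection model P_coll(w): maximum-likelihood estimate over all tokens.
    coll_counts = Counter(w for doc in collection_docs for w in doc)
    coll_total = sum(coll_counts.values())
    p_coll = {w: c / coll_total for w, c in coll_counts.items()}
    if any(q not in p_coll for q in query):
        raise ValueError("every query term must occur in the collection")

    # Smoothed document model P(w|D) = lam * P_ml(w|D) + (1 - lam) * P_coll(w).
    def doc_model(doc):
        counts, n = Counter(doc), len(doc)
        return lambda w: lam * counts[w] / n + (1 - lam) * p_coll[w]

    models = [doc_model(d) for d in ranked_docs]

    # Query likelihood log P(Q|D); with a uniform prior, P(D|Q) is proportional to P(Q|D).
    log_liks = [sum(math.log(m(q)) for q in query) for m in models]
    top = max(log_liks)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(ll - top) for ll in log_liks]
    z = sum(weights)
    posteriors = [wt / z for wt in weights]

    # Query model P(w|Q) = sum_D P(w|D) P(D|Q); accumulate KL(P(.|Q) || P_coll).
    score = 0.0
    for w, pc in p_coll.items():
        pwq = sum(p * m(w) for p, m in zip(posteriors, models))
        score += pwq * math.log2(pwq / pc)
    return score
```

On a toy collection, a list containing only the two documents about cats scores roughly 0.28 bits for the query `["cat"]`, while the whole mixed-topic collection scores roughly 0.10 bits. This matches the intuition that a coherent ranked list diverges further from the collection model:

```python
collection = [["cat", "dog", "cat"], ["stock", "market", "stock"],
              ["cat", "pet", "dog"], ["market", "trade", "stock"]]
focused = clarity_score(["cat"], [collection[0], collection[2]], collection)
mixed = clarity_score(["cat"], collection, collection)
```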

Freely available from http://www.cs.cmu.edu/~lemur/.

Both from the TREC 9 Query Track and designed for TREC topic 96.

With answer modeling turned off.

Computed analogously to the clarity score thresholds.

Title: Precision prediction based on ranked list coherence
Authors: Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft
Publication date: 01-12-2006
Publisher: Springer Netherlands
Published in: Discover Computing / Issue 6/2006
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-006-9006-4
