Published in: Discover Computing 6/2006

01-12-2006

Precision prediction based on ranked list coherence

Authors: Steve Cronen-Townsend, Yun Zhou, W. Bruce Croft


Abstract

We introduce a statistical measure of the coherence of a list of documents called the clarity score. Starting with a document list ranked by the query-likelihood retrieval model, we demonstrate the score's relationship to query ambiguity with respect to the collection. We also show that the clarity score is correlated with the average precision of a query and lay the groundwork for useful predictions by discussing a method of setting decision thresholds automatically. We then show that passage-based clarity scores correlate with average-precision measures of ranked lists of passages, where a passage is judged relevant if it contains correct answer text, which extends the basic method to passage-based systems. Next, we introduce variants of document-based clarity scores to improve the robustness, applicability, and predictive ability of clarity scores. In particular, we introduce the ranked list clarity score that can be computed with only a ranked list of documents, and the weighted clarity score where query terms contribute more than other terms. Finally, we show an approach to predicting queries that perform poorly on query expansion that uses techniques expanding on the ideas presented earlier.
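The abstract does not state the formula behind the clarity score. As a minimal sketch, assuming the definition used in the authors' earlier work (Cronen-Townsend, Zhou, & Croft, 2002), the score can be read as the relative entropy (KL divergence, in bits) between a query language model estimated from the ranked list and the collection language model. The function names, the smoothing weight `lam`, and the `doc_weights` argument (standing in for normalized query-likelihood weights) are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def clarity_score(ranked_docs, doc_weights, collection_model, lam=0.6):
    """KL divergence (bits) between a query model and the collection model.

    ranked_docs      -- list of token lists, one per retrieved document
    doc_weights      -- assumed stand-in for P(D|Q); should sum to 1
    collection_model -- dict mapping each vocabulary term to P(w | collection)
    lam              -- hypothetical linear-smoothing weight on the document model
    """
    query_model = Counter()
    for tokens, weight in zip(ranked_docs, doc_weights):
        doc_model = unigram_model(tokens)
        for w, p_coll in collection_model.items():
            # Smooth each document model with the collection model.
            p_doc = lam * doc_model.get(w, 0.0) + (1 - lam) * p_coll
            query_model[w] += weight * p_doc
    return sum(
        p * math.log2(p / collection_model[w])
        for w, p in query_model.items()
        if p > 0
    )


# Toy collection: two documents about cars, two about cats.
d1 = "jaguar car engine speed".split()
d2 = "jaguar cat jungle prey".split()
d3 = "car engine repair speed".split()
d4 = "cat jungle prey hunt".split()
collection_model = unigram_model(d1 + d2 + d3 + d4)

# A topically coherent list versus a list that mirrors the whole collection.
focused = clarity_score([d1, d3], [0.5, 0.5], collection_model)
mixed = clarity_score([d1, d2, d3, d4], [0.25] * 4, collection_model)
print(focused > mixed)
```

In this toy setup the list that mirrors the collection scores essentially zero, while the topically coherent list scores higher, which is the intuition behind using the score as a signal of query ambiguity.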


Footnotes
2 Both from the TREC 9 Query Track and designed for TREC topic 96.
3 With answer modeling turned off.
4 Computed analogously to the clarity score thresholds.
Metadata
Title
Precision prediction based on ranked list coherence
Authors
Steve Cronen-Townsend
Yun Zhou
W. Bruce Croft
Publication date
01-12-2006
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 6/2006
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-006-9006-4
