Viewing Term Proximity from a Different Perspective

  • Conference paper
Advances in Information Retrieval (ECIR 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Abstract

This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and regards phrases as additional independent evidence. It is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases, each of which may contain one or more query terms. Second, these phrases are not scored independently but are instead treated as providing a context for their component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in the context phrase and how compact they are. Third, term frequency is replaced by the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at the top ranks.
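The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the segmentation rule (`max_gap`), the contribution function and its exponents `x` and `y`, and all function names are assumptions made for illustration only. The BM25 weight uses a common non-negative IDF variant.

```python
import math
from collections import defaultdict


def segment_spans(doc_tokens, query_terms, max_gap=3):
    """Group query-term hits into non-overlapping spans (hypothetical rule:
    a new span starts when the gap to the previous hit exceeds max_gap)."""
    spans, current = [], []
    for pos, token in enumerate(doc_tokens):
        if token not in query_terms:
            continue
        if current and pos - current[-1][0] > max_gap:
            spans.append(current)
            current = []
        current.append((pos, token))
    if current:
        spans.append(current)
    return spans


def span_contribution(span, x=0.5, y=0.5):
    """Assumed contribution: grows with the number of distinct query terms
    in the span and with their density. An isolated hit contributes 1.0."""
    n_terms = len({term for _, term in span})
    width = span[-1][0] - span[0][0] + 1
    return (n_terms ** x) * ((n_terms / width) ** y)


def accumulated_contributions(doc_tokens, query_terms, max_gap=3):
    """Sum, per query term, the contribution of every span it occurs in."""
    rc = defaultdict(float)
    for span in segment_spans(doc_tokens, query_terms, max_gap):
        c = span_contribution(span)
        for _, term in span:
            rc[term] += c
    return dict(rc)


def bm25_term_score(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 weight of one term; pass an accumulated contribution as tf."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Toy document and collection statistics, invented for the example.
doc = ("term proximity helps ranking when the term appears "
       "far from other words about proximity").split()
query = {"term", "proximity"}
rc = accumulated_contributions(doc, query)
score = sum(bm25_term_score(rc[t], df=10, n_docs=1000,
                            doc_len=len(doc), avg_len=20.0) for t in rc)
```

In this toy document each query term occurs twice, once inside the adjacent pair "term proximity" and once in isolation. Under the assumed contribution function the accumulated value for each term is 1 + √2 ≈ 2.41 rather than the raw frequency of 2, so the document scores higher than it would under plain BM25, while a document with only isolated occurrences would score exactly as before.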

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, R., Taylor, M.J., Wen, J.-R., Hon, H.-W., Yu, Y. (2008). Viewing Term Proximity from a Different Perspective. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_32

  • DOI: https://doi.org/10.1007/978-3-540-78646-7_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78645-0

  • Online ISBN: 978-3-540-78646-7

  • eBook Packages: Computer Science, Computer Science (R0)
