Abstract
This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and regards phrases as additional independent evidence. Under that view it is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases that may contain one or more query terms. Second, these phrases are not scored independently but are instead treated as providing a context for the component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in the context phrase and how compact they are. Third, we replace term frequency by the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at top ranks.
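The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the grouping rule or the contribution formula, so everything specific here is an assumption made for illustration: the `max_gap` grouping rule, the `alpha` and `beta` exponents, the form of `span_contribution`, and all function names are hypothetical and should not be read as the paper's actual method. Only the overall shape follows the abstract: group query-term occurrences into non-overlapping spans, give each occurrence a contribution that grows with the number of query terms in its span and with their compactness, and use the accumulated contribution in place of term frequency in a standard BM25 term weight.

```python
import math
from collections import defaultdict


def group_spans(hits, max_gap=3):
    """Group (position, term) hits into non-overlapping spans.

    Assumed rule (not from the paper): start a new span when the gap to the
    previous hit exceeds max_gap, or when the term is already in the span.
    """
    spans, current = [], []
    for pos, term in hits:
        seen = {t for _, t in current}
        if current and (pos - current[-1][0] > max_gap or term in seen):
            spans.append(current)
            current = []
        current.append((pos, term))
    if current:
        spans.append(current)
    return spans


def span_contribution(span, alpha=1.0, beta=1.0):
    """Assumed contribution: more query terms and a tighter span score higher.

    A span holding a single term yields exactly 1.0, so a document without
    any multi-term span falls back to plain term frequency.
    """
    n_terms = len(span)  # terms are distinct within a span by construction
    width = span[-1][0] - span[0][0] + 1
    return (n_terms ** alpha) * (n_terms / width) ** beta


def proximity_bm25(doc_tokens, query_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score with term frequency replaced by accumulated contribution."""
    query = set(query_terms)
    hits = [(i, tok) for i, tok in enumerate(doc_tokens) if tok in query]

    accumulated = defaultdict(float)
    for span in group_spans(hits):
        c = span_contribution(span)
        for _, term in span:
            accumulated[term] += c

    norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
    score = 0.0
    for term, value in accumulated.items():
        # Robertson/Sparck Jones style inverse document frequency weight.
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * (k1 + 1) * value / (value + norm)
    return score
```

Under these assumptions, two documents of equal length that each contain both query terms once receive different scores depending on whether the terms are adjacent or far apart, while a document whose query terms never share a span is scored exactly as plain BM25 would score it.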
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, R., Taylor, M.J., Wen, JR., Hon, HW., Yu, Y. (2008). Viewing Term Proximity from a Different Perspective. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_32
DOI: https://doi.org/10.1007/978-3-540-78646-7_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science (R0)