Published in: Discover Computing 5/2010

01-10-2010 | S.I.: Focused Retrieval and Result Aggr.

Statistical query expansion for sentence retrieval and its effects on weak and strong queries

Author: David E. Losada


Abstract

The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem is especially severe because of the fine granularity of the task. Short queries, which are the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we thoroughly study the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments we evaluate different term selection strategies and provide empirical evidence that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences), and no comparative results are available for the two types of expansion. Furthermore, this comparison is particularly valuable because of its important implications for time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.
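
The difference between the two expansion points described in the abstract can be illustrated with a minimal Python sketch. It is not the article's method: the function names (`tokenize`, `expand_query`), the stopword handling, and the term selection rule (how many feedback units contain a term) are assumptions chosen for brevity, standing in for the term selection strategies the article evaluates. Passing the retrieved documents as feedback corresponds to expansion before sentence retrieval; passing the top-ranked sentences would correspond to expansion after it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query_terms, feedback_texts, num_terms=3, stopwords=frozenset()):
    """Append the num_terms most widespread feedback terms to the query.

    Hypothetical selection rule: rank candidate terms by the number of
    feedback units (documents or sentences) that contain them.
    """
    counts = Counter()
    for text in feedback_texts:
        counts.update(set(tokenize(text)))  # count each term once per unit
    for term in list(query_terms) + list(stopwords):
        counts.pop(term, None)  # never re-add query terms or stopwords
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:num_terms]]


if __name__ == "__main__":
    retrieved_docs = [
        "Solar panels convert sunlight into electricity.",
        "Solar electricity costs fell as panels improved.",
        "Wind turbines also generate electricity.",
    ]
    # Expansion BEFORE sentence retrieval: mine terms from the documents.
    print(expand_query(["solar"], retrieved_docs, num_terms=2,
                       stopwords={"into", "as", "also"}))
```

Expanding from the documents needs only one sentence ranking pass, whereas expanding from ranked sentences requires an initial sentence retrieval run before the expanded query can be issued, which is the efficiency point the abstract raises.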

Footnotes
1
Note that we focus here on sentence retrieval and are not interested in novelty detection. In the literature, some linguistic-based approaches have performed well for removing redundant sentences, but they were not state-of-the-art techniques for sentence retrieval.
 
2
isf stands for inverse sentence frequency.
 
3
Adapted to the sentence retrieval case (e.g. idf is substituted by isf).
 
4
Throughout this work, we restrict our attention to traditional expansion methods that can be naturally applied to a vector-space model such as tf/isf. Therefore, our conclusions with respect to the effect of expansion should be strictly understood in the context of tf/isf with PRF/LCA. More recent and formal expansion techniques, such as Relevance Models (Lavrenko and Croft 2001), have not been considered.
 
5
The complete list of topics chosen for the novelty track can be found in Harman (2002).
 
6
Throughout this work, we applied two different significance tests, the t-test and the Wilcoxon test, and we mark a difference with an asterisk only when both tests agree that it is significant (95% confidence level).
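
As an illustration of the isf notion in footnotes 2 and 3, the following Python sketch scores a sentence against a query with a tf/isf-style weight. The exact formula is not given in the text above, so the smoothed form used here, log((n + 1) / (0.5 + sf)) combined with log-dampened term frequencies, is an assumption, as are the function names `tokenize`, `isf` and `tf_isf_score`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def isf(term, sentence_token_sets):
    """Inverse sentence frequency: idf computed over sentences, not documents.

    Assumed smoothed form: log((n + 1) / (0.5 + sf)), where n is the number
    of sentences and sf the number of sentences containing the term.
    """
    n = len(sentence_token_sets)
    sf = sum(1 for tokens in sentence_token_sets if term in tokens)
    return math.log((n + 1) / (0.5 + sf))


def tf_isf_score(query, sentence, sentence_token_sets):
    """Sum, over query terms, of dampened query tf * sentence tf * isf."""
    q_tf = Counter(tokenize(query))
    s_tf = Counter(tokenize(sentence))
    return sum(
        math.log(q_tf[t] + 1) * math.log(s_tf[t] + 1) * isf(t, sentence_token_sets)
        for t in q_tf
    )


if __name__ == "__main__":
    sentences = ["the cat sat", "the dog ran", "the cat chased the dog"]
    token_sets = [set(tokenize(s)) for s in sentences]
    for s in sentences:
        print(s, tf_isf_score("cat", s, token_sets))
```

A term that occurs in fewer sentences receives a larger isf, so rare query terms contribute more to the score than terms that appear in nearly every sentence.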
 
Literature
Allan, J., Wade, C., & Bolivar, A. (2003). Retrieval and novelty detection at the sentence level. In Proceedings of SIGIR-03, the 26th ACM conference on research and development in information retrieval (pp. 314–321). Toronto, Canada: ACM press.
Attar, R., & Fraenkel, A. (1977). Local feedback in full-text retrieval systems. Journal of the Association for Computing Machinery, 24(3), 397–417.
Buckley, C., Singhal, A., Mitra, M., & Salton, G. (1996). New retrieval approaches using SMART: TREC 4. In D. Harman (Ed.), Proceedings of TREC-4 (pp. 25–48).
Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). Predicting query performance. In Proceedings of SIGIR-2002, the 25th ACM conference on research and development in information retrieval (pp. 299–306). Tampere, Finland.
Doi, T., Yamamoto, H., & Sumita, E. (2005). Example-based machine translation using efficient sentence retrieval based on edit-distance. ACM Transactions on Asian Language Information Processing, 4(4), 377–399.
Hauff, C., Azzopardi, L., & Hiemstra, D. (2009). The combination and evaluation of query performance prediction methods. In Proceedings of the 31st European conference on information retrieval research, ECIR-09 (pp. 301–312).
Kirsh, D. (2000). A few thoughts on cognitive overload. Intellectia, 30, 19–51.
Lavrenko, V., & Croft, W. B. (2001). Relevance-based language models. In Proceedings of 24th ACM conference on research and development in information retrieval, SIGIR’01 (pp. 120–127). New Orleans, USA.
Li, X., & Croft, B. (2005). Novelty detection based on sentence level patterns. In Proceedings of CIKM-2005, the ACM conference on information and knowledge management (pp. 314–321).
Losada, D., & Fernández, R. T. (2007). Highly frequent terms and sentence retrieval. In Proceedings of 14th string processing and information retrieval symposium, SPIRE’07. Santiago de Chile.
Murdock, V. (2006). Aspects of sentence retrieval. Ph.D. thesis, University of Massachusetts.
Nobata, C., & Sekine, S. (1999). Towards automatic acquisition of patterns for information extraction. In Proceedings of international conference of computer processing of oriental languages (pp. 11–16).
Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1995). Okapi at TREC-3. In D. Harman (Ed.), Proceedings of TREC-3, the 3rd text retrieval conference (pp. 109–127). NIST.
Tombros, A., & Sanderson, M. (1998). Advantages of query biased summaries in information retrieval. In Proceedings of SIGIR-98, the 21st ACM international conference on research and development in information retrieval (pp. 2–10). ACM press.
Voorhees, E., & Harman, D. (Eds.). (2005). The TREC adhoc experiments. In TREC: Experiment and evaluation in information retrieval (pp. 79–97). Cambridge: The MIT press.
White, R., Jose, J., & Ruthven, I. (2005). Using top-ranking sentences to facilitate effective information access. Journal of the American Society for Information Science and Technology (JASIST), 56(10), 1113–1125.
Xu, J., & Croft, B. (1996). Query expansion using local and global document analysis. In Proceedings of SIGIR-96, the 19th ACM conference on research and development in information retrieval (pp. 4–11). Zurich, Switzerland.
Xu, J., & Croft, B. (2000). Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems, 18(1), 79–112.
Metadata
Title
Statistical query expansion for sentence retrieval and its effects on weak and strong queries
Author
David E. Losada
Publication date
01-10-2010
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 5/2010
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-009-9122-z
