Published in: Discover Computing 6/2009

01-12-2009 | Reliable Information Access Workshop

On the number of terms used in automatic query expansion

Authors: Paul Ogilvie, Ellen Voorhees, Jamie Callan


Abstract

This paper investigates the number of expansion terms to use in automatic query expansion by examining the behavior of eight retrieval systems participating in the NRRC Reliable Information Access Workshop. The results demonstrate that current systems are able to obtain nearly all of the benefit of using a fixed number of expansion terms per topic, but significant additional improvement would be possible if systems could accurately select the best number of expansion terms on a per-topic basis. When optimizing average effectiveness as measured by mean average precision, using a fixed number of terms increases the score by a large amount for a small number of topics but has little effect for most topics. The analysis further suggests that when a topic is helped by automatic feedback, the increase comes from a set of terms that reinforce each other rather than from the system finding a single excellent term.
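
The contrast the abstract draws, between one fixed number of expansion terms for all topics and the best number chosen per topic, can be illustrated with a minimal sketch. The average-precision values, topic names, and function names below are all hypothetical and invented for illustration; they are not results or code from the paper.

```python
from statistics import mean

# Hypothetical average-precision (AP) scores, invented for illustration only.
# ap[topic][k] is the AP for that topic when the query is expanded with k terms.
ap = {
    "topic_a": {0: 0.20, 10: 0.45, 20: 0.50, 50: 0.40},
    "topic_b": {0: 0.30, 10: 0.31, 20: 0.30, 50: 0.28},
    "topic_c": {0: 0.25, 10: 0.22, 20: 0.24, 50: 0.35},
}


def map_fixed(ap, k):
    """Mean average precision when every topic uses the same k."""
    return mean(scores[k] for scores in ap.values())


def best_fixed_k(ap):
    """The single k that maximizes mean average precision over all topics."""
    candidate_ks = next(iter(ap.values())).keys()
    return max(candidate_ks, key=lambda k: map_fixed(ap, k))


def map_oracle(ap):
    """Mean average precision when each topic uses its own best k (an upper bound)."""
    return mean(max(scores.values()) for scores in ap.values())


k = best_fixed_k(ap)
print(f"best fixed k = {k}, MAP = {map_fixed(ap, k):.4f}")
print(f"per-topic oracle MAP = {map_oracle(ap):.4f}")
```

In this toy data the best fixed setting helps one topic a great deal and barely moves the others, while the per-topic oracle scores higher still. The oracle needs relevance judgments to pick each topic's k, so it is only an upper bound on what a system could achieve, not a usable selection method.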

Footnotes
1
The Lemur Toolkit is publicly available for download at http://www.lemurproject.org/.
 
Metadata
Title: On the number of terms used in automatic query expansion
Authors: Paul Ogilvie, Ellen Voorhees, Jamie Callan
Publication date: 01-12-2009
Publisher: Springer Netherlands
Published in: Discover Computing, Issue 6/2009
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-009-9104-1
