Published in: Discover Computing 6/2009

1 December 2009 | Reliable Information Access Workshop

On the number of terms used in automatic query expansion

Authors: Paul Ogilvie, Ellen Voorhees, Jamie Callan

Abstract

This paper investigates the number of expansion terms to use in automatic query expansion by examining the behavior of eight retrieval systems participating in the NRRC Reliable Information Access Workshop. The results demonstrate that current systems are able to obtain nearly all of the benefit of using a fixed number of expansion terms per topic, but significant additional improvement is possible if systems were able to accurately select the best number of expansion terms on a per topic basis. When optimizing average effectiveness as measured by mean average precision, using a fixed number of terms increases the score a large amount for a small number of topics but has little effect for most topics. The analysis further suggests that when a topic is helped by automatic feedback, the increase is from a set of terms that reinforce each other rather than from the system finding a single excellent term.
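The contrast the abstract draws, between one fixed number of expansion terms for all topics and the best number chosen per topic, can be illustrated with a minimal Python sketch. The average precision values below are invented for illustration and are not results from the paper; the function names (`map_for_fixed_k`, `best_fixed_k`, `oracle_map`) are hypothetical helpers, not part of any retrieval system discussed in the workshop.

```python
# Hypothetical average precision (AP) per topic, keyed by the number of
# expansion terms (0 = no expansion). These numbers are made up.
ap_by_topic = {
    "t1": {0: 0.20, 5: 0.22, 20: 0.45, 50: 0.40},
    "t2": {0: 0.30, 5: 0.31, 20: 0.30, 50: 0.28},
    "t3": {0: 0.10, 5: 0.12, 20: 0.11, 50: 0.15},
    "t4": {0: 0.50, 5: 0.48, 20: 0.49, 50: 0.47},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def map_for_fixed_k(ap, k):
    """Mean average precision when every topic uses the same k terms."""
    return mean(scores[k] for scores in ap.values())


def best_fixed_k(ap):
    """The single k that maximizes MAP over all topics."""
    candidates = sorted(next(iter(ap.values())))
    return max(candidates, key=lambda k: map_for_fixed_k(ap, k))


def oracle_map(ap):
    """MAP when each topic uses its own best k (an upper bound)."""
    return mean(max(scores.values()) for scores in ap.values())


if __name__ == "__main__":
    k = best_fixed_k(ap_by_topic)
    print(f"no expansion MAP:   {map_for_fixed_k(ap_by_topic, 0):.4f}")
    print(f"best fixed k = {k}, MAP: {map_for_fixed_k(ap_by_topic, k):.4f}")
    print(f"per-topic oracle MAP: {oracle_map(ap_by_topic):.4f}")
```

In this toy data most of the gain from the fixed setting comes from a single topic (`t1`), while the per-topic oracle adds a further, smaller gain spread over the other topics. The oracle is an upper bound only: it assumes the best number of terms for each topic is known in advance.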

Footnotes
1. The Lemur Toolkit is publicly available for download at http://www.lemurproject.org/.
 
Metadata
Title
On the number of terms used in automatic query expansion
Authors
Paul Ogilvie
Ellen Voorhees
Jamie Callan
Publication date
1 December 2009
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 6/2009
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-009-9104-1
