DOI: 10.1145/1390334.1390419
research-article

Discovering key concepts in verbose queries

Published: 20 July 2008

ABSTRACT

Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency. Finally, we propose a probabilistic model for integrating the weighted key concepts identified by our method into a query, and demonstrate that this integration significantly improves retrieval effectiveness for a large set of natural language description queries derived from TREC topics on several newswire and web collections.
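
The abstract names inverse document frequency (IDF) as a standard weighting baseline for picking out key concepts. As a rough illustration only, the sketch below ranks the terms of a verbose query by a smoothed IDF over a toy corpus. The corpus, the function names `idf_weights` and `top_concepts`, the whitespace tokenization, and the add-one smoothing are all assumptions made for this example; it is not the method proposed in the paper, which combines query-dependent, corpus-dependent, and corpus-independent features.

```python
import math


def idf_weights(query_terms, documents):
    """Map each query term to a smoothed IDF: log((N + 1) / (df + 1)).

    The add-one smoothing is an assumption of this sketch; it keeps the
    weight finite for terms that occur in no document.
    """
    n_docs = len(documents)
    # Naive whitespace tokenization, lowercased (illustrative only).
    doc_sets = [set(doc.lower().split()) for doc in documents]
    weights = {}
    for term in query_terms:
        df = sum(1 for words in doc_sets if term in words)
        weights[term] = math.log((n_docs + 1) / (df + 1))
    return weights


def top_concepts(query, documents, k=2):
    """Return the k query terms with the highest IDF, keeping query order on ties."""
    terms = list(dict.fromkeys(query.lower().split()))  # de-duplicate, keep order
    weights = idf_weights(terms, documents)
    return sorted(terms, key=lambda t: weights[t], reverse=True)[:k]


# Hypothetical toy corpus, invented for this example.
docs = [
    "the spotted owl is a bird of the forest",
    "the logging of the forest continues",
    "a report of the economy",
    "the owl hunts at night",
]

weights = idf_weights(["the", "owl", "logging"], docs)
key_terms = top_concepts("the spotted owl of the forest", docs, k=2)
print(weights)
print(key_terms)
```

A term that appears in every document ("the") receives weight zero, while rarer terms rank higher. A term absent from the corpus would receive the highest weight of all, whether or not it matters to the information need, which hints at why IDF alone can be a weak signal for key concepts.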

    • Published in

      SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
      July 2008
      934 pages
      ISBN: 9781605581644
      DOI: 10.1145/1390334

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
