DOI: 10.1145/1645953.1646115
research-article

Classification-based resource selection

Published: 02 November 2009

ABSTRACT

In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs for example when searching a distributed index or aggregating content for web search. Resource selection refers to the subtask of deciding, given a query, which collections to search. Most existing resource selection methods rely on evidence found in collection content. We present an approach to resource selection that combines multiple sources of evidence to inform the selection decision. We derive evidence from three different sources: collection documents, the topic of the query, and query click-through data. We combine this evidence by treating resource selection as a multiclass machine learning problem. Although machine learned approaches often require large amounts of manually generated training data, we present a method for using automatically generated training data. We make use of and compare against prior resource selection work and evaluate across three experimental testbeds.
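The idea of combining several evidence sources in a learned classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's model: the three feature names (a collection-content score, a query-topic match, a click-through score), the toy training data, the collection names, and the choice of a plain logistic regression trained by stochastic gradient ascent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(model, x):
    """Predicted probability that a collection should be selected."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def train_logistic(examples, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with per-example gradient updates."""
    w, b = [0.0] * len(examples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = y - score((w, b), x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def select_collections(features_by_collection, model, k=1):
    """Return the k collections with the highest predicted probability."""
    ranked = sorted(features_by_collection,
                    key=lambda name: score(model, features_by_collection[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical training rows, one per (query, collection) pair:
# [content_score, topic_match, click_score]; label 1 = collection was relevant.
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.0],
     [0.1, 0.3, 0.2], [0.7, 0.2, 0.9], [0.3, 0.2, 0.1]]
y = [1, 1, 0, 0, 1, 0]
model = train_logistic(X, y)

# Hypothetical feature vectors for one new query against three collections.
candidates = {
    "news":   [0.85, 0.9, 0.8],
    "sports": [0.2, 0.1, 0.1],
    "images": [0.4, 0.3, 0.2],
}
top = select_collections(candidates, model, k=1)
```

In this sketch each evidence source contributes one number per collection, and the learned weights decide how much each source counts; a real system would use many features per source and labels obtained as the abstract describes.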

    • Published in

      CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
      November 2009
      2162 pages
      ISBN: 9781605585123
      DOI: 10.1145/1645953

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
