research-article

Classification-based resource selection

Authors:
Jaime Arguello, Carnegie Mellon University, Pittsburgh, PA, USA
Jamie Callan, Carnegie Mellon University, Pittsburgh, PA, USA
Fernando Diaz, Yahoo!, Montreal, PQ, Canada

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management, November 2009, Pages 1277–1286
https://doi.org/10.1145/1645953.1646115

Published: 02 November 2009


ABSTRACT

In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs, for example, when searching a distributed index or aggregating content for web search. Resource selection refers to the subtask of deciding, given a query, which collections to search. Most existing resource selection methods rely on evidence found in collection content. We present an approach to resource selection that combines multiple sources of evidence to inform the selection decision. We derive evidence from three different sources: collection documents, the topic of the query, and query click-through data. We combine this evidence by treating resource selection as a multiclass machine learning problem. Although machine-learned approaches often require large amounts of manually generated training data, we present a method for using automatically generated training data. We make use of and compare against prior resource selection work and evaluate across three experimental testbeds.
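
As a rough illustration of the idea of combining several evidence sources into one selection decision, the sketch below scores each collection with a logistic function over three features and keeps the top-scoring collections. It is not the model from the paper: the feature names (`corpus`, `topic`, `click`), the collection names, the feature values, and the weights are all invented for this example, and the weights are set by hand rather than learned from training data.

```python
import math

# Hypothetical feature names standing in for the three evidence sources
# mentioned in the abstract (collection documents, query topic, click data).
FEATURES = ("corpus", "topic", "click")


def score(features, weights, bias=0.0):
    """Logistic score in (0, 1) for one collection's feature values."""
    z = bias + sum(weights[name] * features[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def select_collections(query_features, weights, k=2):
    """Rank collections by score (highest first) and return the top k names."""
    ranked = sorted(
        query_features,
        key=lambda name: score(query_features[name], weights),
        reverse=True,
    )
    return ranked[:k]


# Invented per-collection evidence for a single query.
query_features = {
    "news":   {"corpus": 0.9, "topic": 0.8, "click": 0.7},
    "sports": {"corpus": 0.2, "topic": 0.1, "click": 0.0},
    "travel": {"corpus": 0.5, "topic": 0.4, "click": 0.6},
}
# Hand-set weights (an assumption); a real system would learn these.
weights = {"corpus": 1.0, "topic": 2.0, "click": 1.5}

selected = select_collections(query_features, weights, k=2)
print(selected)
```

In a learned setting, the hand-set weights would be replaced by parameters fitted on labeled queries, which the abstract says can be generated automatically.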


Index Terms

Classification-based resource selection
1. Information systems
  1. Information retrieval

Published in
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009, 2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953
General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)
Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed information retrieval
federated search
query classification
resource selection