
Unified utility maximization framework for resource selection

Authors:
Luo Si, Carnegie Mellon University, Pittsburgh, PA
Jamie Callan, Carnegie Mellon University, Pittsburgh, PA

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 2004, pages 32–41
https://doi.org/10.1145/1031171.1031180

Published: 13 November 2004

ABSTRACT

This paper presents a unified utility framework for resource selection in distributed text information retrieval. The framework provides an efficient and effective way to infer the probability of relevance of every document across the text databases. With these estimated relevance probabilities, resource selection can be performed by explicitly optimizing the goal of each application. Specifically, when used for database recommendation, the selection is optimized for high recall (including as many relevant documents as possible in the selected databases); when used for distributed document retrieval, the selection targets high precision (high precision in the final merged list of documents). This new model provides a more solid framework for distributed information retrieval. Empirical studies show that it is at least as effective as other state-of-the-art algorithms.
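The two selection goals can be illustrated with a minimal Python sketch. It assumes the per-document probabilities of relevance have already been estimated for every database (the abstract does not say how, so the numbers below are made up), a fixed number of databases `k` to select, and, for the high-precision case, a hypothetical per-database retrieval depth `n_doc`. All function names are illustrative, not the authors' notation or code. Here high recall is modeled as the expected number of relevant documents in a whole database, and high precision as the expected number of relevant documents among the top `n_doc` documents each selected database returns.

```python
from typing import Dict, List

# Hypothetical input: for each database, estimated P(rel | d) for its documents.
# How these probabilities are estimated is outside the scope of this sketch.
DbProbs = Dict[str, List[float]]


def recall_utility(probs: List[float]) -> float:
    """Expected number of relevant documents in the whole database."""
    return sum(probs)


def precision_utility(probs: List[float], n_doc: int) -> float:
    """Expected number of relevant documents among the top n_doc returned.

    Assumes the database returns documents in decreasing order of P(rel | d).
    """
    return sum(sorted(probs, reverse=True)[:n_doc])


def select_high_recall(db_probs: DbProbs, k: int) -> List[str]:
    """Keep the k databases with the largest recall utility."""
    return sorted(db_probs, key=lambda db: recall_utility(db_probs[db]), reverse=True)[:k]


def select_high_precision(db_probs: DbProbs, k: int, n_doc: int) -> List[str]:
    """Keep the k databases with the largest precision utility."""
    return sorted(
        db_probs,
        key=lambda db: precision_utility(db_probs[db], n_doc),
        reverse=True,
    )[:k]


# Toy probabilities (made up for illustration only).
db_probs: DbProbs = {
    "A": [0.3] * 10,  # many moderately relevant documents
    "B": [0.9, 0.8],  # a few highly relevant documents
    "C": [0.1] * 5,
}
print(select_high_recall(db_probs, k=1))             # ['A']
print(select_high_precision(db_probs, k=1, n_doc=2))  # ['B']
```

Because each utility in this sketch is a sum of independent per-database terms, ranking databases by their individual utility and keeping the top `k` maximizes the total under these assumptions. With the toy data, the recall-oriented goal prefers database A (many moderately relevant documents), while the precision-oriented goal prefers B (a few highly relevant ones).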


Index Terms

- Information systems
  - Information retrieval


Published in
CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN: 1581138741
DOI: 10.1145/1031171
General Chair: David Grossman, Illinois Institute of Technology
Program Chairs: Luis Gravano, Columbia University; ChengXiang Zhai, University of Illinois at Urbana-Champaign; Otthein Herzog, University of Bremen, Germany; David A. Evans, Clairvoyance Corporation
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed information retrieval
resource selection

Acceptance Rates
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
Article Metrics
- Total citations: 37
- Total downloads: 639
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 1