short-paper

Federated search in the wild: the combined power of over a hundred search engines

Authors:
Dong Nguyen, University of Twente, Enschede, Netherlands
Thomas Demeester, Ghent University, Ghent, Belgium
Dolf Trieschnigg, University of Twente, Enschede, Netherlands
Djoerd Hiemstra, University of Twente, Enschede, Netherlands

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 1874–1878
https://doi.org/10.1145/2396761.2398535

Published: 29 October 2012

ABSTRACT

Federated search has the potential to improve web search: the user becomes less dependent on a single search provider, and parts of the deep web become available through a unified interface, leading to a wider variety of retrieved search results. However, no publicly available dataset for federated search has reflected an actual web environment. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging, and size estimation of uncooperative resources.
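
To illustrate how per-engine relevance judgements on top-10 results could support resource selection experiments, the following is a minimal sketch and not the authors' method. It assumes binary judgements (1 = relevant, 0 = not relevant) and derives a reference engine ranking for one query by counting relevant results. The engine names, the toy data, and the function `rank_engines` are hypothetical; the actual dataset format and judgement scale may differ.

```python
from typing import Dict, List


def rank_engines(judgements: Dict[str, List[int]], k: int = 10) -> List[str]:
    """Rank engines by the number of relevant results in their top k.

    `judgements` maps a (hypothetical) engine name to binary relevance
    labels for its ranked results on a single query.
    """
    scores = {engine: sum(labels[:k]) for engine, labels in judgements.items()}
    # Sort by score (descending); break ties by name so the order is deterministic.
    return sorted(scores, key=lambda engine: (-scores[engine], engine))


# Toy data for one query; these values are invented for illustration only.
toy_judgements = {
    "engine_a": [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "engine_b": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "engine_c": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
}

reference_ranking = rank_engines(toy_judgements)
print(reference_ranking)
```

A resource selection method could then be scored by comparing the engines it selects against such a reference ranking, for example by checking how many of its selections fall among the top-ranked engines.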

Index Terms

1. Information systems
  1. Information retrieval

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen, Wayne State University, USA
Program Chairs: Guy Lebanon, Georgia Institute of Technology; Haixun Wang, Microsoft Research Asia; Mohammed J. Zaki, Rensselaer Polytechnic Institute
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dataset
distributed information retrieval
evaluation
federated search
test collection
web search
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%