ROAR: increasing the flexibility and performance of distributed search

Authors:
Costin Raiciu

University College London, London, United Kingdom

University College London, London, United Kingdom
View Profile

,
Felipe Huici

NEC Europe Ltd., Heidelberg, Germany

NEC Europe Ltd., Heidelberg, Germany
View Profile

,
Mark Handley

University College London, London, United Kingdom

University College London, London, United Kingdom
View Profile

,
David S. Rosenblum

University College London, London, United Kingdom

University College London, London, United Kingdom
View Profile

SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communicationAugust 2009Pages 291–302https://doi.org/10.1145/1592568.1592603

Published:16 August 2009Publication History

SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication

Pages 291–302

ABSTRACT

To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these algorithms is the trade-off between replication and partitioning: increasing the partitioning level improves query completion time since more servers handle the query, but may incur non-negligible startup costs for each sub-query. Finding the right operating point and adapting to it can significantly improve performance and reduce costs.

We introduce Rendezvous On a Ring (ROAR), a novel distributed algorithm that enables on-the-fly re-configuration of the partitioning level. ROAR can add and remove servers without stopping the system, cope with server failures, and provide good load-balancing even with a heterogeneous server pool. We demonstrate these claims using a privacy-preserving search application built upon ROAR.

References

Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/.Google Scholar
Google Docs. http://docs.google.com/.Google Scholar
The Google Search Query -- a technical look.Google Scholar
http://www.webmasterworld.com/google/3694079.htm.Google Scholar
R. H. Arpaci--Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M. Hellerstein, D. Patterson, and K. Yelick. Cluster i/o with river: making the fast case common. In Proc. Workshop on I/O in parallel and distributed systems, 1999. Google ScholarDigital Library
L. A. Barroso, J. Dean, and U. Holzle. Web search for a planet: The google cluster architecture. Micro, IEEE, 23, 2003. Google ScholarDigital Library
J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny. Explicit control a batch-aware distributed file system. In NSDI, 2004. Google ScholarDigital Library
A. R. Bharambe, M. Agrawal, and S. Seshan. Mercury: supporting scalable multi-attribute range queries. SIGCOMM Comput. Commun. Rev., 34(4), 2004. Google ScholarDigital Library
M. Cutts. Gadgets, Google, and SEO: Explaining algorithm updates and data refreshes, Dec. 2006.Google Scholar
J. Dean. Personal Communication. Google.Google Scholar
J. Dean. Challenges in Building Large Scale Information Systems. Keynote Presentation at ACM WSDM, 2009. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. OSDI, 2004. Google ScholarDigital Library
R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, and S. Jagannathan. Search with probabilistic guarantees in unstructured peer-to-peer networks. In Proc. P2P, 2005. Google ScholarDigital Library
J. Gao and P. Steenkiste. Design and evaluation of a distributed scalable content discovery system. IEEE JSAC, 22, Jan. 2004. Google ScholarDigital Library
S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29--43, New York, NY, USA, 2003. ACM Press. Google ScholarDigital Library
G. Graefe and D. L. Davison. Encapsulation of parallelism and architecture-independence in extensible database query execution. IEEE Trans.Softw. Eng., 19(8), 1993. Google ScholarDigital Library
B. Kroll and P. Widmayer. Distributing a search tree among a growing number of processors. SIGMOD Rec., 23(2), 1994. Google ScholarDigital Library
M. Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094--1104, 2001. Google ScholarDigital Library
N. Patel. Learning from google's data centers. http://www.pronetadvertising.com/, 2006.Google Scholar
C. Raiciu and D. S. Rosenblum. Enabling confidentiality in content-based publish/subscribe infrastructures. In Proc. Securecomm, 2006.Google ScholarCross Ref
I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable Peer-To-Peer lookup service for internet applications. In Proc. SIGCOMM, 2001. Google ScholarDigital Library
M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: a wide-area distributed database system. The VLDB Journal, 5(1), 1996. Google ScholarDigital Library
C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proc. Sigcomm, 2003. Google ScholarDigital Library
C. Tang, Z. Xu, and M. Mahalingam. psearch: information retrieval in structured overlays. SIGCOMM Comput. Commun. Rev., 33(1), 2003. Google ScholarDigital Library
W. W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A. Buchmann. Bit zipper Rendezvous Optimal data placement for general P2P queries. In Proc. EDBT Workshop on Peer-to-Peer Computing and DataBases, 2004.Google ScholarDigital Library
W. W. Terpstra, J. Kangasharju, C. Leng, and A. P. Buchmann. Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In Proc. SIGCOMM, 2007. Google ScholarDigital Library
F. Tian and D. J. DeWitt. Tuple routing strategies for distributed eddies. In Proc. VLDB, 2003. Google ScholarDigital Library

Index Terms

ROAR: increasing the flexibility and performance of distributed search
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Recommendations

ROAR: increasing the flexibility and performance of distributed search
SIGCOMM '09

To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these algorithms is the ...
Read More
Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management

This work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
Read More
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database Theory

The problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication
August 2009
340 pages
ISBN:9781605585949
DOI:10.1145/1592568
General Chairs:
Pablo Rodriguez
Telefonica Research, Spain
,
Ernst Biersack
Eurecom, France
,
Program Chairs:
Konstantina Papagiannaki
Intel Labs Pittsburgh, USA
,
Luigi Rizzo
Università di Pisa, Italy
ACM SIGCOMM Computer Communication Review Volume 39, Issue 4
SIGCOMM '09
October 2009
325 pages
ISSN:0146-4833
DOI:10.1145/1594977
Issue’s Table of Contents
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 August 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate554of3,547submissions,16%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 521
  Total Downloads
- Downloads (Last 12 months)35
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

ROAR: increasing the flexibility and performance of distributed search

SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication

ABSTRACT

References

Cited By

Index Terms

Recommendations

ROAR: increasing the flexibility and performance of distributed search

Re-ranking search results using query logs

Equivalence and minimization of conjunctive queries under combined semantics