skip to main content
10.1145/1592568.1592603acmconferencesArticle/Chapter ViewAbstractPublication PagescommConference Proceedingsconference-collections
research-article
Free Access

ROAR: increasing the flexibility and performance of distributed search

Published:16 August 2009Publication History

ABSTRACT

To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these algorithms is the trade-off between replication and partitioning: increasing the partitioning level improves query completion time since more servers handle the query, but may incur non-negligible startup costs for each sub-query. Finding the right operating point and adapting to it can significantly improve performance and reduce costs.

We introduce Rendezvous On a Ring (ROAR), a novel distributed algorithm that enables on-the-fly re-configuration of the partitioning level. ROAR can add and remove servers without stopping the system, cope with server failures, and provide good load-balancing even with a heterogeneous server pool. We demonstrate these claims using a privacy-preserving search application built upon ROAR.

References

  1. Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/.Google ScholarGoogle Scholar
  2. Google Docs. http://docs.google.com/.Google ScholarGoogle Scholar
  3. The Google Search Query -- a technical look.Google ScholarGoogle Scholar
  4. http://www.webmasterworld.com/google/3694079.htm.Google ScholarGoogle Scholar
  5. R. H. Arpaci--Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M. Hellerstein, D. Patterson, and K. Yelick. Cluster i/o with river: making the fast case common. In Proc. Workshop on I/O in parallel and distributed systems, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. A. Barroso, J. Dean, and U. Holzle. Web search for a planet: The google cluster architecture. Micro, IEEE, 23, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny. Explicit control a batch-aware distributed file system. In NSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. R. Bharambe, M. Agrawal, and S. Seshan. Mercury: supporting scalable multi-attribute range queries. SIGCOMM Comput. Commun. Rev., 34(4), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Cutts. Gadgets, Google, and SEO: Explaining algorithm updates and data refreshes, Dec. 2006.Google ScholarGoogle Scholar
  10. J. Dean. Personal Communication. Google.Google ScholarGoogle Scholar
  11. J. Dean. Challenges in Building Large Scale Information Systems. Keynote Presentation at ACM WSDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. OSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, and S. Jagannathan. Search with probabilistic guarantees in unstructured peer-to-peer networks. In Proc. P2P, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Gao and P. Steenkiste. Design and evaluation of a distributed scalable content discovery system. IEEE JSAC, 22, Jan. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29--43, New York, NY, USA, 2003. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Graefe and D. L. Davison. Encapsulation of parallelism and architecture-independence in extensible database query execution. IEEE Trans.Softw. Eng., 19(8), 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Kroll and P. Widmayer. Distributing a search tree among a growing number of processors. SIGMOD Rec., 23(2), 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094--1104, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. N. Patel. Learning from google's data centers. http://www.pronetadvertising.com/, 2006.Google ScholarGoogle Scholar
  20. C. Raiciu and D. S. Rosenblum. Enabling confidentiality in content-based publish/subscribe infrastructures. In Proc. Securecomm, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  21. I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable Peer-To-Peer lookup service for internet applications. In Proc. SIGCOMM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: a wide-area distributed database system. The VLDB Journal, 5(1), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proc. Sigcomm, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. Tang, Z. Xu, and M. Mahalingam. psearch: information retrieval in structured overlays. SIGCOMM Comput. Commun. Rev., 33(1), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. W. W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A. Buchmann. Bit zipper Rendezvous Optimal data placement for general P2P queries. In Proc. EDBT Workshop on Peer-to-Peer Computing and DataBases, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. W. W. Terpstra, J. Kangasharju, C. Leng, and A. P. Buchmann. Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In Proc. SIGCOMM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. F. Tian and D. J. DeWitt. Tuple routing strategies for distributed eddies. In Proc. VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. ROAR: increasing the flexibility and performance of distributed search

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication
        August 2009
        340 pages
        ISBN:9781605585949
        DOI:10.1145/1592568
        • cover image ACM SIGCOMM Computer Communication Review
          ACM SIGCOMM Computer Communication Review  Volume 39, Issue 4
          SIGCOMM '09
          October 2009
          325 pages
          ISSN:0146-4833
          DOI:10.1145/1594977
          Issue’s Table of Contents

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 August 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate554of3,547submissions,16%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader