ABSTRACT
To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these algorithms is the trade-off between replication and partitioning: increasing the partitioning level improves query completion time since more servers handle the query, but may incur non-negligible startup costs for each sub-query. Finding the right operating point and adapting to it can significantly improve performance and reduce costs.
We introduce Rendezvous On a Ring (ROAR), a novel distributed algorithm that enables on-the-fly re-configuration of the partitioning level. ROAR can add and remove servers without stopping the system, cope with server failures, and provide good load-balancing even with a heterogeneous server pool. We demonstrate these claims using a privacy-preserving search application built upon ROAR.
- Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/.Google Scholar
- Google Docs. http://docs.google.com/.Google Scholar
- The Google Search Query -- a technical look.Google Scholar
- http://www.webmasterworld.com/google/3694079.htm.Google Scholar
- R. H. Arpaci--Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M. Hellerstein, D. Patterson, and K. Yelick. Cluster i/o with river: making the fast case common. In Proc. Workshop on I/O in parallel and distributed systems, 1999. Google ScholarDigital Library
- L. A. Barroso, J. Dean, and U. Holzle. Web search for a planet: The google cluster architecture. Micro, IEEE, 23, 2003. Google ScholarDigital Library
- J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny. Explicit control a batch-aware distributed file system. In NSDI, 2004. Google ScholarDigital Library
- A. R. Bharambe, M. Agrawal, and S. Seshan. Mercury: supporting scalable multi-attribute range queries. SIGCOMM Comput. Commun. Rev., 34(4), 2004. Google ScholarDigital Library
- M. Cutts. Gadgets, Google, and SEO: Explaining algorithm updates and data refreshes, Dec. 2006.Google Scholar
- J. Dean. Personal Communication. Google.Google Scholar
- J. Dean. Challenges in Building Large Scale Information Systems. Keynote Presentation at ACM WSDM, 2009. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. OSDI, 2004. Google ScholarDigital Library
- R. A. Ferreira, M. K. Ramanathan, A. Awan, A. Grama, and S. Jagannathan. Search with probabilistic guarantees in unstructured peer-to-peer networks. In Proc. P2P, 2005. Google ScholarDigital Library
- J. Gao and P. Steenkiste. Design and evaluation of a distributed scalable content discovery system. IEEE JSAC, 22, Jan. 2004. Google ScholarDigital Library
- S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29--43, New York, NY, USA, 2003. ACM Press. Google ScholarDigital Library
- G. Graefe and D. L. Davison. Encapsulation of parallelism and architecture-independence in extensible database query execution. IEEE Trans.Softw. Eng., 19(8), 1993. Google ScholarDigital Library
- B. Kroll and P. Widmayer. Distributing a search tree among a growing number of processors. SIGMOD Rec., 23(2), 1994. Google ScholarDigital Library
- M. Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094--1104, 2001. Google ScholarDigital Library
- N. Patel. Learning from google's data centers. http://www.pronetadvertising.com/, 2006.Google Scholar
- C. Raiciu and D. S. Rosenblum. Enabling confidentiality in content-based publish/subscribe infrastructures. In Proc. Securecomm, 2006.Google ScholarCross Ref
- I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable Peer-To-Peer lookup service for internet applications. In Proc. SIGCOMM, 2001. Google ScholarDigital Library
- M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: a wide-area distributed database system. The VLDB Journal, 5(1), 1996. Google ScholarDigital Library
- C. Tang, Z. Xu, and S. Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proc. Sigcomm, 2003. Google ScholarDigital Library
- C. Tang, Z. Xu, and M. Mahalingam. psearch: information retrieval in structured overlays. SIGCOMM Comput. Commun. Rev., 33(1), 2003. Google ScholarDigital Library
- W. W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A. Buchmann. Bit zipper Rendezvous Optimal data placement for general P2P queries. In Proc. EDBT Workshop on Peer-to-Peer Computing and DataBases, 2004.Google ScholarDigital Library
- W. W. Terpstra, J. Kangasharju, C. Leng, and A. P. Buchmann. Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. In Proc. SIGCOMM, 2007. Google ScholarDigital Library
- F. Tian and D. J. DeWitt. Tuple routing strategies for distributed eddies. In Proc. VLDB, 2003. Google ScholarDigital Library
Index Terms
- ROAR: increasing the flexibility and performance of distributed search
Recommendations
ROAR: increasing the flexibility and performance of distributed search
SIGCOMM '09To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these algorithms is the ...
Re-ranking search results using query logs
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge managementThis work addresses two common problems in search, frequently occurring with underspecified user queries: the top-ranked results for such queries may not contain documents relevant to the user's search intent, and fresh and relevant pages may not get ...
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database TheoryThe problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Comments