skip to main content
10.1145/2390021.2390028acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

The Yahoo!: cloud datastore load balancer

Published:29 October 2012Publication History

ABSTRACT

Sherpa is a large-scale distributed and globally replicated multi-tenant cloud data storage system. Sherpa scales by horizontally partitioning data into tablets and distributing these tablets across multiple servers. While Sherpa scales for increasing workload sizes by adding servers, it is vulnerable to load imbalance among tablets that cause hotspots to develop on just a few servers.

In this paper we describe Yak, the Sherpa load balancer. Yak detects hotspots and then automatically balances load by migrating tablets from the overloaded servers, and also by splitting data into new tablets. We describe Yak's design principles, algorithms and architecture. We then evaluate Yak on workloads based on Sherpa production scenarios.

References

  1. E. Anderson, J. Hall, J. D. Hartline, M. Hobbs, A. R. Karlin, J. Saia, R. Swaminathan, and J. Wilkes. An experimental study of data migration algorithms. In Proceedings of the 5th International Workshop on Algorithm Engineering, WAE '01, pages 145--158, London, UK, 2001. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running circles around storage administration. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. Characterizing, modeling, and generating workload spikes for stateful services. In SoCC, pages 241--252, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, pages 15--15, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. PVLDB, 1(2):1277--1288, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. P. Copeland, W. Alexander, E. E. Boughter, and T. W. Keller. Data placement in bubba. In SIGMOD, pages 99--108, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP'07, pages 205--220, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-peer systems. In Proceedings of the Thirtieth international conference on Very Large Data Bases - Volume 30, VLDB '04, pages 444--455. VLDB Endowment, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Hall, J. Hartline, A. R. Karlin, J. Saia, and J. Wilkes. On algorithms for efficient data migration. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, SODA '01, pages 620--629, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. R. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pages 36--43, New York, NY, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Kunkle and J. Schindler. A load balancing framework for clustered storage systems. In Proceedings of the 15th international conference on High Performance Computing, HiPC'08, pages 57--72, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Lakshman and P. Malik. Cassandra: structured storage system on a p2p network. In Proceedings of the 28th ACM symposium on Principles of distributed computing, PODC '09, pages 5--5, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Lu, G. A. Alvarez, and J. Wilkes. Aqueduct: Online data migration with performance guarantees. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. Trushkowsky, P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. The scads director: scaling a distributed storage system under stringent performance requirements. In Proceedings of the 9th USENIX conference on File and stroage technologies, FAST'11, pages 12--12, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Urgaonkar, P. J. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning of multi-tier internet applications. In International Conference on Autonomic Computing (ICAC), pages 217--228, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The Yahoo!: cloud datastore load balancer

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                CloudDB '12: Proceedings of the fourth international workshop on Cloud data management
                October 2012
                74 pages
                ISBN:9781450317085
                DOI:10.1145/2390021

                Copyright © 2012 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 29 October 2012

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • research-article

                Acceptance Rates

                Overall Acceptance Rate12of17submissions,71%

                Upcoming Conference

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader