ABSTRACT
Sherpa is a large-scale distributed and globally replicated multi-tenant cloud data storage system. Sherpa scales by horizontally partitioning data into tablets and distributing these tablets across multiple servers. While Sherpa scales for increasing workload sizes by adding servers, it is vulnerable to load imbalance among tablets that cause hotspots to develop on just a few servers.
In this paper we describe Yak, the Sherpa load balancer. Yak detects hotspots and then automatically balances load by migrating tablets from the overloaded servers, and also by splitting data into new tablets. We describe Yak's design principles, algorithms and architecture. We then evaluate Yak on workloads based on Sherpa production scenarios.
- E. Anderson, J. Hall, J. D. Hartline, M. Hobbs, A. R. Karlin, J. Saia, R. Swaminathan, and J. Wilkes. An experimental study of data migration algorithms. In Proceedings of the 5th International Workshop on Algorithm Engineering, WAE '01, pages 145--158, London, UK, 2001. Springer-Verlag. Google ScholarDigital Library
- E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running circles around storage administration. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarDigital Library
- P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. Characterizing, modeling, and generating workload spikes for stateful services. In SoCC, pages 241--252, 2010. Google ScholarDigital Library
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, pages 15--15, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarDigital Library
- B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. PVLDB, 1(2):1277--1288, 2008. Google ScholarDigital Library
- B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
- G. P. Copeland, W. Alexander, E. E. Boughter, and T. W. Keller. Data placement in bubba. In SIGMOD, pages 99--108, 1988. Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP'07, pages 205--220, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
- P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-peer systems. In Proceedings of the Thirtieth international conference on Very Large Data Bases - Volume 30, VLDB '04, pages 444--455. VLDB Endowment, 2004. Google ScholarDigital Library
- J. Hall, J. Hartline, A. R. Karlin, J. Saia, and J. Wilkes. On algorithms for efficient data migration. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, SODA '01, pages 620--629, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics. Google ScholarDigital Library
- D. R. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pages 36--43, New York, NY, USA, 2004. ACM. Google ScholarDigital Library
- D. Kunkle and J. Schindler. A load balancing framework for clustered storage systems. In Proceedings of the 15th international conference on High Performance Computing, HiPC'08, pages 57--72, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarDigital Library
- A. Lakshman and P. Malik. Cassandra: structured storage system on a p2p network. In Proceedings of the 28th ACM symposium on Principles of distributed computing, PODC '09, pages 5--5, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
- C. Lu, G. A. Alvarez, and J. Wilkes. Aqueduct: Online data migration with performance guarantees. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarDigital Library
- B. Trushkowsky, P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. The scads director: scaling a distributed storage system under stringent performance requirements. In Proceedings of the 9th USENIX conference on File and stroage technologies, FAST'11, pages 12--12, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarDigital Library
- B. Urgaonkar, P. J. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning of multi-tier internet applications. In International Conference on Autonomic Computing (ICAC), pages 217--228, 2005. Google ScholarDigital Library
Index Terms
- The Yahoo!: cloud datastore load balancer
Recommendations
Managing elasticity across multiple cloud providers
MultiCloud '13: Proceedings of the 2013 international workshop on Multi-cloud applications and federated cloudsIn the context of cloud computing, elasticity is the capacity to scale computing resources up and down easily. Currently, most Platforms as a Service (PaaS) manage application elasticity within a single cloud provider. However, the not so infrequent ...
A survey on elasticity management in PaaS systems
Elasticity is a goal of cloud computing. An elastic system should manage in an autonomic way its resources, being adaptive to dynamic workloads, allocating additional resources when workload is increased and deallocating resources when workload ...
Randomized Hydrodynamic Load Balancing Approach
ICPPW '08: Proceedings of the 2008 International Conference on Parallel Processing - WorkshopsLoad balancing is performed to achieve the optimal use of the existing computational resources as much as possible whereby none of the resources remains idle while some other resources are being utilized. Balanced load distribution can be achieved by ...
Comments