research-article

The Yahoo!: cloud datastore load balancer

Authors:
Markus Klems

Karlsruhe Institute of Technology, Karlsruhe, Germany

Karlsruhe Institute of Technology, Karlsruhe, Germany
View Profile

,
Adam Silberstein

LinkedIn, Mountain View, CA, USA

LinkedIn, Mountain View, CA, USA
View Profile

,
Jianjun Chen

Google, Moutain View, CA, USA

Google, Moutain View, CA, USA
View Profile

,
Masood Mortazavi

Yahoo! Inc., Sunnyvale, CA, USA

Yahoo! Inc., Sunnyvale, CA, USA
View Profile

,
Sahaya Andrews Albert

Yahoo! Inc., Sunnyvale, CA, USA

Yahoo! Inc., Sunnyvale, CA, USA
View Profile

,
P.P.S. Narayan

Yahoo! Inc., Sunnyvale, CA, USA

Yahoo! Inc., Sunnyvale, CA, USA
View Profile

,
Adwait Tumbde

Instart Logic

Instart Logic
View Profile

,
Brian Cooper

Google

Google
View Profile

CloudDB '12: Proceedings of the fourth international workshop on Cloud data managementOctober 2012Pages 33–40https://doi.org/10.1145/2390021.2390028

Published:29 October 2012Publication History

CloudDB '12: Proceedings of the fourth international workshop on Cloud data management

Pages 33–40

ABSTRACT

Sherpa is a large-scale distributed and globally replicated multi-tenant cloud data storage system. Sherpa scales by horizontally partitioning data into tablets and distributing these tablets across multiple servers. While Sherpa scales for increasing workload sizes by adding servers, it is vulnerable to load imbalance among tablets that cause hotspots to develop on just a few servers.

In this paper we describe Yak, the Sherpa load balancer. Yak detects hotspots and then automatically balances load by migrating tablets from the overloaded servers, and also by splitting data into new tablets. We describe Yak's design principles, algorithms and architecture. We then evaluate Yak on workloads based on Sherpa production scenarios.

References

E. Anderson, J. Hall, J. D. Hartline, M. Hobbs, A. R. Karlin, J. Saia, R. Swaminathan, and J. Wilkes. An experimental study of data migration algorithms. In Proceedings of the 5th International Workshop on Algorithm Engineering, WAE '01, pages 145--158, London, UK, 2001. Springer-Verlag. Google ScholarDigital Library
E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running circles around storage administration. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarDigital Library
P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. Characterizing, modeling, and generating workload spikes for stateful services. In SoCC, pages 241--252, 2010. Google ScholarDigital Library
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, pages 15--15, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarDigital Library
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. PVLDB, 1(2):1277--1288, 2008. Google ScholarDigital Library
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
G. P. Copeland, W. Alexander, E. E. Boughter, and T. W. Keller. Data placement in bubba. In SIGMOD, pages 99--108, 1988. Google ScholarDigital Library
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP'07, pages 205--220, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-peer systems. In Proceedings of the Thirtieth international conference on Very Large Data Bases - Volume 30, VLDB '04, pages 444--455. VLDB Endowment, 2004. Google ScholarDigital Library
J. Hall, J. Hartline, A. R. Karlin, J. Saia, and J. Wilkes. On algorithms for efficient data migration. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, SODA '01, pages 620--629, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics. Google ScholarDigital Library
D. R. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '04, pages 36--43, New York, NY, USA, 2004. ACM. Google ScholarDigital Library
D. Kunkle and J. Schindler. A load balancing framework for clustered storage systems. In Proceedings of the 15th international conference on High Performance Computing, HiPC'08, pages 57--72, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarDigital Library
A. Lakshman and P. Malik. Cassandra: structured storage system on a p2p network. In Proceedings of the 28th ACM symposium on Principles of distributed computing, PODC '09, pages 5--5, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
C. Lu, G. A. Alvarez, and J. Wilkes. Aqueduct: Online data migration with performance guarantees. In Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST '02, Berkeley, CA, USA, 2002. USENIX Association. Google ScholarDigital Library
B. Trushkowsky, P. Bodík, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson. The scads director: scaling a distributed storage system under stringent performance requirements. In Proceedings of the 9th USENIX conference on File and stroage technologies, FAST'11, pages 12--12, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarDigital Library
B. Urgaonkar, P. J. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning of multi-tier internet applications. In International Conference on Autonomic Computing (ICAC), pages 217--228, 2005. Google ScholarDigital Library

Index Terms

The Yahoo!: cloud datastore load balancer

Recommendations

Managing elasticity across multiple cloud providers
MultiCloud '13: Proceedings of the 2013 international workshop on Multi-cloud applications and federated clouds

In the context of cloud computing, elasticity is the capacity to scale computing resources up and down easily. Currently, most Platforms as a Service (PaaS) manage application elasticity within a single cloud provider. However, the not so infrequent ...
Read More
A survey on elasticity management in PaaS systems

Elasticity is a goal of cloud computing. An elastic system should manage in an autonomic way its resources, being adaptive to dynamic workloads, allocating additional resources when workload is increased and deallocating resources when workload ...
Read More
Randomized Hydrodynamic Load Balancing Approach
ICPPW '08: Proceedings of the 2008 International Conference on Parallel Processing - Workshops

Load balancing is performed to achieve the optimal use of the existing computational resources as much as possible whereby none of the resources remains idle while some other resources are being utilized. Balanced load distribution can be achieved by ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CloudDB '12: Proceedings of the fourth international workshop on Cloud data management
October 2012
74 pages
ISBN:9781450317085
DOI:10.1145/2390021
General Chair:
Xiaofeng Meng
Renmin University of China, China
,
Program Chairs:
Adam Silberstein
Linkedin, USA
,
Fusheng Wang
Emory University, USA
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 October 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
dynamic control
dynamic load balancing
elasticity
scalability
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate12of17submissions,71%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 14
  Total Citations
  View Citations
- 477
  Total Downloads
- Downloads (Last 12 months)11
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The Yahoo!: cloud datastore load balancer

CloudDB '12: Proceedings of the fourth international workshop on Cloud data management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Managing elasticity across multiple cloud providers

A survey on elasticity management in PaaS systems

Randomized Hydrodynamic Load Balancing Approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

The Yahoo!: cloud datastore load balancer

CloudDB '12: Proceedings of the fourth international workshop on Cloud data management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Managing elasticity across multiple cloud providers

A survey on elasticity management in PaaS systems

Randomized Hydrodynamic Load Balancing Approach

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media