ABSTRACT
In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in cloud computing environments. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. Capturing and modeling the transactional workload over a period of time, and then exploiting that information for data placement and replication, has been shown to provide significant performance benefits, both in transaction latencies and in overall throughput. However, such workload-aware data placement approaches can incur very high overheads and, further, may perform worse than naive approaches if the workload changes.
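To make the payoff of workload-aware placement concrete, here is a minimal, purely illustrative sketch (not the paper's algorithm): each captured transaction is treated as the set of tuples it accesses, and a placement maps tuples to partitions; a transaction is distributed iff its tuples span more than one partition. The toy workload, tuple names, and placements below are invented for illustration.

```python
def distributed_txn_count(transactions, placement):
    """Count transactions whose accessed tuples span more than one partition.

    transactions: iterable of sets of tuple ids
    placement:    dict mapping tuple id -> partition id
    """
    return sum(
        1 for txn in transactions
        if len({placement[t] for t in txn}) > 1
    )

# Toy workload: three transactions over four tuples.
txns = [{"a", "b"}, {"b", "c"}, {"c", "d"}]

naive = {"a": 0, "b": 1, "c": 0, "d": 1}   # e.g. hash-style placement
aware = {"a": 0, "b": 0, "c": 1, "d": 1}   # co-locates frequently co-accessed tuples

print(distributed_txn_count(txns, naive))  # 3 distributed transactions
print(distributed_txn_count(txns, aware))  # 1 distributed transaction
```

Even on this tiny example, a placement informed by co-access patterns cuts the number of distributed transactions from three to one, which is the effect that translates into lower latencies and higher throughput at scale.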
In this work, we propose SWORD, a Scalable WORkload-aware Data partitioning and placement approach for OLTP workloads, which incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement and during query execution at runtime. We model the workload as a hypergraph over the data items, and propose a hypergraph compression technique to reduce the overheads of partitioning. To deal with workload changes, we propose an incremental data repartitioning technique that modifies data placement in small steps without resorting to complete workload repartitioning. We have built a workload-aware active replication mechanism in SWORD to increase availability and enable load balancing. We propose the use of fine-grained quorums, defined at the level of groups of tuples, to control the cost of distributed updates, improve throughput, and provide adaptability to different workloads. To our knowledge, SWORD is the first system that uses fine-grained quorums in this context. The results of our experimental evaluation of SWORD deployed on an Amazon EC2 cluster show that our techniques yield orders-of-magnitude reductions in the partitioning and book-keeping overheads, and improve tolerance to failures and workload changes; we also show that choosing quorums based on the query access patterns enables us to better handle query workloads with different read and write access patterns.
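The fine-grained quorum idea can be sketched as follows; this is a hedged illustration of the general principle (per-group read/write quorum sizes satisfying R + W > N so that quorums intersect), not SWORD's actual quorum-selection algorithm. The function name, the simple 0.5 threshold, and the inputs are all invented for this sketch.

```python
def pick_quorums(n_replicas, read_fraction):
    """Choose (R, W) quorum sizes for one tuple group, with R + W = N + 1.

    Read-heavy groups get a small read quorum (cheap reads, costly writes);
    write-heavy groups get the reverse. Purely illustrative policy.
    """
    r = 1 if read_fraction >= 0.5 else n_replicas
    w = n_replicas + 1 - r
    assert r + w > n_replicas  # any read quorum intersects any write quorum
    return r, w

# With 3 replicas: a read-mostly group reads from 1 replica but writes to
# all 3, while a write-mostly group writes to 1 replica but reads all 3.
print(pick_quorums(3, 0.9))  # (1, 3)
print(pick_quorums(3, 0.1))  # (3, 1)
```

Because the quorum sizes are chosen per group of tuples rather than globally, read-dominated and write-dominated portions of the same database can each pay the cheaper side of the quorum trade-off, which is the adaptability the abstract refers to.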