research-article

On the Design and Scalability of Distributed Shared-Data Databases

Authors:
Simon Loesing

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Markus Pilman

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Thomas Etter

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Donald Kossmann

Microsoft Research, Redmond, WA, USA

Microsoft Research, Redmond, WA, USA
View Profile

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataMay 2015Pages 663–676https://doi.org/10.1145/2723372.2751519

Published:27 May 2015Publication History

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

Pages 663–676

ABSTRACT

Database scale-out is commonly implemented by partitioning data across several database instances. This approach, however, has several restrictions. In particular, partitioned databases are inflexible in large-scale deployments and assume a partition-friendly workload in order to scale. In this paper, we analyze an alternative architecture design for distributed relational databases that overcomes the limitations of partitioned databases. The architecture is based on two fundamental principles: We decouple query processing and transaction management from data storage, and we share data across query processing nodes. The combination of these design choices provides scalability, elasticity, and operational flexibility without making any assumptions on the workload. As a drawback, sharing data among multiple database nodes causes synchronization overhead. To address this limitation, we introduce techniques for scalable transaction processing in shared-data environments. Specifically, we describe mechanisms for efficient data access, concurrency control, and data buffering. In combination with new hardware trends, the techniques enable performance characteristics that top state-of-the-art partitioned databases.

References

M. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. SOSP'07, pages 159--174, 2007. Google ScholarDigital Library
Apache Hadoop. http://hadoop.apache.org/. Nov. 06, 2014.Google Scholar
J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Léon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: providing scalable, highly available storage for interactive services. CIDR'11, pages 223--234, 2011.Google Scholar
H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. SIGMOD'95, pages 1--10, 1995. Google ScholarDigital Library
P. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2):185--221, 1981. Google ScholarDigital Library
P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, 1987. Google ScholarDigital Library
P. Bernstein, C. Reid, and S. Das. Hyder - a transactional record manager for shared flash. CIDR'11, pages 9--20, 2011.Google Scholar
M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. SIGMOD'08, pages 251--264, 2008. Google ScholarDigital Library
M. Cahill, U. Röhm, and A. Fekete. Serializable isolation for snapshot databases. SIGMOD'08, pages 729--738, 2008. Google ScholarDigital Library
D. Campbell, G. Kakivaya, and N. Ellis. Extreme scale with full SQL language support in microsoft SQL Azure. SIGMOD'10, pages 1021--1024, 2010. Google ScholarDigital Library
S. Chandrasekaran and R. Bamford. Shared cache - the future of parallel databases. ICDE'03, 2003.Google ScholarCross Ref
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008. Google ScholarDigital Library
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. OSDI'12, pages 251--264, 2012. Google ScholarDigital Library
S. Das, D. Agrawal, and A. El Abbadi. G-Store: a scalable data store for transactional multi key access in the cloud. SoCC'10, pages 163--174, 2010. Google ScholarDigital Library
S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud. ACM Trans. Database Syst., 38(1):5:1--5:45, 2013. Google ScholarDigital Library
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SOSP'07, pages 205--220, 2007. Google ScholarDigital Library
D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarDigital Library
A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492--528, 2005. Google ScholarDigital Library
FoundationDB. https://foundationdb.com/. Feb. 07, 2015.Google Scholar
D. Gomez Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. Omid: lock-free transactional support for distributed data stores. ICDE'14, pages 676--687, 2014.Google Scholar
G. Graefe. A survey of B-tree locking techniques. ACM Trans. Database Syst., 35(3):16:1--16:26, 2010. Google ScholarDigital Library
J. Gray and A. Reuter. Transaction processing: concepts and techniques. Morgan Kaufmann, 1992. Google ScholarDigital Library
T. Horikawa. Latch-free data structures for DBMS: design, implementation, and evaluation. SIGMOD'13, pages 409--420, 2013. Google ScholarDigital Library
InfiniBand. http://www.infinibandta.org/. Nov. 06, 2014.Google Scholar
E. Jensen, G. Hagensen, and J. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, 1987.Google Scholar
E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. SIGMOD'10, pages 603--614, 2010. Google ScholarDigital Library
S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan. Automating the detection of snapshot isolation anomalies. VLDB'07, pages 1263--1274, 2007. Google ScholarDigital Library
J. Josten, C. Mohan, I. Narang, and J. Teng. DB2's use of the coupling facility for data sharing. IBM Systems Journal, 36(2):327--351, 1997. Google ScholarDigital Library
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. Abadi. H-store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496--1499, 2008. Google ScholarDigital Library
A. Kemper and T. Neumann. HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. ICDE'11, pages 195--206, 2011. Google ScholarDigital Library
H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213--226, 1981. Google ScholarDigital Library
P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. Proc. VLDB Endow., 5(4):298--309, 2011. Google ScholarDigital Library
P. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst., 6(4):650--670, 1981. Google ScholarDigital Library
J. Levandoski, D. Lomet, M. Mokbel, and K. Zhao. Deuteronomy: transaction support for cloud data. CIDR'11, pages 123--133, 2011.Google Scholar
J. Levandoski, D. Lomet, and S. Sengupta. The Bw-tree: a B-tree for new hardware platforms. ICDE'13, pages 302--313, 2013. Google ScholarDigital Library
D. Lomet, R. Anderson, T. Rengarajan, and P. Spiro. How the Rdb/VMS data sharing system became fast. Technical Report CRL 92/4, 1992.Google Scholar
D. Lomet, A. Fekete, G. Weikum, and M. Zwilling. Unbundling transaction services in the cloud. CIDR'09, 2009.Google Scholar
M. Mages. ABA prevention using single-word instructions. Technical Report RC23089 (W0401-136), 2004.Google Scholar
C. Mohan. History repeats itself: sensible and Nonsen aspects of the NoSQL hoopla. EDBT'13, pages 11--16, 2013. Google ScholarDigital Library
T. Mühlbauer, W. Rödiger, A. Reiser, A. Kemper, and T. Neumann. ScyPer: elastic OLAP throughput on transactional data. DanaC'13, pages 11--15, 2013. Google ScholarDigital Library
MySQL Cluster. http://www.mysql.com/products/cluster/. Nov. 06, 2014.Google Scholar
D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. SOSP'11, pages 29--41, 2011. Google ScholarDigital Library
J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. Rumble, E. Stratmann, and R. Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121--130, 2011. Google ScholarDigital Library
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. ICDE'08, pages 60--69, 2008. Google ScholarDigital Library
J. Rao, E. Shekita, and S. Tata. Using paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow., 4(4):243--254, 2011. Google ScholarDigital Library
W. Ronald. A technical overview of the Oracle Exadata database machine and Exadata storage server. Technical Report Oracle White Paper, 2012.Google Scholar
M. Rys. Scalable SQL. Commun. ACM, 54(6):48--53, 2011. Google ScholarDigital Library
M. Serafini, E. Mansour, A. Aboulnaga, K. Salem, T. Rafiq, and U. F. Minhas. Accordion: Elastic scalability for database systems supporting distributed transactions. Proc. VLDB Endow., 7(12), 2014. Google ScholarDigital Library
J. Shute, R. Vingralek, B. Samwel, et al. F1: a distributed SQL database that scales. Proc. VLDB Endow., 6(11):1068--1079, 2013. Google ScholarDigital Library
A. Singhal, R. Van der Wijngaart, and P. Barry. Atomic read modify write primitives for I/O devices. Technical Report Intel White Paper, 2008.Google Scholar
G. H. Sockut and B. R. Iyer. Online reorganization of databases. ACM Comput. Surv., 41(3):14:1--14:136, 2009. Google ScholarDigital Library
M. Stonebraker and R. Cattell. 10 rules for scalable performance in 'simple operation' datastores. Commun. ACM, 54(6):72--80, 2011. Google ScholarDigital Library
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). VLDB'07, pages 1150--1160, 2007. Google ScholarDigital Library
M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2):21--27, 2013.Google Scholar
R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. ElmoreA, A. Aboulnaga, A. Pavlo, and M. Stonebraker. E-store: Fine-grained elastic partitioning for distributed transaction processing systems. Proc. VLDB Endow., 8(3), 2014. Google ScholarDigital Library
A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: fast distributed transactions for partitioned database systems. SIGMOD'12, pages 1--12, 2012. Google ScholarDigital Library
Transaction Processing Performance Council (TPC). TPC Benchmark C Specification ver. 5.11, 2010.Google Scholar
S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. SOSP'13, pages 18--32, 2013. Google ScholarDigital Library
P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable performance for unpredictable workloads. Proc. VLDB Endow., 2(1):706--717, 2009. Google ScholarDigital Library
VoltDB. http://www.voltdb.com/. Nov. 06, 2014.Google Scholar
M. Yabandeh and D. Gómez Ferro. A critique of snapshot isolation. EuroSys'12, pages 155--168, 2012. Google ScholarDigital Library

Index Terms

On the Design and Scalability of Distributed Shared-Data Databases

Recommendations

Distributed Optimistic Concurrency Control Methods for High-Performance Transaction Processing

There is an ever-increasing demand for more complex transactions and higher throughputs in transaction processing systems leading to higher degrees of transaction concurrency and, hence, higher data contention. The conventional two-phase locking (2PL) ...
Read More
Hekaton: SQL server's memory-optimized OLTP engine
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Hekaton is a new database engine optimized for memory resident data and OLTP workloads. Hekaton is fully integrated into SQL Server; it is not a separate system. To take advantage of Hekaton, a user simply declares a table memory optimized. Hekaton ...
Read More
Database technologies in the world of big data
CompSysTech '15: Proceedings of the 16th International Conference on Computer Systems and Technologies

Now we have a number of database technologies called usually NoSQL, like key-value, column-oriented, and document stores as well as search engines and graph databases. Whereas SQL software vendors offer advanced products with the capability to handle ...
Read More

Reviews

Reviewer: Filip-Martin Brinkmann

Achieving horizontal scalability in distributed databases is in most cases achieved by partitioning the data into shards. It comes, however, at a price; the problem with this approach is that only the database node that owns a shard can access it. This leads to a potential bottleneck when executing distributed transactions. The paper explores the architecture of a distributed relational database system called Tell, which does not follow this paradigm; instead of the whole database, only data are partitioned. Such systems are known as shared-data databases. The paper presents the architecture of the system and discusses the suitability of the approach for both online transaction processing (OLTP) and online analytical processing (OLAP) workloads. Moreover, the authors provide detailed insights into some of their solutions, which solve common problems that appear in shared-data systems. Finally, they show the results of extensive evaluations that not only cover Tell's performance, but also compare it to existing distributed database systems. On a higher level, the paper contributes to an important ongoing discussion concerning the question: Which architectures work best for the workloads that today's large-scale web applications produce__?__ The paper is a very thorough introduction for readers who have some knowledge about architectures of distributed databases. Where necessary, more in-depth information is provided in order to help readers understand the big picture. The authors identify challenges and present their contributions in a very structured way. The results should, however, be taken with a grain of salt: Tell heavily relies on a highly optimized system stack and runs in a local area network (LAN) interconnected by Infiniband. This is certainly not the cheap commodity hardware used in many cloud and big data systems. Overall, the authors bring new insights to the table, underlining the suitability of shared-data architectures for modern workloads. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
General Chair:
Timos Sellis
RMIT University, Australia
,
Program Chairs:
Susan B. Davidson
University of Pennsylvania, USA
,
Zack Ives
University of Pennsylvania, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
decoupled storage
optimistic concurrency control
shared-data
transaction processing
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 45
  Total Citations
  View Citations
- 1,793
  Total Downloads
- Downloads (Last 12 months)71
- Downloads (Last 6 weeks)26
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

On the Design and Scalability of Distributed Shared-Data Databases

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Distributed Optimistic Concurrency Control Methods for High-Performance Transaction Processing

Hekaton: SQL server's memory-optimized OLTP engine

Database technologies in the world of big data

Reviews

Access critical reviews of Computing literature here