ABSTRACT
Database scale-out is commonly implemented by partitioning data across several database instances. This approach, however, has several restrictions. In particular, partitioned databases are inflexible in large-scale deployments and assume a partition-friendly workload in order to scale. In this paper, we analyze an alternative architecture design for distributed relational databases that overcomes the limitations of partitioned databases. The architecture is based on two fundamental principles: We decouple query processing and transaction management from data storage, and we share data across query processing nodes. The combination of these design choices provides scalability, elasticity, and operational flexibility without making any assumptions on the workload. As a drawback, sharing data among multiple database nodes causes synchronization overhead. To address this limitation, we introduce techniques for scalable transaction processing in shared-data environments. Specifically, we describe mechanisms for efficient data access, concurrency control, and data buffering. In combination with new hardware trends, the techniques enable performance characteristics that top state-of-the-art partitioned databases.
- M. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. SOSP'07, pages 159--174, 2007. Google ScholarDigital Library
- Apache Hadoop. http://hadoop.apache.org/. Nov. 06, 2014.Google Scholar
- J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Léon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: providing scalable, highly available storage for interactive services. CIDR'11, pages 223--234, 2011.Google Scholar
- H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. SIGMOD'95, pages 1--10, 1995. Google ScholarDigital Library
- P. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2):185--221, 1981. Google ScholarDigital Library
- P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, 1987. Google ScholarDigital Library
- P. Bernstein, C. Reid, and S. Das. Hyder - a transactional record manager for shared flash. CIDR'11, pages 9--20, 2011.Google Scholar
- M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. SIGMOD'08, pages 251--264, 2008. Google ScholarDigital Library
- M. Cahill, U. Röhm, and A. Fekete. Serializable isolation for snapshot databases. SIGMOD'08, pages 729--738, 2008. Google ScholarDigital Library
- D. Campbell, G. Kakivaya, and N. Ellis. Extreme scale with full SQL language support in microsoft SQL Azure. SIGMOD'10, pages 1021--1024, 2010. Google ScholarDigital Library
- S. Chandrasekaran and R. Bamford. Shared cache - the future of parallel databases. ICDE'03, 2003.Google ScholarCross Ref
- F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008. Google ScholarDigital Library
- J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. OSDI'12, pages 251--264, 2012. Google ScholarDigital Library
- S. Das, D. Agrawal, and A. El Abbadi. G-Store: a scalable data store for transactional multi key access in the cloud. SoCC'10, pages 163--174, 2010. Google ScholarDigital Library
- S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud. ACM Trans. Database Syst., 38(1):5:1--5:45, 2013. Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SOSP'07, pages 205--220, 2007. Google ScholarDigital Library
- D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarDigital Library
- A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492--528, 2005. Google ScholarDigital Library
- FoundationDB. https://foundationdb.com/. Feb. 07, 2015.Google Scholar
- D. Gomez Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. Omid: lock-free transactional support for distributed data stores. ICDE'14, pages 676--687, 2014.Google Scholar
- G. Graefe. A survey of B-tree locking techniques. ACM Trans. Database Syst., 35(3):16:1--16:26, 2010. Google ScholarDigital Library
- J. Gray and A. Reuter. Transaction processing: concepts and techniques. Morgan Kaufmann, 1992. Google ScholarDigital Library
- T. Horikawa. Latch-free data structures for DBMS: design, implementation, and evaluation. SIGMOD'13, pages 409--420, 2013. Google ScholarDigital Library
- InfiniBand. http://www.infinibandta.org/. Nov. 06, 2014.Google Scholar
- E. Jensen, G. Hagensen, and J. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, 1987.Google Scholar
- E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. SIGMOD'10, pages 603--614, 2010. Google ScholarDigital Library
- S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan. Automating the detection of snapshot isolation anomalies. VLDB'07, pages 1263--1274, 2007. Google ScholarDigital Library
- J. Josten, C. Mohan, I. Narang, and J. Teng. DB2's use of the coupling facility for data sharing. IBM Systems Journal, 36(2):327--351, 1997. Google ScholarDigital Library
- R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. Abadi. H-store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496--1499, 2008. Google ScholarDigital Library
- A. Kemper and T. Neumann. HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. ICDE'11, pages 195--206, 2011. Google ScholarDigital Library
- H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213--226, 1981. Google ScholarDigital Library
- P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. Proc. VLDB Endow., 5(4):298--309, 2011. Google ScholarDigital Library
- P. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst., 6(4):650--670, 1981. Google ScholarDigital Library
- J. Levandoski, D. Lomet, M. Mokbel, and K. Zhao. Deuteronomy: transaction support for cloud data. CIDR'11, pages 123--133, 2011.Google Scholar
- J. Levandoski, D. Lomet, and S. Sengupta. The Bw-tree: a B-tree for new hardware platforms. ICDE'13, pages 302--313, 2013. Google ScholarDigital Library
- D. Lomet, R. Anderson, T. Rengarajan, and P. Spiro. How the Rdb/VMS data sharing system became fast. Technical Report CRL 92/4, 1992.Google Scholar
- D. Lomet, A. Fekete, G. Weikum, and M. Zwilling. Unbundling transaction services in the cloud. CIDR'09, 2009.Google Scholar
- M. Mages. ABA prevention using single-word instructions. Technical Report RC23089 (W0401-136), 2004.Google Scholar
- C. Mohan. History repeats itself: sensible and Nonsen aspects of the NoSQL hoopla. EDBT'13, pages 11--16, 2013. Google ScholarDigital Library
- T. Mühlbauer, W. Rödiger, A. Reiser, A. Kemper, and T. Neumann. ScyPer: elastic OLAP throughput on transactional data. DanaC'13, pages 11--15, 2013. Google ScholarDigital Library
- MySQL Cluster. http://www.mysql.com/products/cluster/. Nov. 06, 2014.Google Scholar
- D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. SOSP'11, pages 29--41, 2011. Google ScholarDigital Library
- J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. Rumble, E. Stratmann, and R. Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121--130, 2011. Google ScholarDigital Library
- V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. ICDE'08, pages 60--69, 2008. Google ScholarDigital Library
- J. Rao, E. Shekita, and S. Tata. Using paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow., 4(4):243--254, 2011. Google ScholarDigital Library
- W. Ronald. A technical overview of the Oracle Exadata database machine and Exadata storage server. Technical Report Oracle White Paper, 2012.Google Scholar
- M. Rys. Scalable SQL. Commun. ACM, 54(6):48--53, 2011. Google ScholarDigital Library
- M. Serafini, E. Mansour, A. Aboulnaga, K. Salem, T. Rafiq, and U. F. Minhas. Accordion: Elastic scalability for database systems supporting distributed transactions. Proc. VLDB Endow., 7(12), 2014. Google ScholarDigital Library
- J. Shute, R. Vingralek, B. Samwel, et al. F1: a distributed SQL database that scales. Proc. VLDB Endow., 6(11):1068--1079, 2013. Google ScholarDigital Library
- A. Singhal, R. Van der Wijngaart, and P. Barry. Atomic read modify write primitives for I/O devices. Technical Report Intel White Paper, 2008.Google Scholar
- G. H. Sockut and B. R. Iyer. Online reorganization of databases. ACM Comput. Surv., 41(3):14:1--14:136, 2009. Google ScholarDigital Library
- M. Stonebraker and R. Cattell. 10 rules for scalable performance in 'simple operation' datastores. Commun. ACM, 54(6):72--80, 2011. Google ScholarDigital Library
- M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). VLDB'07, pages 1150--1160, 2007. Google ScholarDigital Library
- M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2):21--27, 2013.Google Scholar
- R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. ElmoreA, A. Aboulnaga, A. Pavlo, and M. Stonebraker. E-store: Fine-grained elastic partitioning for distributed transaction processing systems. Proc. VLDB Endow., 8(3), 2014. Google ScholarDigital Library
- A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: fast distributed transactions for partitioned database systems. SIGMOD'12, pages 1--12, 2012. Google ScholarDigital Library
- Transaction Processing Performance Council (TPC). TPC Benchmark C Specification ver. 5.11, 2010.Google Scholar
- S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. SOSP'13, pages 18--32, 2013. Google ScholarDigital Library
- P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable performance for unpredictable workloads. Proc. VLDB Endow., 2(1):706--717, 2009. Google ScholarDigital Library
- VoltDB. http://www.voltdb.com/. Nov. 06, 2014.Google Scholar
- M. Yabandeh and D. Gómez Ferro. A critique of snapshot isolation. EuroSys'12, pages 155--168, 2012. Google ScholarDigital Library
Index Terms
- On the Design and Scalability of Distributed Shared-Data Databases
Recommendations
Distributed Optimistic Concurrency Control Methods for High-Performance Transaction Processing
There is an ever-increasing demand for more complex transactions and higher throughputs in transaction processing systems leading to higher degrees of transaction concurrency and, hence, higher data contention. The conventional two-phase locking (2PL) ...
Hekaton: SQL server's memory-optimized OLTP engine
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of DataHekaton is a new database engine optimized for memory resident data and OLTP workloads. Hekaton is fully integrated into SQL Server; it is not a separate system. To take advantage of Hekaton, a user simply declares a table memory optimized. Hekaton ...
Database technologies in the world of big data
CompSysTech '15: Proceedings of the 16th International Conference on Computer Systems and TechnologiesNow we have a number of database technologies called usually NoSQL, like key-value, column-oriented, and document stores as well as search engines and graph databases. Whereas SQL software vendors offer advanced products with the capability to handle ...
Comments