skip to main content
10.1145/2723372.2751519acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

On the Design and Scalability of Distributed Shared-Data Databases

Published:27 May 2015Publication History

ABSTRACT

Database scale-out is commonly implemented by partitioning data across several database instances. This approach, however, has several restrictions. In particular, partitioned databases are inflexible in large-scale deployments and assume a partition-friendly workload in order to scale. In this paper, we analyze an alternative architecture design for distributed relational databases that overcomes the limitations of partitioned databases. The architecture is based on two fundamental principles: We decouple query processing and transaction management from data storage, and we share data across query processing nodes. The combination of these design choices provides scalability, elasticity, and operational flexibility without making any assumptions on the workload. As a drawback, sharing data among multiple database nodes causes synchronization overhead. To address this limitation, we introduce techniques for scalable transaction processing in shared-data environments. Specifically, we describe mechanisms for efficient data access, concurrency control, and data buffering. In combination with new hardware trends, the techniques enable performance characteristics that top state-of-the-art partitioned databases.

References

  1. M. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. SOSP'07, pages 159--174, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Apache Hadoop. http://hadoop.apache.org/. Nov. 06, 2014.Google ScholarGoogle Scholar
  3. J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Léon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: providing scalable, highly available storage for interactive services. CIDR'11, pages 223--234, 2011.Google ScholarGoogle Scholar
  4. H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. SIGMOD'95, pages 1--10, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2):185--221, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Bernstein, C. Reid, and S. Das. Hyder - a transactional record manager for shared flash. CIDR'11, pages 9--20, 2011.Google ScholarGoogle Scholar
  8. M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. SIGMOD'08, pages 251--264, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Cahill, U. Röhm, and A. Fekete. Serializable isolation for snapshot databases. SIGMOD'08, pages 729--738, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Campbell, G. Kakivaya, and N. Ellis. Extreme scale with full SQL language support in microsoft SQL Azure. SIGMOD'10, pages 1021--1024, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Chandrasekaran and R. Bamford. Shared cache - the future of parallel databases. ICDE'03, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  12. F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. OSDI'12, pages 251--264, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Das, D. Agrawal, and A. El Abbadi. G-Store: a scalable data store for transactional multi key access in the cloud. SoCC'10, pages 163--174, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud. ACM Trans. Database Syst., 38(1):5:1--5:45, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SOSP'07, pages 205--220, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492--528, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. FoundationDB. https://foundationdb.com/. Feb. 07, 2015.Google ScholarGoogle Scholar
  20. D. Gomez Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. Omid: lock-free transactional support for distributed data stores. ICDE'14, pages 676--687, 2014.Google ScholarGoogle Scholar
  21. G. Graefe. A survey of B-tree locking techniques. ACM Trans. Database Syst., 35(3):16:1--16:26, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Gray and A. Reuter. Transaction processing: concepts and techniques. Morgan Kaufmann, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Horikawa. Latch-free data structures for DBMS: design, implementation, and evaluation. SIGMOD'13, pages 409--420, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. InfiniBand. http://www.infinibandta.org/. Nov. 06, 2014.Google ScholarGoogle Scholar
  25. E. Jensen, G. Hagensen, and J. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, 1987.Google ScholarGoogle Scholar
  26. E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. SIGMOD'10, pages 603--614, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan. Automating the detection of snapshot isolation anomalies. VLDB'07, pages 1263--1274, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Josten, C. Mohan, I. Narang, and J. Teng. DB2's use of the coupling facility for data sharing. IBM Systems Journal, 36(2):327--351, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. Abadi. H-store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496--1499, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. A. Kemper and T. Neumann. HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. ICDE'11, pages 195--206, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213--226, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. Proc. VLDB Endow., 5(4):298--309, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. P. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst., 6(4):650--670, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Levandoski, D. Lomet, M. Mokbel, and K. Zhao. Deuteronomy: transaction support for cloud data. CIDR'11, pages 123--133, 2011.Google ScholarGoogle Scholar
  35. J. Levandoski, D. Lomet, and S. Sengupta. The Bw-tree: a B-tree for new hardware platforms. ICDE'13, pages 302--313, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. D. Lomet, R. Anderson, T. Rengarajan, and P. Spiro. How the Rdb/VMS data sharing system became fast. Technical Report CRL 92/4, 1992.Google ScholarGoogle Scholar
  37. D. Lomet, A. Fekete, G. Weikum, and M. Zwilling. Unbundling transaction services in the cloud. CIDR'09, 2009.Google ScholarGoogle Scholar
  38. M. Mages. ABA prevention using single-word instructions. Technical Report RC23089 (W0401-136), 2004.Google ScholarGoogle Scholar
  39. C. Mohan. History repeats itself: sensible and Nonsen aspects of the NoSQL hoopla. EDBT'13, pages 11--16, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. T. Mühlbauer, W. Rödiger, A. Reiser, A. Kemper, and T. Neumann. ScyPer: elastic OLAP throughput on transactional data. DanaC'13, pages 11--15, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. MySQL Cluster. http://www.mysql.com/products/cluster/. Nov. 06, 2014.Google ScholarGoogle Scholar
  42. D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. SOSP'11, pages 29--41, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. Rumble, E. Stratmann, and R. Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121--130, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. ICDE'08, pages 60--69, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. Rao, E. Shekita, and S. Tata. Using paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow., 4(4):243--254, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. W. Ronald. A technical overview of the Oracle Exadata database machine and Exadata storage server. Technical Report Oracle White Paper, 2012.Google ScholarGoogle Scholar
  47. M. Rys. Scalable SQL. Commun. ACM, 54(6):48--53, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. M. Serafini, E. Mansour, A. Aboulnaga, K. Salem, T. Rafiq, and U. F. Minhas. Accordion: Elastic scalability for database systems supporting distributed transactions. Proc. VLDB Endow., 7(12), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. J. Shute, R. Vingralek, B. Samwel, et al. F1: a distributed SQL database that scales. Proc. VLDB Endow., 6(11):1068--1079, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. A. Singhal, R. Van der Wijngaart, and P. Barry. Atomic read modify write primitives for I/O devices. Technical Report Intel White Paper, 2008.Google ScholarGoogle Scholar
  51. G. H. Sockut and B. R. Iyer. Online reorganization of databases. ACM Comput. Surv., 41(3):14:1--14:136, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. M. Stonebraker and R. Cattell. 10 rules for scalable performance in 'simple operation' datastores. Commun. ACM, 54(6):72--80, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). VLDB'07, pages 1150--1160, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2):21--27, 2013.Google ScholarGoogle Scholar
  55. R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. ElmoreA, A. Aboulnaga, A. Pavlo, and M. Stonebraker. E-store: Fine-grained elastic partitioning for distributed transaction processing systems. Proc. VLDB Endow., 8(3), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: fast distributed transactions for partitioned database systems. SIGMOD'12, pages 1--12, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Transaction Processing Performance Council (TPC). TPC Benchmark C Specification ver. 5.11, 2010.Google ScholarGoogle Scholar
  58. S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. SOSP'13, pages 18--32, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable performance for unpredictable workloads. Proc. VLDB Endow., 2(1):706--717, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. VoltDB. http://www.voltdb.com/. Nov. 06, 2014.Google ScholarGoogle Scholar
  61. M. Yabandeh and D. Gómez Ferro. A critique of snapshot isolation. EuroSys'12, pages 155--168, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. On the Design and Scalability of Distributed Shared-Data Databases

          Recommendations

          Reviews

          Filip-Martin Brinkmann

          Achieving horizontal scalability in distributed databases is in most cases achieved by partitioning the data into shards. It comes, however, at a price; the problem with this approach is that only the database node that owns a shard can access it. This leads to a potential bottleneck when executing distributed transactions. The paper explores the architecture of a distributed relational database system called Tell, which does not follow this paradigm; instead of the whole database, only data are partitioned. Such systems are known as shared-data databases. The paper presents the architecture of the system and discusses the suitability of the approach for both online transaction processing (OLTP) and online analytical processing (OLAP) workloads. Moreover, the authors provide detailed insights into some of their solutions, which solve common problems that appear in shared-data systems. Finally, they show the results of extensive evaluations that not only cover Tell's performance, but also compare it to existing distributed database systems. On a higher level, the paper contributes to an important ongoing discussion concerning the question: Which architectures work best for the workloads that today's large-scale web applications produce__?__ The paper is a very thorough introduction for readers who have some knowledge about architectures of distributed databases. Where necessary, more in-depth information is provided in order to help readers understand the big picture. The authors identify challenges and present their contributions in a very structured way. The results should, however, be taken with a grain of salt: Tell heavily relies on a highly optimized system stack and runs in a local area network (LAN) interconnected by Infiniband. This is certainly not the cheap commodity hardware used in many cloud and big data systems. Overall, the authors bring new insights to the table, underlining the suitability of shared-data architectures for modern workloads. Online Computing Reviews Service

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
            May 2015
            2110 pages
            ISBN:9781450327589
            DOI:10.1145/2723372

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 27 May 2015

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader