Abstract
The shift to multi-core hardware brings new challenges to database systems, as the software parallelism determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important yet-to-be addressed potential bottlenecks, especially in OLTP workloads making frequent small changes to data.
In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges different level in the software architecture: (a) the high volume of small-sized I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20%-69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks. Moreover, it achieves log insert throughput over 1.8GB/s for small log records on a single socket server, an order of magnitude higher than the traditional way of accessing the log using a single mutex.
- L. Bouganim, B. Jonsson, and P. Bonnet. "uFlip: Understanding Flash I/O Patterns." In Proc. CIDR, 2009.Google Scholar
- M. Carey, et al. "Shoring up persistent applications." In Proc. SIGMOD, 1994. Google ScholarDigital Library
- S. Chen. "FlashLogging: exploiting flash devices for synchronous logging performance." In Proc. SIGMOD, 2009. Google ScholarDigital Library
- D. DeWitt, et al. "Implementation Techniques for Main Memory Database Systems." ACM TODS, 14(2), 1984. Google ScholarDigital Library
- D. Gawlick and D. Kinkade. "Varieties of Concurrency Control in IMS/VS Fast Path." IEEE Database Eng. Bull. 1985.Google Scholar
- N. Hardavellas, et al. "Database servers on chip multiprocessors: limitations and opportunities." In Proc. CIDR, 2007.Google Scholar
- S. Harizopoulos, D. J. Abadi,. S. Madden, and M. Stonebraker. "OLTP through the looking glass, and what we found there." In Proc. SIGMOD, 2008. Google ScholarDigital Library
- P. Helland, H. Sammer, J. Lyon, R. Carr, and P. Garrett. "Group Commit Timers and High-Volume Transaction Systems." In Proc. HPTS, 1987. Google ScholarDigital Library
- D. Hendler, N. Shavit, and L. Yerushalmi. "A Scalable Lock-free Stack Algorithm." In Proc. SPAA, 2004. Google ScholarDigital Library
- R. Johnson, I. Pandis, and A. Ailamaki. "Improving OLTP Scalability using Speculative Lock Inheritance." In Proc. VLDB, 2009. Google ScholarDigital Library
- R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. "Shore-MT: a scalable storage manager for the multi-core era." In Proc. EDBT, 2009. Google ScholarDigital Library
- S.-W. Lee, B. Moon, J.-M. Kim, and S.-W. Kim. "A Case for Flash Memory SSD in Enterprise Database Applications." In Proc. SIGMOD, 2008. Google ScholarDigital Library
- C. Mohan. "ARIES/KVL: A key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes." In Proc. VLDB, 1990. Google ScholarDigital Library
- C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. "ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging." ACM TODS, 17(1), 1992. Google ScholarDigital Library
- M. Moir, D. Nussbaum, O. Shalev, and N. Shavit. "Using Elimination to Implement Scalable FIFO Queues." In Proc. SPAA, 2005. Google ScholarDigital Library
- Oracle. Oracle Asynchronous Commit. Oracle Database Advanced Application Developer's Guide. Available at: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_sqlproc.htm.Google Scholar
- PostgreSQL Asynchronous Commit. PostgreSQL 8.4.2 Documentation. Available at: http://www.postgresql.org/files/documentation/pdf/8.4/postgresql-8.4.2-A4.pdf.Google Scholar
- A. Rafii, and D. DuBois. "Performance Tradeoffs of Group Commit Logging." In Proc. CMG Conference, 1989.Google Scholar
- N. Shavit and D. Touitou. "Elimination Trees and the Construction of Pools and Stacks." In Theory of Computing Systems, 30(6), pp 645--670, 1997.Google ScholarDigital Library
- M. L. Scott. "Non-Blocking Timeout in Scalable Queue-Based Spin Locks." In Proc. PODC, 2002. Google ScholarDigital Library
- E. Soisalon-Soininen, and T. Ylonen. "Partial Strictness in Two-Phase Locking." In Proc. ICDT, 1995. Google ScholarDigital Library
- M. Stonebraker, et al. "The end of an Architectural Era (It's Time for a Complete Rewrite)." In Proc. VLDB, 2007. Google ScholarDigital Library
- Telecom Application Transaction Processing Benchmark (TATP). TATP Benchmark Description. Available at: http://tatpbenchmark.sourceforge.net/TATP_Description.pdf.Google Scholar
- Transaction Processing Performance Council (TPC). TPC Benchmark B: Standard Specification. Available at http://www.tpc.org/tpcb/spec/tpcb_current.pdf.Google Scholar
- {A1} P. Helland. "Life Beyond Distributed Transactions: an Apostate's Opinion." In Proc. CIDR, 2007.Google Scholar
- {A2} T. Lahiri, V. Srihari, W. Chan, and N. MacNaughton. "Cache Fusion: Extending shared-disk clusters with shared caches." In Proc. VLDB, 2001. Google ScholarDigital Library
- {A3} D. Lomet. "Recovery for Shared Disk Systems Using Multiple Redo Logs." CRL 90/4, 1990.Google Scholar
- {A4} D. Lomet, R. Anderson, T. K. Rengarajan, and P. Spiro. "How the Rdb/VMS Data Sharing System Became Fast." CRL 92/4, 1992.Google Scholar
- {A5} Y. Oyama, K. Taura, and A. Yonezawa. "Executing Parallel Programs with Synchronization Bottlenecks Efficiently." In Proc. PDSIA, 1999, pp. 182--204.Google Scholar
- {A6} Transaction Processing Performance Council. "TPC - C v5.5: On-Line Transaction Processing (OLTP) Benchmark."Google Scholar
Index Terms
- Aether: a scalable approach to logging
Recommendations
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies
ASPLOS '13Transactional memory (TM) has been proposed to alleviate some key programmability problems in chip multiprocessors. Most TMs optimistically allow concurrent transactions, detecting read-write or write-write conflicts. Upon conflicts, existing hardware ...
Comments