Abstract
The design of the logging and recovery components of database management systems (DBMSs) has always been influenced by the difference in the performance characteristics of volatile (DRAM) and non-volatile storage devices (HDDs/SSDs). The key assumption has been that non-volatile storage is much slower than DRAM and supports only block-oriented reads and writes. But the arrival of new non-volatile memory (NVM) storage that is almost as fast as DRAM and supports fine-grained reads and writes invalidates these previous design choices.
This paper explores the changes that are required in a DBMS to leverage the unique properties of NVM in systems that still include volatile DRAM. We make the case for a new logging and recovery protocol, called write-behind logging, that enables a DBMS to recover nearly instantaneously from system failures. The key idea is that the DBMS logs what parts of the database have changed rather than how they were changed. Using this method, the DBMS flushes the changes to the database before recording them in the log. Our evaluation shows that this protocol improves a DBMS's transactional throughput by 1.3×, reduces the recovery time by more than two orders of magnitude, and shrinks the storage footprint of the DBMS on NVM by 1.5×. We also demonstrate that our logging protocol is compatible with standard replication schemes.
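The ordering difference described above can be illustrated with a toy sketch. This is not the paper's implementation: `ToyNVMStore`, `commit_write_ahead`, and `commit_write_behind` are hypothetical names, the "NVM" is an ordinary Python dict, and the record layouts are assumptions chosen only to contrast logging how data changed (new values, written first) with logging what changed (identifiers only, written after the data).

```python
from dataclasses import dataclass, field


@dataclass
class ToyNVMStore:
    """Hypothetical stand-in for a database and log kept on NVM."""
    data: dict = field(default_factory=dict)    # the "durable" database
    log: list = field(default_factory=list)     # the "durable" log
    events: list = field(default_factory=list)  # order of durable operations

    def persist(self, key, value):
        # Pretend this write reaches NVM immediately.
        self.data[key] = value
        self.events.append(("data", key))

    def append_log(self, record):
        self.log.append(record)
        self.events.append(("log", record[0]))


def commit_write_ahead(store, txn_id, writes):
    """Write-ahead order: log the new values, then update the database."""
    store.append_log(("redo", txn_id, dict(writes)))
    for key, value in writes.items():
        store.persist(key, value)


def commit_write_behind(store, txn_id, writes):
    """Write-behind order: update the database, then log which keys changed."""
    for key, value in writes.items():
        store.persist(key, value)
    # The record names the changed keys but carries no values.
    store.append_log(("changed", txn_id, sorted(writes)))


if __name__ == "__main__":
    wal, wbl = ToyNVMStore(), ToyNVMStore()
    commit_write_ahead(wal, 1, {"a": 10, "b": 20})
    commit_write_behind(wbl, 1, {"a": 10, "b": 20})
    print("write-ahead: ", wal.events, wal.log)
    print("write-behind:", wbl.events, wbl.log)
```

In this sketch the write-behind log record stays small because it omits the values, which is one plausible reading of how such a protocol could reduce the storage footprint. The sketch does not model failures or recovery.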