Abstract
We observe that the time bottleneck during the recovery phase of an IMDB (In-Memory DataBase system) shifts from log replaying to index rebuilding after the state-of-art techniques for instant recovery have been applied. In this paper, we investigate index checkpoints to eliminate this bottleneck. However, improper designs may lead to inconsistent index checkpoints or incur severe performance degradation. For the correctness challenge, we combine two techniques, i.e., deferred deletion of index entries, and on-demand clean-up of dangling index entries after recovery, to achieve data correctness. For the efficiency challenge, we propose three wait-free index checkpoint algorithms, i.e., ChainIndex, MirrorIndex, IACoW, for supporting efficient normal processing and fast recovery. We implement our proposed solutions in HiEngine, an IMDB being developed as part of Huawei's next-generation cloud-native database product. We evaluate the impact of index checkpoint persistence on recovery and transaction performance using two workloads (i.e., TPC-C and Microbench). We analyze the pros and cons of each algorithm. Our experimental results show that HiEngine can be recovered instantly (i.e., in ~10 s) with only slight (i.e., 5% - 11%) performance degradation. Therefore, we strongly recommend integrating index checkpointing into IMDBs if recovery time is a crucial product metric.
- Joy Arulraj, Matthew Perron, and Andrew Pavlo. 2016. Write-Behind Logging. Proc. VLDB Endow. 10 (2016), 337--348.Google ScholarDigital Library
- Robert Binna, Eva Zangerle, M. Pichl, Günther Specht, and Viktor Leis. 2018. HOT: A Height Optimized Trie Index for Main-Memory Database Systems. In SIGMOD.Google ScholarDigital Library
- Tuan Cao, M. V. Salles, B. Sowell, Yao Yue, A. Demers, J. Gehrke, and Walker M. White. 2011. Fast checkpoint recovery algorithms for frequently consistent applications. In SIGMOD.Google Scholar
- D. DeWitt, R. Katz, F. Olken, Leonard D. Shapiro, M. Stonebraker, and D. Wood. 1984. Implementation techniques for main memory database systems. In SIGMOD.Google Scholar
- J. Gehrke and Tuan Cao. 2013. Fault tolerance for main-memory applications in the cloud.Google Scholar
- Goetz Graefe. 2012. A survey of B-tree logging and recovery techniques. ACM Transactions on Database Systems (TODS) 37 (2012), 1 -- 35.Google ScholarDigital Library
- Theo Haerder and Andreas Reuter. 1983. Principles of transaction-oriented database recovery. ACM Comput. Surv. 15 (1983), 287--317.Google ScholarDigital Library
- Michael Haubenschild, Caetano Sauer, Thomas Neumann, and Viktor Leis. 2020. Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines. In SIGMOD.Google Scholar
- Hironobu. [n.d.]. The internals of postgresql for database administrators and system developers. http://www.interdb.jp/pg/Google Scholar
- A. Kemper and Thomas Neumann. 2011. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE. 195--206.Google Scholar
- Kangnyeon Kim, Tianzheng Wang, R. Johnson, and I. Pandis. 2016. ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads. In SIGMOD.Google ScholarDigital Library
- P. Larson, Spyros Blanas, C. Diaconu, C. Freedman, J. M. Patel, and Mike Zwilling. 2011. High-Performance Concurrency Control Mechanisms for Main-Memory Databases. Proc. VLDB Endow. 5 (2011), 298--309.Google ScholarDigital Library
- Viktor Leis, Michael Haubenschild, A. Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. ICDE, 185--196.Google Scholar
- Viktor Leis, Michael Haubenschild, and T. Neumann. 2019. Optimistic Lock Coupling: A Scalable and Efficient General-Purpose Synchronization Method. IEEE TKDE 42 (2019), 73--84.Google Scholar
- Viktor Leis, A. Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. ICDE, 38--49.Google Scholar
- Viktor Leis, Florian Scheibner, A. Kemper, and T. Neumann. 2016. The ART of practical synchronization. In DaMoN '16.Google Scholar
- Justin J. Levandoski, D. Lomet, and S. Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. ICDE, 302--313.Google Scholar
- Justin J. Levandoski, D. Lomet, and S. Sengupta. 2013. LLAMA: A Cache/Storage Subsystem for Modern Hardware. Proc. VLDB Endow. 6 (2013), 877--888.Google ScholarDigital Library
- L. Li, Guoren Wang, Gang Wu, and Ye Yuan. 2018. Consistent Snapshot Algorithms for In-Memory Database Systems: Experiments and Analysis. ICDE, 1284--1287.Google Scholar
- L. Li, Guoren Wang, Gang Wu, Ye Yuan, Lei Chen, and Xiang Lian. 2021. A Comparative Study of Consistent Snapshot Algorithms for Main-Memory Database Systems. IEEE TKDE 33 (2021), 316--330.Google Scholar
- Antti-Pekka Liedes and A. Wolski. 2006. SIREN: A Memory-Conserving, Snapshot-Consistent Checkpoint Algorithm for in-Memory Databases. ICDE, 99--99.Google Scholar
- Hyeontaek Lim, M. Kaminsky, and D. Andersen. 2017. Cicada: Dependably Fast Multi-Core In-Memory Transactions. In SIGMOD.Google ScholarDigital Library
- Gangcai Liu, Leying Chen, and Shimin Chen. 2021. Zen: a High-Throughput Log-Free OLTP Engine for Non-Volatile Main Memory. Proc. VLDB Endow. 14 (2021), 835--848.Google ScholarDigital Library
- David B. Lomet and Betty Salzberg. 1997. Concurrency and recovery for index trees. The VLDB Journal 6 (1997), 224--240.Google ScholarDigital Library
- Yunus Ma, Siphrey Xie, Henry Zhong, Leon Lee, and King Lv. 2022. HiEngine: How to Architect a Cloud-Native Memory-Optimized Database Engine.. In SIGMOD.Google Scholar
- Arlino Magalhães, Angelo Brayner, José Maria S. Monteiro, and Gustavo Moraes. 2021. Indexed Log File: Towards Main Memory Database Instant Recovery. In EDBT.Google Scholar
- Nirmesh Malviya, Ariel Weisberg, S. Madden, and M. Stonebraker. 2014. Rethinking main memory OLTP recovery. ICDE, 604--615.Google Scholar
- Yandong Mao, E. Kohler, and R. Morris. 2012. Cache craftiness for fast multicore key-value storage. In EuroSys '12.Google Scholar
- C. Mohan, Donald J. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. 1992. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. 17 (1992), 94--162.Google ScholarDigital Library
- Thomas Neumann, Tobias Mühlbauer, and A. Kemper. 2015. Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems. In SIGMOD.Google Scholar
- Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, T. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, A. Tomasic, S. Toor, D. V. Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In CIDR.Google Scholar
- PMDK. [n.d.]. using-persistent-memory-devices-with-the-linux-device-mapper. https://pmem.io/2018/05/15/using_persistent_memory_devices_with_the_linux_device_mapper.htmlGoogle Scholar
- J. Rao and K. A. Ross. 2000. Making B+- trees cache conscious in main memory. In SIGMOD.Google Scholar
- Kun Ren, Thaddeus Diamond, D. Abadi, and Alexander Thomson. 2016. Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems. In SIGMOD.Google Scholar
- K. Salem and H. Garcia-Molina. 1990. System M: A Transaction Processing Testbed for Memory Resident Data. IEEE TKDE 2 (1990), 161--172.Google Scholar
- M. V. Salles, Tuan Cao, B. Sowell, A. Demers, J. Gehrke, Christoph E. Koch, and Walker M. White. 2009. An Evaluation of Checkpoint Recovery for Massively Multiplayer Online Games. Proc. VLDB Endow. 2 (2009), 1258--1269.Google ScholarDigital Library
- Caetano Sauer. 2019. Modern techniques for transaction-oriented database recovery1.Google Scholar
- Caetano Sauer, G. Graefe, and T. Härder. 2017. Instant Restore After a Media Failure. In ADBIS.Google Scholar
- Caetano Sauer, G. Graefe, and T. Härder. 2018. FineLine: log-structured transactional storage and recovery. Proc. VLDB Endow. 11 (2018), 2249--2262.Google ScholarDigital Library
- Stephen Tu, Wenting Zheng, E. Kohler, B. Liskov, and S. Madden. 2013. Speedy transactions in multicore in-memory databases. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles.Google Scholar
- Alexandre Verbitski, Anurag Gupta, D. Saha, Murali Brahmadesam, K. Gupta, Raman Mittal, S. Krishnamurthy, Sandor Maurice, T. Kharatishvili, and Xiaofeng Bao. 2017. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In SIGMOD.Google ScholarDigital Library
- Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, M. Kaminsky, and D. Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. In SIGMOD.Google Scholar
- Yingjun Wu, Joy Arulraj, Jiexi Lin, R. Xian, and A. Pavlo. 2017. An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. Proc. VLDB Endow. 10 (2017), 781--792.Google ScholarDigital Library
- Yingjun Wu, Wentian Guo, Chee-Yong Chan, and Kian-Lee Tan. 2017. Fast Failure Recovery for Main-Memory DBMSs on Multicores. In SIGMOD (Chicago, Illinois, USA) (SIGMOD '17). Association for Computing Machinery, New York, NY, USA, 267--281. Google ScholarDigital Library
- Yu Xia, Xiangyao Yu, A. Pavlo, and S. Devadas. 2020. Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems (Extended Version). ArXiv abs/2010.06760 (2020).Google Scholar
- Chang Yao, D. Agrawal, G. Chen, B. Ooi, and Sai Wu. 2016. Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases. In SIGMOD.Google ScholarDigital Library
- Hao Zhang, Gang Chen, B. Ooi, K. Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE TKDE 27 (2015), 1920--1948.Google ScholarDigital Library
- L. Zhang, Matthew Butrovich, Tianyu Li, Yash Nannapanei, A. Pavlo, John Rollinson, Huanchen Zhang, Ambarish Balakumar, Daniel Biales, Ziqi Dong, Emmanuel Eppinger, J. González, Wan Shen Lim, Jianqiao Liu, Prashanth Menon, Soumil Mukherjee, Tanuj Nayak, Amadou Ngom, Jeff Niu, Deepayan Patra, Poojita Raj, Stephanie Wang, Wuwen Wang, Yao Yu, and W. Zhang. 2020. Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems.Google Scholar
- Wenting Zheng, Stephen Tu, E. Kohler, and B. Liskov. 2014. Fast Databases with Fast Durability and Recovery Through Multicore Parallelism. In OSDI.Google Scholar
Recommendations
Main Memory Database Recovery: A Survey
Many of today’s applications need massive real-time data processing. In-memory database systems have become a good alternative for these requirements. These systems maintain the primary copy of the database in the main memory to achieve high throughput ...
Implementing rollback-recovery coordinated checkpoints
ISSADS'05: Proceedings of the 5th international conference on Advanced Distributed SystemsRecovering from processor failures in distributed systems is an important problem in the design of reliable systems. The processes should coordinate their operation to guarantee that the set of local checkpoints taken by the individual processes form a ...
Comments