research-article

Query fresh: log shipping on steroids

Published: 01 December 2017

Abstract

Hot standby systems often have to trade safety (i.e., not losing committed work) and freshness (i.e., having access to recent updates) for performance. Guaranteeing safety requires synchronous log shipping that blocks the primary until the log records are durably replicated in one or multiple backups; maintaining freshness necessitates fast log replay on backups, but is often defeated by the dual-copy architecture and serial replay: a backup must generate the "real" data from the log to make recent updates accessible to read-only queries.

This paper proposes Query Fresh, a hot standby system that provides both safety and freshness while maintaining high performance on the primary. The crux is an append-only storage architecture used in conjunction with fast networks (e.g., InfiniBand) and byte-addressable, non-volatile memory (NVRAM). Query Fresh avoids the dual-copy design and treats the log as the database, enabling lightweight, parallel log replay that does not block the primary.
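To make the "log as the database" idea concrete, the following is a minimal sketch and not the Query Fresh implementation. It assumes a toy key-value model in which the index stores positions into an append-only log, so that replay on a backup only installs pointers instead of materializing a second copy of the data. The class `LogBackedStore` and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogBackedStore:
    """Toy store whose only copy of the data is the append-only log (hypothetical)."""
    log: list = field(default_factory=list)    # (key, value) records in append order
    index: dict = field(default_factory=dict)  # key -> log position of newest record

    def append(self, key, value):
        """Primary-side write: append a record and index its position."""
        self.log.append((key, value))
        pos = len(self.log) - 1
        self.index[key] = pos
        return pos

    def receive(self, records):
        """Backup-side: store shipped records and return the positions they occupy."""
        start = len(self.log)
        self.log.extend(records)
        return range(start, len(self.log))

    def replay(self, positions):
        """Backup-side replay: install pointers into the log, without copying values."""
        for pos in positions:
            key, _ = self.log[pos]
            # Keep the newest position, so replay order does not matter.
            if self.index.get(key, -1) < pos:
                self.index[key] = pos

    def read(self, key):
        """Read-only query: follow the index into the log."""
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]
```

Because `replay` keeps only the highest position per key, positions can be processed in any order in this sketch, which is one plausible reading of why pointer-only replay lends itself to parallelism. Real systems also need concurrency control and visibility rules that this toy omits.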

Experimental results using the TPC-C benchmark show that under Query Fresh, backup servers can replay log records faster than they are generated by the primary server, using one quarter of the available compute resources. With a 56 Gbps network, Query Fresh can support up to 4-5 synchronous replicas, each of which receives and replays ~1.4 GB of log records per second, with up to 4-6% overhead on the primary compared to a standalone server that achieves 620 kTPS without replication.
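The reported figures can be cross-checked with back-of-envelope arithmetic. The sketch below rests on assumptions that the abstract does not state: that the primary's outbound link is the limiting resource, that encoding and protocol overhead are ignored, and that the log rate and the standalone throughput can be divided directly. The function names are hypothetical.

```python
def max_replicas(link_gbps: int, per_replica_mb_s: int) -> int:
    """Upper bound on replicas if the primary's link is the only bottleneck."""
    link_mb_s = link_gbps * 1000 // 8  # gigabits per second -> megabytes per second
    return link_mb_s // per_replica_mb_s


def log_bytes_per_txn(log_bytes_per_s: int, txn_per_s: int) -> int:
    """Rough average log volume per transaction, rounded down."""
    return log_bytes_per_s // txn_per_s


# Figures taken from the abstract: 56 Gbps link, ~1.4 GB/s per replica, 620 kTPS.
replicas = max_replicas(56, 1400)                       # 7000 MB/s // 1400 MB/s
per_txn = log_bytes_per_txn(1_400_000_000, 620_000)     # bytes of log per transaction
```

Under these assumptions the link carries 7 GB/s, which bounds the replica count at 5 and agrees with the reported ceiling of 4-5 replicas. The per-transaction estimate of roughly 2.2 KB is only indicative, since throughput with replication enabled is slightly below the standalone figure.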



Published in

Proceedings of the VLDB Endowment, Volume 11, Issue 4
December 2017
133 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
