research-article

Query fresh: log shipping on steroids

Authors:
Tianzheng Wang, University of Toronto
Ryan Johnson, LogicBlox
Ippokratis Pandis, Amazon Web Services

Proceedings of the VLDB Endowment, Volume 11, Issue 4, pp. 406–419. https://doi.org/10.1145/3186728.3164137

Published: 01 December 2017

Abstract

Hot standby systems often have to trade safety (i.e., not losing committed work) and freshness (i.e., having access to recent updates) for performance. Guaranteeing safety requires synchronous log shipping, which blocks the primary until the log records are durably replicated in one or more backups. Maintaining freshness requires fast log replay on backups, a goal often defeated by the dual-copy architecture and serial replay: a backup must generate the "real" data from the log to make recent updates accessible to read-only queries.
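The synchronous log shipping described above can be sketched as follows. This is a minimal illustration, not Query Fresh's implementation: the `Primary` and `Backup` classes, the in-memory `durable_log` list standing in for durable storage, and the thread pool standing in for the network are all hypothetical. What it shows is that `commit` does not return until every backup has acknowledged that the record is durable, which is exactly where the primary blocks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Backup:
    """Hypothetical backup; durable_log stands in for NVRAM or disk."""

    def __init__(self, name):
        self.name = name
        self.durable_log = []
        self._lock = threading.Lock()

    def persist(self, lsn, record):
        # Make the record durable *before* acknowledging it.
        with self._lock:
            self.durable_log.append((lsn, record))
        return lsn  # acknowledgment: this LSN is now durable on this backup


class Primary:
    """Hypothetical primary that ships each commit record synchronously."""

    def __init__(self, backups):
        self.backups = backups
        self.log = []
        self._pool = ThreadPoolExecutor(max_workers=max(1, len(backups)))

    def commit(self, record):
        # Assumes a single committing thread, so LSN assignment is race-free.
        lsn = len(self.log)
        self.log.append(record)
        # Ship to all backups in parallel, then block until every ack arrives.
        futures = [self._pool.submit(b.persist, lsn, record) for b in self.backups]
        acks = [f.result() for f in futures]
        # Only now is it safe to report the commit: the record survives loss of the primary.
        return all(ack == lsn for ack in acks)

    def close(self):
        self._pool.shutdown(wait=True)
```

Because `commit` waits on every future, commit latency is bounded below by the slowest backup's acknowledgment; this is the performance cost of safety that the abstract refers to.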

This paper proposes Query Fresh, a hot standby system that provides both safety and freshness while maintaining high performance on the primary. The crux is an append-only storage architecture used in conjunction with fast networks (e.g., InfiniBand) and byte-addressable, non-volatile memory (NVRAM). Query Fresh avoids the dual-copy design and treats the log as the database, enabling lightweight, parallel log replay that does not block the primary.
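The idea of treating the log as the database can be illustrated with an indirection table that maps each record ID (OID) to the log position of its newest version. Under that assumption, replay copies no data: it only updates table entries. Because the entry with the highest LSN wins no matter which worker sees it first, disjoint chunks of the log can be replayed in parallel and merged. This is a hypothetical sketch; the record layout `(lsn, oid, payload)`, the function names, and the chunk-and-merge scheme are illustrative and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical log record: (lsn, oid, payload). Here lsn equals the record's
# position in the log list, so the log itself serves as the data store.
LogRecord = Tuple[int, int, bytes]


def replay_chunk(chunk: List[LogRecord]) -> Dict[int, int]:
    """Find, within one chunk, the newest LSN for each OID (no payload copies)."""
    latest: Dict[int, int] = {}
    for lsn, oid, _payload in chunk:
        if lsn > latest.get(oid, -1):
            latest[oid] = lsn
    return latest


def parallel_replay(log: List[LogRecord], workers: int = 4) -> Dict[int, int]:
    """Replay disjoint log chunks concurrently, then merge by highest LSN.

    Note: CPython threads illustrate the structure, not real CPU parallelism.
    """
    size = max(1, -(-len(log) // workers))  # ceiling division
    chunks = [log[i:i + size] for i in range(0, len(log), size)]
    indirection: Dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(replay_chunk, chunks):
            for oid, lsn in partial.items():
                # Newest version wins, so merge order does not matter.
                if lsn > indirection.get(oid, -1):
                    indirection[oid] = lsn
    return indirection


def read(log: List[LogRecord], indirection: Dict[int, int], oid: int) -> bytes:
    """A read-only query follows the indirection entry straight into the log."""
    return log[indirection[oid]][2]
```

For example, replaying `[(0, 1, b"a"), (1, 2, b"b"), (2, 1, b"c")]` leaves OID 1 pointing at LSN 2, so `read` returns `b"c"` without any separate copy of the data having been built.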

Experimental results using the TPC-C benchmark show that under Query Fresh, backup servers can replay log records faster than they are generated by the primary server, using one quarter of the available compute resources. With a 56 Gbps network, Query Fresh can support up to 4–5 synchronous replicas, each of which receives and replays ~1.4 GB of log records per second, with up to 4–6% overhead on the primary compared to a standalone server that achieves 620 kTPS without replication.
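As a rough consistency check on the replica count, assume the primary sends a separate copy of the log to each backup over a single 56 Gbps link and ignore protocol overhead (both are simplifying assumptions, not figures from the paper). The link can then carry at most about five log streams of ~1.4 GB/s each:

```latex
\frac{56\ \text{Gb/s}}{8\ \text{b/B}} = 7\ \text{GB/s},
\qquad
\frac{7\ \text{GB/s}}{1.4\ \text{GB/s per replica}} = 5\ \text{replicas}
```

Framing and protocol overhead push the usable bandwidth below 7 GB/s, which is consistent with the reported range of 4–5 synchronous replicas.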



Recommendations

Fresh apps: an empirical study of frequently-updated mobile apps in the Google play store

Mobile app stores provide a unique platform for developers to rapidly deploy new updates of their apps. We studied the frequency of updates of 10,713 mobile apps (the top free 400 apps at the start of 2014 in each of the 30 categories in the Google Play ...
Cache-oblivious dynamic dictionaries with update/query tradeoffs
SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms

Several existing cache-oblivious dynamic dictionaries achieve O(log_B N) (or slightly better O(log_B N/M)) memory transfers per operation, where N is the number of items stored, M is the memory size, and B is the block size, which matches the classic B-...
Update or wait: How to keep your data fresh
IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications
In this work we study how to manage the freshness of status updates sent from a source to a remote monitor via a network server. A proper metric of data freshness at the monitor is the age-of-information, which is defined as how old the freshest update is ...


Published in
Proceedings of the VLDB Endowment, Volume 11, Issue 4, December 2017 (133 pages). ISSN: 2150-8097
Editors: Jian Pei (Simon Fraser University), Sihem Amer-Yahia (University of Grenoble Alpes, CNRS)
Publisher: VLDB Endowment

Article Metrics
- Total citations: 20
- Total downloads: 146
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 2