research-article

Constant time recovery in Azure SQL database

Published: 01 August 2019

Abstract

Azure SQL Database and the upcoming release of SQL Server introduce a novel database recovery mechanism that combines traditional ARIES recovery with multi-version concurrency control to achieve database recovery in constant time, regardless of the size of user transactions. Additionally, our algorithm enables continuous transaction log truncation, even in the presence of long-running transactions, thereby allowing large data modifications using only a small, constant amount of log space. These capabilities are particularly important for any Cloud database service given a) the constantly increasing database sizes, b) the frequent failures of commodity hardware, c) the strict availability requirements of modern, global applications, and d) the fact that software upgrades and other maintenance tasks are managed by the Cloud platform, introducing unexpected failures for the users. This paper describes the design of our recovery algorithm and demonstrates how it allowed us to improve the availability of Azure SQL Database by guaranteeing consistent recovery times of under 3 minutes for 99.999% of recovery cases in production.

Published in

Proceedings of the VLDB Endowment, Volume 12, Issue 12
August 2019
547 pages

Publisher

VLDB Endowment