skip to main content
research-article

PaxosStore: high-availability storage made practical in WeChat

Published:01 August 2017Publication History
Skip Abstract Section

Abstract

In this paper, we present PaxosStore, a high-availability storage system developed to support the comprehensive business of WeChat. It employs a combinational design in the storage layer to engage multiple storage engines constructed for different storage models. PaxosStore is characteristic of extracting the Paxos-based distributed consensus protocol as a middleware that is universally accessible to the underlying multi-model storage engines. This facilitates tuning, maintaining, scaling and extending the storage engines. According to our experience in engineering practice, to achieve a practical consistent read/write protocol is far more complex than its theory. To tackle such engineering complexity, we propose a layered design of the Paxos-based storage protocol stack, where PaxosLog, the key data structure used in the protocol, is devised to bridge the programming-oriented consistent read/write to the storage-oriented Paxos procedure. Additionally, we present optimizations based on Paxos that made fault-tolerance more efficient. Discussion throughout the paper primarily focuses on pragmatic solutions that could be insightful for building practical distributed storage systems.

References

  1. R. Ananthanarayanan, V. Basker, S. Das, A. Gupta, H. Jiang, T. Qiu, et al. Photon: Fault-tolerant and scalable joining of continuous data streams. In Proc. of SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. Baker, C. Bond, J. C. Corbett, J. Furman, A. Khorlin, J. Larson, et al. Megastore: Providing scalable, highly available storage for interactive services. In Proc. of CIDR, 2011.Google ScholarGoogle Scholar
  3. P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Burrows. The chubby lock service for loosely-coupled distributed systems. In Proc. of OSDI, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proc. of PODC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, et al. Spanner: Google's globally-distributed database. In Proc. of OSDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. Hq replication: A hybrid quorum protocol for byzantine fault tolerance. In Proc. of OSDI, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. De Prisco, B. Lampson, and N. Lynch. Revisiting the paxos algorithm. In Proc. of WDAG, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, et al. The sap hana database--an architecture overview. IEEE Data Engineering Bulletin, 35(1):28--33, 2012.Google ScholarGoogle Scholar
  10. M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. JACM, 32(2):374--382, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. L. Glendenning, I. Beschastnikh, A. Krishnamurthy, and T. Anderson. Scalable consistency in scatter. In Proc. of SOSP, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Gray and L. Lamport. Consensus on transaction commit. TODS, 31(1):133--160, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Gupta and J. Shute. High-availability at massive scale: Building google's data infrastructure for ads. In Proc. of BIRTE, 2015.Google ScholarGoogle Scholar
  14. A. Gupta, F. Yang, J. Govig, A. Kirsch, K. Chan, K. Lai, et al. Mesa: Geo-replicated, near real-time, scalable data warehousing. PVLDB, 7(12):1259--1270, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Héman, M. Zukowski, N. J. Nes, L. Sidirourgos, and P. Boncz. Positional update handling in column stores. In Proc. of SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In Proc. of USENIX ATC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Kirsch and Y. Amir. Paxos for system builders: An overview. In Proc. of LADIS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Lamport. Time, clocks, and the ordering of events in a distributed system. CACM, 21(7):558--565, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Lamport. The part-time parliament. TOCS, 16(2):133--169, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Lamport. Fast paxos. Distributed Computing, 19(2):79--103, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Lamport and M. Massa. Cheap paxos. In Proc. of DSN, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. TOPLAS, 4(3):382--401, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. B. Lampson. The abcd's of paxos. In Proc. of PODC, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. W. Lampson. How to build a highly available system using consensus. In Proc. of WDAG, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. I. Moraru, D. G. Andersen, and M. Kaminsky. Paxos quorum leases: Fast reads without sacrificing writes. In Proc. of SoCC, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (lsm-tree). Acta Informatica, 33(4):351--385, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proc. of USENIX ATC, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. R. K. Ports, J. Li, V. Liu, N. K. Sharma, and A. Krishnamurthy. Designing distributed systems using approximate synchrony in data center networks. In Proc. of NSDI, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. D. Prisco, B. Lampson, and N. Lynch. Revisiting the paxos algorithm. TCS, 243(1-2):35--91, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Sears and R. Ramakrishnan. blsm: A general purpose log structured merge tree. In Proc. of SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Sheehy and D. Smith. Bitcask: a log-structured hash table for fast key/value data. White paper, Basho Technologies, 2010.Google ScholarGoogle Scholar
  32. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proc. of SIGCOMM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, et al. C-store: A column-oriented dbms. In Proc. of VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. R. Van Renesse and D. Altinbuken. Paxos made moderately complex. CSUR, 47(3):42:1--42:36, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PaxosStore: high-availability storage made practical in WeChat
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 10, Issue 12
          August 2017
          427 pages
          ISSN:2150-8097
          Issue’s Table of Contents

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 August 2017
          Published in pvldb Volume 10, Issue 12

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader