skip to main content
research-article

MyRocks: LSM-tree database storage engine serving Facebook's social graph

Published:01 August 2020Publication History
Skip Abstract Section

Abstract

Facebook uses MySQL to manage tens of petabytes of data in its main database named the User Database (UDB). UDB serves social activities such as likes, comments, and shares. In the past, Facebook used InnoDB, a B+Tree based storage engine as the backend. The challenge was to find an index structure using less space and write amplification [1]. LSM-tree [2] has the potential to greatly improve these two bottlenecks. RocksDB, an LSM tree-based key/value store was already widely used in variety of applications but had a very low-level key-value interface. To overcome these limitations, MyRocks, a new MySQL storage engine, was built on top of RocksDB by adding relational capabilities. With MyRocks, using the RocksDB API, significant efficiency gains were achieved while still benefiting from all the MySQL features and tools. The transition was mostly transparent to client applications.

Facebook completed the UDB migration from InnoDB to MyRocks in 2017. Since then, ongoing improvements in production operations, and additional enhancements to MySQL, MyRocks, and RocksDB, provided even greater efficiency wins. MyRocks also reduced the instance size by 62.3% for UDB data sets and performed fewer I/O operations than InnoDB. Finally, MyRocks consumed less CPU time for serving the same production traffic workload. These gains enabled us to reduce the number of database servers in UDB to less than half, saving significant resources. In this paper, we describe our journey to build and run an OLTP LSM-tree SQL database at scale. We also discuss the features we implemented to keep pace with UDB workloads, what made migrations easier, and what operational and software development challenges we faced during the two years of running MyRocks in production.

Among the new features we introduced in RocksDB were transactional support, bulk loading, and prefix bloom filters, all are available for the benefit of all RocksDB users.

References

  1. M. Athanassoulis, M. S. Kester, L. M. Maas, R. I. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT) Conference, 2016Google ScholarGoogle Scholar
  2. Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Inf. 33, 4 (June 1996), 351--385. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Venkateshwaran Venkataramani, Zach Amsden, Nathan Bronson, George Cabrera III, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Jeremy Hoon, Sachin Kulkarni, Nathan Lawrence, Mark Marchukov, Dmitri Petrov, and Lovro Puzar. 2012. TAO: how facebook serves the social graph. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). Association for Computing Machinery, New York, NY, USA, 791--792. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Facebook's MySQL extensions. https://github.com/facebook/mysql-5.6Google ScholarGoogle Scholar
  5. Data centers year in review. Facebook Engineering. https://engineering.fb.com/data-center-engineering/data-centers-2018/.Google ScholarGoogle Scholar
  6. Sharma, Y., Ajoux, P., Ang, P., Callies, D., Choudhary, A., Demailly, L., Fersch, T., Guz, L.A., Kotulski, A., Kulkarni, S. and Kumar, S., 2015. Wormhole: Reliable pub-sub to support geo-replicated internet services. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15) (pp. 351--366). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Flashcache https://www.facebook.com/notes/mysql-at-facebook/releasing-flashcache/388112370932/Google ScholarGoogle Scholar
  8. MySQL Glossary for Covering Index https://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_covering_indexGoogle ScholarGoogle Scholar
  9. RocksDB. https://github.com/facebook/rocksdbGoogle ScholarGoogle Scholar
  10. Amy Tai, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, and Asaf Cidon. 2019. Who's afraid of uncorrectable bit errors? online recovery of flash errors with distributed redundancy. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '19). USENIX Association, USA, 977--991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Guoqiang Jerry Chen, Janet L. Wiener, Shridhar Iyer, Anshul Jaiswal, Ran Lei, Nikhil Simha, Wei Wang, Kevin Wilfong, Tim Williamson, and Serhat Yilmaz. 2016. Realtime Data Processing at Facebook. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). Association for Computing Machinery, New York, NY, USA, 1087--1098. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Arun Sharma. Dragon: A distributed graph query engine. https://engineering.fb.com/data-infrastructure/dragon-a-distributed-graph-query-engine/Google ScholarGoogle Scholar
  13. Ghemawat, S. and Dean, J., 2011. LevelDB. https://github.com/google/leveldbGoogle ScholarGoogle Scholar
  14. S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strumm. Optimizing space amplification in RocksDB. In CIDR, volume 3, page 3, 2017.Google ScholarGoogle Scholar
  15. Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: a database benchmark based on the Facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). Association for Computing Machinery, New York, NY, USA, 1185--1196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Tasha Frankie, Gordon Hughes, and Ken Kreutz-Delgado. 2012. A mathematical model of the trim command in NAND-flash SSDs. In Proceedings of the 50th Annual Southeast Regional Conference (ACM-SE '12). Association for Computing Machinery, New York, NY, USA, 59--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. MySQL InnoDB Undo Logs https://dev.mysql.com/doc/refman/5.6/en/innodb-undo-logs.htmlGoogle ScholarGoogle Scholar
  18. George, Lars. HBase: the definitive guide: random access to your planet-size data. "O'Reilly Media, Inc.", 2011.Google ScholarGoogle Scholar
  19. Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2014. Analysis of HDFS under HBase: a facebook messages case study. In Proceedings of the 12th USENIX conference on File and Storage Technologies (FAST'14). USENIX Association, USA, 199--212. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Xiang Li, Thomas Georgiou. Migrating Messenger storage to optimize performance https://engineering.fb.com/core-data/migrating-messenger-storage-to-optimize-performance/Google ScholarGoogle Scholar
  21. Evans, J. 2006, A Scalable Concurrent malloc(3) Implementation for FreeBSDGoogle ScholarGoogle Scholar
  22. Stonebraker, M. 1981. Operating System Support for Database Management. Communications of the ACM 24(7): 412--418 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2008. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26, 2, Article 4 (June 2008), 26 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Lakshman, A. and Malik, P., 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2), pp.35--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Bacon, D.F., Bales, N., Bruno, N., Cooper, B.F., Dickinson, A., Fikes, A., Fraser, C., Gubarev, A., Joshi, M., Kogan, E. and Lloyd, A., 2017, May. Spanner: Becoming a SQL system. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 331--343). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Taft, R., Sharif, I., Matei, A., VanBenschoten, N., Lewis, J., Grieger, T., Niemi, K., Woods, A., Birzin, A., Poss, R. and Bardea, P., 2020, June. CockroachDB: The Resilient Geo-Distributed SQL Database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (pp. 1493--1509). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Yugabyte, Inc. The Leading High-Performance Distributed SQL Database. https://www.yugabyte.com/. Accessed: 2020-02-09.Google ScholarGoogle Scholar
  28. PingCAP. Tackling MySQL Scalability with TiDB: the most actively developed open source NewSQL database on GitHub. https://pingcap.com/. Accessed: 2020-02-09.Google ScholarGoogle Scholar
  29. Verbitski, A., Gupta, A., Saha, D., Brahmadesam, M., Gupta, K., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvili, T. and Bao, X., 2017, May. Amazon aurora: Design considerations for high throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1041--1052). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. I. Tokutek, "TokuDB: MySQL performance, MariaDB performance," http://www.tokutek.com/products/tokudb-for-mysql/, 2013.Google ScholarGoogle Scholar
  31. Feifei Li. Cloud-Native Database Systems at Alibaba: Opportunities and Challenges. PVLDB, 12(12): 2263 -- 2272, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Elmore, A.J., Das, S., Agrawal, D. and El Abbadi, A., 2011, June. Zephyr: live migration in shared nothing databases for elastic cloud platforms. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data (pp. 301--312). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Netflix Technology Blog. Netflix Billing Migration to AWS --- Part III. https://netflixtechblog.com/netflix-billing-migration-to-aws-part-iii-7d94ab9d1f59Google ScholarGoogle Scholar
  34. Migrating from AWS RDS MySQL to AWS Aurora Serverless MySQL Database. https://www.adelatech.com/migrating-from-aws-rds-mysql-to-aws-aurora-serverless-mysql-database/Google ScholarGoogle Scholar
  35. Dayan, N., Athanassoulis, M. and Idreos, S., 2017, May. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 79--94). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Zhang, Y., Li, Y., Guo, F., Li, C. and Xu, Y., 2018. ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efficient Read for LSM-tree-based {KV} Stores. In 10th {USENIX} Workshop on Hot Topics in Storage and File Systems (HotStorage 18). Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 323--336. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 13, Issue 12
    August 2020
    1710 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 August 2020
    Published in pvldb Volume 13, Issue 12

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader