skip to main content
research-article

Accordion: better memory organization for LSM key-value stores

Published:01 August 2018Publication History
Skip Abstract Section

Abstract

Log-structured merge (LSM) stores have emerged as the technology of choice for building scalable write-intensive key-value storage systems. An LSM store replaces random I/O with sequential I/O by accumulating large batches of writes in a memory store prior to flushing them to log-structured disk storage; the latter is continuously re-organized in the background through a compaction process for efficiency of reads. Though inherent to the LSM design, frequent compactions are a major pain point because they slow down data store operations, primarily writes, and also increase disk wear. Another performance bottleneck in today's state-of-the-art LSM stores, in particular ones that use managed languages like Java, is the fragmented memory layout of their dynamic memory store.

In this paper we show that these pain points may be mitigated via better organization of the memory store. We present Accordion - an algorithm that addresses these problems by re-applying the LSM design principles to memory management. Accordion is implemented in the production code of Apache HBase, where it was extensively evaluated. We demonstrate Accordion's double-digit performance gains versus the baseline HBase implementation and discuss some unexpected lessons learned in the process.

References

  1. Apache HBase. http://hbase.apache.org.Google ScholarGoogle Scholar
  2. Apache hbase at airbnb. https://www.slideshare.net/HBaseCon/apache-hbase-at-airbnb.Google ScholarGoogle Scholar
  3. Avoiding full gcs with memstore-local allocation buffers. http://blog.cloudera.com/blog/2011/02/.Google ScholarGoogle Scholar
  4. Cassandra. http://cassandra.apache.org.Google ScholarGoogle Scholar
  5. Dragon: A distributed graph query engine. https://code.facebook.com/posts/1737605303120405.Google ScholarGoogle Scholar
  6. Hbase compaction tuning tips. https://community.hortonworks.com/articles/52616/hbase-compaction-tuning-tips.html.Google ScholarGoogle Scholar
  7. Hbase operations in a flurry. http://bit.ly/2yaTfoP.Google ScholarGoogle Scholar
  8. Java concurrent skiplist map. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentSkipListSet.html.Google ScholarGoogle Scholar
  9. LevelDB. https://github.com/google/leveldb.Google ScholarGoogle Scholar
  10. Mongodb. https://mongorocks.org.Google ScholarGoogle Scholar
  11. Mysql. http://myrocks.io.Google ScholarGoogle Scholar
  12. Off-heap memtables in cassandra 2.1. https://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1.Google ScholarGoogle Scholar
  13. Offheap read-path in production the alibaba story. https://blog.cloudera.com/blog/2017/03/.Google ScholarGoogle Scholar
  14. RocksDB. http://rocksdb.org/.Google ScholarGoogle Scholar
  15. Rocksdb tuning guide.Google ScholarGoogle Scholar
  16. Scylladb. https://github.com/scylladb/scylla.Google ScholarGoogle Scholar
  17. Sstable compaction and compaction strategies. http://bit.ly/2wXc7ah.Google ScholarGoogle Scholar
  18. Universal compaction.Google ScholarGoogle Scholar
  19. Zen: Pinterest's graph storage service. http://bit.ly/2ft4YDx.Google ScholarGoogle Scholar
  20. O. Balmau, R. Guerraoui, V. Trigonakis, and I. Zablotchi. Flodb: Unlocking memory in persistent key-value stores. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys '17, pages 80--94, New York, NY, USA, 2017. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. L. Bentley and J. B. Saxe. Decomposable searching problems i. static-to-dynamic transformation. Journal of Algorithms, 1(4):301--358, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  22. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143--154, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. Devineni, D. Koutra, M. Faloutsos, and C. Faloutsos. If walls could talk: Patterns and anomalies in facebook wallposts. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, ASONAM '15, pages 367--374, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Golan-Gueta, E. Bortnikov, E. Hillel, and I. Keidar. Scaling concurrent log-structured data stores. In Proceedings of the Tenth European Conference on Computer Systems, EuroSys '15, pages 32:1--32:14, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly generating billion-record synthetic databases. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD '94, pages 243--252, New York, NY, USA, 1994. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X.-Y. Hu, E. Eleftheriou, R. Haas, I. Iliadis, and R. Pletka. Write amplification analysis in flash-based solid state drives. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, SYSTOR '09, pages 10:1--10:9, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. L. Lu, T. S. Pillai, H. Gopalakrishnan, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Wisckey: Separating keys from values in ssd-conscious storage. TOS, 13(1):5:1--5:28, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. P. Muth, P. E. O'Neil, A. Pick, and G. Weikum. Design, implementation, and performance of the lham log-structured history data access method. In Proceedings of the 24rd International Conference on Very Large Data Bases, VLDB '98, pages 452--463, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D. Lomet. AlphaSort: A cache-sensitive parallel external sort. The VLDB Journal, 4(4):603--628, Oct. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Inf., 33(4):351--385, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. Sears and R. Ramakrishnan. blsm: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 217--228, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. S. Tanenbaum and H. Bos. Modern Operating Systems. Prentice Hall Press, Upper Saddle River, NJ, USA, 4th edition, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. G. Wu, X. He, and B. Eckart. An adaptive write buffer management scheme for flash-based ssds. Trans. Storage, 8(1):1:1--1:24, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. X. Wu, Y. Xu, Z. Shao, and S. Jiang. Lsm-trie: An lsm-tree-based ultra-large key-value store for small data items. In 2015 USENIX Annual Technical Conference (USENIX ATC 15), pages 71--82, Santa Clara, CA, 2015. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Accordion: better memory organization for LSM key-value stores
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Proceedings of the VLDB Endowment
            Proceedings of the VLDB Endowment  Volume 11, Issue 12
            August 2018
            426 pages
            ISSN:2150-8097
            Issue’s Table of Contents

            Publisher

            VLDB Endowment

            Publication History

            • Published: 1 August 2018
            Published in pvldb Volume 11, Issue 12

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader