skip to main content
10.1145/2723372.2742798acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components

Published:27 May 2015Publication History

ABSTRACT

Although flash storage has largely replaced hard disks in consumer class devices, enterprise workloads pose unique challenges that have slowed adoption of flash in ``performance tier'' storage appliances. In this paper, we describe Purity, the foundation of Pure Storage's Flash Arrays, the first all-flash enterprise storage system to support compression, deduplication, and high-availability.

Purity borrows techniques from modern database and key-value storage architectures, and introduces novel storage primitives that have wide applicability to data management systems. For instance, all writes in Purity are monotonic, and deletions are handled using an atomic predicate-based tuple elision primitive.

Purity's redundancy mechanisms are optimized for SSD failure modes and performance characteristics, allowing for fast recovery from component failures and lower space overhead than the best hard disk systems. We built deduplication and data compression schemes atop these primitives.

Flash changes storage capacity/performance tradeoffs: unlike disk-based systems, flash deployments are rarely performance bound. A single Purity appliance can provide over 7GiB/s of throughput on 32KiB random I/Os, even through multiple device failures, and while providing asynchronous off-site replication. Typical installations have 99.9% latencies under 1ms, and production arrays average 5.4x data reduction and 99.999% availability.

Purity takes advantage of storage performance increasing more rapidly than computational performance to build a simpler (with respect to engineering, installation, and management) scale-up storage appliance that supports hundreds of terabytes of highly-available, high-performance storage. The resulting performance and capacity supports many customer deployments of multiple applications, including scale-out and parallel systems, such as MongoDB and Oracle RAC, on a single Purity appliance.

References

  1. D. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In Proc. SIGMOD Conf., pages 671--682, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Agrawal, D. Ganesan, R. Sitaraman, Y. Diao, and S. Singh. Lazy-adaptive tree: An optimized index structure for flash devices. In Proc. 35th VLDB Conf., Aug. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Alvaro, N. Conway, J. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In Proc. CIDR, pages 249--260, 2011.Google ScholarGoogle Scholar
  4. T. J. Ameloot and J. Van den Bussche. Positive Dedalus programs tolerate non-causality. Journal of Computer and System Sciences, 80(7):1191--1213, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  5. A. Anand, C. Muthukrishnan, S. Kappes, A. Akella, and S. Nath. Cheap and large CAMs for high performance data-intensive networked systems. In Proc. 7th NSDI Symp., 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: A fast array of wimpy nodes. In Proc. 22nd SOSP Conf., pages 1--14, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In Proc. 2012 SIGMETRICS, pages 53--64, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Balakrishnan, D. Malkhi, V. Prabhakaran, T. Wobber, M. Wei, and J. D. Davis. CORFU: A shared log design for flash clusters. In Proc. 9th NSDI Symp., Apr. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Balakrishnan, D. Malkhi, T. Wobber, M. Wu, V. Prabhakaran, M. Wei, J. D. Davis, S. Rao, T. Zou, and A. Zuck. Tango: Distributed data structures over a shared log. In Proc. 24th SOSP Conf., pages 325--340, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-oblivious streaming B-trees. In Proc. 19th Symp. on Parallel Algorithms and Architectures, pages 81--92, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E. A. Brewer. Lessons from giant-scale services. IEEE Internet Computing, 5(4):46--55, Aug. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Burrows, C. Jerian, B. Lampson, and T. Mann. On-line data compression in a log-structured file system. In Proc. 5th ASPLOS, pages 2--9, Oct. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In Proc. 7th OSDI Symp., Nov. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. F. Chen, T. Luo, and X. Zhang. CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In Proc. 9th FAST Conf., Feb. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. In Proc. 34th VLDB Conf., Aug. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proc. 1st ACM Symp. on Cloud Computing, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Cully, J. Wires, D. Meyer, K. Jamieson, K. Fraser, T. Deegan, D. Stodden, G. Lefebvre, D. Ferstay, and A. Warfield. Strata: High-performance scalable storage on virtualized non-volatile memory. In Proc. 12th FAST Conf., Feb. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Dean. Designs, lessons and advice from building large distributed systems. Keynote from LADIS, 2009.Google ScholarGoogle Scholar
  19. J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74--80, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Debnath, M. F. Mokbel, D. J. Lilja, and D. Du. Deferred updates for flash-based storage. In Proc. 26th IEEE Conf. on Mass Storage Systems and Technologies, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. Debnath, S. Sengupta, and J. Li. ChunkStash: Speeding up inline storage deduplication using flash memory. In Proc. 2010 USENIX Annual Technical Conference, June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Debnath, S. Sengupta, and J. Li. FlashStore: High throughput persistent key-value store. In Proc. 36th VLDB Conf., Sept. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. B. Debnath, S. Sengupta, and J. Li. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proc. SIGMOD Conf., 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Gray and G. Graefe. The five-minute rule ten years later, and other computer storage rules of thumb. ACM Sigmod Record, 26(4):63--68, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. In Proc. SIGMOD Conf., pages 173--182, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. N. Gray, R. A. Lorie, G. R. Putzolu, and I. L. Traiger. Granularity of locks and degrees of consistency in a shared data base. In Proc. IFIP Working Conf. on Modelling in Data Base Management Systems, pages 365--394, 1976.Google ScholarGoogle Scholar
  27. K. M. Greenan, D. D. Long, E. L. Miller, T. J. E. Schwarz, S.J., and A. Wildani. Building flexible, fault-tolerant flash-based storage systems. In Proc. 5th Workshop on Hot Topics in System Dependability, June 2009.Google ScholarGoogle Scholar
  28. L. M. Grupp, J. D. Davis, and S. Swanson. The bleak future of NAND flash memory. In Proc. 10th FAST Conf., Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. M. Grupp, J. D. Davis, and S. Swanson. The harey tortoise: Managing heterogeneous write performance in SSDs. In Proc. 2013 USENIX Annual Technical Conference, June 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. A. Gupta, R. Pisolkar, B. Urgaonkar, and A. Sivasubramaniam. Leveraging value locality in optimizing NAND flash-based SSDs. In Proc. 9th FAST Conf., Feb. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Hamilton. Why scale matters and why the cloud is different. AWS re:Invent, 2013.Google ScholarGoogle Scholar
  32. J. Hamilton. AWS innovation at scale. AWS re:Invent 2014.Google ScholarGoogle Scholar
  33. A. L. Holloway, V. Raman, G. Swart, and D. J. DeWitt. How to barter bits for chronons: compression and bandwidth trade offs for database scans. In Proc. SIGMOD Conf., pages 389--400, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. R. Iyer and D. Wilhite. Data compression support in databases. In Proc. 20th VLDB Conf., pages 695--704, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. C. Jermaine, E. Omiecinski, and W. G. Yee. The partitioned exponential file for database storage management. Proc. VLDB Endowment, 16:417--437, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. K. Jin and E. L. Miller. The effectiveness of deduplication on virtual machine disk images. In Proc. SYSTOR 2009, May 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. W. K. Josephson, L. A. Bongo, K. Li, and D. Flynn. DFS: A file system for virtualized flash storage. ACM Trans. on Storage, 6(3), Sept. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Y. Kang and E. L. Miller. Adding aggressive error correction to a high-performance compressing flash file system. In Proc. 9th ACM & IEEE Conf. on Embedded Software (EMSOFT '09), Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. LevelDB: A fast and lightweight key/value database library by Google. https://code.google.com/p/leveldb/.Google ScholarGoogle Scholar
  40. Y. Li, B. He, Q. Luo, and K. Yi. Tree indexing on flash disks. In Proc. 25th Int'l Conf. on Data Engineering, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. H. Lim, B. Fan, D. G. Andersen, and M. Kaminsky. SILT: a memory-efficient, high-performance key-value store. In Proc. 23rd SOSP Conf., Oct. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. D. T. Meyer and W. J. Bolosky. A study of practical deduplication. In Proc. 9th FAST Conf., Feb. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. C. Min, K. Kim, H. Cho, S.-W. Lee, and Y. I. Eom. SFS: Random write considered harmful in solid state drives. In Proc. 10th FAST Conf., Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33:351--385, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. S. Plank, K. M. Greenan, and E. L. Miller. Screaming fast Galois field arithmetic using Intel SIMD instructions. In Proc. 11th FAST Conf., Feb. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Pure Storage reference architecture for Oracle databases, 2014.Google ScholarGoogle Scholar
  47. RocksDB: A fork of LevelDB by Facebook. https://github.com/facebook/rocksdb/.Google ScholarGoogle Scholar
  48. M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Trans. on Computer Systems, 10(1):26--52, Feb. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. S. M. Rumble, A. Kejriwal, and J. Ousterhout. Log-structured memory for DRAM-based storage. In Proc. 12th FAST Conf., Feb. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. R. Sears, M. Callaghan, and E. Brewer. Rose: Compressed, log-structured replication. In Proc. 34th VLDB Conf., pages 526--537, Aug. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. R. Sears and R. Ramakrishnan. bLSM: A general purpose log structured merge tree. In Proc. SIGMOD Conf., pages 217--228, May 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. D. J. Sheehy and D. Smith. Bitcask. a log-structured hash table for fast key value data. Technical report, Technical report, Basho Technologies, 04 2010, 2010.Google ScholarGoogle Scholar
  53. V. Sikka, F. Färber, W. Lehner, S. K. Cha, T. Peh, and C. Bornhövd. Efficient transaction processing in SAP HANA database: the end of a column store myth. In Proc. SIGMOD Conf., pages 731--742, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. K. Srinivasan, T. Bisson, G. Goodson, and K. Voruganti. iDedup: Latency-aware, inline data deduplication for primary storage. In Proc. 10th FAST Conf., Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. R. Stoica, M. Athanassoulis, R. Johnson, and A. Ailamaki. Evaluating and repairing write performance on flash devices. In DaMoN, pages 9--14, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-store: a column-oriented DBMS. In Proc. 31st VLDB Conf., pages 553--564, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Y. Wang, M. Kapritsos, Z. Ren, P. Mahajan, J. Kirubanandam, L. Alvisi, and M. Dahlin. Robustness in the Salus scalable block store. In Proc. 10th NSDI Symp., pages 357--370, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. H. Yadava. The Berkeley DB Book. Apress, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
      May 2015
      2110 pages
      ISBN:9781450327589
      DOI:10.1145/2723372

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 May 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader