research-article

Storage management in AsterixDB

Published: 01 June 2014

Abstract

Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, we released the first public version of AsterixDB, an open-source Big Data Management System (BDMS), in June of 2013. This paper describes the storage management layer of AsterixDB, providing a detailed description of its ingestion-oriented approach to local storage and a set of initial measurements of its ingestion-related performance characteristics.

To support high-frequency insertions, AsterixDB has wholly adopted Log-Structured Merge-trees (LSM-trees) as the storage technology for all of its index structures. We describe how the AsterixDB software framework enables "LSM-ification" (conversion from an in-place-update, disk-based data structure to a deferred-update, append-only data structure) of any kind of index structure that supports certain primitive operations, enabling the index to ingest data efficiently. We also describe how AsterixDB ensures the ACID properties for operations involving multiple heterogeneous LSM-based indexes. Lastly, we highlight the challenges of managing system resources when many LSM indexes are used concurrently, and present AsterixDB's initial solution.
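
The deferred-update, append-only idea behind "LSM-ification" can be illustrated with a minimal sketch. This is not AsterixDB's implementation: the class `LSMIndex`, the `memory_budget` threshold, and the use of sorted Python lists as stand-ins for immutable disk components are all assumptions made for illustration. The sketch assumes only that writes go to a mutable in-memory component, that full components are flushed as new immutable sorted runs, that deletes are recorded as tombstones, and that searches consult components from newest to oldest.

```python
import bisect

TOMBSTONE = object()  # marker recording a deferred delete


class LSMIndex:
    """Toy LSM index: one mutable in-memory component plus immutable sorted runs."""

    def __init__(self, memory_budget=4):
        self.memory_budget = memory_budget  # hypothetical flush threshold (entries)
        self.memory = {}                    # mutable in-memory component
        self.disk = []                      # immutable sorted runs, newest first

    def insert(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_budget:
            self.flush()

    def delete(self, key):
        # Deferred update: record a tombstone instead of modifying older runs.
        self.insert(key, TOMBSTONE)

    def flush(self):
        # Append-only: write the in-memory component out as a new sorted run.
        if self.memory:
            run = sorted(self.memory.items(), key=lambda kv: kv[0])
            self.disk.insert(0, run)
            self.memory = {}

    def search(self, key):
        # Check components from newest to oldest; the first hit wins.
        if key in self.memory:
            value = self.memory[key]
            return None if value is TOMBSTONE else value
        for run in self.disk:
            # (key,) sorts just before (key, value), so this finds the key's slot.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def merge(self):
        # Combine all runs into one, keeping the newest entry per key. Tombstones
        # can be dropped because no older run remains for them to mask.
        merged = {}
        for run in reversed(self.disk):  # oldest first, so newer entries overwrite
            merged.update(run)
        live = sorted(
            ((k, v) for k, v in merged.items() if v is not TOMBSTONE),
            key=lambda kv: kv[0],
        )
        self.disk = [live] if live else []
```

In this sketch no existing run is ever modified in place: an update or delete only adds a newer entry that shadows the older one, and space is reclaimed later by `merge`. A real system would additionally need logging, concurrency control, and a merge policy, none of which are modeled here.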



Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 10
June 2014
146 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
