research-article

Couchbase analytics: NoETL for scalable NoSQL data analysis

Published: 01 August 2019

Abstract

Couchbase Server is a highly scalable document-oriented database management system. With a shared-nothing architecture, it exposes a fast key-value store with a managed cache for sub-millisecond data operations, indexing for fast queries, and a powerful query engine for executing declarative SQL-like queries. Its Query Service debuted several years ago and supports high volumes of low-latency queries and updates for JSON documents. Its recently introduced Analytics Service complements the Query Service. Couchbase Analytics, the focus of this paper, supports complex analytical queries (e.g., ad hoc joins and aggregations) over large collections of JSON documents. This paper describes the Analytics Service from the outside in, including its user model, its SQL++-based query language, and its MPP-based storage and query processing architecture. It also briefly touches on the relationship of Couchbase Analytics to Apache AsterixDB, the open source Big Data management system at the core of Couchbase Analytics.
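
The abstract mentions MPP-based query processing for ad hoc aggregations over JSON documents. As a rough illustration of the general shared-nothing technique (two-phase local/global aggregation with a hash repartitioning step), and not of Couchbase Analytics' actual implementation, the Python sketch below spreads JSON documents across a hypothetical number of partitions, computes partial aggregates locally, routes them by group key, and merges them. The sample documents, field names, and all function names are hypothetical.

```python
import json
from collections import defaultdict
from zlib import crc32

# Hypothetical sample documents; "id", "country", and "amount" are illustrative fields.
DOCS = [
    '{"id": "c1", "country": "US", "amount": 10}',
    '{"id": "c2", "country": "DE", "amount": 5}',
    '{"id": "c3", "country": "US", "amount": 20}',
    '{"id": "c4", "country": "FR", "amount": 7}',
    '{"id": "c5", "country": "DE", "amount": 15}',
    '{"id": "c6", "country": "US"}',  # semi-structured: "amount" is missing
]


def partition_of(key: str, n: int) -> int:
    """Map a key to one of n partitions using a stable hash (crc32)."""
    return crc32(key.encode("utf-8")) % n


def load(docs, n):
    """Parse JSON documents and hash-partition them by primary key across n nodes."""
    parts = [[] for _ in range(n)]
    for raw in docs:
        doc = json.loads(raw)
        parts[partition_of(doc["id"], n)].append(doc)
    return parts


def local_aggregate(part, group_field, value_field):
    """Phase 1: each partition computes partial (sum, count) per group over its own data."""
    partial = defaultdict(lambda: [0, 0])
    for doc in part:
        group, value = doc.get(group_field), doc.get(value_field)
        if group is None or value is None:  # skip documents lacking either field
            continue
        partial[group][0] += value
        partial[group][1] += 1
    return partial


def exchange(partials, n):
    """Repartition partial results by group key so each group reaches exactly one partition."""
    inboxes = [[] for _ in range(n)]
    for partial in partials:
        for group, (s, c) in partial.items():
            inboxes[partition_of(str(group), n)].append((group, s, c))
    return inboxes


def final_aggregate(inbox):
    """Phase 2: merge the partial (sum, count) pairs a partition received."""
    merged = defaultdict(lambda: [0, 0])
    for group, s, c in inbox:
        merged[group][0] += s
        merged[group][1] += c
    return {g: {"total": s, "count": c, "avg": s / c} for g, (s, c) in merged.items()}


def global_aggregate(docs, group_field, value_field, n=3):
    """Run the full load -> local aggregate -> exchange -> final aggregate pipeline."""
    partials = [local_aggregate(p, group_field, value_field) for p in load(docs, n)]
    result = {}
    for inbox in exchange(partials, n):
        result.update(final_aggregate(inbox))  # groups are disjoint across partitions
    return result
```

For example, `global_aggregate(DOCS, "country", "amount")` computes the total, count, and average amount per country. The partitions exchange (sum, count) pairs rather than averages because partial averages cannot be merged correctly, while sums and counts can; the result therefore does not depend on how many partitions are used.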

Published in

Proceedings of the VLDB Endowment, Volume 12, Issue 12 (August 2019), 547 pages
Publisher: VLDB Endowment
