Abstract
Couchbase Server is a highly scalable document-oriented database management system. With a shared-nothing architecture, it exposes a fast key-value store with a managed cache for sub-millisecond data operations, indexing for fast queries, and a powerful query engine for executing declarative SQL-like queries. Its Query Service debuted several years ago and supports high volumes of low-latency queries and updates for JSON documents. Its recently introduced Analytics Service complements the Query Service. Couchbase Analytics, the focus of this paper, supports complex analytical queries (e.g., ad hoc joins and aggregations) over large collections of JSON documents. This paper describes the Analytics Service from the outside in, including its user model, its SQL++-based query language, and its MPP-based storage and query processing architecture. It also briefly touches on the relationship of Couchbase Analytics to Apache AsterixDB, the open-source Big Data management system at the core of Couchbase Analytics.
Index Terms
- Couchbase analytics: NoETL for scalable NoSQL data analysis