ABSTRACT
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.) are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide good tradeoff between system throughput and minimizing view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly improve the cost of evaluating complex queries.
- CouchDB. http://couchdb.apache.org/.Google Scholar
- D. Agrawal, A. E. Abbadi, A. K. Singh, and T. Yurek. Efficient view maintenance at data warehouses. In SIGMOD, 1997. Google ScholarDigital Library
- J. A. Blakeley, P.-A. Larson, and F. W. Tompa. Efficiently updating materialized views. In SIGMOD, 1986. Google ScholarDigital Library
- M. Cafarella et al. Data management projects at Google. SIGMOD Record, 34--38(1), March 2008. Google ScholarDigital Library
- S. Ceri and J. Widom. Deriving production rules for incremental view maintenance. In VLDB, 1991. Google ScholarDigital Library
- F. Chang et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006. Google ScholarDigital Library
- S. Chen, B. Liu, and E. A. Rundensteiner. Multiversion-based view maintenance over distributed data sources. ACM Transactions on Database Systems, 29:675--709, 2004. Google ScholarDigital Library
- S. Chen and E. A. Rundensteiner. Gpivot: Efficient incremental maintenance of complex rolap views. In ICDE, 2005. Google ScholarDigital Library
- L. S. Colby et al. Algorithms for deferred view maintenance. In SIGMOD, 1996. Google ScholarDigital Library
- B. F. Cooper et al. PNUTS: Yahoo!'s hosted data serving platform. In VLDB, 2008. Google ScholarDigital Library
- G. DeCandia et al. Dynamo: Amazon's highly available key-value store. In SOSP, 2007. Google ScholarDigital Library
- G. Graefe. B-tree indexes for high update rates. SIGMOD Record, 35(1):39--44, March 2006. Google ScholarDigital Library
- A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In SIGMOD, 1993. Google ScholarDigital Library
- H. He, J. Xie, J. Yang, and H. Yu. Asymmetric batch incremental view maintenance. In ICDE, 2005. Google ScholarDigital Library
- A. Lakshman, P. Malik, and K. Ranganathan. Cassandra: A structured storage system on a P2P network. In SIGMOD, 2008. Google ScholarDigital Library
- G. Luo, J. F. Naughton, C. J. Ellmann, and M. Watzke. A comparison of three methods for join view maintenance in parallel RDBMS. In ICDE, 2003.Google ScholarCross Ref
- C. Mohan and I. Narang. Algorithms for creating indexes for very large tables without quiescing updates. In SIGMOD, 1992. Google ScholarDigital Library
- D. Quass, A. Gupta, I. S. Mumick, and J. Widom. Making views self-maintainable for data warehousing. In Conf. on Parallel and Distributed Information Systems, 1996. Google ScholarDigital Library
- D. Quass and J. Widom. On-line warehouse view maintenance. In SIGMOD, 1997. Google ScholarDigital Library
- K. Salem, K. Beyer, B. Lindsay, and R. Cochrane. How to roll a join: Asynchronous incremental view maintenance. In SIGMOD, 2000. Google ScholarDigital Library
- J. Zhou, P.-A. Larson, and H. G. Elmongui. Lazy maintenance of materialized views. In VLDB, 2007. Google ScholarDigital Library
- Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom. View maintenance in a warehousing environment. In SIGMOD, 1995. Google ScholarDigital Library
- Y. Zhuge, H. Garcia-Molina, and J. L. Wiener. The strobe algorithms for multi-source warehouse consistency. In Proc. Conf. on Parallel and Distributed Information Systems, 1996. Google ScholarDigital Library
Index Terms
- Asynchronous view maintenance for VLSD databases
Recommendations
SQL Query Optimization in Distributed NoSQL Databases for Cloud-Based Applications
Algorithmic Aspects of Cloud ComputingAbstractA method for query optimization is presented by utilizing Spark SQL, a module of Apache Spark that integrates relational data processing. The goal of this paper is to explore NoSQL databases and their effective usage in conjunction with ...
Describing and deriving certain answers over partial databases
Although there has been much work in recent years on answering queries using views, there has been less work on deriving answers from partial databases. That is given a partial database state DV, materialized via the view V, what queries can be asked ...
Query Optimization in NoSQL Databases Using an Enhanced Localized R-tree Index
Information Integration and Web IntelligenceAbstractQuery optimization is a crucial process across data mining and big data analytics. As the size of the data in the modern applications is increasing due to various sources, types and multi-modal records across databases, there is an urge to ...
Comments