Abstract
This paper introduces Crescando: a scalable, distributed relational table implementation designed to perform large numbers of queries and updates with guaranteed access latency and data freshness. To this end, Crescando leverages a number of modern query processing techniques and hardware trends. Specifically, Crescando is based on parallel, collaborative scans in main memory and so-called "query-data" joins known from data-stream processing. While the proposed approach is not always optimal for a given workload, it provides latency and freshness guarantees for all workloads. Thus, Crescando is particularly attractive if the workload is unknown, changing, or involves many different queries. This paper describes the design, algorithms, and implementation of a Crescando storage node, and assesses its performance on modern multi-core hardware.
- A. Ailamaki et. al. Weaving relations for cache performance. In Proc. VLDB '01, 2001. Google ScholarDigital Library
- P. M. G. Apers et. al. Prisma/db: A parallel, main memory relational dbms. IEEE TKDE, 4(6), 1992. Google ScholarDigital Library
- H. Berenson et. al. A critique of ansi sql isolation levels. In Proc. SIGMOD '95, 1995. Google ScholarDigital Library
- P. A. Boncz et. al. Database architecture optimized for the new bottleneck: Memory access. In Proc. VLDB '99, 1999. Google ScholarDigital Library
- P. A. Boncz et. al. Monetdb/x100: Hyper-pipelining query execution. In Proc. CIDR '05, 2005.Google Scholar
- S. Chandrasekaran and M. J. Franklin. Streaming queries over streaming data. In Proc. VLDB '02, 2002. Google ScholarDigital Library
- D. J. Dewitt et. al. The gamma database machine project. IEEE TKDE, 2(1), 1990. Google ScholarDigital Library
- D. J. DeWitt et. al. An evaluation of non-equijoin algorithms. In Proc. VLDB '91, 1991. Google ScholarDigital Library
- F. Fabret et. al. Filtering algorithms and implementation for very fast publish/subscribe systems. In Proc. SIGMOD '01, 2001. Google ScholarDigital Library
- P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985. Google ScholarDigital Library
- A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. SIGMOD '84, 1984. Google ScholarDigital Library
- S. Harizopoulos et. al. Qpipe: A simultaneously pipelined relational query engine. In Proc. SIGMOD '05, 2005. Google ScholarDigital Library
- A. Kleen. A numa api for linux. Novell Technical Whitepaper, 2005. http://www.novell.com/-resourcecenter/ext_item.jsp?itemId=14444.Google Scholar
- C. Lang et. al. Increasing buffer-locality for multiple relational table scans through grouping and throttling. Proc. ICDE '07, 2007.Google Scholar
- D. B. Lomet. Key range locking strategies for improved concurrency. In Proc. VLDB '93, 1993. Google ScholarDigital Library
- C. Mohan et. al. Aries: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM TODS, 17:94--162, 1992. Google ScholarDigital Library
- L. Qiao et. al. Main-memory scan sharing for multi-core cpus. Proc. VLDB '08, 1(1), 2008. Google ScholarDigital Library
- V. Raman et. al. Constant-time query processing. In Proc. ICDE '08, 2008. Google ScholarDigital Library
- M. Ronström and L. Thalmann. Mysql cluster architecture overview: High availability features of mysql cluster. MySQL Technical Whitepaper, 2004. http://www.techworld.com/whitepapers/index.cfm?-whitepaperid=5663.Google Scholar
- K. A. Ross. Conjunctive selection conditions in main memory. In Proc. PODS '02, 2002. Google ScholarDigital Library
- K. A. Ross. Selection conditions in main memory. ACM TODS, 29(1), 2004. Google ScholarDigital Library
- T. K. Sellis. Multiple-query optimization. ACM TODS, 13(1):23--52, 1988. Google ScholarDigital Library
- K.-Y. Whang et. al. A linear-time probabilistic counting algorithm for database applications. ACM TODS, 15(2):208--229, 1990. Google ScholarDigital Library
- W. A. Wulf and S. A. McKee. Hitting the memory wall: implications of the obvious. ACM SIGARCH Comput. Archit. News, 23(1):20--24, 1995. Google ScholarDigital Library
- M. Zukowski et. al. Cooperative scans: Dynamic bandwidth sharing in a dbms. In Proc. VLDB '07, 2007. Google ScholarDigital Library
Index Terms
- Predictable performance for unpredictable workloads
Recommendations
Predictable performance and high query concurrency for data analytics
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plans--unaware of each ...
Materialized view selection for XQuery workloads
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of DataThe efficient processing of XQuery still poses significant challenges. A particularly effective technique to improve XQuery processing performance consists of using materialized views to answer queries. In this work, we consider the problem of choosing ...
Performance prediction for concurrent database workloads
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataCurrent trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance ...
Comments