Editorial Notes
Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper. Details of the review process are found here
ABSTRACT
This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort.
Supplemental Material
Available for Download
Rights information
Experiments
- SC - Systems Compiler. http://data.epfl.ch/sc.Google Scholar
- Y. Ahmad and C. Koch. DBToaster: A SQL compiler for high-performance delta processing in main-memory databases. PVLDB, 2(2):1566--1569, 2009. Google ScholarDigital Library
- A. V. Aho, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques. Addison wesley, 1986. Google ScholarDigital Library
- A. W. Appel. SSA is functional programming. SIGPLAN notices, 33(4):17--20, 1998. Google ScholarDigital Library
- A. W. Appel. Compiling with continuations. Cambridge University Press, 2006. Google ScholarDigital Library
- M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational Data Processing in Spark. SIGMOD '15, pages 1383--1394, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
- K. Asai, H. Masuhara, and A. Yonezawa. Partial evaluation of call-by-value lambda-calculus with side-effects. PEPM '97, pages 12--21. ACM, 1997. Google ScholarDigital Library
- S. Bauman, C. F. Bolz, R. Hirschfeld, V. Kirilichev, T. Pape, J. G. Siek, and S. Tobin-Hochstadt. Pycket: A Tracing JIT for a Functional Language. ICFP 2015, pages 22--34. ACM, 2015. Google ScholarDigital Library
- C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. In ICDT '90, volume 470, pages 72--88. 1990. Google ScholarDigital Library
- C. Binnig, S. Hildenbrand, and F. Farber. Dictionary-based order-preserving string compression for main memory column stores. In SIGMOD '09, pages 283--296. ACM, 2009. Google ScholarDigital Library
- V. Breazu-Tannen, P. Buneman, and L. Wong. Naturally embedded query languages. Springer, 1992.Google ScholarCross Ref
- V. Breazu-Tannen and R. Subrahmanyam. Logical and computational aspects of programming with sets/bags/lists. Springer, 1991.Google ScholarCross Ref
- J. Carette, O. Kiselyov, and C.-C. Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. Journal of Functional Programming, 19(05):509--543, 2009. Google ScholarDigital Library
- M. M. Chakravarty, G. Keller, and P. Zadarnowski. A functional perspective on SSA optimisation algorithms. Electronic Notes in Theoretical Computer Science, 82(2):347--361, 2004.Google ScholarCross Ref
- D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. Gray, W. F. King III, B. G. Lindsay, R. A. Lorie, J. W. Mehl, T. G. Price, G. R. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of system R. CACM, 24(10):632--646, 1981. Google ScholarDigital Library
- J. Cieslewicz and K. A. Ross. Adaptive aggregation on chip multiprocessors. VLDB '07, pages 339--350. ACM, 2007. Google ScholarDigital Library
- C. Click and K. D. Cooper. Combining analyses, combining optimizations. TOPLAS, 17(2):181--196, Mar. 1995. Google ScholarDigital Library
- D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion. from lists to streams to nothing at all. In ICFP '07, 2007. Google ScholarDigital Library
- A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Çetintemel, and S. B. Zdonik. Tupleware:" big" data, big analytics, small clusters. In CIDR, 2015.Google Scholar
- R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451--490, 1991. Google ScholarDigital Library
- A. Darte. On the complexity of loop fusion. Parallel Computing, 26(9):1175 -- 1193, 2000. Google ScholarDigital Library
- L. Fegaras and D. Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Syst., 25(4):457--516, Dec. 2000. Google ScholarDigital Library
- M. Felleisen. On the expressive power of programming languages. In ESOP'90, pages 134--151. Springer, 1990. Google ScholarDigital Library
- J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. TOPLAS, 9(3):319--349, July 1987. Google ScholarDigital Library
- C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In ACM Sigplan Notices, volume 28, pages 237--247. ACM, 1993. Google ScholarDigital Library
- F. Franchetti, Y. Voronenko, and M. Püschel. Formal loop merging for signal transforms. PLDI '05, pages 315--326. Google ScholarDigital Library
- A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. FPCA, pages 223--232. ACM, 1993. Google ScholarDigital Library
- A. J. Gill. Cheap deforestation for non-strict functional languages. PhD thesis, University of Glasgow, 1996.Google Scholar
- A. Goldberg and R. Paige. Stream processing. LFP '84, pages 53--62, New York, NY, USA, 1984. ACM. Google ScholarDigital Library
- G. Graefe. Query evaluation techniques for large databases. CSUR, 25(2):73--169, June 1993. Google ScholarDigital Library
- G. Graefe. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6(1):120--135, 1994. Google ScholarDigital Library
- J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov., 1(1):29--53, Jan. 1997. Google ScholarDigital Library
- T. Grust, M. Mayr, J. Rittinger, and T. Schreiber. FERRY: database-supported program execution. SIGMOD 2009, pages 1063--1066. ACM. Google ScholarDigital Library
- T. Grust, J. Rittinger, and T. Schreiber. Avalanche-safe LINQ compilation. PVLDB, 3(1--2):162--172, Sept. 2010. Google ScholarDigital Library
- T. Grust and M. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12(2--3):191--218, 1999. Google ScholarDigital Library
- P. Hudak. Building domain-specific embedded languages. ACM Comput. Surv., 28(4es), Dec. 1996. Google ScholarDigital Library
- S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, M. Kersten, et al. Monetdb: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.Google Scholar
- N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice Hall, 1993. Google ScholarDigital Library
- S. Jones. Compiling haskell by program transformation: A report from the trenches. In H. Nielson, editor, Programming Languages and Systems - ESOP '96, volume 1058 of Lecture Notes in Computer Science, pages 18--44. Springer Berlin Heidelberg, 1996. Google ScholarCross Ref
- M. Jonnalagedda and S. Stucki. Fold-based fusion as a library: A generative programming pearl. In Proceedings of the 6th ACM SIGPLAN Symposium on Scala, pages 41--50. ACM, 2015. Google ScholarDigital Library
- V. Jovanovic, A. Shaikhha, S. Stucki, V. Nikolaev, C. Koch, and M. Odersky. Yin-Yang: Concealing the deep embedding of DSLs. GPCE 2014, pages 73--82. ACM, 2014. Google ScholarDigital Library
- J. Kam and J. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7(3):305--317, 1977. Google ScholarDigital Library
- J. B. Kam and J. D. Ullman. Global data flow analysis and iterative algorithms. J. ACM, 23(1):158--171, Jan. 1976. Google ScholarDigital Library
- M. Karpathiotakis, I. Alagiannis, T. Heinis, M. Branco, and A. Ailamaki. Just-in-time data virtualization: Lightweight data management with vida. In CIDR, 2015.Google Scholar
- R. A. Kelsey. A correspondence between continuation passing style and static single assignment form. In ACM SIGPLAN Notices, volume 30, pages 13--22. ACM, 1995. Google ScholarDigital Library
- A. Kennedy. Compiling with continuations, continued. In ACM SIGPLAN Notices, volume 42, pages 177--190, 2007. Google ScholarDigital Library
- K. Kennedy. A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979.Google Scholar
- K. Kennedy and K. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing, pages 301--320. Springer Berlin Heidelberg, 1994. Google ScholarDigital Library
- W. Kim. On Optimizing an SQL-like Nested Query. ACM Trans. Database Syst., 7(3):443--469, Sept. 1982. Google ScholarDigital Library
- Y. Klonatos, C. Koch, T. Rompf, and H. Chafi. Building efficient query engines in a high-level language. PVLDB, 7(10):853--864, 2014. Google ScholarDigital Library
- Y. Klonatos, A. Nötzli, A. Spielmann, C. Koch, and V. Kuncak. Automatic Synthesis of Out-of-core Algorithms. In ACM SIGMOD, pages 133--144, 2013. Google ScholarDigital Library
- C. Koch. Incremental query evaluation in a ring of databases. PODS 2010, pages 87--98. ACM, 2010. Google ScholarDigital Library
- C. Koch. Abstraction without regret in data management systems. In CIDR, 2013.Google Scholar
- C. Koch. Abstraction without regret in database systems building: a manifesto. IEEE Data Eng. Bull., 37(1):70--79, 2014.Google Scholar
- C. Koch, Y. Ahmad, O. Kennedy, M. Nikolic, A. Nötzli, D. Lupei, and A. Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDBJ, 23(2):253--278, 2014. Google ScholarDigital Library
- K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google ScholarCross Ref
- V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. SIGMOD '14, pages 743--754, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
- D. Leivant. Reasoning about functional programs and complexity classes associated with type disciplines. In FOCS, pages 460--469, Nov 1983. Google ScholarDigital Library
- L. Libkin. Expressive Power of SQL. Theor. Comput. Sci., 296(3):379--404, Mar. 2003. Google ScholarDigital Library
- M. Mehta and D. J. DeWitt. Managing intra-operator parallelism in parallel database systems. In VLDB, volume 95, pages 382--394, 1995. Google ScholarDigital Library
- E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling Object, Relations and XML in the .NET Framework. SIGMOD '06, pages 706--706. ACM, 2006. Google ScholarDigital Library
- F. Nagel, G. Bierman, and S. D. Viglas. Code generation for efficient query processing in managed runtimes. PVLDB, 7(12):1095--1106. Google ScholarDigital Library
- S. Najd, S. Lindley, J. Svenningsson, and P. Wadler. Everything old is new again: Quoted domain-specific languages. PEPM 2016, pages 25--36, New York, NY, USA, 2016. ACM. Google ScholarDigital Library
- T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539--550, 2011. Google ScholarDigital Library
- V. Pankratius, F. Schmidt, and G. Garreton. Combining functional and imperative programming for multicore software: An empirical study evaluating Scala and Java. In ICSE 2012, pages 123--133. Google ScholarDigital Library
- B. C. Pierce. Types and programming languages. MIT press, 2002. Google ScholarDigital Library
- M. Puschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, et al. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005.Google ScholarCross Ref
- R. Ramakrishnan and J. Gehrke. Database Management Systems. Osborne/McGraw-Hill, 2nd edition, 2000. Google ScholarDigital Library
- G. Ramalingam. The undecidability of aliasing. TOPLAS, 16(5):1467--1471, Sept. 1994. Google ScholarDigital Library
- T. Rompf and M. Odersky. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. CACM, 55(6):121--130, June 2012. Google ScholarDigital Library
- B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global Value Numbers and Redundant Computations. POPL '88, pages 12--27. ACM, 1988. Google ScholarDigital Library
- O. Shivers and M. Might. Continuations and transducer composition. PLDI '06, pages 295--307. ACM, 2006. Google ScholarDigital Library
- D. G. Spampinato and M. Püschel. A basic linear algebra compiler. CGO '14, pages 23:23--23:32. ACM, 2014. Google ScholarDigital Library
- J. Stanier and D. Watson. Intermediate representations in imperative compilers: A survey. CSUR, 45(3):26:1--26:27, July 2013. Google ScholarDigital Library
- M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-store: A Column-oriented DBMS. VLDB '05, pages 553--564. VLDB Endowment, 2005. Google ScholarDigital Library
- J. Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. ICFP '02, pages 124--132. ACM, 2002. Google ScholarDigital Library
- W. Taha and T. Sheard. Multi-stage programming with explicit annotations. PEPM '97, pages 203--217, NY, USA, 1997. ACM. Google ScholarDigital Library
- R. Tate, M. Stepp, Z. Tatlock, and S. Lerner. Equality saturation: A new approach to optimization. POPL '09, pages 264--276. ACM. Google ScholarDigital Library
- L. Torczon and K. Cooper. Engineering A Compiler. Morgan Kaufmann Publishers Inc., 2nd edition, 2011. Google ScholarDigital Library
- S.-A.-A. Touati and D. Barthou. On the decidability of phase orde- ring problem in optimizing compilation. In Proceedings of the 3rd Conference on Computing Frontiers, CF '06, pages 147--156, 2006. Google ScholarDigital Library
- Transaction Processing Performance Council. TPC-H, a decision support benchmark. http://www.tpc.org/tpch.Google Scholar
- P. Trinder. Comprehensions, a Query Notation for DBPLs. In Proc. of the 3rd DBPL workshop, DBPL3, pages 55--68, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc. Google ScholarDigital Library
- D. Turner. Total functional programming. Journal of Universal Computer Science, 10(7):751--768, 2004.Google Scholar
- S. Viglas, G. M. Bierman, and F. Nagel. Processing Declarative Queries Through Generating Imperative Code in Managed Runtimes. IEEE Data Eng. Bull., 37(1):12--21, 2014.Google Scholar
- E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. ICFP '98, pages 13--26, 1998. Google ScholarDigital Library
- P. Wadler. Deforestation: Transforming programs to eliminate trees. In ESOP'88, pages 344--358. Springer, 1988. Google ScholarDigital Library
- P. Wadler. Comprehending monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP '90, pages 61--78, New York, NY, USA, 1990. ACM. Google ScholarDigital Library
- T. Würthinger, A. Wöß, L. Stadler, G. Duboscq, D. Simon, and C. Wimmer. Self-optimizing AST Interpreters. DLS '12, pages 73--82, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. NSDI'12. USENIX Association. Google ScholarDigital Library
Index Terms
- How to Architect a Query Compiler
Recommendations
How to Architect a Query Compiler, Revisited
SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataTo leverage modern hardware platforms to their fullest, more and more database systems embrace compilation of query plans to native code. In the research community, there is an ongoing debate about the best way to architect such query compilers. This is ...
Building Efficient Query Engines in a High-Level Language
Best of SIGMOD 2016 Papers and Regular PapersAbstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased ...
Functional pearl: a SQL to C compiler in 500 lines of code
ICFP '15We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code -- a convincing case study that high-level functional programming can handily beat C for ...
Comments