skip to main content
10.1145/2882903.2915244acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections

How to Architect a Query Compiler

Authors Info & Claims
Published:26 June 2016Publication History

Editorial Notes

Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper. Details of the review process are found here

ABSTRACT

This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort.

Skip Supplemental Material Section

Supplemental Material

References

  1. SC - Systems Compiler. http://data.epfl.ch/sc.Google ScholarGoogle Scholar
  2. Y. Ahmad and C. Koch. DBToaster: A SQL compiler for high-performance delta processing in main-memory databases. PVLDB, 2(2):1566--1569, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques. Addison wesley, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. W. Appel. SSA is functional programming. SIGPLAN notices, 33(4):17--20, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. W. Appel. Compiling with continuations. Cambridge University Press, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational Data Processing in Spark. SIGMOD '15, pages 1383--1394, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. Asai, H. Masuhara, and A. Yonezawa. Partial evaluation of call-by-value lambda-calculus with side-effects. PEPM '97, pages 12--21. ACM, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Bauman, C. F. Bolz, R. Hirschfeld, V. Kirilichev, T. Pape, J. G. Siek, and S. Tobin-Hochstadt. Pycket: A Tracing JIT for a Functional Language. ICFP 2015, pages 22--34. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. In ICDT '90, volume 470, pages 72--88. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Binnig, S. Hildenbrand, and F. Farber. Dictionary-based order-preserving string compression for main memory column stores. In SIGMOD '09, pages 283--296. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. V. Breazu-Tannen, P. Buneman, and L. Wong. Naturally embedded query languages. Springer, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  12. V. Breazu-Tannen and R. Subrahmanyam. Logical and computational aspects of programming with sets/bags/lists. Springer, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  13. J. Carette, O. Kiselyov, and C.-C. Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. Journal of Functional Programming, 19(05):509--543, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. M. Chakravarty, G. Keller, and P. Zadarnowski. A functional perspective on SSA optimisation algorithms. Electronic Notes in Theoretical Computer Science, 82(2):347--361, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  15. D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. Gray, W. F. King III, B. G. Lindsay, R. A. Lorie, J. W. Mehl, T. G. Price, G. R. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of system R. CACM, 24(10):632--646, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Cieslewicz and K. A. Ross. Adaptive aggregation on chip multiprocessors. VLDB '07, pages 339--350. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Click and K. D. Cooper. Combining analyses, combining optimizations. TOPLAS, 17(2):181--196, Mar. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion. from lists to streams to nothing at all. In ICFP '07, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Çetintemel, and S. B. Zdonik. Tupleware:" big" data, big analytics, small clusters. In CIDR, 2015.Google ScholarGoogle Scholar
  20. R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. TOPLAS, 13(4):451--490, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Darte. On the complexity of loop fusion. Parallel Computing, 26(9):1175 -- 1193, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Fegaras and D. Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Syst., 25(4):457--516, Dec. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Felleisen. On the expressive power of programming languages. In ESOP'90, pages 134--151. Springer, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. TOPLAS, 9(3):319--349, July 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In ACM Sigplan Notices, volume 28, pages 237--247. ACM, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. F. Franchetti, Y. Voronenko, and M. Püschel. Formal loop merging for signal transforms. PLDI '05, pages 315--326. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. FPCA, pages 223--232. ACM, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. J. Gill. Cheap deforestation for non-strict functional languages. PhD thesis, University of Glasgow, 1996.Google ScholarGoogle Scholar
  29. A. Goldberg and R. Paige. Stream processing. LFP '84, pages 53--62, New York, NY, USA, 1984. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. G. Graefe. Query evaluation techniques for large databases. CSUR, 25(2):73--169, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. G. Graefe. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6(1):120--135, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov., 1(1):29--53, Jan. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. T. Grust, M. Mayr, J. Rittinger, and T. Schreiber. FERRY: database-supported program execution. SIGMOD 2009, pages 1063--1066. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. T. Grust, J. Rittinger, and T. Schreiber. Avalanche-safe LINQ compilation. PVLDB, 3(1--2):162--172, Sept. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. T. Grust and M. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12(2--3):191--218, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. P. Hudak. Building domain-specific embedded languages. ACM Comput. Surv., 28(4es), Dec. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, M. Kersten, et al. Monetdb: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.Google ScholarGoogle Scholar
  38. N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice Hall, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. S. Jones. Compiling haskell by program transformation: A report from the trenches. In H. Nielson, editor, Programming Languages and Systems - ESOP '96, volume 1058 of Lecture Notes in Computer Science, pages 18--44. Springer Berlin Heidelberg, 1996. Google ScholarGoogle ScholarCross RefCross Ref
  40. M. Jonnalagedda and S. Stucki. Fold-based fusion as a library: A generative programming pearl. In Proceedings of the 6th ACM SIGPLAN Symposium on Scala, pages 41--50. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. V. Jovanovic, A. Shaikhha, S. Stucki, V. Nikolaev, C. Koch, and M. Odersky. Yin-Yang: Concealing the deep embedding of DSLs. GPCE 2014, pages 73--82. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. J. Kam and J. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7(3):305--317, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. B. Kam and J. D. Ullman. Global data flow analysis and iterative algorithms. J. ACM, 23(1):158--171, Jan. 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Karpathiotakis, I. Alagiannis, T. Heinis, M. Branco, and A. Ailamaki. Just-in-time data virtualization: Lightweight data management with vida. In CIDR, 2015.Google ScholarGoogle Scholar
  45. R. A. Kelsey. A correspondence between continuation passing style and static single assignment form. In ACM SIGPLAN Notices, volume 30, pages 13--22. ACM, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. A. Kennedy. Compiling with continuations, continued. In ACM SIGPLAN Notices, volume 42, pages 177--190, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. K. Kennedy. A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979.Google ScholarGoogle Scholar
  48. K. Kennedy and K. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing, pages 301--320. Springer Berlin Heidelberg, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. W. Kim. On Optimizing an SQL-like Nested Query. ACM Trans. Database Syst., 7(3):443--469, Sept. 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Y. Klonatos, C. Koch, T. Rompf, and H. Chafi. Building efficient query engines in a high-level language. PVLDB, 7(10):853--864, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Y. Klonatos, A. Nötzli, A. Spielmann, C. Koch, and V. Kuncak. Automatic Synthesis of Out-of-core Algorithms. In ACM SIGMOD, pages 133--144, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. C. Koch. Incremental query evaluation in a ring of databases. PODS 2010, pages 87--98. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. C. Koch. Abstraction without regret in data management systems. In CIDR, 2013.Google ScholarGoogle Scholar
  54. C. Koch. Abstraction without regret in database systems building: a manifesto. IEEE Data Eng. Bull., 37(1):70--79, 2014.Google ScholarGoogle Scholar
  55. C. Koch, Y. Ahmad, O. Kennedy, M. Nikolic, A. Nötzli, D. Lupei, and A. Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDBJ, 23(2):253--278, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  57. V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. SIGMOD '14, pages 743--754, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. D. Leivant. Reasoning about functional programs and complexity classes associated with type disciplines. In FOCS, pages 460--469, Nov 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. L. Libkin. Expressive Power of SQL. Theor. Comput. Sci., 296(3):379--404, Mar. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. M. Mehta and D. J. DeWitt. Managing intra-operator parallelism in parallel database systems. In VLDB, volume 95, pages 382--394, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling Object, Relations and XML in the .NET Framework. SIGMOD '06, pages 706--706. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. F. Nagel, G. Bierman, and S. D. Viglas. Code generation for efficient query processing in managed runtimes. PVLDB, 7(12):1095--1106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. S. Najd, S. Lindley, J. Svenningsson, and P. Wadler. Everything old is new again: Quoted domain-specific languages. PEPM 2016, pages 25--36, New York, NY, USA, 2016. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539--550, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. V. Pankratius, F. Schmidt, and G. Garreton. Combining functional and imperative programming for multicore software: An empirical study evaluating Scala and Java. In ICSE 2012, pages 123--133. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. B. C. Pierce. Types and programming languages. MIT press, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. M. Puschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, et al. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232--275, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  68. R. Ramakrishnan and J. Gehrke. Database Management Systems. Osborne/McGraw-Hill, 2nd edition, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. G. Ramalingam. The undecidability of aliasing. TOPLAS, 16(5):1467--1471, Sept. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. T. Rompf and M. Odersky. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. CACM, 55(6):121--130, June 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global Value Numbers and Redundant Computations. POPL '88, pages 12--27. ACM, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. O. Shivers and M. Might. Continuations and transducer composition. PLDI '06, pages 295--307. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. D. G. Spampinato and M. Püschel. A basic linear algebra compiler. CGO '14, pages 23:23--23:32. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. J. Stanier and D. Watson. Intermediate representations in imperative compilers: A survey. CSUR, 45(3):26:1--26:27, July 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-store: A Column-oriented DBMS. VLDB '05, pages 553--564. VLDB Endowment, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. J. Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. ICFP '02, pages 124--132. ACM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. W. Taha and T. Sheard. Multi-stage programming with explicit annotations. PEPM '97, pages 203--217, NY, USA, 1997. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. R. Tate, M. Stepp, Z. Tatlock, and S. Lerner. Equality saturation: A new approach to optimization. POPL '09, pages 264--276. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. L. Torczon and K. Cooper. Engineering A Compiler. Morgan Kaufmann Publishers Inc., 2nd edition, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. S.-A.-A. Touati and D. Barthou. On the decidability of phase orde- ring problem in optimizing compilation. In Proceedings of the 3rd Conference on Computing Frontiers, CF '06, pages 147--156, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Transaction Processing Performance Council. TPC-H, a decision support benchmark. http://www.tpc.org/tpch.Google ScholarGoogle Scholar
  82. P. Trinder. Comprehensions, a Query Notation for DBPLs. In Proc. of the 3rd DBPL workshop, DBPL3, pages 55--68, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. D. Turner. Total functional programming. Journal of Universal Computer Science, 10(7):751--768, 2004.Google ScholarGoogle Scholar
  84. S. Viglas, G. M. Bierman, and F. Nagel. Processing Declarative Queries Through Generating Imperative Code in Managed Runtimes. IEEE Data Eng. Bull., 37(1):12--21, 2014.Google ScholarGoogle Scholar
  85. E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. ICFP '98, pages 13--26, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. P. Wadler. Deforestation: Transforming programs to eliminate trees. In ESOP'88, pages 344--358. Springer, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. P. Wadler. Comprehending monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP '90, pages 61--78, New York, NY, USA, 1990. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. T. Würthinger, A. Wöß, L. Stadler, G. Duboscq, D. Simon, and C. Wimmer. Self-optimizing AST Interpreters. DLS '12, pages 73--82, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. NSDI'12. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. How to Architect a Query Compiler

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
          June 2016
          2300 pages
          ISBN:9781450335317
          DOI:10.1145/2882903

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 June 2016

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader