skip to main content
research-article

Consolidation of queries with user-defined functions

Published:09 June 2014Publication History
Skip Abstract Section

Abstract

Motivated by streaming and data analytics scenarios where many queries operate on the same data and perform similar computations, we propose program consolidation for merging multiple user-defined functions (UDFs) that operate on the same input. Program consolidation exploits common computations between UDFs to generate an equivalent optimized function whose execution cost is often much smaller (and never greater) than the sum of the costs of executing each function individually. We present a sound consolidation calculus and an effective algorithm for consolidating multiple UDFs. Our approach is purely static and uses symbolic SMT-based techniques to identify shared or redundant computations. We have implemented the proposed technique on top of the Naiad data processing system. Our experiments show that our algorithm dramatically improves overall job completion time when executing user-defined filters that operate on the same data and perform similar computations.

References

  1. Linq project,. URL http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx.Google ScholarGoogle Scholar
  2. Msdn linq documentation,. URL http://msdn.microsoft.com/en-us/library/bb669073(v=vs.110).aspx.Google ScholarGoogle Scholar
  3. Storm project. URL http://storm-project.net.Google ScholarGoogle Scholar
  4. Wordcount example. URL https://github.com/MicrosoftResearchSVC/Naiad/blob/release_0.2/Examples/.Google ScholarGoogle Scholar
  5. R.-J. Back. A calculus of refinements for program derivations. Acta Inf., 25(6):593--624, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. N. Bruno, S. Agarwal, S. Kandula, B. Shi, M. Wu, and J. Zhou. Recurring job optimization in scope. SIGMOD, pages 805--806, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. ACM Trans. Database Syst., 24(2):177--228, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: From lists to streams to nothing at all. ICFP, pages 315--326, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. De Moura and N. Bjørner. Z3: An efficient smt solver. In TACAS, pages 337--340. Springer, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Finkelstein. Common expression analysis in database applications. SIGMOD '82, pages 235--245, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Giannikis, G. Alonso, and D. Kossmann. SharedDB: killing one thousand queries with one stone. VLDB, 5(6):526--537, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. FPCA '93, pages 223--232, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. K. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: automatic management of data and computation in datacenters. OSDI'10, pages 1--8, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Haghighat and C. D. Polychronopoulos. Symbolic analysis: A basis for parallelization, optimization, and scheduling of programs. In LCPC, pages 567--585, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. E. C. R. Hehner. Formalization of time and space. Formal Asp. Comput., 10(3):290--306, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  17. F. Hueske, M. Peters, M. J. Sax, A. Rheinländer, R. Bergmann, A. Krettek, and K. Tzoumas. Opening the black boxes in data flow optimization. Proc. VLDB Endow., 5(11):1256--1267, July 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. N. D. Jones, C. K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993. ISBN 0-13-020249-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  20. E. Meijer, M. M. Fokkinga, and R. Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In FPCA, pages 124--144, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Morgan. Programming from Specifications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1990. ISBN 0-13-726225-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. K. Munagala, U. Srivastava, and J. Widom. Optimization of continuous queries with shared expensive filters. PODS'07, pages 215--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. G. Murray, M. Isard, and Y. Yu. Steno: automatic optimization of declarative queries. PLDI '11, pages 121--131, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, pages 439--455, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. ICDMW '10, pages 170--177, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Park and A. Segev. Using common subexpressions to optimize multiple queries. In ICDE, pages 311--319, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. G. Press. Scaling users, not data: Sisense new take on machine learning and crowdsourcing. URL http://aka.ms/Fzo7sh.Google ScholarGoogle Scholar
  29. W. Pugh. The Omega Test: a fast and practical integer programming algorithm for dependence analysis. CACM, 8:4--13, 1992.Google ScholarGoogle Scholar
  30. T. K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13(1):23--52, Mar. 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. N. Silva, P. Larson, and J. Zhou. Exploiting common subexpressions for cloud query processing. ICDE, pages 1337--1348, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. VLDB, 2(2):1626--1629, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. D. Viglas. Just-in-time compilation for sql query processing. Proc. VLDB Endow., 6(11):1190--1191, Aug. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. Gunda, and J. Currey. Dryadlinq: a system for general-purpose distributed data-parallel computing using a high-level language. OSDI'08, pages 1--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Zhang, H. Zhou, R. Chen, X. Fan, Z. Guo, H. Lin, J. Y. Li, W. Lin, J. Zhou, and L. Zhou. Optimizing data shuffling in data-parallel computation by understanding user-defined functions. NSDI'12, pages 22--22. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. Zhou, P.-A. Larson, J.-C. Freytag, and W. Lehner. Efficient exploitation of similar subexpressions for query processing. SIGMOD '07, pages 533--544, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Consolidation of queries with user-defined functions

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 49, Issue 6
            PLDI '14
            June 2014
            598 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2666356
            • Editor:
            • Andy Gill
            Issue’s Table of Contents
            • cover image ACM Conferences
              PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
              June 2014
              619 pages
              ISBN:9781450327848
              DOI:10.1145/2594291

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 9 June 2014

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader