skip to main content
research-article

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach allows maximum flexibility as the output schema of the UDF is specified by the function itself at query plan-time. This means that a SQL/MR function is polymorphic. It can process arbitrary input because its behavior as well as output schema are dynamically determined by information available at query plan-time, such as the function's input schema and arbitrary user-provided parameters. This also increases reusability as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters.

In this paper we describe the motivation for this new approach to UDFs as well as the implementation within Aster Data Systems' nCluster database. We demonstrate that in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework.

References

  1. Apache Software Foundation. Hadoop, March 2009. http://hadoop.apache.org.Google ScholarGoogle Scholar
  2. Apache Software Foundation. Hive, March 2009. http://hadoop.apache.org/hive/.Google ScholarGoogle Scholar
  3. Aster Data Systems. Aster nCluster database. White paper, 2008. Available online: www.asterdata.com.Google ScholarGoogle Scholar
  4. M. Carey and L. Haas. Extensible database management systems. SIGMOD Rec., 19(4):54--60, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and efficient parallel processing of massive data sets. In VLDB, pages 1265--1276, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. ACM Trans. Database Syst., 24(2):177--228, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In SIGMOD Conference, pages 58--66, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In SIGMOD, pages 423--434, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. M. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. In SIGMOD, pages 267--276, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. IBM. IBM DB2 Universal Database Application Development Guide: Programming Server Applications, 2004. Version 8.2.Google ScholarGoogle Scholar
  13. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Jaedicke and B. Mitschang. On parallel processing of aggregate and scalar functions in object-relational DBMS. In SIGMOD, pages 379--389, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Jaedicke and B. Mitschang. User-defined table operators: Enhancing extensibility for ORDBMS. In VLDB, pages 494--505, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Microsoft Corporation. Table-valued user-defined functions, June 2009. http://msdn.microsoft.com/.Google ScholarGoogle Scholar
  17. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: A not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Oracle. Oracle Database PL/SQL Language Reference, 2008. Version 11g Release 1.Google ScholarGoogle Scholar
  19. M. Stonebraker. Inclusion of new types in relational database systems. In ICDE, pages 262--269, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Stonebraker, J. Anton, and E. Hanson. Extending a database system with procedures. ACM Trans. Database Syst., 12(3):350--376, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Stonebraker and G. Kemnitz. The POSTGRES next generation database management system. Commun. ACM, 34(10):78--92, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Stonebraker, L. A. Rowe, and M. Hirohama. The implementation of POSTGRES. IEEE Trans. Knowl. Data Eng., 2(1):125--142, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Proceedings of the VLDB Endowment
              Proceedings of the VLDB Endowment  Volume 2, Issue 2
              August 2009
              367 pages

              Publisher

              VLDB Endowment

              Publication History

              • Published: 1 August 2009
              Published in pvldb Volume 2, Issue 2

              Qualifiers

              • research-article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader