research-article

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

Authors:
Eric Friedman

Aster Data Systems

Aster Data Systems
View Profile

,
Peter Pawlowski

Aster Data Systems

Aster Data Systems
View Profile

,
John Cieslewicz

Aster Data Systems

Aster Data Systems
View Profile

Proceedings of the VLDB Endowment Volume 2 Issue 2pp 1402–1413https://doi.org/10.14778/1687553.1687567

Published:01 August 2009Publication History

Proceedings of the VLDB Endowment

Abstract

A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, present UDFs have numerous limitations, including install-time specification of input and output schema and poor ability to parallelize execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach allows maximum flexibility as the output schema of the UDF is specified by the function itself at query plan-time. This means that a SQL/MR function is polymorphic. It can process arbitrary input because its behavior as well as output schema are dynamically determined by information available at query plan-time, such as the function's input schema and arbitrary user-provided parameters. This also increases reusability as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters.

In this paper we describe the motivation for this new approach to UDFs as well as the implementation within Aster Data Systems' nCluster database. We demonstrate that in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework.

References

Apache Software Foundation. Hadoop, March 2009. http://hadoop.apache.org.Google Scholar
Apache Software Foundation. Hive, March 2009. http://hadoop.apache.org/hive/.Google Scholar
Aster Data Systems. Aster nCluster database. White paper, 2008. Available online: www.asterdata.com.Google Scholar
M. Carey and L. Haas. Extensible database management systems. SIGMOD Rec., 19(4):54--60, 1990. Google ScholarDigital Library
R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and efficient parallel processing of massive data sets. In VLDB, pages 1265--1276, 2008. Google ScholarDigital Library
S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. ACM Trans. Database Syst., 24(2):177--228, 1999. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Commun. ACM, 35(6):85--98, 1992. Google ScholarDigital Library
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In SIGMOD Conference, pages 58--66, 2001. Google ScholarDigital Library
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In SIGMOD, pages 423--434, 1996. Google ScholarDigital Library
J. M. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. In SIGMOD, pages 267--276, 1993. Google ScholarDigital Library
IBM. IBM DB2 Universal Database Application Development Guide: Programming Server Applications, 2004. Version 8.2.Google Scholar
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarDigital Library
M. Jaedicke and B. Mitschang. On parallel processing of aggregate and scalar functions in object-relational DBMS. In SIGMOD, pages 379--389, 1998. Google ScholarDigital Library
M. Jaedicke and B. Mitschang. User-defined table operators: Enhancing extensibility for ORDBMS. In VLDB, pages 494--505, 1999. Google ScholarDigital Library
Microsoft Corporation. Table-valued user-defined functions, June 2009. http://msdn.microsoft.com/.Google Scholar
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: A not-so-foreign language for data processing. In SIGMOD, pages 1099--1110, 2008. Google ScholarDigital Library
Oracle. Oracle Database PL/SQL Language Reference, 2008. Version 11g Release 1.Google Scholar
M. Stonebraker. Inclusion of new types in relational database systems. In ICDE, pages 262--269, 1986. Google ScholarDigital Library
M. Stonebraker, J. Anton, and E. Hanson. Extending a database system with procedures. ACM Trans. Database Syst., 12(3):350--376, 1987. Google ScholarDigital Library
M. Stonebraker and G. Kemnitz. The POSTGRES next generation database management system. Commun. ACM, 34(10):78--92, 1991. Google ScholarDigital Library
M. Stonebraker, L. A. Rowe, and M. Hirohama. The implementation of POSTGRES. IEEE Trans. Knowl. Data Eng., 2(1):125--142, 1990. Google ScholarDigital Library

Index Terms

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 2, Issue 2
August 2009
367 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2009
Published in pvldb Volume 2, Issue 2
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 51
  Total Citations
  View Citations
- 963
  Total Downloads
- Downloads (Last 12 months)15
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Sql: Learn Basics of Queries and Implement Easily (sql programming, SQL 2016, sql database programming, sql for beginners, sql beginners guide, sql ... sql workbook, sql guide, MSSQL) (Volume 1)

SQL: The Complete Reference, Second Edition

Oracle High-Performance SQL Tuning

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Sql: Learn Basics of Queries and Implement Easily (sql programming, SQL 2016, sql database programming, sql for beginners, sql beginners guide, sql ... sql workbook, sql guide, MSSQL) (Volume 1)

SQL: The Complete Reference, Second Edition

Oracle High-Performance SQL Tuning

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media