skip to main content
10.1145/1807128.1807148acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Nephele/PACTs: a programming model and execution framework for web-scale analytical processing

Authors Info & Claims
Published:10 June 2010Publication History

ABSTRACT

We present a parallel data processor centered around a programming model of so called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization and fault tolerance. Our definition of PACTs allows to apply several types of optimizations on the data flow during the transformation.

The system as a whole is designed to be as generic as (and compatible to) map/reduce systems, while overcoming several of their major weaknesses: 1) The functions map and reduce alone are not sufficient to express many data processing tasks both naturally and efficiently. 2) Map/reduce ties a program to a single fixed execution strategy, which is robust but highly suboptimal for many tasks. 3) Map/reduce makes no assumptions about the behavior of the functions. Hence, it offers only very limited optimization opportunities. With a set of examples and experiments, we illustrate how our system is able to naturally represent and efficiently execute several tasks that do not fit the map/reduce model well.

References

  1. Hadoop. URL: http://hadoop.apache.org.Google ScholarGoogle Scholar
  2. TPC-H. URL: http://www.tpc.org/tpch/.Google ScholarGoogle Scholar
  3. K. Beyer, V. Ercegovac, J. Rao, and E. Shekita. Jaql: A JSON Query Language. URL: http://jaql.org.Google ScholarGoogle Scholar
  4. R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. PVLDB, 1(2):1265--1276, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Delmerico, N. Byrnes, A. Bruno, M. Jones, S. Gallo, and V. Chaudhary. Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations. In International Conference on High Performance Computing, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  7. D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In W. W. Chu, G. Gardarin, S. Ohsuga, and Y. Kambayashi, editors, VLDB, pages 228--237. Morgan Kaufmann, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. Friedman, P. M. Pawlowski, and J. Cieslewicz. SQL/MapReduce: A Practical Approach to Self-describing, Polymorphic, and Parallelizable User-defined Functions. PVLDB, 2(2):1402--1413, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Fushimi, M. Kitsuregawa, and H. Tanaka. An Overview of The System Software of A Parallel Relational Database Machine GRACE. In W. W. Chu, G. Gardarin, S. Ohsuga, and Y. Kambayashi, editors, VLDB, pages 209--219. Morgan Kaufmann, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In P. Ferreira, T. R. Gross, and L. Veiga, editors, EuroSys, pages 59--72. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Olston, B. Reed, A. Silberstein, and U. Srivastava. Automatic Optimization of Parallel Dataflow Programs. In R. Isaacs and Y. Zhou, editors, USENIX Annual Technical Conference, pages 267--273. USENIX Association, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. In J. T.-L. Wang, editor, SIGMOD Conference, pages 1099--1110. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A Comparison of Approaches to Large-Scale Data Analysis. In U. Cetintemel, S. B. Zdonik, D. Kossmann, and N. Tatbul, editors, SIGMOD Conference, pages 165--178. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. A. Schneider and D. J. DeWitt. A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. In J. Clifford, B. G. Lindsay, and D. Maier, editors, SIGMOD Conference, pages 110--121. ACM Press, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In P. A. Bernstein, editor, SIGMOD Conference, pages 23--34. ACM, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. W. Stamos and H. C. Young. A Symmetric Fragment and Replicate Algorithm for Distributed Joins. IEEE Trans. Parallel Distrib. Syst., 4(12):1345--1354, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A Warehousing Solution Over a Map-Reduce Framework. PVLDB, 2(2):1626--1629, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Warneke and O. Kao. Nephele: Efficient Parallel Data Processing in the Cloud. In I. Raicu, I. T. Foster, and Y. Zhao, editors, SC-MTAGS. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Yang, C. Yen, C. Tan, and S. Madden. Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database. In ICDE, 2009.Google ScholarGoogle Scholar
  20. H. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. In C. Y. Chan, B. C. Ooi, and A. Zhou, editors, SIGMOD Conference, pages 1029--1040. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In R. Draves and R. van Renesse, editors, OSDI, pages 1--14. USENIX Association, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Nephele/PACTs: a programming model and execution framework for web-scale analytical processing

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing
      June 2010
      264 pages
      ISBN:9781450300360
      DOI:10.1145/1807128

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 June 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate169of722submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader