skip to main content
10.1145/1851476.1851593acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Twister: a runtime for iterative MapReduce

Published:21 June 2010Publication History

ABSTRACT

MapReduce programming model has simplified the implementation of many data parallel applications. The simplicity of the programming model and the quality of services provided by many implementations of MapReduce attract a lot of enthusiasm among distributed computing communities. From the years of experience in applying MapReduce to various scientific applications we identified a set of extensions to the programming model and improvements to its architecture that will expand the applicability of MapReduce to more classes of applications. In this paper, we present the programming model and the architecture of Twister an enhanced MapReduce runtime that supports iterative MapReduce computations efficiently. We also show performance comparisons of Twister with other similar runtimes such as Hadoop and DryadLINQ for large scale data parallel applications.

References

  1. }}J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, pp. 107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. }}M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: distributed data-parallel programs from sequential building blocks," presented at the Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, Lisbon, Portugal, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. }}J. Ekanayake, A. Balkir, T. Gunarathne, G. Fox, C. Poulain, N. Araujo, and R. Barga, "DryadLINQ for Scientific Analyses," presented at the 5th IEEE International Conference on e-Science, Oxford UK, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. }}J. Ekanayake, S. Pallickara, and G. Fox, "MapReduce for Data Intensive Scientific Analyses," presented at the Proceedings of the 2008 Fourth IEEE International Conference on eScience, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. }}J. Ekanayake, X. Qiu, T. Gunarathne, S. Beason, and G. Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," in Cloud Computing and Software Services: Theory and Techniques, ed: CRC Press (Taylor and Francis).Google ScholarGoogle Scholar
  6. }}G. Fox, S.-H. Bae, J. Ekanayake, X. Qiu, and H. Yuan, "Parallel Data Mining from Multicore to Cloudy Grids," presented at the International Advanced Research Workshop on High Performance Computing and Grids (HPC2008), Cetraro, Italy, 2008.Google ScholarGoogle Scholar
  7. }}MPI (Message Passing Interface). Available: http://www-unix.mcs.anl.gov/mpi/Google ScholarGoogle Scholar
  8. }}PVM (Parallel Virtual Machine). Available: http://www.csm.ornl.gov/pvm/Google ScholarGoogle Scholar
  9. }}C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun, "Map-Reduce for Machine Learning on Multicore," in NIPS, ed: MIT Press, 2006, pp. 281--288.Google ScholarGoogle Scholar
  10. }}Apache Hadoop. Available: http://hadoop.apache.org/Google ScholarGoogle Scholar
  11. }}Y. Gu and R. L. Grossman, "Sector and Sphere: the design and implementation of a high-performance data cloud," Philosophical transactions. Series A, Mathematical, physical, and engineering sciences, vol. 367, pp. 2429--2445, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  12. }}Twister: A Runtime for Iterative MapReduce. Available: http://www.iterativemapreduce.org/Google ScholarGoogle Scholar
  13. }}Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and C. J., "DryadLINQ: A System for GeneralPurpose Distributed Data-Parallel Computing Using a HighLevel Language," in Symposium on Operating System Design and Implementation (OSDI), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. }}Disco project. Available: http://discoproject.org/Google ScholarGoogle Scholar
  15. }}S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," SIGOPS Oper. Syst. Rev., vol. 37, pp. 29--43, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. }}J. B. MacQueen, "Some Methods for Classification and Analysis of MultiVariate Observations," in Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability. vol. 1, L. M. L. Cam and J. Neyman, Eds., ed: University of California Press, 1967.Google ScholarGoogle Scholar
  17. }}K. Rose, E. Gurewwitz, and G. Fox, "A deterministic annealing approach to clustering," Pattern Recogn. Lett., vol. 11, pp. 589--594, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. }}S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Available: http://infolab.stanford.edu/~backrub/google.html Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. }}J. de Leeuw, "Applications of convex analysis to multidimensional scaling," Recent Developments in Statistics, pp. 133--145, 1977.Google ScholarGoogle Scholar
  20. }}S. Pallickara and G. Fox, "NaradaBrokering: A Distributed Middleware Framework and Architecture for Enabling Durable Peer-to-Peer Grids," presented at the Middleware 2003, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. }}ActiveMQ. Available: http://activemq.apache.org/Google ScholarGoogle Scholar
  22. }}C. Moretti, H. Bui, K. Hollingsworth, B. Rich, P. Flynn, and D. Thain, "All-Pairs: An Abstraction for Data Intensive Computing on Campus Grids," in IEEE Transactions on Parallel and Distributed Systems, 2010, pp. 33--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. }}O. Gotoh, "An improved algorithm for matching biological sequences," Journal of Molecular Biology vol. 162, pp. 705--708, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  24. }}Source Code. Smith Waterman Software. Available: http://jaligner.sourceforge.net/Google ScholarGoogle Scholar
  25. }}J. Qiu, J. Ekanayake, T. Gunarathne, J. Y. Choi, S.-H. Bae, Y. Ruan, S. Ekanayake, S. Wu, S. Beason, G. Fox, M. Rho, and H. Tang, "Data Intensive Computing for Bioinformatics," in Data Intensive Distributed Computing, ed: IGI Publishers, 2010.Google ScholarGoogle Scholar
  26. }}A. F. A. Smit, R. Hubley, and P. Green. (2004, Repeatmasker. Available: http://www.repeatmasker.orgGoogle ScholarGoogle Scholar
  27. }}J. Jurka, "Repbase Update: a database and an electronic journal of repetitive elements," Trends in Genetics, vol. 6, pp. 418--420, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  28. }}J. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis," Psychometrika, vol. 29, pp. 1--27, 1964.Google ScholarGoogle ScholarCross RefCross Ref
  29. }}Y. Takane, Young, F. W., & de Leeuw, J., "Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features," Psychometrika, vol. 42, pp. 7--67, 1977.Google ScholarGoogle ScholarCross RefCross Ref
  30. }}I. Borg, & Groenen, P. J., Modern Multidimensional Scaling: Theory and Applications: Springer, 2005.Google ScholarGoogle Scholar
  31. }}Y. Zhu, S. Ye, and X. Li, "Distributed PageRank computation based on iterative aggregation-disaggregation methods," presented at the Proceedings of the 14th ACM international conference on Information and knowledge management, Bremen, Germany, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. }}S. Kamvar, T. Haveliwala, C. Manning, and G. Golub, "Exploiting the Block Structure of the Web for Computing PageRank," Stanford InfoLab, Technical Report 2003.Google ScholarGoogle Scholar
  33. }}The Power Method. Available: http://en.wikipedia.org/wiki/Pagerank#Power_MethodGoogle ScholarGoogle Scholar
  34. }}(2009, The ClueWeb09 Dataset. Available: http://boston.lti.cs.cmu.edu/Data/clueweb09/Google ScholarGoogle Scholar
  35. }}C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating MapReduce for multi-core and multiprocessor systems," in 13th International Symposium on High-Performance Computer Architecture, 2007, pp. 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. }}C. Team. (2009, Condor DAGMan. Available: http://www.cs.wisc.edu/condor/dagman/.Google ScholarGoogle Scholar
  37. }}LINQ Language-Integrated Query. Available: http://msdn.microsoft.com/en-us/netframework/aa904594.aspxGoogle ScholarGoogle Scholar
  38. }}Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. v. Laszewski, V. Nefedova, I. Raicu, T. Stef-Praun, and M. Wilde, "Swift: Fast, Reliable, Loosely Coupled Parallel Computation," in IEEE Congress on Services, 2007, pp. 199--206.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Twister: a runtime for iterative MapReduce

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
            June 2010
            911 pages
            ISBN:9781605589428
            DOI:10.1145/1851476

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 21 June 2010

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate166of966submissions,17%

            Upcoming Conference

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader