ABSTRACT
Since the development of MapReduce, there have been several efforts to adapt data mining and machine learning algorithms to it. Many of these algorithms are iterative by nature. To process them efficiently, Spark and research prototypes such as HaLoop, iMapReduce, and Twister have been proposed, each with its own solution for iterative computation. In this paper, we thoroughly examine the pros and cons of each system.
- Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment, 3(1–2):285–296, 2010.
- Y. Zhang, Q. Gao, L. Gao, and C. Wang. iMapReduce: A distributed computing framework for iterative computation. Journal of Grid Computing, 10(1):47–68, 2012.
- J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. H. Bae, J. Qiu, and G. Fox. Twister: A runtime for iterative MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 810–818, 2010.
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, page 10, 2010.
- T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, page 21, 2010.
- E. Elnikety, T. Elsayed, and H. E. Ramadan. iHadoop: Asynchronous iterations for MapReduce. In Proceedings of the 2011 IEEE 3rd International Conference on Cloud Computing Technology and Science, pages 81–90, 2011.
- G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 135–146, 2010.