skip to main content
10.1145/1951365.1951367acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Map-reduce extensions and recursive queries

Published:21 March 2011Publication History

ABSTRACT

We survey the recent wave of extensions to the popular map-reduce systems, including those that have begun to address the implementation of recursive queries using the same computing environment as map-reduce. A central problem is that recursive tasks cannot deliver their output only at the end, which makes recovery from failures much more complicated than in map-reduce and its nonrecursive extensions. We propose several algorithmic ideas for efficient implementation of recursions in the map-reduce environment and discuss several alternatives for supporting recovery from failures without restarting the entire job.

References

  1. F. Afrati, V. Borkar, M. Carey, N. Polyzotis, and J. Ullman. Cluster computing, recursion and datalog. To appear in a book based on the Datalog 2.0 Workshop (March, 2010), Oxford GB, to be published by Springer in 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. F. N. Afrati and C. H. Papadimitriou. The parallel complexity of simple chain queries. In PODS, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. F. N. Afrati and J. D. Ullman. Optimizing joins in a map-reduce environment. In EDBT, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Apache. Hadoop. http://hadoop.apache.org/, 2006.Google ScholarGoogle Scholar
  5. Apache. Hdfs. http://hadoop.apache.org/hdfs/, 2008.Google ScholarGoogle Scholar
  6. Apache. Hive. http://wiki.apache.org/hadoop/Hive, 2008.Google ScholarGoogle Scholar
  7. D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke. Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, pages 119--130, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proceedings of the IEEE International Conference on Data Engineering, to appear, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33(1--6):309--320, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Y. Bu, B. Howe, M. Balazinska, and M. Ernst. Haloop: efficient iterative data processing on large clusters. In VLDB Conference, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. Scope: easy and efficient parallel processing of massive data sets. In VLDB, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Dar and R. Ramakrishnan. A performance study of transitive closure algorithms. In SIGMOD Conference, pages 454--465, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. J. DeWitt, E. Paulson, E. Robinson, J. F. Naughton, J. Royalty, S. Shankar, and A. Krioukov. Clustera: an integrated computation and data management system. PVLDB, 1(1):28--41, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Ganguly, A. Silberschatz, and S. Tsur. A framework for the parallel processing of datalog queries. SIGMOD Rec., 19:143--152, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In 19th ACM Symposium on Operating Systems Principles, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. M. Hellerstein. Datalog redux: experience and conjecture. In PODS, pages 1--2, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. E. Ioannidis. On the computation of the transitive closure of relational operators. In Proceedings of the 12th International Conference on Very Large Data Bases, VLDB '86, pages 403--411, San Francisco, CA, USA, 1986. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Kabler, Y. E. Ioannidis, and M. J. Carey. Performance evaluation of algorithms for transitive closure. Inf. Syst., 17(5):415--441, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Lam and et al. Bdd-based deductive database. bddbddb.sourceforge.net, 2008.Google ScholarGoogle Scholar
  23. G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD Conference, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD Conference, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S.-W. Seong, M. Nasielski, J. Seo, D. Sengupta, S. Hangal, S. K. Teh, R. Chu, B. Dodson, and M. S. Lam. The architecture and implementation of a decentralized social networking platform. http://prpl.stanford.edu/papers/prpl09.pdf, 2009.Google ScholarGoogle Scholar
  26. T. Suen and J. Wong. Efficient task migration algorithm for distributed systems. Parallel and Distributed Systems, IEEE Transactions on, 3(4):488--499, jul. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. D. Ullman and A. V. Gelder. Parallel complexity of logical query programs. In FOCS, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. D. Ullman and J. Widom. A first course in database systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. P. Valduriez and H. Boral. Evaluation of recursive queries using join indices. In Expert Database Conf., pages 271--293, 1986.Google ScholarGoogle Scholar
  31. Y. Yu, M. Isard, D. Fetterly, M. Budiu, lfar Erlingsson, P. K. Gunda, and J. Currey. Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In R. Draves and R. van Renesse, editors, OSDI, pages 1--14. USENIX Association, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Map-reduce extensions and recursive queries

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader