research-article

Map-reduce extensions and recursive queries

Authors:
Foto N. Afrati

National Technical University, Athens, Greece

National Technical University, Athens, Greece
View Profile

,
Vinayak Borkar

UC Irvine

UC Irvine
View Profile

,
Michael Carey

UC Irvine

UC Irvine
View Profile

,
Neoklis Polyzotis

UC Santa Cruz

UC Santa Cruz
View Profile

,
Jeffrey D. Ullman

Stanford University

Stanford University
View Profile

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database TechnologyMarch 2011Pages 1–8https://doi.org/10.1145/1951365.1951367

Published:21 March 2011Publication History

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology

Pages 1–8

ABSTRACT

We survey the recent wave of extensions to the popular map-reduce systems, including those that have begun to address the implementation of recursive queries using the same computing environment as map-reduce. A central problem is that recursive tasks cannot deliver their output only at the end, which makes recovery from failures much more complicated than in map-reduce and its nonrecursive extensions. We propose several algorithmic ideas for efficient implementation of recursions in the map-reduce environment and discuss several alternatives for supporting recovery from failures without restarting the entire job.

References

F. Afrati, V. Borkar, M. Carey, N. Polyzotis, and J. Ullman. Cluster computing, recursion and datalog. To appear in a book based on the Datalog 2.0 Workshop (March, 2010), Oxford GB, to be published by Springer in 2011. Google ScholarDigital Library
F. N. Afrati and C. H. Papadimitriou. The parallel complexity of simple chain queries. In PODS, 1987. Google ScholarDigital Library
F. N. Afrati and J. D. Ullman. Optimizing joins in a map-reduce environment. In EDBT, 2010. Google ScholarDigital Library
Apache. Hadoop. http://hadoop.apache.org/, 2006.Google Scholar
Apache. Hdfs. http://hadoop.apache.org/hdfs/, 2008.Google Scholar
Apache. Hive. http://wiki.apache.org/hadoop/Hive, 2008.Google Scholar
D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke. Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, pages 119--130, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
V. Borkar, M. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proceedings of the IEEE International Conference on Data Engineering, to appear, 2011. Google ScholarDigital Library
A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998. Google ScholarDigital Library
A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33(1--6):309--320, 2000. Google ScholarDigital Library
Y. Bu, B. Howe, M. Balazinska, and M. Ernst. Haloop: efficient iterative data processing on large clusters. In VLDB Conference, 2010. Google ScholarDigital Library
R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. Scope: easy and efficient parallel processing of massive data sets. In VLDB, 2008. Google ScholarDigital Library
S. Dar and R. Ramakrishnan. A performance study of transitive closure algorithms. In SIGMOD Conference, pages 454--465, 1994. Google ScholarDigital Library
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
D. J. DeWitt, E. Paulson, E. Robinson, J. F. Naughton, J. Royalty, S. Shankar, and A. Krioukov. Clustera: an integrated computation and data management system. PVLDB, 1(1):28--41, 2008. Google ScholarDigital Library
S. Ganguly, A. Silberschatz, and S. Tsur. A framework for the parallel processing of datalog queries. SIGMOD Rec., 19:143--152, May 1990. Google ScholarDigital Library
S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In 19th ACM Symposium on Operating Systems Principles, 2003. Google ScholarDigital Library
J. M. Hellerstein. Datalog redux: experience and conjecture. In PODS, pages 1--2, 2010. Google ScholarDigital Library
Y. E. Ioannidis. On the computation of the transitive closure of relational operators. In Proceedings of the 12th International Conference on Very Large Data Bases, VLDB '86, pages 403--411, San Francisco, CA, USA, 1986. Morgan Kaufmann Publishers Inc. Google ScholarDigital Library
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, 2007. Google ScholarDigital Library
R. Kabler, Y. E. Ioannidis, and M. J. Carey. Performance evaluation of algorithms for transitive closure. Inf. Syst., 17(5):415--441, 1992. Google ScholarDigital Library
M. Lam and et al. Bdd-based deductive database. bddbddb.sourceforge.net, 2008.Google Scholar
G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD Conference, 2010. Google ScholarDigital Library
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD Conference, 2008. Google ScholarDigital Library
S.-W. Seong, M. Nasielski, J. Seo, D. Sengupta, S. Hangal, S. K. Teh, R. Chu, B. Dodson, and M. S. Lam. The architecture and implementation of a decentralized social networking platform. http://prpl.stanford.edu/papers/prpl09.pdf, 2009.Google Scholar
T. Suen and J. Wong. Efficient task migration algorithm for distributed systems. Parallel and Distributed Systems, IEEE Transactions on, 3(4):488--499, jul. 1992. Google ScholarDigital Library
J. D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989. Google ScholarDigital Library
J. D. Ullman and A. V. Gelder. Parallel complexity of logical query programs. In FOCS, 1986. Google ScholarDigital Library
J. D. Ullman and J. Widom. A first course in database systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997. Google ScholarDigital Library
P. Valduriez and H. Boral. Evaluation of recursive queries using join indices. In Expert Database Conf., pages 271--293, 1986.Google Scholar
Y. Yu, M. Isard, D. Fetterly, M. Budiu, lfar Erlingsson, P. K. Gunda, and J. Currey. Dryadlinq: A system for general-purpose distributed data-parallel computing using a high-level language. In R. Draves and R. van Renesse, editors, OSDI, pages 1--14. USENIX Association, 2008. Google ScholarDigital Library

Index Terms

Map-reduce extensions and recursive queries
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
      2. Parallel and distributed DBMSs
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Scale-out beyond map-reduce
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

The amount and variety of data being collected in the enterprise is growing at a staggering pace. The default now is to capture and store any and all data, in anticipation of potential future strategic value, and vast amounts of data are being generated ...
Read More
Map-reduce-merge: simplified relational data processing on large clusters
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Map-Reduce is a programming model that enables easy development of scalable parallel applications to process a vast amount of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model ...
Read More
Decidable containment of recursive queries
Database theory

One of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN:9781450305280
DOI:10.1145/1951365
Editors:
Anastasia Ailamaki
EPFL, Switzerland
,
Sihem Amer-Yahia
Yahoo! Research
,
Jignesh Pate
University of Wisconsin-Madison
,
Tore Risch
Uppsala University, Sweden
,
Pierre Senellart
Télécom ParisTech, France
,
Julia Stoyanovich
University of Pennsylvania
Copyright © 2011 Authors
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 March 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
datalog
map-reduce
polynomial fringe
recursion
task migration
transitive closure
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate7of10submissions,70%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 47
  Total Citations
  View Citations
- 889
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Map-reduce extensions and recursive queries

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology

ABSTRACT

References

Cited By

Index Terms

Recommendations

Scale-out beyond map-reduce

Map-reduce-merge: simplified relational data processing on large clusters

Decidable containment of recursive queries