research-article

Integrating MapReduce and RDBMSs

Authors:
Natalie Gruska

Queen's University, Kingston, ON, Canada

Patrick Martin

Queen's University, Kingston, ON, Canada


CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research, November 2010, pages 212–223. https://doi.org/10.1145/1923947.1923970

Published: 01 November 2010

ABSTRACT

Data processing needs are changing with the ever-increasing amounts of both structured and unstructured data. While the processing of structured data typically relies on the well-developed field of relational database management systems (RDBMSs), MapReduce is a programming model developed to cope with processing immense amounts of unstructured data. MapReduce, however, offers features and advantages that can also be exploited to process structured data. Several database vendors and researchers have already turned to MapReduce to aid in processing relational data, which requires integrating MapReduce and RDBMS technologies. In this paper, we provide a taxonomy to characterize several existing integration methods. Further, we take a detailed look at DBInputFormat, an interface between Hadoop's MapReduce and a relational database. We identify the challenges posed by such an interface and provide suggestions for improvement.
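The abstract describes DBInputFormat as the bridge that lets Hadoop map tasks read rows from a relational database. The Python sketch that follows illustrates the general idea under stated assumptions: it follows our understanding of the classic DBInputFormat behaviour, which counts the rows, divides them into contiguous ranges (one per map task), and has each task fetch its range with an `ORDER BY ... LIMIT ... OFFSET ...` query. It is not Hadoop code and is not taken from the paper. `sqlite3` stands in for the RDBMS, a sequential loop stands in for parallel map tasks, and the `employees` table as well as `compute_splits`, `split_query`, `map_fn`, `run_job`, and `make_demo_db` are hypothetical names chosen for illustration.

```python
import sqlite3
from collections import defaultdict


def compute_splits(row_count, num_splits):
    """Divide rows [0, row_count) into num_splits contiguous (start, end) ranges.

    Assumed to follow the classic DBInputFormat chunking: each split gets
    row_count // num_splits rows and the last split absorbs the remainder.
    """
    chunk = row_count // num_splits
    splits = []
    for i in range(num_splits):
        start = i * chunk
        end = row_count if i == num_splits - 1 else start + chunk
        splits.append((start, end))
    return splits


def split_query(table, fields, order_by, start, end):
    """Build the SELECT one map task would issue for its split (LIMIT/OFFSET paging)."""
    columns = ", ".join(fields)
    return (f"SELECT {columns} FROM {table} ORDER BY {order_by} "
            f"LIMIT {end - start} OFFSET {start}")


def map_fn(row):
    """Hypothetical map function: emit (department, salary) for each row."""
    dept, salary = row
    yield dept, salary


def run_job(conn, num_splits):
    """Run the job sequentially; in Hadoop each split would go to its own map task."""
    (count,) = conn.execute("SELECT COUNT(*) FROM employees").fetchone()
    intermediate = defaultdict(list)
    for start, end in compute_splits(count, num_splits):
        # Every split issues its own ordered query against the same table.
        query = split_query("employees", ["dept", "salary"], "id", start, end)
        for row in conn.execute(query):
            for key, value in map_fn(row):
                intermediate[key].append(value)
    # Reduce phase: total salary per department, keys in sorted order.
    return {key: sum(values) for key, values in sorted(intermediate.items())}


def make_demo_db():
    """Create a small in-memory table standing in for the relational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees (dept, salary) VALUES (?, ?)",
        [("eng", 100), ("ops", 80), ("eng", 120), ("ops", 70), ("hr", 60)],
    )
    return conn


if __name__ == "__main__":
    print(compute_splits(5, 2))  # [(0, 2), (2, 5)]
    print(split_query("employees", ["dept", "salary"], "id", 2, 5))
    print(run_job(make_demo_db(), 2))  # {'eng': 220, 'hr': 60, 'ops': 150}
```

Because every split issues its own ordered query, the database may have to sort or skip past earlier rows once per split. With large tables and many map tasks this repeated work is one plausible source of overhead for this style of interface, although the sketch makes no claim about the specific challenges the authors identify.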


Published in
CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
November 2010
482 pages
Conference Chairs:
Joanna Ng, IBM Canada Lab, Toronto
Christian Couturier, National Research Council Canada
Editors:
Hausi A. Müller, University of Victoria
Arthur Ryman, IBM Canada
Anatol W. Kark, National Research Council Canada
Program Chairs:
Hausi A. Müller, Arthur Ryman
Publisher:
IBM Corp., United States

Acceptance Rates
Overall Acceptance Rate: 24 of 90 submissions, 27%

Article Metrics
- Total Citations: 0
- Total Downloads: 758
- Downloads (Last 12 months): 0
- Downloads (Last 6 weeks): 0
Cited By
This publication has not been cited yet