DOI: 10.1145/1923947.1923970
research-article

Integrating MapReduce and RDBMSs

Published: 01 November 2010

ABSTRACT

Data processing needs are changing with the ever-increasing amounts of both structured and unstructured data. While the processing of structured data typically relies on the well-developed field of relational database management systems (RDBMSs), MapReduce is a programming model developed to cope with processing immense amounts of unstructured data. MapReduce, however, also offers features and advantages that can be exploited to process structured data. Several database vendors and researchers have already turned to MapReduce to aid in processing relational data, which requires integrating MapReduce and RDBMS technologies. In this paper, we provide a taxonomy to characterize several existing integration methods. Further, we take a detailed look at DBInputFormat, which is an interface between Hadoop's MapReduce and a relational database. We identify the challenges posed by such an interface and provide suggestions for improvement.
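The abstract describes DBInputFormat only at a high level. As a rough illustration of how such an interface can hand a relational table to parallel map tasks, the sketch below assumes a range-based scheme: the table's row count is divided into contiguous ranges, one per map task, and each task fetches its range with an ordered query using LIMIT and OFFSET. This reflects our understanding of Hadoop's DBInputFormat split logic; it is not code from the paper or from Hadoop, and the class, method, table, and column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range-based splitting for reading a table into
// parallel map tasks; not Hadoop's source code.
final class DbSplitSketch {

    // A contiguous, half-open range of rows [start, end) assigned to one map task.
    static final class RowRange {
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    // Divide rowCount rows into numSplits ranges of rowCount / numSplits rows
    // each; the last range absorbs any remainder.
    static List<RowRange> computeSplits(long rowCount, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<RowRange> splits = new ArrayList<>();
        long chunkSize = rowCount / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            long end = (i == numSplits - 1) ? rowCount : start + chunkSize;
            splits.add(new RowRange(start, end));
        }
        return splits;
    }

    // Build the query one map task would issue for its range. Every task repeats
    // the same ORDER BY; assuming the ordering key is unique, the ranges are disjoint.
    static String selectQuery(String table, String fields, String orderBy, RowRange range) {
        return "SELECT " + fields + " FROM " + table
                + " ORDER BY " + orderBy
                + " LIMIT " + range.length() + " OFFSET " + range.start;
    }

    public static void main(String[] args) {
        // Hypothetical table "employees" with 10 rows, read by 3 map tasks.
        for (RowRange r : computeSplits(10, 3)) {
            System.out.println(selectQuery("employees", "id, name", "id", r));
        }
    }
}
```

Running main prints three queries ending in LIMIT 3 OFFSET 0, LIMIT 3 OFFSET 3, and LIMIT 4 OFFSET 6. Because every task re-issues the same ordered query and discards the rows before its OFFSET, the database may sort or scan far more rows than each task actually reads, which is one way an interface of this kind can become costly on large tables.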

Published in

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
November 2010
482 pages

Publisher

IBM Corp., United States

Acceptance Rates

Overall acceptance rate: 24 of 90 submissions, 27%