DOI: 10.1145/1951365.1951432
research-article

Big data and cloud computing: current state and future opportunities

Published: 21 March 2011

ABSTRACT

Scalable database management systems (DBMS), both for update-intensive application workloads and for decision support systems for descriptive and deep analytics, are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from traditional enterprise infrastructures to next-generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focused on large-scale data management in traditional enterprise settings, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet-scale applications. Our background study encompasses both classes of systems: (i) those supporting update-heavy applications, and (ii) those for ad hoc analytics and decision support. We then focus on providing an in-depth analysis of systems for supporting update-intensive web applications and provide a survey of the state of the art in this domain. We crystallize the design choices made by some successful large-scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.

Published in

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN: 9781450305280
DOI: 10.1145/1951365

Copyright © 2011 Authors

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 7 of 10 submissions, 70%
