research-article

Breaking the chains: on declarative data analysis and data independence in the big data era

Published: 01 August 2014

Abstract

Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today's multi-billion-dollar data management market include data independence, which separates physical representation and storage from the actual information, and declarative languages, which separate the program specification from its intended execution environment. In contrast, today's big data solutions do not offer data independence or declarative specification. As a result, big data technologies are mostly employed in newly established companies with IT-savvy employees or in large, well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to the IT-savvy industries. Data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment. We believe that the research community needs to bring the powerful concepts of declarative specification to current data analysis systems in order to achieve broad adoption of big data technology and effectively deliver the promise that novel big data technologies offer.



Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 13
August 2014
466 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

