DOI: 10.1145/3006299.3006311
research-article

Towards a comprehensive data lifecycle model for big data environments

Published: 06 December 2016

ABSTRACT

A huge amount of data is constantly being produced worldwide. Data coming from the IoT, from scientific simulations, or from any other field of eScience accumulate on top of historical data sets and provide the seed for future Big Data processing, whose final goal is to generate added value and discover knowledge. In such computing processes, data are the main resource; however, organizing and managing data throughout their entire life cycle is a complex research topic. To address this, Data LifeCycle (DLC) models have been proposed to efficiently organize large and complex data sets, from creation to consumption, in any field and at any scale, for effective data usage and big data exploitation.

Several DLC frameworks can be found in the literature, each one defined for specific environments and scenarios. However, we found no global and comprehensive DLC model that can be easily adapted to different scientific areas. For this reason, in this paper we describe the Comprehensive Scenario Agnostic Data LifeCycle (COSA-DLC) model, a DLC model which: i) is proved to be comprehensive, as it addresses the 6Vs challenges (namely Value, Volume, Variety, Velocity, Variability and Veracity); and ii) can be easily adapted to any particular scenario and, therefore, fit the requirements of a specific scientific field. We also include two use cases to illustrate the ease of adaptation to different scenarios. We conclude that the comprehensive scenario agnostic DLC model provides several advantages, such as facilitating global data management, organization and integration, easing the adaptation to any kind of scenario, guaranteeing good data quality levels and, therefore, saving design time and effort for the scientific and industrial communities.

        • Published in

          BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
          December 2016
          373 pages
          ISBN: 9781450346177
          DOI: 10.1145/3006299

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 27 of 93 submissions, 29%
