DOI: 10.1145/2457317.2457370
research-article

Enhancing and abstracting scientific workflow provenance for data publishing

Published: 18 March 2013

ABSTRACT

Many scientists use workflows to systematically design and run computational experiments. Once a workflow has been executed, the scientist may want to publish the resulting dataset so that, e.g., other scientists can reuse it as input to their own experiments. To do so, the scientist needs to curate the dataset by specifying metadata that describes it, e.g., its derivation history, origins, and ownership. To assist the scientist in this task, we explore in this paper the use of provenance traces collected by workflow management systems when enacting workflows. Specifically, we identify the shortcomings of such raw provenance traces in supporting the data publishing task, and propose an approach for deriving distilled, yet more informative, provenance traces that are fit for that task.


Published in

EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
March 2013
423 pages
ISBN: 9781450315999
DOI: 10.1145/2457317

Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

EDBT '13 paper acceptance rate: 7 of 10 submissions, 70%. Overall acceptance rate: 7 of 10 submissions, 70%.
