Published in: Computing 5/2018

14-11-2017

Mechanisms for provenance collection in scientific workflow systems

Authors: Mehdi Sarikhani, Andrew Wendelborn

Abstract

Scientific workflow management systems (SWfMSs) run scientific experiments. They manage sequences of complex process transformations and collect provenance information at various levels of abstraction. The provenance information collected from scientific experiments documents how experimental results are derived from input values, experimental parameters and workflow configurations. Provenance greatly enhances the usability and acceptance of workflow systems among scientists, because it allows workflow systems to capture process configuration and behaviour at different levels of detail. On this basis, a sufficient level of collected provenance information enables scientists to validate their hypotheses and make a workflow reproducible. Currently, SWfMSs do not use a standard or portable provenance model for capturing, storing, querying or representing provenance. Because workflow architectures vary along many design dimensions, provenance models and mechanisms in workflow systems raise a variety of design issues. Given this variety, it seems desirable to classify provenance mechanisms in workflow systems. We aim to survey provenance collection mechanisms that are either part of a scientific workflow system or part of a software infrastructure that supports provenance collection mechanisms in a scientific workflow system. In this paper, we first identify and define a set of design dimensions and conventions for provenance collection mechanisms in the context of scientific workflow systems. We then survey a set of scientific workflow projects against these design dimensions, with an emphasis on provenance collection mechanisms. Finally, we use these conventions to evaluate a number of existing provenance collection mechanisms; this evaluation is presented at the end of the paper. This survey provides an understanding of the primary design issues for provenance collection mechanisms, along with a set of desirable design dimensions.
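The abstract describes provenance as a record of how results are derived from input values, experimental parameters and workflow configuration. As a minimal sketch of what one kind of collection mechanism might capture, the Python below wraps each workflow step in a hypothetical `recorded` decorator that logs which artifacts a step used, which artifact it generated, and with which parameters, loosely following the `used` and `wasGeneratedBy` relations of the Open Provenance Model. The names `recorded`, `ProvenanceStore` and `artifact_id`, the content-hash identifiers and the in-memory store are all assumptions made for illustration; this is not the authors' mechanism or the API of any surveyed system.

```python
import functools
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def artifact_id(value: Any) -> str:
    """Content-addressed identifier for a data artifact (an illustrative choice)."""
    payload = json.dumps(value, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store holding OPM-style 'used' and 'wasGeneratedBy' edges."""
    used: list = field(default_factory=list)           # (process_id, artifact_id)
    generated_by: list = field(default_factory=list)   # (artifact_id, process_id)
    runs: list = field(default_factory=list)           # one record per step invocation

    def lineage(self, artifact: str) -> set:
        """Walk the edges backwards and return every upstream artifact id."""
        upstream: set = set()
        frontier = [artifact]
        while frontier:
            current = frontier.pop()
            # Processes that generated the current artifact.
            producers = {p for a, p in self.generated_by if a == current}
            # Every artifact those processes used is an upstream dependency.
            for process, source in self.used:
                if process in producers and source not in upstream:
                    upstream.add(source)
                    frontier.append(source)
        return upstream


STORE = ProvenanceStore()


def recorded(step_name: str, **params: Any):
    """Decorator: run a workflow step and record its inputs, parameters and output."""
    def decorate(fn: Callable):
        @functools.wraps(fn)
        def run(*inputs: Any):
            process_id = f"{step_name}#{len(STORE.runs)}"  # unique per invocation
            output = fn(*inputs, **params)
            input_ids = [artifact_id(x) for x in inputs]
            output_id = artifact_id(output)
            STORE.used.extend((process_id, i) for i in input_ids)
            STORE.generated_by.append((output_id, process_id))
            STORE.runs.append({"process": process_id, "step": step_name,
                               "params": dict(params), "inputs": input_ids,
                               "output": output_id})
            return output
        return run
    return decorate


# Usage: a two-step toy workflow (step names and data are made up).
@recorded("normalise", scale=10.0)
def normalise(values, scale):
    return [v / scale for v in values]


@recorded("total")
def total(values):
    return sum(values)


raw = [10.0, 20.0, 30.0]
norm = normalise(raw)
result = total(norm)
print(STORE.runs[-1]["step"], sorted(STORE.lineage(artifact_id(result))))
```

Because the recorder sits in the step wrapper, this captures retrospective provenance at runtime and at the granularity of individual steps. A real mechanism would also need persistent storage and a query interface, and content hashing here treats identical values produced by different steps as the same artifact.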

118.
go back to reference Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S (2004) Kepler: an extensible system for design and execution of scientific workflows. In: 16th international conference on scientific and statistical database management, 21–23 June 2004. IEEE, pp 423–424. https://doi.org/10.1109/SSDM.2004.1311241 Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S (2004) Kepler: an extensible system for design and execution of scientific workflows. In: 16th international conference on scientific and statistical database management, 21–23 June 2004. IEEE, pp 423–424. https://​doi.​org/​10.​1109/​SSDM.​2004.​1311241
121.
go back to reference Lin C, Lu S, Lai Z, Chebotko A, Fei X, Hua J, Fotouhi F (2008) Service-oriented architecture for VIEW: a visual scientific workflow management system. In: IEEE international conference on services computing (SCC’08), Honolulu, Hawaii. IEEE, pp 335–342 Lin C, Lu S, Lai Z, Chebotko A, Fei X, Hua J, Fotouhi F (2008) Service-oriented architecture for VIEW: a visual scientific workflow management system. In: IEEE international conference on services computing (SCC’08), Honolulu, Hawaii. IEEE, pp 335–342
122.
go back to reference Simmhan Y, Plale B, Gannon D, Marru S (2006) Performance evaluation of the karma provenance framework for scientific workflows. In: Moreau L, Foster I (eds) Provenance and annotation of data, vol 4145. Lecture notes in computer science. Springer, Berlin, pp 222–236. https://doi.org/10.1007/11890850_23 Simmhan Y, Plale B, Gannon D, Marru S (2006) Performance evaluation of the karma provenance framework for scientific workflows. In: Moreau L, Foster I (eds) Provenance and annotation of data, vol 4145. Lecture notes in computer science. Springer, Berlin, pp 222–236. https://​doi.​org/​10.​1007/​11890850_​23
123.
go back to reference Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P (2013) The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucl Acids Res 41(W1):W557–W561CrossRef Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P (2013) The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucl Acids Res 41(W1):W557–W561CrossRef
124.
go back to reference Oinn T, Greenwood M, Addis M, Alpdemir MN, Ferris J, Glover K, Goble C, Goderis A, Hull D, Marvin D (2006) Taverna: lessons in creating a workflow environment for the life sciences. Concurr Comput Pract Exp 18(10):1067–1100CrossRef Oinn T, Greenwood M, Addis M, Alpdemir MN, Ferris J, Glover K, Goble C, Goderis A, Hull D, Marvin D (2006) Taverna: lessons in creating a workflow environment for the life sciences. Concurr Comput Pract Exp 18(10):1067–1100CrossRef
125.
go back to reference Marinho A, Murta L, Werner C, Braganholo V, Ogasawara E, Cruz SMS, Mattoso M (2010) Integrating provenance data from distributed workflow systems with ProvManager. In: Provenance and annotation of data and processes. Springer, pp 286–288 Marinho A, Murta L, Werner C, Braganholo V, Ogasawara E, Cruz SMS, Mattoso M (2010) Integrating provenance data from distributed workflow systems with ProvManager. In: Provenance and annotation of data and processes. Springer, pp 286–288
126.
go back to reference Marinho A, Murta L, Werner C, Braganholo V, Cruz SMS, Ogasawara E, Mattoso M (2010) Managing provenance in scientific workflows with ProvManager. In: International workshop on challenges in e-Science (CIS2010), Petrópolis, Rio de Janeiro, Brazil, pp 17–24 Marinho A, Murta L, Werner C, Braganholo V, Cruz SMS, Ogasawara E, Mattoso M (2010) Managing provenance in scientific workflows with ProvManager. In: International workshop on challenges in e-Science (CIS2010), Petrópolis, Rio de Janeiro, Brazil, pp 17–24
127.
go back to reference Olston C, Reed B, Srivastava U, Kumar R, Tomkins A (2008) Pig latin: a not-so-foreign language for data processing. Paper presented at the proceedings of the 2008 ACM SIGMOD international conference on management of data, Vancouver, Canada Olston C, Reed B, Srivastava U, Kumar R, Tomkins A (2008) Pig latin: a not-so-foreign language for data processing. Paper presented at the proceedings of the 2008 ACM SIGMOD international conference on management of data, Vancouver, Canada
128.
go back to reference Green TJ, Karvounarakis G, Tannen V (2007) Provenance semirings. Paper presented at the proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, Beijing, China Green TJ, Karvounarakis G, Tannen V (2007) Provenance semirings. Paper presented at the proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, Beijing, China
129.
go back to reference Dey S, Belhajjame K, Koop D, Raul M, Ludascher B (2015) Linking prospective and retrospective provenance in scripts. Paper presented at the proceedings of the 7th USENIX conference on theory and practice of provenance, Edinburgh, Scotland Dey S, Belhajjame K, Koop D, Raul M, Ludascher B (2015) Linking prospective and retrospective provenance in scripts. Paper presented at the proceedings of the 7th USENIX conference on theory and practice of provenance, Edinburgh, Scotland
131.
go back to reference Williams DN, Bremer T, Doutriaux C, Patchett J, Williams S, Shipman G, Miller R, Pugmire DR, Smith B, Steed C, Bethel EW, Childs H, Krishnan H, Prabhat P, Wehner M, Silva CT, Santos E, Koop D, Ellqvist T, Poco J, Geveci B, Chaudhary A, Bauer A, Pletzer A, Kindig D, Potter GL, Maxwell TP (2013) Ultrascale visualization of climate data. Computer 46(9):68–76. https://doi.org/10.1109/MC.2013.119 CrossRef Williams DN, Bremer T, Doutriaux C, Patchett J, Williams S, Shipman G, Miller R, Pugmire DR, Smith B, Steed C, Bethel EW, Childs H, Krishnan H, Prabhat P, Wehner M, Silva CT, Santos E, Koop D, Ellqvist T, Poco J, Geveci B, Chaudhary A, Bauer A, Pletzer A, Kindig D, Potter GL, Maxwell TP (2013) Ultrascale visualization of climate data. Computer 46(9):68–76. https://​doi.​org/​10.​1109/​MC.​2013.​119 CrossRef
134.
go back to reference Hey AJG, Tansley S, Tolle KM (2009) The fourth paradigm: data-intensive scientific discovery, 1st edn. Microsoft Research Redmond, Washangton Hey AJG, Tansley S, Tolle KM (2009) The fourth paradigm: data-intensive scientific discovery, 1st edn. Microsoft Research Redmond, Washangton
135.
go back to reference Delaney J, Heath G, Chave A, Howe B, Kirkham H (2000) NEPTUNE: real-time ocean and earth sciences at the scale of a tectonic plate. Oceanography 13(2):71–79CrossRef Delaney J, Heath G, Chave A, Howe B, Kirkham H (2000) NEPTUNE: real-time ocean and earth sciences at the scale of a tectonic plate. Oceanography 13(2):71–79CrossRef
Metadata
Title
Mechanisms for provenance collection in scientific workflow systems
Authors
Mehdi Sarikhani
Andrew Wendelborn
Publication date
14-11-2017
Publisher
Springer Vienna
Published in
Computing / Issue 5/2018
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-017-0578-1

Premium Partner