Abstract
Fields of science such as high-energy physics, astronomy, environmental science, and biology are experiencing an information explosion, creating a critical need for automated systems that manage scientific applications and data. Database technology is well suited to handle several aspects of workflow management. Contemporary workflow systems, however, are built from multiple, separately developed components and do not exploit the full power of database management systems (DBMSs) in handling large volumes of data. We advocate a holistic view of a workflow management system (WFMS) that includes not only workflow modeling but also planning, scheduling, data management, and cluster management. It is therefore worthwhile to explore how databases can be augmented to manage workflows in addition to data. We present a language for modeling workflows that is tightly integrated with SQL. Each scientific program in a workflow is associated with an active table or view. Data products are defined in relational format, and both program invocation and querying are done in SQL. This tight coupling between workflow management and data manipulation is an advantage for data-intensive scientific programs.
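The idea of invoking a program and defining its data product through SQL can be illustrated with a minimal sketch. This is not the modeling language the paper presents. It assumes Python's standard `sqlite3` module as a stand-in DBMS, and the names `calibrate`, `raw_readings`, and `calibrated` are hypothetical, chosen only for illustration.

```python
import sqlite3


def calibrate(raw_value: float) -> float:
    """Hypothetical stand-in for a scientific program: rescale a raw reading."""
    return round(raw_value * 1.5, 3)


def build_workflow_db() -> sqlite3.Connection:
    """Create an in-memory database holding the workflow's input data."""
    conn = sqlite3.connect(":memory:")
    # Register the program so that SQL statements can invoke it by name.
    conn.create_function("calibrate", 1, calibrate)
    conn.execute("CREATE TABLE raw_readings (id INTEGER PRIMARY KEY, value REAL)")
    # The data product is defined in relational format, as an ordinary table.
    conn.execute("CREATE TABLE calibrated (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO raw_readings (id, value) VALUES (?, ?)",
        [(1, 2.0), (2, 4.0)],
    )
    return conn


def run_workflow(conn: sqlite3.Connection) -> list:
    """Invoke the program through SQL, then query its data product through SQL."""
    conn.execute(
        "INSERT INTO calibrated (id, value) "
        "SELECT id, calibrate(value) FROM raw_readings"
    )
    return conn.execute("SELECT id, value FROM calibrated ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_workflow(build_workflow_db()))
```

In this sketch the program runs inside the database connection itself, so the inputs and the derived data product live in the same relational store. A real WFMS of the kind the abstract describes would also have to plan, schedule, and place such invocations on a cluster, which the sketch does not attempt.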