
Integrating databases and workflow systems

Published: 01 September 2005

Abstract

There has been an information explosion in fields of science such as high-energy physics, astronomy, environmental sciences, and biology, and there is a critical need for automated systems to manage scientific applications and data. Database technology is well suited to handle several aspects of workflow management. Contemporary workflow systems, however, are built from multiple, separately developed components and do not exploit the full power of DBMSs in handling large volumes of data. We advocate a holistic view of a workflow management system (WFMS) that includes not only workflow modeling but also planning, scheduling, data management, and cluster management. It is therefore worthwhile to explore the ways in which databases can be augmented to manage workflows in addition to data. We present a language for modeling workflows that is tightly integrated with SQL. Each scientific program in a workflow is associated with an active table or view. Data products are defined in relational format, and both the invocation of programs and querying are done in SQL. This tight coupling between workflow management and data manipulation is an advantage for data-intensive scientific programs.



Published in

ACM SIGMOD Record, Volume 34, Issue 3
September 2005
115 pages
ISSN: 0163-5808
DOI: 10.1145/1084805

Copyright © 2005 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
