Lineage retrieval for scientific data processing: a survey

Published: 01 March 2005

Abstract

Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.

Published in

ACM Computing Surveys, Volume 37, Issue 1
March 2005
81 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/1057977

Copyright © 2005 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
