Abstract
Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.
Index Terms
- Lineage retrieval for scientific data processing: a survey