Abstract
Digital provenance is metadata that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, help detect anomalous behavior, and enhance user search capabilities. As the next-generation storage providers, cloud vendors are in a unique position to capitalize on this opportunity by incorporating provenance as a fundamental storage system primitive. To date, cloud offerings have not done so. We provide motivation for providers to treat provenance as first-class data in the cloud and, based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.