Abstract
As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments.
In this article, we show how to provide strong integrity and confidentiality assurances for data provenance information at the kernel, file system, or application layer. We describe Sprov, our provenance-aware system prototype, which implements provenance tracking of data writes at the application layer and is therefore extremely easy to deploy. We present empirical results showing that, for real-life workloads, the runtime overhead of Sprov for recording provenance with confidentiality and integrity guarantees ranges from 1% to 13% when all file modifications are recorded, and from 12% to 16% when all file reads and modifications are tracked.
Index Terms
- Preventing history forgery with secure provenance