research-article

Preventing history forgery with secure provenance

Published: 14 December 2009

Abstract

As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments.

In this article, we show how to provide strong integrity and confidentiality assurances for data provenance information at the kernel, file system, or application layer. We describe Sprov, our provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes Sprov extremely easy to deploy. We present empirical results that show that, for real-life workloads, the runtime overhead of Sprov for recording provenance with confidentiality and integrity guarantees ranges from 1% to 13% when all file modifications are recorded, and from 12% to 16% when all file reads and modifications are tracked.
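To make the idea of application-layer write tracking concrete, here is a minimal sketch in Python. It is not Sprov's code or API: the class name `TrackedFile`, the record fields, and the use of a plain list as the provenance store are all assumptions made for illustration.

```python
import hashlib
import time


class TrackedFile:
    """Hypothetical wrapper: forwards writes to a real file and logs each one."""

    def __init__(self, path, user, log):
        self._f = open(path, "ab")  # append in binary mode
        self._path = path
        self._user = user
        self._log = log  # a plain list standing in for a provenance store

    def write(self, data: bytes) -> int:
        # Perform the real write first, then record who wrote what and when.
        n = self._f.write(data)
        self._log.append({
            "path": self._path,
            "user": self._user,
            "op": "write",
            "nbytes": n,
            "digest": hashlib.sha256(data).hexdigest(),
            "time": time.time(),
        })
        return n

    def close(self):
        self._f.close()
```

Because the wrapper keeps the same `write`/`close` shape as an ordinary file object, an application could adopt it without kernel or file system changes, which is the deployment advantage the abstract points to. The sketch records events only; it provides no integrity or confidentiality protection by itself.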

            Reviews

            Brad D. Reid

            Provenance records provide the history of an item. But how does one determine if the provenance records themselves have been altered? Hasan, Sion, and Winslett "show how to provide strong integrity and confidentiality assurance for data provenance information at the kernel, file system, or application layer." Individuals involved in file management and security will want to study this valuable research. There are five issues involved in provenance: completeness, integrity, availability, confidentiality, and efficiency. The authors investigate ways to address each of these issues. They acknowledge that any tracking system is only as good as the level at which it operates.

            After discussing the threat models, the authors "propose a solution composed of several layered components: encryption for sensitive provenance chain record fields, the checksum-based approach for chain records, and an incremental chained signature mechanism" for the entire chain. They adequately discuss the technical aspects of these methods, reduced to definitions, and provide an approach to detailed control over confidentiality, as well as theorems to provide proofs of correctness. There is a good discussion of ways "to verify that the history recorded in a provenance chain is its actual history, not just a plausible history." This is a very significant issue.

            The high point of the paper is a discussion of the implementation of a prototype application layer C library, Sprov. It consists of wrapper functions for the standard file input/output (I/O) library stdio.h, and tracks document changes at the file level. The experiments presented are detailed and indicate results in real-life workloads: "runtime overhead of Sprov for recording provenance with confidentiality and integrity guarantees ranges from one to 13 percent, when all file modifications are recorded, and from 12 to 16 percent, when all file read[s] and modifications are tracked."
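The chained-checksum idea mentioned in this review can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the authors' construction: a single shared HMAC key stands in for the paper's signature mechanism, no fields are encrypted, and the function names and record layout are hypothetical.

```python
import hashlib
import hmac
import json


def _tag(key, body):
    # Keyed checksum over the record body, which includes the previous tag.
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(chain, key, user, action):
    """Append a record whose checksum also covers the previous record's tag."""
    prev_tag = chain[-1]["tag"] if chain else ""
    body = {"user": user, "action": action, "prev": prev_tag}
    chain.append({**body, "tag": _tag(key, body)})


def verify_chain(chain, key):
    """Return True only if every record is intact and linked to its predecessor."""
    prev_tag = ""
    for rec in chain:
        body = {"user": rec["user"], "action": rec["action"], "prev": rec["prev"]}
        if rec["prev"] != prev_tag:
            return False  # a record was removed, inserted, or reordered
        if not hmac.compare_digest(_tag(key, body), rec["tag"]):
            return False  # a record's contents were altered
        prev_tag = rec["tag"]
    return True
```

Because each tag depends on the tag before it, changing or deleting a record in the middle of the chain breaks verification for that position. This sketch does not detect truncation of the most recent records, and a shared key cannot attribute a record to a specific user, which is one reason a real design would rely on signatures.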
The authors suggest several avenues for future research, including how to pass information between the provenance collection system and higher software levels, and creating specialized support for small additions to large files. The ultimate goal is the creation of inexpensive provenance methods that are seldom needed, because of the barriers to fraud that are in place. The paper has a detailed reference list and includes other researchers' investigations. It is a well-written presentation of an important topic.

Online Computing Reviews Service

            • Published in

              ACM Transactions on Storage  Volume 5, Issue 4
              December 2009
              155 pages
              ISSN:1553-3077
              EISSN:1553-3093
              DOI:10.1145/1629080

              Copyright © 2009 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 14 December 2009
              • Accepted: 1 August 2009
              • Received: 1 May 2009

              Qualifiers

              • research-article
              • Research
              • Refereed
