
On scaling up sensitive data auditing

Published: 01 March 2013

Abstract

This paper studies the following problem: given (1) a query and (2) a set of sensitive records, find the subset of records "accessed" by the query. The notion of a query accessing a single record is adopted from prior work. In several scenarios the number of sensitive records is large (in the millions). The novel challenge addressed in this work is to develop a general-purpose solution for complex SQL queries that scales in the number of sensitive records. We propose efficient techniques that improve upon straightforward alternatives by orders of magnitude. Our empirical evaluation over the TPC-H benchmark data illustrates the benefits of our techniques.
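To make the problem statement concrete, here is a sketch of the straightforward per-record baseline. It assumes one particular definition of access for illustration only: a query accesses a record if deleting that record changes the query's result. That definition, the in-memory representation of a table as a list of Python dicts and of a query as a Python function, and the names `naive_audit`, `result_bag`, `seattle_names` and `max_balance` are all hypothetical. None of them is taken from the paper, and the sketch is not the paper's proposed technique.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a row is a dict, a table is a list of rows,
# and a "query" is any function from a table to a list of result rows.
Row = Dict[str, object]
Query = Callable[[List[Row]], List[Row]]


def result_bag(rows: List[Row]) -> Counter:
    # Compare query results as multisets so that row order does not matter.
    return Counter(tuple(sorted(r.items())) for r in rows)


def naive_audit(table: List[Row], sensitive_ids: Sequence[int], query: Query) -> List[int]:
    """Return the sensitive row indices whose deletion changes the query result.

    Assumed (illustrative) notion of access: Q accesses t iff Q(D) != Q(D - {t}).
    This reruns the query once per sensitive record.
    """
    baseline = result_bag(query(table))
    accessed = []
    for i in sensitive_ids:
        without_i = table[:i] + table[i + 1:]  # the table with row i deleted
        if result_bag(query(without_i)) != baseline:
            accessed.append(i)
    return accessed


# Hypothetical example data and queries.
rows: List[Row] = [
    {"name": "alice", "city": "Seattle", "balance": 100},
    {"name": "bob", "city": "Boston", "balance": 250},
    {"name": "carol", "city": "Seattle", "balance": 75},
]


def seattle_names(t: List[Row]) -> List[Row]:
    # Roughly: SELECT name FROM rows WHERE city = 'Seattle'
    return [{"name": r["name"]} for r in t if r["city"] == "Seattle"]


def max_balance(t: List[Row]) -> List[Row]:
    # Roughly: SELECT MAX(balance) FROM rows
    return [{"max": max(r["balance"] for r in t)}]


if __name__ == "__main__":
    print(naive_audit(rows, [0, 1, 2], seattle_names))  # [0, 2]
    print(naive_audit(rows, [0, 1, 2], max_balance))    # [1]
```

Under this assumed definition, the selection query accesses both Seattle rows. The aggregate query accesses only the row holding the maximum balance, because deleting either other row leaves the result unchanged. The baseline reruns the query once per sensitive record, so its cost grows linearly with the number of sensitive records. That becomes prohibitive when the sensitive set reaches the millions.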


Published in Proceedings of the VLDB Endowment, Volume 6, Issue 5, March 2013, 60 pages
Publisher: VLDB Endowment