Abstract
This paper studies the following problem: given (1) a query and (2) a set of sensitive records, find the subset of records "accessed" by the query. The notion of a query accessing a single record is adopted from prior work. In several scenarios, the number of sensitive records is large (in the millions). The novel challenge addressed in this work is to develop a general-purpose solution for complex SQL queries that scales with the number of sensitive records. We propose efficient techniques that improve upon straightforward alternatives by orders of magnitude. Our empirical evaluation over the TPC-H benchmark data illustrates the benefits of our techniques.
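To make the problem statement concrete, here is a minimal sketch of one straightforward baseline. It assumes, as a stand-in for the prior-work definition (which this abstract does not spell out), that a query accesses a record if deleting that record changes the query result. The row layout, the helper `naive_accessed`, and the two toy queries are hypothetical illustrations written in plain Python rather than SQL; this is not the technique proposed in the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical row layout: (record_id, name, balance); record_id is unique.
Row = Tuple[int, str, int]
Query = Callable[[Sequence[Row]], List[tuple]]


def naive_accessed(query: Query, db: Sequence[Row], sensitive_ids: Sequence[int]) -> List[int]:
    """Return the sensitive record ids that `query` accesses.

    Assumption: a query accesses a record if deleting that record changes
    the query result (compared as a multiset, so row order is ignored).
    """
    baseline = Counter(query(db))
    accessed = []
    for rid in sensitive_ids:
        # Re-run the query over the database with this one record deleted.
        reduced = [row for row in db if row[0] != rid]
        if Counter(query(reduced)) != baseline:
            accessed.append(rid)
    return accessed


# Toy stand-in for: SELECT name FROM accounts WHERE balance > 100
def names_over_100(rows: Sequence[Row]) -> List[tuple]:
    return [(name,) for _, name, balance in rows if balance > 100]


# Toy stand-in for: SELECT MAX(balance) FROM accounts
def max_balance(rows: Sequence[Row]) -> List[tuple]:
    return [(max(balance for _, _, balance in rows),)] if rows else []


db = [(1, "alice", 500), (2, "bob", 50), (3, "carol", 300), (4, "dave", 120)]
print(naive_accessed(names_over_100, db, [1, 2, 3]))  # [1, 3]
print(naive_accessed(max_balance, db, [1, 2, 3]))     # [1]
```

Under this assumed definition, the filter query accesses every qualifying sensitive record, while the MAX query accesses only the record holding the maximum. The baseline evaluates the query once per sensitive record (|S| + 1 evaluations in total), which illustrates why such straightforward approaches become impractical when the number of sensitive records reaches the millions.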
Index Terms
- On scaling up sensitive data auditing