research-article

A learning theory approach to noninteractive database privacy

Authors:
Avrim Blum

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Katrina Ligett

California Institute of Technology, Pasadena, CA

California Institute of Technology, Pasadena, CA
View Profile

,
Aaron Roth

University of Pennsylvania, Philadephia, PA

University of Pennsylvania, Philadephia, PA
View Profile

Authors Info & Claims

Journal of the ACM Volume 60 Issue 2Article No.: 12pp 1–25https://doi.org/10.1145/2450142.2450148

Published:03 May 2013Publication History

Journal of the ACM

Abstract

In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class.

We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.

References

Achlioptas, D. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. comput. Syst. Scie. 66, 4, 671--687. Google ScholarDigital Library
Anthony, M. and Bartlett, P. 1999. Neural Network Learning: Theoretical Foundations. Cambridge University Press. Google ScholarDigital Library
Balcan, M. F., Blum, A., and Vempala, S. 2006. Kernels as features: On kernels, margins, and low-dimensional mappings. Mach. Learn. 65, 1, 79--94. Google ScholarDigital Library
Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM New York. 128--138. Google ScholarDigital Library
Blum, A., Ligett, K., and Roth, A. 2008. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing. ACM, 609--618. Google ScholarDigital Library
Dasgupta, S. and Gupta, A. 1999. An elementary proof of the Johnson-Lindenstrauss Lemma. Tech. rep., 99-006, International Science Institute.Google Scholar
De, A. 2011. Lower bounds in differential privacy. arXiv:1107.2183 (2011).Google Scholar
Dinur, I. and Nissim K. 2003. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 202--210. Google ScholarDigital Library
Dwork, C. 2006. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 4052, Springer. Google ScholarDigital Library
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. 2006a. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology—EUROCRYPT 2006. Lecture Notes in Computer Science, vol. 4004, Springer, Berlin, 486--503. Google ScholarDigital Library
Dwork, C., McSherry, F., Nissim, K., and Smith, A. 2006b. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC) Lecture Notes in Computer Science, vol. 3876, Springer, 265. Google ScholarDigital Library
Dwork, C., McSherry, F., and Talwar, K. 2007. The price of privacy and the limits of LP decoding. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 85--94. Google ScholarDigital Library
Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., and Vadhan, S. 2009. On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing. ACM New York, 381--390. Google ScholarDigital Library
Dwork, C. and Nissim, K. 2004. Privacy-preserving datamining on vertically partitioned databases. In Proceedings of CRYPTO. Lecture Notes in Computer Science, Springer, 528--544.Google Scholar
Dwork, C., Rothblum, G. N., and Vadhan, S. 2010. Boosting and differential privacy. In Proceedings of the 51st Annua IEEE Symposium on Foundations of Computer Science. IEEE, 51--60. Google ScholarDigital Library
Dwork, C. and Yekhanin, S. 2008. New efficient attacks on statistical disclosure control mechanisms. In Proceedings of the Advances in Cryptology--CRYPTO 2008. 469--480. Google ScholarDigital Library
Gupta, A., Hardt, M., Roth, A., and Ullman, J. 2011a. Privately releasing conjunctions and the statistical query barrier. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing. ACM New York. Google ScholarDigital Library
Gupta, A., Roth, A., and Ullman, J. 2011b. Iterative constructions and private data release. arXiv:1107.3731.Google Scholar
Hardt, M., Ligett, K., and McSherry, F. 2012. A simple and practical algorithm for differentially private data release. arXiv:1012.4763.Google Scholar
Hardt, M. and Rothblum, G. N. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science. IEEE, 61--70. Google ScholarDigital Library
Hardt, M., Rothblum, G. N., and Servedio, R. A. 2011. Private data release via learning thresholds. arXiv:1107.2444.Google Scholar
Hardt, M. and Talwar K. 2010. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on the Theory of Computing, (STOC'10). Google ScholarDigital Library
Hay, M., Rastogi, V., Miklau, G., and Suciu, D. 2010. Boosting the accuracy of differentially-private queries through consistency. In Proceedings of VLDB. Google ScholarDigital Library
Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. 2008. What can we learn privately&quest;. In Proceedings of the IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS'08). 531--540. Google ScholarDigital Library
Kasiviswanathan, S. P., Rudelson, M., Smith, A., and Ullman, J. 2010. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Proceedings of the 42nd STOC. ACM. Google ScholarDigital Library
Li, C., Hay, M., Rastogi, V., Miklau, G., and McGregor, A. 2010. Optimizing linear counting queries under differential privacy. In Proceedings of PODS. Google ScholarDigital Library
Li, C. and Miklau, G. 2011. Efficient batch query answering under differential privacy. CoRR abs/1103.1367 (2011).Google Scholar
Li, C. and Miklau, G. 2012a. An adaptive mechanism for accurate query answering under differential privacy. Proc. VLDB Endowment 5, 6, 514--525. Google ScholarDigital Library
Li, C. and Miklau, G. 2012b. Measuring the achievable error of query sets under differential privacy. CoRR abs/1202.3399v2.Google Scholar
McSherry, F. and Talwar, K. 2007. Mechanism design via differential privacy. In Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS'07). 94--103. Google ScholarDigital Library
Muthukrishnan, S. and Nikolov, A. 2012. Optimal private halfspace counting via discrepancy. In Proceedings of STOC. 1285--1292. Google ScholarDigital Library
Roth, A. 2010. Differential privacy and the fat shattering dimension of linear queries. In Proceedings of the 14th Annual Workshop on Randomization and Computation (RANDOM'10). Google ScholarDigital Library
Roth, A. and Roughgarden, T. 2010. Interactive privacy via the median mechanism. In Proceedings of the 42nd ACM Symposium on the Theory of Computing (STOC'10). Google ScholarDigital Library
Smola, A. J. and Schölkopf, B. 2002. Learning with Kernels. MIT Press.Google Scholar
Ullman, J. and Vadhan, S. P. 2011. PCPs and the hardness of generating private synthetic data. In Proceedings of TCC. 400--416. Google ScholarDigital Library
Vapnik, V. N. 1998. Statistical Learning Theory. Wiley.Google Scholar
Xiao, X., Wang, G., and Gehrke, J. 2010. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 1200--1214. Google ScholarDigital Library

Index Terms

A learning theory approach to noninteractive database privacy
1. Theory of computation
  1. Design and analysis of algorithms

Recommendations

A learning theory approach to non-interactive database privacy
STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computing

We demonstrate that, ignoring computational constraints, it is possible to release privacy-preserving databases that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We show a new lower ...
Read More
Privacy-preserving database querying
Read More
Database Learning: Toward a Database that Becomes Smarter Every Time
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Journal of the ACM Volume 60, Issue 2
April 2013
237 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/2450142
Issue’s Table of Contents

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 May 2013
- Accepted: 1 January 2013
- Revised: 1 July 2012
- Received: 1 September 2011
Published in jacm Volume 60, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Noninteractive database privacy
learning theory
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 155
  Total Citations
  View Citations
- 1,812
  Total Downloads
- Downloads (Last 12 months)110
- Downloads (Last 6 weeks)22
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A learning theory approach to noninteractive database privacy

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

A learning theory approach to non-interactive database privacy

Privacy-preserving database querying

Database Learning: Toward a Database that Becomes Smarter Every Time

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A learning theory approach to noninteractive database privacy

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

A learning theory approach to non-interactive database privacy

Privacy-preserving database querying

Database Learning: Toward a Database that Becomes Smarter Every Time

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media