Abstract
In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class.
We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
- Achlioptas, D. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. comput. Syst. Scie. 66, 4, 671--687. Google ScholarDigital Library
- Anthony, M. and Bartlett, P. 1999. Neural Network Learning: Theoretical Foundations. Cambridge University Press. Google ScholarDigital Library
- Balcan, M. F., Blum, A., and Vempala, S. 2006. Kernels as features: On kernels, margins, and low-dimensional mappings. Mach. Learn. 65, 1, 79--94. Google ScholarDigital Library
- Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM New York. 128--138. Google ScholarDigital Library
- Blum, A., Ligett, K., and Roth, A. 2008. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing. ACM, 609--618. Google ScholarDigital Library
- Dasgupta, S. and Gupta, A. 1999. An elementary proof of the Johnson-Lindenstrauss Lemma. Tech. rep., 99-006, International Science Institute.Google Scholar
- De, A. 2011. Lower bounds in differential privacy. arXiv:1107.2183 (2011).Google Scholar
- Dinur, I. and Nissim K. 2003. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 202--210. Google ScholarDigital Library
- Dwork, C. 2006. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 4052, Springer. Google ScholarDigital Library
- Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. 2006a. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology—EUROCRYPT 2006. Lecture Notes in Computer Science, vol. 4004, Springer, Berlin, 486--503. Google ScholarDigital Library
- Dwork, C., McSherry, F., Nissim, K., and Smith, A. 2006b. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC) Lecture Notes in Computer Science, vol. 3876, Springer, 265. Google ScholarDigital Library
- Dwork, C., McSherry, F., and Talwar, K. 2007. The price of privacy and the limits of LP decoding. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 85--94. Google ScholarDigital Library
- Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., and Vadhan, S. 2009. On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing. ACM New York, 381--390. Google ScholarDigital Library
- Dwork, C. and Nissim, K. 2004. Privacy-preserving datamining on vertically partitioned databases. In Proceedings of CRYPTO. Lecture Notes in Computer Science, Springer, 528--544.Google Scholar
- Dwork, C., Rothblum, G. N., and Vadhan, S. 2010. Boosting and differential privacy. In Proceedings of the 51st Annua IEEE Symposium on Foundations of Computer Science. IEEE, 51--60. Google ScholarDigital Library
- Dwork, C. and Yekhanin, S. 2008. New efficient attacks on statistical disclosure control mechanisms. In Proceedings of the Advances in Cryptology--CRYPTO 2008. 469--480. Google ScholarDigital Library
- Gupta, A., Hardt, M., Roth, A., and Ullman, J. 2011a. Privately releasing conjunctions and the statistical query barrier. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing. ACM New York. Google ScholarDigital Library
- Gupta, A., Roth, A., and Ullman, J. 2011b. Iterative constructions and private data release. arXiv:1107.3731.Google Scholar
- Hardt, M., Ligett, K., and McSherry, F. 2012. A simple and practical algorithm for differentially private data release. arXiv:1012.4763.Google Scholar
- Hardt, M. and Rothblum, G. N. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science. IEEE, 61--70. Google ScholarDigital Library
- Hardt, M., Rothblum, G. N., and Servedio, R. A. 2011. Private data release via learning thresholds. arXiv:1107.2444.Google Scholar
- Hardt, M. and Talwar K. 2010. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on the Theory of Computing, (STOC'10). Google ScholarDigital Library
- Hay, M., Rastogi, V., Miklau, G., and Suciu, D. 2010. Boosting the accuracy of differentially-private queries through consistency. In Proceedings of VLDB. Google ScholarDigital Library
- Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. 2008. What can we learn privately?. In Proceedings of the IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS'08). 531--540. Google ScholarDigital Library
- Kasiviswanathan, S. P., Rudelson, M., Smith, A., and Ullman, J. 2010. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Proceedings of the 42nd STOC. ACM. Google ScholarDigital Library
- Li, C., Hay, M., Rastogi, V., Miklau, G., and McGregor, A. 2010. Optimizing linear counting queries under differential privacy. In Proceedings of PODS. Google ScholarDigital Library
- Li, C. and Miklau, G. 2011. Efficient batch query answering under differential privacy. CoRR abs/1103.1367 (2011).Google Scholar
- Li, C. and Miklau, G. 2012a. An adaptive mechanism for accurate query answering under differential privacy. Proc. VLDB Endowment 5, 6, 514--525. Google ScholarDigital Library
- Li, C. and Miklau, G. 2012b. Measuring the achievable error of query sets under differential privacy. CoRR abs/1202.3399v2.Google Scholar
- McSherry, F. and Talwar, K. 2007. Mechanism design via differential privacy. In Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS'07). 94--103. Google ScholarDigital Library
- Muthukrishnan, S. and Nikolov, A. 2012. Optimal private halfspace counting via discrepancy. In Proceedings of STOC. 1285--1292. Google ScholarDigital Library
- Roth, A. 2010. Differential privacy and the fat shattering dimension of linear queries. In Proceedings of the 14th Annual Workshop on Randomization and Computation (RANDOM'10). Google ScholarDigital Library
- Roth, A. and Roughgarden, T. 2010. Interactive privacy via the median mechanism. In Proceedings of the 42nd ACM Symposium on the Theory of Computing (STOC'10). Google ScholarDigital Library
- Smola, A. J. and Schölkopf, B. 2002. Learning with Kernels. MIT Press.Google Scholar
- Ullman, J. and Vadhan, S. P. 2011. PCPs and the hardness of generating private synthetic data. In Proceedings of TCC. 400--416. Google ScholarDigital Library
- Vapnik, V. N. 1998. Statistical Learning Theory. Wiley.Google Scholar
- Xiao, X., Wang, G., and Gehrke, J. 2010. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 1200--1214. Google ScholarDigital Library
Index Terms
- A learning theory approach to noninteractive database privacy
Recommendations
A learning theory approach to non-interactive database privacy
STOC '08: Proceedings of the fortieth annual ACM symposium on Theory of computingWe demonstrate that, ignoring computational constraints, it is possible to release privacy-preserving databases that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We show a new lower ...
Database Learning: Toward a Database that Becomes Smarter Every Time
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataIn today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the ...
Comments