skip to main content
research-article

A learning theory approach to noninteractive database privacy

Published:03 May 2013Publication History
Skip Abstract Section

Abstract

In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class.

We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.

References

  1. Achlioptas, D. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. comput. Syst. Scie. 66, 4, 671--687. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Anthony, M. and Bartlett, P. 1999. Neural Network Learning: Theoretical Foundations. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Balcan, M. F., Blum, A., and Vempala, S. 2006. Kernels as features: On kernels, margins, and low-dimensional mappings. Mach. Learn. 65, 1, 79--94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM New York. 128--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Blum, A., Ligett, K., and Roth, A. 2008. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing. ACM, 609--618. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Dasgupta, S. and Gupta, A. 1999. An elementary proof of the Johnson-Lindenstrauss Lemma. Tech. rep., 99-006, International Science Institute.Google ScholarGoogle Scholar
  7. De, A. 2011. Lower bounds in differential privacy. arXiv:1107.2183 (2011).Google ScholarGoogle Scholar
  8. Dinur, I. and Nissim K. 2003. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 202--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dwork, C. 2006. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 4052, Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. 2006a. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology—EUROCRYPT 2006. Lecture Notes in Computer Science, vol. 4004, Springer, Berlin, 486--503. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Dwork, C., McSherry, F., Nissim, K., and Smith, A. 2006b. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC) Lecture Notes in Computer Science, vol. 3876, Springer, 265. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dwork, C., McSherry, F., and Talwar, K. 2007. The price of privacy and the limits of LP decoding. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 85--94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., and Vadhan, S. 2009. On the complexity of differentially private data release: Efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing. ACM New York, 381--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Dwork, C. and Nissim, K. 2004. Privacy-preserving datamining on vertically partitioned databases. In Proceedings of CRYPTO. Lecture Notes in Computer Science, Springer, 528--544.Google ScholarGoogle Scholar
  15. Dwork, C., Rothblum, G. N., and Vadhan, S. 2010. Boosting and differential privacy. In Proceedings of the 51st Annua IEEE Symposium on Foundations of Computer Science. IEEE, 51--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Dwork, C. and Yekhanin, S. 2008. New efficient attacks on statistical disclosure control mechanisms. In Proceedings of the Advances in Cryptology--CRYPTO 2008. 469--480. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gupta, A., Hardt, M., Roth, A., and Ullman, J. 2011a. Privately releasing conjunctions and the statistical query barrier. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing. ACM New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Gupta, A., Roth, A., and Ullman, J. 2011b. Iterative constructions and private data release. arXiv:1107.3731.Google ScholarGoogle Scholar
  19. Hardt, M., Ligett, K., and McSherry, F. 2012. A simple and practical algorithm for differentially private data release. arXiv:1012.4763.Google ScholarGoogle Scholar
  20. Hardt, M. and Rothblum, G. N. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science. IEEE, 61--70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Hardt, M., Rothblum, G. N., and Servedio, R. A. 2011. Private data release via learning thresholds. arXiv:1107.2444.Google ScholarGoogle Scholar
  22. Hardt, M. and Talwar K. 2010. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on the Theory of Computing, (STOC'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Hay, M., Rastogi, V., Miklau, G., and Suciu, D. 2010. Boosting the accuracy of differentially-private queries through consistency. In Proceedings of VLDB. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. 2008. What can we learn privately?. In Proceedings of the IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS'08). 531--540. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Kasiviswanathan, S. P., Rudelson, M., Smith, A., and Ullman, J. 2010. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Proceedings of the 42nd STOC. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Li, C., Hay, M., Rastogi, V., Miklau, G., and McGregor, A. 2010. Optimizing linear counting queries under differential privacy. In Proceedings of PODS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Li, C. and Miklau, G. 2011. Efficient batch query answering under differential privacy. CoRR abs/1103.1367 (2011).Google ScholarGoogle Scholar
  28. Li, C. and Miklau, G. 2012a. An adaptive mechanism for accurate query answering under differential privacy. Proc. VLDB Endowment 5, 6, 514--525. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Li, C. and Miklau, G. 2012b. Measuring the achievable error of query sets under differential privacy. CoRR abs/1202.3399v2.Google ScholarGoogle Scholar
  30. McSherry, F. and Talwar, K. 2007. Mechanism design via differential privacy. In Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS'07). 94--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Muthukrishnan, S. and Nikolov, A. 2012. Optimal private halfspace counting via discrepancy. In Proceedings of STOC. 1285--1292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Roth, A. 2010. Differential privacy and the fat shattering dimension of linear queries. In Proceedings of the 14th Annual Workshop on Randomization and Computation (RANDOM'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Roth, A. and Roughgarden, T. 2010. Interactive privacy via the median mechanism. In Proceedings of the 42nd ACM Symposium on the Theory of Computing (STOC'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Smola, A. J. and Schölkopf, B. 2002. Learning with Kernels. MIT Press.Google ScholarGoogle Scholar
  35. Ullman, J. and Vadhan, S. P. 2011. PCPs and the hardness of generating private synthetic data. In Proceedings of TCC. 400--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Vapnik, V. N. 1998. Statistical Learning Theory. Wiley.Google ScholarGoogle Scholar
  37. Xiao, X., Wang, G., and Gehrke, J. 2010. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 1200--1214. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A learning theory approach to noninteractive database privacy

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Journal of the ACM
      Journal of the ACM  Volume 60, Issue 2
      April 2013
      237 pages
      ISSN:0004-5411
      EISSN:1557-735X
      DOI:10.1145/2450142
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 May 2013
      • Accepted: 1 January 2013
      • Revised: 1 July 2012
      • Received: 1 September 2011
      Published in jacm Volume 60, Issue 2

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader