ABSTRACT
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is ΣiεS f(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise -- much less than the sampling error -- provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (Sub-Linear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large.We extend this work in two ways. First, we modify the privacy analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model [11].
- D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proc. Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2001.]] Google ScholarDigital Library
- R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. ACM SIGMOD Conference on Management of Data, pages 439--450. ACM Press, 2000.]] Google ScholarDigital Library
- N. R. Adam and J. C. Wortmann, Security-Control Methods for Statistical Databases: A Comparative Study, ACM Computing Surveys 21(4), pp. 515--556, 1989.]] Google ScholarDigital Library
- S. Chawla, C. Dwork, F. McSherry, A. Smith and H. Wee, Toward Privacy in Public Databases, TCC 2005.]] Google ScholarDigital Library
- D.E. Denning. Secure statistical database with random sample queries. ACM Trans. Database Syst. 5, 3(Sept.), 291--315, 1980.]] Google ScholarDigital Library
- I. Dinur and K. Nissim, Revealing information while preserving privacy, Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202--210, 2003.]] Google ScholarDigital Library
- C. Dwork and N. Nissim, Privacy-Preserving Datamining on Vertically Partitioned Databases, Proceedings of CRYPTO 2004, pp. 528--544, 2004.]]Google ScholarCross Ref
- A. Evfimievsky, J. Gehrke, and R. Srikant, Limiting privacy breaches in privacy preserving data mining. In Proc. Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 211--222, 2003.]] Google ScholarDigital Library
- J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. Strauss, and R. N. Wright. Secure multiparty computation of approximations. In Proc. 28th International Colloquium on Automata, Languages and Programming, pages 927--938. Springer-Verlag, 2001.]] Google ScholarDigital Library
- S. Halevi, E. Kushilevitz, R. Krauthgamer, and K. Nissim. Private approximations of np-hard functions. In Proc. 33th Annual ACM Symposium on the Theory of Computing, pages 550--559, 2001.]] Google ScholarDigital Library
- M. Kearns, Efficient Noise-Tolerant Learning from Statistical Queries, JACM 45(6), pp. 983--1006, 1998. See also Proc. 25th ACM STOC, pp. 392--401, 1993]] Google ScholarDigital Library
- P. Kocher, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems, CRYPTO 1996, Lecture Notes in Computer Science, Volume 1109, Jan 1996, Page 104.]] Google ScholarDigital Library
- Y. Lindell and B. Pinkas. Privacy preserving data mining. J. Cryptology, 15(3):177--206, 2002. An earlier version appeared in Proc. Crypto 2000.]] Google ScholarDigital Library
- E. Leiss. Randomizing a practical method for protecting statistical databases against compromise. In Proceedings of the 8th Conference on Very Large Databases, pp. 189--196.]] Google ScholarDigital Library
- M. J. O'Connel, Search Program for Significant Variables, Comp. Phys. Comm. 8, 1974.]]Google Scholar
- D. MacKay Information Theory, Inference, and Learning Algorithms Cambridge University Press, 2003.]] Google ScholarDigital Library
- J. R. Quinlan, Induction of Decision Trees, Machine learning 1(1), 1986, pp. 81--106.]] Google ScholarDigital Library
Recommendations
Towards practical differential privacy for SQL queries
Differential privacy promises to enable general data analytics while protecting individual privacy, but existing differential privacy mechanisms do not support the wide variety of features and databases used in real-world SQL-based analytics systems.
...
Practical Anonymization for Protecting Privacy in Combinatorial Maps
PDCAT '14: Proceedings of the 2014 15th International Conference on Parallel and Distributed Computing, Applications and TechnologiesCombinatorial Map (CM) is becoming increasingly popular due to its power in modeling topological structures with subdivided objects, which is widely used in the fields of social network, computer vision, social media and so on. However, due to its ...
Comments