article

L-diversity: Privacy beyond k-anonymity

Authors:
Ashwin Machanavajjhala

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

,
Daniel Kifer

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

,
Johannes Gehrke

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

,
Muthuramakrishnan Venkitasubramaniam

Cornell University, Ithaca, NY

Cornell University, Ithaca, NY
View Profile

Authors Info & Claims

ACM Transactions on Knowledge Discovery from Data Volume 1 Issue 1pp 3–eshttps://doi.org/10.1145/1217299.1217302

Published:01 March 2007Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes.

In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.

References

Adam, N. R. and Wortmann, J. C. 1989. Security-control methods for statistical databases: A comparative study. ACM Comput. Surv. 21, 4, 515--556. Google ScholarDigital Library
Aggarwal, C. C. and Yu, P. S. 2004. A condensation approach to privacy preserving data mining. In Proceedings of the International Conference on Extending Database Technology (EDBT). 183--199.Google Scholar
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2004. k-anonymity: Algorithms and hardness. Tech. rep., Stanford University.Google Scholar
Agrawal, D. and Aggarwal, C. C. 2001. On the design and quantifiaction of privacy preserving data mining algorithms. In Proceedings of the International Conference on Principles of Database Systems (PODS). Google ScholarDigital Library
Agrawal, R., Bayardo, R. J., Faloutsos, C., Kiernan, J., Rantzau, R., and Srikant, R. 2004. Auditing compliance with a hippocratic database. In Proceedings of the International Conference on Very Large Databases (VLDB). 516--527. Google ScholarDigital Library
Agrawal, R., Evfimievski, A. V., and Srikant, R. 2003. Information sharing across private databases. In Proceedings of the SIGMOD Conference. 86--97. Google ScholarDigital Library
Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. 2002. Hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 143--154. Google ScholarDigital Library
Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the International Conference on Very Large Databases (VLDB). Google ScholarDigital Library
Agrawal, R. and Srikant, R. 2000. Privacy preserving data mining. In Proceedings of the 19th ACM SIGMOD Conference on Management of Data. Google ScholarDigital Library
Agrawal, R., Srikant, R., and Thomas, D. 2004. Privacy preserving OLAP. In Proceedings of the 23th ACM SIGMOD Conference on Management of Data. Google ScholarDigital Library
Bacchus, F., Grove, A. J., Halpern, J. Y., and Koller, D. 1996. From statistical knowledge bases to degrees of belief. A.I. 87, 1--2. Google ScholarDigital Library
Bayardo, R. J. and Agrawal, R. 2005. Data privacy through optimal k-anonymization. In Proceedings of the International Conference on Data Engineering (ICDE'05). Google ScholarDigital Library
Beaver, D. 1997. Commodity-based cryptography (extended abstract). In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC'97). 446--455. Google ScholarDigital Library
Beaver, D. 1998. Server-assisted cryptography. In Proceedings of the 1998 Workshop on New Security Paradigms (NSPW'98). 92--106. Google ScholarDigital Library
Beck, L. 1980. A security mechanism for statistical database. ACM Trans. Datab. Syst. 5, 3, 316--338. Google ScholarDigital Library
Ben-Or, M., Goldwasser, S., and Wigderson, A. 1988. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC'88). 1--10. Google ScholarDigital Library
Ben-Tal, A., Charnes, A., and Teboulle, M. 1989. Entropic means. J. Mathemat. Anal. Appl. 139, 2, 537--551.Google ScholarCross Ref
Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The SuLQ framework. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarDigital Library
Chaum, D., Crepeau, C., and Damgard, I. 1988. Multiparty unconditionally secure protocols. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC'88). 11--19. Google ScholarDigital Library
Chawla, S., Dwork, C., McSherry, F., Smith, A., and Wee, H. 2005. Toward privacy in public databases. In Proceedings of the Tactical Communications Conference (TCC). Google ScholarDigital Library
Chin, F. 1986. Security problems on inference control for sum, max, and min queries. J. ACM 33, 3, 451--464. Google ScholarDigital Library
Chin, F. and Ozsoyoglu, G. 1981. Auditing for secure statistical databases. In Proceedings of the ACM Conference (ACM'81). 53--59. Google ScholarDigital Library
Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., and Zhu, M. Y. 2002. Tools for privacy preserving data mining. SIGKDD Explorations 4, 2, 28--34. Google ScholarDigital Library
Cox, L. 1995. Network models for complementary cell suppression. J. Amer. Statis. Asso. 90, 1453--1462.Google ScholarCross Ref
Cox, L. H. 1980. Suppression, methodology and statistical disclosure control. J. Amer. Statis. Asso. 75.Google ScholarCross Ref
Cox, L. H. 1982. Solving confidentiality protection problems in tabulations using network optimization: A network model for cell suppression in the u.s. economic censuses. In Proceedings of the International Seminar on Statistical Confidentiality. Dublin International Statistical Institute, Dublin, Ireland. 229--245.Google Scholar
Cox, L. H. 1987. New results in dislosure avoidance for tabulations. In Proceedings of the International Statistical Institute 46th Session. Tokyo, Japan. 83--84.Google Scholar
Dalenius, T. 1981. A simple procedure for controlled rounding. Statistik Tidskrift.Google Scholar
Dalenius, T. and Reiss, S. 1982. Data swapping: A technique for disclosure control. J. Statis. Plan. Infer. 6.Google Scholar
Denning, D. 1980. Secure statistical databases with random sample queries. ACM Trans. Datab. Syst. 5, 3, 291--315. Google ScholarDigital Library
Denning, D. E., Denning, P. J., and Schwartz, M. D. 1979. The tracker: A threat to statistical database security. ACM Trans. Datab. Syst. 4, 1, 76--96. Google ScholarDigital Library
Diaconis, P. and Sturmfels, B. 1998. Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 1, 363--397.Google Scholar
Dinur, I. and Nissim, K. 2003. Revealing information while preserving privacy. In Proceedings of the International Conference on Principles of Data Systems (PODS). 202--210. Google ScholarDigital Library
Dobkin, D. P., Jones, A. K., and Lipton, R. J. 1979. Secure databases: Protection against user influence. ACM: Trans. Datab. Syst. 4, 1 (March), 76--96. Google ScholarDigital Library
Dobra, A. 2002. Statistical tools for disclosure limitation in multiway contingency tables. Ph.D. thesis, Carnegie Mellon University.Google Scholar
Dobra, A. and Feinberg, S. E. 2000. Assessing the risk of disclosure of confidential categorical data. In Bayesian Statistics 7. Oxford University Press, Oxford, UK.Google Scholar
Dobra, A. and Feinberg, S. E. 2003. Bounding entries in multi-way contingency tables given a set of marginal totals. In Proceedings of the Shoresh Conference 2000: Foundations of Statistical Inference. Springer Verlag.Google Scholar
Du, W. 2001. A study of several specific secure two-party computation problems. Ph.D. thesis, Purdue University. Google ScholarDigital Library
Du, W. and Zhan, Z. 2002. A practical approach to solve secure multi-party computation problems. New Security Paradigms Workshop. Google ScholarDigital Library
Duncan, G. T. and Feinberg, S. E. 1997. Obtaining information while preserving privacy: A markov perturbation method for tabular data. Joint Statistical Meetings. Anaheim, CA.Google Scholar
Evfimievski, A., Gehrke, J., and Srikant, R. 2003. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarDigital Library
Evfimievsky, A., Srikant, R., Gehrke, J., and Agrawal, R. 2002. Privacy preserving data mining of association rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery in Databases and Data Mining. 217--228. Google ScholarDigital Library
Fellegi, I. P. 1972. On the question of statistical confidentiality. J. Amer. Statis. Asso. 67:337, 7--18.Google ScholarCross Ref
Goldreich, O., Micali, S., and Wigderson, A. 1987. How to play any mental game. In Proceedings of the 19th ACM Conference on Theory of Computing (STOC'87). 218--229. Google ScholarDigital Library
Huang, Z., Du, W., and Chen, B. 2004. Deriving private information from randomized data. In Proceedings of the 23th ACM SIGMOD Conference on Management of Data. Google ScholarDigital Library
Kantarcioglu, M. and Clifton, C. 2002. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In Proceedings of the Conference on Data Mining and Knowledge Discovery (DMKD).Google Scholar
Kargupta, H., Datta, S., Wang, Q., and Sivakumar, K. 2003. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the International Conference on Data Mining (ICDM). 99--106. Google ScholarDigital Library
Kenthapadi, K., Mishra, N., and Nissim, K. 2005. Simulatable auditing. In PODS. Google ScholarDigital Library
Kleinberg, J., Papadimitriou, C., and Raghavan, P. 2000. Auditing boolean attributes. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarDigital Library
LeFevre, K., Agrawal, R., Ercegovac, V., Ramakrishnan, R., Xu, Y., and DeWitt, D. J. 2004. Limiting disclosure in hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 108--119. Google ScholarDigital Library
LeFevre, K., DeWitt, D., and Ramakrishnan, R. 2005. Incognito: Efficient fulldomain k-anonymity. In SIGMOD. Google ScholarDigital Library
M. Langheinrich, E. 2001. A P3P preference exchange language 1.0 (appel1.0). W3C Working Draft.Google Scholar
M. Marchiori, E. 2002. The platform for privacy preferences 1.0 (p3p1.0) specification. W3C Proposed Recommendation.Google Scholar
Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. 2006. &ell;-diversity: Privacy beyond k-anonymity. In Proceedings of the International Conference on Data Engineering (ICDE). Google ScholarDigital Library
Martin, D., Kifer, D., Machanavajjhala, A., Gehrke, J., and Halpern, J. 2006. Worst-case background knowledge in privacy. Tech. rep., Cornell University.Google Scholar
Matloff, N. S. 1986. Another look at the use of noise addition for database security. In Proceedings of IEEE Symposium on Security and Privacy. 173--180.Google ScholarCross Ref
Meyerson, A. and Williams, R. 2004. On the complexity of optimal k-anonymity. In PODS. Google ScholarDigital Library
Miklau, G. and Suciu, D. 2003. Controlling access to published data using cryptography. In Proceedings of the International Conference on Very Large Databases (VLDB). 898--909. Google ScholarDigital Library
Miklau, G. and Suciu, D. 2004. A formal analysis of information disclosure in data exchange. In SIGMOD. Google ScholarDigital Library
Ohrn, A. and Ohno-Machado, L. 1999. Using boolean reasoning to anonymize databases. A. I. Medicine 15, 3, 235--254.Google Scholar
Samarati, P. 2001. Protecting respondents' identities in microdata release. In IEEE Trans. Knowl. Data Eng. Google ScholarDigital Library
Samarati, P. and Sweeney, L. 1998. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Tech. rep. SRI-CSL-98-04, SRI Computer Science Laboratory, Palo Alto, CA.Google Scholar
Schlorer, J. 1975. Identification and retrieval of personal records from a statistical bank. Methods Inform. Medicine.Google Scholar
Slavkovic, A. and Feinberg, S. E. 2004. Bounds for cell entries in two-way tables given conditional relative frequencies. In Lecture Notes in Computer Science, Vol. 3050. J. Domingo-Ferrer and V. Torra Eds. Springer-Verlag, 30--43.Google Scholar
Snodgrass, R. T., Yao, S., and Collberg, C. S. 2004. Tamper detection in audit logs. In Proceedings of the International Conference on Very Large Databases (VLDB). 504--515. Google ScholarDigital Library
Sweeney, L. 2000. Uniqueness of simple demographics in the u.s. population. Tech. rep., Carnegie Mellon University.Google Scholar
Sweeney, L. 2002. k-anonymity: a model for protecting privacy. Int. J. Uncer., Fuz. Knowl-based Syst. 10, 5, 557--570. Google ScholarDigital Library
Traub, J. F., Yemini, Y., and Wozniakowski, H. 1984. The statistical security of a statistical database. ACM Trans. Datab. Syst. 9, 4, 672--679. Google ScholarDigital Library
University of California Irvine Machine Learning Repository. http://www.ics.uci.edu/mlearn/mlrepository.html.Google Scholar
Vaidya, J. and Clifton, C. 2002. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD). 639--644. Google ScholarDigital Library
Warner, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statis. Ass.Google ScholarCross Ref
Warner, S. L. 1971. The linear randomized response model. J. Amer. Statis. Ass. 884--888.Google ScholarCross Ref
Yang, X. and Li, C. 2004. Secure XML publishing without information leakage in the presence of data inference. In Proceedings of the International Conference on Very Large Databases (VLDB). 96--107. Google ScholarDigital Library
Zhong, S., Yang, Z., and Wright, R. N. 2005. Privacy-enhancing k-anonymization of customer data. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarDigital Library

Index Terms

L-diversity: Privacy beyond k-anonymity
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Data types and structures

Recommendations

t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation
Microaggregation is a technique for disclosure limitation aimed at protecting the privacy of data subjects in microdata releases. It has been used as an alternative to generalization and suppression to generate k-anonymous data sets, where the identity of ...
Read More
Improving accuracy of classification models induced from anonymized datasets

The performance of classifiers and other data mining models can be significantly enhanced using the large repositories of digital data collected nowadays by public and private organizations. However, the original records stored in those repositories ...
Read More
Background knowledge attacks in privacy-preserving data publishing models
Abstract
Massive volumes of data are being generated at every moment through various sources in the cyber-physical world. While storing as well as facilitating these data for business or individual requirements, data disclosure, sensitive data ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Knowledge Discovery from Data Volume 1, Issue 1
March 2007
161 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1217299
Issue’s Table of Contents

Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2007
Published in tkdd Volume 1, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
ℓ-diversity
k-anonymity
Data privacy
privacy-preserving data publishing
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2,061
  Total Citations
  View Citations
- 9,760
  Total Downloads
- Downloads (Last 12 months)664
- Downloads (Last 6 weeks)81
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

L-diversity: Privacy beyond k-anonymity

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation

Improving accuracy of classification models induced from anonymized datasets

Background knowledge attacks in privacy-preserving data publishing models

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

L-diversity: Privacy beyond k-anonymity

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation

Improving accuracy of classification models induced from anonymized datasets

Background knowledge attacks in privacy-preserving data publishing models

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media