skip to main content
article

L-diversity: Privacy beyond k-anonymity

Published:01 March 2007Publication History
Skip Abstract Section

Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes.

In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.

References

  1. Adam, N. R. and Wortmann, J. C. 1989. Security-control methods for statistical databases: A comparative study. ACM Comput. Surv. 21, 4, 515--556. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aggarwal, C. C. and Yu, P. S. 2004. A condensation approach to privacy preserving data mining. In Proceedings of the International Conference on Extending Database Technology (EDBT). 183--199.Google ScholarGoogle Scholar
  3. Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2004. k-anonymity: Algorithms and hardness. Tech. rep., Stanford University.Google ScholarGoogle Scholar
  4. Agrawal, D. and Aggarwal, C. C. 2001. On the design and quantifiaction of privacy preserving data mining algorithms. In Proceedings of the International Conference on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Agrawal, R., Bayardo, R. J., Faloutsos, C., Kiernan, J., Rantzau, R., and Srikant, R. 2004. Auditing compliance with a hippocratic database. In Proceedings of the International Conference on Very Large Databases (VLDB). 516--527. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Agrawal, R., Evfimievski, A. V., and Srikant, R. 2003. Information sharing across private databases. In Proceedings of the SIGMOD Conference. 86--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. 2002. Hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 143--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Agrawal, R. and Srikant, R. 2000. Privacy preserving data mining. In Proceedings of the 19th ACM SIGMOD Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Agrawal, R., Srikant, R., and Thomas, D. 2004. Privacy preserving OLAP. In Proceedings of the 23th ACM SIGMOD Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bacchus, F., Grove, A. J., Halpern, J. Y., and Koller, D. 1996. From statistical knowledge bases to degrees of belief. A.I. 87, 1--2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bayardo, R. J. and Agrawal, R. 2005. Data privacy through optimal k-anonymization. In Proceedings of the International Conference on Data Engineering (ICDE'05). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Beaver, D. 1997. Commodity-based cryptography (extended abstract). In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC'97). 446--455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Beaver, D. 1998. Server-assisted cryptography. In Proceedings of the 1998 Workshop on New Security Paradigms (NSPW'98). 92--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Beck, L. 1980. A security mechanism for statistical database. ACM Trans. Datab. Syst. 5, 3, 316--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Ben-Or, M., Goldwasser, S., and Wigderson, A. 1988. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC'88). 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ben-Tal, A., Charnes, A., and Teboulle, M. 1989. Entropic means. J. Mathemat. Anal. Appl. 139, 2, 537--551.Google ScholarGoogle ScholarCross RefCross Ref
  18. Blum, A., Dwork, C., McSherry, F., and Nissim, K. 2005. Practical privacy: The SuLQ framework. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Chaum, D., Crepeau, C., and Damgard, I. 1988. Multiparty unconditionally secure protocols. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC'88). 11--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Chawla, S., Dwork, C., McSherry, F., Smith, A., and Wee, H. 2005. Toward privacy in public databases. In Proceedings of the Tactical Communications Conference (TCC). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Chin, F. 1986. Security problems on inference control for sum, max, and min queries. J. ACM 33, 3, 451--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Chin, F. and Ozsoyoglu, G. 1981. Auditing for secure statistical databases. In Proceedings of the ACM Conference (ACM'81). 53--59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., and Zhu, M. Y. 2002. Tools for privacy preserving data mining. SIGKDD Explorations 4, 2, 28--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Cox, L. 1995. Network models for complementary cell suppression. J. Amer. Statis. Asso. 90, 1453--1462.Google ScholarGoogle ScholarCross RefCross Ref
  25. Cox, L. H. 1980. Suppression, methodology and statistical disclosure control. J. Amer. Statis. Asso. 75.Google ScholarGoogle ScholarCross RefCross Ref
  26. Cox, L. H. 1982. Solving confidentiality protection problems in tabulations using network optimization: A network model for cell suppression in the u.s. economic censuses. In Proceedings of the International Seminar on Statistical Confidentiality. Dublin International Statistical Institute, Dublin, Ireland. 229--245.Google ScholarGoogle Scholar
  27. Cox, L. H. 1987. New results in dislosure avoidance for tabulations. In Proceedings of the International Statistical Institute 46th Session. Tokyo, Japan. 83--84.Google ScholarGoogle Scholar
  28. Dalenius, T. 1981. A simple procedure for controlled rounding. Statistik Tidskrift.Google ScholarGoogle Scholar
  29. Dalenius, T. and Reiss, S. 1982. Data swapping: A technique for disclosure control. J. Statis. Plan. Infer. 6.Google ScholarGoogle Scholar
  30. Denning, D. 1980. Secure statistical databases with random sample queries. ACM Trans. Datab. Syst. 5, 3, 291--315. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Denning, D. E., Denning, P. J., and Schwartz, M. D. 1979. The tracker: A threat to statistical database security. ACM Trans. Datab. Syst. 4, 1, 76--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Diaconis, P. and Sturmfels, B. 1998. Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 1, 363--397.Google ScholarGoogle Scholar
  33. Dinur, I. and Nissim, K. 2003. Revealing information while preserving privacy. In Proceedings of the International Conference on Principles of Data Systems (PODS). 202--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Dobkin, D. P., Jones, A. K., and Lipton, R. J. 1979. Secure databases: Protection against user influence. ACM: Trans. Datab. Syst. 4, 1 (March), 76--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Dobra, A. 2002. Statistical tools for disclosure limitation in multiway contingency tables. Ph.D. thesis, Carnegie Mellon University.Google ScholarGoogle Scholar
  36. Dobra, A. and Feinberg, S. E. 2000. Assessing the risk of disclosure of confidential categorical data. In Bayesian Statistics 7. Oxford University Press, Oxford, UK.Google ScholarGoogle Scholar
  37. Dobra, A. and Feinberg, S. E. 2003. Bounding entries in multi-way contingency tables given a set of marginal totals. In Proceedings of the Shoresh Conference 2000: Foundations of Statistical Inference. Springer Verlag.Google ScholarGoogle Scholar
  38. Du, W. 2001. A study of several specific secure two-party computation problems. Ph.D. thesis, Purdue University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Du, W. and Zhan, Z. 2002. A practical approach to solve secure multi-party computation problems. New Security Paradigms Workshop. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Duncan, G. T. and Feinberg, S. E. 1997. Obtaining information while preserving privacy: A markov perturbation method for tabular data. Joint Statistical Meetings. Anaheim, CA.Google ScholarGoogle Scholar
  41. Evfimievski, A., Gehrke, J., and Srikant, R. 2003. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Evfimievsky, A., Srikant, R., Gehrke, J., and Agrawal, R. 2002. Privacy preserving data mining of association rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery in Databases and Data Mining. 217--228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Fellegi, I. P. 1972. On the question of statistical confidentiality. J. Amer. Statis. Asso. 67:337, 7--18.Google ScholarGoogle ScholarCross RefCross Ref
  44. Goldreich, O., Micali, S., and Wigderson, A. 1987. How to play any mental game. In Proceedings of the 19th ACM Conference on Theory of Computing (STOC'87). 218--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Huang, Z., Du, W., and Chen, B. 2004. Deriving private information from randomized data. In Proceedings of the 23th ACM SIGMOD Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Kantarcioglu, M. and Clifton, C. 2002. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In Proceedings of the Conference on Data Mining and Knowledge Discovery (DMKD).Google ScholarGoogle Scholar
  47. Kargupta, H., Datta, S., Wang, Q., and Sivakumar, K. 2003. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the International Conference on Data Mining (ICDM). 99--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Kenthapadi, K., Mishra, N., and Nissim, K. 2005. Simulatable auditing. In PODS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Kleinberg, J., Papadimitriou, C., and Raghavan, P. 2000. Auditing boolean attributes. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. LeFevre, K., Agrawal, R., Ercegovac, V., Ramakrishnan, R., Xu, Y., and DeWitt, D. J. 2004. Limiting disclosure in hippocratic databases. In Proceedings of the International Conference on Very Large Databases (VLDB). 108--119. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. LeFevre, K., DeWitt, D., and Ramakrishnan, R. 2005. Incognito: Efficient fulldomain k-anonymity. In SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. M. Langheinrich, E. 2001. A P3P preference exchange language 1.0 (appel1.0). W3C Working Draft.Google ScholarGoogle Scholar
  53. M. Marchiori, E. 2002. The platform for privacy preferences 1.0 (p3p1.0) specification. W3C Proposed Recommendation.Google ScholarGoogle Scholar
  54. Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. 2006. ℓ-diversity: Privacy beyond k-anonymity. In Proceedings of the International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Martin, D., Kifer, D., Machanavajjhala, A., Gehrke, J., and Halpern, J. 2006. Worst-case background knowledge in privacy. Tech. rep., Cornell University.Google ScholarGoogle Scholar
  56. Matloff, N. S. 1986. Another look at the use of noise addition for database security. In Proceedings of IEEE Symposium on Security and Privacy. 173--180.Google ScholarGoogle ScholarCross RefCross Ref
  57. Meyerson, A. and Williams, R. 2004. On the complexity of optimal k-anonymity. In PODS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Miklau, G. and Suciu, D. 2003. Controlling access to published data using cryptography. In Proceedings of the International Conference on Very Large Databases (VLDB). 898--909. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Miklau, G. and Suciu, D. 2004. A formal analysis of information disclosure in data exchange. In SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Ohrn, A. and Ohno-Machado, L. 1999. Using boolean reasoning to anonymize databases. A. I. Medicine 15, 3, 235--254.Google ScholarGoogle Scholar
  61. Samarati, P. 2001. Protecting respondents' identities in microdata release. In IEEE Trans. Knowl. Data Eng. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Samarati, P. and Sweeney, L. 1998. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Tech. rep. SRI-CSL-98-04, SRI Computer Science Laboratory, Palo Alto, CA.Google ScholarGoogle Scholar
  63. Schlorer, J. 1975. Identification and retrieval of personal records from a statistical bank. Methods Inform. Medicine.Google ScholarGoogle Scholar
  64. Slavkovic, A. and Feinberg, S. E. 2004. Bounds for cell entries in two-way tables given conditional relative frequencies. In Lecture Notes in Computer Science, Vol. 3050. J. Domingo-Ferrer and V. Torra Eds. Springer-Verlag, 30--43.Google ScholarGoogle Scholar
  65. Snodgrass, R. T., Yao, S., and Collberg, C. S. 2004. Tamper detection in audit logs. In Proceedings of the International Conference on Very Large Databases (VLDB). 504--515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Sweeney, L. 2000. Uniqueness of simple demographics in the u.s. population. Tech. rep., Carnegie Mellon University.Google ScholarGoogle Scholar
  67. Sweeney, L. 2002. k-anonymity: a model for protecting privacy. Int. J. Uncer., Fuz. Knowl-based Syst. 10, 5, 557--570. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Traub, J. F., Yemini, Y., and Wozniakowski, H. 1984. The statistical security of a statistical database. ACM Trans. Datab. Syst. 9, 4, 672--679. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. University of California Irvine Machine Learning Repository. http://www.ics.uci.edu/mlearn/mlrepository.html.Google ScholarGoogle Scholar
  70. Vaidya, J. and Clifton, C. 2002. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD). 639--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Warner, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statis. Ass.Google ScholarGoogle ScholarCross RefCross Ref
  72. Warner, S. L. 1971. The linear randomized response model. J. Amer. Statis. Ass. 884--888.Google ScholarGoogle ScholarCross RefCross Ref
  73. Yang, X. and Li, C. 2004. Secure XML publishing without information leakage in the presence of data inference. In Proceedings of the International Conference on Very Large Databases (VLDB). 96--107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Zhong, S., Yang, Z., and Wright, R. N. 2005. Privacy-enhancing k-anonymization of customer data. In Proceedings of the International Conference on Principles of Data Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. L-diversity: Privacy beyond k-anonymity

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 1, Issue 1
      March 2007
      161 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/1217299
      Issue’s Table of Contents

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 March 2007
      Published in tkdd Volume 1, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader