research-article

Achieving anonymity via clustering

Authors:
Gagan Aggarwal, Google Inc., Mountain View, CA
Rina Panigrahy, Microsoft Research, Mountain View, CA
Tomás Feder, Stanford University, Stanford, CA
Dilys Thomas, Oracle, Redwood Shores, CA
Krishnaram Kenthapadi, Microsoft Research, Mountain View, CA
Samir Khuller, University of Maryland, College Park, MD
An Zhu, Google Inc., Mountain View, CA

ACM Transactions on Algorithms, Volume 6, Issue 3, Article No. 49, pp. 1–19. https://doi.org/10.1145/1798596.1798602

Published: 02 July 2010

Abstract

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach to de-identifying records is to remove identifying fields such as social security number and name. However, recent research has shown that a large fraction of the U.S. population can be identified using nonkey attributes (called quasi-identifiers) such as date of birth, gender, and zip code. The k-anonymity model protects privacy by requiring that nonkey attributes that leak information be suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for the quasi-identifiers. We propose a new method for anonymizing data records, in which the quasi-identifiers of the records are first clustered and then the cluster centers are published. To ensure the privacy of the data records, we impose the constraint that each cluster must contain no fewer than a prespecified number of data records. This technique is more general than k-anonymity, since it offers a much larger choice of cluster centers. In many cases, it lets us release considerably more information without compromising privacy. We also provide constant-factor approximation algorithms for finding such a clustering. This is the first set of algorithms for the anonymization problem whose performance is independent of the anonymity parameter k. We further observe that a few outlier points can significantly increase the cost of anonymization. Hence, we extend our algorithms to allow an ϵ fraction of points to remain unclustered, that is, deleted from the anonymized publication. Thus, by not releasing a small fraction of the database records, we can ensure that the data published for analysis has less distortion and is therefore more useful.
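To make the publishing step concrete, here is a minimal Python sketch, assuming numeric quasi-identifiers compared by Euclidean distance and a clustering that is already given. The helper names (`is_k_anonymous`, `publish_centers`) and the sample data are hypothetical, and picking each center from among the cluster's own points is an illustrative assumption rather than the paper's construction. Because every record in a cluster is replaced by the same center and each cluster holds at least r records, the published quasi-identifiers satisfy k-anonymity with k = r.

```python
from collections import Counter
from math import dist
from typing import Sequence, Tuple

# Assumption: quasi-identifiers are numeric tuples compared by Euclidean distance.
Point = Tuple[float, ...]


def is_k_anonymous(rows: Sequence[Point], k: int) -> bool:
    """True if every quasi-identifier tuple in `rows` occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def publish_centers(points: Sequence[Point], clusters: Sequence[Sequence[int]], r: int):
    """Replace each record's quasi-identifiers with the center of its cluster.

    `clusters` lists record indices and must partition the records. As an
    illustrative choice, the center is the member point with the smallest
    maximum distance to the other members. Returns the published rows and
    the largest cluster radius.
    """
    published = [None] * len(points)
    max_radius = 0.0
    for members in clusters:
        # Privacy constraint: no cluster may have fewer than r records.
        if len(members) < r:
            raise ValueError(f"cluster {list(members)} has fewer than {r} records")
        center = min((points[m] for m in members),
                     key=lambda c: max(dist(c, points[m]) for m in members))
        max_radius = max(max_radius, max(dist(center, points[m]) for m in members))
        for m in members:
            published[m] = center
    if any(row is None for row in published):
        raise ValueError("clusters do not cover every record")
    return published, max_radius


if __name__ == "__main__":
    # Hypothetical (age, income in $1000s) quasi-identifiers.
    pts = [(25, 50), (26, 52), (27, 49), (60, 90), (62, 88), (61, 91)]
    rows, radius = publish_centers(pts, [[0, 1, 2], [3, 4, 5]], r=3)
    print(rows)                     # [(25, 50), (25, 50), (25, 50), (60, 90), (60, 90), (60, 90)]
    print(round(radius, 3))         # 2.828
    print(is_k_anonymous(rows, 3))  # True
    print(is_k_anonymous(pts, 2))   # False
```

The raw sample table is not even 2-anonymous, while the published table of centers is 3-anonymous; the returned radius measures how much each record's quasi-identifiers were distorted.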
Our approximation algorithms for new clustering objectives are of independent interest and could be applicable in other clustering scenarios as well.
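The effect of outliers and the ϵ-fraction relaxation described in the abstract can be illustrated with a deliberately simple greedy heuristic. This is not the constant-factor approximation algorithm from the paper: it is a hypothetical sketch (`gather_radius` and `gather_with_outliers` are invented names) that only guarantees the minimum-cluster-size constraint. It assumes Euclidean distance and treats as outliers the at most ⌊ϵn⌋ points that would need the largest ball to gather r points around themselves.

```python
from math import dist
from typing import List, Sequence, Tuple

# Assumption: quasi-identifiers are numeric tuples compared by Euclidean distance.
Point = Tuple[float, ...]


def gather_radius(points: Sequence[Point], i: int, r: int) -> float:
    """Radius of the smallest ball centered at points[i] that holds r points (itself included)."""
    return sorted(dist(points[i], q) for q in points)[r - 1]


def gather_with_outliers(points: Sequence[Point], r: int, eps: float = 0.0):
    """Hypothetical greedy heuristic with no approximation guarantee.

    Returns (clusters, outliers): clusters are lists of record indices, each
    with at least r members; outliers are indices left unpublished.
    """
    n = len(points)
    if not 1 <= r <= n or eps < 0:
        raise ValueError("need 1 <= r <= number of points and eps >= 0")
    # Step 1: drop up to floor(eps * n) points in the sparsest regions,
    # but always keep at least r points so one cluster can be formed.
    budget = min(int(eps * n), n - r)
    by_sparsity = sorted(range(n), key=lambda i: gather_radius(points, i, r), reverse=True)
    outliers = sorted(by_sparsity[:budget])
    remaining = [i for i in range(n) if i not in outliers]

    # Step 2: seed a cluster at the lowest-index remaining point and add its
    # r - 1 nearest remaining neighbors (the seed itself sorts first).
    clusters: List[List[int]] = []
    while len(remaining) >= r:
        seed = remaining[0]
        chosen = sorted(remaining, key=lambda j: (dist(points[seed], points[j]), j))[:r]
        clusters.append(chosen)
        remaining = [j for j in remaining if j not in chosen]

    # Step 3: fewer than r points are left; attach each to the cluster whose
    # seed is nearest, so no cluster drops below r members.
    for j in remaining:
        nearest = min(clusters, key=lambda c: dist(points[c[0]], points[j]))
        nearest.append(j)
    return clusters, outliers


if __name__ == "__main__":
    # Hypothetical sample: two tight groups plus one distant point.
    pts = [(25, 50), (26, 52), (27, 49), (60, 90), (62, 88), (61, 91), (200, 10)]
    print(gather_with_outliers(pts, r=3))            # distant point absorbed into a cluster
    print(gather_with_outliers(pts, r=3, eps=0.15))  # distant point left unpublished
```

On this sample, `eps=0` forces the distant point (200, 10) into the second cluster and inflates that cluster's radius, whereas `eps=0.15` (so ⌊0.15 · 7⌋ = 1) leaves it out as an outlier and keeps both clusters tight.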

References

Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2005. Approximation algorithms for k-anonymity. J. Privacy Technol., Number 20051120001.
Bar-Ilan, J., Kortsarz, G., and Peleg, D. 1993. How to allocate network centers. J. Algor. 15, 385–415.
Bayardo, R. J. and Agrawal, R. 2005. Data privacy through optimal k-anonymization. In Proceedings of the International Conference on Data Engineering. 217–228.
Charikar, M., Khuller, S., Mount, D., and Narasimhan, G. 2001. Algorithms for facility location with outliers. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 642–651.
Chawla, S., Dwork, C., McSherry, F., Smith, A., and Wee, H. 2005. Toward privacy in public databases. In Proceedings of the Theory of Cryptography Conference. 363–385.
Garey, M. R. and Johnson, D. S. 1990. Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York.
Guha, S., Meyerson, A., and Munagala, K. 2000. Hierarchical placement and network design problems. In Proceedings of the IEEE Symposium on Foundations of Computer Science. 603–612.
Hochbaum, D. and Shmoys, D. 1985. A best possible approximation algorithm for the k-center problem. Math. Oper. Res. 10, 180–184.
Jain, K. and Vazirani, V. V. 1999. Primal-Dual approximation algorithms for metric facility location and k-median problems. In Proceedings of the IEEE Symposium on Foundations of Computer Science. 2–13.
Karger, D. and Minkoff, M. 2000. Building Steiner trees with incomplete global knowledge. In Proceedings of the IEEE Symposium on Foundations of Computer Science. 613–623.
Khuller, S. and Sussmann, Y. 2000. The capacitated k-center problem. SIAM J. Discr. Math. 13, 3, 403–418.
LeFevre, K., DeWitt, D., and Ramakrishnan, R. 2005. Incognito: Efficient full-domain k-anonymity. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 49–60.
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. 2006. l-Diversity: Privacy beyond k-anonymity. In Proceedings of the International Conference on Data Engineering. 24.
Meyerson, A. and Williams, R. 2004. On the complexity of optimal k-anonymity. In Proceedings of the Symposium on Principles of Database Systems. 223–228.
Samarati, P. 2001. Protecting respondent's privacy in microdata release. IEEE Trans. Knowl. Data Engin. 13, 6, 1010–1027.
Sweeney, L. 2000. Uniqueness of simple demographics in the U.S. population. LIDAP-WP4. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA.
Time. 1997. The death of privacy.

Index Terms

Achieving anonymity via clustering
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

Achieving anonymity via clustering
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social ...
Anonymity preserving framework for location-based information services
MEDES '10: Proceedings of the International Conference on Management of Emergent Digital EcoSystems

Recently, location based services (LBS) have become more important in today technology advancements. Privacy issue in LBS is one of the most important concerns. In this paper, we have proposed an anonymity preserving framework which can provide a user ...
Improved yoking proof protocols for preserving anonymity

In emerging RFID applications, the yoking proof provides a method not only to ensure the physical proximity of multiple objects but also to verify that a pair of RFID tags has been scanned simultaneously by a reader. Previous studies have focused on ...

Published in

ACM Transactions on Algorithms Volume 6, Issue 3
June 2010
304 pages
ISSN: 1549-6325
EISSN: 1549-6333
DOI: 10.1145/1798596

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 July 2010
- Accepted: 1 July 2008
- Revised: 1 May 2008
- Received: 1 August 2007
Author Tags
Privacy
anonymity
approximation algorithms
clustering
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- 75 total citations
- 896 total downloads
- Downloads (last 12 months): 57
- Downloads (last 6 weeks): 12
