research-article

Centralized and Distributed Anonymization for High-Dimensional Healthcare Data

Authors:
Noman Mohammed (Concordia University)
Benjamin C. M. Fung (Concordia University)
Patrick C. K. Hung (University of Ontario Institute of Technology)
Cheuk-Kwong Lee (Hong Kong Red Cross Blood Transfusion Service)
ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 4, Article No. 18, pp. 1–33. https://doi.org/10.1145/1857947.1857950

Published: 01 October 2010

Abstract

Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this article, we study the privacy concerns of sharing patient information between the Hong Kong Red Cross Blood Transfusion Service (BTS) and the public hospitals. We generalize their information and privacy requirements to the problems of centralized anonymization and distributed anonymization, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy to overcome these challenges and present two anonymization algorithms to achieve LKC-privacy in both the centralized and the distributed scenarios. Experiments on real-life data demonstrate that our anonymization algorithms effectively retain the essential information in the anonymous data for data analysis and are scalable for anonymizing large datasets.
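The abstract names LKC-privacy without defining it. As a rough illustration only, the sketch below assumes the definition commonly given for this model: for every combination of at most L quasi-identifier attributes, each group of records sharing the same values must contain at least K records, and within each group the fraction of records carrying any sensitive value must not exceed C. The function name `satisfies_lkc`, its parameters, and the record layout are hypothetical and are not taken from the article; the article's own anonymization algorithms are not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def satisfies_lkc(records, qid_attrs, sensitive_attr, sensitive_values, L, K, C):
    """Check a table against the assumed LKC-privacy conditions.

    records          -- list of dicts, one per patient record (hypothetical layout)
    qid_attrs        -- names of the quasi-identifier attributes
    sensitive_attr   -- name of the attribute holding sensitive values
    sensitive_values -- the values of sensitive_attr that must be protected
    L, K, C          -- max adversary knowledge, min group size, max confidence
    """
    # Consider every subset of quasi-identifier attributes of size 1..L.
    for size in range(1, min(L, len(qid_attrs)) + 1):
        for attrs in combinations(qid_attrs, size):
            # Group records by their values on this attribute subset and
            # count how often each sensitive-attribute value occurs per group.
            groups = defaultdict(Counter)
            for rec in records:
                key = tuple(rec[a] for a in attrs)
                groups[key][rec[sensitive_attr]] += 1
            for counts in groups.values():
                total = sum(counts.values())
                if total < K:
                    return False  # group too small: identity linkage risk
                if any(counts[s] / total > C for s in sensitive_values):
                    return False  # sensitive value inferred too confidently
    return True
```

A checker like this enumerates attribute subsets only up to size L, which is the point of bounding the adversary's prior knowledge: the number of combinations to protect stays manageable even when the table has many quasi-identifier attributes.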

Published in

ACM Transactions on Knowledge Discovery from Data Volume 4, Issue 4
October 2010
121 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1857947

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 2010
- Accepted: 1 June 2010
- Received: 1 January 2010
Author Tags
Privacy
anonymity
classification
healthcare
Qualifiers
- research-article
- Research
- Refereed