
Anonymizing healthcare data: a case study on the blood transfusion service

Authors:

Noman Mohammed
Concordia University, Montreal, PQ, Canada

Benjamin C.M. Fung
Concordia University, Montreal, PQ, Canada

Patrick C.K. Hung
University of Ontario Institute of Technology, Oshawa, ON, Canada

Cheuk-kwong Lee
Hong Kong Red Cross Blood Transfusion Service, Hong Kong, China

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009, Pages 1285–1294
https://doi.org/10.1145/1557019.1557157

Published: 28 June 2009

ABSTRACT

Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients' privacy. In this paper, we study the privacy concerns of the blood transfusion information-sharing system between the Hong Kong Red Cross Blood Transfusion Service (BTS) and public hospitals, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy, together with an anonymization algorithm, to meet the privacy and information requirements of this BTS case. Experiments on real-life data demonstrate that our anonymization algorithm effectively retains the information essential for data analysis in the anonymized data and scales to large datasets.
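
The abstract names LKC-privacy without defining it. As a rough illustration only, the sketch below assumes the definition as it is commonly stated for this model: for every combination of at most L quasi-identifier attributes, each group of records sharing the same values must contain at least K records, and within each group the fraction of records carrying any designated sensitive value must not exceed C. The function name, parameter names, and toy records are hypothetical and are not taken from the paper; this is a brute-force check, not the authors' anonymization algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations


def satisfies_lkc(records, qid_attrs, sensitive_attr, sensitive_values, L, K, C):
    """Brute-force check of the assumed LKC-privacy conditions.

    records          -- list of dicts, one per individual (hypothetical layout)
    qid_attrs        -- quasi-identifier attribute names
    sensitive_attr   -- name of the sensitive attribute
    sensitive_values -- sensitive values whose inference must be limited
    L, K, C          -- adversary knowledge bound, minimum group size,
                        maximum confidence of inferring a sensitive value
    """
    # Enumerate every subset of quasi-identifier attributes of size 1..L.
    for size in range(1, min(L, len(qid_attrs)) + 1):
        for attrs in combinations(qid_attrs, size):
            # Group records by their values on this attribute subset.
            groups = defaultdict(Counter)
            for rec in records:
                key = tuple(rec[a] for a in attrs)
                groups[key][rec[sensitive_attr]] += 1
            for counts in groups.values():
                group_size = sum(counts.values())
                if group_size < K:  # identity-linkage condition violated
                    return False
                for s in sensitive_values:
                    if counts[s] / group_size > C:  # attribute-linkage violated
                        return False
    return True


# Toy, made-up records used only to exercise the check.
toy_records = [
    {"job": "A", "sex": "M", "surgery": "Transgender"},
    {"job": "A", "sex": "M", "surgery": "Plastic"},
    {"job": "A", "sex": "F", "surgery": "Plastic"},
    {"job": "A", "sex": "F", "surgery": "Vascular"},
]
ok = satisfies_lkc(toy_records, ["job", "sex"], "surgery", {"Transgender"},
                   L=2, K=2, C=0.5)
```

Because the check enumerates all attribute subsets up to size L, its cost grows quickly with the number of quasi-identifiers; it is meant to make the three parameters concrete rather than to scale.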



Published in
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN:9781605584959
DOI:10.1145/1557019
General Chairs:
John Elder
Elder Research, Inc., USA
Françoise Soulié Fogelman
KXEN, France
Program Chairs:
Peter Flach
University of Bristol, UK
Mohammed Zaki
RPI, USA
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
anonymity
classification
healthcare
privacy
Qualifiers
- research-article