ABSTRACT
Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and use of healthcare data could threaten patients' privacy. In this paper, we study the privacy concerns of the blood transfusion information-sharing system between the Hong Kong Red Cross Blood Transfusion Service (BTS) and public hospitals, and identify the major challenges that make traditional data anonymization methods inapplicable. We then propose a new privacy model called LKC-privacy, together with an anonymization algorithm, to meet the privacy and information requirements of the BTS case. Experiments on real-life data demonstrate that our anonymization algorithm effectively retains the information essential for data analysis in the anonymized data and scales to large datasets.
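The abstract names LKC-privacy without stating its condition. As the model is usually described, a table satisfies LKC-privacy if, for every combination of at most L quasi-identifier values that occurs in the table, the matching group contains at least K records and no sensitive value appears in more than a fraction C of that group. The sketch below is a brute-force checker for that condition only, not the paper's anonymization algorithm; the function name, the attribute names, and the toy records are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def satisfies_lkc(records, qid_attrs, sensitive_attr, sensitive_values, L, K, C):
    """Brute-force check of the LKC-privacy condition (illustrative sketch).

    records          -- list of dicts, one per individual (hypothetical layout)
    qid_attrs        -- quasi-identifier attribute names
    sensitive_attr   -- name of the sensitive attribute
    sensitive_values -- set of values considered sensitive
    L                -- maximum number of QID values the adversary may know
    K                -- minimum size of any matching group
    C                -- maximum confidence of inferring a sensitive value
    """
    max_size = min(L, len(qid_attrs))
    for size in range(1, max_size + 1):
        for attrs in combinations(qid_attrs, size):
            # Group sensitive values by the chosen QID value combination.
            groups = defaultdict(list)
            for rec in records:
                key = tuple(rec[a] for a in attrs)
                groups[key].append(rec[sensitive_attr])
            for values in groups.values():
                if len(values) < K:
                    return False  # group too small: identity linkage risk
                for s in sensitive_values:
                    if values.count(s) / len(values) > C:
                        return False  # inference too confident: attribute linkage risk
    return True


# Toy data, invented for this example only.
records = [
    {"job": "A", "sex": "M", "disease": "Flu"},
    {"job": "A", "sex": "F", "disease": "HIV"},
    {"job": "B", "sex": "M", "disease": "Flu"},
    {"job": "B", "sex": "F", "disease": "Flu"},
]

ok_l1 = satisfies_lkc(records, ["job", "sex"], "disease", {"HIV"}, L=1, K=2, C=0.5)
ok_l2 = satisfies_lkc(records, ["job", "sex"], "disease", {"HIV"}, L=2, K=2, C=0.5)
```

With this toy table, an adversary who knows a single QID value always matches two records, so the check passes for L=1; knowing both values narrows every group to a single record, so it fails for L=2. Bounding the adversary's knowledge by L is what distinguishes the model from requiring the condition over all QID attributes at once.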
Index Terms
- Anonymizing healthcare data: a case study on the blood transfusion service