Abstract
Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this article, we study the privacy concerns of sharing patient information between the Hong Kong Red Cross Blood Transfusion Service (BTS) and the public hospitals. We generalize their information and privacy requirements to the problems of centralized anonymization and distributed anonymization, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy to overcome these challenges and present two anonymization algorithms that achieve LKC-privacy in the centralized and the distributed scenarios, respectively. Experiments on real-life data demonstrate that our anonymization algorithms effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large datasets.
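The abstract names LKC-privacy without stating its condition. As a rough illustration only, the sketch below assumes the usual reading of the model: an adversary knows at most L quasi-identifier values of a patient, every such combination that occurs in the table must be shared by at least K records, and no such combination may imply a sensitive value with confidence above C. The function name `satisfies_lkc`, the record layout, and the toy attribute names are hypothetical and are not taken from the article; this is a brute-force checker, not either of the anonymization algorithms the article presents.

```python
from collections import Counter
from itertools import combinations


def satisfies_lkc(records, qid_attrs, sensitive_attr, sensitive_values, L, K, C):
    """Brute-force LKC-privacy check (illustrative sketch, not the paper's algorithm).

    records          -- list of dicts, one per patient (hypothetical layout)
    qid_attrs        -- names of the quasi-identifier attributes
    sensitive_attr   -- name of the attribute holding sensitive values
    sensitive_values -- set of values considered sensitive
    L, K, C          -- adversary knowledge bound, anonymity and confidence thresholds
    """
    max_size = min(L, len(qid_attrs))
    for size in range(1, max_size + 1):
        for attrs in combinations(qid_attrs, size):
            group_sizes = Counter()       # records matching each value combination
            sensitive_counts = Counter()  # of those, how many carry each sensitive value
            for rec in records:
                key = tuple(rec[a] for a in attrs)
                group_sizes[key] += 1
                if rec[sensitive_attr] in sensitive_values:
                    sensitive_counts[(key, rec[sensitive_attr])] += 1
            # Condition 1: every occurring combination is shared by at least K records.
            if any(n < K for n in group_sizes.values()):
                return False
            # Condition 2: no combination implies a sensitive value with confidence > C.
            for (key, _value), n in sensitive_counts.items():
                if n / group_sizes[key] > C:
                    return False
    return True


# Toy table with made-up values, used only to exercise the checker.
toy_records = [
    {"job": "A", "sex": "M", "diagnosis": "HIV"},
    {"job": "A", "sex": "M", "diagnosis": "Flu"},
    {"job": "A", "sex": "F", "diagnosis": "Flu"},
    {"job": "A", "sex": "F", "diagnosis": "Flu"},
]

ok = satisfies_lkc(toy_records, ["job", "sex"], "diagnosis", {"HIV"}, L=2, K=2, C=0.5)
```

The checker enumerates every attribute subset of size at most L, so its cost grows quickly with the number of quasi-identifiers; it is meant to make the condition concrete on small tables rather than to scale.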
Index Terms
- Centralized and Distributed Anonymization for High-Dimensional Healthcare Data