Centralized and Distributed Anonymization for High-Dimensional Healthcare Data

Published: 01 October 2010

Abstract

Sharing healthcare data has become a vital requirement in healthcare system management; however, inappropriate sharing and usage of healthcare data could threaten patients’ privacy. In this article, we study the privacy concerns of sharing patient information between the Hong Kong Red Cross Blood Transfusion Service (BTS) and the public hospitals. We generalize their information and privacy requirements to the problems of centralized anonymization and distributed anonymization, and identify the major challenges that make traditional data anonymization methods inapplicable. Furthermore, we propose a new privacy model called LKC-privacy to overcome the challenges and present two anonymization algorithms to achieve LKC-privacy in both the centralized and the distributed scenarios. Experiments on real-life data demonstrate that our anonymization algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large datasets.

        • Published in

          ACM Transactions on Knowledge Discovery from Data  Volume 4, Issue 4
          October 2010
          121 pages
          ISSN:1556-4681
          EISSN:1556-472X
          DOI:10.1145/1857947

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 October 2010
          • Accepted: 1 June 2010
          • Received: 1 January 2010

          Qualifiers

          • research-article
          • Research
          • Refereed
