Skip to main content
Top
Published in: Cluster Computing 2/2017

28-02-2017

A framework for utility enhanced incomplete microdata anonymization

Authors: Qiyuan Gong, Ming Yang, Zhouguo Chen, Wenjia Wu, Junzhou Luo

Published in: Cluster Computing | Issue 2/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Incomplete microdata, i.e., microdata with missing value, is very common in real-world datasets. However, existing anonymization techniques, which were developed for complete datasets, suffer from serious information loss on incomplete microdata, due to the missing value pollution. In this paper, we propose a framework for utility enhanced anonymization of incomplete microdata to address this issue. First, we study the properties of missing value pollution on generalization. Guided by these properties, we develop two top-down anonymization algorithms to preserve data utility on incomplete microdata. Extensive experiments on real-world datasets show that our techniques outperform the state-of-the-art techniques in terms of information loss and missing value pollution.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
Mondrian, Enhanced Mondrian and semi-partition.
 
4
According to the documents provided by UCI and INFORMS, ‘?’ in Adult data and -1, -7, -8, -9 in INFORMS are considered as missing values.
 
5
We assume age range is [1, 100], and Zipcode range is [10001, 50000].
 
Literature
2.
go back to reference Markkula, J.: Dynamic geographic personal data—new opportunity and challenge introduced by the location-aware mobile networks. Cluster Comput. 4(4), 369–377 (2001)CrossRef Markkula, J.: Dynamic geographic personal data—new opportunity and challenge introduced by the location-aware mobile networks. Cluster Comput. 4(4), 369–377 (2001)CrossRef
3.
4.
go back to reference LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: ICDE’06: Proceedings of the 22nd International Conference on Data Engineering, p. 25. IEEE Computer Society, Washington, DC (2006) LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: ICDE’06: Proceedings of the 22nd International Conference on Data Engineering, p. 25. IEEE Computer Society, Washington, DC (2006)
5.
go back to reference Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.-C.: Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’06, pp. 785–790. ACM, New York (2006). doi:10.1145/1150402.1150504 Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.-C.: Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’06, pp. 785–790. ACM, New York (2006). doi:10.​1145/​1150402.​1150504
7.
go back to reference Nergiz, M., Clifton, C., Nergiz, A.: Multirelational k-anonymity. IEEE Trans. Knowl. Data Eng. 21(8), 1104–1117 (2009)CrossRef Nergiz, M., Clifton, C., Nergiz, A.: Multirelational k-anonymity. IEEE Trans. Knowl. Data Eng. 21(8), 1104–1117 (2009)CrossRef
8.
go back to reference Gong, Q., Luo, J., Yang, M.: Aim: a new privacy preservation algorithm for incomplete microdata based on anatomy. In: Proceedings of the 2012 International Conference on Pervasive Computing and the Networked World, ser. ICPCA/SWS’12, pp. 194–208. Springer, Berlin (2013). doi:10.1007/978-3-642-37015-1_16 Gong, Q., Luo, J., Yang, M.: Aim: a new privacy preservation algorithm for incomplete microdata based on anatomy. In: Proceedings of the 2012 International Conference on Pervasive Computing and the Networked World, ser. ICPCA/SWS’12, pp. 194–208. Springer, Berlin (2013). doi:10.​1007/​978-3-642-37015-1_​16
9.
go back to reference Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002)MathSciNetCrossRefMATH Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002)MathSciNetCrossRefMATH
10.
go back to reference Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1(1), 3 (2007)CrossRef Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1(1), 3 (2007)CrossRef
11.
go back to reference Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE), IEEE, pp. 106–115 (2007) Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE), IEEE, pp. 106–115 (2007)
14.
go back to reference Gong, Q., Luo, J., Yang, M., Ni, W., Li, X.-B.: Anonymizing 1:m microdata with high utility. Knowl. Based Syst. 115, 15–26 (2017)CrossRef Gong, Q., Luo, J., Yang, M., Ni, W., Li, X.-B.: Anonymizing 1:m microdata with high utility. Knowl. Based Syst. 115, 15–26 (2017)CrossRef
15.
go back to reference Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: PODS’04: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM, New York (2004) Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: PODS’04: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM, New York (2004)
16.
go back to reference Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In EDBT’10: Proceedings of the 13th International Conference on Extending Database Technology, pp. 135–146. ACM, New York (2010) Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In EDBT’10: Proceedings of the 13th International Conference on Extending Database Technology, pp. 135–146. ACM, New York (2010)
17.
go back to reference He, Y., Naughton, J.F.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endow. 2(1), 934–945 (2009)CrossRef He, Y., Naughton, J.F.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endow. 2(1), 934–945 (2009)CrossRef
18.
go back to reference Zakerzadeh, H., Aggarwal, C.C., Barker, K.: Privacy-preserving big data publishing. In: Proceedings of the 27th International Conference on Scientific and Statistical Database Management, ser. SSDBM’15, pp. 26:1–26:11. ACM, New York (2015). doi:10.1145/2791347.2791380 Zakerzadeh, H., Aggarwal, C.C., Barker, K.: Privacy-preserving big data publishing. In: Proceedings of the 27th International Conference on Scientific and Statistical Database Management, ser. SSDBM’15, pp. 26:1–26:11. ACM, New York (2015). doi:10.​1145/​2791347.​2791380
19.
go back to reference Ni, W., Chong, Z.: Clustering-oriented privacy-preserving data publishing. Knowl. Based Syst. 35, 264–270 (2012)CrossRef Ni, W., Chong, Z.: Clustering-oriented privacy-preserving data publishing. Knowl. Based Syst. 35, 264–270 (2012)CrossRef
20.
go back to reference Guo, K., Zhang, Q.: Fast clustering-based anonymization approaches with time constraints for data streams. Knowl. Based Syst. 46, 95–108 (2013) Guo, K., Zhang, Q.: Fast clustering-based anonymization approaches with time constraints for data streams. Knowl. Based Syst. 46, 95–108 (2013)
21.
go back to reference Bhuyan, H.K., Kamila, N.K.: Privacy preserving sub-feature selection based on fuzzy probabilities. Cluster Comput. 17(4), 1383–1399 (2014)CrossRef Bhuyan, H.K., Kamila, N.K.: Privacy preserving sub-feature selection based on fuzzy probabilities. Cluster Comput. 17(4), 1383–1399 (2014)CrossRef
22.
go back to reference Wong, W.K., Mamoulis, N., Cheung, D.W.L.: Non-homogeneous generalization in privacy preserving data publishing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD’10, pp. 747–758. ACM, New York (2010). doi:10.1145/1807167.1807248 Wong, W.K., Mamoulis, N., Cheung, D.W.L.: Non-homogeneous generalization in privacy preserving data publishing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD’10, pp. 747–758. ACM, New York (2010). doi:10.​1145/​1807167.​1807248
23.
go back to reference Xue, M., Karras, P., Raïssi, C., Vaidya, J., Tan, K.-L.: Anonymizing set-valued data by nonreciprocal recoding. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’12, pp. 1050–1058. ACM, New York (2012). doi:10.1145/2339530.2339696 Xue, M., Karras, P., Raïssi, C., Vaidya, J., Tan, K.-L.: Anonymizing set-valued data by nonreciprocal recoding. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’12, pp. 1050–1058. ACM, New York (2012). doi:10.​1145/​2339530.​2339696
24.
go back to reference Doka, K., Xue, M., Tsoumakos, D., Karras, P.: k-anonymization by freeform generalization. In: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ser. ASIA CCS’15, pp. 519–530. ACM, New York (2015). doi:10.1145/2714576.2714590 Doka, K., Xue, M., Tsoumakos, D., Karras, P.: k-anonymization by freeform generalization. In: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ser. ASIA CCS’15, pp. 519–530. ACM, New York (2015). doi:10.​1145/​2714576.​2714590
26.
go back to reference Brown, M.L., Kros, J.F.: Data mining and the impact of missing data. Ind. Manag. Data Syst. 103(8), 611–621 (2003)CrossRef Brown, M.L., Kros, J.F.: Data mining and the impact of missing data. Ind. Manag. Data Syst. 103(8), 611–621 (2003)CrossRef
27.
go back to reference Zhang, S., Zhang, J., Zhu, X., Qin, Y., Zhang, C.: Missing value imputation based on data clustering. In: Gavrilova, M., Tan, C. (eds.) Transactions on Computational Science I, ser. Lecture Notes in Computer Science, vol. 4750, pp. 128–138. Springer, Berlin (2008). doi:10.1007/978-3-540-79299-4_7 Zhang, S., Zhang, J., Zhu, X., Qin, Y., Zhang, C.: Missing value imputation based on data clustering. In: Gavrilova, M., Tan, C. (eds.) Transactions on Computational Science I, ser. Lecture Notes in Computer Science, vol. 4750, pp. 128–138. Springer, Berlin (2008). doi:10.​1007/​978-3-540-79299-4_​7
28.
go back to reference Zhu, X., Zhang, S., Jin, Z., Zhang, Z., Xu, Z.: Missing value estimation for mixed-attribute data sets. IEEE Trans. Knowl. Data Eng. 23(1), 110–121 (2011)CrossRef Zhu, X., Zhang, S., Jin, Z., Zhang, Z., Xu, Z.: Missing value estimation for mixed-attribute data sets. IEEE Trans. Knowl. Data Eng. 23(1), 110–121 (2011)CrossRef
29.
go back to reference Zhang, X., Leckie, C., Dou, W., Chen, J., Kotagiri, R., Salcic, Z.: Scalable local-recoding anonymization using locality sensitive hashing for big data privacy preservation. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ser. CIKM’16, pp. 1793–1802. ACM, New York (2016). doi:10.1145/2983323.2983841 Zhang, X., Leckie, C., Dou, W., Chen, J., Kotagiri, R., Salcic, Z.: Scalable local-recoding anonymization using locality sensitive hashing for big data privacy preservation. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ser. CIKM’16, pp. 1793–1802. ACM, New York (2016). doi:10.​1145/​2983323.​2983841
31.
go back to reference LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In SIGMOD’05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 49–60. ACM, New York (2005) LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In SIGMOD’05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 49–60. ACM, New York (2005)
32.
go back to reference Poulis, G., Loukides, G., Gkoulalas-Divanis, A., Skiadopoulos, S.: Anonymizing data with relational and transaction attributes. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) (2013) Poulis, G., Loukides, G., Gkoulalas-Divanis, A., Skiadopoulos, S.: Anonymizing data with relational and transaction attributes. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) (2013)
33.
go back to reference Bayardo, R.J., Agrawal, R.: Data Privacy Through Optimal k-Anonymization. IEEE Computer Society, Los Alamitos (2005)CrossRef Bayardo, R.J., Agrawal, R.: Data Privacy Through Optimal k-Anonymization. IEEE Computer Society, Los Alamitos (2005)CrossRef
34.
go back to reference Byun, J.-W., Kamra, A., Bertino, E., Li, N.: Efficient k-anonymization using clustering techniques. In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications, ser. DASFAA’07, pp. 188–200. Springer, Berlin (2007) Byun, J.-W., Kamra, A., Bertino, E., Li, N.: Efficient k-anonymization using clustering techniques. In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications, ser. DASFAA’07, pp. 188–200. Springer, Berlin (2007)
Metadata
Title
A framework for utility enhanced incomplete microdata anonymization
Authors
Qiyuan Gong
Ming Yang
Zhouguo Chen
Wenjia Wu
Junzhou Luo
Publication date
28-02-2017
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0795-6

Other articles of this Issue 2/2017

Cluster Computing 2/2017 Go to the issue

Premium Partner