Published in: Cluster Computing 2/2017

28.02.2017

A framework for utility enhanced incomplete microdata anonymization

Authors: Qiyuan Gong, Ming Yang, Zhouguo Chen, Wenjia Wu, Junzhou Luo



Abstract

Incomplete microdata, i.e., microdata with missing values, is very common in real-world datasets. However, existing anonymization techniques, which were developed for complete datasets, suffer from serious information loss on incomplete microdata because of missing value pollution. In this paper, we propose a framework for utility-enhanced anonymization of incomplete microdata to address this issue. First, we study the properties of missing value pollution on generalization. Guided by these properties, we develop two top-down anonymization algorithms that preserve data utility on incomplete microdata. Extensive experiments on real-world datasets show that our techniques outperform state-of-the-art techniques in terms of information loss and missing value pollution.
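Top-down anonymization typically starts from the whole table and recursively splits it into smaller groups. For orientation only, here is a minimal sketch of Mondrian-style strict top-down partitioning (LeFevre, DeWitt and Ramakrishnan, ICDE 2006), one of the techniques footnote 1 lists. It is not the authors' incomplete-data algorithm: it assumes complete numeric quasi-identifiers, ranks attributes by raw rather than normalized spread, and the names `mondrian` and `generalize` are illustrative.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def mondrian(records: List[Record], k: int) -> List[List[Record]]:
    """Strict top-down partitioning: cut until no cut leaves both halves with >= k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # A group smaller than 2k cannot be cut into two k-anonymous halves.
    if len(records) < 2 * k:
        return [records]
    dims = range(len(records[0]))
    # Try attributes from widest to narrowest raw spread
    # (Mondrian normalizes the spread by the domain; omitted here for brevity).
    order = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in order:
        values = sorted(r[d] for r in records)
        split = values[len(values) // 2]  # median (upper middle element)
        left = [r for r in records if r[d] < split]
        right = [r for r in records if r[d] >= split]
        # Accept the cut only if both sides still satisfy k-anonymity.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    # No allowable cut on any attribute: this group becomes one equivalence class.
    return [records]


def generalize(group: Sequence[Record]) -> List[Tuple[float, float]]:
    """Generalize each quasi-identifier of a group to its [min, max] range."""
    return [(min(r[d] for r in group), max(r[d] for r in group)) for d in range(len(group[0]))]


# Toy (age, zipcode) records, made up for illustration.
records = [(25, 10010), (27, 10020), (30, 20000), (45, 30000), (50, 30010), (52, 40000)]
for group in mondrian(records, k=2):
    print(len(group), generalize(group))
```

Each returned group has at least k records, so publishing every record with its group's ranges yields a k-anonymous table. A missing value has no position on the cut axis, which is one reason such partitioning does not carry over directly to incomplete microdata.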


Footnotes
1
Mondrian, Enhanced Mondrian and semi-partition.
 
4
According to the documentation provided by UCI and INFORMS, ‘?’ in the Adult data and -1, -7, -8, -9 in the INFORMS data are treated as missing values.
 
5
We assume that the age range is [1, 100] and the Zipcode range is [10001, 50000].
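Footnotes 4 and 5 fix which raw values count as missing and which domains bound age and Zipcode. The sketch below combines them into a toy picture of why missing values can inflate generalization loss. The marker sets and domains come from the footnotes; the functions `parse_adult`, `parse_informs` and `range_loss`, and the "a missing value spans the whole domain" rule, are hypothetical illustrations, not the paper's definition of missing value pollution. The loss follows the usual normalized-range idea (range width divided by domain width).

```python
from typing import Optional, Sequence, Tuple

# Missing-value markers quoted in footnote 4.
ADULT_MISSING = {"?"}
INFORMS_MISSING = {-1, -7, -8, -9}

# Attribute domains assumed in footnote 5 (inclusive bounds).
DOMAINS = {"age": (1, 100), "zipcode": (10001, 50000)}


def parse_adult(raw: str) -> Optional[int]:
    """Map the Adult marker '?' to None; otherwise parse an integer."""
    raw = raw.strip()  # Adult fields often carry a leading space, e.g. " ?"
    return None if raw in ADULT_MISSING else int(raw)


def parse_informs(raw: int) -> Optional[int]:
    """Map the INFORMS codes -1, -7, -8, -9 to None."""
    return None if raw in INFORMS_MISSING else raw


def range_loss(values: Sequence[Optional[int]], domain: Tuple[int, int],
               missing_spans_domain: bool) -> float:
    """Width of a group's generalized range on one attribute, normalized by the domain width.

    HYPOTHETICAL convention: with missing_spans_domain=True, any missing value
    forces the range to the whole domain; otherwise missing values are ignored.
    """
    lo, hi = domain
    present = [v for v in values if v is not None]
    # An all-missing group gives no bounds, so it is charged the full domain either way.
    if not present or (missing_spans_domain and len(present) < len(values)):
        return 1.0
    return (max(present) - min(present)) / (hi - lo)


ages = [parse_adult(s) for s in ["25", " ?", "35"]]
print(range_loss(ages, DOMAINS["age"], missing_spans_domain=True))   # whole domain
print(range_loss(ages, DOMAINS["age"], missing_spans_domain=False))  # 10 / 99
```

Under the first convention, a single ‘?’ in a three-record group raises the loss from about 0.10 to 1.0, showing how one missing value can dominate a group's generalization; the paper's own treatment may differ.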
 
Metadata
Publisher
Springer US
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0795-6
