Published in: Progress in Artificial Intelligence 4/2016

8 July 2016 | Regular Paper

Applying multi-label and multi-class classification to enhance K-anonymity in sequential releases

Authors: Dung Tran, Marina Sokolova


Abstract

Privacy-preserving data mining is gaining prominence as more data containing personal information is accumulated. Data holders in healthcare, finance and other sectors that collect person-specific information are challenged to publish useful data while meeting ever-increasing demands for privacy protection of data subjects. K-anonymity is a popular technique for preserving privacy in data publishing by anonymizing quasi-identifiers (QI) (e.g., race, gender, age). However, K-anonymized data can be at risk of temporal attacks that target multiple versions of released data, also called sequential releases. The objective of this study is to develop a model that uses multi-class and multi-label classifiers to evaluate the risk of re-identifying QI information in previous data releases by learning from the current data release. In our empirical study, we use five healthcare and financial data sets to compare the performance of the binary relevance and label powerset problem transformations with Naïve Bayes, C4.5, random tree and kNN learning algorithms. Our empirical results show that multi-label classification is a powerful tool for enhancing K-anonymity of sequential data releases. Statistical analysis of the classification results shows that RAkEL (random k-labelsets) outperforms the other transformation methods in predicting demographic information and hence can be useful in assessing the risk of QI re-identification.
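
The two problem transformations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label names and the toy label matrix are hypothetical, and no classifier is trained. Binary relevance turns a multi-label task into one independent binary target per label, while label powerset treats each distinct combination of labels as a single class of a multi-class task.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each row marks which binary QI labels apply to one
# record. The label names are illustrative and not taken from the paper.
LABELS: List[str] = ["female", "age_over_50", "urban"]
Y: List[Tuple[int, ...]] = [
    (1, 0, 1),
    (0, 1, 1),
    (1, 0, 1),
    (0, 0, 0),
]


def binary_relevance_targets(
    y: List[Tuple[int, ...]], labels: List[str]
) -> Dict[str, List[int]]:
    """Binary relevance: one independent binary target vector per label."""
    return {name: [row[j] for row in y] for j, name in enumerate(labels)}


def label_powerset_targets(
    y: List[Tuple[int, ...]],
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label powerset: each distinct label combination becomes one class id."""
    class_ids: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in y:
        if row not in class_ids:
            # Assign class ids in order of first appearance.
            class_ids[row] = len(class_ids)
        targets.append(class_ids[row])
    return targets, class_ids


br_targets = binary_relevance_targets(Y, LABELS)
lp_targets, lp_classes = label_powerset_targets(Y)
```

Any single-label learner could then be trained on each binary relevance target separately, or once on the label powerset target. The trade-off is that binary relevance ignores correlations between labels, whereas label powerset captures them at the cost of a class count that grows with the number of distinct label combinations.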


Metadata
Title
Applying multi-label and multi-class classification to enhance K-anonymity in sequential releases
Authors
Dung Tran
Marina Sokolova
Publication date
8 July 2016
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 4/2016
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-016-0096-y
