Skip to main content
Top

2017 | OriginalPaper | Chapter

DO NOT DISTURB? Classifier Behavior on Perturbed Datasets

Authors : Bernd Malle, Peter Kieseberg, Andreas Holzinger

Published in: Machine Learning and Knowledge Extraction

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Exponential trends in data generation are presenting today’s organizations, economies and governments with challenges never encountered before, especially in the field of privacy and data security. One crucial trade-off regulators are facing regards the simultaneous need for publishing personal information for the sake of statistical analysis and Machine Learning in order to increase quality levels in areas like medical services, while at the same time protecting the identity of individuals. A key European measure will be the introduction of the General Data Protection Regulation (GDPR) in 2018, giving customers the ‘right to be forgotten’, i.e. having their data deleted on request. As this could lead to a competitive disadvantage for European companies, it is important to understand which effects deletion of significant data points has on the performance of ML techniques. In a previous paper we introduced a series of experiments applying different algorithms to a binary classification problem under anonymization as well as perturbation. In this paper we extend those experiments by multi-class classification and introduce outlier-removal as an additional scenario. While the results of our previous work were mostly in-line with our expectations, our current experiments revealed unexpected behavior over a range of different scenarios. A surprising conclusion of those experiments is the fact that classification on an anonymized dataset with outliers removed in beforehand can almost compete with classification on the original, un-anonymized dataset. This could soon lead to competitive Machine Learning pipelines on anonymized datasets for real-world usage in the marketplace.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases VLDB, pp. 901–909 (2005) Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases VLDB, pp. 901–909 (2005)
2.
go back to reference Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., Zhu, A.: Approximation algorithms for k-anonymity. J. Priv. Technol. (JOPT) (2005) Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., Zhu, A.: Approximation algorithms for k-anonymity. J. Priv. Technol. (JOPT) (2005)
3.
go back to reference Brain, D., Webb, G.: On the effect of data set size on bias and variance in classification learning. In: Proceedings of the Fourth Australian Knowledge Acquisition Workshop, pp. 117–128. University of New South Wales (1999) Brain, D., Webb, G.: On the effect of data set size on bias and variance in classification learning. In: Proceedings of the Fourth Australian Knowledge Acquisition Workshop, pp. 117–128. University of New South Wales (1999)
4.
go back to reference Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) PInKDD 2008. LNCS, vol. 5456, pp. 33–54. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01718-6_4 CrossRef Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) PInKDD 2008. LNCS, vol. 5456, pp. 33–54. Springer, Heidelberg (2009). doi:10.​1007/​978-3-642-01718-6_​4 CrossRef
5.
go back to reference Ciriani, V., De Capitani di Vimercati, S., Foresti, S., Samarati, P.: \(\kappa \)-anonymity. In: Yu, T., Jajodia, S. (eds.) Secure Data Management in Decentralized Systems. Advances in Information Security, vol. 33, pp. 323–353. Springer, Boston (2007) Ciriani, V., De Capitani di Vimercati, S., Foresti, S., Samarati, P.: \(\kappa \)-anonymity. In: Yu, T., Jajodia, S. (eds.) Secure Data Management in Decentralized Systems. Advances in Information Security, vol. 33, pp. 323–353. Springer, Boston (2007)
8.
go back to reference Holzinger, A., Plass, M., Holzinger, K., Crişan, G.C., Pintea, C.-M., Palade, V.: Towards interactive machine learning (iML): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 81–95. Springer, Cham (2016). doi:10.1007/978-3-319-45507-5_6 CrossRef Holzinger, A., Plass, M., Holzinger, K., Crişan, G.C., Pintea, C.-M., Palade, V.: Towards interactive machine learning (iML): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 81–95. Springer, Cham (2016). doi:10.​1007/​978-3-319-45507-5_​6 CrossRef
9.
go back to reference Holzinger, A.: Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. (BRIN) 3(2), 119–131 (2016). SpringerCrossRef Holzinger, A.: Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. (BRIN) 3(2), 119–131 (2016). SpringerCrossRef
10.
go back to reference Holzinger, A.: Introduction to machine learning & knowledge extraction (make). Mach. Learn. Knowl. Extract. 1(1), 1–20 (2017)CrossRef Holzinger, A.: Introduction to machine learning & knowledge extraction (make). Mach. Learn. Knowl. Extract. 1(1), 1–20 (2017)CrossRef
11.
go back to reference Kieseberg, P., Malle, B., Frhwirt, P., Weippl, E., Holzinger, A.: A tamper-proof audit and control system for the doctor in the loop. Brain Inform. 3(4), 269–279 (2016)CrossRef Kieseberg, P., Malle, B., Frhwirt, P., Weippl, E., Holzinger, A.: A tamper-proof audit and control system for the doctor in the loop. Brain Inform. 3(4), 269–279 (2016)CrossRef
12.
go back to reference Lee, H., Kim, S., Kim, J.W., Chung, Y.D.: Utility-preserving anonymization for health data publishing. BMC Med. Inform. Decis. Making 17(1), 104 (2017)CrossRef Lee, H., Kim, S., Kim, J.W., Chung, Y.D.: Utility-preserving anonymization for health data publishing. BMC Med. Inform. Decis. Making 17(1), 104 (2017)CrossRef
13.
go back to reference LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), p. 25. IEEE (2006) LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), p. 25. IEEE (2006)
14.
go back to reference Li, J., Liu, J., Baig, M., Wong, R.C.-W.: Information based data anonymization for classification utility. Data Knowl. Eng. 70(12), 1030–1045 (2011)CrossRef Li, J., Liu, J., Baig, M., Wong, R.C.-W.: Information based data anonymization for classification utility. Data Knowl. Eng. 70(12), 1030–1045 (2011)CrossRef
15.
go back to reference Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp. 106–115. IEEE (2007) Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp. 106–115. IEEE (2007)
16.
go back to reference Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Disc. Data (TKDD) 1(1), 1–52 (2007)CrossRef Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Disc. Data (TKDD) 1(1), 1–52 (2007)CrossRef
17.
go back to reference Majeed, A., Ullah, F., Lee, S.: Vulnerability-and diversity-aware anonymization of personally identifiable information for improving user privacy and utility of publishing data. Sensors 17(5), 1–23 (2017)CrossRef Majeed, A., Ullah, F., Lee, S.: Vulnerability-and diversity-aware anonymization of personally identifiable information for improving user privacy and utility of publishing data. Sensors 17(5), 1–23 (2017)CrossRef
18.
go back to reference Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). doi:10.1007/978-3-319-45507-5_17 CrossRef Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). doi:10.​1007/​978-3-319-45507-5_​17 CrossRef
19.
go back to reference Nergiz, M.E., Clifton, C.: Delta-presence without complete world knowledge. IEEE Trans. Knowl. Data Eng. 22(6), 868–883 (2010)CrossRef Nergiz, M.E., Clifton, C.: Delta-presence without complete world knowledge. IEEE Trans. Knowl. Data Eng. 22(6), 868–883 (2010)CrossRef
20.
go back to reference Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)CrossRef Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)CrossRef
22.
go back to reference Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002)MathSciNetCrossRefMATH Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002)MathSciNetCrossRefMATH
23.
go back to reference Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(05), 557–570 (2002)MathSciNetCrossRefMATH Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(05), 557–570 (2002)MathSciNetCrossRefMATH
24.
go back to reference Wimmer, H., Powell, L..: A comparison of the effects of K-anonymity on machine learning algorithms, pp. 1–9 (2014) Wimmer, H., Powell, L..: A comparison of the effects of K-anonymity on machine learning algorithms, pp. 1–9 (2014)
25.
go back to reference Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D.: Understanding data augmentation for classification: when to warp? In: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–6. IEEE (2016) Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D.: Understanding data augmentation for classification: when to warp? In: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–6. IEEE (2016)
Metadata
Title
DO NOT DISTURB? Classifier Behavior on Perturbed Datasets
Authors
Bernd Malle
Peter Kieseberg
Andreas Holzinger
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-66808-6_11

Premium Partner