Top

Published in:

2017 | OriginalPaper | Chapter

DO NOT DISTURB? Classifier Behavior on Perturbed Datasets

Authors : Bernd Malle, Peter Kieseberg, Andreas Holzinger

Published in: Machine Learning and Knowledge Extraction

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Exponential trends in data generation are presenting today’s organizations, economies and governments with challenges never encountered before, especially in the field of privacy and data security. One crucial trade-off regulators are facing regards the simultaneous need for publishing personal information for the sake of statistical analysis and Machine Learning in order to increase quality levels in areas like medical services, while at the same time protecting the identity of individuals. A key European measure will be the introduction of the General Data Protection Regulation (GDPR) in 2018, giving customers the ‘right to be forgotten’, i.e. having their data deleted on request. As this could lead to a competitive disadvantage for European companies, it is important to understand which effects deletion of significant data points has on the performance of ML techniques. In a previous paper we introduced a series of experiments applying different algorithms to a binary classification problem under anonymization as well as perturbation. In this paper we extend those experiments by multi-class classification and introduce outlier-removal as an additional scenario. While the results of our previous work were mostly in-line with our expectations, our current experiments revealed unexpected behavior over a range of different scenarios. A surprising conclusion of those experiments is the fact that classification on an anonymized dataset with outliers removed in beforehand can almost compete with classification on the original, un-anonymized dataset. This could soon lead to competitive Machine Learning pipelines on anonymized datasets for real-world usage in the marketplace.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter IntelliAV: Toward the Feasibility of Building Intelligent Anti-malware on Android Devices

next chapter A Short-Term Forecast Approach of Public Buildings’ Power Demands upon Multi-source Data

Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases VLDB, pp. 901–909 (2005)

Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., Zhu, A.: Approximation algorithms for k-anonymity. J. Priv. Technol. (JOPT) (2005)

Brain, D., Webb, G.: On the effect of data set size on bias and variance in classification learning. In: Proceedings of the Fourth Australian Knowledge Acquisition Workshop, pp. 117–128. University of New South Wales (1999)

Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) PInKDD 2008. LNCS, vol. 5456, pp. 33–54. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01718-6_4 CrossRef

Ciriani, V., De Capitani di Vimercati, S., Foresti, S., Samarati, P.: \(\kappa \)-anonymity. In: Yu, T., Jajodia, S. (eds.) Secure Data Management in Decentralized Systems. Advances in Information Security, vol. 33, pp. 323–353. Springer, Boston (2007)

Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Privacy aware learning. J. ACM (JACM) 61(6), 38 (2014)MathSciNetCrossRefMATH

Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). doi:10.1007/978-3-540-79228-4_1 CrossRef

Holzinger, A., Plass, M., Holzinger, K., Crişan, G.C., Pintea, C.-M., Palade, V.: Towards interactive machine learning (iML): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 81–95. Springer, Cham (2016). doi:10.1007/978-3-319-45507-5_6 CrossRef

Holzinger, A.: Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. (BRIN) 3(2), 119–131 (2016). SpringerCrossRef

10.

Holzinger, A.: Introduction to machine learning & knowledge extraction (make). Mach. Learn. Knowl. Extract. 1(1), 1–20 (2017)CrossRef

11.

Kieseberg, P., Malle, B., Frhwirt, P., Weippl, E., Holzinger, A.: A tamper-proof audit and control system for the doctor in the loop. Brain Inform. 3(4), 269–279 (2016)CrossRef

12.

Lee, H., Kim, S., Kim, J.W., Chung, Y.D.: Utility-preserving anonymization for health data publishing. BMC Med. Inform. Decis. Making 17(1), 104 (2017)CrossRef

13.

LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), p. 25. IEEE (2006)

14.

Li, J., Liu, J., Baig, M., Wong, R.C.-W.: Information based data anonymization for classification utility. Data Knowl. Eng. 70(12), 1030–1045 (2011)CrossRef

15.

Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp. 106–115. IEEE (2007)

16.

Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Disc. Data (TKDD) 1(1), 1–52 (2007)CrossRef

17.

Majeed, A., Ullah, F., Lee, S.: Vulnerability-and diversity-aware anonymization of personally identifiable information for improving user privacy and utility of publishing data. Sensors 17(5), 1–23 (2017)CrossRef

18.

Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgotten: towards machine learning on perturbed knowledge bases. In: Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES 2016. LNCS, vol. 9817, pp. 251–266. Springer, Cham (2016). doi:10.1007/978-3-319-45507-5_17 CrossRef

19.

Nergiz, M.E., Clifton, C.: Delta-presence without complete world knowledge. IEEE Trans. Knowl. Data Eng. 22(6), 868–883 (2010)CrossRef

20.

Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)CrossRef

21.

Simpson, E.H.: Measurement of diversity. Nature 163, 688 (1949)CrossRefMATH

22.

Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002)MathSciNetCrossRefMATH

23.

Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertaint. Fuzziness Knowl. Based Syst. 10(05), 557–570 (2002)MathSciNetCrossRefMATH

24.

Wimmer, H., Powell, L..: A comparison of the effects of K-anonymity on machine learning algorithms, pp. 1–9 (2014)

25.

Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D.: Understanding data augmentation for classification: when to warp? In: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–6. IEEE (2016)

Title: DO NOT DISTURB? Classifier Behavior on Perturbed Datasets
Authors: Bernd Malle
Peter Kieseberg
Andreas Holzinger
Publisher: Springer International Publishing
Book: Machine Learning and Knowledge Extraction
Print ISBN: 978-3-319-66807-9

Electronic ISBN: 978-3-319-66808-6

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-66808-6_11

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner