2019 | OriginalPaper | Chapter

9. Data Security

Author: Ervin Varga

Published in: Practical Data Science with Python 3

Publisher: Apress

Abstract

Data science is all about data, which inevitably includes sensitive information about people, organizations, government agencies, and so on. Any confidential information must be handled with the utmost care and kept secret from villains. Protecting privacy and squeezing value out of data are opposing forces, like performance optimization and maintainability in a software system: as you improve one, you diminish the other. As data scientists, we must ensure both that data is properly protected and that our data science product is capable of fending off abuse as well as unintended usage (for example, preventing it from being used to manipulate people by recommending specific items to them or convincing them to act in some particular manner). All protective actions should be balanced against the usefulness of a data science product; otherwise, there is no point in developing the product.

Footnotes
1
It is interesting to observe that some techniques have been reclassified over time due to legal restrictions. In the recent past, websites were not obligated to warn you about their use of cookies. Nowadays, you must explicitly confirm that you accept the usage of cookies. In other words, collection has turned into disclosure, at least for cookies. Of course, it is a completely different matter whether users are even aware of what those cookies represent. Most of them just click Accept to get rid of the annoying (perhaps intentionally so) pop-up.
 
2
One key issue is the right to be forgotten, which mandates deletion of data after processing is finished (this is closely tied to the data retention policy) or when requested by a data subject. Many ML algorithms cannot remove individual records from a trained model, although anonymization can help to some extent. Microservices-based systems with event-sourced domain models have the same difficulty; one possible solution is to keep event-sourced data encrypted with user-specific keys and delete the selected keys as necessary (without a key, the corresponding record is effectively deleted). At any rate, some flexibility and ingenuity are needed to deliver a fully GDPR-compliant solution while retaining all the benefits of data science offerings. Deletion of data is a qualified right that must be evaluated by lawyers. Moreover, restoration (recovery) procedures must be modified to ensure that deleted data is not accidentally restored.
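The key-deletion idea for event-sourced data can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the class name `CryptoShreddingStore` and its methods are hypothetical, and the SHAKE-256 keystream is a toy stand-in used only to keep the sketch within the standard library. A real system would use a vetted authenticated cipher and a proper key management service.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher for illustration ONLY (assumption): XOR the data
    # with a SHAKE-256 keystream derived from key and nonce. Applying the
    # same function twice with the same key and nonce restores the data.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class CryptoShreddingStore:
    """Hypothetical append-only event log with one key per user."""

    def __init__(self):
        self._keys = {}         # user_id -> secret key (deletable)
        self._forgotten = set()  # users whose keys were destroyed
        self._events = []       # immutable log: (user_id, nonce, ciphertext)

    def append(self, user_id: str, payload: str) -> None:
        if user_id in self._forgotten:
            raise ValueError("user has been forgotten")
        key = self._keys.setdefault(user_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        ciphertext = _keystream_xor(key, nonce, payload.encode("utf-8"))
        self._events.append((user_id, nonce, ciphertext))

    def forget(self, user_id: str) -> None:
        # The log itself is never rewritten; only the key is destroyed.
        self._keys.pop(user_id, None)
        self._forgotten.add(user_id)

    def read_all(self) -> list:
        # Events whose key is gone are unreadable and reported as None.
        result = []
        for user_id, nonce, ciphertext in self._events:
            key = self._keys.get(user_id)
            if key is None:
                result.append(None)
            else:
                result.append(_keystream_xor(key, nonce, ciphertext).decode("utf-8"))
        return result
```

After calling `forget("alice")`, the ciphertext for Alice's events is still present in the log, but `read_all()` can no longer recover it. Backups of the key store would need the same treatment, in line with the remark about restoration procedures.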
 
3
The overall risk is the sum of the expected values of random variables, each representing a specific threat. Consequently, you must model threats as part of your risk assessment procedure. There are also tools to support your effort, like the freely available Microsoft Threat Modeling Tool (you can download it from http://bit.ly/ms_threat_modeling_tool ).
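The summation can be illustrated with a small sketch. It assumes, purely for illustration, that each threat is a two-outcome random variable: it either materializes with some probability and causes a fixed loss, or it does not. The `Threat` class and the sample figures are hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    probability: float  # assumed likelihood of the threat materializing
    impact: float       # assumed loss if it does (any consistent unit)

    def expected_loss(self) -> float:
        # Expected value of a two-outcome variable: p * impact + (1 - p) * 0
        return self.probability * self.impact


def overall_risk(threats) -> float:
    # Overall risk as the sum of the per-threat expected values.
    return sum(t.expected_loss() for t in threats)


# Hypothetical figures for illustration only.
threats = [
    Threat("data breach", probability=0.25, impact=4000.0),
    Threat("model theft", probability=0.5, impact=200.0),
]
total = overall_risk(threats)
```

By linearity of expectation, the sum holds even when the threats are not independent, which is one reason this formulation is convenient for a first risk estimate.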
 
4
Adversarial ML uses a so-called attack model, but the name is purely a naming convention. There is nothing special about this model from the viewpoint of an ML classification algorithm.
 
5
An example of a side-channel attack is a scheme that monitors the fluctuations in power usage of crypto computations to decipher parts of the secret key. The assumption is that insecure code leaks information indicating whether it is processing a binary zero or one at any given moment. Another potent side-channel attack is time based: it measures time differences in data processing to figure out the secret key (this can be leveraged to attack weakly secured web services). You can read more about this in [2].
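The time-based variant can be made concrete with a secret comparison. The sketch below contrasts a hypothetical early-exit comparison, whose running time depends on how many leading bytes match, with the standard library's `hmac.compare_digest`, which is designed to reduce that leak. The function names `naive_equal` and `constant_time_equal` are illustrative, not from the chapter.

```python
import hmac


def naive_equal(expected: bytes, supplied: bytes) -> bool:
    # Vulnerable pattern: returns at the first mismatching byte, so the
    # response time hints at how long the correct prefix is.
    if len(expected) != len(supplied):
        return False
    for x, y in zip(expected, supplied):
        if x != y:
            return False
    return True


def constant_time_equal(expected: bytes, supplied: bytes) -> bool:
    # hmac.compare_digest avoids content-based short-circuiting, so the
    # time taken does not reveal where the first mismatch occurs.
    return hmac.compare_digest(expected, supplied)
```

Both functions return the same results; the difference lies only in how much the running time reveals. A web service that checks API tokens or signatures would typically prefer the second form.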
 
Metadata
Title
Data Security
Author
Ervin Varga
Copyright Year
2019
Publisher
Apress
DOI
https://doi.org/10.1007/978-1-4842-4859-1_9
