Introduction
Name | Approach | Protects from adversarial party | Focus |
---|---|---|---|
HybrEx [44] | Data separation | Cloud | Only uses partitioning as a measure to handle privacy; does not deal with key generation in the map phase; does not deal with adversarial users |
EPiC [45] | Homomorphic encryption | Cloud | User does not trust the cloud infrastructure; supports the counting operation only; purpose-specific |
– | Secret sharing | User and cloud | Cost overhead |
Airavat [22] | MAC + differential privacy | User | Full trust in cloud providers; cannot guarantee privacy for computations whose output keys are created by untrusted mappers |
Differential privacy
Basic terms of differential privacy
Privacy budget (ε)
Sensitivity
Mechanisms used in differential privacy
Noise
Laplace mechanism
Exponential mechanism
Combination properties
Sequential composition (SC)
Parallel composition (PC)
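The two composition properties can be illustrated with a short sketch (the function names are ours, not from any surveyed system): under sequential composition, queries over the same data consume the sum of their budgets, while under parallel composition, queries over disjoint partitions cost only the largest individual budget.

```python
# Illustrative sketch of how privacy budgets (epsilon) combine; the
# function names are ours, not from any of the surveyed systems.

def sequential_composition(epsilons):
    """SC: queries run on the SAME data; their budgets add up."""
    return sum(epsilons)

def parallel_composition(epsilons):
    """PC: queries run on DISJOINT data partitions; only the largest
    individual budget is consumed."""
    return max(epsilons)

# Three queries with epsilon = 0.1, 0.2 and 0.3:
total_sc = sequential_composition([0.1, 0.2, 0.3])  # ~0.6 on the same data
total_pc = parallel_composition([0.1, 0.2, 0.3])    # 0.3 on disjoint partitions
```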
DP example on dataset application
IOP range | Number of patients (real value) | Number of patients (differential privacy applied) |
---|---|---|
08–09 | 1 | −1.089174486
10–11 | 2 | 0.332053694 |
12–13 | 17 | 16.71256706 |
14–15 | 20 | 18.67770913 |
16–17 | 43 | 62.75784853 |
18–19 | 63 | 79.93521335 |
20–21 | 57 | 44.60007283 |
22–23 | 23 | 28.04627043 |
24–25 | 7 | −14.62243625
26–27 | 2 | 15.67088707 |
28–29 | 0 | 19.38616653 |
30–31 | 2 | 9.491168864 |
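Noisy counts like those in the table can be produced with the Laplace mechanism. The sketch below is our own illustration (not the code behind the table): each histogram bin gets Laplace noise with scale sensitivity/ε, and the sensitivity of a counting query is 1.

```python
# Minimal sketch of the Laplace mechanism on a histogram of counts
# (our illustration, not the exact code behind the table above).
import math
import random

def laplace_noise(scale):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_counts(counts, epsilon):
    """Add Laplace(0, sensitivity/epsilon) noise to each bin.
    Adding or removing one patient changes a count by at most 1,
    so the sensitivity of each bin is 1."""
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale) for c in counts]

real = [1, 2, 17, 20, 43, 63, 57, 23, 7, 2, 0, 2]  # counts from the table
released = noisy_counts(real, epsilon=0.1)  # noisy values; negatives can occur
```

Note that, as in the table, the released values can be negative or far from the true counts for small bins; smaller ε means larger noise and stronger privacy.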
Mechanism | Advantages | Drawbacks |
---|---|---|
PINQ [18] | First platform providing differential privacy guarantees; expands the set of users capable of working with sensitive data, increases the portability of privacy-preserving algorithms across data sets and domains, and broadens the scope of the analysis of sensitive data | Does not consider the application developer to be an adversary; subject to a weaker privacy constraint and hence vulnerable to state attacks, privacy-budget attacks and timing attacks; further requires developers to rewrite the application to make use of the PINQ primitives |
Airavat [22] | First system that integrates mandatory access control with differential privacy, enabling many privacy-preserving MapReduce computations without the need to audit untrusted code; can be deployed at large scale without rewriting existing MapReduce applications | Cannot confine every computation performed by untrusted code: only the map program is considered "untrusted", while the reduce program is "trusted" to be implemented in a differentially private manner; supports only limited reducer functions; vulnerable to state attacks and timing attacks |
GUPT [26] | Uses the aging model of data sensitivity to let analysts describe the abstract "privacy budget" in terms of the expected accuracy of the final output; automatically allocates a privacy budget to each query to match the analysts' accuracy requirements; defends against side-channel attacks such as privacy-budget attacks, state attacks and timing attacks | Assumes that the output dimensions are known in advance, which may not always hold in practice; inherits the limitations of differential privacy regarding splitting of the privacy budget |
Geo-indistinguishability [25] | Proposes a generalized notion of differential privacy instantiated with the Euclidean metric, which can be naturally applied to location privacy; offers the best privacy guarantees for the same utility among all mechanisms that do not depend on the prior knowledge of the adversary, i.e., the mechanism is designed once and for all and remains applicable when the prior is unknown | Linear degradation of the user's privacy limits use of the mechanism over time; the noise level of the Laplacian mechanism has to be fixed in advance, independently of the user's movements; despite its flexible behavior, the tiled mechanism would not satisfy geo-indistinguishability to its full potential |
Telco big data [12] | First attempt to implement three basic DP architectures in a deployed telecommunication (telco) big data platform for data mining applications; observes that the accuracy loss increases with the variety of features but decreases with the volume of training data | The privacy of people in the training data is protected, but the privacy of people in the prediction data (that is, the data to which the trained model will be applied) is not; adjustable privacy-budget assignment strategies are required for better accuracy along with the privacy guarantee |
e-Health data release [21] | Improves on previous work by designing a new private partition algorithm for histograms and proposing a heuristic hierarchical query method; real experiments compare the schemes with existing ones and show the proposal is more efficient in data processing and updating; increases the accuracy of data release through consistency and gives a privacy proof showing the algorithm satisfies differential privacy | Data release issues under differential privacy, such as real-time monitoring and publishing of e-health data, remain open |
Approaches to achieve differential privacy
Differential privacy in Telco big data platform [12]
Efficient e-health data release with consistency guarantee under differential privacy [21]
Airavat model: security and privacy for Map Reduce [22]
Location-based privacy using differential privacy [23, 24]
- First, when used repeatedly, there is a linear degradation of the user's privacy that limits the use of the mechanism over time.
- Second, the level of noise of the Laplacian mechanism has to be fixed in advance, independently of the movements of the user, providing the same protection in areas with very different privacy characteristics, such as a dense city or a sparse countryside. This limits the flexibility of the mechanism over space.
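One way to realize geo-indistinguishability is to perturb the reported location with planar Laplace noise, whose density decays as exp(−ε·distance): the direction is uniform and the radius follows a Gamma(2, 1/ε) law, i.e. the sum of two Exponential(ε) draws. The sketch below is our own illustration under these assumptions, not the geographic-fence or tiled mechanism discussed next.

```python
# Illustrative planar-Laplace perturbation of a latitude/longitude pair
# (our sketch; not the geographic-fence or tiled mechanism).
import math
import random

def planar_laplace(lat, lon, epsilon):
    """Perturb a location with planar Laplace noise (density ~ exp(-eps*d)).
    epsilon is a privacy parameter per metre; the expected radius is 2/epsilon."""
    theta = random.uniform(0.0, 2.0 * math.pi)                     # direction
    r = random.expovariate(epsilon) + random.expovariate(epsilon)  # radius, metres
    # Rough equirectangular conversion of the planar offset back to degrees.
    dlat = (r * math.sin(theta)) / 111_320.0
    dlon = (r * math.cos(theta)) / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

With ε = 0.01 per metre the reported point lies about 200 m from the true one on average; note that both limitations above remain, since ε is fixed in advance regardless of where the user moves.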
Geographic fences
Tiled mechanism
GUPT mechanism for differential privacy [26]
PINQ [18]
Advantages and drawbacks of different differential privacy mechanisms
Differential privacy and big data
Apple’s case
- Hashing takes a string of text and turns it into a shorter value with a fixed length, mixing these keys up into irreversibly random strings of unique characters, or "hashes". This obscures the data so the device isn't storing any of it in its original form.
- Subsampling means that instead of collecting every word a person types, Apple will only use a smaller sample of them. For example, if a person has a long text conversation with a friend, liberally using emoji, instead of collecting that entire conversation, subsampling might use only the parts Apple is interested in, such as the emoji.
- Finally, the device injects noise, adding random data into the original dataset in order to make it vaguer. This means that Apple gets a result that has been masked ever so slightly and therefore isn't quite exact. All this happens on the device, so the data has already been shortened, mixed up, sampled, and blurred before it is even sent to the cloud for Apple to analyze.
- When enough people replace a word with a particular emoji, it will become a suggestion for everyone.
- When new words are added to enough local dictionaries to be considered commonplace, Apple will add them to everyone else's dictionary too.
- A search term can be used in Spotlight, which will then provide app suggestions and open that link in the suggested app or allow the user to install it from the App Store.
- It will provide more accurate results for Lookup Hints in Notes.
Comparative study of different differential privacy mechanisms
S. no | Year | Paper/work | Focus |
---|---|---|---|
1 | 2008 | US Census Bureau [17] | Protecting individuals' data confidentiality and presenting motivating examples |
2 | 2009 | PINQ [18] | Interactive DP which guarantees, at runtime, that queries adhere to a global privacy budget |
3 | 2010 | Airavat model [22] | MAC + differential privacy, i.e. access control mechanisms in integration with DP |
4 | 2012 | GUPT [26] | Makes privacy-preserving data analysis easy for non-experts; the analyst can upload arbitrary data mining programs and GUPT guarantees the privacy of the outputs |
5 | 2014 | Google's RAPPOR: randomized aggregatable privacy-preserving ordinal response | For telemetry, for example learning statistics about unwanted software hijacking users' settings |
6 | 2014 | Geo-indistinguishability [25] | Protects the user's exact location, while permitting approximate information, typically needed to obtain a desired service, to be released |
7 | 2015 | Google | For sharing historical traffic statistics |
8 | 2015 | DP in telecommunication big data platform, VLDB 2015 [12] | Implemented three basic DP architectures in the deployed telecommunication big data platform |
9 | 2015 | Efficient e-health data release with consistency guarantee under differential privacy, 2015 [21] | Investigated the e-health data release problem and proposed an efficient and secure e-health data release scheme with consistency guarantee under DP |
10 | 2016 | Apple’s iOS 10 [30] | DP implemented in the messaging app and search recommendations |