Introduction
Privacy threats in data analytics
Surveillance
Disclosure
Discrimination
Personal embarrassment and abuse
Privacy preservation methods
- K anonymity
- L diversity
- T closeness
- Randomization
- Data distribution
- Cryptographic techniques
- Multidimensional Sensitivity Based Anonymization (MDSBA)
K anonymity [10]
Sno | Zip | Age | Disease |
---|---|---|---|
1 | 57677 | 29 | Cardiac problem |
2 | 57602 | 22 | Cardiac problem |
3 | 57678 | 27 | Cardiac problem |
4 | 57905 | 43 | Skin allergy |
5 | 57909 | 52 | Cardiac problem |
6 | 57906 | 47 | Cancer |
7 | 57605 | 30 | Cardiac problem |
8 | 57673 | 36 | Cancer |
9 | 57607 | 32 | Cancer |

The same records after generalizing Zip and Age:
Sno | Zip | Age | Disease |
---|---|---|---|
1 | 576** | 2* | Cardiac problem |
2 | 576** | 2* | Cardiac problem |
3 | 576** | 2* | Cardiac problem |
4 | 5790* | > 40 | Skin allergy |
5 | 5790* | > 40 | Cardiac problem |
6 | 5790* | > 40 | Cancer |
7 | 576** | 3* | Cardiac problem |
8 | 576** | 3* | Cancer |
9 | 576** | 3* | Cancer |
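
The step from the raw table to the generalized table can be sketched in Python. This is a minimal illustration, not the method of [10]: the `generalize` rule below is a hypothetical one chosen only because it reproduces the generalized values shown above, and `k_of` is an illustrative helper name. It groups records by their generalized quasi-identifiers (Zip, Age) and reports the size of the smallest group, which is the k the table achieves.

```python
from collections import Counter

# Raw records from the first table: (zip, age, disease)
records = [
    ("57677", 29, "Cardiac problem"),
    ("57602", 22, "Cardiac problem"),
    ("57678", 27, "Cardiac problem"),
    ("57905", 43, "Skin allergy"),
    ("57909", 52, "Cardiac problem"),
    ("57906", 47, "Cancer"),
    ("57605", 30, "Cardiac problem"),
    ("57673", 36, "Cancer"),
    ("57607", 32, "Cancer"),
]


def generalize(zip_code, age):
    """Hypothetical generalization rule that reproduces the second table."""
    if age > 40:
        # Keep four zip digits and bucket every age above 40 together
        return (zip_code[:4] + "*", "> 40")
    # Keep three zip digits and only the decade of the age
    return (zip_code[:3] + "**", f"{age // 10}*")


def k_of(rows):
    """Return the size of the smallest equivalence class over (zip, age)."""
    classes = Counter(generalize(z, a) for z, a, _ in rows)
    return min(classes.values())


k = k_of(records)
print(f"The generalized table is {k}-anonymous")
```

Every combination of generalized Zip and Age is shared by the same minimum number of records, so an adversary who knows only those two attributes cannot narrow a target down to fewer than k rows.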
L diversity
Sno | Zip | Age | Salary | Disease |
---|---|---|---|---|
1 | 576** | 2* | 5k | Cardiac problem |
2 | 576** | 2* | 6k | Cardiac problem |
3 | 576** | 2* | 7k | Cardiac problem |
4 | 5790* | > 40 | 20k | Skin allergy |
5 | 5790* | > 40 | 22k | Cardiac problem |
6 | 5790* | > 40 | 24k | Cancer |
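
A short Python sketch can make the diversity of each equivalence class in this table explicit. It assumes that Disease is the sensitive attribute and uses the simplest notion, distinct l-diversity (the number of different sensitive values per class); the helper name `distinct_l` is illustrative.

```python
from collections import defaultdict

# Rows from the table above: (zip, age, disease); Salary is left out here
rows = [
    ("576**", "2*", "Cardiac problem"),
    ("576**", "2*", "Cardiac problem"),
    ("576**", "2*", "Cardiac problem"),
    ("5790*", "> 40", "Skin allergy"),
    ("5790*", "> 40", "Cardiac problem"),
    ("5790*", "> 40", "Cancer"),
]


def distinct_l(table):
    """Count the distinct sensitive values in each equivalence class."""
    groups = defaultdict(set)
    for zip_code, age, disease in table:
        groups[(zip_code, age)].add(disease)
    return {qi: len(values) for qi, values in groups.items()}


diversity = distinct_l(rows)
for qi, l in diversity.items():
    print(qi, "has", l, "distinct disease value(s)")

# The table as a whole is only as diverse as its weakest class
table_l = min(diversity.values())
```

Under these assumptions the first class contains a single disease value, so anyone known to fall in that class has their disease revealed even though the table is 3-anonymous. The second class holds three distinct values.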
T closeness
Sno | Zip | Age | Salary | Disease |
---|---|---|---|---|
1 | 576** | 2* | 5k | Cardiac problem |
2 | 576** | 2* | 16k | Cancer |
3 | 576** | 2* | 9k | Skin allergy |
4 | 5790* | > 40 | 20k | Skin allergy |
5 | 5790* | > 40 | 42k | Cardiac problem |
6 | 5790* | > 40 | 8k | Flu |
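
T closeness compares the distribution of a sensitive attribute inside each equivalence class with its distribution over the whole table. The Python sketch below is illustrative only: it assumes Disease is the sensitive attribute and measures the gap with the variational distance (half the sum of absolute differences), a simple stand-in for the Earth Mover's Distance that coincides with it when all categorical values are treated as equally far apart. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Rows from the table above: (zip, age, disease)
rows = [
    ("576**", "2*", "Cardiac problem"),
    ("576**", "2*", "Cancer"),
    ("576**", "2*", "Skin allergy"),
    ("5790*", "> 40", "Skin allergy"),
    ("5790*", "> 40", "Cardiac problem"),
    ("5790*", "> 40", "Flu"),
]


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def class_distances(table):
    """Distance of each class's disease distribution from the overall one."""
    overall = distribution([d for _, _, d in table])
    groups = defaultdict(list)
    for zip_code, age, disease in table:
        groups[(zip_code, age)].append(disease)
    return {
        qi: variational_distance(distribution(vals), overall)
        for qi, vals in groups.items()
    }


distances = class_distances(rows)
# The table satisfies t-closeness for any t at or above this value
t = max(distances.values())
print(f"smallest t satisfied: {t:.3f}")
```

A smaller t means each class looks more like the table as a whole, so learning which class a person belongs to reveals less about their sensitive value.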
Randomization technique
- More Mappers and Reducers were used as the data volume increased.
- Results before and after randomization were significantly different.
- Some outlier records remain unaffected by randomization and are vulnerable to adversary attack.
- Preserving privacy at the cost of data utility is undesirable, so randomization may not be suitable for privacy preservation, especially against attribute disclosure.
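
The outlier observation above can be illustrated with a small Python sketch. It assumes additive Gaussian noise as the randomization scheme, and the salary values, the noise level `sigma`, and the `randomize` helper are all hypothetical. It does not reproduce the MapReduce experiment the observations come from.

```python
import random


def randomize(values, sigma, seed=0):
    """Add zero-mean Gaussian noise to every value (additive randomization)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.gauss(0, sigma) for v in values]


# Hypothetical salaries in thousands; the last record is a deliberate outlier
salaries = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0, 500.0]
noisy = randomize(salaries, sigma=5.0)

# Locate the largest value before and after randomization
outlier_index = salaries.index(max(salaries))
noisy_outlier_index = noisy.index(max(noisy))

print("outlier position before:", outlier_index)
print("outlier position after: ", noisy_outlier_index)
```

Noise of this size blurs the ordinary records, but the outlier sits so far from the rest that it remains the largest value after perturbation, which leaves that record easy for an adversary to single out. Raising `sigma` enough to hide it would distort every other record as well, which is the utility cost the list above describes.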
Data distribution technique
Cryptographic techniques
Multidimensional Sensitivity Based Anonymization (MDSBA)
Analysis
Privacy preservation techniques compared by feature:
Features | Anonymization techniques | Cryptographic techniques | Data distribution | Randomization | MDSBA |
---|---|---|---|---|---|
Suitability for unstructured data | No | No | No | No | Yes |
Attribute preservation | No | No | No | Yes | Yes |
Damage to data utility | No | No | Yes | No | Yes |
Very complex to apply | No | Yes | Yes | Yes | Yes |
Accuracy of results of data analytics | No | Yes | No | No | No |