2024 | OriginalPaper | Chapter

Privacy Risk from Synthetic Data: Practical Proposals

Author: Gillian M. Raab

Published in: Privacy in Statistical Databases

Publisher: Springer Nature Switzerland

Abstract

This paper proposes and compares measures of identity and attribute disclosure risk for synthetic data. Data custodians can use the methods proposed here to inform the decision as to whether to release synthetic versions of confidential data. Different measures are evaluated on two data sets. Insight into the measures is obtained by examining the details of the records identified as posing a disclosure risk. This leads to methods to identify, and possibly exclude, apparently risky records where the identification or attribution would be expected by someone with background knowledge of the data. The methods described are available as part of the synthpop package for R.

Footnotes
1
The team was led by Beata Nowok and also included Chris Dibben and the author.
 
2
See https://commission.europa.eu/law/law-topic/data-protection/data-protection-eu_en, accessed 19/5/2024. In the UK this is now incorporated within the Data Protection Act 2018.
 
5
An organisation with the mission “...to work with researchers, analysts and policymakers to unlock the potential of public sector data for the benefit of public good”; see https://www.researchdata.scot/.
 
6
This is version 1.8.1, which can be installed from GitHub at https://github.com/gillian-raab/synthpop.
 
7
The author’s involvement with SD nearly came to an end when her laptop, with SD created for a training course, was stolen. Fortunately, she was able to reassure the security staff from the data holders that the SD was fully encrypted as well as being clearly labelled as “Fake data”.
 
8
This may not be too unrealistic if the data are made available inadvertently, or if the intruder thinks that efforts to label the SD as, e.g., “Fake Data” are just a cover-up.
 
9
Jackson et al. in [12] argue that the denominator for repU should be the number of records in SD, rather than those in GT. This is inappropriate because our scenario is to consider the risk to the GT data.
 
10
This took place at the Newton Institute programme on Data Linkage and Anonymisation, 2016; see https://www.newton.ac.uk/event/dla/.
 
11
Including missing-value categories.
 
12
Although the data set lists 15 variables, two carry the same information (education appears as both a factor and a numeric variable), and we have excluded the weight, as it is not an analysis variable.
 
13
While restricting to one-way or two-way relationships seems limited, in practice stratification by segments of the GT such as area can make this more effective.
 
Literature
1.
Bowen, C.M., Snoke, J.: Comparative study of differentially private synthetic data algorithms from the NIST PSCR differential privacy synthetic data challenge. J. Priv. Confidentiality 11(1) (2021)
3.
Dalenius, T.: Finding a needle in a haystack. J. Off. Stat. 2, 329–336 (1986)
9.
Goodfellow, I.J., et al.: Generative adversarial networks (2014)
11.
Hundepool, A.: Statistical Disclosure Control, 1st edn. Wiley, Chichester (2012)
12.
Jackson, J., Mitra, R., Francis, B., Dove, I.: Using saturated count models for user-friendly synthesis of large confidential administrative databases. J. R. Stat. Soc. A. Stat. Soc. 185, 1613–1643 (2022)
14.
Kaloskampis, I., Joshi, C., Cheung, C., Pugh, D., Nolan, L.: Synthetic data in the civil service. Significance 17, 18–23 (2021)
15.
Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. Am. Stat. 60(3), 224–232 (2006)
17.
Little, R.J.A.: Statistical analysis of masked data. J. Off. Stat. 9(2), 407–26 (1993)
18.
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd International Conference on Data Engineering (ICDE 2006). IEEE (2006)
26.
Reiter, J.: Synthetic data: a look back and a look forward. Trans. Data Priv. 16, 15–24 (2023)
27.
Rubin, D.B.: Discussion: statistical disclosure limitation. J. Off. Stat. 9(2), 461–8 (1993)
28.
Shokri, R., Strobel, M., Zick, Y.: On the privacy risks of model explanations. arXiv.org (2021)
29.
Snoke, J., Raab, G., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. J. Roy. Statist. Soc. Series A 181(3), 663–688 (2018)
33.
Voas, D., Williamson, P.: Evaluating goodness-of-fit measures for synthetic microdata. Geogr. Environ. Model. 5, 177–200 (2001)
34.
Wagner, I., Eckhoff, D.: Technical privacy metrics: a systematic survey. ACM Comput. Surv. 51(3), 1–38 (2018)
35.
Woo, M.J., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1, 111–124 (2009)
Metadata
Title
Privacy Risk from Synthetic Data: Practical Proposals
Author
Gillian M. Raab
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-69651-0_17
