
2024 | OriginalPaper | Chapter

A Comparison of SynDiffix Multi-table Versus Single-table Synthetic Data

Author: Paul Francis

Published in: Privacy in Statistical Databases

Publisher: Springer Nature Switzerland


Abstract

SynDiffix is a new open-source tool for structured data synthesis. It has anonymization features that allow it to generate multiple synthetic tables while maintaining strong anonymity. Compared to the more common single-table approach, multi-table leads to more accurate data, since only the features of interest for a given analysis need be synthesized. This paper compares SynDiffix with 15 other commercial and academic synthetic data techniques using the SDNIST analysis framework, modified by us to accommodate multi-table synthetic data. The results show that SynDiffix is many times more accurate than other approaches for low-dimension tables, but somewhat worse than the best single-table techniques for high-dimension tables.

Metadata
Title: A Comparison of SynDiffix Multi-table Versus Single-table Synthetic Data
Author: Paul Francis
Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-69651-0_11
