Skip to main content
Erschienen in: Distributed and Parallel Databases 4/2019

02.01.2019

IHP: improving the utility in differential private histogram publication

verfasst von: Hui Li, Jiangtao Cui, Xue Meng, Jianfeng Ma

Erschienen in: Distributed and Parallel Databases | Ausgabe 4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in face of adversaries with arbitrary background knowledge. Histogram, as the result of a set of count queries, serves as a core statistical tool to report data distributions and is in fact viewed as the fundamental method for many other statistical analysis such as range queries. It is an important form for data publishing. In this paper, we consider the scenario of publishing sensitive histogram data with differential privacy scheme. Existing work in this field has justified that, comparing to directly applying DP techniques (i.e., injecting noise) over the counts in histogram bins, grouping bins before noise injection is more effective (i.e., with higher utility) as it introduces much less error over the sanitized histogram given the same privacy budget. However, state-of-the-art works have not unveiled how the overall utility of a sanitized histogram can be affected by the balance between the privacy budget distributed between grouping and noise injection phases. In this work, we conduct a theoretical study towards how the probability of getting better groups can be improved such that the overall error introduced in sanitized histogram can be further reduced, which directly leads to a higher utility for the sanitized histograms. In particular, we show that the probability of achieving better grouping can be affected by two factors, namely privacy budget assigned in grouping and the normalized utility function used for selecting groups. Motivated by that, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition, in which we carefully assign privacy budget between grouping and injection phases based on our theoretical study. We also theoretically prove that \(\epsilon \)-differential privacy can be achieved according to our new scheme. Moreover, we also show that, under the same privacy budget, our scheme exhibits less errors in the sanitized histograms comparing with state-of-the-art methods. We also extends the model to multi-dimensional histogram publication cases. Finally, empirical study over four real-world datasets also justifies that our scheme achieves the least error among series of state-of-the-art baseline methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
In the sequel, we may use the term “cluster” and “group” interchangeably to facilitate the discussion.
 
2
In the standard form, \(D_2\) is in fact obtained by either inserting or deleting a simple record from \(D_1\). Modifying a record cannot be viewed as neighboring, as it in fact consists of both delete and insert operations.
 
3
In the following, we shall use 2-dimensional data cube and 2-dimensional histogram interchangeably.
 
Literatur
1.
Zurück zum Zitat Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through Lossy compression. In: Proceedings of the 12th IEEE ICDM, pp. 1–10 (2012) Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through Lossy compression. In: Proceedings of the 12th IEEE ICDM, pp. 1–10 (2012)
2.
Zurück zum Zitat Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the 26th ACM PODS, pp. 273–282 (2007) Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the 26th ACM PODS, pp. 273–282 (2007)
3.
Zurück zum Zitat Cho, H., Kwon, S.J., Jin, R., Chung, T.: A privacy-aware monitoring algorithm for moving k-nearest neighbor queries in road networks. Distrib. Parallel Databases 33(3), 319–352 (2015)CrossRef Cho, H., Kwon, S.J., Jin, R., Chung, T.: A privacy-aware monitoring algorithm for moving k-nearest neighbor queries in road networks. Distrib. Parallel Databases 33(3), 319–352 (2015)CrossRef
4.
Zurück zum Zitat Cormode, G., Procopiuc, C.M., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, VA), 1–5 April 2012, pp. 20–31 (2012) Cormode, G., Procopiuc, C.M., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, VA), 1–5 April 2012, pp. 20–31 (2012)
5.
Zurück zum Zitat Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, 12–16 June 2011, pp. 217–228 (2011) Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, 12–16 June 2011, pp. 217–228 (2011)
6.
Zurück zum Zitat Dwork, C., McSherry, F., Nissim, K., Smith, A.D.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd TCC, pp. 265–284 (2006)CrossRef Dwork, C., McSherry, F., Nissim, K., Smith, A.D.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd TCC, pp. 265–284 (2006)CrossRef
7.
Zurück zum Zitat Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. PVLDB 3(1), 1021–1032 (2010) Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. PVLDB 3(1), 1021–1032 (2010)
8.
Zurück zum Zitat Hua, J., Tang, A., Fang, Y., Shen, Z., Zhong, S.: Privacy-preserving utility verification of the data published by non-interactive differentially private mechanisms. IEEE Trans. Inf. Forensics Security 11(10), 2298–2311 (2016)CrossRef Hua, J., Tang, A., Fang, Y., Shen, Z., Zhong, S.: Privacy-preserving utility verification of the data published by non-interactive differentially private mechanisms. IEEE Trans. Inf. Forensics Security 11(10), 2298–2311 (2016)CrossRef
9.
Zurück zum Zitat Huang, J., Qi, J., Xu, Y., Chen, J.: A privacy-enhancing model for location-based personalized recommendations. Distrib. Parallel Databases 33(2), 253–276 (2015)CrossRef Huang, J., Qi, J., Xu, Y., Chen, J.: A privacy-enhancing model for location-based personalized recommendations. Distrib. Parallel Databases 33(2), 253–276 (2015)CrossRef
10.
Zurück zum Zitat Kellaris, G., Papadopoulos, S.: Practical differential privacy via grouping and smoothing. PVLDB 6(5), 301–312 (2013) Kellaris, G., Papadopoulos, S.: Practical differential privacy via grouping and smoothing. PVLDB 6(5), 301–312 (2013)
11.
Zurück zum Zitat Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: Proceedings of the 29th ACM PODS, pp. 123–134 (2010) Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: Proceedings of the 29th ACM PODS, pp. 123–134 (2010)
12.
Zurück zum Zitat Li, C., Hay, M., Miklau, G., Wang, Y.: A data- and workload-aware query answering algorithm for range queries under differential privacy. PVLDB 7(5), 341–352 (2014) Li, C., Hay, M., Miklau, G., Wang, Y.: A data- and workload-aware query answering algorithm for range queries under differential privacy. PVLDB 7(5), 341–352 (2014)
13.
Zurück zum Zitat Li, C., Miklau, G., Hay, M., McGregor, A., Rastogi, V.: The matrix mechanism: optimizing linear counting queries under differential privacy. VLDB J. 24(6), 757–781 (2015)CrossRef Li, C., Miklau, G., Hay, M., McGregor, A., Rastogi, V.: The matrix mechanism: optimizing linear counting queries under differential privacy. VLDB J. 24(6), 757–781 (2015)CrossRef
14.
Zurück zum Zitat Li, H., Xiong, L., Jiang, X., Liu, J.: Differentially private histogram publication for dynamic datasets: an adaptive sampling approach. In: Proceedings of the 24th ACM CIKM, pp. 1001–1010 (2015) Li, H., Xiong, L., Jiang, X., Liu, J.: Differentially private histogram publication for dynamic datasets: an adaptive sampling approach. In: Proceedings of the 24th ACM CIKM, pp. 1001–1010 (2015)
15.
Zurück zum Zitat Li, H., Cui, J., Lin, X., Ma, J.: Improving the utility in differential private histogram publishing: Theoretical study and practice. In: 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016, pp 1100–1109 (2016) Li, H., Cui, J., Lin, X., Ma, J.: Improving the utility in differential private histogram publishing: Theoretical study and practice. In: 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016, pp 1100–1109 (2016)
16.
Zurück zum Zitat Li, Y.D., Zhang, Z., Winslett, M., Yang, Y.: Compressive mechanism: utilizing sparse representation in differential privacy. In: Proceedings of the 10th ACM WPES, pp. 177–182 (2011) Li, Y.D., Zhang, Z., Winslett, M., Yang, Y.: Compressive mechanism: utilizing sparse representation in differential privacy. In: Proceedings of the 10th ACM WPES, pp. 177–182 (2011)
18.
Zurück zum Zitat McSherry, F.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the ACM SIGMOD, pp. 19–30 (2009) McSherry, F.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the ACM SIGMOD, pp. 19–30 (2009)
19.
Zurück zum Zitat McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proceedings of the 48th IEEE FOCS, pp. 94–103 (2007) McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proceedings of the 48th IEEE FOCS, pp. 94–103 (2007)
20.
Zurück zum Zitat Muthukrishnan, S., Poosala, V., Suel, T.: On rectangular partitionings in two dimensions: Algorithms, complexity, and applications. In: 7th International Conference on Database Theory—ICDT ’99, Jerusalem, Israel, 10–12 January 1999, Proceedings, pp. 236–256 (1999) Muthukrishnan, S., Poosala, V., Suel, T.: On rectangular partitionings in two dimensions: Algorithms, complexity, and applications. In: 7th International Conference on Database Theory—ICDT ’99, Jerusalem, Israel, 10–12 January 1999, Proceedings, pp. 236–256 (1999)
21.
Zurück zum Zitat Qardaji, W.H., Yang, W., Li, N.: Differentially private grids for geospatial data. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, 8–12 April 2013, pp. 757–768 (2013) Qardaji, W.H., Yang, W., Li, N.: Differentially private grids for geospatial data. In: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, 8–12 April 2013, pp. 757–768 (2013)
22.
Zurück zum Zitat Qardaji, W.H., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. PVLDB 6(14), 1954–1965 (2013) Qardaji, W.H., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. PVLDB 6(14), 1954–1965 (2013)
23.
Zurück zum Zitat Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993) Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
24.
Zurück zum Zitat Rastogi, V., Nath, S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: Proceedings of the ACM SIGMOD, pp. 735–746 (2010) Rastogi, V., Nath, S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: Proceedings of the ACM SIGMOD, pp. 735–746 (2010)
25.
Zurück zum Zitat Song, C., Ge, T.: Aroma: A new data protection method with differential privacy and accurate query answering. In: Proceedings of the 23rd ACM CIKM, pp. 1569–1578 (2014) Song, C., Ge, T.: Aroma: A new data protection method with differential privacy and accurate query answering. In: Proceedings of the 23rd ACM CIKM, pp. 1569–1578 (2014)
26.
Zurück zum Zitat Soria-Comas, J., Domingo-Ferrer, J., Sánchez, D., Megías, D.: Individual differential privacy: a utility-preserving formulation of differential privacy guarantees. IEEE Trans. Inf. Forensics Security 12(6), 1418–1429 (2017)CrossRef Soria-Comas, J., Domingo-Ferrer, J., Sánchez, D., Megías, D.: Individual differential privacy: a utility-preserving formulation of differential privacy guarantees. IEEE Trans. Inf. Forensics Security 12(6), 1418–1429 (2017)CrossRef
27.
Zurück zum Zitat Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 23(8), 1200–1214 (2011)CrossRef Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng. 23(8), 1200–1214 (2011)CrossRef
28.
Zurück zum Zitat Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G.: Differentially private histogram publication. In: Proceedings of the 28th IEEE ICDE, pp. 32–43 (2012) Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G.: Differentially private histogram publication. In: Proceedings of the 28th IEEE ICDE, pp. 32–43 (2012)
29.
Zurück zum Zitat Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., Winslett, M.: Differentially private histogram publication. VLDB J. 22(6), 797–822 (2013)CrossRef Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G., Winslett, M.: Differentially private histogram publication. VLDB J. 22(6), 797–822 (2013)CrossRef
30.
Zurück zum Zitat Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: optimizing batch queries under differential privacy. PVLDB 5(11), 1352–1363 (2012) Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: optimizing batch queries under differential privacy. PVLDB 5(11), 1352–1363 (2012)
31.
Zurück zum Zitat Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Optimizing batch linear queries under exact and approximate differential privacy. ACM Trans. Database Syst. 40(2), 11 (2015)MathSciNetCrossRef Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Optimizing batch linear queries under exact and approximate differential privacy. ACM Trans. Database Syst. 40(2), 11 (2015)MathSciNetCrossRef
32.
Zurück zum Zitat Zhang, T., Zhu, Q.: Dynamic differential privacy for admm-based distributed classification learning. IEEE Trans. Inf. Forensics Security 12(1), 172–187 (2017)MathSciNetCrossRef Zhang, T., Zhu, Q.: Dynamic differential privacy for admm-based distributed classification learning. IEEE Trans. Inf. Forensics Security 12(1), 172–187 (2017)MathSciNetCrossRef
33.
Zurück zum Zitat Zhang, X., Chen, R., Xu, J., Meng, X., Xie, Y.: Towards accurate histogram publication under differential privacy. In: Proceedings of the 2014 SIAM SDM, pp. 587–595 (2014) Zhang, X., Chen, R., Xu, J., Meng, X., Xie, Y.: Towards accurate histogram publication under differential privacy. In: Proceedings of the 2014 SIAM SDM, pp. 587–595 (2014)
34.
Zurück zum Zitat Zhu, T., Xiong, P., Li, G., Zhou, W.: Correlated differential privacy: hiding information in non-iid data set. IEEE Trans. Inf. Forensics Security 10(2), 229–242 (2015)CrossRef Zhu, T., Xiong, P., Li, G., Zhou, W.: Correlated differential privacy: hiding information in non-iid data set. IEEE Trans. Inf. Forensics Security 10(2), 229–242 (2015)CrossRef
Metadaten
Titel
IHP: improving the utility in differential private histogram publication
verfasst von
Hui Li
Jiangtao Cui
Xue Meng
Jianfeng Ma
Publikationsdatum
02.01.2019
Verlag
Springer US
Erschienen in
Distributed and Parallel Databases / Ausgabe 4/2019
Print ISSN: 0926-8782
Elektronische ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-018-07255-6

Weitere Artikel der Ausgabe 4/2019

Distributed and Parallel Databases 4/2019 Zur Ausgabe