Skip to main content
Top

2018 | OriginalPaper | Chapter

Privacy via Maintaining Small Similitude Data for Big Data Statistical Representation

Authors : Philip Derbeko, Shlomi Dolev, Ehud Gudes

Published in: Cyber Security Cryptography and Machine Learning

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Despite its attractiveness, Big Data oftentimes is hard, slow and expensive to handle due to its size. Moreover, as the amount of collected data grows, individual privacy raises more and more concerns: “what do they know about me?” Different algorithms were suggested to enable privacy-preserving data release with the current de-facto standard differential privacy. However, the processing time of keeping the data private is inhibiting and currently not practical for every day use. Combined with the continuously growing data collection, the solution is not seen on a horizon.
In this research, we suggest replacing the Big Data with a much smaller similitude model. The model “resembles” the data with respect to a class of query. The user defines the maximum acceptable error and privacy requirements ahead of the query execution. Those requirements define the minimal size of the similitude model. The suggested method is demonstrated by using a wavelet transform and then by pruning the tree according to both the data reduction and the privacy requirements. We propose methods of combining the noise required for privacy preservation with noise of similitude model, that allow us to decrease the amount of added noise thus, improving the utilization of the method.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
3.
go back to reference Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through lossy compression. In: 2012 IEEE 12th International Conference on Data Mining, pp. 1–10 (2012) Ács, G., Castelluccia, C., Chen, R.: Differentially private histogram publishing through lossy compression. In: 2012 IEEE 12th International Conference on Data Mining, pp. 1–10 (2012)
5.
go back to reference Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007, pp. 273–282. ACM, New York (2007) Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007, pp. 273–282. ACM, New York (2007)
6.
go back to reference Blum, A., Dwork, C., Mcsherry, F., Nissim, K.: Practical privacy: the SulQ framework. In: PODS, pp. 128–138. ACM (2005) Blum, A., Dwork, C., Mcsherry, F., Nissim, K.: Practical privacy: the SulQ framework. In: PODS, pp. 128–138. ACM (2005)
7.
go back to reference Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC 2008, pp. 609–618. ACM, New York (2008) Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC 2008, pp. 609–618. ACM, New York (2008)
8.
go back to reference Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K.: Approximate query processing using wavelets. VLDB J. 10(2–3), 199–223 (2001)MATH Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K.: Approximate query processing using wavelets. VLDB J. 10(2–3), 199–223 (2001)MATH
12.
go back to reference Gaboardi, M., Arias, E.J.G., Hsu, J., Roth, A., Wu, Z.S.: Dual query: practical private query release for high dimensional data. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, pp. 1170–1178. PMLR, Bejing, 22–24 June 2014 Gaboardi, M., Arias, E.J.G., Hsu, J., Roth, A., Wu, Z.S.: Dual query: practical private query release for high dimensional data. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, pp. 1170–1178. PMLR, Bejing, 22–24 June 2014
13.
go back to reference Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2004, pp. 166–176. ACM, New York (2004) Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2004, pp. 166–176. ACM, New York (2004)
14.
go back to reference Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: Optimal and approximate computation of summary statistics for range aggregates. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 227–236. ACM, New York (2001) Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: Optimal and approximate computation of summary statistics for range aggregates. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 227–236. ACM, New York (2001)
15.
go back to reference Hardt, M., Ligett, K., Mcsherry, F.: A simple and practical algorithm for differentially private data release. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 2339–2347. Curran Associates Inc. (2012) Hardt, M., Ligett, K., Mcsherry, F.: A simple and practical algorithm for differentially private data release. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 2339–2347. Curran Associates Inc. (2012)
16.
go back to reference Hardt, M., Rothblum, G.: A multiplicative weights mechanism for privacy-preserving data analysis, pp. 61–70, May 2010 Hardt, M., Rothblum, G.: A multiplicative weights mechanism for privacy-preserving data analysis, pp. 61–70, May 2010
17.
go back to reference Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1–2), 1021–1032 (2010)CrossRef Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1–2), 1021–1032 (2010)CrossRef
18.
go back to reference Lichman, M.: UCI Machine Learning Repository (2013) Lichman, M.: UCI Machine Learning Repository (2013)
19.
go back to reference Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. SIGMOD Rec. 27(2), 448–459 (1998)CrossRef Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. SIGMOD Rec. 27(2), 448–459 (1998)CrossRef
20.
go back to reference Qardaji, W.H., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. PVLDB 6, 1954–1965 (2013) Qardaji, W.H., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. PVLDB 6, 1954–1965 (2013)
21.
go back to reference Rastogi, V., Nath, S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 735–746. ACM, New York (2010) Rastogi, V., Nath, S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 735–746. ACM, New York (2010)
22.
go back to reference Stollnitz, E.J., Derose, T.D., Salesin, D.H.: Wavelets for Computer Graphics: Theory and Applications. Morgan Kaufmann Publishers Inc., San Francisco (1996) Stollnitz, E.J., Derose, T.D., Salesin, D.H.: Wavelets for Computer Graphics: Theory and Applications. Morgan Kaufmann Publishers Inc., San Francisco (1996)
23.
go back to reference Ullman, J.: Answering n2+O(1) counting queries with differential privacy is hard. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, pp. 361–370. ACM, New York (2013) Ullman, J.: Answering n2+O(1) counting queries with differential privacy is hard. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, pp. 361–370. ACM, New York (2013)
25.
go back to reference Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. SIGMOD Rec. 28(2), 193–204 (1999)CrossRef Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. SIGMOD Rec. 28(2), 193–204 (1999)CrossRef
26.
go back to reference Vitter, J.S., Wang, M., Iyer, B.: Data cube approximation and histograms via wavelets. In: Proceedings of the Seventh International Conference on Information and Knowledge Management, CIKM 1998, pp. 96–104. ACM, New York (1998) Vitter, J.S., Wang, M., Iyer, B.: Data cube approximation and histograms via wavelets. In: Proceedings of the Seventh International Conference on Information and Knowledge Management, CIKM 1998, pp. 96–104. ACM, New York (1998)
27.
go back to reference Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 225–236 (2010) Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 225–236 (2010)
28.
go back to reference Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: Privbayes: private data release via Bayesian networks. ACM Trans. Database Syst. 42(4), 25:1–25:41 (2017)MathSciNetCrossRef Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: Privbayes: private data release via Bayesian networks. ACM Trans. Database Syst. 42(4), 25:1–25:41 (2017)MathSciNetCrossRef
Metadata
Title
Privacy via Maintaining Small Similitude Data for Big Data Statistical Representation
Authors
Philip Derbeko
Shlomi Dolev
Ehud Gudes
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94147-9_9

Premium Partner