
2017 | OriginalPaper | Book chapter

7. Information Loss: Evaluation and Measures


Abstract

Masking methods modify databases in order to avoid disclosure. This modification causes some information loss, which can be quantified. In this chapter, we discuss different alternatives for evaluating to what extent relevant information is lost. We give an overview of generic and specific information loss measures.


Footnotes
1. Recall that we have discussed distances and metrics, as well as their properties, in Sects. 5.4.7 and 5.6.1.

2. Commonality is the percentage of each variable that is explained by a principal component.

3. The factor scores stand for the factors that should multiply each variable in X to obtain its projection on each principal component.

4. Reference [34] has a similar use of the Hellinger distance for comparing tables, but for tabular data protection.
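
As an illustration of the Hellinger distance mentioned in footnote 4, the following is a minimal sketch, not the chapter's or Reference [34]'s formulation. It assumes that the original and the masked table are given as count vectors over the same cells, and it uses the standard definition \(H(P,Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_i (\sqrt{p_i} - \sqrt{q_i})^2}\) on the normalized frequencies. The function name `hellinger_distance` and its parameters are hypothetical.

```python
import math


def hellinger_distance(original_counts, masked_counts):
    """Hellinger distance between two frequency tables over the same cells.

    Assumption: both arguments are sequences of non-negative counts,
    listed in the same cell order. The result lies in [0, 1].
    """
    if len(original_counts) != len(masked_counts):
        raise ValueError("both tables must have the same number of cells")
    total_p = sum(original_counts)
    total_q = sum(masked_counts)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each table needs a positive total count")
    # Normalize counts to relative frequencies, then sum the squared
    # differences of their square roots.
    squared = sum(
        (math.sqrt(p / total_p) - math.sqrt(q / total_q)) ** 2
        for p, q in zip(original_counts, masked_counts)
    )
    return math.sqrt(squared / 2.0)
```

Under these assumptions, a value of 0 means that the masked table has the same relative frequencies as the original, and a value of 1 means that the two tables share no populated cell, so larger values can be read as higher information loss.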
 
References
1. Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 91–110 (2001)
2. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, North-Holland, pp. 111–134 (2001)
3. Rebollo-Monedero, D., Forné, J., Soriano, M.: An algorithm for \(k\)-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl. Eng. 70(10), 892–921 (2011)
4. Rebollo-Monedero, D., Forné, J., Pallarés, E., Parra-Arnau, J.: A modification of the Lloyd algorithm for \(k\)-anonymous quantization. Inf. Sci. 222, 185–202 (2013)
5. Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55, 3232–3243 (2011)
6. Liu, L., Wang, J., Zhang, J.: Wavelet-based data perturbation for simultaneous privacy-preserving and statistics-preserving. In: IEEE ICDM Workshops (2008)
7. Muralidhar, K., Sarathy, R.: An enhanced data perturbation approach for small data sets. Decis. Sci. 36(3), 513–529 (2005)
8. Kim, J., Winkler, W.: Multiplicative noise for masking continuous data, U.S. Bureau of the Census, RR2003/01 (2003)
9. Carlson, M., Salabasis, M.: A data swapping technique using ranks: a method for disclosure control. Res. Off. Stat. 5(2), 35–64 (2002)
10. Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19(1), 1–16 (2003)
11. Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. Stat. Sinica 20, 405–421 (2010)
12. Drechsler, J., Bender, S., Rässler, S.: Comparing fully and partially synthetic datasets for statistical disclosure control in the German IAB Establishment Panel. Trans. Data Priv. 1, 105–130 (2008)
13. Reiss, S.P.: Practical data-swapping: the first steps. ACM Trans. Database Syst. 9(1), 20–37 (1984)
14. Liu, K., Kargupta, H., Ryan, J.: Random projection based multiplicative data perturbation for privacy preserving data mining. IEEE Trans. Knowl. Data Eng. 18(1), 92–106 (2006)
15. Hajian, S., Azgomi, M.A.: A privacy preserving clustering technique using Haar wavelet transform and scaling data perturbation. IEEE (2008)
16. Bapna, S., Gangopadhyay, A.: A wavelet-based approach to preserve privacy for classification mining. Decis. Sci. 37(4), 623–642 (2006)
17. Mukherjee, S., Chen, Z., Gangopadhyay, A.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. VLDB J. 15, 293–315 (2006)
18. Agrawal, R., Srikant, R.: Privacy preserving data mining. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000)
19. Domingo-Ferrer, J., Mateo-Sanz, J.M., Torra, V.: Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS 2001, vol. 2, pp. 807–826. Eurostat (2001)
20. Domingo-Ferrer, J., González-Nicolás, U.: Hybrid microdata using microaggregation. Inf. Sci. 180, 2834–2844 (2010)
21. Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152 (2002)
22. Trottini, M.: Decision models for data disclosure limitation, Ph.D. Dissertation, Carnegie Mellon University (2003)
23. Mateo-Sanz, J.M., Domingo-Ferrer, J., Sebé, F.: Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min. Knowl. Disc. 11(2), 181–193 (2005)
24. Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the PODS 2001, pp. 247–255 (2001)
25. Torra, V., Carlson, M.: On the Hellinger distance for measuring information loss in microdata, UNECE/Eurostat Work Session on Statistical Confidentiality, 8th Work Session 2013, Ottawa, Canada (2013)
26. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
27. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. Knowl. Data Eng. 17(7), 902–911 (2005)
28. Chang, C.-C., Li, Y.-C., Huang, W.-H.: TFRP: an efficient microaggregation algorithm for statistical disclosure control. J. Syst. Softw. 80, 1866–1878 (2007)
29. Panagiotakis, C., Tziritas, G.: Successive group selection for microaggregation. IEEE Trans. Knowl. Data Eng. 25(5), 1191–1195 (2013)
30. Laszlo, M., Mukherjee, S.: Iterated local search for microaggregation. J. Syst. Softw. 100, 15–26 (2015)
31. Cheng, L., Cheng, S., Jiang, F.: ADKAM: A-diversity k-anonymity model via microaggregation. In: Proceedings of the ISPEC 2015. LNCS, vol. 9065, pp. 533–547 (2015)
32. Salari, M., Jalili, S., Mortazavi, R.: TBM, a transformation based method for microaggregation of large volume mixed data. Data Min. Knowl. Discov. (2016, in press). doi:10.1007/s10618-016-0457-y
33. Gomatam, S., Karr, A.F., Sanil, A.P.: Data swapping as a decision problem. J. Off. Stat. 21(4), 635–655 (2005)
34. Shlomo, N., Antal, L., Elliot, M.: Measuring disclosure risk and data utility for flexible table generators. J. Off. Stat. 31(2), 305–324 (2015)
35. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)
36. Torra, V.: Progress report on record linkage for risk assessment. DwB project, Deliverable 11.3 (2014)
37. Torra, V.: On information loss measures for categorical data, Report 3, Ottilie Project (2000)
38. Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining. In: Proceedings of the EDBT, pp. 183–199 (2004)
39. Herranz, J., Matwin, S., Nin, J., Torra, V.: Classifying data from protected statistical datasets. Comput. Secur. 29, 875–890 (2010)
40. Sakuma, J.: Recommendation based on k-anonymized ratings. arXiv (2017)
41. Torra, V., Navarro-Arribas, G.: Integral privacy. In: Proceedings of the CANS 2016. LNCS, vol. 10052, pp. 661–669 (2016)
42. Ladra, S., Torra, V.: On the comparison of generic information loss measures and cluster-specific ones. Int. J. Unc. Fuzz. Knowl. Based Syst. 16(1), 107–120 (2008)
43. Batet, M., Erola, A., Sánchez, D., Castellà-Roca, J.: Utility preserving query log anonymization via semantic microaggregation. Inf. Sci. 242, 49–63 (2013)
44. Torra, V.: On the definition of cluster-specific information loss measures. In: Solanas, A., Martínez-Ballesté, A. (eds.) Advances in Artificial Intelligence for Privacy Protection and Security, pp. 145–163. World Scientific (2009)
Metadata
Title
Information Loss: Evaluation and Measures
Author
Vicenç Torra
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-57358-8_7