Published in: Data Mining and Knowledge Discovery 3/2016

1 May 2016

Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Authors: Reza Mortazavi, Saeed Jalili


Abstract

Microaggregation is a masking mechanism that protects confidential data in a public release. The technique can produce a k-anonymous dataset in which data records are partitioned into groups of at least k members. In each group, a representative centroid is computed by aggregating the group members and is published in place of the original records. In a conventional microaggregation algorithm, the centroids are computed as the simple arithmetic mean of the group members. This naïve formulation does not consider the proximity of the published values to the original ones, so an intruder may be able to guess the original values. This paper proposes a disclosure-aware aggregation model in which published values are computed at a given distance from the original ones, to attain a more protected and useful published dataset. Empirical results show that the proposed method achieves a better trade-off between disclosure risk and information loss than other similar anonymization techniques.
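
The conventional aggregation step described in the abstract can be sketched as follows. This is a minimal illustration for a single numerical attribute, not the disclosure-aware model proposed in the paper. The function name `microaggregate_univariate` and the partitioning rule (sort the values, cut them into consecutive chunks of k, and let the last chunk absorb the remainder) are assumptions made for the example only.

```python
from statistics import mean


def microaggregate_univariate(values, k):
    """Replace every value by the arithmetic mean of its group.

    Assumption (illustrative only): groups are built by sorting the
    values and cutting them into consecutive chunks of k records; the
    last chunk absorbs any remainder, so every group has >= k members.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")

    # Indices of the records, ordered by their value.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n

    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = (g + 1) * k if g < n_groups - 1 else n
        members = order[start:end]
        # Conventional centroid: simple arithmetic mean of the group.
        centroid = mean(values[i] for i in members)
        for i in members:
            masked[i] = centroid
    return masked


if __name__ == "__main__":
    print(microaggregate_univariate([12, 1, 11, 2, 10, 3], k=3))
```

Each published value is shared by at least k records, which is what gives the k-anonymity property. The example also shows the weakness the abstract points out: when a group is tight, the mean lies very close to every original value in it.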


Footnotes
1
The measures are discussed in Sect. 2.3 with more details.
 
2
For simplicity, we define \(Var(X)=\sigma ^2_X=1/n \sum _{i=1}^{n}(x_i-\mu _{X})^2\) where X is a set of n equally likely values \(x_i\) with \(\mu _{X}=Mean(X)\).
 
3
We review some general purpose \(\textit{DR}\) and \(\textit{IL}\) measures only for continuous data type, which is addressed in this paper. The variants of the measures for other data types can be found in Hundepool et al. (2012).
 
4
It is also known as identity disclosure or re-identification risk.
 
5
Interval disclosure is a special case of attribute disclosure for continuous datasets.
 
6
The heuristic can easily be extended to consider each attribute separately; however, our experiments show no significant improvement that would justify the additional cost.
 
7
These methods are described in Sect. 3.
 
8
Please note that in Table 2, MDAV-DA usually performs better for \(k=5\) than for other aggregation levels.
 
9
In fact, we select the trade-off point whose \(\textit{DR}\) is closest to, but greater than, the value of MDAV-DA, to allow a greater (potential) decrease of \(\textit{IL}\) for the methods.
 
10
An illustrative example is presented in Fig. 1.
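
The variance convention of footnote 2 divides by n rather than n - 1, so it is the population variance. A small sketch, assuming plain Python lists as input and a hypothetical helper name `population_variance`, shows the definition next to the standard-library equivalent:

```python
from statistics import pvariance


def population_variance(xs):
    """Var(X) = 1/n * sum((x_i - mu_X)^2), as defined in footnote 2."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    # Both lines use the divide-by-n convention.
    print(population_variance(data))
    print(pvariance(data))
```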
 
References
Askari M, Safavi-Naini R, Barker K (2012) An information theoretic privacy and utility measure for data sanitization mechanisms. In: Proceedings of the second ACM conference on data and application security and privacy, ACM, New York, NY CODASPY, pp 283–294
Batet M, Erola A, Sánchez D, Castellà-Roca J (2013) Utility preserving query log anonymization via semantic microaggregation. Inf Sci 242:49–63
Brand R (2003) Microdata protection through noise addition. In: Domingo-Ferrer J (ed) Inference control in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 97–116
Brand R, Domingo-Ferrer J, Mateo-Sanz J (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC, http://neon.vb.cbs.nl/casc
Charu A, Philip S (2008) Privacy-preserving data mining: models and algorithms. ASPVU, Boston
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204
Domingo-Ferrer J, Torra V (2001a) Disclosure protection methods and information loss for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 91–110
Domingo-Ferrer J, Torra V (2001b) A quantitative comparison of disclosure control methods for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 111–134
Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Domingo-Ferrer J, Rebollo-Monedero D (2009) Measuring risk and utility of anonymized data using information theory. In: Proceedings of the EDBT/ICDT Workshops, ACM, New York, NY, EDBT/ICDT, pp 126–130
Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS, vol 2, pp 807–826
Domingo-Ferrer J, Martínez-Ballesté A, Mateo-Sanz JM, Sebé F (2006a) Efficient multivariate data-oriented microaggregation. VLDB J 15(4):355–369
Domingo-Ferrer J, Solanas A, Martinez-Balleste A (2006b) Privacy in statistical databases: k-anonymity through microaggregation. In: Proceedings of international conference on granular computing, IEEE, pp 774–777
Domingo-Ferrer J, Sebé F, Solanas A (2008) An anonymity model achievable via microaggregation. In: Secure data management, Springer, Heidelberg, pp 209–218
Fayyoumi E, Oommen BJ (2010) A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases. Softw Pract Exp 40(12):1161–1188
Hansen S, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044
Heaton B (2012) New record ordering heuristics for multivariate microaggregation. PhD thesis, Nova Southeastern University
Herranz J, Matwin S, Nin J, Torra V (2010) Classifying data from protected statistical datasets. Comput Secur 29(8):875–890
Herranz J, Nin J, Solé M (2012a) Kd-trees and the real disclosure risks of large statistical databases. Inf Fusion 13(4):260–273
Herranz J, Nin J, Solé M (2012b) More hybrid and secure protection of statistical data sets. IEEE Trans Dependable Secur Comput 9(5):727–740
Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2006) Cenex SDC handbook on statistical disclosure control, version 1.01
Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2012) Statistical disclosure control. Wiley, Chichester
Kim JJ (1986) A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA section on survey research methodology, pp 303–308
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
Li Y, Zhu S, Wang L, Jajodia S (2002) A privacy-enhanced microaggregation method. In: Eiter T, Schewe KD (eds) Foundations of Information and Knowledge Systems. Lecture notes in computer science. Springer, Berlin, pp 148–159
Lin JL, Chang PC, Liu JYC, Wen TH (2010) Comparison of microaggregation approaches on anonymized data quality. Expert Syst Appl 37(12):8161–8165
López A (2011) Effect of microaggregation on regression results: an application to Spanish innovation data. Empirical Econ Lett 10(12):1265–1272
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov From Data (TKDD) 1(1):1–52
Mateo-Sanz J, Sebé F, Domingo-Ferrer J (2004) Outlier protection in continuous microdata masking. In: Domingo-Ferrer J, Torra V (eds) Privacy in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 201–215
Mateo-Sanz J, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193
Moore Jr RA (1996) Controlled data-swapping techniques for masking public use microdata sets. Tech. Rep. 96-04, Statistical Research Division Report Series, US Bureau of the Census, Washington D.C
Mortazavi R, Jalili S (2014) Fast data-oriented microaggregation algorithm for large numerical datasets. Knowl Based Syst 67:195–205
Mortazavi R, Jalili S (2015) Preference-based anonymization of numerical datasets by multi-objective microaggregation. Inf Fusion 25:85–104
Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell 39(3):529–544
Navarro-Arribas G, Torra V (2009) Towards microaggregation of log files for Web usage mining in B2C e-commerce. In: Fuzzy information processing society (NAFIPS), IEEE, pp 1–6
Navarro-Arribas G, Torra V (2012) Information fusion in data privacy: a survey. Inf Fusion 13(4):235–244
Nin J, Herranz J, Torra V (2008) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Oganian A, Domingo-Ferrer J (2001) On the complexity of optimal microaggregation for statistical disclosure control. Stat J U N Econ Com Eur 18(4):345–354
Oganian A, Karr AF (2006) Combinations of SDC methods for microdata protection. In: Privacy in Statistical Databases, Springer, Heidelberg, pp 102–113
Pagliuca D, Seri G (1999) Some results of individual ranking method on the system of enterprise accounts annual survey. Report, Esprit SDC Project, Deliverable MI-3 D
Schmid M, Schneeweiss H, Küchenhoff H (2007) Estimation of a linear regression under microaggregation with the response variable as a sorting variable. Statis Neerl 61(4):407–431
Solanas A (2008) Privacy protection with genetic algorithms. In: Yang A, Shan Y, Bui L (eds) Success in evolutionary computation, studies in computational intelligence. Springer, Berlin, pp 215–237
Solanas A, Sebé F, Domingo-Ferrer J (2008) Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: Proceedings of the 2008 international workshop on privacy and anonymity in information society, ACM, pp 61–69
Solé M, Muntés-Mulero V, Nin J (2012) Efficient microaggregation techniques for large numerical data volumes. Int J Inf Secur 11(4):253–267
Torra V (2005) Fuzzy c-means for fuzzy hierarchical clustering. In: The 14th IEEE international conference on fuzzy systems, IEEE, pp 646–651
Truta TM, Vinay B (2006) Privacy protection: p-sensitive k-anonymity property. In: Proceedings 22nd international conference on data engineering workshops, IEEE, pp 94–94
Willenborg LC, De Waal T (2001) Elements of statistical disclosure control, vol 155. Springer, New York
Winkler WE (2004) Re-identification methods for masked microdata. In: Privacy in statistical databases, Springer, Berlin, pp 216–230
Yancey W, Winkler W, Creecy R (2002) Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer J (ed) Inference control in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 135–152
Metadata
Title
Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization
Authors
Reza Mortazavi
Saeed Jalili
Publication date
1 May 2016
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 3/2016
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0432-z
