Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Authors: Reza Mortazavi, Saeed Jalili

Published in: Data Mining and Knowledge Discovery | Issue 3/2016 (1 May 2016)

Abstract

Microaggregation is a masking mechanism to protect confidential data in a public release. This technique can produce a k-anonymous dataset where data records are partitioned into groups of at least k members. In each group, a representative centroid is computed by aggregating the group members and is published instead of the original records. In a conventional microaggregation algorithm, the centroids are computed as the simple arithmetic mean of the group members. This naïve formulation does not consider the proximity of the published values to the original ones, so an intruder may be able to guess the original values. This paper proposes a disclosure-aware aggregation model, where published values are computed at a given distance from the original ones to attain a more protected and useful published dataset. Empirical results show the superiority of the proposed method in achieving a better trade-off point between disclosure risk and information loss in comparison with other similar anonymization techniques.
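To make the conventional scheme described in the abstract concrete, the following is a minimal univariate sketch: records are sorted, cut into groups of at least k members, and each value is replaced by its group's arithmetic mean. The function name `microaggregate_sorted` and the sort-and-cut grouping are assumptions chosen for illustration; this is neither the partitioning method evaluated in the paper nor its disclosure-aware aggregation.

```python
from statistics import mean


def microaggregate_sorted(values, k):
    """Hypothetical helper: univariate microaggregation with mean aggregation.

    Sorts the records, cuts them into consecutive groups of size k (the last
    group absorbs any remainder, so every group has at least k members), and
    publishes the group mean in place of each original value.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")

    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)

    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            masked[i] = centroid
    return masked


original = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
print(microaggregate_sorted(original, k=3))
```

With k = 3, the seven records fall into one group of three and one group of four, so every published value is shared by at least three records. The published means can lie very close to some original values, which is the weakness the abstract attributes to the naïve formulation.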

The measures are discussed in more detail in Sect. 2.3.

For simplicity, we define \(Var(X)=\sigma ^2_X=1/n \sum _{i=1}^{n}(x_i-\mu _{X})^2\) where X is a set of n equally likely values \(x_i\) with \(\mu _{X}=Mean(X)\).
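As a small check of this definition, the sketch below computes the variance with the 1/n divisor (the population variance) and compares it with `statistics.pvariance` from the Python standard library. The function name `var` and the sample values are assumptions made for illustration only.

```python
from statistics import pvariance


def var(xs):
    """Population variance: (1/n) * sum((x_i - mu)^2), as defined above."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var(sample))        # divisor n, matching the definition in the text
print(pvariance(sample))  # standard-library population variance
```

Note that this divides by n rather than n - 1, so it corresponds to `pvariance`, not to the sample variance `statistics.variance`.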

We review some general purpose \(\textit{DR}\) and \(\textit{IL}\) measures only for continuous data type, which is addressed in this paper. The variants of the measures for other data types can be found in Hundepool et al. (2012).

It is also known as identity disclosure or re-identification risk.

Interval disclosure is a special case of attribute disclosure for continuous datasets.
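The exact interval disclosure measure used in the paper is not reproduced in this excerpt. As a rough illustration only, the sketch below assumes a standard-deviation-based variant: an original value counts as disclosed when it lies within p percent of the attribute's standard deviation of its published value. The function name `interval_disclosure_rate`, the parameter `p`, and this interval construction are all assumptions, not the authors' definition.

```python
def interval_disclosure_rate(original, masked, p):
    """Hypothetical measure: share of records whose original value falls
    inside an interval centred on the published value.

    Assumption: the interval half-width is p percent of the population
    standard deviation of the original attribute.
    """
    n = len(original)
    mu = sum(original) / n
    sd = (sum((x - mu) ** 2 for x in original) / n) ** 0.5
    half_width = (p / 100.0) * sd

    # Count records whose original value is within the interval.
    hits = sum(1 for x, y in zip(original, masked) if abs(x - y) <= half_width)
    return hits / n


original = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
masked = [2.0, 2.0, 2.0, 11.0, 11.0, 11.0]  # group means for k = 3
print(interval_disclosure_rate(original, masked, p=10))
print(interval_disclosure_rate(original, masked, p=25))
```

Under this assumed measure, records whose original value coincides with the group mean are always counted as disclosed, which illustrates why publishing values at some distance from the originals can reduce the risk.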

The heuristic can easily be extended to consider each attribute separately; however, our experiments show no significant improvement that would justify the additional cost.

These methods are described in Sect. 3.

Note that in Table 2, MDAV-DA usually performs better at \(k=5\) than at other aggregation levels.

In fact, we select the trade-off point whose \(\textit{DR}\) is closest to, but greater than, the value of MDAV-DA, to allow a larger potential decrease of \(\textit{IL}\) for the methods.

An illustrative example is presented in Fig. 1.

Askari M, Safavi-Naini R, Barker K (2012) An information theoretic privacy and utility measure for data sanitization mechanisms. In: Proceedings of the second ACM conference on data and application security and privacy, ACM, New York, NY, CODASPY, pp 283–294

Batet M, Erola A, Sánchez D, Castellà-Roca J (2013) Utility preserving query log anonymization via semantic microaggregation. Inf Sci 242:49–63

Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517

Brand R (2003) Microdata protection through noise addition. In: Domingo-Ferrer J (ed) Inference control in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 97–116

Brand R, Domingo-Ferrer J, Mateo-Sanz J (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC, http://neon.vb.cbs.nl/casc

Burridge J (2003) Information preserving statistical obfuscation. Stat Comput 13(4):321–327

Charu A, Philip S (2008) Privacy-preserving data mining: models and algorithms. ASPVU, Boston

Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204

Domingo-Ferrer J, Torra V (2001a) Disclosure protection methods and information loss for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 91–110

Domingo-Ferrer J, Torra V (2001b) A quantitative comparison of disclosure control methods for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 111–134

Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212

Domingo-Ferrer J, Rebollo-Monedero D (2009) Measuring risk and utility of anonymized data using information theory. In: Proceedings of the EDBT/ICDT Workshops, ACM, New York, NY, EDBT/ICDT, pp 126–130

Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS, vol 2, pp 807–826

Domingo-Ferrer J, Martínez-Ballesté A, Mateo-Sanz JM, Sebé F (2006a) Efficient multivariate data-oriented microaggregation. VLDB J 15(4):355–369

Domingo-Ferrer J, Solanas A, Martinez-Balleste A (2006b) Privacy in statistical databases: k-anonymity through microaggregation. In: Proceedings of international conference on granular computing, IEEE, pp 774–777

Domingo-Ferrer J, Sebé F, Solanas A (2008) An anonymity model achievable via microaggregation. In: Secure data management, Springer, Heidelberg, pp 209–218

Drud AS (1994) CONOPT a large-scale GRG code. ORSA J Comput 6(2):207–216

Fayyoumi E, Oommen BJ (2010) A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases. Softw Pract Exp 40(12):1161–1188

Hansen S, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044

Heaton B (2012) New record ordering heuristics for multivariate microaggregation. PhD thesis, Nova Southeastern University

Herranz J, Matwin S, Nin J, Torra V (2010) Classifying data from protected statistical datasets. Comput Secur 29(8):875–890

Herranz J, Nin J, Solé M (2012a) Kd-trees and the real disclosure risks of large statistical databases. Inf Fusion 13(4):260–273

Herranz J, Nin J, Solé M (2012b) More hybrid and secure protection of statistical data sets. IEEE Trans Dependable Secur Comput 9(5):727–740

Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2006) Cenex SDC handbook on statistical disclosure control, version 1.01

Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2012) Statistical disclosure control. Wiley, Chichester

Kim JJ (1986) A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA section on survey research methodology, pp 303–308

Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911

Li Y, Zhu S, Wang L, Jajodia S (2002) A privacy-enhanced microaggregation method. In: Eiter T, Schewe KD (eds) Foundations of information and knowledge systems. Lecture notes in computer science. Springer, Berlin, pp 148–159

Lin JL, Chang PC, Liu JYC, Wen TH (2010) Comparison of microaggregation approaches on anonymized data quality. Expert Syst Appl 37(12):8161–8165

López A (2011) Effect of microaggregation on regression results: an application to Spanish innovation data. Empirical Econ Lett 10(12):1265–1272

Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov From Data (TKDD) 1(1):1–52

Mateo-Sanz J, Sebé F, Domingo-Ferrer J (2004) Outlier protection in continuous microdata masking. In: Domingo-Ferrer J, Torra V (eds) Privacy in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 201–215

Mateo-Sanz J, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193

Moore Jr RA (1996) Controlled data-swapping techniques for masking public use microdata sets. Tech. Rep. 96-04, Statistical Research Division Report Series, US Bureau of the Census, Washington D.C

Mortazavi R, Jalili S (2014) Fast data-oriented microaggregation algorithm for large numerical datasets. Knowl Based Syst 67:195–205

Mortazavi R, Jalili S (2015) Preference-based anonymization of numerical datasets by multi-objective microaggregation. Inf Fusion 25:85–104

Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell 39(3):529–544

Navarro-Arribas G, Torra V (2009) Towards microaggregation of log files for Web usage mining in B2C e-commerce. In: Fuzzy information processing society (NAFIPS), IEEE, pp 1–6

Navarro-Arribas G, Torra V (2012) Information fusion in data privacy: a survey. Inf Fusion 13(4):235–244

Nin J, Herranz J, Torra V (2008) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412

Oganian A, Domingo-Ferrer J (2001) On the complexity of optimal microaggregation for statistical disclosure control. Stat J U N Econ Com Eur 18(4):345–354

Oganian A, Karr AF (2006) Combinations of SDC methods for microdata protection. In: Privacy in Statistical Databases, Springer, Heidelberg, pp 102–113

Pagliuca D, Seri G (1999) Some results of individual ranking method on the system of enterprise accounts annual survey. Report, Esprit SDC Project, Deliverable MI-3 D

Schmid M, Schneeweiss H, Küchenhoff H (2007) Estimation of a linear regression under microaggregation with the response variable as a sorting variable. Statis Neerl 61(4):407–431

Solanas A (2008) Privacy protection with genetic algorithms. In: Yang A, Shan Y, Bui L (eds) Success in evolutionary computation, studies in computational intelligence. Springer, Berlin, pp 215–237

Solanas A, Sebé F, Domingo-Ferrer J (2008) Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: Proceedings of the 2008 international workshop on privacy and anonymity in information society, ACM, pp 61–69

Solé M, Muntés-Mulero V, Nin J (2012) Efficient microaggregation techniques for large numerical data volumes. Int J Inf Secur 11(4):253–267

Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

Torra V (2005) Fuzzy c-means for fuzzy hierarchical clustering. In: The 14th IEEE international conference on fuzzy systems, IEEE, pp 646–651

Truta TM, Vinay B (2006) Privacy protection: p-sensitive k-anonymity property. In: Proceedings 22nd international conference on data engineering workshops, IEEE, pp 94–94

Willenborg LC, De Waal T (2001) Elements of statistical disclosure control, vol 155. Springer, New York

Winkler WE (2004) Re-identification methods for masked microdata. In: Privacy in statistical databases, Springer, Berlin, pp 216–230

Yancey W, Winkler W, Creecy R (2002) Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer J (ed) Inference control in statistical databases. Lecture notes in computer science. Springer, Berlin, pp 135–152

Title: Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization
Authors: Reza Mortazavi, Saeed Jalili
Publication date: 1 May 2016
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery, Issue 3/2016
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-015-0432-z
