Published in: Soft Computing 7/2011

01.07.2011 | Original Paper

An evolutionary approach to enhance data privacy

Authors: Javier Jiménez, Jordi Marés, Vicenç Torra

Abstract

Dissemination of data with sensitive information about individuals has an implicit risk of unauthorized disclosure. Perturbative masking methods propose the distortion of the original data sets before publication, tackling a difficult tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). In this paper, we describe how information loss and disclosure risk measures can be integrated within an evolutionary algorithm to seek new and enhanced masking protections for continuous microdata. The proposed technique constitutes a hybrid approach that combines state-of-the-art protection methods with an evolutionary algorithm optimization. We also provide experimental results using three data sets in order to illustrate and empirically evaluate the application of this technique.
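
As a rough illustration of the idea in the abstract, the sketch below scores candidate masked files by combining an information loss measure with a disclosure risk measure, then improves them with a small elitist evolutionary loop. Everything in it is an assumption made for illustration and is not taken from the paper: the toy measures `information_loss` and `disclosure_risk`, the equal-weight `score`, the operators `mutate` and `crossover`, and the use of additive noise to produce the initial masked files (a stand-in for files protected by existing masking methods). It also assumes numeric attributes scaled to a comparable range.

```python
import random


def information_loss(original, masked):
    """Toy IL (assumption): mean absolute difference between the two files."""
    cells = len(original) * len(original[0])
    total = sum(abs(o - m)
                for row_o, row_m in zip(original, masked)
                for o, m in zip(row_o, row_m))
    return total / cells


def disclosure_risk(original, masked):
    """Toy DR (assumption): share of masked records whose nearest original
    record is the one they were derived from (distance-based linkage)."""
    hits = 0
    for i, row_m in enumerate(masked):
        dists = [sum((o - m) ** 2 for o, m in zip(row_o, row_m))
                 for row_o in original]
        if dists.index(min(dists)) == i:
            hits += 1
    return hits / len(masked)


def score(original, masked):
    """Assumed aggregation: plain average of IL and DR; lower is better."""
    return 0.5 * information_loss(original, masked) \
        + 0.5 * disclosure_risk(original, masked)


def mutate(masked, rng, sigma=0.1):
    """Return a copy with Gaussian noise added to one randomly chosen cell."""
    child = [row[:] for row in masked]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[0]))
    child[i][j] += rng.gauss(0.0, sigma)
    return child


def crossover(a, b, rng):
    """One-point crossover on records; needs at least two records."""
    cut = rng.randrange(1, len(a))
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def evolve(original, seeds, generations=200, seed=0):
    """Elitist loop: add one child per generation, keep the best len(seeds).
    Needs at least two seed files."""
    rng = random.Random(seed)
    population = [[row[:] for row in s] for s in seeds]
    size = len(population)
    for _ in range(generations):
        parent_a, parent_b = rng.sample(population, 2)
        child = mutate(crossover(parent_a, parent_b, rng), rng)
        population.append(child)
        population.sort(key=lambda m: score(original, m))
        population = population[:size]
    return population[0]


# Hypothetical usage: 20 records with 3 attributes in [0, 1), and four seed
# maskings produced by additive noise of increasing strength.
data_rng = random.Random(1)
original = [[data_rng.random() for _ in range(3)] for _ in range(20)]
seeds = [[[v + data_rng.gauss(0.0, s) for v in row] for row in original]
         for s in (0.05, 0.1, 0.2, 0.4)]
best = evolve(original, seeds)
```

Because the loop always keeps the best-scoring files, the returned file never scores worse than the best seed. How the two measures are actually weighted and which operators are used are design choices this sketch does not attempt to reproduce.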

Footnotes
1
The history of these kinds of algorithms goes back to the early 1950s and is associated with several scientists working completely independently of one another (Michalewicz and Fogel 2004). Each procedure was slightly different, and some received names such as evolutionary computation (Back et al. 2000), genetic algorithms (Holland 1975) or evolution strategies (Rechenberg 1970; Schwefel 1981). Over time, the different approaches borrowed, exchanged and modified ideas. The term evolutionary algorithm then emerged to describe any of these algorithms, and it is the term we use in this paper.
 
References
Agrawal R, Srikant R (2000) Privacy preserving data mining. In: Proceedings of the ACM SIGMOD conference on management of data, pp 439–450
Back T, Fogel DB, Michalewicz Z (eds) (2000) Evolutionary computation. Advanced algorithms and operations, vol 2. Institute of Physics Publishing, Bristol
Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. In: IEEE proceedings of the 21st international conference on data engineering, ICDE, pp 217–228
Brand R, Domingo-Ferrer J, Mateo-Sanz JM (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. Unscheduled Deliverable, European Project IST–2000–25069 CASC
Caruana RA, Schaffer JD (1988) Representation and hidden bias: Gray vs. binary coding for genetic algorithms. In: Proceedings of the 5th international conference on machine learning, Morgan Kaufmann, Los Altos, pp 153–161
Defays D, Anwar MN (1995) Micro-aggregation: a generic method. In: Proceedings of the 2nd international symposium on statistical confidentiality, pp 69–78
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204
Dick G (2005) A comparison of localised and global niching methods. In: Proceedings of the 17th annual colloquium of the spatial information research centre, pp 91–101
Domingo-Ferrer J, Mateo-Sanz JM (2002) Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans Knowl Data Eng 14(1):189–201
Domingo-Ferrer J, Torra V (2001) A quantitative comparison of disclosure control methods for microdata. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 6. Elsevier, Amsterdam, pp 111–133
Domingo-Ferrer J, Torra V (2004) Disclosure risk assessment in statistical data protection. J Comput Appl Math 164:285–293
Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: New techniques and technologies for statistics: exchange of technology and know-how, ETK-NTTS’2001. Creta, Hersonissos, pp 807–826
Duncan GT, Fienberg SE, Krishnan R, Padman R, Roehrig SF (2001a) Disclosure limitation methods and information loss for tabular data. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 7. Elsevier, Amsterdam, pp 135–166
Duncan GT, Keller-McNulty SA, Stokes SL (2001b) Disclosure risk vs. data utility: the R-U confidentiality map. Technical report 121, National Institute of Statistical Sciences, NISS, North Carolina
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press (2nd edn, MIT Press, 1992)
Iyengar VS (2002) Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288
Jiménez J, Torra V (2009a) JPEG-based microdata protection methods. Technical reports IIIA–TR–2009–06, IIIA-CSIC
Jiménez J, Torra V (2009b) Utility and risk of JPEG-based continuous microdata protection methods. In: IEEE Proceedings of the 4th international conference on availability, reliability and security, ARES
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
LeFevre KR (2007) Anonymity in data publishing and distribution. PhD thesis, University of Wisconsin, Madison
Mahfoud SW (1992) Crowding and preselection revisited. Technical report 92004, Illinois Genetic Algorithms Laboratory (IlliGAL), University of Illinois, also in Parallel Problem Solving From Nature, PPSN, 2:27–36
Mateo-Sanz JM, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193
Michalewicz Z, Fogel DB (2004) How to solve it: Modern Heuristics, 2nd edn. Springer, Berlin
Moore RA Jr (1996) Controlled data-swapping techniques for masking public use microdata sets. Research report, RR 96-04, Statistical Research Division Report Series, US Bureau of the Census
Nin J, Herranz J, Torra V (2008a) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Nin J, Herranz J, Torra V (2008b) Rethinking rank swapping to decrease disclosure risk. Data Knowl Eng 64(1):346–364
Rechenberg I (1970) Evolutions strategie: optimierung technischer systeme nach prinzipien der biologischen information. PhD thesis, Technical University of Berlin, reprinted by Fromman Verlag, Freiburg, Germany, 1973
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Schaffer JD, Caruana R, Eshelman LJ, Das R (1989) A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Schaffer JD (ed) ICGA, Morgan Kaufmann, pp 51–60
Schwefel HP (1981) Numerical optimization of computer models (Tr. from German to English). Wiley, Chichester
Sebé F, Domingo-Ferrer J, Mateo JM, Torra V (2002) Post-masking optimization of the tradeoff between information loss and disclosure risk in masked microdata sets. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 163–171
Solanas A (2008) Privacy protection with genetic algorithms. In: Ang Yang LTB Yin Shan (ed) Success in evolutionary computation, Studies in computational intelligence series. Springer, Berlin, pp 215–239
Willenborg L, de Waal T (1996) Statistical disclosure control in practice. Springer, Berlin
Yancey WE, Winkler WE, Creecy RH (2002) Disclosure risk assessment in perturbative microdata protection. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 135–152
Metadata
Title
An evolutionary approach to enhance data privacy
Authors
Javier Jiménez
Jordi Marés
Vicenç Torra
Publication date
01.07.2011
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 7/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0672-1