Published in: Soft Computing 7/2011

01-07-2011 | Original Paper

An evolutionary approach to enhance data privacy

Authors: Javier Jiménez, Jordi Marés, Vicenç Torra

Abstract

Dissemination of data with sensitive information about individuals has an implicit risk of unauthorized disclosure. Perturbative masking methods propose the distortion of the original data sets before publication, tackling a difficult tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). In this paper, we describe how information loss and disclosure risk measures can be integrated within an evolutionary algorithm to seek new and enhanced masking protections for continuous microdata. The proposed technique constitutes a hybrid approach that combines state-of-the-art protection methods with an evolutionary algorithm optimization. We also provide experimental results using three data sets in order to illustrate and empirically evaluate the application of this technique.
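The idea of placing information loss and disclosure risk inside an evolutionary loop can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the information loss measure (range-scaled mean absolute difference), the disclosure risk measure (share of records re-identified by nearest-neighbour linkage), the equal weighting in `score`, the use of additive noise as a stand-in for the state-of-the-art protection methods that seed the population, and the particular crossover and mutation operators.

```python
import random

def information_loss(original, masked):
    """Assumed IL: mean absolute cell difference, scaled by each column's range."""
    n, m = len(original), len(original[0])
    ranges = []
    for j in range(m):
        col = [row[j] for row in original]
        ranges.append((max(col) - min(col)) or 1.0)
    total = sum(abs(original[i][j] - masked[i][j]) / ranges[j]
                for i in range(n) for j in range(m))
    return min(1.0, total / (n * m))

def disclosure_risk(original, masked):
    """Assumed DR: share of masked records whose nearest original is their own source."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hits = 0
    for i, rec in enumerate(masked):
        nearest = min(range(len(original)), key=lambda k: sq_dist(rec, original[k]))
        hits += nearest == i
    return hits / len(masked)

def score(original, masked):
    """Fitness to minimise; the equal weighting is an assumption of this sketch."""
    return 0.5 * information_loss(original, masked) + 0.5 * disclosure_risk(original, masked)

def add_noise(original, scale, rng):
    """Stand-in for an existing masking method: additive Gaussian noise."""
    return [[v + rng.gauss(0.0, scale) for v in row] for row in original]

def mutate(masked, scale, rng):
    """Perturb one randomly chosen cell of a copy of the masked data set."""
    child = [row[:] for row in masked]
    i, j = rng.randrange(len(child)), rng.randrange(len(child[0]))
    child[i][j] += rng.gauss(0.0, scale)
    return child

def crossover(a, b, rng):
    """Take the first records from one parent and the remaining ones from the other."""
    cut = rng.randrange(1, len(a))
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]

def evolve(original, pop_size=8, generations=30, seed=0):
    rng = random.Random(seed)
    # Seed the population with protections of increasing noise strength.
    population = [add_noise(original, 0.1 * (k + 1), rng) for k in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: score(original, p))
        history.append(score(original, population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), 0.1, rng))
        population = parents + children
    best = min(population, key=lambda p: score(original, p))
    return best, history

if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(12)]
    best, history = evolve(data)
    print("first and last best score:", history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best score recorded in `history` never gets worse from one generation to the next. A realistic setup would replace the two stand-in measures with established information loss and disclosure risk measures for continuous microdata.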

Footnotes
1
The history of these kinds of algorithms dates back to the early 1950s and is associated with several scientists who worked completely independently of each other (Michalewicz and Fogel 2004). Each procedure was slightly different, and some were given names such as evolutionary computation (Back et al. 2000), genetic algorithms (Holland 1975) or evolution strategies (Rechenberg 1970; Schwefel 1981). Over time, the different approaches borrowed, exchanged and modified ideas. The term evolutionary algorithm then emerged to describe any of these algorithms, and it is the term we use in this paper.
 
Literature
Agrawal R, Srikant R (2000) Privacy preserving data mining. In: Proceedings of the ACM SIGMOD conference on management of data, pp 439–450
Back T, Fogel DB, Michalewicz Z (eds) (2000) Evolutionary computation. Advanced algorithms and operations, vol 2. Institute of Physics Publishing, Bristol
Bayardo RJ, Agrawal R (2005) Data privacy through optimal k-anonymization. In: IEEE proceedings of the 21st international conference on data engineering, ICDE, pp 217–228
Brand R, Domingo-Ferrer J, Mateo-Sanz JM (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. Unscheduled Deliverable, European Project IST–2000–25069 CASC
Caruana RA, Schaffer JD (1988) Representation and hidden bias: Gray vs. binary coding for genetic algorithms. In: Proceedings of the 5th international conference on machine learning, Morgan Kaufmann, Los Altos, pp 153–161
Defays D, Anwar MN (1995) Micro-aggregation: a generic method. In: Proceedings of the 2nd international symposium on statistical confidentiality, pp 69–78
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204
Dick G (2005) A comparison of localised and global niching methods. In: Proceedings of the 17th annual colloquium of the spatial information research centre, pp 91–101
Domingo-Ferrer J, Mateo-Sanz JM (2002) Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans Knowl Data Eng 14(1):189–201
Domingo-Ferrer J, Torra V (2001) A quantitative comparison of disclosure control methods for microdata. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 6. Elsevier, Amsterdam, pp 111–133
Domingo-Ferrer J, Torra V (2004) Disclosure risk assessment in statistical data protection. J Comput Appl Math 164:285–293
Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: New techniques and technologies for statistics: exchange of technology and know-how, ETK-NTTS’2001. Creta, Hersonissos, pp 807–826
Duncan GT, Fienberg SE, Krishnan R, Padman R, Roehrig SF (2001a) Disclosure limitation methods and information loss for tabular data. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz LV (eds) Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, Chap 7. Elsevier, Amsterdam, pp 135–166
Duncan GT, Keller-McNulty SA, Stokes SL (2001b) Disclosure risk vs. data utility: the R-U confidentiality map. Technical report 121, National Institute of Statistical Sciences, NISS, North Carolina
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press (2nd edn, MIT Press, 1992)
Iyengar VS (2002) Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288
Jiménez J, Torra V (2009a) JPEG-based microdata protection methods. Technical reports IIIA–TR–2009–06, IIIA-CSIC
Jiménez J, Torra V (2009b) Utility and risk of JPEG–based continuous microdata protection methods. In: IEEE Proceedings of the 4th international conference on availability, reliability and security, ARES
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
LeFevre KR (2007) Anonymity in data publishing and distribution. PhD thesis, University of Wisconsin, Madison
Mahfoud SW (1992) Crowding and preselection revisited. Technical report 92004, Illinois Genetic Algorithms Laboratory (IlliGAL), University of Illinois, also in Parallel Problem Solving From Nature, PPSN, 2:27–36
Mateo-Sanz JM, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193
Michalewicz Z, Fogel DB (2004) How to solve it: Modern Heuristics, 2nd edn. Springer, Berlin
Moore RA Jr (1996) Controlled data-swapping techniques for masking public use microdata sets. Research report, RR 96-04, Statistical Research Division Report Series, US Bureau of the Census
Nin J, Herranz J, Torra V (2008a) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Nin J, Herranz J, Torra V (2008b) Rethinking rank swapping to decrease disclosure risk. Data Knowl Eng 64(1):346–364
Rechenberg I (1970) Evolutions strategie: optimierung technischer systeme nach prinzipien der biologischen information. PhD thesis, Technical University of Berlin, reprinted by Fromman Verlag, Freiburg, Germany, 1973
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Schaffer JD, Caruana R, Eshelman LJ, Das R (1989) A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Schaffer JD (ed) ICGA, Morgan Kaufmann, pp 51–60
Schwefel HP (1981) Numerical optimization of computer models (Tr. from German to English). Wiley, Chichester
Sebé F, Domingo-Ferrer J, Mateo JM, Torra V (2002) Post-masking optimization of the tradeoff between information loss and disclosure risk in masked microdata sets. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 163–171
Solanas A (2008) Privacy protection with genetic algorithms. In: Ang Yang LTB Yin Shan (ed) Success in evolutionary computation, Studies in computational intelligence series. Springer, Berlin, pp 215–239
Willenborg L, de Waal T (1996) Statistical disclosure control in practice. Springer, Berlin
Yancey WE, Winkler WE, Creecy RH (2002) Disclosure risk assessment in perturbative microdata protection. In: Inference control in statistical databases: from theory to practice, LNCS, vol 2316. Springer, Berlin, pp 135–152
Metadata
Title
An evolutionary approach to enhance data privacy
Authors
Javier Jiménez
Jordi Marés
Vicenç Torra
Publication date
01-07-2011
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 7/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0672-1