Published in: Data Mining and Knowledge Discovery, Issue 1/2017

Publication date: 15.03.2016

TBM, a transformation based method for microaggregation of large volume mixed data

Authors: Mostafa Salari, Saeed Jalili, Reza Mortazavi

Abstract

Due to recent advances in data collection and processing, some organizations have begun publishing data for scientific and commercial purposes. Published data should be anonymized so that they remain useful while the privacy of the data respondents is preserved. Microaggregation is a popular mechanism for data anonymization, but it inherently operates on numerical datasets. Real-world data, however, are usually mixed, i.e., they contain both numerical and categorical attributes. In this paper, we propose a novel transformation-based method for the microaggregation of mixed data, called TBM. The method uses multidimensional scaling to generate a numerical equivalent of the mixed dataset. The partitioning step of microaggregation is performed on this numerical equivalent, whereas the aggregation step is performed on the original data. TBM can microaggregate large mixed datasets in a short time with low information loss. Experimental results show that the proposed method achieves a better trade-off between data utility and privacy, in less time, than traditional methods.
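The abstract outlines a three-stage pipeline: map the mixed records to a numerical equivalent with multidimensional scaling, partition that equivalent into groups of at least k records, then aggregate each group on the original attributes. The Python sketch below illustrates that flow under several assumptions the abstract does not state: a Gower-style distance (range-normalized absolute difference for numerical attributes, 0/1 mismatch for categorical ones), classical MDS onto a single dimension, a simple sort-and-cut partition, and mean/mode centroids. The toy records and all function names (`mixed_distance`, `classical_mds_1d`, `fixed_size_partition`, `aggregate`, `tbm_like`) are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy records: (age, income, education, marital status).
RECORDS = [
    (25, 30000.0, "Bachelors", "Single"),
    (27, 32000.0, "Bachelors", "Single"),
    (45, 80000.0, "Masters", "Married"),
    (47, 85000.0, "Doctorate", "Married"),
    (33, 50000.0, "HS-grad", "Divorced"),
    (35, 52000.0, "HS-grad", "Married"),
    (52, 90000.0, "Masters", "Married"),
]
NUMERIC = (0, 1)      # positions of numerical attributes (assumed)
CATEGORICAL = (2, 3)  # positions of categorical attributes (assumed)


def mixed_distance(a, b, ranges):
    """Gower-style dissimilarity (an assumption, not necessarily TBM's measure)."""
    d = sum(abs(a[i] - b[i]) / ranges[i] for i in NUMERIC)  # range-normalised numeric gaps
    d += sum(a[i] != b[i] for i in CATEGORICAL)              # 0/1 categorical mismatches
    return d / (len(NUMERIC) + len(CATEGORICAL))


def classical_mds_1d(dist, iters=1000):
    """Classical (Torgerson) MDS onto one dimension via shifted power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]                # dist is symmetric: row means = column means
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]                            # double-centred matrix B = -J D^2 J / 2
    shift = max(sum(abs(x) for x in row) for row in b) # Gershgorin bound: B + shift*I is PSD
    v = [i - (n - 1) / 2 for i in range(n)]            # centred start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]                      # stay orthogonal to the all-ones vector
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    eig = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))  # Rayleigh quotient on B
    return [math.sqrt(max(eig, 0.0)) * x for x in v]


def fixed_size_partition(coords, k):
    """Sort by the 1-D coordinate and cut into runs of k; the last n mod k records
    join the final run, so every group has between k and 2k - 1 members."""
    n = len(coords)
    assert n >= k, "need at least k records"
    order = sorted(range(n), key=lambda i: coords[i])
    cut = n - n % k
    groups = [order[s:s + k] for s in range(0, cut, k)]
    groups[-1].extend(order[cut:])
    return groups


def aggregate(records, groups):
    """Replace each record by its group centroid computed on the ORIGINAL attributes:
    mean for numerical attributes, mode for categorical ones (ties: first seen)."""
    out = list(records)
    for g in groups:
        centroid = list(records[g[0]])
        for a in NUMERIC:
            centroid[a] = sum(records[i][a] for i in g) / len(g)
        for a in CATEGORICAL:
            centroid[a] = Counter(records[i][a] for i in g).most_common(1)[0][0]
        for i in g:
            out[i] = tuple(centroid)
    return out


def tbm_like(records, k):
    ranges = {a: (max(r[a] for r in records) - min(r[a] for r in records)) or 1.0
              for a in NUMERIC}
    dist = [[mixed_distance(x, y, ranges) for y in records] for x in records]
    coords = classical_mds_1d(dist)           # numerical equivalent of the mixed data
    groups = fixed_size_partition(coords, k)  # partitioning on the numerical equivalent
    return aggregate(records, groups)         # aggregation on the original mixed data


if __name__ == "__main__":
    for row in tbm_like(RECORDS, k=3):
        print(row)
```

Running it prints the seven toy records after microaggregation with k = 3, so every released tuple is shared by at least three records. Embedding into one dimension reduces partitioning to a single sort; TBM's actual embedding dimension, distance measure, and partitioning heuristic may differ from this sketch.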

Footnotes
1
Microaggregation with minimum information loss.
 
2
The definition of the least common subsumer (LCS) in an ontology is similar to that of the CCG in a value generalization hierarchy (VGH).
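Footnote 2 relates the least common subsumer in an ontology to the CCG in a value generalization hierarchy; in both cases the idea is, roughly, the most specific node that generalizes two given values. The Python sketch below assumes a hypothetical education hierarchy stored as a child-to-parent map and finds that node by walking up from the second value until it reaches an ancestor of the first. The hierarchy and the names `PARENT`, `ancestors`, and `lowest_common_generalization` are illustrative only.

```python
# Hypothetical value generalization hierarchy for "education": child -> parent.
PARENT = {
    "Bachelors": "Undergraduate", "Associate": "Undergraduate",
    "Masters": "Graduate", "Doctorate": "Graduate",
    "Undergraduate": "Higher-ed", "Graduate": "Higher-ed",
    "HS-grad": "Secondary",
    "Higher-ed": "Any", "Secondary": "Any",
}


def ancestors(value):
    """Return the path [value, parent, ..., root] up the hierarchy."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_generalization(a, b):
    """Most specific node that generalizes both a and b (None if they share no root)."""
    above_a = set(ancestors(a))
    for node in ancestors(b):  # walk from b towards the root; the first hit is the lowest
        if node in above_a:
            return node
    return None


if __name__ == "__main__":
    print(lowest_common_generalization("Masters", "Doctorate"))  # Graduate
    print(lowest_common_generalization("Bachelors", "Masters"))  # Higher-ed
    print(lowest_common_generalization("HS-grad", "Masters"))    # Any
```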
 
Metadata
Publisher: Springer US
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-016-0457-y
