Data Mining and Knowledge Discovery

15 March 2016

TBM, a transformation based method for microaggregation of large volume mixed data

Authors: Mostafa Salari, Saeed Jalili, Reza Mortazavi

Published in: Data Mining and Knowledge Discovery | Issue 1/2017

Abstract

Due to recent advances in data collection and processing, organizations increasingly publish data for scientific and commercial purposes. Published data should be anonymized so that it remains useful while the privacy of data respondents is preserved. Microaggregation is a popular mechanism for data anonymization, but it naturally operates on numerical datasets. Real-world data, however, is usually mixed, i.e., it contains both numeric and categorical attributes. In this paper, we propose a novel transformation based method for microaggregation of mixed data called TBM. The method uses multidimensional scaling to generate a numeric equivalent of the mixed dataset. The partitioning step of microaggregation is performed on the equivalent dataset, but the aggregation step is performed on the original data. TBM can microaggregate large mixed datasets in a short time with low information loss. Experimental results show that the proposed method attains a better trade-off between data utility and privacy in a shorter time than traditional methods.
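The three stages named in the abstract (transform the mixed data into a numeric equivalent, partition on the equivalent, aggregate on the original) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the Gower-style dissimilarity, the use of classical (Torgerson) multidimensional scaling, the simplified farthest-point grouping, and all function names and toy records are hypothetical.

```python
from collections import Counter

import numpy as np


def mixed_distance_matrix(numeric, categorical):
    """Gower-style dissimilarity (assumed, not taken from the paper):
    range-normalised absolute difference for numeric attributes,
    simple mismatch for categorical attributes."""
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical, dtype=object)
    span = numeric.max(axis=0) - numeric.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    scaled = numeric / span
    num_part = np.abs(scaled[:, None, :] - scaled[None, :, :]).sum(axis=2)
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).sum(axis=2)
    return (num_part + cat_part) / (numeric.shape[1] + categorical.shape[1])


def classical_mds(dist, dims=2):
    """Classical MDS: double-centre the squared distances and keep the
    top eigenvectors. Negative eigenvalues are clipped to zero."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def partition(points, k):
    """Simplified grouping (assumed): take the point farthest from the
    centroid plus its k-1 nearest neighbours; repeat while at least 2k
    points remain, then put the rest into one final group."""
    remaining = list(range(len(points)))
    groups = []
    while len(remaining) >= 2 * k:
        pts = points[remaining]
        far = np.argmax(((pts - pts.mean(axis=0)) ** 2).sum(axis=1))
        nearest = np.argsort(((pts - pts[far]) ** 2).sum(axis=1))[:k]
        chosen = set(nearest.tolist())
        groups.append([remaining[i] for i in nearest])
        remaining = [r for i, r in enumerate(remaining) if i not in chosen]
    if remaining:
        groups.append(remaining)
    return groups


def aggregate(numeric, categorical, groups):
    """Aggregate on the ORIGINAL data: group mean for numeric attributes,
    group mode for categorical attributes."""
    numeric = np.asarray(numeric, dtype=float)
    out_num = numeric.copy()
    out_cat = [list(row) for row in categorical]
    for group in groups:
        out_num[group] = numeric[group].mean(axis=0)
        for j in range(len(out_cat[0])):
            mode = Counter(categorical[i][j] for i in group).most_common(1)[0][0]
            for i in group:
                out_cat[i][j] = mode
    return out_num, out_cat


# Hypothetical toy records: (age, income) plus (marital status, degree).
numeric = [[25, 30000], [27, 32000], [26, 31000],
           [52, 90000], [55, 95000], [53, 88000]]
categorical = [["single", "BSc"], ["single", "BSc"], ["married", "BSc"],
               ["married", "PhD"], ["married", "PhD"], ["single", "PhD"]]

dist = mixed_distance_matrix(numeric, categorical)
embedded = classical_mds(dist, dims=2)   # numeric equivalent of the mixed data
groups = partition(embedded, k=3)        # partitioning on the equivalent
anon_num, anon_cat = aggregate(numeric, categorical, groups)
```

With `k=3`, each record in the output shares its values with at least two others, which is the k-anonymity-style guarantee microaggregation aims for. Note that this dense version stores a full n × n distance matrix, so it is only suitable for small illustrations and says nothing about how TBM achieves its reported speed on large datasets.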

Microaggregation with minimum information loss.

The definition of LCS in ontology is similar to CCG in VGH.

http://archive.ics.uci.edu/ml/datasets/Adult.

Title: TBM, a transformation based method for microaggregation of large volume mixed data
Authors: Mostafa Salari, Saeed Jalili, Reza Mortazavi
Publication date: 15 March 2016
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 1/2017
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-016-0457-y
