Skip to main content
Erschienen in: Knowledge and Information Systems 3/2016

01.06.2016 | Regular Paper

A transversal hypergraph approach for the frequent itemset hiding problem

verfasst von: Elias C. Stavropoulos, Vassilios S. Verykios, Vasileios Kagklis

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We propose a methodology for hiding all sensitive frequent itemsets in a transaction database. Our methodology relies on a novel technique that enumerates the minimal transversals of a hypergraph in order to induce the ideal border between frequent and sensitive itemsets. The ideal border is then utilized to formulate an integer linear program (ILP) that answers whether a feasible sanitized database that attains the ideal border, exists. The solution of the program identifies the set of transactions that need to be modified (sanitized) so that the hiding can be achieved with the maximum accuracy. If no solution exists, we modify the ILP by relaxing the constraints needed to be satisfied so that the sanitized database preserves the privacy with guarantee but with minimum effect in data quality. Experimental evaluation of the proposed approach on a number of real datasets has shown that the produced sanitized databases exhibit higher accuracy when compared with the solutions of other well-known approaches.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwal CC, Yu PS (eds) (2008) Privacy-preserving data mining: models and algorithms. Advances in database systems. Springer, New York Aggarwal CC, Yu PS (eds) (2008) Privacy-preserving data mining: models and algorithms. Advances in database systems. Springer, New York
2.
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB’94), pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB’94), pp 487–499
3.
Zurück zum Zitat Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM-SIGMOD international conference on management of data (SIGMOD 2000), pp 439–450 Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM-SIGMOD international conference on management of data (SIGMOD 2000), pp 439–450
4.
Zurück zum Zitat Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios V (1999) Disclosure limitation of sensitive rules. In: Proceedings of the knowledge and data engineering exchange (KDEX’99), pp 45–52 Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios V (1999) Disclosure limitation of sensitive rules. In: Proceedings of the knowledge and data engineering exchange (KDEX’99), pp 45–52
5.
Zurück zum Zitat Bailey J, Manoukian T, Ramamohanarao K (2003) A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003), pp 485–488. IEEE computer Society, Dec 2003 Bailey J, Manoukian T, Ramamohanarao K (2003) A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003), pp 485–488. IEEE computer Society, Dec 2003
6.
Zurück zum Zitat Bayardo R (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), pp 85–93 Bayardo R (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), pp 85–93
7.
Zurück zum Zitat Berge C (1989) Hypergraphs: combinatorics of finite sets, volume 45 of North Holland mathematical library. Elsevier Science Publishers B.V., Amsterdam Berge C (1989) Hypergraphs: combinatorics of finite sets, volume 45 of North Holland mathematical library. Elsevier Science Publishers B.V., Amsterdam
8.
Zurück zum Zitat Bodon F (2003) A fast APRIORI implementation. In: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI’03), vol 90, pp 56–65 Bodon F (2003) A fast APRIORI implementation. In: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI’03), vol 90, pp 56–65
9.
Zurück zum Zitat Bonchi F, Ferrari E (2011) Privacy-aware knowledge discovery: novel applications and new techniques. Chapman & Hall/CRC data mining and knowledge discovery series. CRC Press INC Bonchi F, Ferrari E (2011) Privacy-aware knowledge discovery: novel applications and new techniques. Chapman & Hall/CRC data mining and knowledge discovery series. CRC Press INC
10.
Zurück zum Zitat Borgelt C (2012) Frequent item set mining. Wiley Interdiscip Rev: Data Min Knowl Discov 2(6):437–456 Borgelt C (2012) Frequent item set mining. Wiley Interdiscip Rev: Data Min Knowl Discov 2(6):437–456
11.
Zurück zum Zitat Boros E, Elbassioni K, Gurvich V, Khachiyan L (2003) An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals. In: Proceedings of the 11th annual European symposium on algorithms (ESA 2003), vol 2432 of LNCS, 556–567 Boros E, Elbassioni K, Gurvich V, Khachiyan L (2003) An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals. In: Proceedings of the 11th annual European symposium on algorithms (ESA 2003), vol 2432 of LNCS, 556–567
12.
Zurück zum Zitat Boros E, Elbassioni K, Makino K (2008) On Berge multiplication for monotone boolean dualization. In: Proceedings of the 35th international colloquium on automata, languages and programming (ICALP 2008), volume 5125 of LNCS, 48–59 Boros E, Elbassioni K, Makino K (2008) On Berge multiplication for monotone boolean dualization. In: Proceedings of the 35th international colloquium on automata, languages and programming (ICALP 2008), volume 5125 of LNCS, 48–59
13.
Zurück zum Zitat Boros E, Gurvich V, Khachiyan L, Makino K (2003) On maximal frequent and minimal infrequent sets in binary matrices. Ann Math Artif Intell 39(3):211–221MathSciNetCrossRefMATH Boros E, Gurvich V, Khachiyan L, Makino K (2003) On maximal frequent and minimal infrequent sets in binary matrices. Ann Math Artif Intell 39(3):211–221MathSciNetCrossRefMATH
14.
Zurück zum Zitat Brijs T, Swinnen G, Vanhoof K, Wets G (1999) Using association rules for product assortment decisions: a case study. In: proceedings of the 5th ACM-SIGKDD international conference on knowledge discovery and data mining (KDD’99), pp 254–260 Brijs T, Swinnen G, Vanhoof K, Wets G (1999) Using association rules for product assortment decisions: a case study. In: proceedings of the 5th ACM-SIGKDD international conference on knowledge discovery and data mining (KDD’99), pp 254–260
15.
Zurück zum Zitat Bu S, Lakshmanan LVS, Ng RT, Ramesh G (2007) Preservation of patterns and input–output privacy. In: Proceedings of the IEEE 23rd international conference on data engineering (ICDE 2007), pp 696–705 Bu S, Lakshmanan LVS, Ng RT, Ramesh G (2007) Preservation of patterns and input–output privacy. In: Proceedings of the IEEE 23rd international conference on data engineering (ICDE 2007), pp 696–705
16.
Zurück zum Zitat Calders T (2004) Computational complexity on itemset frequency satisfiability. In: Proceedings of symposium on principles of database systems 2004 (PODS’04), pp 143–154 Calders T (2004) Computational complexity on itemset frequency satisfiability. In: Proceedings of symposium on principles of database systems 2004 (PODS’04), pp 143–154
17.
18.
Zurück zum Zitat Clifton C (1999) Protecting against data mining through samples. In: Proceedings of the 13th international conference on database security (DBSec’99), pp 193–207 Clifton C (1999) Protecting against data mining through samples. In: Proceedings of the 13th international conference on database security (DBSec’99), pp 193–207
19.
Zurück zum Zitat Dong G, Li J (2005) Mining border descriptions of emerging patterns from dataset pairs. Knowl Info Syst 8(2):178–202CrossRef Dong G, Li J (2005) Mining border descriptions of emerging patterns from dataset pairs. Knowl Info Syst 8(2):178–202CrossRef
20.
Zurück zum Zitat Eiter T, Gottlob G (1995) Identifying the minimal transversals of a hypergraph and related problems. SIAM J Comput 24(6):1278–1304MathSciNetCrossRefMATH Eiter T, Gottlob G (1995) Identifying the minimal transversals of a hypergraph and related problems. SIAM J Comput 24(6):1278–1304MathSciNetCrossRefMATH
21.
Zurück zum Zitat Eiter T, Gottlob G (2002) Hypergraph transversal computation and related problems in Logic and AI. In: Proceedings of European conference on logic in AI (JELIA 2002), vol 2424 of LNCS/LNAI, pp 549–564 Eiter T, Gottlob G (2002) Hypergraph transversal computation and related problems in Logic and AI. In: Proceedings of European conference on logic in AI (JELIA 2002), vol 2424 of LNCS/LNAI, pp 549–564
22.
Zurück zum Zitat Eiter T, Gottlob G, Makino K (2003) New results on monotone dualization and generating hypergraph transversals. SIAM J Comput 32(2):514–537MathSciNetCrossRefMATH Eiter T, Gottlob G, Makino K (2003) New results on monotone dualization and generating hypergraph transversals. SIAM J Comput 32(2):514–537MathSciNetCrossRefMATH
23.
Zurück zum Zitat Evfimievski AV, Srikant R, Agrawal R, Gehrke J (2004) Privacy preserving mining of association rules. Info Syst 29(4):343–364CrossRef Evfimievski AV, Srikant R, Agrawal R, Gehrke J (2004) Privacy preserving mining of association rules. Info Syst 29(4):343–364CrossRef
24.
Zurück zum Zitat Faloutsos C, Megalooikonomou V (2007) On data mining, compression, and Kolmogorov complexity. Data Min Knowl Discov 15(1):3–20MathSciNetCrossRef Faloutsos C, Megalooikonomou V (2007) On data mining, compression, and Kolmogorov complexity. Data Min Knowl Discov 15(1):3–20MathSciNetCrossRef
26.
Zurück zum Zitat Fredman ML, Khachiyan L (1996) On the complexity of dualization of monotone disjunctive normal forms. J Algorithm 21:618–628MathSciNetCrossRefMATH Fredman ML, Khachiyan L (1996) On the complexity of dualization of monotone disjunctive normal forms. J Algorithm 21:618–628MathSciNetCrossRefMATH
27.
Zurück zum Zitat Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4):571–588CrossRef Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4):571–588CrossRef
28.
Zurück zum Zitat Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman and Company, San FranciscoMATH Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman and Company, San FranciscoMATH
30.
Zurück zum Zitat Gkoulalas-Divanis A, Verykios VS (2009) Hiding sensitive knowledge without side effects. Knowl Info Syst 20(3):263–299CrossRef Gkoulalas-Divanis A, Verykios VS (2009) Hiding sensitive knowledge without side effects. Knowl Info Syst 20(3):263–299CrossRef
32.
Zurück zum Zitat Gottlob G (2013) Deciding monotone duality and identifying frequent itemsets in quadratic logspace. Technical report arxiv:1212.1881v3 [cs.DC] Gottlob G (2013) Deciding monotone duality and identifying frequent itemsets in quadratic logspace. Technical report arxiv:​1212.​1881v3 [cs.DC]
33.
Zurück zum Zitat Gunopulos D, Khardon R, Mannila H, Saluja S, Sharma HTR (2003) Discovering all most specific sentences. ACM Trans Database Syst 28(2):140–174CrossRef Gunopulos D, Khardon R, Mannila H, Saluja S, Sharma HTR (2003) Discovering all most specific sentences. ACM Trans Database Syst 28(2):140–174CrossRef
34.
Zurück zum Zitat Gurvich V, Khachiyan L (1999) On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions. Discret Appl Math 96–97:363–373MathSciNetCrossRefMATH Gurvich V, Khachiyan L (1999) On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions. Discret Appl Math 96–97:363–373MathSciNetCrossRefMATH
35.
Zurück zum Zitat Guzzo A, Moccia L, Saccà D, Serra E (2013) Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs. ACM Trans Knowl Discov Data 7(4), Article 18, 1–39 Guzzo A, Moccia L, Saccà D, Serra E (2013) Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs. ACM Trans Knowl Discov Data 7(4), Article 18, 1–39
36.
Zurück zum Zitat Guzzo A, Saccà D, Serra E (2009) An effective approach to inverse frequent set mining. In: Proceedings of the 9th IEEE international conference on data mining (ICDM’09), pp 806–811 Guzzo A, Saccà D, Serra E (2009) An effective approach to inverse frequent set mining. In: Proceedings of the 9th IEEE international conference on data mining (ICDM’09), pp 806–811
37.
38.
40.
Zurück zum Zitat Kagklis V, Verykios VS, Tzimas G, Tsakalidis AK (2014) An integer linear programming scheme to sanitize sensitive frequent itemsets. In: Proceedings of 2014 IEEE international conference on tools with AI (ICTAI 2014), 2014. To appear Kagklis V, Verykios VS, Tzimas G, Tsakalidis AK (2014) An integer linear programming scheme to sanitize sensitive frequent itemsets. In: Proceedings of 2014 IEEE international conference on tools with AI (ICTAI 2014), 2014. To appear
41.
Zurück zum Zitat Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM-SIGKDD international conference on knowledge discovery and data mining (KDD’04), pp 599–604 Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM-SIGKDD international conference on knowledge discovery and data mining (KDD’04), pp 599–604
42.
43.
Zurück zum Zitat Kavvadias DJ, Stavropoulos EC (2005) An efficient algorithm for the transversal hypergraph generation. J Graph Algorithms Appl 9(2):239–264MathSciNetCrossRefMATH Kavvadias DJ, Stavropoulos EC (2005) An efficient algorithm for the transversal hypergraph generation. J Graph Algorithms Appl 9(2):239–264MathSciNetCrossRefMATH
45.
Zurück zum Zitat Leloglu E, Ayav T, Ergenc B (2014) Coefficient-based exact approach for frequent itemset hiding. In: eKNOW2014: The 6th international conference on information, process, and knowledge management, pp 124–130 Leloglu E, Ayav T, Ergenc B (2014) Coefficient-based exact approach for frequent itemset hiding. In: eKNOW2014: The 6th international conference on information, process, and knowledge management, pp 124–130
46.
Zurück zum Zitat Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1:241–258CrossRef Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1:241–258CrossRef
47.
Zurück zum Zitat Menon S, Sarkar S, Mukherjee S (2005) Maximizing accuracy of shared databases when concealing sensitive patterns. Info Syst Res 16(3):256–270CrossRef Menon S, Sarkar S, Mukherjee S (2005) Maximizing accuracy of shared databases when concealing sensitive patterns. Info Syst Res 16(3):256–270CrossRef
48.
Zurück zum Zitat Mielikäinen T (2003) On inverse frequent set mining problems. In: Proceedings of the 2nd workshop on privacy preserving data mining (PPDM’03), pp 18–33 Mielikäinen T (2003) On inverse frequent set mining problems. In: Proceedings of the 2nd workshop on privacy preserving data mining (PPDM’03), pp 18–33
49.
Zurück zum Zitat Moustakides GV, Verykios VS (2008) A maxmin approach for hiding frequent itemsets. Data Knowl Eng 65(1):75–89CrossRef Moustakides GV, Verykios VS (2008) A maxmin approach for hiding frequent itemsets. Data Knowl Eng 65(1):75–89CrossRef
50.
51.
Zurück zum Zitat Rizvi S, Haritsa JR (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th international conference on very large data bases (VLDB’02), pp 682–693 Rizvi S, Haritsa JR (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th international conference on very large data bases (VLDB’02), pp 682–693
52.
Zurück zum Zitat Sun X, Yu P (2005) A border–based approach for hiding sensitive frequent itemsets. In: Proceedings of 5th IEEE internationa conference on data mining (ICDM 2005), pp 426–433 Sun X, Yu P (2005) A border–based approach for hiding sensitive frequent itemsets. In: Proceedings of 5th IEEE internationa conference on data mining (ICDM 2005), pp 426–433
53.
Zurück zum Zitat Sun X, Yu PS (2007) Hiding sensitive frequent itemsets by a border-based approach. J Comput Sci Eng 1(1):74–94CrossRef Sun X, Yu PS (2007) Hiding sensitive frequent itemsets by a border-based approach. J Comput Sci Eng 1(1):74–94CrossRef
54.
Zurück zum Zitat Sweeney L (2002) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Based Syst 10(5):571–588MathSciNetCrossRefMATH Sweeney L (2002) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Based Syst 10(5):571–588MathSciNetCrossRefMATH
55.
56.
Zurück zum Zitat Takata K (2007) A worst-case analysis of the sequential method to list the minimal hitting sets of a hypergraph. SIAM J Discret Math 21(4):936–946MathSciNetCrossRefMATH Takata K (2007) A worst-case analysis of the sequential method to list the minimal hitting sets of a hypergraph. SIAM J Discret Math 21(4):936–946MathSciNetCrossRefMATH
Metadaten
Titel
A transversal hypergraph approach for the frequent itemset hiding problem
verfasst von
Elias C. Stavropoulos
Vassilios S. Verykios
Vasileios Kagklis
Publikationsdatum
01.06.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2016
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0862-3

Weitere Artikel der Ausgabe 3/2016

Knowledge and Information Systems 3/2016 Zur Ausgabe