Knowledge and Information Systems

1 June 2016 | Survey Paper

Constrained pattern mining in the new era

Authors: Andreia Silva, Cláudia Antunes

Published in: Knowledge and Information Systems | Issue 3/2016

Abstract

Twenty years of research on frequent itemset mining, or pattern mining, has produced a set of efficient algorithms for identifying different types of patterns, from transactional to sequential. Despite the great advances in this field, big data has brought a completely new context in which to operate, with new challenges arising from the growth in data size, dynamics and complexity. These challenges include the shift not only from static to dynamic data, but also from tabular to complex data sources, such as social networks (expressed as graphs) and data warehouses (expressed as multi-relational models). In this new context, more than ever, users need effective ways to control the large number of discovered patterns and to choose which patterns to consider at any given time. The most common and widely accepted approach to mitigating these drawbacks has been to capture and represent the semantics of the domain through constraints, and to use them not only to reduce the number of results, but also to focus the algorithms on areas where they are more likely to gain information and return more interesting results. The use of constraints in pattern mining has been widely studied, and many types of constraints and pushing strategies have been proposed. In this paper, we present a new global view of the work done on incorporating constraints into the pattern mining process. In particular, we propose a new framework for constrained pattern mining that allows us to organize and analyze existing algorithms and strategies based on the different types and properties of constraints and on the data sources they are able to handle.

In this work we adopt and extend the notation presented by Ng et al. [53].

Prefix-monotone constraints were first proposed under the name convertible constraints [58]. Since other constraints can also be converted through several approaches (such as relaxations), we use the term prefix-monotone to designate the constraints that are convertible because of the order of items.
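To make the idea of a constraint that is convertible because of the item order more concrete, the following minimal sketch uses an average-value constraint, a standard textbook example of a convertible constraint rather than something taken from this paper. The item values in `PRICE`, the threshold, and the helper names `avg_at_least` and `is_prefix_anti_monotone` are all hypothetical and chosen only for illustration. The brute-force check suggests that, when items are listed in descending value order, every prefix of a satisfying itemset also satisfies the constraint, whereas an ascending order does not give this guarantee.

```python
from itertools import combinations

# Hypothetical item values; not taken from the paper.
PRICE = {"a": 50, "b": 40, "c": 20, "d": 10}


def avg_at_least(itemset, threshold, price=PRICE):
    """Constraint: the average value of the items is at least `threshold`."""
    return sum(price[i] for i in itemset) / len(itemset) >= threshold


def is_prefix_anti_monotone(order, threshold):
    """Brute-force check that, for itemsets written in `order`, every
    non-empty proper prefix of a satisfying itemset also satisfies the
    constraint."""
    for size in range(1, len(order) + 1):
        # combinations() keeps the relative order of `order`.
        for itemset in combinations(order, size):
            if not avg_at_least(itemset, threshold):
                continue
            for k in range(1, size):
                if not avg_at_least(itemset[:k], threshold):
                    return False
    return True


descending = sorted(PRICE, key=PRICE.get, reverse=True)  # a, b, c, d
ascending = descending[::-1]                             # d, c, b, a

print(is_prefix_anti_monotone(descending, 30))  # True
print(is_prefix_anti_monotone(ascending, 30))   # False
```

Under the descending order, a prefix that violates the constraint can be discarded together with all of its extensions, since appending lower-valued items can only lower the average. This is the kind of pruning that an order-dependent property makes possible.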

A first draft of this algorithm was proposed in [57], under the name CFG (Constrained Frequent pattern-Growth).

1.

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB 94). Morgan Kaufmann, San Francisco, pp 487–499

2.

Ahmed C, Tanbeer S, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721

3.

Albert-Lorincz H, Boulicaut JF (2003) Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints. In: Proceedings of the 3rd SIAM international conference on data mining (SDM 03). Springer, San Francisco, pp 316–320

4.

Antunes C (2007) Onto4ar: a framework for mining association rules. In: Workshop on constraint-based mining and learning in the international conference on principles and practice of knowledge discovery in databases (PKDDW-CMILE 07). Springer, Warsaw, p 37

5.

Antunes C (2008) An ontology-based framework for mining patterns in the presence of background knowledge. In: Proceedings of international conference on advanced intelligence (ICAI 08). Post and Telecom Press, Beijing, pp 163–168

6.

Antunes C (2009) Mining patterns in the presence of domain knowledge. In: Proceedings of the 11th international conference on enterprise information systems (ICEIS 09). Springer, Milan, pp 188–193

7.

Antunes C (2009) Pattern mining over star schemas in the onto4ar framework. In: Proceedings of the 2009 international workshop on semantic aspects in data mining (SADM 09). IEEE Computer Society, Washington, pp 453–458

8.

Antunes C, Oliveira A (2002) Inference of sequential association rules guided by context-free grammars. In: Proceedings of 6th international conference on grammatical inference (ICGI 2002). Springer, Amsterdam, pp 289–293

9.

Antunes C, Oliveira A (2003) Generalization of pattern-growth methods for sequential pattern mining with gap constraints. In: Proceedings of the 3rd international conference on machine learning and data mining in pattern recognition (MLDM 03). Springer, Leipzig, pp 239–251

10.

Antunes C, Oliveira A (2005) Constraint relaxations for discovering unknown sequential patterns. In: Knowledge discovery in inductive databases: 3rd international workshop, KDID 2004 (Revised Selected and Invited Papers), pp 11–32

11.

Antunes C, Oliveira AL (2004) Sequential pattern mining with approximated constraints. In: Proceedings of IADIS international applied computing conference (AC 04). IADIS Press, Lisbon, pp 131–138

12.

Bayardo RJ (2005) The hows, whys, and whens of constraints in itemset and rule discovery. In: Proceedings of the 2004 European conference on constraint-based mining and inductive databases. Springer, Hinterzarten, pp 1–13

13.

Bayardo RJ, Agrawal R (1999) Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 145–154

14.

Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2003) Adaptive constraint pushing in frequent pattern mining. In: Proceedings of the 7th conference on principles and practice of knowledge discovery in databases (PKDD 03). Springer, Berlin, pp 47–58

15.

Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Exante: a preprocessing method for frequent-pattern mining. IEEE Intell Syst 20(3):25–31

16.

Boulicaut JF (2004) Inductive databases and multiple uses of frequent itemsets: the cinq approach. In: Database support for data mining applications. Springer, Berlin, pp 1–23

17.

Boulicaut JF, Jeudy B (2000) Using constraints for itemset mining: Should we prune or not? In: Actes des 16èmes Journées Bases de Données Avancées (BDA 00). Blois, France

18.

Boulicaut JF, Jeudy B (2005) Constraint-based data mining. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook. Springer, Berlin, pp 399–416

19.

Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. SIGMOD Rec 26(2):265–276

20.

Bucila C, Gehrke J, Kifer D, White WM (2003) Dualminer: a dual-pruning algorithm for itemsets with constraints. Data Min Knowl Discov 7(3):241–272

21.

Cao L, Luo D, Zhang C (2007) Knowledge actionability: satisfying technical and business interestingness. Int J Bus Intell Data Min 2(4):496–514

22.

Capelle M, Masson C, Boulicaut JF (2002) Mining frequent sequential patterns under a similarity constraint. In: Proceedings of the third international conference on intelligent data engineering and automated learning (IDEAL 02). Springer, London, pp 1–6

23.

Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Third IEEE international conference on data mining (ICDM 03). IEEE, pp 19–26

24.

De Raedt L, Guns T, Nijssen S (2008) Constraint programming for itemset mining. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 08). ACM, New York, pp 204–212

25.

De Raedt L, Jaeger M, Lee S, Mannila H (2010) A theory of inductive query answering. In: Džeroski S, Goethals B, Panov P (eds) Inductive databases and constraint-based data mining. Springer, New York, pp 79–103

26.

De Raedt L, Kramer S (2001) The levelwise version space algorithm and its application to molecular fragment finding. In: Proceedings of the 17th international joint conference on artificial intelligence—Volume 2 (IJCAI 01). Morgan Kaufmann Publishers Inc., Seattle, pp 853–859

27.

Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 99). ACM, San Diego, pp 43–52

28.

Džeroski S (2003) Multi-relational data mining: an introduction. SIGKDD Explor Newsl 5(1):1–16

29.

Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1992) Knowledge discovery in databases: an overview. AI Mag 13(3):57–70

30.

Garofalakis MN, Rastogi R, Shim K (1999) Spirit: sequential pattern mining with regular expression constraints. In: Proceedings of the 25th international conference on very large data bases (VLDB 99). Morgan Kaufmann Publishers Inc., San Francisco, pp 223–234

31.

Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press

32.

Grahne G, Lakshmanan LVS, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceedings of 16th international conference on data engineering, pp 512–521

33.

Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86

34.

Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier, Amsterdam

35.

Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD. ACM, New York, pp 1–12

36.

Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a bayesian network. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining (KDD 05). ACM, Chicago, pp 118–127

37.

Jaroszewicz S, Simovici DA (2004) Interestingness of frequent itemsets using bayesian networks as background knowledge. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 04). ACM, Seattle, pp 178–186

38.

Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of the 13th international conference on data engineering (ICDE 97). IEEE Computer Society, Birmingham, pp 220–231

39.

Leung CKS, Brajczuk DA (2009) Efficient algorithms for mining constrained frequent patterns from uncertain data. In: Proceedings of the 1st ACM SIGKDD workshop on knowledge discovery from uncertain data (U 09). ACM, Paris, pp 9–18

40.

Leung CKS, Hao B, Brajczuk D (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. In: Proceedings of the 2010 ACM symposium on applied computing (SAC 10). ACM, Sierre, pp 1034–1038

41.

Leung CKS, Khan Q (2006) Efficient mining of constrained frequent patterns from streams. In: Proceedings of the 10th international database engineering and applications symposium (IDEAS 06), vol 0. IEEE Computer Society, Delhi, pp 61–68

42.

Leung CKS, Lakshmanan L, Ng R (2002) Exploiting succinct constraints using fp-trees. SIGKDD Explor Newsl 4(1):40–49

43.

Leung CKS, Sun L (2012) A new class of constraints for constrained frequent pattern mining. In: Proceedings of the 27th annual ACM symposium on applied computing (SAC 12). ACM, Trento, pp 199–204

44.

Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217

45.

Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the 1998 international conference on knowledge discovery and data mining (KDD 98). AAAI Press, New York, pp 80–86

46.

Liu H, Lin Y, Han J (2011) Methods for mining frequent items in data streams: an overview. Knowl Inf Syst 26(1):1–30

47.

Liu Y, Keng Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD 05). Springer, Berlin, pp 689–695

48.

Mabroukeh N, Ezeife C (2009) Semantic-rich markov models for web prefetching. In: Proceedings of the IEEE international conference on data mining workshops (ICDMW 09). Miami, pp 465–470

49.

Mabroukeh N, Ezeife C (2009) Using domain ontology for semantic web usage mining and next page prediction. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM 09). ACM, Hong Kong, pp 1677–1680

50.

Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: Proceedings of the 28th international conference on very large data bases (VLDB 02). Morgan Kaufmann, Hong Kong, pp 346–357

51.

Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Discov 1(3):241–258

52.

Mannila H, Toivonen H, Inkeri Verkamo A (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289

53.

Ng R, Lakshmanan L, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data. ACM, Seattle, pp 13–24

54.

Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops. IEEE Computer Society, Vancouver, pp 1120–1127

55.

Özden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceedings of the 14th international conference on data engineering (ICDE 98). IEEE Computer Society, Washington, pp 412–421

56.

Padmanabhan B, Tuzhilin A (1998) A belief-driven method for discovering unexpected patterns. In: Proceedings of the 4th international conference on knowledge discovery in data mining (KDD 98). AAAI Press, pp 94–100

57.

Pei J, Han J (2000) Can we push more constraints into frequent pattern mining? In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 00). ACM, Boston, pp 350–354

58.

Pei J, Han J (2002) Constrained frequent pattern mining: a pattern-growth view. SIGKDD Explor Newsl 4(1):31–39

59.

Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 433–442

60.

Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: mining sequential patterns by prefix-projected growth. In: Proceedings of the 17th international conference on data engineering (ICDE 01). IEEE Computer Society, Washington, pp 215–224

61.

Pei J, Han J, Wang W (2002) Mining sequential patterns with constraints in large databases. In: Proceedings of the 2002 ACM international conference on information and knowledge management (CIKM 02). McLean, pp 18–25

62.

Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160

63.

Silva A, Antunes C (2010) Pattern mining on stars with fp-growth. In: Proceedings of the 7th international conference on modeling decisions for artificial intelligence (MDAI 10). Springer, Perpignan, pp 175–186

64.

Silva A, Antunes C (2013) Pushing constraints into a pattern tree. In: Proceedings of the 10th international conference on modeling decisions for artificial intelligence (MDAI 13). Springer, Barcelona

65.

Silva A, Antunes C (2013) Pushing constraints into data streams. In: 2nd international workshop on big data, streams and heterogeneous source mining (BigMine 13). ACM, London, pp 79–86

66.

Silva A, Antunes C (2013) Towards the integration of constrained mining with star schemas. In: 13th IEEE international conference on data mining workshops—domain driven data mining (DDDM 13). IEEE Computer Society, pp 413–420

67.

Soulet A, Crémilleux B (2005) An efficient framework for mining flexible constraints. In: Ho T, Cheung D, Liu H (eds) Advances in knowledge discovery and data mining, Lecture Notes in Computer Science, vol 3518. Springer, Berlin, pp 661–671

68.

Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21st international conference on very large data bases (VLDB 95). Morgan Kaufmann Publishers Inc., San Francisco, pp 407–419

69.

Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology: advances in database technology (EDBT 96). Springer, London, pp 3–17

70.

Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD 97). AAAI Press, California, pp 67–73

71.

Tseng VS, Wu CW, Shie BE, Yu PS (2010) Up-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 10). ACM, London, pp 253–262

72.

Wang K, Jiang Y, Lakshmanan LVS (2003) Mining unexpected rules by pushing user dynamics. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 03). ACM, Washington, pp 246–255

73.

Wang K, Jiang Y, Yu JX, Dong G, Han J (2005) Divide-and-approximate: a novel constraint push strategy for iceberg cube mining. IEEE Trans Knowl Data Eng 17(3):354–368

74.

Wu CW, Lin YF, Yu PS, Tseng VS (2013) Mining high utility episodes in complex event sequences. In: Proceedings of 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 13). ACM, London, pp 536–544

75.

Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Mak 5(4):597–604

76.

Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the fourth SIAM international conference on data mining (ICDM 04), pp 482–486

77.

Yin J, Zheng Z, Cao L (2012) Uspan: An efficient algorithm for mining high utility sequential patterns. In: Proceedings of 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 12). ACM, London, pp 660–668

78.

Yun U, Leggett JJ (2005) Wfim: Weighted frequent itemset mining with a weight range and a minimum weight. In: SDM

79.

Zaki M (2000) Sequence mining in categorical domains: Incorporating constraints. In: Proceedings of the 9th international conference on information and knowledge management (CIKM 00). ACM, McLean, pp 422–429

80.

Zhang X, Chou PL, Dong G (2007) Efficient computation of iceberg cubes by bounding aggregate functions. IEEE Trans Knowl Data Eng 19(7):903–918

81.

Zhu F, Yan X, Han J, Yu PS (2007) gprune: a constraint pushing framework for graph pattern mining. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD 07). Springer, Nanjing, pp 388–400

Title: Constrained pattern mining in the new era
Authors: Andreia Silva, Cláudia Antunes
Publication date: 1 June 2016
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0860-5
