Skip to main content
Erschienen in: Knowledge and Information Systems 3/2016

01.03.2016 | Regular Paper

Frequent pattern mining in attributed trees: algorithms and applications

verfasst von: Claude Pasquier, Jérémy Sanhes, Frédéric Flouvat, Nazha Selmaoui-Folcher

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Frequent pattern mining is an important data mining task with a broad range of applications. Initially focused on the discovery of frequent itemsets, studies were extended to mine structural forms like sequences, trees or graphs. In this paper, we introduce a new domain of patterns, attributed trees (atrees), and a method to extract these patterns in a forest of atrees. Attributed trees are trees in which vertices are associated with itemsets. Mining this type of patterns (called asubtrees), which combines tree mining and itemset mining, requires the exploration of a huge search space. To make our approach scalable, we investigate the mining of condensed representations. For attributed trees, the classical concept of closure involves both itemset closure and structural closure. We present three algorithms for mining all patterns, closed patterns w.r.t. itemsets (content) and/or structure in attributed trees. We show that, for low support values, mining content-closed attributed trees is a good compromise between non-redundancy of solutions and execution time.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. SIGMOD Rec 22(2):207–216CrossRef Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. SIGMOD Rec 22(2):207–216CrossRef
2.
Zurück zum Zitat Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE, 95, pp 3–14 Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE, 95, pp 3–14
3.
Zurück zum Zitat Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM
4.
Zurück zum Zitat Asai T, Arimura H, Uno T, Nakano S-I (2003) Discovering frequent substructures in large unordered trees. In: The 6th International Conference on Discovery Science, Springer, pp 47–61 Asai T, Arimura H, Uno T, Nakano S-I (2003) Discovering frequent substructures in large unordered trees. In: The 6th International Conference on Discovery Science, Springer, pp 47–61
5.
Zurück zum Zitat Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435 Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435
6.
Zurück zum Zitat Balcázar JL, Bifet A, Lozano A (2010) Mining frequent closed rooted trees. Mach Learn 78(1–2):1–33MathSciNet Balcázar JL, Bifet A, Lozano A (2010) Mining frequent closed rooted trees. Mach Learn 78(1–2):1–33MathSciNet
7.
Zurück zum Zitat Bayardo RJ (1998) Efficiently mining long patterns from databases. In: ACM SIGMOD International Conference on Management of Data SIGMOD 98, pp 85–93 Bayardo RJ (1998) Efficiently mining long patterns from databases. In: ACM SIGMOD International Conference on Management of Data SIGMOD 98, pp 85–93
8.
Zurück zum Zitat Chehreghani MH (2011) Efficiently mining unordered trees. In: ICDM, pp 111–120 Chehreghani MH (2011) Efficiently mining unordered trees. In: ICDM, pp 111–120
9.
Zurück zum Zitat Chi Y, Muntz RR, Nijssen S, Kok JN (2004) Frequent subtree mining—an overview. Fundam Inf 66(1–2):161–198MathSciNet Chi Y, Muntz RR, Nijssen S, Kok JN (2004) Frequent subtree mining—an overview. Fundam Inf 66(1–2):161–198MathSciNet
10.
Zurück zum Zitat Chi Y, Yang Y, Muntz RR (2003) Indexing and mining free trees. In: Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM’03) Chi Y, Yang Y, Muntz RR (2003) Indexing and mining free trees. In: Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM’03)
11.
Zurück zum Zitat Chi Y, Yang Y, Muntz RR (2004) Hybridtreeminer: an efficient algorithm for mining frequent rooted trees and free trees using canonical form. In: Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pp 11–20 Chi Y, Yang Y, Muntz RR (2004) Hybridtreeminer: an efficient algorithm for mining frequent rooted trees and free trees using canonical form. In: Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pp 11–20
12.
Zurück zum Zitat Chi Y, Yang Y, Xia Y, Muntz RR (2004) Cmtreeminer: mining both closed and maximal frequent subtrees. In: PAKDD, pp 63–73 Chi Y, Yang Y, Xia Y, Muntz RR (2004) Cmtreeminer: mining both closed and maximal frequent subtrees. In: PAKDD, pp 63–73
13.
Zurück zum Zitat Deshpande M, Kuramochi M, Karypis G (2003) Frequent sub-structure-based approaches for classifying chemical compounds. In: Third IEEE International Conference on Data Mining, IEEE Comput. Soc, pp 35–42 Deshpande M, Kuramochi M, Karypis G (2003) Frequent sub-structure-based approaches for classifying chemical compounds. In: Third IEEE International Conference on Data Mining, IEEE Comput. Soc, pp 35–42
14.
Zurück zum Zitat Fukuzaki M, Seki M, Kashima H, Sese J (2010) Finding itemset-sharing patterns in a large itemset-associated graph. In: PAKDD, pp 147–159 Fukuzaki M, Seki M, Kashima H, Sese J (2010) Finding itemset-sharing patterns in a large itemset-associated graph. In: PAKDD, pp 147–159
15.
Zurück zum Zitat Gay D, Selmaoui-Folcher N, Boulicaut J-F (2010) Application-independent feature construction based on almost-closedness properties. Knowl Inf Syst 30(1):87–111CrossRef Gay D, Selmaoui-Folcher N, Boulicaut J-F (2010) Application-independent feature construction based on almost-closedness properties. Knowl Inf Syst 30(1):87–111CrossRef
16.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12CrossRef Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12CrossRef
17.
Zurück zum Zitat Hido S, Kawano H (2005) Amiot: induced ordered tree mining in tree-structured databases. In: ICDM, pp 170–177 Hido S, Kawano H (2005) Amiot: induced ordered tree mining in tree-structured databases. In: ICDM, pp 170–177
18.
Zurück zum Zitat Jiang C, Coenen F, Zito M (2013) A survey of frequent subgraph mining algorithms. Knowl Eng Rev 28:75–105CrossRef Jiang C, Coenen F, Zito M (2013) A survey of frequent subgraph mining algorithms. Knowl Eng Rev 28:75–105CrossRef
19.
Zurück zum Zitat Luccio F, Enriquez AM, Rieumont PO, Pagli L (2001) Exact rooted subtree matching in sublinear time, Universita Di Pisa Technical Report TR-01 14 Luccio F, Enriquez AM, Rieumont PO, Pagli L (2001) Exact rooted subtree matching in sublinear time, Universita Di Pisa Technical Report TR-01 14
21.
Zurück zum Zitat Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: KDD, pp 189–194 Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations. In: KDD, pp 189–194
22.
Zurück zum Zitat Miyoshi Y, Ozaki T, Ohkawa T (2009) Frequent pattern discovery from a single graph with quantitative itemsets. In: ICDM Workshops, pp 527–532 Miyoshi Y, Ozaki T, Ohkawa T (2009) Frequent pattern discovery from a single graph with quantitative itemsets. In: ICDM Workshops, pp 527–532
23.
Zurück zum Zitat Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604 Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604
24.
Zurück zum Zitat Mougel P-N, Rigotti C, Gandrillon O (2012) Finding collections of k-clique percolated components in attributed graphs. In: PAKDD, pp 181–192 Mougel P-N, Rigotti C, Gandrillon O (2012) Finding collections of k-clique percolated components in attributed graphs. In: PAKDD, pp 181–192
25.
Zurück zum Zitat Nijssen S, Kok JN (2003) Efficient discovery of frequent unordered trees. In: First International Workshop on Mining Graphs, Trees and Sequences (MGTS) Nijssen S, Kok JN (2003) Efficient discovery of frequent unordered trees. In: First International Workshop on Mining Graphs, Trees and Sequences (MGTS)
26.
Zurück zum Zitat Pasquier C, Sanhes J, Flouvat F, Selmaoui-Folcher N (2013) Frequent Pattern Mining in Attributed trees. In: Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13)., Gold Coast Australia, pp 26–37 Pasquier C, Sanhes J, Flouvat F, Selmaoui-Folcher N (2013) Frequent Pattern Mining in Attributed trees. In: Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13)., Gold Coast Australia, pp 26–37
27.
Zurück zum Zitat Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: ICDT, pp 398–416 Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: ICDT, pp 398–416
28.
Zurück zum Zitat Pensa RG, Boulicaut J-F (2005) From local pattern mining to relevant bi-cluster characterization. In: 6th International Symposium on Intelligent Data Analysis (IDA 2005), pp 293–304 Pensa RG, Boulicaut J-F (2005) From local pattern mining to relevant bi-cluster characterization. In: 6th International Symposium on Intelligent Data Analysis (IDA 2005), pp 293–304
29.
Zurück zum Zitat Rymon R (1992) Search through systematic set enumeration. In: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR’92), pp 539–550 Rymon R (1992) Search through systematic set enumeration. In: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR’92), pp 539–550
30.
Zurück zum Zitat Selmaoui-Folcher N, Flouvat F (2011) How to use “classical” tree mining algorithms to find complex spatio-temporal patterns?. In: DEXA (2), pp 107–117 Selmaoui-Folcher N, Flouvat F (2011) How to use “classical” tree mining algorithms to find complex spatio-temporal patterns?. In: DEXA (2), pp 107–117
31.
Zurück zum Zitat Termier A, Rousset M-C, Sebag M (2004) Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases. In: ICDM, pp 543–546 Termier A, Rousset M-C, Sebag M (2004) Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases. In: ICDM, pp 543–546
32.
Zurück zum Zitat Termier A, Rousset M-C, Sebag M, Ohara K, Washio T, Motoda H (2008) Dryadeparent, an efficient and robust closed attribute tree mining algorithm. IEEE Trans Knowl Data Eng 20(3):300–320CrossRef Termier A, Rousset M-C, Sebag M, Ohara K, Washio T, Motoda H (2008) Dryadeparent, an efficient and robust closed attribute tree mining algorithm. IEEE Trans Knowl Data Eng 20(3):300–320CrossRef
33.
Zurück zum Zitat Wang C, Hong M, Pei J, Zhou H, Wang W, Shi B (2004) Efficient pattern-growth methods for frequent tree pattern mining. In: PAKDD, pp 441–451 Wang C, Hong M, Pei J, Zhou H, Wang W, Shi B (2004) Efficient pattern-growth methods for frequent tree pattern mining. In: PAKDD, pp 441–451
34.
Zurück zum Zitat Washio T, Motoda H (2003) State of the art of graph-based data mining. SIGKDD Explor Newsl 5(1):59–68CrossRef Washio T, Motoda H (2003) State of the art of graph-based data mining. SIGKDD Explor Newsl 5(1):59–68CrossRef
35.
Zurück zum Zitat Xiao Y, Yao J-F, Li Z, Dunham MH (2003) Efficient data mining for maximal frequent subtrees. In: ICDM, pp 379–386 Xiao Y, Yao J-F, Li Z, Dunham MH (2003) Efficient data mining for maximal frequent subtrees. In: ICDM, pp 379–386
36.
Zurück zum Zitat Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: SIGMOD Conference, pp 335–346 Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: SIGMOD Conference, pp 335–346
37.
Zurück zum Zitat Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: KDD, pp 71–80 Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: KDD, pp 71–80
38.
Zurück zum Zitat Zaki MJ (2004) Efficiently mining frequent embedded unordered trees. Fundam Inf 66(1–2):33–52MathSciNet Zaki MJ (2004) Efficiently mining frequent embedded unordered trees. Fundam Inf 66(1–2):33–52MathSciNet
39.
Zurück zum Zitat Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans Knowl Data Eng 17(8):1021–1035CrossRef Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans Knowl Data Eng 17(8):1021–1035CrossRef
40.
Zurück zum Zitat Zou L, Lu Y, Zhang H, Hu R (2006) Prefixtreespan: a pattern growth algorithm for mining embedded subtrees. In: WISE, pp 499–505 Zou L, Lu Y, Zhang H, Hu R (2006) Prefixtreespan: a pattern growth algorithm for mining embedded subtrees. In: WISE, pp 499–505
Metadaten
Titel
Frequent pattern mining in attributed trees: algorithms and applications
verfasst von
Claude Pasquier
Jérémy Sanhes
Frédéric Flouvat
Nazha Selmaoui-Folcher
Publikationsdatum
01.03.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2016
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0831-x

Weitere Artikel der Ausgabe 3/2016

Knowledge and Information Systems 3/2016 Zur Ausgabe

Premium Partner