ABSTRACT
Recent research on pattern discovery has progressed form mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in bioinformatics, Web exploration, and etc. However, mining large graph patterns in challenging due to the presence of an exponential number of frequent subgraphs. Instead of mining all the subgraphs, we propose to mine closed frequent graph patterns. A graph g is closed in a database if there exists no proper supergraph of g that has the same support as g. A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods. Our performance study shows that CloseGraph not only dramatically reduces unnecessary subgraphs to be generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'94.]] Google ScholarDigital Library
- T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In SDM'02.]]Google Scholar
- C. Borgelt and M. R. Berthold. Mining molecular fragments: Finging relevant substructures of molecules. In ICDM'02.]] Google ScholarDigital Library
- D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In ICDE'01.]] Google ScholarDigital Library
- T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms, 2nd ed. The MIT Press, Cambridge, MA, 2001.]] Google ScholarDigital Library
- L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds. In KDD'98.]]Google Scholar
- I. Eidhammer, I. Jonassen, and W. R. Taylor Structure comparison and structure patterns. In J. of Comp. Bio. 7, 2000.]]Google Scholar
- J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD'00.]] Google ScholarDigital Library
- L. B. Holder, D. J. Cook, and S. Djoko. Substructure discovery in the subdue system. In KDD'94.]]Google Scholar
- A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD'00.]] Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM'01.]] Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Discovering frequent geometric subgraphs. In ICDM'02.]] Google ScholarDigital Library
- N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In ICDT'99.]] Google ScholarDigital Library
- J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In DMKD'00.]]Google Scholar
- N. Vanetik, E. Gudes, and S. E. Shimony. Computing frequent graph patterns from semistructured data. In ICDM'02.]] Google ScholarDigital Library
- X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. UIUC-CS Tech. Report: R-2002-2296, (a 4-page short version in ICDM'02).]]Google Scholar
- X. Yan, J. Han, and R. Afshar. CloSpan: Mining closed sequential patterns in large datasets. In SDM'03.]]Google Scholar
- M. J. Zaki and C. J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In SDM'02.]]Google Scholar
- M. J. Zaki. Efficiently mining frequent trees in a forest. In KDD'02.]] Google ScholarDigital Library
Index Terms
- CloseGraph: mining closed frequent graph patterns
Recommendations
TSP: Mining top-k closed sequential patterns
Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in ...
TSP: Mining top-k closed sequential patterns
Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in ...
Mining frequent closed patterns in pointset databases
In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. ...
Comments