2005 | OriginalPaper | Book chapter
An Efficient Algorithm for Mining Both Closed and Maximal Frequent Free Subtrees Using Canonical Forms
Authors: Ping Guo, Yang Zhou, Jun Zhuang, Ting Chen, Yan-Rong Kang
Published in: Advanced Data Mining and Applications
Publisher: Springer Berlin Heidelberg
A large number of text files, including HTML and XML documents, can be organized as tree structures, and one objective of data mining is to discover frequent patterns in them. In this paper, we first introduce a canonical form for free trees based on the breadth-first canonical string. Second, we present some properties of closed frequent subtrees and maximal frequent subtrees, as well as the relationships between them. Third, we study a pruning technique for frequent free subtrees and an improvement to the mining of nonclosed frequent free subtrees. Finally, we present an algorithm that mines all closed and maximal frequent free subtrees and prove its validity.
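To make the idea of a canonical form for free trees concrete, here is a minimal sketch. The abstract does not define the breadth-first canonical string, so this is not the paper's encoding: it swaps in a simpler, well-known approach that roots the tree at its center and builds a sorted, depth-first parenthesized string. The function names (`tree_centers`, `encode`, `canonical_string`) and the input format (an edge list plus a vertex-to-label dictionary, with labels that contain no parentheses) are assumptions made for illustration only.

```python
def tree_centers(adj):
    """Return the one or two center vertices of a free tree by peeling leaves."""
    n = len(adj)
    if n <= 2:
        return sorted(adj)
    degree = {v: len(ns) for v, ns in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    remaining = n
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
        leaves = next_leaves
    return sorted(leaves)


def encode(adj, labels, root, parent=None):
    """Encode the rooted tree under `root`; sorting child codes removes sibling order."""
    child_codes = sorted(
        encode(adj, labels, child, root) for child in adj[root] if child != parent
    )
    return "(" + labels[root] + "".join(child_codes) + ")"


def canonical_string(edges, labels):
    """Illustrative canonical string of a labeled free tree (assumed format, not the paper's)."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # A free tree has no fixed root, so root at each center and keep the smallest string.
    return min(encode(adj, labels, c) for c in tree_centers(adj))


# Two differently numbered copies of the same labeled star, plus a path for contrast.
star_a = canonical_string([(1, 2), (2, 3), (2, 4)], {1: "A", 2: "B", 3: "A", 4: "C"})
star_b = canonical_string([(10, 20), (30, 10), (10, 40)], {10: "B", 20: "C", 30: "A", 40: "A"})
path = canonical_string([(1, 2), (2, 3), (3, 4)], {1: "A", 2: "B", 3: "A", 4: "C"})
print(star_a == star_b, star_a == path)
```

The point of any such form is that isomorphic free trees map to the same string, so a miner can detect duplicate candidate subtrees with a string comparison instead of an isomorphism test. A breadth-first variant would order the vertices level by level rather than by nested subtrees.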