2010 | OriginalPaper | Chapter
Mining Distance-Constrained Embedded Subtrees
Authors : Fedja Hadzic, Henry Tan, Tharam S. Dillon
Published in: Mining of Data with Complex Structures
Publisher: Springer Berlin Heidelberg
For certain applications, the distance between nodes in a hierarchical structure may be important, so that two embedded subtrees with different distance relationships among their nodes need to be treated as separate entities. Embedded subtrees extracted under the traditional definition cannot be further distinguished by the node distances within the subtree. In this chapter, we describe an extension of the general TMG framework that enables the mining of distance-constrained embedded subtrees (Hadzic 2008; Tan 2008). For such subtrees, the distance of each node from the root of the subtree must be taken into account during the candidate enumeration phase. The distance of each node from the subtree root (its depth within that subtree) needs to be stored and used as an additional equality criterion when grouping the enumerated candidate subtrees. In Chapter 9, we illustrate scenarios and applications where mining distance-constrained embedded subtrees is preferable to mining traditional embedded subtrees, because the extracted subtree patterns are more informative. We also highlight the importance of distance-constrained subtree mining for web log mining, where the web logs are represented in tree-structured form. In what follows, we discuss the importance of distance-constrained embedded subtrees from a more general perspective and relate it to previous work on extracting tree-structured queries.
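To make the "additional equality criterion" concrete, here is a minimal Python sketch. It is not the TMG framework's implementation. It only considers 2-node embedded subtrees (ancestor-descendant pairs) and groups them either by labels alone (traditional embedded subtrees) or by labels plus the descendant's depth relative to the subtree root (distance-constrained). The tree encoding as preorder (label, depth) pairs, the function names `ancestor_descendant_pairs` and `count_candidates`, and the use of transaction-based support (counting each tree at most once per pattern) are all assumptions made for this illustration.

```python
from collections import Counter
from typing import Iterator, List, Tuple

# Hypothetical encoding: a tree is its preorder list of (label, depth) pairs,
# where the database tree's root has depth 0.
Tree = List[Tuple[str, int]]


def ancestor_descendant_pairs(tree: Tree) -> Iterator[Tuple[int, int]]:
    """Yield (ancestor_index, descendant_index) for every embedded 2-node subtree."""
    for i, (_, depth_i) in enumerate(tree):
        # In preorder, the descendants of node i are exactly the contiguous
        # nodes after i whose depth is strictly greater than depth_i.
        j = i + 1
        while j < len(tree) and tree[j][1] > depth_i:
            yield i, j
            j += 1


def count_candidates(trees: List[Tree], distance_constrained: bool) -> Counter:
    """Transaction-based support (number of trees containing each candidate key)."""
    support: Counter = Counter()
    for tree in trees:
        keys = set()  # a set, so each tree counts at most once per key
        for i, j in ancestor_descendant_pairs(tree):
            (root_label, root_depth), (desc_label, desc_depth) = tree[i], tree[j]
            if distance_constrained:
                # The descendant's depth relative to the subtree root is part
                # of the grouping key, so different distances stay separate.
                keys.add((root_label, desc_label, desc_depth - root_depth))
            else:
                keys.add((root_label, desc_label))
        support.update(keys)
    return support


# Two toy trees: A(B(C)) and A(C).
trees = [
    [("A", 0), ("B", 1), ("C", 2)],
    [("A", 0), ("C", 1)],
]
print(count_candidates(trees, distance_constrained=False))
print(count_candidates(trees, distance_constrained=True))
```

Under the traditional grouping, the candidate A-C is supported by both trees. With the distance constraint, it splits into two distinct candidates, one where C is two levels below A and one where it is one level below A, each with support 1. For larger candidates the same idea would extend to storing a relative depth for every node in the subtree's encoding, but this sketch does not attempt that.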