For certain applications, the distance between nodes in a hierarchical structure is important, and two embedded subtrees with different distance relationships among their nodes need to be treated as separate entities. Embedded subtrees extracted under the traditional definition cannot be further distinguished by the node distances within the subtree. In this chapter, we describe an extension of the general TMG framework that enables the mining of distance-constrained embedded subtrees (Hadzic 2008; Tan 2008). In such subtrees, the distance of each node relative to the root of the subtree must be taken into account during the candidate enumeration phase. These distances (the node depths within a particular subtree) are stored and used as an additional equality criterion when grouping the enumerated candidate subtrees. In Chapter 9, we illustrate scenarios and applications where mining distance-constrained embedded subtrees is preferable to mining traditional embedded subtrees, because the extracted subtree patterns are more informative. We also highlight the importance of distance-constrained subtree mining in the context of web log mining, where the web logs are represented in tree-structured form. In what follows, we discuss the importance of distance-constrained embedded subtrees from a more general perspective and relate it to previous work on extracting tree-structured queries.
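The role of node depth as an additional equality criterion can be illustrated with a minimal sketch. It is not the TMG implementation: the `Node` record, the two key functions, and the three hand-written occurrences are hypothetical and chosen only for illustration. Each occurrence lists its nodes in pre-order with the label, the position of the parent within the occurrence, and the depth relative to the subtree root.

```python
from collections import Counter
from typing import NamedTuple


class Node(NamedTuple):
    label: str
    parent: int  # position of the parent within the occurrence, -1 for the root
    depth: int   # number of edges from the subtree root in the source tree


def embedded_key(occurrence):
    # Traditional embedded subtree: labels and ancestor-descendant structure only.
    return tuple((n.label, n.parent) for n in occurrence)


def distance_constrained_key(occurrence):
    # Distance-constrained: depth relative to the root is part of the identity.
    return tuple((n.label, n.parent, n.depth) for n in occurrence)


# Three hypothetical occurrences of the pattern a -> (b, c).
# In occ2, node c lies three edges below the root instead of one.
occ1 = [Node("a", -1, 0), Node("b", 0, 1), Node("c", 0, 1)]
occ2 = [Node("a", -1, 0), Node("b", 0, 1), Node("c", 0, 3)]
occ3 = [Node("a", -1, 0), Node("b", 0, 1), Node("c", 0, 1)]
occurrences = [occ1, occ2, occ3]

# Group occurrences under each definition and count them.
traditional = Counter(embedded_key(o) for o in occurrences)
constrained = Counter(distance_constrained_key(o) for o in occurrences)

print(len(traditional), "candidate group(s) without depth")
print(len(constrained), "candidate group(s) with depth")
```

Under the traditional key all three occurrences fall into a single candidate group, whereas the depth-aware key separates `occ2` from the other two, so the counts are kept per distance relationship.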
- Mining Distance-Constrained Embedded Subtrees
Tharam S. Dillon
- Springer Berlin Heidelberg