2011 | OriginalPaper | Buchkapitel
XStreamCluster: An Efficient Algorithm for Streaming XML Data Clustering
verfasst von : Odysseas Papapetrou, Ling Chen
Erschienen in: Database Systems for Advanced Applications
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
XML clustering finds many applications, ranging from storage to query processing. However, existing clustering algorithms focus on static XML collections, whereas modern information systems frequently deal with streaming XML data that needs to be processed online. Streaming XML clustering is a challenging task because of the high
computational
and
space
efficiency requirements implicated for online approaches. In this paper we propose
XStreamCluster
, which addresses the two challenges using a two-layered optimization. The bottom layer employs Bloom filters to encode the XML documents, providing a space-efficient solution to memory usage. The top layer is based on Locality Sensitive Hashing and contributes to the computational efficiency. The theoretical analysis shows that the approximate solution of XStreamCluster generates similarly good clusters as the exact solution, with high probability. The experimental results demonstrate that XStreamCluster improves both memory efficiency and computational time by at least an order of magnitude without affecting clustering quality, compared to its variants and a baseline approach.