2010 | OriginalPaper | Book chapter
A Clustering Algorithm Based on Matrix over High Dimensional Data Stream
Authors: Guibin Hou, Ruixia Yao, Jiadong Ren, Changzhen Hu
Published in: Web Information Systems and Mining
Publisher: Springer Berlin Heidelberg
Clustering high-dimensional data streams is an important but difficult problem. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream combines a synopsis structure called GC (Grid Cell Structure) with a grid matrix technique, and adopts a two-phase (online/offline) framework. In the online component, GCs monitor the one-dimensional statistical distribution of the data in each dimension independently, and sparse GCs are identified against a predefined threshold and deleted. In the offline component, multi-dimensional clusters are traced from the dense GCs maintained by the online component, and the grid matrix technique generates the final multi-dimensional clusters over the whole data space. Experimental results show that our algorithm scales flexibly and achieves higher clustering quality.
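The abstract does not specify the fields of a GC, the decay model, the threshold values, or how the grid matrix is built, so the following Python sketch is hypothetical throughout. The class name `GridCellSynopsis`, its parameters (`decay`, `sparse_threshold`, `dense_threshold`), equal-width cells over a fixed range, and exponential decay of counts are all assumptions. It illustrates only the two-phase shape described above: an online step that updates one-dimensional cell counts for each dimension independently and deletes sparse cells by threshold, and an offline step that merges adjacent dense cells into per-dimension intervals. The final multi-dimensional step is replaced by a plain Cartesian product of dense intervals; this is not the paper's grid matrix technique.

```python
from collections import defaultdict
from itertools import product


class GridCellSynopsis:
    """Hypothetical per-dimension grid-cell (GC) synopsis for a data stream.

    Assumptions (not taken from the paper): every dimension is split into
    `n_cells` equal-width cells over [lo, hi); all cell counts decay by
    `decay` whenever a point arrives; cells below `sparse_threshold` are
    deleted; cells at or above `dense_threshold` count as dense.
    """

    def __init__(self, n_dims, lo, hi, n_cells, decay=0.99,
                 sparse_threshold=0.5, dense_threshold=3.0):
        self.n_dims = n_dims
        self.lo = lo
        self.n_cells = n_cells
        self.width = (hi - lo) / n_cells
        self.decay = decay
        self.sparse_threshold = sparse_threshold
        self.dense_threshold = dense_threshold
        # One dict per dimension: cell index -> decayed point count.
        self.cells = [defaultdict(float) for _ in range(n_dims)]

    def _cell_index(self, value):
        # Clamp so out-of-range values fall into the first or last cell.
        idx = int((value - self.lo) // self.width)
        return min(max(idx, 0), self.n_cells - 1)

    def insert(self, point):
        """Online phase: update each dimension's 1-D distribution independently."""
        for d, value in enumerate(point):
            cells = self.cells[d]
            for idx in cells:  # age existing counts (values only, no key changes)
                cells[idx] *= self.decay
            cells[self._cell_index(value)] += 1.0

    def prune_sparse(self):
        """Delete GCs whose decayed count has fallen below the sparse threshold."""
        for cells in self.cells:
            for idx in [i for i, c in cells.items() if c < self.sparse_threshold]:
                del cells[idx]

    def dense_intervals(self, d):
        """Offline phase: merge adjacent dense cells of dimension d into intervals."""
        dense = sorted(i for i, c in self.cells[d].items()
                       if c >= self.dense_threshold)
        intervals = []
        for idx in dense:
            if intervals and intervals[-1][1] == idx - 1:
                intervals[-1][1] = idx  # extend the current run of dense cells
            else:
                intervals.append([idx, idx])  # start a new run
        return [tuple(iv) for iv in intervals]

    def candidate_clusters(self):
        """Combine per-dimension dense intervals into multi-dimensional candidates.

        Stand-in only: a plain Cartesian product, NOT the paper's grid matrix.
        """
        per_dim = [self.dense_intervals(d) for d in range(self.n_dims)]
        return list(product(*per_dim))


if __name__ == "__main__":
    syn = GridCellSynopsis(n_dims=2, lo=0.0, hi=10.0, n_cells=10)
    # Two true clusters: one spanning x-cells 2-3 at y=7.5, one at (9.5, 0.5).
    stream = [(2.5, 7.5), (3.5, 7.5), (9.5, 0.5)] * 20
    for p in stream:
        syn.insert(p)
    syn.prune_sparse()
    print(syn.candidate_clusters())
```

Running the script prints four candidates, `[((2, 3), (0, 0)), ((2, 3), (7, 7)), ((9, 9), (0, 0)), ((9, 9), (7, 7))]`. Two of them, `((2, 3), (0, 0))` and `((9, 9), (7, 7))`, pair intervals that belong to different true clusters. This illustrates why one-dimensional statistics alone are not enough to recover multi-dimensional clusters and why a further step over the whole data space is needed, the role the abstract assigns to the grid matrix technique.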