2011 | OriginalPaper | Book chapter
A Parallel Cop-Kmeans Clustering Algorithm Based on MapReduce Framework
Authors: Chao Lin, Yan Yang, Tonny Rutayisire
Published in: Knowledge Engineering and Management
Publisher: Springer Berlin Heidelberg
Clustering with background information has recently become highly desirable in many business applications because it can capture important semantics of the business or dataset. Must-Link and Cannot-Link constraints between pairs of instances in a dataset are a common form of prior knowledge incorporated into many of today's clustering algorithms. Cop-Kmeans incorporates these constraints into its clustering mechanism. However, as the scale of data grows rapidly, it is becoming extremely difficult for Cop-Kmeans to handle massive datasets. In this paper, we propose a parallel Cop-Kmeans algorithm based on MapReduce, a technique that distributes the clustering load over a given number of processors. Experimental results show that this approach scales well to massive datasets while maintaining all crucial characteristics of the serial Cop-Kmeans algorithm.
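The abstract does not say how the constraint checks and centroid updates are split between mappers and reducers. The Python sketch below shows one hypothetical reading and simulates MapReduce in a single process. Each mapper runs the serial Cop-Kmeans assignment step over its own input split, placing each point in the nearest cluster that breaks no Must-Link or Cannot-Link constraint. Each reducer then averages the points that share a cluster id into a new centroid. Several details are illustrative assumptions, not the authors' implementation: the function names (`map_split`, `reduce_cluster`, `parallel_cop_kmeans`), the contiguous splits, the first-k-points initialization, and the requirement that both ends of every constraint fall in the same split.

```python
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates(i, c, assign, must, cannot):
    """Cop-Kmeans feasibility test against points already assigned in this split."""
    # A must-link partner already placed in another cluster makes c illegal.
    if any(j in assign and assign[j] != c for j in must[i]):
        return True
    # A cannot-link partner already placed in c also makes c illegal.
    return any(j in assign and assign[j] == c for j in cannot[i])


def map_split(split, centroids, must, cannot):
    """Hypothetical mapper: emit (cluster_id, (point_id, point)) for one split."""
    assign, out = {}, []
    for i, p in split:
        # Try clusters from nearest to farthest and keep the first feasible one.
        for c in sorted(range(len(centroids)), key=lambda j: dist2(p, centroids[j])):
            if not violates(i, c, assign, must, cannot):
                assign[i] = c
                out.append((c, (i, p)))
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return out


def reduce_cluster(members):
    """Hypothetical reducer: average the points grouped under one cluster id."""
    dim = len(members[0][1])
    return [sum(p[d] for _, p in members) / len(members) for d in range(dim)]


def parallel_cop_kmeans(points, k, must_pairs, cannot_pairs, n_splits=2, max_iter=20):
    must, cannot = defaultdict(set), defaultdict(set)
    for a, b in must_pairs:
        must[a].add(b)
        must[b].add(a)
    for a, b in cannot_pairs:
        cannot[a].add(b)
        cannot[b].add(a)
    indexed = list(enumerate(points))
    size = -(-len(indexed) // n_splits)  # ceiling division: points per split
    splits = [indexed[s:s + size] for s in range(0, len(indexed), size)]
    centroids = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = {}
    for _ in range(max_iter):
        groups = defaultdict(list)  # in-process "shuffle": group mapper output by key
        for split in splits:
            for c, value in map_split(split, centroids, must, cannot):
                groups[c].append(value)
        labels = {i: c for c, vals in groups.items() for i, _ in vals}
        # An empty cluster keeps its previous centroid.
        new = [reduce_cluster(groups[c]) if groups[c] else centroids[c] for c in range(k)]
        if new == centroids:  # converged: centroids did not move
            break
        centroids = new
    return centroids, [labels[i] for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]
    cents, labs = parallel_cop_kmeans(pts, 2, must_pairs=[(0, 2)], cannot_pairs=[(4, 5)])
    print(labs)   # [0, 1, 0, 0, 1, 0]
    print(cents)  # [[3.0, 2.75], [10.0, 10.5]]
```

In the usage example, the Cannot-Link pair (4, 5) forces point 5 out of the cluster at (10, 10), even though that cluster is nearest. Cross-split constraints are the weak spot of this layout. A mapper only sees assignments made within its own split, so a constraint whose endpoints land in different splits is silently ignored. A real job would need to co-locate constrained points or share earlier assignments with every mapper.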