Cluster ensemble methods attempt to find better and more robust clustering solutions by fusing information from several data partitionings. In this chapter, we address the different phases of this recent approach: from the generation of the partitions (the clustering ensemble) to the combination and validation of the combined result. While giving an overall review of the state of the art in the area, we focus on our own work on the subject. In particular, the Evidence Accumulation Clustering (EAC) paradigm is detailed and analyzed. For the validation and selection of the final partition, we focus on metrics that quantitatively measure the consistency between the ensemble partitions and the combined result, thus enabling the choice of the best results without the use of additional information. Information-theoretic measures, in conjunction with a variance analysis using bootstrapping, are detailed and empirically evaluated. Experimental results throughout the chapter illustrate the various concepts and methods addressed, using synthetic and real data and involving both vectorial and string-based data representations. We show that the clustering ensemble approach can be used in very distinct contexts with state-of-the-art results.
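The following minimal Python sketch makes the three phases concrete. It is an illustration, not the authors' implementation. It assumes the ensemble is already available as label vectors (in practice these might come from, e.g., repeated k-means runs). It builds the EAC co-association matrix, where each entry is the fraction of partitions in which two objects share a cluster, and extracts a combined partition by single-link at a fixed similarity threshold of 0.5. It then scores that partition by its average normalized mutual information with the ensemble members. Several details are illustrative assumptions and may differ from the chapter's own choices: the fixed threshold, the normalization 2I/(H(a)+H(b)), the toy ensemble, and the hypothetical `bootstrap_consistency` helper that resamples partitions.

```python
import random
import statistics
from collections import Counter
from itertools import combinations
from math import log


def co_association(ensemble):
    """EAC co-association matrix: C[i][j] is the fraction of partitions
    in the ensemble that put objects i and j in the same cluster."""
    n, N = len(ensemble[0]), len(ensemble)
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / N for c in row] for row in counts]


def single_link_cut(C, threshold=0.5):
    """Single-link clustering cut at a fixed similarity threshold: objects end
    up together if a chain of pairs with C > threshold connects them.
    The fixed 0.5 threshold is an assumption made for illustration."""
    n = len(C)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}  # relabel components as 0, 1, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nij / n) * log(n * nij / (ca[x] * cb[y]))
               for (x, y), nij in joint.items())


def nmi(a, b):
    """Normalized mutual information, here normalized as 2I/(H(a)+H(b))."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:  # both partitions consist of a single cluster
        return 1.0
    return 2 * mutual_information(a, b) / (ha + hb)


def consistency(combined, ensemble):
    """Average NMI between the combined partition and each ensemble member."""
    return sum(nmi(combined, p) for p in ensemble) / len(ensemble)


def bootstrap_consistency(ensemble, n_boot=200, threshold=0.5, seed=0):
    """Hypothetical variance analysis: resample partitions with replacement,
    recombine each sample with EAC, and return mean and variance of the score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(ensemble) for _ in ensemble]
        combined = single_link_cut(co_association(sample), threshold)
        scores.append(consistency(combined, sample))
    return statistics.mean(scores), statistics.variance(scores)


# Toy ensemble of three partitions of six objects (illustrative data only).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
combined = single_link_cut(co_association(ensemble))
print(combined)  # → [0, 0, 0, 1, 1, 1]
print(consistency(combined, ensemble))
print(bootstrap_consistency(ensemble))
```

The combined partition recovers the two groups {0, 1, 2} and {3, 4, 5}, even though the second partition splits them differently. Its consistency score is below 1 because that partition disagrees with the result. Replacing the fixed cut with a full single-link dendrogram and a data-driven cut would likely be closer to EAC as usually described.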