2006 | OriginalPaper | Chapter
A cluster stability criteria based on the two-sample test concept
Authors: Z. Volkovich, Z. Barzily, L. Morozensky
Publisher: Springer Berlin Heidelberg
This paper presents a method for assessing cluster stability. We hypothesize that if a “consistent” clustering algorithm is used to partition several independent samples, then the clustered samples should be identically distributed. We test this hypothesis with the two-sample energy test. On its own, such a test is not very efficient in clustering problems, because outliers in the samples and limitations of the clustering algorithms contribute heavily to the noise level. We therefore compute the test statistic many times and obtain its empirical distribution. The “true” number of clusters is chosen as the one that yields the most concentrated distribution. Results of numerical experiments are reported.
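The procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the details, so everything below is an assumption rather than the authors' implementation: plain k-means stands in for the "consistent" clustering algorithm, the two samples are pooled before clustering and compared within each cluster, the energy statistic is used in its simple V-statistic form, and the interquartile range serves as the measure of how concentrated the empirical distribution is. The function names (`energy_statistic`, `kmeans_labels`, `statistic_distribution`, `most_concentrated`) are hypothetical.

```python
import numpy as np


def energy_statistic(x, y):
    """Energy distance between samples x (n, d) and y (m, d):
    2 E|X - Y| - E|X - X'| - E|Y - Y'| in V-statistic form (non-negative)."""
    def mean_dist(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def kmeans_labels(data, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed stand-in for the clustering algorithm)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels


def statistic_distribution(data, k, sample_size, n_repeats, rng):
    """Empirical distribution of the summed within-cluster energy statistic."""
    values = []
    for _ in range(n_repeats):
        # Draw two disjoint samples and remember which point came from which.
        idx = rng.choice(len(data), size=2 * sample_size, replace=False)
        pooled = data[idx]
        origin = np.repeat([0, 1], sample_size)
        labels = kmeans_labels(pooled, k, rng)
        total = 0.0
        for j in range(k):
            a = pooled[(labels == j) & (origin == 0)]
            b = pooled[(labels == j) & (origin == 1)]
            # Assumption: clusters missing one of the samples are skipped.
            if len(a) and len(b):
                total += energy_statistic(a, b)
        values.append(total)
    return np.array(values)


def most_concentrated(distributions):
    """Pick the k whose statistic values have the smallest interquartile
    range (an assumed concentration measure)."""
    def iqr(v):
        q75, q25 = np.percentile(v, [75, 25])
        return q75 - q25
    return min(distributions, key=lambda k: iqr(distributions[k]))
```

A caller would build a dictionary mapping each candidate number of clusters `k` to `statistic_distribution(data, k, ...)` and pass it to `most_concentrated`. The raw statistic is not normalized here, so comparing its spread across different values of `k` is only a rough illustration of the idea.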