Spectral clustering algorithms have recently attracted considerable interest in the research community. This surge in interest is mainly due to their ease of use, their applicability to a variety of data types and domains, and the fact that they very often outperform traditional clustering algorithms. These algorithms consider the pairwise similarity between data objects and construct a similarity matrix to group data into natural subsets, so that objects located in the same cluster share many common characteristics. Objects are allocated to clusters by means of a proximity measure, which is used to compute the similarity or distance between the data objects in the matrix. As such, an early and fundamental step in spectral cluster analysis is the selection of a proximity measure. This choice also has the highest impact on the quality and usability of the end result. However, this crucial aspect is frequently overlooked. For instance, most prior studies use the Euclidean distance measure without explicitly stating the consequences of selecting such a measure. To address this issue, we perform a comparative and explorative study of the performance of various existing proximity measures when applied to spectral clustering algorithms. Our results indicate that the commonly used Euclidean distance measure is not always suitable, specifically in domains where the data is highly imbalanced and the correct clustering of boundary objects is critical. Moreover, we observed that for numeric data, the relative distance measures outperformed the absolute distance measures and may therefore boost the performance of a clustering algorithm. For datasets with mixed variables, the selection of the distance measure for the numeric variables again has the highest impact on the end result.
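To make the role of the proximity measure concrete, the following is a minimal sketch of a two-way spectral partition in which the distance measure is a pluggable parameter. It is not the authors' implementation. The function names, the Gaussian similarity kernel, the median-based bandwidth, and the use of the Canberra distance as an example of a measure that scales each coordinate difference by magnitude are all assumptions made for illustration; the toy data is likewise invented.

```python
import numpy as np


def pairwise_distances(X, measure="euclidean"):
    """Return the n x n distance matrix between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if measure == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if measure == "canberra":
        # Each coordinate difference is divided by the summed magnitudes;
        # terms with a zero denominator contribute 0 by convention.
        denom = np.abs(X[:, None, :]) + np.abs(X[None, :, :])
        ratio = np.divide(diff, denom, out=np.zeros_like(denom), where=denom > 0)
        return ratio.sum(axis=-1)
    raise ValueError(f"unknown measure: {measure}")


def similarity_matrix(D, sigma=None):
    """Turn distances into similarities with a Gaussian kernel (assumed)."""
    if sigma is None:
        # Heuristic bandwidth: median of the non-zero distances.
        sigma = np.median(D[D > 0])
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))


def spectral_bipartition(W):
    """Split objects into two groups using the normalized graph Laplacian."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue carries the two-way split.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


def cluster_two_way(X, measure):
    """Full pipeline: distances -> similarities -> spectral split."""
    D = pairwise_distances(X, measure)
    return spectral_bipartition(similarity_matrix(D))


# Toy data (invented): two tight groups of four points each.
X = np.array([
    [1.0, 1.0], [1.2, 1.0], [1.0, 1.2], [1.2, 1.2],
    [10.0, 10.0], [10.2, 10.0], [10.0, 10.2], [10.2, 10.2],
])

labels_euclidean = cluster_two_way(X, "euclidean")
labels_canberra = cluster_two_way(X, "canberra")
print("euclidean:", labels_euclidean)
print("canberra: ", labels_canberra)
```

On data this cleanly separated, both measures should recover the same two groups. The differences the abstract describes would be expected to show up only on harder data, such as imbalanced clusters or objects near a cluster boundary, which is where swapping the `measure` argument becomes informative.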
- Spectral Clustering: An Explorative Study of Proximity Measures
Nadia Farhanaz Azam
Herna L. Viktor
- Springer Berlin Heidelberg