17-02-2020 | Original Article | Issue 8/2020

An adaptive kernelized rank-order distance for clustering non-spherical data with high noise
Abstract
Clustering is a fundamental research topic in unsupervised learning, and the similarity measure is a key factor in clustering quality. However, existing similarity measures still struggle to cluster non-spherical data with high noise levels. Rank-order distance was proposed to capture the structures of non-spherical data by sharing neighboring information among samples, but it does not tolerate high noise well. To address this issue, we propose KROD, a new similarity measure that combines rank-order distance with a Gaussian kernel. By reducing the noise in the neighboring information of samples, KROD makes rank-order distance more tolerant of high noise, so that the structures of non-spherical data with high noise levels can be captured well. KROD then strengthens these captured structures with the Gaussian kernel, so that samples in the same cluster are closer to each other and can be clustered correctly more easily. Experiments show that KROD can effectively improve existing methods at discovering non-spherical clusters with high noise levels. The source code can be downloaded from https://github.com/grcai.
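To make the idea of combining a rank-based distance with a Gaussian kernel more concrete, here is a minimal Python sketch. It is not the KROD definition, which the abstract does not state. It assumes the classic symmetric rank-order distance computed from neighbor ranks, and it assumes, purely for illustration, that the kernel enters by dividing that distance by a Gaussian similarity of the Euclidean distance. The function names and the `sigma` parameter are hypothetical.

```python
import numpy as np

def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def neighbor_ranks(D):
    """Return (order, ranks): order[a] lists samples from nearest to farthest
    from a, and ranks[a, b] is the position of b in that list (0 = a itself,
    assuming no duplicate points)."""
    n = D.shape[0]
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return order, ranks

def rank_order_distance(order, ranks):
    """Classic symmetric rank-order distance (assumed form, not taken from
    the abstract): for each pair, sum the ranks that one sample assigns to
    the other's nearest neighbors, then normalize by the smaller mutual rank."""
    n = order.shape[0]
    rod = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            d_ab = ranks[b, order[a, : ranks[a, b] + 1]].sum()
            d_ba = ranks[a, order[b, : ranks[b, a] + 1]].sum()
            # max(1, ...) guards against division by zero for duplicate points
            denom = max(1, min(ranks[a, b], ranks[b, a]))
            rod[a, b] = rod[b, a] = (d_ab + d_ba) / denom
    return rod

def kernelized_rod_sketch(X, sigma=1.0):
    """Hypothetical kernelized combination (an assumption of this sketch):
    rank-order distance divided by a Gaussian similarity, written as a
    multiplication by exp(+d^2 / (2 sigma^2))."""
    D = pairwise_euclidean(X)
    order, ranks = neighbor_ranks(D)
    rod = rank_order_distance(order, ranks)
    return rod * np.exp(D ** 2 / (2.0 * sigma ** 2))

# Usage: two small, well-separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
K = kernelized_rod_sketch(X, sigma=5.0)
print(K[0, 1] < K[0, 3])  # within-group pair vs. between-group pair
```

In this sketch the Gaussian factor stretches distances between far-apart samples while leaving nearby samples almost unchanged, which is one way to read "samples in the same cluster are closer to each other". The exponential can overflow for distances much larger than `sigma`, and the double loop is quadratic in the number of samples, so this form suits only small examples.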