Clustering is a widely applied data mining tool for detecting the hidden structure of complex multivariate datasets. It solves two kinds of problems simultaneously: it partitions a dataset into clusters of objects that are similar to each other, and it describes each cluster by a cluster prototype that provides some information about the distribution of the data. In most cases these prototypes describe the clusters as simple geometrical objects, such as spheres, ellipsoids, lines, or linear subspaces, and each prototype defines its own distance function. Unfortunately, in most cases the user has no prior knowledge of the number of clusters, nor of the proper shape of the prototypes. The real distribution of the data is generally much more complex than these simple geometrical objects, and the number of clusters depends much more on how well the chosen prototypes fit the distribution of the data than on the real groups within the data. This is especially true when the clusters are used for local linear modeling.
The aim of this paper is not to define a new distance norm based on a problem-dependent cluster prototype, but to show how the so-called geodesic distance, which is based on exploring the manifold on which the data lie, can be used in clustering instead of the classical Euclidean distance. The paper shows how this distance measure can be integrated into fuzzy clustering and presents examples that demonstrate the advantages of the proposed methods.
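To make the contrast between the two distances concrete, the following minimal sketch approximates geodesic distances by shortest paths over a k-nearest-neighbour graph. This graph-based approximation is an assumption made for illustration; it is not necessarily the procedure used in this paper. The function names `knn_graph` and `geodesic_distances`, the choice `k=2`, and the half-circle test data are all hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        neighbours = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in neighbours[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes stay at inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Illustrative data (assumed): 50 points on a half circle of radius 1.
n = 50
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]
graph = knn_graph(points, k=2)
geo = geodesic_distances(graph, source=0)

euclid_end = math.dist(points[0], points[-1])  # straight line through the "hole"
geo_end = geo[n - 1]                           # path that follows the arc
print(f"Euclidean: {euclid_end:.3f}, approximate geodesic: {geo_end:.3f}")
```

For the two end points of the half circle, the Euclidean distance is the diameter, whereas the graph-based distance follows the arc and is therefore close to, but slightly below, the arc length π. A clustering algorithm that relies on the Euclidean distance would treat the two ends as much closer than they are along the manifold.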