
Information Sciences

Volume 245, 1 October 2013, Pages 38-52

Robust constrained fuzzy clustering

https://doi.org/10.1016/j.ins.2013.03.056

Abstract

It is well-known that outliers and noisy data can be very harmful when applying clustering methods. Several fuzzy clustering methods which are able to handle the presence of noise have been proposed. In this work, we propose a robust clustering approach called F-TCLUST based on trimming a fixed proportion of observations that are (“impartially”) determined by the data set itself. The proposed approach also considers an eigenvalue ratio constraint that makes it a mathematically well-defined problem and serves to control the allowed differences among cluster scatters. A computationally feasible algorithm is proposed for its practical implementation. Some guidelines about how to choose the parameters controlling the performance of the fuzzy clustering procedure are also given.

Introduction

Hard clustering procedures are aimed at searching for a partition of data into k disjoint clusters, with similar subjects grouped in the same cluster and dissimilar subjects in different ones. On the other hand, fuzzy clustering methods provide non-negative membership values of observations to clusters, and this generates overlapping clusters where every subject is shared among all clusters (see, e.g., [35], [11], [3], [22]).

It is also widely recognized that clustering methods need to be robust if they are to be useful in practice: otherwise, clustering ability may deteriorate drastically in the presence of even a small fraction of outlying data. In fact, historically, the fuzzy clustering community was the first to face the robustness challenge in clustering. This is mainly because outliers tend to be approximately "equally remote" from all clusters and thus may have similar (but not necessarily small) membership values. For instance, whenever membership values are assumed to sum up to 1, the membership values of an outlying observation could be close to 1/k for all clusters, with k being the number of groups. [8], [1] provide a general review of robust fuzzy clustering methods (see also [17] for a review of robust hard clustering procedures).
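The near-1/k behavior of outlier memberships can be checked directly with the standard fuzzy C-means membership formula. The sketch below (plain NumPy; the helper name and the example centers are illustrative, not from the paper) compares the memberships of an inlier and of a remote point with respect to k = 3 fixed cluster centers:

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy C-means membership update for a single point x:
    u_j = 1 / sum_l (d_j / d_l)^(2/(m-1)), where d_j = ||x - c_j||."""
    d = np.linalg.norm(centers - x, axis=1)
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # k = 3 clusters
inlier = np.array([0.5, 0.2])        # close to the first center
outlier = np.array([100.0, 100.0])   # roughly equidistant from all centers

print(fcm_memberships(inlier, centers))   # dominated by the first cluster
print(fcm_memberships(outlier, centers))  # all three values close to 1/3
```

The memberships always sum to 1, so the remote point still receives non-negligible weight in every cluster; this is precisely why it can distort non-robust fuzzy clustering.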

One of the most widely considered methods in robust fuzzy clustering is the fuzzy C-means method with a "noise component" [6], together with a plethora of its modifications. Unfortunately, this approach inherits its preference for spherical clusters from fuzzy C-means, so it is often unable to properly detect clusters with very different shapes. Several procedures have been proposed to address this problem (see, e.g., [20], [19], [36], [33]).

In this work, we adapt a hard robust clustering approach called TCLUST [16] to the fuzzy clustering framework. The proposed approach also extends the "Least Trimmed Squares" approach to fuzzy clustering introduced in [23] toward a more general methodology. The methodology is thus based on trimming a fixed fraction α of the "most outlying" observations. We call this trimming "impartial" because the data set itself determines which observations are trimmed off, without the user declaring privileged directions or zones for trimming. The fixed trimming level controls the number of discarded observations in a different way from methods based on fixing a "noise distance" (see, e.g., [7], [9], [31]), often referred to as "noise clustering" in the literature. Discarding a fixed fraction of data has also been considered in [24].
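As a rough illustration of impartial trimming, the hypothetical helper below discards the ⌈nα⌉ observations farthest from their nearest cluster center; Euclidean distance is a simplified stand-in for the likelihood-based outlyingness criterion used by the actual method:

```python
import numpy as np

def impartial_trim(X, centers, alpha):
    """Keep-mask after trimming the ceil(n * alpha) observations farthest
    from their nearest center.  Hypothetical helper: Euclidean distance
    stands in for the paper's likelihood-based outlyingness criterion."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    n_trim = int(np.ceil(alpha * len(X)))
    keep = np.ones(len(X), dtype=bool)
    if n_trim > 0:
        keep[np.argsort(d)[-n_trim:]] = False  # trim the most outlying points
    return keep
```

Note that no direction or region of the sample space is privileged: only the data-driven distances decide what gets trimmed.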

There exist other interesting fuzzy clustering proposals that incorporate robustness by replacing the Euclidean distance as the measure of discrepancy between observations and cluster centers [37], [27], [39]. However, they are mainly aimed at searching for spherical, equally scattered groups, as fuzzy C-means methods do. Another interesting robust fuzzy clustering approach to dynamic classification can be found in [12].

An important feature of the proposed approach is that it allows for non-spherically shaped clusters while forcing the obtained clusters to be "comparable" in terms of cluster scatters; clusters with arbitrarily different scatters are not allowed. This is achieved by imposing an eigenvalue ratio constraint on the cluster scatter matrices. Some type of constraint on the scatter matrices is compulsory because, otherwise, the fuzzy clustering problem becomes mathematically ill-posed.
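An eigenvalue ratio constraint can be enforced by truncating the eigenvalues of the scatter matrices. The sketch below uses a deliberately simplified rule, clipping all eigenvalues to [λmax/c, λmax]; the actual method chooses the truncation threshold optimally rather than fixing it this way:

```python
import numpy as np

def enforce_eigen_ratio(scatters, c):
    """Return scatter matrices whose pooled eigenvalue ratio is at most c.
    Simplified sketch: eigenvalues are clipped to [lam_max / c, lam_max];
    the actual method picks the truncation threshold optimally."""
    decomps = [np.linalg.eigh(S) for S in scatters]          # (eigvals, eigvecs)
    lam_max = max(ev.max() for ev, _ in decomps)
    lo = lam_max / c
    return [V @ np.diag(np.clip(ev, lo, lam_max)) @ V.T for ev, V in decomps]
```

With c = 1 all clusters are forced toward equal spherical scatters; large c leaves the scatters essentially unconstrained, so c tunes how "comparable" the clusters must be.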

The proposed methodology, called F-TCLUST, is presented in Section 2. A feasible algorithm for its practical application is given in Section 3. The algorithm is theoretically justified in Section 4. Section 5 provides some guidance about how to choose the several parameters that F-TCLUST takes into account. Section 6 presents an application to a well-known real data set. Finally, the paper concludes with some closing remarks and future research lines.

Section snippets

The F-TCLUST method

Suppose that we have n observations {x1, …, xn} in Rp and we want to group them into k clusters in a fuzzy way. Our aim is therefore to obtain a collection of non-negative membership values uij ∈ [0, 1] for all i = 1, …, n and j = 1, …, k. A membership value of 1 indicates that object i fully belongs to cluster j, while a membership value of 0 means that it does not belong to this cluster at all. Intermediate degrees of membership correspond to uij ∈ (0, 1). We consider that an observation is

A feasible algorithm

Maximizing the objective functions (1), (3) under all these constraints is not an easy task. In this section, we propose a computationally feasible algorithm for solving this complex problem, based on two alternating steps. First, given the parameter values in a given iteration, the best possible membership values are obtained. Then, given some membership values, the parameters are updated by maximizing expression (3) over these parameters.
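Under strong simplifying assumptions (spherical clusters, no scatter-matrix update, fuzzy C-means style memberships), the alternation can be sketched as follows; all names are illustrative and do not follow the paper's notation:

```python
import numpy as np

def f_tclust_sketch(X, k, alpha=0.1, m=2.0, n_iter=20, init=None, seed=0):
    """Schematic alternation:
    (i)  update fuzzy memberships and trim the ceil(n * alpha) points
         farthest from their nearest center;
    (ii) update centers as membership-weighted means.
    Spherical clusters only; the full method also updates scatter
    matrices under the eigenvalue-ratio constraint."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centers = np.asarray(init, float) if init is not None \
        else X[rng.choice(n, k, replace=False)]
    n_trim = int(np.ceil(alpha * n))
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # fuzzy C-means style membership update given the current centers
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        keep = np.ones(n, dtype=bool)
        if n_trim > 0:
            keep[np.argsort(d.min(axis=1))[-n_trim:]] = False
        w = (u ** m) * keep[:, None]           # trimmed points get zero weight
        centers = (w.T @ X) / np.maximum(w.sum(axis=0), 1e-12)[:, None]
    return centers, u, keep
```

In practice such a procedure would be run from several random initializations, keeping the solution with the largest target value.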

Justification of the proposed algorithm

As previously mentioned, the proposed algorithm is based on two alternating steps. In one of them, we search for the membership values maximizing the target function given the current parameters (Steps 2.1 and 2.2); in the other, we search for the parameters maximizing the target function under the constraint on the eigenvalues (Step 2.3), given the current membership values. The algorithm thus increases the value of the target function through this iterative process, which allows us to find a

Choice of parameters

Since the proposed methodology aims to be very general, several parameters are involved. In this section, we explain the different roles that the parameters of F-TCLUST play through a simulated data set. We consider a very simple example made up of 450 random observations in R2 drawn from the N2(0, I) distribution and another 450 observations drawn from the N2(μ, Σ) distribution with μ = (5, 10)′ and Σ = [4 −2; −2 4]. We also add 100 uniformly distributed observations in the rectangle [−10, 15] × [−10, 15], but not
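A data set matching this description can be generated as follows (assuming, as read from the text, μ = (5, 10)′ and Σ = [4 −2; −2 4] for the second normal component):

```python
import numpy as np

rng = np.random.default_rng(0)
# 450 observations from the standard bivariate normal N2(0, I)
g1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=450)
# 450 observations from N2(mu, Sigma)
mu = [5.0, 10.0]
Sigma = [[4.0, -2.0], [-2.0, 4.0]]
g2 = rng.multivariate_normal(mu, Sigma, size=450)
# 100 uniform background observations on [-10, 15] x [-10, 15]
noise = rng.uniform(-10.0, 15.0, size=(100, 2))
X = np.vstack([g1, g2, noise])  # 1000 x 2 data matrix
```

The background points make up 10% of the sample, which suggests a trimming level α around 0.1 for this example.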

A real data example

The “Swiss Bank Notes” data set in [13] includes p = 6 variables measuring certain features in the printed image of 100 genuine and 100 counterfeit old Swiss 1000-franc bank notes.

Fig. 8a shows a scatterplot of the fourth variable ("Distance of the inner frame to the lower border") against the sixth ("Length of the diagonal"), and Fig. 8b shows a scatterplot of the first variable ("Length") against the fourth. In these two plots, the classification of the bills in [13] is shown by using the symbol "G" for the genuine

Conclusions and future research lines

In this paper, we have presented the F-TCLUST robust fuzzy clustering approach, which is based on a "maximum-likelihood" principle. The possibility of trimming a fixed proportion α of observations (impartially determined by the data themselves) is also considered. A constraint on the ratio of the eigenvalues of the scatter matrices controls the relative shape and size of the clusters and serves to avoid the detection of spurious clusters. A computationally feasible algorithm is proposed which does not

References (39)

  • A. Banerjee et al., Robust clustering, WIREs Data Mining and Knowledge Discovery (2012)
  • M. Barni et al., Comments on a possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems (1996)
  • J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
  • C. Borgelt, R. Kruse, Fuzzy and probabilistic clustering with shape and size constraints, in: Proceedings of the 11th...
  • M.-S. Choi, R. Krishnapuram, Fuzzy and robust formulation of maximum-likelihood-based Gaussian mixture decomposition, ...
  • R.D. Cook, Graphical detection of regression outliers and mixtures, in: Proceedings of the International Statistical...
  • R.N. Davé et al., Robust clustering methods: a unified view, IEEE Transactions on Fuzzy Systems (1997)
  • R.N. Davé, S. Sen, Noise clustering algorithm revisited, in: Proceedings of the Biennial Workshop NAFIPS 1997, ...
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (1977)