Skip to main content
Erschienen in: Neural Computing and Applications 7/2009

01.10.2009 | Original Article

A novel pruning approach for robust data clustering

verfasst von: Xu-Lei Yang, Qing Song, Yi-Lei Wu, Ai-Ze Cao

Erschienen in: Neural Computing and Applications | Ausgabe 7/2009

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we make an effort to overcome the sensitivity of traditional clustering algorithms to noisy data points (noise and outliers). A novel pruning method, in terms of information theory, is therefore proposed to phase out noisy points for robust data clustering. This approach identifies and prunes the noisy points based on the maximization of mutual information against input data distributions such that the resulting clusters are least affected by noise and outliers, where the degree of robustness is controlled through a separate parameter to make a trade-off between rejection of noisy points and optimal clustered data. The pruning approach is general, and it can improve the robustness of many existing traditional clustering methods. In particular, we apply the pruning approach to improve the robustness of fuzzy c-means clustering and its extensions, e.g., fuzzy c-spherical shells clustering and kernel-based fuzzy c-means clustering. As a result, we obtain three clustering algorithms that are the robust versions of the existing ones. The effectiveness of the proposed pruning approach is supported by experimental results.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
1
In this paper, the traditional clustering algorithms specially mean the basic K-means and fuzzy c-means algorithms, but not their improved versions, such as the enhanced LBG (less sensitive to data initialization) [18] or the possibilistic c-means (less sensitive to noisy data points) [12].
 
2
Most researchers assume that they can provide their clustering algorithms with a suitable initialization. Others use multiple (random) initializations that guarantee (with a given probability) that at least one initialization is good.
 
3
The average mutual information can be written in either of the two following forms [2]: \(I=\sum_{j=1}^l\sum_{k=1}^c {u_j u_{k|j}\log\frac{{u_{j|k}}}{{u_j}}}\) or \(I=\sum_{j=1}^l\sum_{k=1}^c {u_j u_{k|j}\log\frac{{u_{k|j}}}{{u_k}}}.\)
 
4
Please note the prior distribution u j is equal to \(\frac{{1}}{{l}}\) in the clustering procedure of traditional clustering algorithms, while it is used to phase out the noisy points in the proposed pruning method as discussed later in this section.
 
5
We run the four clustering algorithms on X12 with the same initial cluster centers ([−3.34, 1.67][1.67, 0.00]) as in [17]. It is observed that the result of FCM is nearly identical to that in [17]; however, the results of PCM and PFCM are a bit worse than those in [17]. For the purpose of fair comparison, the numerical results of PCM and PFCM in Table 1 are directly taken from [17], while the numerical results of FCM and RFCM are generated by our program.
 
Literatur
1.
Zurück zum Zitat Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New YorkMATH Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New YorkMATH
2.
3.
Zurück zum Zitat Blahut RE (1987) Principle and practice of information theory. Addison-Wesley, Massachusetts Blahut RE (1987) Principle and practice of information theory. Addison-Wesley, Massachusetts
4.
Zurück zum Zitat Dave RN (1990) Fuzzy shell-clustering and applications to circle detection in digital images. Int J Gen Syst 16(4):343–355CrossRefMathSciNet Dave RN (1990) Fuzzy shell-clustering and applications to circle detection in digital images. Int J Gen Syst 16(4):343–355CrossRefMathSciNet
5.
Zurück zum Zitat Dave RN, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipse. IEEE Trans Neural Netw 3(5):643–662CrossRef Dave RN, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipse. IEEE Trans Neural Netw 3(5):643–662CrossRef
6.
Zurück zum Zitat Dave RN, Krishnapuram R (1997) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5(2):270–293CrossRef Dave RN, Krishnapuram R (1997) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5(2):270–293CrossRef
7.
Zurück zum Zitat Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 11(7):773–781CrossRef Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 11(7):773–781CrossRef
8.
Zurück zum Zitat Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784CrossRef Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784CrossRef
9.
Zurück zum Zitat Cuillén A, Pomares H, Rojas I, Gonzélez J, Herrera LJ, Rojas F, Valenzuela O (2007) Studying possibility in a clustering algorithm for RBFNN design for function approximation. Neural Comput Appl 17(1):75–89CrossRef Cuillén A, Pomares H, Rojas I, Gonzélez J, Herrera LJ, Rojas F, Valenzuela O (2007) Studying possibility in a clustering algorithm for RBFNN design for function approximation. Neural Comput Appl 17(1):75–89CrossRef
10.
Zurück zum Zitat Gustafson EE, Kessel WC (1979) Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE conference decision control. San Diego, CA, pp 761–766 Gustafson EE, Kessel WC (1979) Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE conference decision control. San Diego, CA, pp 761–766
11.
Zurück zum Zitat Krishnapuram R, Nasraoui O, Frigui H (1992) The fuzzy c-spherical shells algorithm: a new approach. IEEE Trans Neural Netw 3(5):663–671CrossRef Krishnapuram R, Nasraoui O, Frigui H (1992) The fuzzy c-spherical shells algorithm: a new approach. IEEE Trans Neural Netw 3(5):663–671CrossRef
12.
Zurück zum Zitat Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110CrossRef Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110CrossRef
13.
Zurück zum Zitat Krishnapuram R, Keller JM (1996) The possibilistic c-means algorithm: insights and recommendations. IEEE Trans Fuzzy Syst 4(3):385–393CrossRef Krishnapuram R, Keller JM (1996) The possibilistic c-means algorithm: insights and recommendations. IEEE Trans Fuzzy Syst 4(3):385–393CrossRef
14.
Zurück zum Zitat MacQueen S (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297 MacQueen S (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297
15.
Zurück zum Zitat Man Y, Gath I (1994) Detection and separation of ring-shaped clusters using fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 16(8):855–861CrossRef Man Y, Gath I (1994) Detection and separation of ring-shaped clusters using fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 16(8):855–861CrossRef
16.
Zurück zum Zitat Müller KR, Mike S, Ratsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel based learning algorithms. IEEE Trans Neural Netw 12(2):181–201CrossRef Müller KR, Mike S, Ratsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel based learning algorithms. IEEE Trans Neural Netw 12(2):181–201CrossRef
17.
Zurück zum Zitat Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clusteirng algorithm. IEEE Trans Fuzzy Syst 13(4):517–530CrossRefMathSciNet Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clusteirng algorithm. IEEE Trans Fuzzy Syst 13(4):517–530CrossRefMathSciNet
18.
Zurück zum Zitat Patané G, Russo M (2001) The enhanced LBG algorithm. Neural Netw 14(9):1219–1237CrossRef Patané G, Russo M (2001) The enhanced LBG algorithm. Neural Netw 14(9):1219–1237CrossRef
19.
Zurück zum Zitat Rose K (1998) Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proc IEEE 86(11):2210–2239CrossRef Rose K (1998) Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proc IEEE 86(11):2210–2239CrossRef
20.
Zurück zum Zitat Schölkopf B, Smola AJ, Müller KR (1996) Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institute for Biological Cybernetics, Tubingen, Germany Schölkopf B, Smola AJ, Müller KR (1996) Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institute for Biological Cybernetics, Tubingen, Germany
21.
Zurück zum Zitat Selim SZ, Ismail MA (1984) K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell 6(1):81–86MATHCrossRef Selim SZ, Ismail MA (1984) K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell 6(1):81–86MATHCrossRef
22.
Zurück zum Zitat Song Q (2005) A robust information clustering algorithm. Neural Comput 17(12):2672–2698MATHCrossRef Song Q (2005) A robust information clustering algorithm. Neural Comput 17(12):2672–2698MATHCrossRef
23.
Zurück zum Zitat Still S, Bialek W (2004) How many clusters? An information-theoretic perspective. Neural Comput 16:2483–2506MATHCrossRef Still S, Bialek W (2004) How many clusters? An information-theoretic perspective. Neural Comput 16:2483–2506MATHCrossRef
24.
Zurück zum Zitat Yang XL, Song Q, Zhang WB (2006) A kernel-based deterministic annealing algorithm for data clustering. IEE Vis Image Signal Process 153(5):557–568CrossRef Yang XL, Song Q, Zhang WB (2006) A kernel-based deterministic annealing algorithm for data clustering. IEE Vis Image Signal Process 153(5):557–568CrossRef
25.
Zurück zum Zitat Yang XL, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recogn Artif Intell 21(5):961–976CrossRef Yang XL, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recogn Artif Intell 21(5):961–976CrossRef
26.
Zurück zum Zitat Zhang JS, Leung YW (2004) Improved possibilistic c-means clustering algorithms. IEEE Trans Fuzzy Syst 12(2):209–217CrossRefMathSciNet Zhang JS, Leung YW (2004) Improved possibilistic c-means clustering algorithms. IEEE Trans Fuzzy Syst 12(2):209–217CrossRefMathSciNet
Metadaten
Titel
A novel pruning approach for robust data clustering
verfasst von
Xu-Lei Yang
Qing Song
Yi-Lei Wu
Ai-Ze Cao
Publikationsdatum
01.10.2009
Verlag
Springer-Verlag
Erschienen in
Neural Computing and Applications / Ausgabe 7/2009
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-009-0281-z

Weitere Artikel der Ausgabe 7/2009

Neural Computing and Applications 7/2009 Zur Ausgabe

Premium Partner