A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals

Li, Dan; Gu, Hong; Zhang, Liyong

doi:10.1007/s00500-013-0997-7

Methodologies and Application
Published: 24 February 2013

Volume 17, pages 1787–1796, (2013)
Soft Computing

Dan Li¹,
Hong Gu¹ &
Liyong Zhang¹

Abstract

Incomplete data are common in clustering problems, and handling them inappropriately can significantly degrade clustering performance. To account for the uncertainty of missing attributes, we propose an interval representation of each missing attribute based on nearest-neighbor information, called the nearest-neighbor interval, and present a hybrid approach that combines a genetic algorithm with fuzzy c-means for incomplete data clustering. The overall algorithm operates within the genetic algorithm framework: it searches the corresponding nearest-neighbor intervals for appropriate imputations of the missing attributes in order to recover the incomplete data set, while the embedded fuzzy c-means simultaneously performs the clustering analysis and supplies the fitness metric for the genetic optimization. Experimental results on several real-life data sets show that the hybrid approach achieves better clustering performance than the compared methods.
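
The abstract gives no formulas, so the following Python sketch is only an illustration of the idea under stated assumptions, not the authors' implementation. It assumes that missing entries are encoded as NaN, that neighbors are ranked by a partial distance over the attributes two samples share, and that the interval for a missing attribute is the minimum and maximum of that attribute over the q nearest neighbors. It also assumes that a lower fuzzy c-means objective means a fitter candidate. The names `partial_distance`, `nearest_neighbor_intervals`, `fcm_objective`, `fitness`, and the parameter `q` are hypothetical.

```python
import numpy as np

def partial_distance(a, b):
    """Distance over attributes observed in both vectors, rescaled to full dimension."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.inf  # no shared attributes: treat as infinitely far apart
    diff = a[both] - b[both]
    return np.sqrt(a.size / both.sum() * np.sum(diff ** 2))

def nearest_neighbor_intervals(X, q=2):
    """Map each missing entry (row, col) to an assumed (low, high) interval."""
    intervals = {}
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate neighbors: other rows in which attribute j is observed.
        cands = [k for k in range(X.shape[0]) if k != i and not np.isnan(X[k, j])]
        if not cands:
            continue  # nothing to base an interval on
        cands.sort(key=lambda k: partial_distance(X[i], X[k]))
        vals = X[cands[:q], j]
        intervals[(int(i), int(j))] = (float(vals.min()), float(vals.max()))
    return intervals

def fcm_objective(X, c, m=2.0, iters=50, seed=0):
    """Run plain fuzzy c-means on complete data and return its objective value."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster prototypes
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = D ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return float(np.sum((U ** m) * D ** 2))

def fitness(X, intervals, genes, c):
    """Impute missing entries with genes clipped to their intervals, then score."""
    filled = X.copy()
    for (i, j), g in zip(intervals, genes):
        low, high = intervals[(i, j)]
        filled[i, j] = np.clip(g, low, high)
    return filled, fcm_objective(filled, c)

# Toy data: the second sample is missing its second attribute.
X = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 9.0], [1.2, 2.4]])
intervals = nearest_neighbor_intervals(X, q=2)
filled, score = fitness(X, intervals, genes=[2.2], c=2)
```

In a full genetic algorithm, each chromosome would hold one gene per missing entry, and selection, crossover, and mutation would act on a population of such chromosomes. The sketch covers only the evaluation of a single candidate.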

Author information

Authors and Affiliations

School of Control Science and Engineering, Dalian University of Technology, Dalian, 116024, China
Dan Li, Hong Gu & Liyong Zhang

Corresponding author

Correspondence to Dan Li.

Additional information

Communicated by T. P. Hong.

About this article

Cite this article

Li, D., Gu, H. & Zhang, L. A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals. Soft Comput 17, 1787–1796 (2013). https://doi.org/10.1007/s00500-013-0997-7

Published: 24 February 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s00500-013-0997-7
