A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals

  • Methodologies and Application
  • Soft Computing

Abstract

Data sets used in clustering problems often contain incomplete data, and inappropriate treatment of the missing values can significantly degrade clustering performance. To account for the uncertainty of missing attributes, we propose an interval representation of missing attributes based on nearest-neighbor information, called the nearest-neighbor interval, and present a hybrid approach that combines a genetic algorithm with fuzzy c-means for incomplete data clustering. The overall algorithm works within the genetic algorithm framework: it searches the corresponding nearest-neighbor intervals for appropriate imputations of the missing attributes to recover the incomplete data set, and it embeds fuzzy c-means both to perform the clustering analysis and to provide the fitness metric for the genetic optimization. Experimental results on several real-life data sets show that the hybrid approach achieves better clustering performance than the compared methods.
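
The abstract does not say how the intervals are built or which genetic operators are used, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that neighbors are ranked by a partial distance over shared attributes, that each interval is the [min, max] of the missing attribute over the q nearest neighbors, and that the fitness is the fuzzy c-means objective (lower is better). All function names, parameters (`q`, `pop_size`, `generations`), truncation selection, arithmetic crossover, and resampling mutation are hypothetical choices.

```python
import math
import random

def partial_distance(a, b):
    """Squared distance over attributes observed in both records, rescaled to full dimension."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common attribute: treat as infinitely far
    return len(a) * sum(shared) / len(shared)

def nn_intervals(data, q):
    """Map each missing cell (row, col) to (min, max) of that column over the q nearest rows.
    Assumes every column is observed in at least one other row."""
    intervals = {}
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [r for k, r in enumerate(data) if k != i and r[j] is not None]
            candidates.sort(key=lambda r: partial_distance(row, r))
            neighbors = [r[j] for r in candidates[:q]]
            intervals[(i, j)] = (min(neighbors), max(neighbors))
    return intervals

def memberships(points, centers, m):
    """Standard FCM membership update from squared Euclidean distances."""
    u, dists = [], []
    for p in points:
        d = [max(sum((x - v) ** 2 for x, v in zip(p, ctr)), 1e-12) for ctr in centers]
        inv = [dk ** (-1.0 / (m - 1)) for dk in d]
        total = sum(inv)
        u.append([w / total for w in inv])
        dists.append(d)
    return u, dists

def fcm_objective(points, c, m=2.0, iters=30, seed=0):
    """Run plain fuzzy c-means on complete data and return the final objective value."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    for _ in range(iters):
        u, _ = memberships(points, centers, m)
        for k in range(c):
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            centers[k] = [sum(w * p[j] for w, p in zip(weights, points)) / wsum
                          for j in range(dim)]
    u, dists = memberships(points, centers, m)
    return sum(u[i][k] ** m * dists[i][k] for i in range(len(points)) for k in range(c))

def ga_fcm(data, c, q=3, pop_size=20, generations=30, seed=0):
    """Search imputations inside the intervals; FCM objective is the fitness.
    Assumes at least one missing cell and pop_size >= 4."""
    rng = random.Random(seed)
    intervals = nn_intervals(data, q)
    cells = sorted(intervals)

    def fill(chrom):
        filled = [list(r) for r in data]
        for (i, j), gene in zip(cells, chrom):
            filled[i][j] = gene
        return filled

    def fitness(chrom):
        return fcm_objective(fill(chrom), c)

    pop = [[rng.uniform(*intervals[cell]) for cell in cells] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection (an assumption)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # Arithmetic crossover: a convex combination stays inside each interval.
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            k = rng.randrange(len(child))
            child[k] = rng.uniform(*intervals[cells[k]])  # mutation: resample one gene
            children.append(child)
        pop = parents + children
    return fill(min(pop, key=fitness)), intervals

# Toy data: None marks a missing attribute.
data = [[1.0, 1.1], [0.9, None], [1.2, 0.8],
        [5.0, 5.2], [None, 4.9], [5.1, 5.0]]
filled, intervals = ga_fcm(data, c=2, q=2, pop_size=8, generations=5)
```

Restricting every gene to its interval keeps the search space small and prevents the optimizer from drifting to imputations that no nearby record supports. Clustering the recovered data set inside the fitness function means the final clustering comes for free with the best chromosome.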

Author information

Corresponding author

Correspondence to Dan Li.

Additional information

Communicated by T. P. Hong.

About this article

Cite this article

Li, D., Gu, H. & Zhang, L. A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals. Soft Comput 17, 1787–1796 (2013). https://doi.org/10.1007/s00500-013-0997-7

  • DOI: https://doi.org/10.1007/s00500-013-0997-7
