Published in: Neural Computing and Applications 14/2020

19-10-2019 | Original Article

A Novel Fuzzy Rough Clustering Parameter-based missing value imputation

Authors: P. S. Raja, K. Sasirekha, K. Thangavel

Abstract

Missing values have long been one of the most challenging problems in data mining, machine learning and statistical analysis. Various methods exist to handle them, and handling them well is important for discovering meaningful information. However, the most frequently used approach for large datasets is simply to discard the instances that contain missing values. Such deletion loses crucial information and degrades the performance of downstream algorithms, so a more intelligent method is needed. In the recent past, fuzzy sets and rough sets have been widely employed in many applications. In this research work, a Novel Fuzzy C-Means Rough Parameter-based missing value imputation method is proposed, which hybridizes fuzzy and rough sets to handle missing values. The proposed algorithm handles uncertainty and vagueness in datasets through rough and fuzzy sets while preserving vital information. Experiments were carried out on three benchmark datasets, namely the Dukes’ B colon cancer, Mice Protein Expression and Yeast datasets, to assess the efficacy of the proposed method. The proposed method is observed to produce better results than the Fuzzy C-Means Centroid-based and Fuzzy C-Means Parameter-based missing value imputation methods.
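The abstract names Fuzzy C-Means (FCM) Centroid-based and Parameter-based imputation as baselines without defining them. For orientation only, the sketch below shows one common form of FCM-based imputation: cluster the complete rows with fuzzy c-means, give each incomplete row memberships computed from its observed features alone, and fill each missing entry with the membership-weighted average of the cluster centroids. This is an assumption for illustration, not the authors' algorithm; the function names `fcm` and `fcm_impute`, the fuzzifier `m = 2` and the choice to learn centroids from complete rows only are all hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a complete matrix X (n x d).

    Returns centroids V (c x d) and memberships U (n x c); each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]          # membership-weighted centroids
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D = np.fmax(D, 1e-12)                            # guard against zero distance
        inv = D ** (-1.0 / (m - 1))                      # u_ik proportional to d_ik^(-2/(m-1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]               # centroids for the final U
    return V, U


def fcm_impute(X, c=3, m=2.0, seed=0):
    """Fill NaNs with membership-weighted centroid values (hypothetical sketch)."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    V, _ = fcm(X[complete], c, m=m, seed=seed)           # needs at least c complete rows
    filled = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        # partial squared distance: compare only the observed features
        d = np.fmax(((X[i, obs] - V[:, obs]) ** 2).sum(axis=1), 1e-12)
        inv = d ** (-1.0 / (m - 1))
        u = inv / inv.sum()
        filled[i, ~obs] = u @ V[:, ~obs]                 # x_ij = sum_k u_ik * v_kj
    return filled


X = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
              [8.0, 9.0], [8.1, 9.1], [7.9, 8.9],
              [8.0, np.nan]])
filled = fcm_impute(X, c=2)
print(filled[-1])  # the missing entry lands close to 9.0, the value of the nearby cluster
```

Because the incomplete row is matched to clusters on its observed feature only, its membership concentrates on the nearby cluster and the imputed value follows that cluster's centroid rather than the global column mean.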

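The rough-set side of such a hybrid is usually expressed through lower and upper approximations of each cluster. A minimal sketch, assuming a threshold rule of the kind used in rough-fuzzy clustering variants: an object whose largest fuzzy membership beats its second largest by more than `delta` belongs certainly to the lower approximation of its best cluster; otherwise it is placed in the boundary, that is, in the upper approximations of its two best clusters. The function name `rough_approximations`, the threshold `delta = 0.2` and the two-cluster boundary are illustrative assumptions; the abstract does not describe how the paper combines the approximations with imputation.

```python
import numpy as np


def rough_approximations(U, delta=0.2):
    """Split objects into lower/upper approximations of each cluster.

    U is an (n x c) fuzzy membership matrix. Hypothetical rule: if the best
    membership exceeds the second best by more than delta, the object is
    certain (lower approximation); otherwise it joins the upper
    approximations of its two best clusters (the boundary region).
    """
    U = np.asarray(U, dtype=float)
    n, c = U.shape
    lower = [[] for _ in range(c)]
    upper = [[] for _ in range(c)]
    order = np.argsort(U, axis=1)[:, ::-1]  # cluster indices, best first
    for i in range(n):
        best, second = order[i, 0], order[i, 1]
        if U[i, best] - U[i, second] > delta:
            lower[best].append(i)
            upper[best].append(i)  # every lower approximation lies inside its upper one
        else:
            upper[best].append(i)
            upper[second].append(i)
    return lower, upper


U = np.array([[0.90, 0.10],
              [0.55, 0.45],
              [0.20, 0.80]])
lower, upper = rough_approximations(U, delta=0.2)
print(lower)  # [[0], [2]]
print(upper)  # [[0, 1], [1, 2]]
```

An imputation step could, for instance, weight lower-approximation objects more heavily than boundary objects when estimating cluster parameters; that, too, is an illustration rather than the published method.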
Metadata
Title
A Novel Fuzzy Rough Clustering Parameter-based missing value imputation
Authors
P. S. Raja
K. Sasirekha
K. Thangavel
Publication date
19-10-2019
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04535-9
