Published in: Soft Computing 6/2020

08.07.2019 | Methodologies and Application

Missing value imputation using unsupervised machine learning techniques

Authors: P. S. Raja, K. Thangavel


Abstract

In data mining, preprocessing is an essential step that involves data normalization, noise removal, handling of missing values, etc. This paper focuses on handling missing values using unsupervised machine learning techniques. Soft computing approaches are combined with clustering techniques to form a novel method for handling missing values, which helps to overcome problems of inconsistency. A rough K-means centroid-based imputation method is proposed and compared with the K-means centroid-based, fuzzy C-means centroid-based, K-means parameter-based, fuzzy C-means parameter-based, and rough K-means parameter-based imputation methods. The experimental analysis is carried out on four benchmark datasets, viz. Dermatology, Pima, Wisconsin, and Yeast, which were taken from the UCI data repository. The proposed method demonstrates its efficacy on the different datasets, and the results are promising.
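The abstract does not spell out the procedures, so the following is only a minimal sketch of the general idea behind K-means centroid-based imputation, the baseline named above. It assumes that centroids are fitted on complete records only, that an incomplete record is matched to a centroid using its observed attributes, and that each missing entry is filled with that centroid's coordinate. The function names (`kmeans`, `impute_with_centroids`) and the toy data are hypothetical and may differ from the authors' actual method; the rough K-means and fuzzy C-means variants are not shown.

```python
import math
import random


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length float lists."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for r in rows:
            j = min(range(k), key=lambda c: math.dist(r, centroids[c]))
            clusters[j].append(r)
        # Update step: mean of members; an empty cluster keeps its old centroid.
        new = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def impute_with_centroids(data, k, seed=0):
    """Fill None entries with the matching coordinate of the nearest centroid.

    Assumption of this sketch: centroids are fitted on complete rows only,
    and distance for an incomplete row uses its observed attributes only.
    """
    complete = [r for r in data if None not in r]
    centroids = kmeans(complete, k, seed=seed)
    result = []
    for r in data:
        if None not in r:
            result.append(list(r))
            continue
        observed = [i for i, v in enumerate(r) if v is not None]
        nearest = min(
            centroids,
            key=lambda c: sum((r[i] - c[i]) ** 2 for i in observed),
        )
        result.append([nearest[i] if v is None else v for i, v in enumerate(r)])
    return result


# Toy data (made up for illustration): two well-separated groups,
# followed by two records with one missing attribute each.
data = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [8.0, 8.2], [8.1, 7.9], [7.9, 8.0],
    [1.0, None], [None, 8.1],
]
filled = impute_with_centroids(data, k=2)
```

In this toy example each incomplete record receives the mean of the group its observed attribute is closest to. A parameter-based variant would presumably use cluster statistics other than the centroid itself, but the abstract gives no detail on that, so it is left out here.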

Metadata
Title
Missing value imputation using unsupervised machine learning techniques
Authors
P. S. Raja
K. Thangavel
Publication date
08.07.2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 6/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04199-6