nach oben

Soft Computing

Erschienen in:

08.07.2019 | Methodologies and Application

Missing value imputation using unsupervised machine learning techniques

verfasst von: P. S. Raja, K. Thangavel

Erschienen in: Soft Computing | Ausgabe 6/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

In data mining, preprocessing is one of the essential processes which involves data normalization, noise removal, handling missing values, etc. This paper focuses on handling missing values using unsupervised machine learning techniques. Soft computation approaches are combined with the clustering techniques to form a novel method to handle the missing values, which help us to overcome the problems of inconsistency. Rough K-means centroid-based imputation method is proposed and compared with K-means centroid-based imputation method, fuzzy C-means centroid-based imputation method, K-means parameter-based imputation method, fuzzy C-means parameter-based imputation method, and rough K-means parameter-based imputation methods. The experimental analysis is carried out on four benchmark datasets, viz. Dermatology, Pima, Wisconsin, and Yeast datasets, which have taken from UCI data repository. The proposed method proves the efficacy of different datasets, and the results are also promising one.

Vorheriger Artikel A holistic optimization approach for inverted cart-pendulum control tuning

Nächster Artikel Supervised machine learning techniques and genetic optimization for occupational diseases risk prediction

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

Bezdek JC (2013) Pattern recognition with fuzzy objective function algorithms. Springer, BerlinMATH

Cannon RL, Dave JV, Bezdek JC (1986) Efficient implementation of the fuzzy c-means clustering algorithms. IEEE Trans Pattern Anal Mach Intell 2:248–255CrossRef

Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57MathSciNetCrossRef

Gajawada S, Toshniwal D (2012) Missing value imputation method based on clustering and nearest neighbours. Int J Future Comput Commun 1(2):206CrossRef

García-Laencina PJ, Sancho-Gómez JL, Figueiras-Vidal AR (2010) Pattern classification with missing data: a review. Neural Comput Appl 19(2):263–282CrossRef

Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, AmsterdamMATH

Hathaway RJ, Bezdek JC (2001) Fuzzy c-means clustering of incomplete data. IEEE, PiscatawayCrossRef

Havens TC, Bezdek JC, Leckie C, Hall LO, Palaniswami M (2012) Fuzzy c-means algorithms for very large data. IEEE Trans Fuzzy Syst 20(6):1130–1146CrossRef

https://archive.ics.uci.edu/ml/datasets/Yeast

Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304CrossRef

Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef

Khan SS, Ahmad A (2004) Cluster center initialization algorithm for K-means clustering. Pattern Recogn Lett 25(11):1293–1302CrossRef

Kondo Y, Salibian-Barrera M, Zamar R (2012) A robust and sparse K-means clustering algorithm, arXiv preprint arXiv:1201.6082

Li D, Deogun J, Spaulding W, Shuart B (2004) Towards missing data imputation: a study of fuzzy k-means clustering method. InRough Sets Curr Trends Comput 3066:573–579MathSciNetCrossRef

Li D, Deogun J, Spaulding W, Shuart B (2005) Dealing with missing data: algorithms based on fuzzy set and rough set theories. In: Peters JF, Skowron A (eds) Transactions on rough sets IV. Springer, Berlin, pp 37–57CrossRef

Lingras P, Peters G (2011) Rough clustering. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):64–72CrossRef

Liu ZG, Pan Q, Dezert J, Martin A (2016) Adaptive imputation of missing values for incomplete pattern classification. Pattern Recogn 52:85–95CrossRef

Nelwamondo FV (2008) Computational intelligence techniques for missing data imputation. Doctoral dissertation, University of the Witwatersrand, Johannesburg

Panda S, Sahu S, Jena P, Chattopadhyay S (2012) Comparing fuzzy-C means and K-means clustering techniques: a comprehensive study. In: Wyld DC, Zizka J, Nagamalai D (eds) Proceedings of 2nd international conference on computer science, engineering and applications, vol 166. Advances in computer science, engineering & applications. Springer, Berlin, Heidelberg, pp 451–460

Pawlak Z (1998) Rough set theory and its applications to data analysis. Cybern Syst 29(7):661–688CrossRef

Peters G (2005) Outliers in rough k-means clustering. InPReMI, pp 702–707

Peters G (2006) Some refinements of rough k-means clustering. Pattern Recognit 39(8):1481–1491CrossRef

Peters G, Crespo F (2013) An illustrative comparison of rough k-means to classical clustering approaches. InRSFDGrC, pp 337–344

Peters G, Lampart M (2006) A partitive rough clustering algorithm. In: International conference on rough sets and current trends in computing. Springer, Berlin, pp 657–666CrossRef

Peters G, Lampart M, Weber R (2008) Evolutionary rough k-medoid clustering. Lect Notes Comput Sci 5084:289–306CrossRef

Rahman MM, Davis DN (2013) Machine learning-based missing value imputation method for clinical datasets. In: Yang G-C, Ao S-I, Gelman L (eds) IAENG transactions on engineering technologies. Springer, Dordrecht, pp 245–257CrossRef

Rahman MG, Islam MZ (2016) Missing value imputation using a fuzzy clustering-based EM approach. Knowl Inf Syst 46(2):389–422CrossRef

Raja PS, Thangavel K (2016) Soft clustering based missing value imputation. In: Subramanian S et al (eds) Annual convention of the computer society of India. Springer, Singapore, pp 119–133

Rey-del-Castillo P, Cardeñosa J (2012) Fuzzy min-max neural networks for categorical data: application to missing data imputation. Neural Comput Appl 21(6):1349–1362CrossRef

Suguna N, Thanushkodi KG (2011) Predicting missing attribute values using k-means clustering. J Comput Sci 7(2):216CrossRef

Tuikkala J, Elo LL, Nevalainen OS, Aittokallio T (2008) Missing value imputation improves clustering and interpretation of gene expression microarray data. BMC Bioinform 9(1):202CrossRef

Zhang S, Zhang J, Zhu X, Qin Y, Zhang C (2008) Missing value imputation based on data clustering. In: Gavrilova ML, Tan CJK (eds) Transactions on computational science I. Lecture notes in computer science, vol 4750, pp 128–138

Titel: Missing value imputation using unsupervised machine learning techniques
verfasst von: P. S. Raja
K. Thangavel
Publikationsdatum: 08.07.2019
Verlag: Springer Berlin Heidelberg
Erschienen in: Soft Computing / Ausgabe 6/2020
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-04199-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 6/2020

The effects of consumer confusion on hotel brand loyalty: an application of linguistic nonlinear regression model in the hospitality sector

Rough set-based feature selection for credit risk prediction using weight-adjusted boosting ensemble method

Adaptively secure efficient broadcast encryption with constant-size secret key and ciphertext

An efficient neural network for solving convex optimization problems with a nonlinear complementarity problem function

A unified algorithm based on HTS and self-adapting PSO for the construction of octagonal and rectilinear SMT

A new preference disaggregation method for clustering problem: DISclustering

Premium Partner