The presence of missing values in a dataset can affect the performance of a classifier constructed using that dataset as a training sample. Several methods have been proposed to treat missing data and the one used most frequently deletes instances containing at least one missing value of a feature. In this paper we carry out experiments with twelve datasets to evaluate the effect on the misclassification error rate of four methods for dealing with missing values: the case deletion method, mean imputation, median imputation, and the KNN imputation procedure. The classifiers considered were the Linear Discriminant Analysis (LDA) and the KNN classifier. The first one is a parametric classifier whereas the second one is a nonparametric classifier.
Weitere Kapitel dieses Buchs durch Wischen aufrufen
- The Treatment of Missing Values and its Effect on Classifier Accuracy
- Springer Berlin Heidelberg