2021 | OriginalPaper | Chapter

Addressing Missing Data in a Healthcare Dataset Using an Improved kNN Algorithm

Authors: Tressy Thomas, Enayat Rajabi

Published in: Computational Science – ICCS 2021

Publisher: Springer International Publishing

Abstract

Missing values are ubiquitous in real-world datasets. When a dataset is not very large, addressing its missing values with appropriate data imputation methods benefits analysis significantly. In this paper, we applied and evaluated a new imputation approach called k-Nearest Neighbour with Most Significant Features and incomplete cases (KNNI\(_\mathrm{MSF}\)) to impute missing values in a healthcare dataset. This algorithm combines k-Nearest Neighbour (kNN) and ReliefF feature selection techniques to address incomplete cases in the dataset. The merit of imputation is measured by comparing the classification performance of models trained on the dataset with and without imputation. We used a real-world dataset, “very low birth weight infants”, to predict the survival outcome of infants with low birth weights. Five different classifiers were used in the experiments. A comparison of multiple performance metrics shows that classifiers built on the imputed dataset produce much better outcomes. In general, KNNI\(_\mathrm{MSF}\) outperformed the k-Nearest Neighbour Imputation using Random Forest feature weights (KNNI\(_\mathrm{RF}\)) algorithm with respect to balanced accuracy and specificity.
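
The authors' KNNI\(_\mathrm{MSF}\) implementation is not reproduced on this page. As a rough illustration of the general idea the abstract describes (kNN imputation in which distances emphasise the most significant features and incomplete cases can still serve as donors), the following minimal Python sketch may help. The function names, the example rows, and the feature weights are all hypothetical; in the paper's setting the weights would come from ReliefF, which is not implemented here, so the sketch simply takes them as input.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over features observed in both rows.

    Missing values are represented as None. Returns None when the two
    rows share no observed feature with a positive weight.
    """
    total, weight_sum = 0.0, 0.0
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue
        total += w * (x - y) ** 2
        weight_sum += w
    if weight_sum == 0:
        return None
    # Normalise by the weight actually used, so a pair of rows is not
    # judged "closer" merely because it shares fewer features.
    return math.sqrt(total / weight_sum)


def knn_impute(rows, weights, k=3):
    """Fill each None with the mean of that column over the k nearest donors.

    Assumption of this sketch: a donor may itself be incomplete; it only
    needs an observed value in the column being filled. Distances are
    always computed on the original rows, never on imputed values.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for d, donor in enumerate(rows):
                if d == i or donor[j] is None:
                    continue
                dist = weighted_distance(row, donor, weights)
                if dist is not None:
                    candidates.append((dist, donor[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical toy data: three numeric features, None marks a missing value.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, None, 11.0],
    [5.0, 6.0, 50.0],
    [1.2, 2.2, None],
]
# Hypothetical feature weights standing in for ReliefF scores.
weights = [0.6, 0.3, 0.1]

completed = knn_impute(rows, weights, k=2)
```

In this toy example the two incomplete rows act as donors for each other, which is the point of allowing incomplete cases: discarding them would leave only two complete rows to draw from. Numeric features and mean aggregation are simplifications of this sketch; categorical attributes would need a different distance and a mode-based fill.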

Metadata

Title: Addressing Missing Data in a Healthcare Dataset Using an Improved kNN Algorithm
Authors: Tressy Thomas, Enayat Rajabi
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-77977-1_17
