2013 | OriginalPaper | Chapter

Machine Learning-Based Missing Value Imputation Method for Clinical Datasets

Authors: M. Mostafizur Rahman, D. N. Davis

Published in: IAENG Transactions on Engineering Technologies

Publisher: Springer Netherlands

Abstract

Missing value imputation is one of the major tasks of data pre-processing in data mining. Medical datasets are frequently incomplete, and simply removing the incomplete cases from the original dataset can create more problems than it solves. A suitable imputation method can help produce good-quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean/mode imputation, fuzzy unordered rule induction algorithm (FURIA) imputation, decision tree imputation and other machine learning algorithms are used to impute missing values, and the resulting datasets are classified using decision tree, FURIA, k-nearest neighbour (KNN) and K-means clustering. The experiments show that final classifier performance improves when FURIA is used to predict missing attribute values for K-means clustering, and that in most cases the machine learning techniques perform better than standard mean imputation.
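The abstract contrasts a statistical baseline (mean/mode imputation) with imputers that learn to predict the missing attribute from the other attributes. The minimal sketch below illustrates that difference on a small, hypothetical cardiovascular-style table. It uses a simple k-nearest-neighbour majority vote as a stand-in for the learned imputers (FURIA, decision tree) evaluated in the chapter; it is not the authors' implementation, and the column names, toy records and helper functions (`mean_mode_impute`, `knn_impute`) are illustrative assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records (not data from the chapter); None marks a missing value.
rows = [
    {"age": 45, "sbp": 120, "smoker": "no"},
    {"age": 50, "sbp": 125, "smoker": "no"},
    {"age": 47, "sbp": 118, "smoker": "no"},
    {"age": 70, "sbp": 160, "smoker": "yes"},
    {"age": 72, "sbp": 165, "smoker": "yes"},
    {"age": 71, "sbp": 158, "smoker": None},
    {"age": None, "sbp": 130, "smoker": "no"},
]


def mean_mode_impute(rows, numeric, categorical):
    """Baseline: fill numeric gaps with the column mean and categorical gaps
    with the column mode. Assumes each listed column has an observed value."""
    fills = {}
    for col in numeric:
        fills[col] = mean([r[col] for r in rows if r[col] is not None])
    for col in categorical:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = Counter(observed).most_common(1)[0][0]
    # Build new dicts so the input records are left untouched.
    return [
        {c: fills[c] if v is None and c in fills else v for c, v in r.items()}
        for r in rows
    ]


def knn_impute(rows, target, features, k=3):
    """Model-based stand-in (not FURIA or a decision tree): predict a missing
    categorical `target` by majority vote among the k complete records
    closest in the numeric `features` (squared Euclidean distance, unscaled).
    Records whose features are themselves missing are left as they are."""
    complete = [
        r for r in rows
        if r[target] is not None and all(r[f] is not None for f in features)
    ]
    out = []
    for r in rows:
        if r[target] is None and all(r[f] is not None for f in features):
            nearest = sorted(
                complete,
                key=lambda c: sum((c[f] - r[f]) ** 2 for f in features),
            )[:k]
            vote = Counter(c[target] for c in nearest).most_common(1)[0][0]
            r = {**r, target: vote}  # copy; do not mutate the input record
        out.append(r)
    return out


mm = mean_mode_impute(rows, numeric=["age", "sbp"], categorical=["smoker"])
knn = knn_impute(rows, target="smoker", features=["age", "sbp"], k=3)
print(mm[5]["smoker"], knn[5]["smoker"])  # no yes
```

On this toy table the mode fills the missing `smoker` value with the overall majority ("no"), while the neighbour vote uses the record's age and blood pressure and predicts "yes". This only shows that the two strategies can disagree; it says nothing about how they compare in the chapter's experiments. On real clinical data the numeric columns would normally be standardised before computing distances.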


Metadata
Copyright Year
2013
DOI
https://doi.org/10.1007/978-94-007-6190-2_19