2013 | Original Paper | Book Chapter

Machine Learning-Based Missing Value Imputation Method for Clinical Datasets

Authors: M. Mostafizur Rahman, D. N. Davis

Published in: IAENG Transactions on Engineering Technologies

Publisher: Springer Netherlands

Abstract

Missing value imputation is one of the largest data pre-processing tasks in data mining. Most medical datasets are incomplete, and simply removing the incomplete cases from the original datasets can create more problems than it solves. A suitable imputation method can help to produce good-quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean/mode imputation, fuzzy unordered rule induction algorithm imputation, decision tree imputation and other machine learning algorithms are used to impute the missing values, and the resulting datasets are classified using decision tree, fuzzy unordered rule induction, KNN and K-means clustering. The experiments show that final classifier performance improves when the fuzzy unordered rule induction algorithm is used to predict missing attribute values for K-means clustering, and that in most cases the machine learning techniques perform better than the standard mean imputation technique.
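
To make the contrast between the two families of imputation concrete, the following is a minimal Python sketch, not the chapter's implementation. It compares mean/mode imputation with a learner-based imputation on a small invented table. The learner here is a plain k-nearest-neighbour predictor, chosen only because it fits in a few lines; the chapter itself uses the fuzzy unordered rule induction algorithm, decision trees and other algorithms. The function names (`mean_mode_impute`, `knn_impute`), the `MISSING` marker, the column meanings and all values are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

MISSING = None  # hypothetical marker for a missing attribute value


def _fill_value(values):
    """Mean for numeric values, mode (most frequent value) otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)
    return Counter(values).most_common(1)[0][0]


def mean_mode_impute(rows, col):
    """Baseline: replace every missing entry of `col` with one global value.

    Assumes at least one row has an observed value in `col`.
    """
    fill = _fill_value([r[col] for r in rows if r[col] is not MISSING])
    return [r[:col] + [fill] + r[col + 1:] if r[col] is MISSING else list(r)
            for r in rows]


def knn_impute(rows, col, k=3):
    """Learner-based: predict each missing entry of `col` from similar rows.

    Stand-in learner (k-nearest neighbours), not the chapter's algorithm.
    Assumes all other columns are numeric and fully observed.
    """
    complete = [r for r in rows if r[col] is not MISSING]
    others = [i for i in range(len(rows[0])) if i != col]

    def distance(a, b):
        # Euclidean distance over the attributes that are not being imputed
        return sum((a[i] - b[i]) ** 2 for i in others) ** 0.5

    result = []
    for r in rows:
        if r[col] is not MISSING:
            result.append(list(r))
            continue
        neighbours = sorted(complete, key=lambda c: distance(r, c))[:k]
        fill = _fill_value([n[col] for n in neighbours])
        result.append(r[:col] + [fill] + r[col + 1:])
    return result


# Invented toy records: [age, systolic blood pressure, cholesterol]
records = [
    [45, 120, 4.8],
    [50, 125, 5.0],
    [47, 122, 4.9],
    [70, 160, 6.9],
    [72, 165, 7.1],
    [71, 162, MISSING],
]

print("mean/mode:", mean_mode_impute(records, col=2)[-1])
print("k-NN     :", knn_impute(records, col=2, k=2)[-1])
```

In this toy table the last record resembles the two older patients, so the neighbour-based estimate follows their cholesterol values, whereas the global mean is pulled down by the younger group. This illustrates the general idea of using the other attributes to predict a missing one; it says nothing about how the methods compared in the chapter's experiments.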

Metadata
Title: Machine Learning-Based Missing Value Imputation Method for Clinical Datasets
Authors: M. Mostafizur Rahman, D. N. Davis
Copyright year: 2013
Publisher: Springer Netherlands
DOI: https://doi.org/10.1007/978-94-007-6190-2_19
