2023 | OriginalPaper | Chapter

A Prediction Model with Multi-Pattern Missing Data Imputation for Medical Dataset

Authors: K. Jegadeeswari, R. Ragunath, R. Rathipriya

Published in: Advanced Network Technologies and Intelligent Computing

Publisher: Springer Nature Switzerland

Abstract

Medical data are routinely analyzed for disease diagnosis and treatment planning. Medical datasets usually contain missing values, which are also treated as errors, and these missing values can lead to incorrect diagnoses. Collecting medical data is costly and time-consuming, and the data have to be gathered from various sources, so recovering the missing values is an alternative to re-collecting the data. This paper proposes a prediction model for missing data imputation in medical data. The imputation method, called the enhanced random forest regression predictor, is validated on three datasets downloaded from the UCI repository: Wisconsin, Dermatology, and Breast Cancer. Missing values are introduced manually into the original data at rates from 1% to 15%. The proposed model predicts the missing values with the enhanced random forest regression predictor and is evaluated with several classifiers. Classification separates normal from abnormal diagnoses, and the reported result is accuracy, compared between the original dataset and the imputed dataset. The proposed predictor is compared with two imputation methods, KNN and miceforest, and performs better than both. The experiments validate the model and also establish the importance of imputation. Missing data are a serious problem in medical data and can affect downstream disease analysis; the proposed enhanced missing-data prediction model imputes the missing values and thereby supports better disease analysis through classification.
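The evaluation protocol described in the abstract (introduce missing values at a chosen rate, impute them, then compare against the original data) can be illustrated with a minimal sketch. This is not the authors' enhanced random forest regression predictor, whose details the abstract does not give. It instead implements a simple k-nearest-neighbour imputer, one of the two baselines the abstract names, using only the Python standard library. The function names (`inject_missing`, `knn_impute`, `rmse_on_masked`), the synthetic data, the choice of `k`, and the use of RMSE on the masked cells in place of classifier accuracy are all assumptions made for illustration.

```python
import math
import random


def inject_missing(rows, rate, rng):
    """Return a copy of `rows` with round(rate * n_cells) cells set to None,
    chosen uniformly at random (hypothetical helper)."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = [list(r) for r in rows]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def _distance(a, b):
    """Root mean squared difference over features observed in both rows;
    infinity when the rows share no observed feature."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each None with the mean of that column over the k nearest rows
    that observe it; fall back to the column mean if no neighbour is usable."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else col_means[j]
    return filled


def rmse_on_masked(original, masked, imputed):
    """RMSE between imputed and true values, restricted to the masked cells."""
    errors = [
        (imputed[i][j] - original[i][j]) ** 2
        for i in range(len(original))
        for j in range(len(original[0]))
        if masked[i][j] is None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


def run_demo():
    rng = random.Random(0)
    # Synthetic, correlated features standing in for a real medical dataset.
    data = []
    for _ in range(60):
        x = rng.gauss(0.0, 1.0)
        data.append([x, 2 * x + rng.gauss(0, 0.1), -x + rng.gauss(0, 0.1), x * x])
    for rate in (0.01, 0.05, 0.10, 0.15):  # the 1% to 15% range from the abstract
        masked = inject_missing(data, rate, rng)
        imputed = knn_impute(masked, k=3)
        print(f"rate={rate:.2f}  RMSE on masked cells={rmse_on_masked(data, masked, imputed):.3f}")


if __name__ == "__main__":
    run_demo()
```

To follow the abstract more closely, one would replace the synthetic rows with the UCI datasets, swap `knn_impute` for the imputer under study, and report classifier accuracy on the original versus the imputed data rather than RMSE.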

Metadata

Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-28183-9_38
