2022 | OriginalPaper | Chapter

Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes

Authors: Kone Dramane, Kimou Kouadio Prosper, Goore Bi Tra

Published in: e-Infrastructure and e-Services for Developing Countries

Publisher: Springer International Publishing


Abstract

Handling records with several missing discrete values in a database remains a delicate problem. Such records can bias the results of data mining algorithms and thereby invalidate them. In this paper, we present an extension of the Hybrid Method for Efficient Imputation of Discrete Missing Attributes (HMID) that handles these records effectively. The method partitions the database into two subsets, one containing the complete records and the other the incomplete records. From the complete subset, decision trees are built for all discrete attributes that have missing values. Records with multiple missing values can fall in the same leaf or in different leaves. When they fall in the same leaf, they are estimated directly by the HMID method. Otherwise, the leaves containing them are merged into a horizontal segment to determine the dominant modality of the complete attributes, after which the records with multiple missing values are estimated. We evaluate our algorithm on two datasets: Adult, from the UCI Machine Learning Repository, and SH_CDI_Single, from the World Bank database. Finally, we compare our algorithm with four imputation methods in terms of the accuracy of missing-value estimation and RMSE. Our results indicate that the proposed method performs better than the four methods it was compared with.
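
To make the partitioning and dominant-modality ideas concrete, here is a minimal Python sketch. It is not the authors' HMID implementation: the decision trees and the leaf-merging step are omitted, and a group of complete records that agree with the incomplete record on its observed attributes stands in for a leaf. The function names, the `None` encoding of missing values, and the toy records are assumptions made for illustration only.

```python
from collections import Counter

MISSING = None  # assumption: a missing value is encoded as None


def partition(records):
    """Split the records into (complete, incomplete) lists."""
    complete = [r for r in records if MISSING not in r.values()]
    incomplete = [r for r in records if MISSING in r.values()]
    return complete, incomplete


def dominant_modality(rows, attribute):
    """Return the most frequent value of `attribute` among `rows`."""
    return Counter(row[attribute] for row in rows).most_common(1)[0][0]


def impute(record, complete):
    """Fill every missing attribute of `record` with a dominant modality.

    The group of complete records matching `record` on its observed
    attributes is a simplified stand-in for a decision-tree leaf.
    If no complete record matches, all complete records are used.
    """
    observed = {k: v for k, v in record.items() if v is not MISSING}
    group = [c for c in complete
             if all(c[k] == v for k, v in observed.items())]
    if not group:
        group = complete
    filled = dict(record)
    for attribute, value in record.items():
        if value is MISSING:
            filled[attribute] = dominant_modality(group, attribute)
    return filled


# Toy records (hypothetical values, loosely styled after the Adult dataset).
records = [
    {"education": "Bachelors", "sex": "Male", "workclass": "Private", "marital": "Married"},
    {"education": "Bachelors", "sex": "Male", "workclass": "Private", "marital": "Single"},
    {"education": "Bachelors", "sex": "Male", "workclass": "Private", "marital": "Married"},
    {"education": "HS-grad", "sex": "Female", "workclass": "State-gov", "marital": "Single"},
    {"education": "Bachelors", "sex": "Male", "workclass": None, "marital": None},
]

complete, incomplete = partition(records)
filled = impute(incomplete[0], complete)
print(filled)
```

In this sketch the last record has two missing attributes, and both are filled from the same group of matching complete records. The method described in the abstract instead locates the record in the leaves of per-attribute decision trees and merges those leaves when they differ.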


Metadata
Title
Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes
Authors
Kone Dramane
Kimou Kouadio Prosper
Goore Bi Tra
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-06374-9_17
