Published in: Neural Computing and Applications 17/2020

18-01-2019 | IWINAC 2015

Improving deep learning performance with missing values via deletion and compensation

Authors: Adrián Sánchez-Morales, José-Luis Sancho-Gómez, Juan-Antonio Martínez-García, Aníbal R. Figueiras-Vidal


Abstract

Missing values in a dataset are one of the most common difficulties in real applications. Many machine learning techniques have been proposed in the literature to address this problem. In this work, the representation capability of stacked denoising auto-encoders is used to obtain a new method for imputing missing values based on two ideas: deletion and compensation. The method improves imputation performance by artificially deleting values in the input features and using them as targets during training. Although this deletion is shown to be effective, it may cause an imbalance between the distributions of the training and test sets. To address this issue, a compensation mechanism is proposed, based on a slight modification of the error function to be optimized. Experiments on several datasets show that deletion and compensation improve not only imputation but also classification in comparison with other classical techniques.
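The deletion idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `rate` parameter, zero-filling of deleted entries, and the per-entry weights `w_deleted` and `w_kept` are all assumptions made for illustration. In particular, the abstract only states that compensation is a slight modification of the error function, so the weighting below is a hypothetical stand-in and not the published formula.

```python
import numpy as np


def delete_values(X, observed, rate, rng):
    """Artificially delete a fraction of the truly observed entries.

    X        : (n, d) float array of features.
    observed : (n, d) bool array, True where a value is really present.
    rate     : probability of deleting each observed entry (assumed parameter).
    Returns the network input (missing and deleted entries set to 0, an
    assumed fill value) and a bool mask of the artificially deleted entries.
    """
    deleted = observed & (rng.random(X.shape) < rate)
    X_in = np.where(observed & ~deleted, X, 0.0)
    return X_in, deleted


def weighted_masked_mse(X_hat, X, observed, deleted, w_deleted=1.0, w_kept=1.0):
    """Weighted squared error over truly observed entries only.

    Deleted entries still have known targets, so they contribute to the loss.
    The two weights are a hypothetical way to rebalance deleted and kept
    entries; they are NOT the compensation term from the paper.
    """
    weights = np.where(deleted, w_deleted, w_kept) * observed
    total = weights.sum()
    if total == 0:
        return 0.0
    diff = np.where(observed, X_hat - X, 0.0)
    return float((weights * diff ** 2).sum() / total)


# Toy usage: random data with roughly 20% truly missing entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
observed = rng.random(X.shape) > 0.2
X_in, deleted = delete_values(X, observed, rate=0.3, rng=rng)

# A placeholder "reconstruction" of all zeros stands in for the auto-encoder output.
loss = weighted_masked_mse(np.zeros_like(X), X, observed, deleted, w_deleted=2.0)
```

In an actual training loop, `X_in` would be fed to the stacked denoising auto-encoder and `X` used as the target, with the loss evaluated only where `observed` is True.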


Metadata
Title
Improving deep learning performance with missing values via deletion and compensation
Authors
Adrián Sánchez-Morales
José-Luis Sancho-Gómez
Juan-Antonio Martínez-García
Aníbal R. Figueiras-Vidal
Publication date
18-01-2019
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04013-2
