Introduction to Missing Data Estimation

Deep Learning and Missing Data in Engineering Systems

Part of the book series: Studies in Big Data (SBD, volume 48)

Abstract

This chapter describes the missing data problem in detail, along with the different missing data patterns and mechanisms. A discussion of the classical missing data techniques follows, and then a presentation of machine learning approaches to the missing data problem. Finally, machine learning optimization techniques for missing data estimation tasks are presented.

Author information

Corresponding author

Correspondence to Collins Achepsah Leke.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Leke, C.A., Marwala, T. (2019). Introduction to Missing Data Estimation. In: Deep Learning and Missing Data in Engineering Systems. Studies in Big Data, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-01180-2_1
