
A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy)

Environmental Monitoring and Assessment

Abstract

Environmental time series often contain missing data, and statistical analysis of such series generally requires filling the gaps by estimating the missing values. Many statistical techniques are available for this purpose, ranging from very simple methods, such as substituting the sample mean, to very sophisticated ones, such as multiple imputation. This paper proposes a new methodology for estimating missing data that tries to combine the main advantage of the simplest techniques (their ease of implementation) with the strength of the newest ones. The proposed method consists of two consecutive stages. In the first, once a monitoring station has been found to be affected by missing data, the “most similar” monitoring stations are identified among its neighbouring stations on the basis of a suitable similarity coefficient. In the second, a regressive method is applied to estimate the missing data. Four different regressive methods are applied and compared to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin in southern Italy.
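
The two-stage idea can be illustrated with a minimal Python sketch. The abstract names neither the similarity coefficient nor the four regressive methods, so the choices here are assumptions made only for illustration: Pearson correlation stands in for the similarity coefficient, and simple least-squares regression on the single most similar station stands in for the regressive method. The function names (`pearson`, `most_similar`, `fit_line`, `fill_gaps`) and the toy data are hypothetical and do not come from the paper.

```python
import math


def _paired(x, y):
    """Keep only the days on which both series have an observation."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def pearson(x, y):
    """Pearson correlation over jointly observed days (assumed similarity coefficient)."""
    pairs = _paired(x, y)
    if len(pairs) < 3:
        return float("-inf")  # too little overlap to judge similarity
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("-inf")  # a constant series carries no information
    return sxy / math.sqrt(sxx * syy)


def most_similar(target, neighbours):
    """Stage 1: name of the neighbouring station most similar to the target."""
    return max(neighbours, key=lambda name: pearson(target, neighbours[name]))


def fit_line(x, y):
    """Least-squares slope and intercept of y on x over jointly observed days."""
    pairs = _paired(x, y)
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor series is constant; cannot fit a line")
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    return slope, my - slope * mx


def fill_gaps(target, neighbours):
    """Stage 2: regress the target on its most similar neighbour and fill the gaps."""
    best = most_similar(target, neighbours)
    donor = neighbours[best]
    slope, intercept = fit_line(donor, target)
    filled = []
    for t, d in zip(target, donor):
        if t is None and d is not None:
            # Clip at zero because rainfall depth cannot be negative.
            filled.append(max(0.0, intercept + slope * d))
        else:
            filled.append(t)
    return best, filled


# Toy daily rainfall (mm); None marks a missing day. Values are invented.
target = [0.0, 2.0, None, 4.0, 6.0, None]
neighbours = {
    "A": [0.0, 1.0, 2.5, 2.0, 3.0, 0.5],
    "B": [5.0, 0.0, 1.0, 3.0, 0.0, 2.0],
}
best, filled = fill_gaps(target, neighbours)
print(best, filled)
```

In this toy case station A tracks the target exactly on the jointly observed days, so it is selected and its values on the two missing days drive the estimates. A real application would probably use several similar stations and compare alternative regressions, as the abstract describes.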

Author information

Corresponding author

Correspondence to Giuseppe Passarella.

About this article

Cite this article

Lo Presti, R., Barca, E. & Passarella, G. A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy). Environ Monit Assess 160, 1–22 (2010). https://doi.org/10.1007/s10661-008-0653-3

  • DOI: https://doi.org/10.1007/s10661-008-0653-3
