Abstract
Environmental time series are often affected by missing data, and any statistical treatment of such series must first address how to fill the gaps by estimating the missing values. A large number of statistical techniques are currently available for this purpose, ranging from very simple methods, such as substituting the sample mean, to very sophisticated ones, such as multiple imputation. This paper proposes a new methodology for missing data estimation that attempts to combine the main advantages of the simplest techniques (e.g. their ease of implementation) with the strength of the newest ones. The proposed method consists of two consecutive stages. In the first stage, once a specific monitoring station has been found to be affected by missing data, the “most similar” monitoring stations are identified among the neighbouring stations on the basis of a suitable similarity coefficient. In the second stage, a regressive method is applied to estimate the missing data. Four different regressive methods are applied and compared in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin in southern Italy.
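The two-stage idea described above can be illustrated with a minimal sketch. The abstract names neither the similarity coefficient nor the four regressive methods, so this example makes two assumptions that are not the authors' stated choices: Pearson correlation stands in for the similarity coefficient, and ordinary least squares on a single donor station stands in for the regressive method. The station names, the rainfall values, and the function names are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation over days where both series are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def most_similar_station(target, neighbours):
    """Stage 1 (assumed coefficient): neighbour with the highest correlation."""
    scores = {name: pearson(target, series) for name, series in neighbours.items()}
    valid = {name: r for name, r in scores.items() if not math.isnan(r)}
    return max(valid, key=valid.get)

def fill_gaps(target, donor):
    """Stage 2 (assumed method): fit target = a + b * donor by least squares
    on the shared days, then predict the target where it is missing."""
    pairs = [(d, t) for t, d in zip(target, donor) if t is not None and d is not None]
    n = len(pairs)
    md = sum(d for d, _ in pairs) / n
    mt = sum(t for _, t in pairs) / n
    b = sum((d - md) * (t - mt) for d, t in pairs) / sum((d - md) ** 2 for d, _ in pairs)
    a = mt - b * md
    filled = []
    for t, d in zip(target, donor):
        if t is None and d is not None:
            filled.append(max(0.0, a + b * d))  # rainfall cannot be negative
        else:
            filled.append(t)
    return filled

# Hypothetical daily rainfall (mm); None marks a missing observation.
target = [0.0, 2.0, None, 4.0, 6.0, None, 8.0]
neighbours = {
    "A": [0.0, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0],
    "B": [5.0, 0.0, 2.0, 7.0, 1.0, 0.0, 3.0],
}

best = most_similar_station(target, neighbours)
filled = fill_gaps(target, neighbours[best])
print(best, filled)
```

With these made-up values, station "A" is selected and the two gaps are estimated as 3.0 and 7.0. A fuller treatment would likely use several similar stations and compare alternative regressions, as the paper does.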
Cite this article
Lo Presti, R., Barca, E. & Passarella, G. A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy). Environ Monit Assess 160, 1–22 (2010). https://doi.org/10.1007/s10661-008-0653-3
DOI: https://doi.org/10.1007/s10661-008-0653-3