Abstract
There has been a great deal of recent discussion of the practice of regression analysis (or more generally, linear modelling) in behaviour and ecology. In this paper, I highlight two factors that have been under-considered, collinearity and measurement error in predictors, and consider what happens when both exist at the same time. I examine the consequences for conventional regression analysis (ordinary least squares, OLS) as well as for model averaging methods, typified by information theoretic approaches based around Akaike’s information criterion. Collinearity causes variance inflation of estimated slopes in OLS analysis, as is well known. In the presence of collinearity, model averaging reduces this variance for predictors with weak effects, but can also lead to parameter bias. When collinearity is strong or when all predictors have strong effects, model averaging relies heavily on the full model including all predictors, and hence its results and those of OLS are essentially the same. I highlight that it is not safe to simply eliminate collinear variables without due consideration of their likely independent effects, as this can lead to biases. I also consider measurement error and show that it can lead to extreme biases when predictors are collinear, have strong effects, and differ in their degree of measurement error. I highlight techniques for diagnosing and dealing with these problems. These results reinforce that automated model selection techniques should not be relied on in the analysis of complex multivariable datasets.
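The two effects described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes two standard normal predictors with correlation `r`, true slopes of 1 for both, unit residual variance, and no intercept, and the function names (`vif`, `simulate`, `slope_summary`) are hypothetical. It compares the sampling variance of an OLS slope with and without collinearity, and then adds measurement error to only one of the two collinear predictors.

```python
import random
import statistics


def vif(r):
    """Variance inflation factor for two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


def simulate(r, b1, b2, n, rng, err_sd1=0.0):
    """Draw one dataset and return the two OLS slope estimates (no intercept).

    The true predictors are standard normal with correlation r. err_sd1 is
    the standard deviation of measurement error added to the observed x1
    (an illustrative assumption; x2 is observed without error).
    """
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        t1 = z1
        t2 = r * z1 + (1 - r * r) ** 0.5 * z2
        y.append(b1 * t1 + b2 * t2 + rng.gauss(0, 1))
        x1.append(t1 + rng.gauss(0, err_sd1))  # x1 observed with error
        x2.append(t2)
    # Solve the 2x2 normal equations directly.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def slope_summary(r, err_sd1=0.0, reps=500, n=100, seed=1):
    """Return (mean of b1, variance of b1, mean of b2) over repeated datasets."""
    rng = random.Random(seed)
    est = [simulate(r, 1.0, 1.0, n, rng, err_sd1) for _ in range(reps)]
    b1 = [e[0] for e in est]
    b2 = [e[1] for e in est]
    return statistics.mean(b1), statistics.variance(b1), statistics.mean(b2)


if __name__ == "__main__":
    print("VIF at r = 0.9:", vif(0.9))
    print("independent predictors:", slope_summary(0.0))
    print("collinear predictors:", slope_summary(0.9))
    print("collinear, error in x1:", slope_summary(0.9, err_sd1=1.0))
```

Under these assumptions the slope variance at `r = 0.9` should exceed that at `r = 0` by a factor close to `vif(0.9)`, while the mean estimates stay near the true value of 1. With error of unit variance in `x1` only, the large-sample OLS estimates work out to about 0.16 for `x1` and 1.76 for `x2`: the effect of the poorly measured predictor is transferred to its better-measured collinear partner.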
References
Anderson DR (2008) Model-based inference in the life sciences. Springer, New York
Burnham KP, Anderson DR (1998) Model selection and multimodel inference. Springer, Berlin
Burnham KP, Anderson DR (2002) Model selection and multimodel inference. Springer, Berlin
Burnham KP, Anderson D, Huyvaert K (2010) AICc model selection in ecological and behavioural science: some background, observations and comparisons. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1029-6
Carroll RJ, Spiegelman CH, Gordon Lan KK, Bailey KT, Abbott RD (1984) On errors-in-variables for binary regression models. Biometrika 71:19–25
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective. Chapman & Hall, London
Chatfield C (1996) The analysis of time series. Chapman & Hall, London
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge
Cook JR, Stefanski LA (1994) Simulation-extrapolation estimation in parametric error models. J Am Stat Assoc 89:1314–1328
Dennis B, Ponciano JM, Lele SR, Taper ML, Staples DF (2006) Estimating density dependence, process noise and observation error. Ecol Monogr 76:323–341
Draper NR, Smith H (1998) Applied regression analysis. Blackwell Scientific, Oxford
Ellner SP, Seifu Y, Smith RH (2002) Fitting population dynamic models to time-series data by gradient matching. Ecology 83:2256–2270
Felsenstein J (1988) Phylogenies and quantitative characters. Annu Rev Ecol Syst 19:445–471
Forstmeier W, Schielzeth H (2010) Cryptic multiple hypothesis testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1038-5
Fox J-P, Glas C (2003) Bayesian modelling of measurement error in predictor variables using item response theory. Psychometrika 68:169–191
Freckleton RP (2002) On the misuse of residuals in ecology: regression of residuals versus multiple regression. J Anim Ecol 71:542–545
Freckleton RP, Watkinson AR, Thomas TH, Webb DJ (1998) Yield of sugar beet in relation to weather and nutrients. Agric For Meteorol 93:39–51
Freckleton RP, Watkinson AR, Green RE, Sutherland WJ (2006) Census error and the detection of density dependence. J Anim Ecol 75:837–851
Garamszegi LZ (2010) Information-theoretic approaches in statistical analysis in behavioural ecology: an introduction. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1028-7
Garcia-Berthou E (2001) On the misuse of residuals in ecology: testing regression residuals vs. the analysis of covariance. J Anim Ecol 70:708–711
Goldstein H (1995) Multilevel statistical models. Edward Arnold, London
Grafen A, Hails R (2002) Modern statistics for the life sciences. Oxford University Press, Oxford
Haining R (1990) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge
Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford
Hegyi G, Garamszegi LZ (2010) Using information theory as a substitute for stepwise regression in ecology and behaviour. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1036-7
Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108
Leigh RA, Johnston AE (1994) Long-term experiments in agricultural and ecological science. CAB International, Wallingford
Linden A, Knape J (2009) Estimating environmental effects on population dynamics: consequences of observation error. Oikos 118:675–680
Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87:2626–2635
Quinn G, Keough M (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55
Rushton SP, Ormerod SJ, Kerby G (2004) New paradigms for modelling species distributions. J Appl Ecol 41:193–200
Ruxton GD, Colgrave N (2002) Experimental design for the life sciences. Oxford University Press, Oxford
Schafer DW (1987) Covariate measurement error in generalized linear models. Biometrika 74:385–391
Shenk TM, White GC, Burnham KP (1998) Sampling variance effects on detecting density dependence from temporal trends in natural populations. Ecol Monogr 68:445–463
Sokal RR, Rohlf FJ (1995) Biometry. W.H. Freeman & Co., New York
Stefanski LA, Cook JR (1995) Simulation extrapolation: the measurement error jackknife. J Am Stat Assoc 90:1247–1256
Székely T, Freckleton RP, Reynolds JD (2004) Sexual selection explains Rensch’s rule of size dimorphism in shorebirds. Proc Natl Acad Sci 101:12224–12227
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189
Acknowledgements
The author is funded by a Royal Society University Research Fellowship. Thanks to Tom Webb, the referees and László Garamszegi for comments on an earlier version of the MS.
Additional information
Communicated by L. Garamszegi
This contribution is part of the Special Issue “Model selection, multimodel inference and information-theoretic approaches in behavioural ecology” (see Garamszegi 2010).
About this article
Cite this article
Freckleton, R.P. Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65, 91–101 (2011). https://doi.org/10.1007/s00265-010-1045-6
DOI: https://doi.org/10.1007/s00265-010-1045-6