Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error

  • Original Paper
Behavioral Ecology and Sociobiology

Abstract

There has been a great deal of recent discussion of the practice of regression analysis (or, more generally, linear modelling) in behaviour and ecology. In this paper, I highlight two under-considered factors, collinearity and measurement error in predictors, and consider what happens when both occur together. I examine the consequences for conventional regression analysis (ordinary least squares, OLS) as well as for model averaging methods, typified by information-theoretic approaches based on Akaike’s information criterion. As is well known, collinearity inflates the variance of estimated slopes in OLS analysis. In the presence of collinearity, model averaging reduces this variance for predictors with weak effects, but can also lead to parameter bias. When collinearity is strong, or when all predictors have strong effects, model averaging relies heavily on the full model including all predictors, and hence its results are essentially the same as those from OLS. I highlight that it is not safe simply to eliminate collinear variables without due consideration of their likely independent effects, as this can lead to biases. Measurement error is also considered: I show that it can lead to extreme biases when predictors are collinear, have strong effects, but differ in their degree of measurement error. I highlight techniques for diagnosing and dealing with these problems. These results reinforce that automated model selection techniques should not be relied on in the analysis of complex multivariable datasets.
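The three effects described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the paper's own code: it assumes two standard-normal predictors with correlation r, a residual standard deviation of 1, AIC computed from the Gaussian likelihood (no small-sample correction), and model averaging over all predictor subsets in which a slope counts as zero when its predictor is absent. All function names and parameter values are hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Fit y ~ 1 + X by least squares; return (coefficients, AIC)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    k = design.shape[1] + 1  # regression coefficients plus residual variance
    return coef, n * np.log(rss / n) + 2 * k

def ols_slopes(X, y):
    """Slopes from the full model containing every predictor."""
    return fit_ols(X, y)[0][1:]

def averaged_slopes(X, y):
    """Akaike-weighted slopes over all predictor subsets (absent slope = 0)."""
    p = X.shape[1]
    subsets = [s for m in range(p + 1) for s in itertools.combinations(range(p), m)]
    slopes = np.zeros((len(subsets), p))
    aic = np.empty(len(subsets))
    for j, s in enumerate(subsets):
        cols = np.array(s, dtype=int)  # empty for the intercept-only model
        coef, aic[j] = fit_ols(X[:, cols], y)
        slopes[j, cols] = coef[1:]
    weights = np.exp(-0.5 * (aic - aic.min()))
    weights /= weights.sum()
    return weights @ slopes

def simulate(estimator, r, beta=(1.0, 1.0), err_sd=(0.0, 0.0),
             n=200, n_rep=500, seed=0):
    """Return an (n_rep, 2) array of slope estimates from repeated datasets.

    Each predictor is observed with additive normal error of s.d. err_sd[j];
    the response is generated from the error-free predictors.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])
    out = np.empty((n_rep, 2))
    for i in range(n_rep):
        x_true = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = x_true @ np.asarray(beta) + rng.normal(size=n)
        x_obs = x_true + rng.normal(size=(n, 2)) * np.asarray(err_sd)
        out[i] = estimator(x_obs, y)
    return out

r = 0.9

# 1. Variance inflation: OLS slope variance at r = 0.9 relative to r = 0.
variance_ratio = (simulate(ols_slopes, r=r)[:, 0].var()
                  / simulate(ols_slopes, r=0.0)[:, 0].var())
print("variance ratio:", variance_ratio, "theory 1/(1 - r^2):", 1 / (1 - r**2))

# 2. Model averaging when the second predictor has a weak effect (0.1).
weak = (1.0, 0.1)
ols_weak = simulate(ols_slopes, r=r, beta=weak)
avg_weak = simulate(averaged_slopes, r=r, beta=weak)
print("x2 slope variance, OLS vs averaged:",
      ols_weak[:, 1].var(), avg_weak[:, 1].var())
print("x2 slope mean, OLS vs averaged:",
      ols_weak[:, 1].mean(), avg_weak[:, 1].mean())

# 3. Measurement error in x1 only (s.d. 0.5), both true slopes equal to 1.
err = simulate(ols_slopes, r=r, err_sd=(0.5, 0.0))
s2 = 0.5**2
limit_x1 = (1 - r**2) / (1 + s2 - r**2)              # large-sample OLS limit
limit_x2 = (1 + r) * (1 + s2 - r) / (1 + s2 - r**2)  # large-sample OLS limit
print("mean slopes with error in x1:", err.mean(axis=0),
      "limits:", limit_x1, limit_x2)
```

Under these assumptions, the large-sample limits in the third step are roughly 0.43 for the error-prone predictor and 1.51 for the error-free one, although both true slopes are 1: the poorly measured predictor is attenuated and its collinear partner absorbs the missing effect. With r = 0 the same error would attenuate the first slope only to 0.8 and leave the second unbiased, so the severity of the bias here comes from the combination of collinearity and unequal measurement error.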

References

  • Anderson DR (2008) Model-based inference in the life sciences. Springer, New York

  • Burnham KP, Anderson DR (1998) Model selection and multimodel inference. Springer, Berlin

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference. Springer, Berlin

  • Burnham KP, Anderson D, Huyvaert K (2010) AICc model selection in ecological and behavioural science: some background, observations and comparisons. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1029-6

  • Carroll RJ, Spiegelman CH, Gordon Lan KK, Bailey KT, Abbott RD (1984) On errors-in-variables for binary regression models. Biometrika 71:19–25

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective. Chapman & Hall, London

  • Chatfield C (1996) The analysis of time series. Chapman & Hall, London

  • Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge

  • Cook JR, Stefanski LA (1994) Simulation-extrapolation estimation in parametric error models. J Am Stat Assoc 89:1314–1328

  • Dennis B, Ponciano JM, Lele SR, Taper ML, Staples DF (2006) Estimating density dependence, process noise and observation error. Ecol Monogr 76:323–341

  • Draper NR, Smith H (1998) Applied regression analysis. Blackwell Scientific, Oxford

  • Ellner SP, Seifu Y, Smith RH (2002) Fitting population dynamic models to time-series data by gradient matching. Ecology 83:2256–2270

  • Felsenstein J (1988) Phylogenies and quantitative characters. Annu Rev Ecol Syst 19:445–471

  • Forstmeier W, Schielzeth H (2010) Cryptic multiple hypothesis testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1038-5

  • Fox J-P, Glas C (2003) Bayesian modelling of measurement error in predictor variables using item response theory. Psychometrika 68:169–191

  • Freckleton RP (2002) On the misuse of residuals in ecology: regression of residuals versus multiple regression. J Anim Ecol 71:542–545

  • Freckleton RP, Watkinson AR, Thomas TH, Webb DJ (1998) Yield of sugar beet in relation to weather and nutrients. Agric For Meteorol 93:39–51

  • Freckleton RP, Watkinson AR, Green RE, Sutherland WJ (2006) Census error and the detection of density dependence. J Anim Ecol 75:837–851

  • Garamszegi LZ (2010) Information-theoretic approaches in statistical analysis in behavioural ecology: an introduction. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1028-7

  • Garcia-Berthou E (2001) On the misuse of residuals in ecology: testing regression residuals vs. the analysis of covariance. J Anim Ecol 70:708–711

  • Goldstein H (1995) Multilevel statistical models. Edward Arnold, London

  • Grafen A, Hails R (2002) Modern statistics for the life sciences. Oxford University Press, Oxford

  • Haining R (1990) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge

  • Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford

  • Hegyi G, Garamszegi LZ (2010) Using information theory as a substitute for stepwise regression in ecology and behaviour. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1036-7

  • Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108

  • Leigh RA, Johnston AE (1994) Long-term experiments in agricultural and ecological science. CAB International, Wallingford

  • Linden A, Knape J (2009) Estimating environmental effects on population dynamics: consequences of observation error. Oikos 118:675–680

  • Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87:2626–2635

  • Quinn G, Keough M (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge

  • Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55

  • Rushton SP, Ormerod SJ, Kerby G (2004) New paradigms for modelling species distributions. J Appl Ecol 41:193–200

  • Ruxton GD, Colgrave N (2002) Experimental design for the life sciences. Oxford University Press, Oxford

  • Schafer DW (1987) Covariate measurement error in generalized linear models. Biometrika 74:385–391

  • Shenk TM, White GC, Burnham KP (1998) Sampling variance effects on detecting density dependence from temporal trends in natural populations. Ecol Monogr 68:445–463

  • Sokal RR, Rohlf FJ (1995) Biometry. W.H. Freeman & Co., New York

  • Stefanski LA, Cook JR (1995) Simulation extrapolation: the measurement error jackknife. J Am Stat Assoc 90:1247–1256

  • Székely T, Freckleton RP, Reynolds JD (2004) Sexual selection explains Rensch’s rule of size dimorphism in shorebirds. Proc Natl Acad Sci 101:12224–12227

  • Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189

Acknowledgements

The author is funded by a Royal Society University Research Fellowship. Thanks to Tom Webb, the referees and László Garamszegi for comments on an earlier version of the MS.

Author information

Correspondence to Robert P. Freckleton.

Additional information

Communicated by L. Garamszegi

This contribution is part of the Special Issue “Model selection, multimodel inference and information-theoretic approaches in behavioural ecology” (see Garamszegi 2010).

About this article

Cite this article

Freckleton, R.P. Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65, 91–101 (2011). https://doi.org/10.1007/s00265-010-1045-6

  • DOI: https://doi.org/10.1007/s00265-010-1045-6
