Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error

Freckleton, Robert P.

doi:10.1007/s00265-010-1045-6

Original Paper
Published: 14 September 2010

Volume 65, pages 91–101 (2011)

Behavioral Ecology and Sociobiology


Abstract

There has been a great deal of recent discussion of the practice of regression analysis (or more generally, linear modelling) in behaviour and ecology. In this paper, I highlight two factors that have been under-considered, collinearity and measurement error in predictors, and consider what happens when both exist at the same time. I examine the consequences for conventional regression analysis (ordinary least squares, OLS) as well as for model averaging methods, typified by information theoretic approaches based around Akaike’s information criterion. As is well known, collinearity causes variance inflation of estimated slopes in OLS analysis. In the presence of collinearity, model averaging reduces this variance for predictors with weak effects, but can also lead to parameter bias. When collinearity is strong or when all predictors have strong effects, model averaging relies heavily on the full model including all predictors, and hence its results and those of OLS are essentially the same. I highlight that it is not safe to simply eliminate collinear variables without due consideration of their likely independent effects, as this can lead to biases. I also consider measurement error and show that it can lead to extreme biases when predictors are collinear, have strong effects and differ in their degree of measurement error. I highlight techniques for diagnosing and dealing with these problems. These results reinforce that automated model selection techniques should not be relied on in the analysis of complex multivariable datasets.
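The two effects named in the abstract can be illustrated with a small calculation. The following is a minimal sketch, not the paper's own analysis: it assumes two unit-variance predictors with correlation `rho`, classical additive measurement error that is independent of the predictors and of the residuals, and the standard large-sample result that OLS slopes converge to (Σ_xx + Σ_uu)⁻¹ Σ_xx β. The function names `vif` and `ols_limit` are hypothetical and chosen here for illustration.

```python
def vif(r):
    """Variance inflation factor for a slope when two predictors have correlation r."""
    return 1.0 / (1.0 - r * r)


def ols_limit(beta, rho, err_var):
    """Large-sample OLS slopes for two unit-variance predictors with correlation rho,
    each observed with independent additive error of variance err_var[i].
    Assumption: classical measurement error, uncorrelated with everything else."""
    b1, b2 = beta
    # Covariance of each observed predictor with the response (Sigma_xx times beta)
    c1 = b1 + rho * b2
    c2 = rho * b1 + b2
    # Variances of the observed predictors (true variance 1 plus error variance)
    v1 = 1.0 + err_var[0]
    v2 = 1.0 + err_var[1]
    # Invert the 2x2 covariance matrix of the observed predictors by hand
    det = v1 * v2 - rho * rho
    return ((v2 * c1 - rho * c2) / det, (v1 * c2 - rho * c1) / det)


if __name__ == "__main__":
    print("VIF at r = 0.9:", vif(0.9))
    print("no measurement error:", ols_limit((1.0, 1.0), 0.9, (0.0, 0.0)))
    print("error in predictor 2 only:", ols_limit((1.0, 1.0), 0.9, (0.0, 0.5)))
```

Under these assumptions, with true slopes of 1 for both predictors, a correlation of 0.9 and error variance 0.5 in the second predictor only, the limiting slopes are roughly 1.65 for the well-measured predictor and 0.28 for the poorly measured one. The effect of the noisy predictor is attributed to its collinear, better-measured partner.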

References

Anderson DR (2008) Model-based inference in the life sciences. Springer, New York
Burnham KP, Anderson DR (1998) Model selection and multimodel inference. Springer, Berlin
Burnham KP, Anderson DR (2002) Model selection and multimodel inference. Springer, Berlin
Burnham KP, Anderson D, Huyvaert K (2010) AICc model selection in ecological and behavioural science: some background, observations and comparisons. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1029-6
Carroll RJ, Spiegelman CH, Gordon Lan KK, Bailey KT, Abbott RD (1984) On errors-in-variables for binary regression models. Biometrika 71:19–25
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective. Chapman & Hall, London
Chatfield C (1996) The analysis of time series. Chapman & Hall, London
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge
Cook JR, Stefanski LA (1994) Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc 89:1314–1328
Dennis B, Ponciano JM, Lele SR, Taper ML, Staples DF (2006) Estimating density dependence, process noise and observation error. Ecol Monogr 76:323–341
Draper NR, Smith H (1998) Applied regression analysis. Blackwell Scientific, Oxford
Ellner SP, Seifu Y, Smith RH (2002) Fitting population dynamic models to time-series data by gradient matching. Ecology 83:2256–2270
Felsenstein J (1988) Phylogenies and quantitative characters. Annu Rev Ecol Syst 19:445–471
Forstmeier W, Schielzeth H (2010) Cryptic multiple hypothesis testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1038-5
Fox J-P, Glas C (2003) Bayesian modelling of measurement error in predictor variables using item response theory. Psychometrika 68:169–191
Freckleton RP (2002) On the misuse of residuals in ecology: regression of residuals versus multiple regression. J Anim Ecol 71:542–545
Freckleton RP, Watkinson AR, Thomas TH, Webb DJ (1998) Yield of sugar beet in relation to weather and nutrients. Agric For Meteorol 93:39–51
Freckleton RP, Watkinson AR, Green RE, Sutherland WJ (2006) Census error and the detection of density dependence. J Anim Ecol 75:837–851
Garamszegi LZ (2010) Information-theoretic approaches in statistical analysis in behavioural ecology: an introduction. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1028-7
Garcia-Berthou E (2001) On the misuse of residuals in ecology: testing regression residuals vs. the analysis of covariance. J Anim Ecol 70:708–711
Goldstein H (1995) Multilevel statistical models. Edward Arnold, London
Grafen A, Hails R (2002) Modern statistics for the life sciences. Oxford University Press, Oxford
Haining R (1990) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge
Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford
Hegyi G, Garamszegi LZ (2010) Using information theory as a substitute for stepwise regression in ecology and behaviour. Behav Ecol Sociobiol. doi:10.1007/s00265-010-1036-7
Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends Ecol Evol 19:101–108
Leigh RA, Johnston AE (1994) Long-term experiments in agricultural and ecological science. CAB International, Wallingford
Linden A, Knape J (2009) Estimating environmental effects on population dynamics: consequences of observation error. Oikos 118:675–680
Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87:2626–2635
Quinn G, Keough M (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55
Rushton SP, Ormerod SJ, Kerby G (2004) New paradigms for modelling species distributions. J Appl Ecol 41:193–200
Ruxton GD, Colgrave N (2002) Experimental design for the life sciences. Oxford University Press, Oxford
Schafer DW (1987) Covariate measurement error in generalized linear models. Biometrika 74:385–391
Shenk TM, White GC, Burnham KP (1998) Sampling variance effects on detecting density dependence from temporal trends in natural populations. Ecol Monogr 68:445–463
Sokal RR, Rohlf FJ (1995) Biometry. W.H. Freeman & Co., New York
Stefanski LA, Cook JR (1995) Simulation extrapolation: the measurement error jackknife. J Am Stat Assoc 90:1247–1256
Székely T, Freckleton RP, Reynolds JD (2004) Sexual selection explains Rensch’s rule of size dimorphism in shorebirds. Proc Natl Acad Sci 101:12224–12227
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP (2006) Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182–1189

Acknowledgements

The author is funded by a Royal Society University Research Fellowship. Thanks to Tom Webb, the referees and László Garamszegi for comments on an earlier version of the MS.

Author information

Authors and Affiliations

Department of Animal and Plant Sciences, University of Sheffield, Sheffield, S10 2TN, UK
Robert P. Freckleton

Corresponding author

Correspondence to Robert P. Freckleton.

Additional information

Communicated by L. Garamszegi

This contribution is part of the Special Issue “Model selection, multimodel inference and information-theoretic approaches in behavioural ecology” (see Garamszegi 2010).

About this article

Cite this article

Freckleton, R.P. Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65, 91–101 (2011). https://doi.org/10.1007/s00265-010-1045-6

Received: 01 January 2010
Revised: 10 August 2010
Accepted: 10 August 2010
Published: 14 September 2010
Issue Date: January 2011
DOI: https://doi.org/10.1007/s00265-010-1045-6
