
2010 | Book

Regression

Linear Models in Statistics


About this book

Regression is the branch of Statistics in which a dependent variable of interest is modelled as a linear combination of one or more predictor variables, together with a random error. The subject is inherently two- or higher-dimensional; thus an understanding of Statistics in one dimension is essential.

Regression: Linear Models in Statistics fills the gap between introductory statistical theory and more specialist sources of information. In doing so, it provides the reader with a number of worked examples, and exercises with full solutions.

The book begins with simple linear regression (one predictor variable), and analysis of variance (ANOVA), and then further explores the area through inclusion of topics such as multiple linear regression (several predictor variables) and analysis of covariance (ANCOVA). The book concludes with special topics such as non-parametric regression and mixed models, time series, spatial processes and design of experiments.

Aimed at 2nd and 3rd year undergraduates studying Statistics, Regression: Linear Models in Statistics requires a basic knowledge of (one-dimensional) Statistics, as well as Probability and standard Linear Algebra. Possible companions include John Haigh’s Probability Models, and T. S. Blyth and E. F. Robertson’s Basic Linear Algebra and Further Linear Algebra.

Table of Contents

Frontmatter
1. Linear Regression
Abstract
When we first meet Statistics, we encounter random quantities (random variables, in probability language, or variates, in statistical language) one at a time. This suffices for a first course. Soon, however, we need to handle more than one random quantity at a time. Already we have to think about how they are related to each other. Let us take the simplest case first, of two variables. Consider first the two extreme cases. At one extreme, the two variables may be independent (unrelated). For instance, one might result from laboratory data taken last week, the other might come from old trade statistics. The two are unrelated. Each is uninformative about the other. They are best looked at separately. What we have here are really two one-dimensional problems, rather than one two-dimensional problem, and it is best to consider matters in these terms.
N. H. Bingham, John M. Fry
2. The Analysis of Variance (ANOVA)
Abstract
While the linear regression of Chapter 1 goes back to the nineteenth century, the Analysis of Variance of this chapter dates from the twentieth century, in applied work by Fisher motivated by agricultural problems (see §2.6). We begin this chapter with some necessary preliminaries, on the special distributions of Statistics needed for small-sample theory: the chi-square distributions χ2(n) (§2.1), the Fisher F-distributions F(m, n) (§2.3), and the independence of normal sample means and sample variances (§2.5). We shall generalise linear regression to multiple regression in Chapters 3 and 4 – which use the Analysis of Variance of this chapter – and unify regression and Analysis of Variance in Chapter 5 on Analysis of Covariance.
N. H. Bingham, John M. Fry
3. Multiple Regression
Abstract
We saw in Chapter 1 how the model
$${y}_{i} = a + b{x}_{i} + {\epsilon }_{i},\qquad {\epsilon }_{i}\quad \mbox{ iid}\quad N(0,{\sigma }^{2})$$
for simple linear regression occurs. We saw also that we may need to consider two or more regressors. We dealt with two regressors u and v, and could deal with three regressors u, v and w similarly. But in general we will need to be able to handle any number of regressors, and rather than rely on the finite resources of the alphabet it is better to switch to suffix notation, and use the language of vectors and matrices. For a random vector X, we will write \(E\vec{X}\) for its mean vector (thus the mean of the ith coordinate \(X_i\) is \(E(X_i) = (E\vec{X})_i\)), and \(\mathrm{var}(\vec{X})\) for its covariance matrix (whose (i, j) entry is \(\mathrm{cov}(X_i, X_j)\)). We will use p regressors, called \(x_1, \ldots, x_p\), each with a corresponding parameter \(\beta_1, \ldots, \beta_p\) (‘p for parameter’). In the equation above, regard a as short for a.1, with 1 as a regressor corresponding to a constant term (the intercept term in the context of linear regression).
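As an informal illustration of this model (not taken from the book, and using invented data), the least-squares fit can be computed directly from the design-matrix form the chapter introduces; the Python sketch below assumes only NumPy.

```python
# A minimal sketch of fitting y = a + b*x + eps by ordinary least squares,
# in the matrix notation of the chapter. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # true a = 2, b = 0.5

# Design matrix with an intercept column (the "1" regressor for the constant term).
X = np.column_stack([np.ones(n), x])

# Least-squares estimate of (a, b), computed stably via lstsq.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated intercept and slope:", beta_hat)
```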
N. H. Bingham, John M. Fry
4. Further Multilinear Regression
Abstract
For one regressor x, simple linear regression is fine for fitting straight-line trends. But what about more general trends – quadratic trends, for example? (E.g. height against time for a body falling under gravity is quadratic.) Or cubic trends? (E.g.: the van der Waals equation of state in physical chemistry.) Or quartic? – etc.
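A hedged illustration of this point, with simulated falling-body data rather than anything from the text: a quadratic trend is still a linear model, because it is linear in its parameters, so the same least-squares machinery applies once the powers of t are added as regressors.

```python
# Fitting a quadratic trend (height against time for a falling body) as a
# linear model in the parameters. Data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0, 40)
g = 9.81
h = 100.0 - 0.5 * g * t**2 + rng.normal(scale=0.2, size=t.size)

# Regressors 1, t, t^2: ordinary least squares on the augmented design matrix.
T = np.column_stack([np.ones_like(t), t, t**2])
coef, *_ = np.linalg.lstsq(T, h, rcond=None)
print("fitted constant, linear, quadratic terms:", coef)   # quadratic term near -g/2
```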
N. H. Bingham, John M. Fry
5. Adding additional covariates and the Analysis of Covariance
Abstract
Suppose that having fitted the regression model
$$\vec{y} = X\beta + \epsilon, $$
(M0)
we wish to introduce q additional explanatory variables into our model. The augmented regression model, \(M_A\) say, becomes
$$\vec{y} = X\beta + Z\gamma + \epsilon. $$
(MA)
We rewrite this as
$$\begin{array}{rcl} \vec{y} = X\beta + Z\gamma + \epsilon & = & (X, Z){(\beta, \gamma)}^{T} + \epsilon \\ & = & W\delta + \epsilon, \end{array}$$
say, where
$$\begin{array}{rcl} W := (X,Z),\quad \delta := \left (\begin{array}{c} \beta \\ \gamma \end{array} \right )\!.& & \\ \end{array}$$
Here X is n×p and assumed to be of rank p, Z is n×q of rank q, and the columns of Z are linearly independent of the columns of X. This final assumption means that there is a sense in which the q additional explanatory variables are adding genuinely new information to that already contained in the pre-existing X matrix.
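As a rough numerical sketch of this construction (dimensions and data invented for illustration, not taken from the book), the augmented design matrix W is formed by appending the columns of Z to those of X, and the stacked parameter δ = (β, γ) is then estimated in a single least-squares fit.

```python
# The augmented model: the q extra explanatory variables in Z are appended
# column-wise to X, giving W = (X, Z), and delta = (beta, gamma) is estimated at once.
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # n x p, rank p
Z = rng.normal(size=(n, q))                                      # n x q, new information
beta_true = np.array([1.0, 2.0, -1.0])
gamma_true = np.array([0.5, 0.5])
y = X @ beta_true + Z @ gamma_true + rng.normal(scale=0.3, size=n)

W = np.hstack([X, Z])                       # W := (X, Z)
delta_hat, *_ = np.linalg.lstsq(W, y, rcond=None)
beta_hat, gamma_hat = delta_hat[:p], delta_hat[p:]
print("beta estimate:", beta_hat, "gamma estimate:", gamma_hat)
```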
N. H. Bingham, John M. Fry
6. Linear Hypotheses
Abstract
We have seen several examples of hypotheses on models encountered so far. For example, in polynomial regression (§4.1) we met, for a polynomial model of degree k, the hypothesis that the degree was at most k−1 (that is, that the leading coefficient was zero). In Chapter 5, we encountered nested models, for example two general lines, including two parallel lines. We then met the hypothesis that the slopes were in fact equal (and so the lines were parallel). We can also conduct a statistical check of structural constraints (for instance, that the angles of a triangle sum to two right angles – see Exercise 6.5).
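One way to make such a test concrete, offered only as a sketch and not as the book's own development: compare the residual sums of squares of the restricted model (parallel lines, equal slopes) and the full model (separate slopes) through an F statistic. The grouping variable, simulated data and SciPy call below are illustrative assumptions.

```python
# Testing a linear hypothesis by comparing nested models: the full model lets
# two groups have different slopes, the restricted model forces equal slopes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(0, 10, n)
group = rng.integers(0, 2, n)                                      # indicator of the second line
y = 1.0 + 0.8 * x + 0.5 * group + rng.normal(scale=1.0, size=n)    # truly parallel lines

def rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

X_full = np.column_stack([np.ones(n), x, group, x * group])   # separate slopes
X_restr = np.column_stack([np.ones(n), x, group])             # common slope
rss_full, rss_restr = rss(X_full, y), rss(X_restr, y)

q = 1                                          # one constraint: equal slopes
df_resid = n - X_full.shape[1]
F = ((rss_restr - rss_full) / q) / (rss_full / df_resid)
p_value = stats.f.sf(F, q, df_resid)
print("F statistic:", F, "p-value:", p_value)
```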
N. H. Bingham, John M. Fry
7. Model Checking and Transformation of Data
Abstract
In the above, we have assumed several things: (i) the mean μ = Ey is a linear function of the regressors, or of the parameters; (ii) the errors are additive; (iii) the errors are independent; (iv) the errors are normally distributed (Gaussian); (v) the errors have equal variance.
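As an illustrative check of assumptions (iv) and (v), not drawn from the book itself, one can inspect the residuals of a fitted model: compare their ordered values with normal quantiles, and compare their spread across the range of the regressor. The simulated data below are purely for demonstration.

```python
# Residual diagnostics after a least-squares fit: a rough normality check
# (assumption (iv)) and a rough equal-variance check (assumption (v)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 80
x = np.linspace(1, 10, n)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Normality: correlation between sorted residuals and standard normal quantiles.
probs = (np.arange(1, n + 1) - 0.5) / n
qq_corr = np.corrcoef(np.sort(resid), stats.norm.ppf(probs))[0, 1]

# Equal variance: compare residual spread over the two halves of the x range.
s1, s2 = resid[: n // 2].std(ddof=1), resid[n // 2 :].std(ddof=1)
print("QQ correlation:", qq_corr, "spread ratio:", s2 / s1)
```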
N. H. Bingham, John M. Fry
8. Generalised Linear Models
Abstract
In previous chapters, we have studied the model
$$y = A\beta + \epsilon, $$
where the mean Ey = Aβ depends linearly on the parameters β, the errors are normal (Gaussian), and the errors are additive. We have also seen (Chapter 7) that in some situations, a transformation of the problem may help to correct some departure from our standard model assumptions. For example, in §7.3 on variance-stabilising transformations, we transformed our data from y to some function g(y), to make the variance constant (at least approximately). We did not there address the effect on the error structure of so doing. Of course, \(g(y) = g(A\beta + \epsilon )\) as above will not have an additive Gaussian error structure any more, even approximately, in general.
N. H. Bingham, John M. Fry
9. Other topics
Abstract
In §5.1 we considered extending our initial model \(M_0\), with p parameters, to an augmented model \(M_A\) with a further q parameters. Here, as in Chapter 2, we have \(p + q \ll n\): there are many fewer parameters than data points. We now turn to a situation with some similarities but with important contrasts. Here our initial model has fixed effects, but our augmented model adds random effects, which may be comparable in number to the sample size n.
N. H. Bingham, John M. Fry
Backmatter
Metadata
Title
Regression
Authors
N. H. Bingham
John M. Fry
Copyright Year
2010
Publisher
Springer London
Electronic ISBN
978-1-84882-969-5
Print ISBN
978-1-84882-968-8
DOI
https://doi.org/10.1007/978-1-84882-969-5
