
About this Book

This book provides a concise and integrated overview of hypothesis testing in four important subject areas, namely linear and nonlinear models, multivariate analysis, and large sample theory. The approach used is a geometrical one based on the concept of projections and their associated idempotent matrices, thus largely avoiding the need to involve matrix ranks. It is shown that all the hypotheses encountered are either linear or asymptotically linear, and that all the underlying models used are either exactly or asymptotically linear normal models. This equivalence can be used, for example, to extend the concept of orthogonality to other models in the analysis of variance, and to show that the asymptotic equivalence of the likelihood ratio, Wald, and Score (Lagrange Multiplier) hypothesis tests generally applies.

Table of Contents

Frontmatter

Chapter 1. Preliminaries

Abstract
Linear algebra is used extensively throughout this book, and those topics particularly relevant to the development in this monograph are given within the chapters; other results are given in the Appendix. References to the Appendix are labeled with the prefix “A”; for example, A.3 is Theorem 3 in the Appendix. Vectors and matrices are denoted by boldface letters a and A, respectively, and scalars are denoted by italics. For example, \(\mathbf{a} = (a_{i})\) is a vector with ith element \(a_{i}\) and \(\mathbf{A} = (a_{ij})\) is a matrix with (i, j)th element \(a_{ij}\). I shall use the same notation with random variables, because using uppercase for random variables and lowercase for their values can cause confusion with vectors and matrices. We endeavor, however, to help the reader by using the lowercase letters in the latter half of the alphabet, namely u, v, …, z, with the occasional exception (because of common usage) for random variables, and the rest of the alphabet for constants. All vectors and matrices contain real elements, that is, belong to \(\mathbb{R}\), and we denote n-dimensional Euclidean space by \(\mathbb{R}^{n}\).
George A. F. Seber

Chapter 2. The Linear Hypothesis

Abstract
In this chapter we consider a number of linear hypotheses before giving a general definition. Our first example is found in regression analysis.
George A. F. Seber

Chapter 3. Estimation

Abstract
Suppose we have the model \(\mathbf{y} =\boldsymbol{\theta } +\boldsymbol{\varepsilon }\), where \(\mathrm{E}[\boldsymbol{\varepsilon }] = \mathbf{0}\), \(\mathrm{Var}[\boldsymbol{\varepsilon }] =\sigma ^{2}\mathbf{I}_{n}\), and \(\boldsymbol{\theta }\in \varOmega\), a p-dimensional vector space. One reasonable estimate of \(\boldsymbol{\theta }\) would be the value \(\hat{\boldsymbol{\theta }}\), called the least squares estimate, that minimizes the total “error” sum of squares
$$\displaystyle{SS =\sum _{ i=1}^{n}\varepsilon _{ i}^{2} =\parallel \mathbf{y}-\boldsymbol{\theta }\parallel ^{2}}$$
subject to \(\boldsymbol{\theta }\in \varOmega\). A clue as to how we might calculate \(\hat{\boldsymbol{\theta }}\) comes from considering the simple case in which y is a point P in three dimensions and Ω is a plane through the origin O. We have to find the point Q (\(=\hat{\boldsymbol{\theta }}\)) in the plane so that \(PQ^{2}\) is a minimum; this is obviously the case when OQ is the orthogonal projection of OP onto the plane. This idea can now be generalized in the following theorem.
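As a minimal numerical sketch of this projection idea (not from the book; the design matrix X, whose columns span Ω, and the data y are invented for illustration), the least squares estimate can be computed as \(\hat{\boldsymbol{\theta }} = \mathbf{P}_{\varOmega }\mathbf{y}\), where \(\mathbf{P}_{\varOmega } = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\) is the symmetric idempotent matrix projecting orthogonally onto Ω:

import numpy as np

# Hypothetical full-rank design matrix whose columns span a p-dimensional subspace Omega of R^n
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])            # n = 4, p = 2
y = np.array([1.1, 1.9, 3.2, 3.9])    # invented response vector

# Orthogonal projection matrix onto Omega = C(X); symmetric and idempotent
P_Omega = X @ np.linalg.inv(X.T @ X) @ X.T
theta_hat = P_Omega @ y                # least squares estimate of theta = E[y]

# The same fitted values via the normal equations, for comparison
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_hat, X @ beta_hat)
assert np.allclose(P_Omega @ P_Omega, P_Omega)   # idempotency: P^2 = P
print(theta_hat)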
George A. F. Seber

Chapter 4. Hypothesis Testing

Abstract
Given the model \(\mathbf{y} \sim N_{n}(\boldsymbol{\theta },\sigma ^{2}\mathbf{I}_{n})\) and assumption G that \(\boldsymbol{\theta }\in \varOmega\), a p-dimensional subspace of \(\mathbb{R}^{n}\), we wish to test the linear hypothesis \(H:\boldsymbol{\theta }\in \omega\), where ω is a (p − q)-dimensional subspace of Ω.
George A. F. Seber

Chapter 5. Inference Properties

Abstract
We assume the model \(\mathbf{y} =\boldsymbol{\theta } +\boldsymbol{\varepsilon }\), \(G:\boldsymbol{\theta }\in \varOmega\), a p-dimensional vector space in \(\mathbb{R}^{n}\), and \(H:\boldsymbol{\theta }\in \omega\), a (p − q)-dimensional subspace of Ω; \(\boldsymbol{\varepsilon }\) is \(N_{n}[\mathbf{0},\sigma ^{2}\mathbf{I}_{n}]\). To test H we choose a region W, called the critical region, and we reject H if and only if y ∈ W. The power of the test, \(\beta (W,\boldsymbol{\theta })\), is defined to be the probability of rejecting H when \(\boldsymbol{\theta }\) is the true value of \(\mathrm{E}[\mathbf{y}]\). Thus,
$$\displaystyle{\beta (W,\boldsymbol{\theta }) =\Pr [\mathbf{y} \in W\vert \boldsymbol{\theta }]}$$
and is a function of W and \(\boldsymbol{\theta }\). The size of a critical region W is \(\sup _{\boldsymbol{\theta }\in \omega }\beta (W,\boldsymbol{\theta })\), and if \(\beta (W,\boldsymbol{\theta }) =\alpha\) for all \(\boldsymbol{\theta }\in \omega\), then W is said to be a similar region of size α. If W is of size α and \(\beta (W,\boldsymbol{\theta }) \geq \alpha\) for every \(\boldsymbol{\theta }\in \varOmega -\omega\) (the set of all points in Ω which are not in ω), then W is said to be unbiased. In particular, if we have the strict inequality \(\beta (W,\boldsymbol{\theta }) >\alpha\) for \(\boldsymbol{\theta }\in \varOmega -\omega\), then W is said to be consistent. Finally, we define W to be a uniformly most powerful (UMP) critical region of a given class C if W ∈ C and if, for any W′ ∈ C and all \(\boldsymbol{\theta }\in \varOmega -\omega\),
$$\displaystyle{\beta (W,\boldsymbol{\theta }) \geq \beta (W',\boldsymbol{\theta }).}$$
Obviously a wide choice of W is possible for testing H, and so we would endeavor to choose a critical region which has some, or if possible all, of the desirable properties mentioned above, namely similarity, unbiasedness or consistency, and being UMP within certain reasonable classes of critical regions. Other criteria such as invariance are also used (Lehmann and Romano 2005). The F-test for H, given by
$$\displaystyle{F = \frac{f_{2}} {f_{1}} \frac{\mathbf{y}'(\mathbf{P}_{\varOmega } -\mathbf{P}_{\omega })\mathbf{y}} {\mathbf{y}'(\mathbf{I}_{n} -\mathbf{P}_{\varOmega })\mathbf{y}},}$$
where \(f_{1} = q\) and \(f_{2} = n - p\), provides such a critical region, \(W_{0}\) say, and we now consider some properties of \(W_{0}\).
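The statistic can be computed directly from the two projection matrices. The following is a minimal sketch (not from the book, with invented data): Ω is spanned by a three-column regression design, ω drops the last column so that q = 1, and the dropped coefficient is in fact zero, so H is true here.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, q = 30, 3, 1

def proj(X):
    # Orthogonal projection matrix onto the column space of X
    return X @ np.linalg.inv(X.T @ X) @ X.T

X_Omega = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # spans Omega
X_omega = X_Omega[:, :p - q]                                        # spans omega (H drops the last column)
y = X_Omega @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.7, size=n)

P_Omega, P_omega = proj(X_Omega), proj(X_omega)
f1, f2 = q, n - p
F = (f2 / f1) * (y @ (P_Omega - P_omega) @ y) / (y @ (np.eye(n) - P_Omega) @ y)
p_value = stats.f.sf(F, f1, f2)   # under H, F has the F_{q, n-p} distribution
print(F, p_value)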
George A. F. Seber

Chapter 6. Testing Several Hypotheses

Abstract
Let \(\boldsymbol{\theta }\) be an unknown vector parameter, let G be the hypothesis that \(\boldsymbol{\theta }\in \varOmega\), a p-dimensional vector space in \(\mathbb{R}^{n}\), and assume that \(\mathbf{y} \sim N_{n}[\boldsymbol{\theta },\sigma ^{2}\mathbf{I}_{n}]\).
George A. F. Seber

Chapter 7. Enlarging the Model

Abstract
Sometimes after a linear model has been fitted it is realized that more explanatory (x) variables need to be added, as in the following examples.
In an industrial experiment in which the response (y) is the yield and the explanatory variables are temperature, pressure, etc., we may wish to determine what values of the x-variables are needed to produce a certain yield. However, it may be realized that another variable, say concentration, needs to be incorporated in the regression model. This can readily be done using a standard regression computational package. In this case the added variable is quantitative and is easily incorporated into the original model.
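A minimal sketch of this enlargement (variable names and data invented, not from the book): the extra explanatory variable simply becomes an additional column of the design matrix, and the model is refitted.

import numpy as np

rng = np.random.default_rng(1)
n = 20
temperature = rng.uniform(150, 250, n)
pressure = rng.uniform(1, 5, n)
concentration = rng.uniform(0.1, 0.9, n)         # the variable realized to be needed later
y = 2 + 0.01 * temperature + 0.5 * pressure + 3 * concentration + rng.normal(0, 0.2, n)

X_old = np.column_stack([np.ones(n), temperature, pressure])
X_new = np.column_stack([X_old, concentration])  # enlarge the model by one column

beta_old, *_ = np.linalg.lstsq(X_old, y, rcond=None)
beta_new, *_ = np.linalg.lstsq(X_new, y, rcond=None)
print(beta_old)    # fit without concentration
print(beta_new)    # fit with the added explanatory variable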
George A. F. Seber

Chapter 8. Nonlinear Regression Models

Abstract
Nonlinear models arise when E[y] is a nonlinear function of unknown parameters. Hypotheses about these parameters may be linear or nonlinear. Such models tend to be used when they are suggested by theoretical considerations or when nonlinear behavior needs to be built into the model. Even when a linear approximation works well, a nonlinear model may still be used to retain a clear interpretation of the parameters. Once we have established a nonlinear relationship, the next problem is how to incorporate the “error” term \(\varepsilon\). Sometimes a nonlinear relationship can be transformed into a linear one, but in doing so we may end up with an error term that has awkward properties. In this case it is usually better to work with the nonlinear model. These kinds of problems are demonstrated by several examples.
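As an invented illustration of the last point (not one of the book's examples), consider the exponential model \(y =\beta _{0}e^{\beta _{1}x}+\varepsilon\) with additive error. Taking logarithms gives a model that is linear in \(\log y\), but the implied error is then multiplicative rather than additive, so the transformed fit answers a slightly different question than direct nonlinear least squares.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 25)
y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.3, size=x.size)   # additive error

# Direct nonlinear least squares keeps the additive-error assumption
f = lambda x, b0, b1: b0 * np.exp(b1 * x)
(b0_nl, b1_nl), _ = curve_fit(f, x, y, p0=(1.0, 0.5))

# Log transform: log y = log b0 + b1 x is linear, but the implied error is multiplicative
slope, intercept = np.polyfit(x, np.log(y), 1)
b0_lin, b1_lin = np.exp(intercept), slope

print((b0_nl, b1_nl), (b0_lin, b1_lin))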
George A. F. Seber

Chapter 9. Multivariate Models

Abstract
Up till now we have been considering various univariate linear models of the form \(y_{i} =\theta _{i} +\varepsilon _{i}\) (i = 1, 2, …, n), where \(E[\varepsilon _{i}] = 0\) and the \(\varepsilon _{i}\) are independently and identically distributed. We assumed G that \(\boldsymbol{\theta }\in \varOmega\), where Ω is a p-dimensional vector space in \(\mathbb{R}^{n}\). A natural extension to this is to replace the response variable \(y_{i}\) by a 1 × d row vector of response variables \(\mathbf{y}_{i}'\), and replace the vector \(\mathbf{y} = (y_{i})\) by the data matrix
$$\displaystyle{\mathbf{Y} = \left (\begin{array}{c} \mathbf{y}_{1}' \\ \mathbf{y}_{2}'\\ \vdots \\ \mathbf{y}_{n}' \end{array} \right ) = (\mathbf{y}^{(1)},\mathbf{y}^{(2)},\ldots,\mathbf{y}^{(d)}),}$$
say. Here \(\mathbf{y}^{(j)}\) (j = 1, 2, …, d) represents n independent observations on the jth variable of y. Writing \(\mathbf{y}^{(j)} =\boldsymbol{\theta } ^{(j)} + \mathbf{u}^{(j)}\) with \(E[\mathbf{u}^{(j)}] = \mathbf{0}\), we now have d univariate models, which will generally not be independent, and we can combine them into one equation giving us
$$\displaystyle{\mathbf{Y} =\boldsymbol{\varTheta } +\mathbf{U},}$$
where \(\boldsymbol{\varTheta }= (\boldsymbol{\theta }^{(1)},\boldsymbol{\theta }^{(2)},\ldots,\boldsymbol{\theta }^{(d)})\), \(\mathbf{U} = (\mathbf{u}^{(1)},\mathbf{u}^{(2)},\ldots,\mathbf{u}^{(d)})\), and E[U] = 0. Of particular interest are vector extensions of experimental designs where each observation is replaced by a vector observation. For example, we can extend the randomized block design
$$\displaystyle{\theta _{ij} =\mu +\alpha _{i} +\tau _{j}\quad (i = 1,2,\ldots,I;j = 1,2,\ldots,J),}$$
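As a minimal numerical sketch of fitting such a multivariate model (invented data, not the book's treatment), a common design matrix X is used for all d responses, and the least squares estimate of Θ is obtained by projecting every column of Y onto C(X), that is \(\hat{\boldsymbol{\varTheta }} = \mathbf{P}_{\varOmega }\mathbf{Y}\):

import numpy as np

rng = np.random.default_rng(3)
I_blk, J_trt, d = 3, 4, 2             # blocks, treatments, response dimension
n = I_blk * J_trt

# Design matrix of the univariate randomized block model theta_ij = mu + alpha_i + tau_j
blocks = np.repeat(np.arange(I_blk), J_trt)
treats = np.tile(np.arange(J_trt), I_blk)
X = np.column_stack([np.ones(n),
                     (blocks[:, None] == np.arange(I_blk)).astype(float),
                     (treats[:, None] == np.arange(J_trt)).astype(float)])

Y = rng.normal(size=(n, d))           # invented n x d data matrix

# X is not of full column rank, so use a pseudoinverse; X (X'X)^+ X' is still
# the orthogonal projection onto C(X), and it is applied to every column of Y at once
P_Omega = X @ np.linalg.pinv(X.T @ X) @ X.T
Theta_hat = P_Omega @ Y               # n x d matrix of fitted values
print(Theta_hat.shape)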
George A. F. Seber

Chapter 10. Large Sample Theory: Constraint-Equation Hypotheses

Abstract
Apart from Chap. 8 on nonlinear models, we have been considering linear models and hypotheses. We now wish to extend those ideas to nonlinear hypotheses based on samples of n independent observations \(x_{1},x_{2},\ldots,x_{n}\) (these may be vectors) from a general probability density function \(f(x,\boldsymbol{\theta })\), where \(\boldsymbol{\theta }= (\theta _{1},\theta _{2},\ldots,\theta _{p})'\) and \(\boldsymbol{\theta }\) is known to belong to W, a subset of \(\mathbb{R}^{p}\). We wish to test the null hypothesis H that \(\boldsymbol{\theta }_{T}\), the true value of \(\boldsymbol{\theta }\), belongs to \(W_{H}\), a subset of W, given that n is large. We saw in previous chapters that there are two ways of specifying H: either in the form of “constraint” equations such as \(\mathbf{a}(\boldsymbol{\theta }) = (a_{1}(\boldsymbol{\theta }),a_{2}(\boldsymbol{\theta }),\ldots,a_{q}(\boldsymbol{\theta }))' = \mathbf{0}\), or in the form of “freedom” equations \(\boldsymbol{\theta }=\boldsymbol{\theta } (\boldsymbol{\alpha })\), where \(\boldsymbol{\alpha }= (\alpha _{1},\alpha _{2},\ldots,\alpha _{p-q})'\), or perhaps by a combination of both constraint and freedom equations. Although to any freedom-equation specification there will correspond a constraint-equation specification and vice versa, this relationship is often difficult to derive in practice, and therefore the two forms will be dealt with separately in this and the next chapter.
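As a simple invented illustration (not from the book) with p = 2 and q = 1, the hypothesis that the two components of \(\boldsymbol{\theta }= (\theta _{1},\theta _{2})'\) are equal can be specified either by the single constraint equation
$$\displaystyle{a_{1}(\boldsymbol{\theta }) =\theta _{1} -\theta _{2} = 0,}$$
or by the freedom equations \(\boldsymbol{\theta }=\boldsymbol{\theta } (\alpha ) = (\alpha,\alpha )'\) with the single parameter α. A nonlinear variant such as \(a_{1}(\boldsymbol{\theta }) =\theta _{1}\theta _{2} - 1 = 0\), equivalently \(\boldsymbol{\theta }= (\alpha,1/\alpha )'\) with α ≠ 0, already suggests why passing between the two forms can be awkward in general.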
George A. F. Seber

Chapter 11. Large Sample Theory: Freedom-Equation Hypotheses

Abstract
In this chapter we assume once again that \(\boldsymbol{\theta }\in W\). However, our hypothesis H now takes the form of freedom equations, namely \(\boldsymbol{\theta }=\boldsymbol{\theta } (\boldsymbol{\alpha })\), where \(\boldsymbol{\alpha }= (\alpha _{1},\alpha _{2},\ldots,\alpha _{p-q})'\). We require the following additional notation. Let \(\boldsymbol{\varTheta }_{\boldsymbol{\alpha }}\) be the \(p \times (p - q)\) matrix with (i, j)th element \(\partial \theta _{i}/\partial \alpha _{j}\), which we assume to have rank \(p - q\). As before, \(L(\boldsymbol{\theta }) =\log \prod _{ i=1}^{n}f(x_{i},\boldsymbol{\theta })\) is the log likelihood function. Let \(\mathbf{D}_{\boldsymbol{\theta }}L(\boldsymbol{\theta })\) and \(\mathbf{D}_{\boldsymbol{\alpha }}L(\boldsymbol{\theta }(\boldsymbol{\alpha }))\) be the column vectors whose ith elements are \(\partial L(\boldsymbol{\theta })/\partial \theta _{i}\) and \(\partial L(\boldsymbol{\theta }(\boldsymbol{\alpha }))/\partial \alpha _{i}\), respectively. As before, \(\mathbf{B}_{\boldsymbol{\theta }}\) is the p × p information matrix with (i, j)th element
$$\displaystyle{-n^{-1}E_{\boldsymbol{\theta }}\left [\frac{\partial ^{2}L(\boldsymbol{\theta })} {\partial \theta _{i}\partial \theta _{j}} \right ] = -E\left [\frac{\partial ^{2}\log \,f(x,\boldsymbol{\theta })} {\partial \theta _{i}\partial \theta _{j}} \right ],}$$
and we add \(\mathbf{B}_{\boldsymbol{\alpha }}\), the \((p - q) \times (p - q)\) information matrix with (i, j)th element \(-E[\partial ^{2}\log \,f(x,\boldsymbol{\theta }(\boldsymbol{\alpha }))/\partial \alpha _{i}\partial \alpha _{j}]\). To simplify the notation we use \([\cdot ]_{\boldsymbol{\alpha }}\) to denote that the matrix in square brackets is evaluated at \(\boldsymbol{\alpha }\), for example
$$\displaystyle{\mathbf{B}_{\boldsymbol{\alpha }} = [\boldsymbol{\varTheta }'\mathbf{B}_{\boldsymbol{\theta }}\boldsymbol{\varTheta }]_{\alpha } =\boldsymbol{\varTheta } _{\boldsymbol{\alpha }}'\mathbf{B}_{\boldsymbol{\theta }(\boldsymbol{\alpha })}\boldsymbol{\varTheta }_{\boldsymbol{\alpha }}.}$$
We note that
$$\displaystyle{\mathbf{D}_{\boldsymbol{\alpha }}L(\boldsymbol{\theta }(\boldsymbol{\alpha })) =\boldsymbol{\varTheta } _{\boldsymbol{\alpha }}'\mathbf{D}_{\boldsymbol{\theta }}L(\boldsymbol{\theta }(\boldsymbol{\alpha })).}$$
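For readers who want the intermediate step, the last display is simply the chain rule written in matrix form; element-wise,
$$\displaystyle{\frac{\partial L(\boldsymbol{\theta }(\boldsymbol{\alpha }))} {\partial \alpha _{j}} =\sum _{ i=1}^{p}\frac{\partial \theta _{i}(\boldsymbol{\alpha })} {\partial \alpha _{j}} \left [\frac{\partial L(\boldsymbol{\theta })} {\partial \theta _{i}} \right ]_{\boldsymbol{\theta }=\boldsymbol{\theta }(\boldsymbol{\alpha })},\quad j = 1,2,\ldots,p - q.}$$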
George A. F. Seber

Chapter 12. Multinomial Distribution

Abstract
In this chapter we consider asymptotic theory for the multinomial distribution, which is defined below. Although the distribution used is singular, the approximating linear theory can still be used.
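As a hedged illustration of what such large-sample theory underpins (a standard example, not necessarily the book's own development), Pearson's chi-square statistic tests a completely specified set of cell probabilities; the loss of one degree of freedom reflects the singular, sum-to-one constraint on the multinomial. All values below are invented.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p0 = np.array([0.2, 0.3, 0.5])        # hypothesized cell probabilities
n = 500
counts = rng.multinomial(n, p0)       # simulated multinomial sample

# Pearson's chi-square statistic; under H it is asymptotically chi^2 with k - 1 df
X2 = np.sum((counts - n * p0) ** 2 / (n * p0))
p_value = stats.chi2.sf(X2, df=len(p0) - 1)
print(X2, p_value)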
George A. F. Seber

Backmatter
