
## About this book

This monograph reviews some of the work that has been done for longitudinal data in the rapidly expanding field of nonparametric regression. The aim is to give the reader an impression of the basic mathematical tools that have been applied, and also to provide intuition about the methods and applications. Applications to the analysis of longitudinal studies are emphasized to encourage the non-specialist and the applied statistician to try these methods out. To facilitate this, FORTRAN programs are provided which carry out some of the procedures described in the text. So far, most research work has emphasized the theoretical aspects of nonparametric regression. It is my hope that these techniques will gain a firm place in the repertoire of applied statisticians who realize their large potential for convincing applications and the need to use them alongside parametric regression. This text evolved from a set of lectures given by the author at the Division of Statistics at the University of California, Davis, in Fall 1986. It is based on the author's Habilitationsschrift submitted to the University of Marburg in Spring 1985, as well as on published and unpublished work. Completeness is attempted neither in the text nor in the references. The following persons have been particularly generous in sharing research or giving advice: Th. Gasser, P. Ihm, Y. P. Mack, V. Mammitzsch, G. G. Roussas, U. Stadtmüller, W. Stute and R.

## Table of contents

### 1. Introduction

Abstract
If we analyse longitudinal data, we are usually interested in estimating the underlying curve that produces the observed measurements. This curve describes the time course of some measured quantity, such as blood pressure after exercise or the height growth of children. If, as usual, the individual measurements taken at different time points are noisy, we have to employ a statistical method to estimate the curve. The classical method here is parametric regression. We specify a class of regression functions depending on finitely many parameters, the so-called “parametric model”. Such a model is then fitted to the data by estimating the parameters. This is usually done by least squares or, if realistic assumptions on the distribution of the measurement errors are available, by maximum likelihood (Draper and Smith, 1980). For regression models that are nonlinear in the parameters, an iterative numerical algorithm has to be employed to obtain the parameter estimates as solutions of the normal equations. This can lead to computational difficulties when we deal with sophisticated nonlinear models.
Hans-Georg Müller

### 2. Longitudinal Data and Regression Models

Abstract
There exist several kinds of longitudinal data, i.e., measurements (observations) of the same quantity (occurrence) on the same subject at different time points, and each kind requires different methods of analysis. We will be concerned with time course data, i.e., quantitative measurements such as those of interest in growth processes, in physiological processes and in the assessment of the time course of a disease by means of laboratory parameters. Other longitudinal data that are sometimes of interest are event data, like the timing of deaths, allograft rejections or heart attacks. These are usually analysed with statistical methods for point processes and survival analysis. Further longitudinal biomedical data are the so-called biosignals EEG (electroencephalogram) and ECG (electrocardiogram). For the analysis of the EEG, one can adopt methods from time series analysis, whereas the ECG mainly poses classification and discrimination problems. The appropriate methods for time course data are regression or time series models. If samples of time courses are studied, a classical approach is the so-called growth curve analysis, which basically consists of multivariate analysis of variance techniques (see e.g. Goldstein, 1986). Our approach here, however, is different. We estimate individual time courses on the basis of a (parametric or nonparametric) regression model and use specific features of these individual estimates to draw inferences about samples.
Hans-Georg Müller

### 3. Nonparametric Regression Methods

Abstract
Besides kernel estimators, commonly used nonparametric regression estimators are local least squares estimators and smoothing splines. We also discuss orthogonal series estimators, which have been applied mainly in density estimation. All these estimators are localized weighted averages of the data, i.e., linear in the observations $Y_i$. The general form is
$$\hat g_{L}(t) = \sum_{i=1}^{n} W_{i,n}(t)\, Y_{i,n}$$
with weight functions $W_{i,n}(t)$; the estimators differ only in their weight functions. As we will see, the estimators considered do not differ much, and asymptotically they are all equivalent to more or less complicated kernel estimators. Kernel estimators are therefore very general, and they are also the method that is most easily understood intuitively.
Hans-Georg Müller
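To make the general form concrete, here is a minimal Python sketch of one linear smoother. It assumes design points in $[0, 1]$, an Epanechnikov kernel, and convolution-type (Gasser-Müller-style) weights, where $W_i(t)$ integrates the rescaled kernel over the interval "owned" by $t_i$. The helper names `epanechnikov_integral`, `gm_weights` and `linear_smoother` are hypothetical. This is an illustration, not a reproduction of the book's estimator (4.4) or its FORTRAN code.

```python
from __future__ import annotations


def epanechnikov_integral(x: float) -> float:
    """Integral of K(v) = 0.75 * (1 - v**2) from 0 to x, with x clipped to [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return 0.75 * (x - x ** 3 / 3.0)


def gm_weights(t: float, design: list[float], b: float) -> list[float]:
    """Hypothetical helper: weights W_i(t) for sorted design points in [0, 1].

    Design point t_i "owns" the interval [s_{i-1}, s_i] with s_0 = 0, s_n = 1 and
    midpoints in between; W_i(t) is the integral of K((t - u) / b) / b over it.
    """
    n = len(design)
    s = [0.0] + [(design[i] + design[i + 1]) / 2.0 for i in range(n - 1)] + [1.0]
    return [
        epanechnikov_integral((t - s[i]) / b) - epanechnikov_integral((t - s[i + 1]) / b)
        for i in range(n)
    ]


def linear_smoother(t: float, design: list[float], y: list[float], b: float) -> float:
    """g_hat(t) = sum_i W_i(t) * Y_i: a weighted average, linear in the observations."""
    return sum(w * yi for w, yi in zip(gm_weights(t, design, b), y))


if __name__ == "__main__":
    design = [(i + 0.5) / 50 for i in range(50)]
    print(round(sum(gm_weights(0.5, design, 0.2)), 6))  # interior point: 1.0
    print(round(sum(gm_weights(0.0, design, 0.2)), 6))  # left boundary: 0.5
    print(linear_smoother(0.5, design, [3.0] * 50, 0.2))  # close to 3.0
```

In the interior the weights sum to one, so constants are reproduced. Near $t = 0$ the kernel window is cut off and the weights sum to less than one, which is one form of the boundary effect that needs separate treatment.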

### 4. Kernel and Local Weighted Least Squares Methods

Abstract
From now on, we assume that in the model (2.1)
$$Y_{i,n} = g(t_{i,n}) + \varepsilon_{i,n}, \quad i = 1, \ldots, n,$$
where the index $n$ is usually omitted, the design points satisfy $0 \le t_1 \le t_2 \le \ldots \le t_n \le 1$ without loss of generality. As in (2.1), the errors are assumed to be i.i.d. with $E\varepsilon_i = 0$ and $E\varepsilon_i^2 = \sigma^2 < \infty$. For most considerations in Chapters 4–6, uncorrelatedness is sufficient.
Hans-Georg Müller

### 5. Optimization of Kernel and Weighted Local Regression Methods

Abstract
Optimization here means minimization of the asymptotically leading term of the IMSE. Since the asymptotic expression for the IMSE is the same for both kernel and weighted local least squares methods, optimization considerations are also the same.
Hans-Georg Müller
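For orientation, here is the standard calculation in the simplest setting. Its assumptions are not taken from the text:

- an equidistant design on $[0, 1]$;
- a symmetric nonnegative kernel $K$ on $[-1, 1]$ with $\mu_2 = \int v^2 K(v)\,dv$;
- estimation of $g$ itself (no derivatives);
- boundary regions ignored.

The book treats kernels of general order and derivatives, so this is only the second-order special case.

```latex
\begin{aligned}
\mathrm{IMSE}(b) &\approx \frac{b^{4}}{4}\,\mu_2^{2}\int_0^1 g''(t)^2\,dt
  + \frac{\sigma^{2}}{n b}\int_{-1}^{1} K(v)^2\,dv,\\
\frac{d}{db}\,\mathrm{IMSE}(b) &= b^{3}\mu_2^{2}\int_0^1 g''(t)^2\,dt
  - \frac{\sigma^{2}}{n b^{2}}\int_{-1}^{1} K(v)^2\,dv = 0\\
\Longrightarrow\quad b^{*} &= \left( \frac{\sigma^{2}\int K^{2}}{n\,\mu_2^{2}\int (g'')^{2}} \right)^{1/5},
\qquad \mathrm{IMSE}(b^{*}) = O\!\left(n^{-4/5}\right).
\end{aligned}
```

The first term is the integrated squared bias, which grows like $b^4$. The second is the integrated variance, which grows like $1/(nb)$. Balancing the two fixes both the rate and the optimal bandwidth. The unknowns $\sigma^2$ and $\int (g'')^2$ in $b^*$ are the reason data-driven bandwidth choice is needed in practice.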

### 6. Multivariate Kernel Estimators

Abstract
The kernel estimate (4.4) can be generalized to the case of a multivariate regression function $g: A \to \mathbb{R}$, where $A \subset \mathbb{R}^m$, $m \ge 1$. The proofs can usually be generalized from the univariate case without difficulty. There are, however, some genuinely new features in the multivariate situation. One is the sparsity of data, a problem that becomes dramatically worse as the dimension increases. Boundary effects are much more complicated. The choice of bandwidths and kernels requires further considerations. For example, one must decide whether to use the same bandwidth in all directions and whether to use product kernels, i.e., products of univariate functions.
Hans-Georg Müller
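A product kernel with one bandwidth per direction can be sketched as follows. For simplicity, this sketch swaps in the Nadaraya-Watson (ratio) form of the estimator rather than the multivariate generalization of (4.4). The names `product_kernel` and `nw_estimate` are hypothetical. The estimator returns `nan` when no design point falls in the window, which is the sparsity problem in its most extreme form.

```python
from __future__ import annotations

import math


def epanechnikov(u: float) -> float:
    """Univariate Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def product_kernel(x: tuple[float, ...], bandwidths: tuple[float, ...]) -> float:
    """K_b(x) = prod_j K(x_j / b_j) / b_j, with a separate bandwidth per direction."""
    value = 1.0
    for xj, bj in zip(x, bandwidths):
        value *= epanechnikov(xj / bj) / bj
    return value


def nw_estimate(t, points, y, bandwidths) -> float:
    """Nadaraya-Watson estimate of g(t) for t in R^m (nan if no data are in the window)."""
    weights = [
        product_kernel(tuple(tj - pj for tj, pj in zip(t, p)), bandwidths) for p in points
    ]
    total = sum(weights)
    if total == 0.0:
        return math.nan  # sparsity: the local window contains no design points
    return sum(w * yi for w, yi in zip(weights, y)) / total


if __name__ == "__main__":
    points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
    y = [1.0 + 2.0 * p[0] - p[1] for p in points]
    print(nw_estimate((0.5, 0.5), points, y, (0.25, 0.25)))  # near 1.5
```

With the same bandwidth $b$ in every direction, an interior window covers about $(2b)^m$ of the unit cube. For $b = 0.2$ that is 0.4 when $m = 1$ but only about 0.01 when $m = 5$.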

### 7. Choice of Global and Local Bandwidths

Abstract
For practical applications of curve smoothing methods, the choice of a good smoothing parameter is a very important issue. For kernel and weighted local least squares estimators, this is the choice of the bandwidth. Besides the choice of the correct order of the kernel or polynomial, the bandwidth has a strong influence on the quality of the estimate. Loosely speaking, the smoothing parameter provides information about the signal-to-noise ratio in the data. Strongly oscillating measurements can be due either to a strongly oscillating curve with small measurement errors or to a very smooth curve with large measurement errors. In many finite sample situations it is very difficult to decide which case applies, and hence to correctly use a small bandwidth in the first case and a large bandwidth in the second. Therefore, a completely satisfactory finite sample solution of the bandwidth choice problem is not possible. The methods proposed for bandwidth choice are motivated by asymptotic considerations. A comprehensive simulation survey of the finite sample behavior of the various methods of bandwidth choice does not seem to exist so far.
Hans-Georg Müller
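As an illustration of an asymptotically motivated, data-driven choice, here is a minimal sketch of leave-one-out cross-validation over a grid of candidate bandwidths. It assumes a Nadaraya-Watson fit with an Epanechnikov kernel, so it is not the book's criterion (7.11) for its own estimator. The function names are hypothetical. The simulated data (a sine curve plus Gaussian noise) exist only to make the example runnable.

```python
from __future__ import annotations

import math
import random


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def loo_fit(i: int, t: list[float], y: list[float], b: float) -> float | None:
    """Nadaraya-Watson fit at t[i] that leaves observation i out (None if the window is empty)."""
    num = den = 0.0
    for j, (tj, yj) in enumerate(zip(t, y)):
        if j != i:
            w = epanechnikov((t[i] - tj) / b)
            num += w * yj
            den += w
    return num / den if den > 0.0 else None


def cv_score(t: list[float], y: list[float], b: float) -> float:
    """Mean squared leave-one-out prediction error for bandwidth b."""
    total = 0.0
    for i in range(len(t)):
        fit = loo_fit(i, t, y, b)
        if fit is None:
            return math.inf  # b is too small: some point has no neighbours
        total += (y[i] - fit) ** 2
    return total / len(t)


def choose_bandwidth(t: list[float], y: list[float], candidates: list[float]) -> float:
    """Candidate bandwidth with the smallest cross-validation score."""
    return min(candidates, key=lambda b: cv_score(t, y, b))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the simulated example is reproducible
    n = 100
    t = [(i + 0.5) / n for i in range(n)]
    y = [math.sin(2 * math.pi * ti) + rng.gauss(0.0, 0.3) for ti in t]
    grid = [0.02 * k for k in range(1, 21)]  # 0.02, 0.04, ..., 0.40
    print(choose_bandwidth(t, y, grid))
```

Leaving observation $i$ out prevents the trivial choice of a tiny bandwidth that simply interpolates the noise. Its outcome on a single finite sample can still be variable, in line with the caveats above.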

### 8. Longitudinal Parameters

Abstract
In biomedical settings, a common problem is the comparison and description of samples of curves. Assuming there are $N$ subjects and $n_j$ measurements are made for the $j$-th subject, we might describe the situation by the following model:
$$Y_{ij} = g_j(t_{ij}) + \varepsilon_{ij}, \quad j = 1, \ldots, N, \; i = 1, \ldots, n_j,$$
where the functions $g_j$ are assumed to be random processes. More specific assumptions and the problem of estimating a “longitudinal average” curve are discussed e.g. in Müller and Ihm (1985). Such a curve is to be distinguished from an ordinary cross-sectional average curve, which would not represent a “typical” time course since, e.g., peaks would be unreasonably broadened. Another approach to samples of curves is shape-invariant modelling (Lawton, Sylvestre and Maggio, 1972; Stützle et al, 1980; Kneip and Gasser, 1986). It makes only the minimal assumption that the curves are composed of some “invariant shapes” subject to different scalings. The nonparametric shapes as well as the scaling parameters are found by iteratively improving the model. At each step, residuals belonging to “corresponding” times are pooled over the sample of curves, and the model improvement is estimated by a spline function. Other related proposals have been made; e.g., principal components techniques for stochastic processes have been tried (Castro, Lawton and Sylvestre, 1986).
Hans-Georg Müller
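The remark that a cross-sectional average broadens peaks can be checked numerically. The sketch below assumes three hypothetical bell-shaped curves that are identical except for the timing of their peak. It compares one of them with their pointwise (cross-sectional) average. It does not implement the longitudinal average of Müller and Ihm (1985).

```python
import math


def spurt_curve(t: float, peak_time: float) -> float:
    """Hypothetical bell-shaped curve with a peak of height 1 at peak_time."""
    return math.exp(-((t - peak_time) ** 2))


def half_max_width(ts, values):
    """Length of the time range where the curve reaches at least half its maximum."""
    half = max(values) / 2.0
    above = [t for t, v in zip(ts, values) if v >= half]
    return max(above) - min(above)


ts = [k * 0.01 for k in range(1201)]  # time grid from 0 to 12
peak_times = [5.0, 6.0, 7.0]  # the subjects differ only in the timing of the peak
curves = [[spurt_curve(t, p) for t in ts] for p in peak_times]
cross_sectional = [sum(column) / len(column) for column in zip(*curves)]

for label, values in [("individual", curves[0]), ("cross-sectional", cross_sectional)]:
    print(label, round(max(values), 2), round(half_max_width(ts, values), 2))
```

The averaged curve has a lower and wider peak than every individual curve, although all subjects share exactly the same shape. Averaging after aligning the curves on their individual peak times would preserve that shape.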

### 9. Nonparametric Estimation of the Human Height Growth Curve

Abstract
As an example of an application of some of the methods discussed before, the analysis of the human height growth curve by nonparametric regression methods is considered. The data analysed were obtained in the Zurich Longitudinal Growth Study (1955–78), which was already discussed in Section 2.3. The nonparametric analysis of these data is published in Largo et al (1978) and Gasser et al (1984a,b; 1985a,b). This chapter summarizes and discusses the results of the latter four papers. Of special interest for growth curves is the estimation of derivatives. Further topics are the comparison between parametric and nonparametric models and between smoothing splines and kernel estimators, the definition of longitudinal parameters, and the phenomenon of growth spurts.
Hans-Georg Müller
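Derivative estimation and a longitudinal parameter such as the time of peak velocity can be sketched together. The sketch assumes Gasser-Müller-style weights with an Epanechnikov kernel on a rescaled time axis $[0, 1]$. Differentiating the estimate with respect to $t$ then gives a weighted sum of kernel values at the interval boundaries. The names `gm_derivative` and `peak_velocity_time` and the logistic test curve are hypothetical. This is not the procedure of Gasser et al, which uses kernels tailored to derivatives.

```python
from __future__ import annotations

import math


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gm_derivative(t: float, design: list[float], y: list[float], b: float) -> float:
    """First derivative of a Gasser-Mueller-type kernel estimate at t.

    Differentiating g_hat(t) = sum_i Y_i * int_{s_{i-1}}^{s_i} K((t - u) / b) / b du
    with respect to t gives (1 / b) * sum_i Y_i * [K((t - s_{i-1}) / b) - K((t - s_i) / b)].
    """
    n = len(design)
    s = [0.0] + [(design[i] + design[i + 1]) / 2.0 for i in range(n - 1)] + [1.0]
    total = sum(
        yi * (epanechnikov((t - s[i]) / b) - epanechnikov((t - s[i + 1]) / b))
        for i, yi in enumerate(y)
    )
    return total / b


def peak_velocity_time(design, y, b, grid):
    """Grid time with the largest estimated velocity: a simple longitudinal parameter."""
    return max(grid, key=lambda t: gm_derivative(t, design, y, b))


if __name__ == "__main__":
    design = [(i + 0.5) / 200 for i in range(200)]  # rescaled time in [0, 1]
    # Hypothetical noise-free logistic "growth" curve with its steepest rise at 0.6.
    y = [1.0 / (1.0 + math.exp(-(x - 0.6) / 0.05)) for x in design]
    grid = [0.2 + 0.005 * k for k in range(121)]
    print(round(peak_velocity_time(design, y, 0.1, grid), 3))
```

The grid is kept away from the boundaries because derivative estimates there suffer most from boundary effects.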

### 10. Further Applications

Abstract
The remarks made here concern typical problems in the medical field which can equally be encountered in other fields of application. Longitudinal medical data are collected not only to describe and assess the dynamics of some time-dependent physiological or pathological process. They are also collected for patient monitoring and for classification with respect to prognosis. The data for the prognosis problem would usually consist of a vector of covariates, like age, sex and age at diagnosis, plus a vector of longitudinal observations per patient. The basic idea is then to extract a few longitudinal parameters from the time course data and to add them to the vector of (stationary) covariates. These vectors are then subjected to discriminant analysis techniques, with the aim of selecting the variables that best separate the groups with good and bad prognosis. One possible method is CART (Breiman et al, 1982; compare Grossmann, 1985). It has some appealing features in a medical context, like the ease of classifying a new case by means of a classification tree. Besides classical longitudinal parameters, the variability of the observations, as measured by $\hat{\sigma}$ (7.1), (7.2), can also be of interest for classification purposes (with prognosis as a special case). The same holds for more complicated functionals of the curves, which would be estimated by evaluating the corresponding functional of the estimated curves. The parameters should be extracted and selected with the ultimate goal of minimizing the misclassification rate, which is usually estimated by a cross-validation procedure (see Breiman et al, 1982).
Hans-Georg Müller
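The estimators (7.1), (7.2) for $\hat{\sigma}$ are not reproduced here. As a stand-in, the sketch below uses the first-difference variance estimator associated with Rice (1984), $\hat\sigma^2 = \sum_i (Y_{i+1} - Y_i)^2 / (2(n-1))$. It appends this estimate to a patient's stationary covariates. The names `rice_sigma` and `feature_vector`, the covariate layout and the simulated patient are hypothetical. The classification step (e.g. CART) is not shown.

```python
from __future__ import annotations

import math
import random


def rice_sigma(y: list[float]) -> float:
    """First-difference estimate of the error SD; y must be ordered by measurement time.

    sigma_hat^2 = sum_i (Y_{i+1} - Y_i)^2 / (2 (n - 1)); for a smooth curve sampled
    densely, differencing removes almost all of the trend, so no curve fit is needed.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ss = sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1))
    return math.sqrt(ss / (2.0 * (n - 1)))


def feature_vector(covariates: list[float], y: list[float]) -> list[float]:
    """Hypothetical layout: stationary covariates followed by sigma_hat of the time course."""
    return list(covariates) + [rice_sigma(y)]


if __name__ == "__main__":
    rng = random.Random(7)
    times = [i / 50 for i in range(50)]
    course = [10.0 + 3.0 * t + rng.gauss(0.0, 0.5) for t in times]  # simulated patient
    print(feature_vector([54.0, 1.0, 49.0], course))  # age, sex code, age at diagnosis
```

Each difference $Y_{i+1} - Y_i$ has variance close to $2\sigma^2$ when the curve changes little between neighbouring time points, which explains the factor 2 in the denominator.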

### 11. Consistency Properties of Moving Weighted Averages

Abstract
We consider here the usual fixed design regression model
$$Y_{i,n} = g(t_{i,n}) + \varepsilon_{i,n}$$
with triangular array errors $\varepsilon_{i,n}$, i.i.d. for each $n$, $E\varepsilon_{i,n} = 0$, and $g: A \to \mathbb{R}$, $A \subset \mathbb{R}^m$, corresponding to the multivariate case (6.1). The notation and assumptions are the same as in 6.1.
Hans-Georg Müller

### 12. Fortran Routines for Kernel Smoothing and Differentiation

Abstract
The programs listed below are suited for kernel estimation and differentiation ($\nu = 0, \ldots, 3$) with estimators (4.4). Various kernels of different orders can be chosen, and there are two options for bandwidth choice:

- FAC-CV combines the factor method for bandwidth choice for derivatives (7.17) with cross-validation (7.11) for $\nu = 0$, and corresponds to CV for $\nu = 0$.
- FAC-R combines (7.17) with the Rice criterion (7.12) for $\nu = 0$.

The simulation study reported in 7.4 indicates that FAC-R yields the best bandwidth choice for derivatives. The program can handle nonequidistant data. It provides two options for boundary modifications: keeping the same bandwidth as in the interior, or using an increased (stationary) bandwidth in the boundary regions (see 5.8).
Hans-Georg Müller

### Backmatter
