Abstract
Twin studies are useful for assessing the relative importance of genetic (heritable) and environmental components. In this paper we develop a methodology to study the heritability of age-at-onset or lifespan traits, with application to the analysis of twin survival data. Because of the limited period of observation, the data can be left truncated and right censored (LTRC). Under the LTRC setting we propose a genetic mixed linear model, which allows general fixed predictors and random components to capture genetic and environmental effects. Inferences are based on the hierarchical likelihood (h-likelihood), which provides a statistically efficient and unified framework for various mixed-effect models. We also propose a simple and fast computation method for dealing with large data sets. The method is illustrated using survival data from the Swedish Twin Registry. Finally, a simulation study is carried out to evaluate its performance.
References
Agresti A, Caffo B, Ohman-Strickland P (2004) Example in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653
Chernoff H (1954) On the distribution of the likelihood ratio. Ann Math Stat 25:573–578
Ha ID, Lee Y (2005a) Multilevel mixed linear models for survival data. Lifetime Data Anal 11:131–142
Ha ID, Lee Y (2005b) Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models. Biometrika 92:717–723
Ha ID, Lee Y, Song J-K (2002) Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Anal 8:163–176
Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447
Hougaard P (2000) Analysis of multivariate survival data. Springer-Verlag, New York
Klein JP, Pelz C, Zhang M (1999) Modelling random effects for censored data by a multivariate normal regression model. Biometrics 55:497–506
Lai TL, Ying Z (1994) A missing information principle and M-estimators in regression analysis with censored and truncated data. Ann Stat 22:1222–1255
Lambert P, Collett D, Kimber A, Johnson R (2004) Parametric accelerated failure time models with random effects and an application to kidney transplant survival. Stat Med 23:3177–3192
Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J Roy Stat Soc Ser B 58:619–678
Lee Y, Nelder JA (2001a) Modelling and analysing correlated non-normal data. Stat Modell 1:3–16
Lee Y, Nelder JA (2001b) Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88:987–1006
Neale MC, Cardon LR (1992) Methodology for genetic studies of twins and families. Kluwer Academic, Dordrecht
Noh M, Lee Y (2007) REML estimation for binary data in GLMMs. J Multivariate Anal 98:896–915
Pawitan Y, Reilly M, Nilsson E, Cnattingius S, Lichtenstein P (2004) Estimation of genetic and environmental factors for binary traits using family data. Stat Med 23:449–465
Self SG, Liang KY (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
Sham PC (1998) Statistics in human genetics. Arnold, London
Stram DO, Lee JW (1994) Variance components testing in the longitudinal mixed effects model. Biometrics 50:1171–1177
Tobin J (1958) Estimation of relationships for limited dependent variables. Econometrica 26:24–36
Vu HTV, Knuiman MW (2002) A hybrid ML-EM algorithm for calculation of maximum likelihood estimates in semiparametric shared frailty models. Comput Stat Data Anal 40:173–187
Wienke A, Holm NV, Christensen K, Skytthe A, Vaupel JW, Yashin AI (2003) The heritability of cause-specific mortality: a correlated gamma-frailty model applied to mortality due to respiratory disease in Danish twins born 1870–1930. Stat Med 22:3873–3887
Wolfinger RD (1993) Covariance structure selection in general mixed models. Commun Stat Simul Comput 22:1079–1106
Xue X (2001) Analysis of childhood brain tumour data in New York City using frailty. Stat Med 20:3459–3473
Yashin AI, Iachine IA (1995) How long can humans live? Lower bound for biological limit of human longevity calculated from Danish twin data using correlated frailty model. Mech Ageing Dev 80:147–169
Yashin AI, Iachine IA, Harris JR (1999) Half of the variation in susceptibility to mortality is genetic: findings from Swedish twin survival data. Behav Genet 29:11–19
Zhi X, Grambsch PM, Eberly LE (2005) Likelihood ratio test for the variance component in a semi-parametric shared gamma frailty model. Research Report 2005–005, Division of Biostatistics, University of Minnesota
Acknowledgements
We are grateful to the Swedish Twin Registry for providing us with the dataset used in this paper. This work was supported by a Korea Research Foundation Grant (KRF-2003-002-C00045).
Additional information
Edited by Pak Sham
Appendices
Appendix A
Derivation of model (2)
From \(v_{ij} = g_{ij} + c_{i0}\) for \(j = 1, 2\), model (1) can be expressed in the simple matrix form
\[ T_i = X_i\beta + Z_i v_i + \varepsilon_i, \tag{A.1} \]
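where \(T_i = (T_{i1}, T_{i2})^T\), \(X_i = (x_{i1}, x_{i2})^T\) is the \(2 \times p\) model matrix of \(\beta\), \(Z_i\) is the model matrix of \(v_i\), \(\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2})^T \sim N(0, \sigma_\varepsilon^2 I_2)\), and \(I_2\) is the \(2 \times 2\) identity matrix. For the ith MZ pair (\(\mathrm{MZ}_i\)), \(Z_i = (1, 1)^T\) and \(v_i\,(= v_{i1} = v_{i2}) \sim N(0, \sigma_v^2)\), but for the ith DZ pair (\(\mathrm{DZ}_i\)), \(Z_i = I_2\) and \(v_i = (v_{i1}, v_{i2})^T \sim N(0, \sigma_v^2 \Sigma_i)\), where \(\Sigma_i\) has the compound symmetric structure
\[ \Sigma_i = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. \]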
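Here
\[ \sigma_v^2 = \sigma_g^2 + \sigma_c^2 \tag{A.2} \]
and, because DZ twins share on average half of their additive genetic variance and all of the common environment,
\[ \rho = \frac{\sigma_g^2/2 + \sigma_c^2}{\sigma_g^2 + \sigma_c^2}, \tag{A.3} \]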
where \(\rho \in [0.5, 1.0]\). This parameterization is convenient for interpretation. From (A.3), \(\sigma_g^2 \gg \sigma_c^2\) as \(\rho\) approaches 0.5, whereas \(\sigma_g^2 \ll \sigma_c^2\) as \(\rho\) approaches 1.0. In particular, model (A.1) reduces to model (1) without the random environmental effects \(c_{ij}\) when \(\rho = 0.5\) (i.e., \(\sigma_c^2 = 0\)), and to model (1) without the random genetic effects \(g_{ij}\) when \(\rho = 1.0\) (i.e., \(\sigma_g^2 = 0\)).
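The mapping between the variance components and \((\sigma_v^2, \rho)\) can be checked numerically. The Python sketch below assumes \(\sigma_v^2 = \sigma_g^2 + \sigma_c^2\) and \(\rho = (\sigma_g^2/2 + \sigma_c^2)/\sigma_v^2\), i.e. that DZ twins share half of the additive genetic variance and all of the common environment; the function names are hypothetical. It reproduces the two limiting cases above and inverts the mapping.

```python
def rho_from_components(sigma2_g: float, sigma2_c: float) -> tuple[float, float]:
    """Return (sigma2_v, rho) for a DZ pair, assuming v_ij = g_ij + c_i0."""
    sigma2_v = sigma2_g + sigma2_c
    if sigma2_v <= 0:
        raise ValueError("total random-effect variance must be positive")
    # DZ twins share half of the additive genetic variance and all of c_i0.
    rho = (0.5 * sigma2_g + sigma2_c) / sigma2_v
    return sigma2_v, rho


def components_from_rho(sigma2_v: float, rho: float) -> tuple[float, float]:
    """Invert the mapping: sigma2_g = 2(1 - rho) sigma2_v, sigma2_c = (2 rho - 1) sigma2_v."""
    if not 0.5 <= rho <= 1.0:
        raise ValueError("rho must lie in [0.5, 1.0]")
    return 2.0 * (1.0 - rho) * sigma2_v, (2.0 * rho - 1.0) * sigma2_v


if __name__ == "__main__":
    print(rho_from_components(1.0, 0.0))  # purely genetic: rho = 0.5
    print(rho_from_components(0.0, 1.0))  # purely shared environment: rho = 1.0
    sigma2_v, rho = rho_from_components(0.6, 0.4)
    print(components_from_rho(sigma2_v, rho))  # recovers (0.6, 0.4) up to rounding
```

Working on the \((\sigma_v^2, \rho)\) scale keeps the constraint simple: any \(\rho \in [0.5, 1.0]\) yields nonnegative \(\sigma_g^2\) and \(\sigma_c^2\).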
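Following Lee and Nelder (2001a), the random effects \(v_i\) for \(\mathrm{DZ}_i\) are assumed to have the form \(L_i(\rho)u_i\), where \(u_i \sim N(0, \sigma_v^2 I_2)\). For \(\mathrm{DZ}_i\), the Cholesky decomposition gives a lower triangular matrix \(L_i\) such that \(\Sigma_i = L_i L_i^T\). Here we choose
\[ L_i = L_i(\rho) = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}, \]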
and so the random effects are \(v_i = L_i u_i \sim N(0, \sigma_v^2 L_i L_i^T)\).
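The closed-form factor \(L_i(\rho)\) can be compared with a numerical Cholesky decomposition, and the claim \(v_i = L_i u_i \sim N(0, \sigma_v^2 \Sigma_i)\) checked by simulation. This sketch assumes the factor \(L_i(\rho) = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}\), the lower Cholesky factor of the compound symmetric \(\Sigma_i\); the helper names, seed and sample size are illustrative choices.

```python
import numpy as np


def dz_correlation(rho: float) -> np.ndarray:
    """Compound symmetric 2 x 2 matrix Sigma_i for a DZ pair."""
    return np.array([[1.0, rho], [rho, 1.0]])


def dz_cholesky(rho: float) -> np.ndarray:
    """Closed-form lower-triangular L_i(rho) with L_i L_i^T = Sigma_i."""
    return np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])


rho = 0.7
L = dz_cholesky(rho)
# The closed form agrees with NumPy's Cholesky routine, which returns the lower factor.
assert np.allclose(L, np.linalg.cholesky(dz_correlation(rho)))

# Draw u_i ~ N(0, sigma2_v I_2) and map each draw to v_i = L_i u_i.
rng = np.random.default_rng(2007)
sigma2_v = 1.0
u = rng.normal(scale=np.sqrt(sigma2_v), size=(100_000, 2))
v = u @ L.T  # row k equals (L @ u[k])^T
print(np.corrcoef(v, rowvar=False))  # off-diagonal entries should be close to rho
```

The closed form remains valid at \(\rho = 1\), where \(\Sigma_i\) is singular and a numerical routine such as `np.linalg.cholesky` may refuse to factor it.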
Thus, model (A.1) can be written as
\[ T_i = X_i\beta + Z_i^* u_i + \varepsilon_i, \tag{A.4} \]
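where \(u_i \sim N(0, \sigma_v^2 I_k)\), with \(Z_i^* = (1, 1)^T\) and \(I_k = 1\) for \(\mathrm{MZ}_i\), and \(Z_i^* = L_i(\rho)\) and \(I_k = I_2\) for \(\mathrm{DZ}_i\). Note that from (A.2) and (A.3) we obtain \(\sigma_g^2\) and \(\sigma_c^2\) as follows:
\[ \sigma_g^2 = 2(1-\rho)\,\sigma_v^2, \qquad \sigma_c^2 = (2\rho - 1)\,\sigma_v^2. \]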
The jth element of model (A.4) then gives model (2).
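To see what model (A.4) implies for data, the sketch below simulates complete (uncensored, untruncated) twin pairs and compares the within-pair correlation of \(T_i - X_i\beta\) with its theoretical value: \(\sigma_v^2/(\sigma_v^2 + \sigma_\varepsilon^2)\) for MZ pairs and \(\rho\sigma_v^2/(\sigma_v^2 + \sigma_\varepsilon^2)\) for DZ pairs. The design (an intercept plus one standard-normal covariate), the parameter values and the function names are hypothetical; left truncation and right censoring are deliberately left out.

```python
import numpy as np


def z_star(zygosity: str, rho: float) -> np.ndarray:
    """Z*_i: (1, 1)^T for an MZ pair, L_i(rho) for a DZ pair."""
    if zygosity == "MZ":
        return np.ones((2, 1))
    return np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])


def simulate_pairs(n_pairs, zygosity, beta, sigma2_v, sigma2_eps, rho, rng):
    """Simulate T_i = X_i beta + Z*_i u_i + eps_i; returns X (n_pairs, 2, p) and T (n_pairs, 2)."""
    Z = z_star(zygosity, rho)
    k = Z.shape[1]  # one random effect per MZ pair, two per DZ pair
    # Hypothetical design: intercept plus one standard-normal covariate per individual.
    X = np.stack([np.ones((n_pairs, 2)), rng.normal(size=(n_pairs, 2))], axis=-1)
    u = rng.normal(scale=np.sqrt(sigma2_v), size=(n_pairs, k))
    eps = rng.normal(scale=np.sqrt(sigma2_eps), size=(n_pairs, 2))
    T = X @ beta + u @ Z.T + eps
    return X, T


rng = np.random.default_rng(1)
beta = np.array([4.0, 0.5])
sigma2_v, sigma2_eps, rho = 1.0, 0.5, 0.7
for zyg in ("MZ", "DZ"):
    X, T = simulate_pairs(50_000, zyg, beta, sigma2_v, sigma2_eps, rho, rng)
    resid = T - X @ beta  # remove the fixed part using the true beta
    r = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
    expected = (1.0 if zyg == "MZ" else rho) * sigma2_v / (sigma2_v + sigma2_eps)
    print(zyg, round(r, 3), round(expected, 3))
```

With these values the theoretical correlations are 2/3 for MZ pairs and 0.7 × 2/3 for DZ pairs, so the simulated values should land close to them.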
Appendix B
Proofs of the score equations (7) and (8) and computation of the variance of \(\widehat{\beta}\)
Let \(\mu\) be the \(n \times 1\) vector with ijth element \(\mu_{ij}\),
\[ \mu = X\beta + Z^* u, \]
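where \(X = (X_1^T, \ldots, X_q^T)^T\) is the \(n \times p\) model matrix for the \(p \times 1\) fixed effects \(\beta\), and \(Z^* = \mathrm{blockdiag}(Z_1^*, \ldots, Z_q^*)\) is the \(n \times q^*\) block diagonal matrix for the \(q^* \times 1\) random effects \(u = (u_1^T, \ldots, u_q^T)^T\). Here \(q^* = q_1 + 2q_2\), where \(q_1\) is the number of MZ twin pairs and \(q_2\) the number of DZ twin pairs; note that \(q = q_1 + q_2\). Let \(y^* = (y_1^{*T}, \ldots, y_q^{*T})^T\) be the \(n \times 1\) vector whose ith subvector is \(y_i^* = (y_{i1}^*, y_{i2}^*)^T\). Assume that \(\rho\) is known. Given \(\theta = (\sigma_\varepsilon^2, \sigma_v^2)^T\) and \(y^*\), from (5) and (6) the score equations for the MHLEs of \(\tau = (\beta^T, u^T)^T\) become Henderson's (1975) mixed-model equations with pseudo-response variables \(y^*\):
\[ \begin{pmatrix} X^T X & X^T Z^* \\ Z^{*T} X & Z^{*T} Z^* + \Lambda \end{pmatrix} \begin{pmatrix} \beta \\ u \end{pmatrix} = \begin{pmatrix} X^T y^* \\ Z^{*T} y^* \end{pmatrix}, \tag{B.1} \]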
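where \(\Lambda = \lambda I_{q^*}\), \(\lambda = \sigma_\varepsilon^2/\sigma_v^2\) and \(I_{q^*}\) is the \(q^* \times q^*\) identity matrix. Equation (B.1) can be expressed as the two equations
\[ X^T X\beta + X^T Z^* u = X^T y^*, \tag{B.2} \]
\[ Z^{*T} X\beta + (Z^{*T} Z^* + \Lambda)\,u = Z^{*T} y^*. \tag{B.3} \]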
Substituting \(X = (X_1^T, \ldots, X_q^T)^T\), \(Z^* = \mathrm{blockdiag}(Z_1^*, \ldots, Z_q^*)\) and \(y^* = (y_1^{*T}, \ldots, y_q^{*T})^T\) into (B.2) and (B.3) reduces them to Eqs. (7) and (8).
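Given \(\lambda\) and \(y^*\), equations (B.1) form an ordinary linear system. The Python sketch below solves it directly and, as an alternative, by a pair-wise iteration that exploits the block diagonal \(Z^*\): given \(\beta\), the equations for \(u\) separate into one \(1 \times 1\) or \(2 \times 2\) system per twin pair. This two-block Gauss-Seidel scheme is our illustration and is not necessarily the fast computation method proposed in the paper. The data are simulated and complete, so the observed response stands in for the pseudo-response \(y^*\) (whose construction from (5) and (6) is not reproduced here); all names and values are hypothetical.

```python
import numpy as np


def build_design(zygosities, rho, X_pairs):
    """Stack the X_i (X_pairs has shape (q, 2, p)) and form Z* = blockdiag(Z*_1, ..., Z*_q)."""
    dz_block = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])  # L_i(rho)
    blocks = [np.ones((2, 1)) if z == "MZ" else dz_block for z in zygosities]
    n, q_star = 2 * len(blocks), sum(b.shape[1] for b in blocks)
    Z = np.zeros((n, q_star))
    col = 0
    for i, b in enumerate(blocks):  # rows ordered pair by pair, member 1 then member 2
        Z[2 * i:2 * i + 2, col:col + b.shape[1]] = b
        col += b.shape[1]
    return X_pairs.reshape(n, -1), Z, blocks


def solve_mme(X, Z, y, lam):
    """Solve the mixed-model equations (B.1) directly, with Lambda = lam * I."""
    p, q_star = X.shape[1], Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(q_star)]])
    sol = np.linalg.solve(lhs, np.concatenate([X.T @ y, Z.T @ y]))
    return sol[:p], sol[p:]


def solve_mme_pairwise(X, y, blocks, lam, max_iter=1000, tol=1e-12):
    """Two-block Gauss-Seidel on (B.2)-(B.3): update every u_i from its own pair, then beta."""
    q = len(blocks)
    Xp, yp = X.reshape(q, 2, -1), y.reshape(q, 2)
    # Per-pair matrices (Z*_i^T Z*_i + lam I)^{-1} Z*_i^T, at most 2 x 2.
    smoothers = [np.linalg.solve(b.T @ b + lam * np.eye(b.shape[1]), b.T) for b in blocks]
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)  # start from ordinary least squares
    for _ in range(max_iter):
        resid = yp - Xp @ beta  # y*_i - X_i beta, shape (q, 2)
        zu = np.stack([b @ (S @ r) for b, S, r in zip(blocks, smoothers, resid)])
        beta_new = np.linalg.solve(XtX, X.T @ (y - zu.ravel()))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Recompute u at the final beta so that (B.3) holds exactly.
    resid = yp - Xp @ beta
    u = np.concatenate([S @ r for S, r in zip(smoothers, resid)])
    return beta, u


rng = np.random.default_rng(0)
q, rho, lam = 300, 0.7, 0.5  # lam = sigma2_eps / sigma2_v = 0.5 / 1.0
zyg = rng.choice(["MZ", "DZ"], size=q)
X_pairs = np.stack([np.ones((q, 2)), rng.normal(size=(q, 2))], axis=-1)  # intercept + covariate
X, Z, blocks = build_design(zyg, rho, X_pairs)
y = (X @ np.array([4.0, 0.5]) + Z @ rng.normal(size=Z.shape[1])
     + rng.normal(scale=np.sqrt(0.5), size=X.shape[0]))
b_direct, u_direct = solve_mme(X, Z, y, lam)
b_pair, u_pair = solve_mme_pairwise(X, y, blocks, lam)
print(b_direct, b_pair)
```

Because the coefficient matrix in (B.1) is symmetric positive definite when \(X\) has full column rank and \(\lambda > 0\), the two-block iteration converges to the direct solution; each sweep costs time linear in the number of pairs, whereas the direct solve works with a \((p + q^*) \times (p + q^*)\) matrix.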
The asymptotic covariance matrix for \(\widehat{\tau} - \tau\) is given by \(D^{-1}\), with
where
Here \(W = \mathrm{diag}(w_{ij})\) is the \(n \times n\) diagonal matrix with ijth element \(w_{ij} = \delta_{ij} + (1 - \delta_{ij})\,\xi(m_{ij}) - \xi(m_{ij}^*)\), and \(\xi(x) = V(x)\{V(x) - x\}\). Thus the upper left-hand corner of \(D^{-1}\) in (B.4) gives the variance matrix of \(\widehat{\beta}\), which can also be computed easily for large samples as follows. Let \(H_{11}\) be the upper left-hand corner of \(H^{-1}\) in (B.4). Then we have
where
Here \(W_i\) is the ith diagonal block of \(W\), corresponding to twin pair i.
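Since the exact matrices in (B.4) are not reproduced above, the following sketch only illustrates the block-inverse (Schur complement) identity that allows an upper-left block such as \(H_{11}\) to be accumulated twin pair by twin pair. It assumes, hypothetically, that \(H\) has the familiar mixed-model form \(\begin{pmatrix} X^T W X & X^T W Z^* \\ Z^{*T} W X & Z^{*T} W Z^* + \Lambda \end{pmatrix}\) with positive weights; the weights and design are simulated, and any scale factor relating \(H\) and \(D\) is ignored. Under that assumption the upper-left block of \(H^{-1}\) equals \(\{\sum_i (X_i^T W_i X_i - X_i^T W_i Z_i^* (Z_i^{*T} W_i Z_i^* + \lambda I_k)^{-1} Z_i^{*T} W_i X_i)\}^{-1}\).

```python
import numpy as np


def h11_direct(X, Z, w, lam):
    """Upper-left p x p block of H^{-1}, forming the hypothetical H explicitly."""
    W = np.diag(w)
    p, q_star = X.shape[1], Z.shape[1]
    H = np.block([[X.T @ W @ X, X.T @ W @ Z],
                  [Z.T @ W @ X, Z.T @ W @ Z + lam * np.eye(q_star)]])
    return np.linalg.inv(H)[:p, :p]


def h11_pairwise(X, w, blocks, lam):
    """Same block via the Schur complement, accumulated one twin pair at a time.

    Because Z* and W are block diagonal by pair, the lower-right block of H is
    block diagonal, so only 1 x 1 or 2 x 2 systems ever need to be solved.
    """
    p = X.shape[1]
    S = np.zeros((p, p))
    for i, Zi in enumerate(blocks):
        Xi, Wi = X[2 * i:2 * i + 2], np.diag(w[2 * i:2 * i + 2])
        A = Zi.T @ Wi @ Zi + lam * np.eye(Zi.shape[1])
        B = Xi.T @ Wi @ Zi
        S += Xi.T @ Wi @ Xi - B @ np.linalg.solve(A, B.T)
    return np.linalg.inv(S)


rng = np.random.default_rng(0)
q, rho, lam = 200, 0.7, 0.5
dz_block = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])
blocks = [np.ones((2, 1)) if rng.random() < 0.5 else dz_block for _ in range(q)]
n, q_star = 2 * q, sum(b.shape[1] for b in blocks)
Z = np.zeros((n, q_star))
col = 0
for i, b in enumerate(blocks):
    Z[2 * i:2 * i + 2, col:col + b.shape[1]] = b
    col += b.shape[1]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(0.2, 1.0, size=n)  # hypothetical positive weights standing in for w_ij
print(np.allclose(h11_direct(X, Z, w, lam), h11_pairwise(X, w, blocks, lam)))
```

The pair-wise version never forms an \(n \times n\) or \(q^* \times q^*\) matrix, which is what makes this kind of computation feasible for large registries.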
Cite this article
Ha, I.D., Lee, Y. & Pawitan, Y. Genetic Mixed Linear Models for Twin Survival Data. Behav Genet 37, 621–630 (2007). https://doi.org/10.1007/s10519-007-9150-7