Genetic Mixed Linear Models for Twin Survival Data

  • Original Paper
  • Behavior Genetics

Abstract

Twin studies are useful for assessing the relative importance of the genetic (heritable) and environmental components of a trait. In this paper we develop a methodology for studying the heritability of age-at-onset or lifespan traits, with application to the analysis of twin survival data. Because the observation period is limited, the data can be left truncated and right censored (LTRC). Under the LTRC setting we propose a genetic mixed linear model, which allows general fixed predictors and random components to capture genetic and environmental effects. Inferences are based on the hierarchical likelihood (h-likelihood), which provides a statistically efficient and unified framework for various mixed-effect models. We also propose a simple and fast computational method for dealing with large data sets. The method is illustrated with survival data from the Swedish Twin Registry. Finally, a simulation study is carried out to evaluate its performance.

References

  • Agresti A, Caffo B, Ohman-Strickland P (2004) Example in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653

  • Chernoff H (1954) On the distribution of the likelihood ratio. Ann Math Stat 25:573–578

  • Ha ID, Lee Y (2005a) Multilevel mixed linear models for survival data. Lifetime Data Anal 11:131–142

  • Ha ID, Lee Y (2005b) Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models. Biometrika 92:717–723

  • Ha ID, Lee Y, Song J-K (2002) Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Anal 8:163–176

  • Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447

  • Hougaard P (2000) Analysis of multivariate survival data. Springer-Verlag, New York

  • Klein JP, Pelz C, Zhang M (1999) Modelling random effects for censored data by a multivariate normal regression model. Biometrics 55:497–506

  • Lambert P, Collett D, Kimber A, Johnson R (2004) Parametric accelerated failure time models with random effects and an application to kidney transplant survival. Stat Med 23:3177–3192

  • Lai TL, Ying Z (1994) A missing information principle and M-estimators in regression analysis with censored and truncated data. Ann Stat 22:1222–1255

  • Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J Roy Stat Soc Ser B 58:619–678

  • Lee Y, Nelder JA (2001a) Modelling and analysing correlated non-normal data. Stat Modell 1:3–16

  • Lee Y, Nelder JA (2001b) Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88:987–1006

  • Neale MC, Cardon LR (1992) Methodology for genetic studies of twins and families. Kluwer Academic, Dordrecht

  • Noh M, Lee Y (2007) REML estimation for binary data in GLMMs. J Multivariate Anal 98:896–915

  • Pawitan Y, Reilly M, Nilsson E, Cnattingius S, Lichtenstein P (2004) Estimation of genetic and environmental factors for binary traits using family data. Stat Med 23:449–465

  • Self SG, Liang KY (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610

  • Sham PC (1998) Statistics in human genetics. Arnold, London

  • Stram DO, Lee JW (1994) Variance components testing in the longitudinal mixed effects model. Biometrics 50:1171–1177

  • Tobin J (1958) Estimation of relationships for limited dependent variables. Econometrica 26:24–36

  • Vu HTV, Knuiman MW (2002) A hybrid ML-EM algorithm for calculation of maximum likelihood estimates in semiparametric shared frailty models. Comput Stat Data Anal 40:173–187

  • Wienke A, Holm NV, Christensen K, Skytthe A, Vaupel JW, Yashin AI (2003) The heritability of cause-specific mortality: a correlated gamma-frailty model applied to mortality due to respiratory disease in Danish twins born 1870–1930. Stat Med 22:3873–3887

  • Wolfinger RD (1993) Covariance structure selection in general mixed models. Commun Stat Simul Comput 22:1079–1106

  • Xue X (2001) Analysis of childhood brain tumour data in New York city using frailty. Stat Med 20:3459–3473

  • Yashin AI, Iachine IA (1995) How long can humans live? Lower bound for biological limit of human longevity calculated from Danish twin data using correlated frailty model. Mech Ageing Dev 80:147–169

  • Yashin AI, Iachine IA, Harris JR (1999) Half of the variation in susceptibility to mortality is genetic: findings from Swedish twin survival data. Behav Genet 29:11–19

  • Zhi X, Grambsch PM, Eberly LE (2005) Likelihood ratio test for the variance component in a semi-parametric shared gamma frailty model. Research Report 2005–005, Division of Biostatistics, University of Minnesota

Acknowledgements

We are grateful to the Swedish Twin Registry for providing us with the dataset used in this paper. This work was supported by a Korea Research Foundation Grant (KRF-2003-002-C00045).

Author information

Corresponding author

Correspondence to Yudi Pawitan.

Additional information

Edited by Pak Sham

Appendices

Appendix A

Derivation of model (2)

With \(v_{ij} = g_{ij} + c_{i0}\) for \(j = 1, 2\), model (1) can be expressed in a simple matrix form:

$$ \hbox{log}\,T_{i}=X_{i} \beta + Z_i v_i + \epsilon _{i}, $$
(A.1)

where \(T_i = (T_{i1}, T_{i2})^T\), \(X_i = (x_{i1}, x_{i2})^T\) is the \(2 \times p\) model matrix for \(\beta\), \(Z_i\) is the model matrix for \(v_i\), \(\epsilon_i = (\epsilon_{i1}, \epsilon_{i2})^T \sim N(0, \sigma_\epsilon^2 I_2)\), and \(I_2\) is the \(2 \times 2\) identity matrix. For an MZ pair \(i\), \(Z_i = (1, 1)^T\) and \(v_i\ (= v_{i1} = v_{i2}) \sim N(0, \sigma_v^2)\); for a DZ pair \(i\), \(Z_i = I_2\) and \(v_i = (v_{i1}, v_{i2})^T \sim N(0, \sigma_v^2 \Sigma_i)\) with a compound symmetric structure

$$ \Sigma_i=\left( \begin{array}{cc} {1} & {\rho} \\ {\rho} & {1} \end{array} \right). $$

Here

$$ \sigma_v^2=\sigma_g^2+\sigma_c^2 $$
(A.2)

and

$$ \rho=\hbox{corr}(v_{i1}, v_{i2})=\frac{0.5\sigma_g^2+\sigma_c^2}{\sigma_g^2+\sigma_c^2}, $$
(A.3)

where \(\rho \in [0.5, 1.0]\). Working with \(\rho\) is convenient. From (A.3), \(\sigma_g^2 \gg \sigma_c^2\) as \(\rho\) approaches 0.5, whereas \(\sigma_g^2 \ll \sigma_c^2\) as \(\rho\) approaches 1.0. In particular, model (A.1) reduces to model (1) without the random environmental effects \(c_{ij}\) if \(\rho = 0.5\) (i.e., \(\sigma_c^2 = 0\)), and to model (1) without the random genetic effects \(g_{ij}\) if \(\rho = 1.0\) (i.e., \(\sigma_g^2 = 0\)).

Following Lee and Nelder (2001a), the random effects \(v_i\) for a DZ pair \(i\) are assumed to have the form \(L_i(\rho) u_i\), where \(u_i \sim N(0, \sigma_v^2 I_2)\). By the Cholesky decomposition there is a lower triangular matrix \(L_i\) such that \(\Sigma_i = L_i L_i^T\). Here, we choose

$$ L_i(\rho)=\left( \begin{array}{cc} {1} & {0} \\ {\rho} & \sqrt{1-\rho^2} \end{array} \right), $$

and so the random effects satisfy \(v_i = L_i u_i \sim N(0, \sigma_v^2 L_i L_i^T)\).
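The choice of \(L_i(\rho)\) above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `dz_cholesky_factor` and the value of `rho` are illustrative and do not come from the paper.

```python
import numpy as np


def dz_cholesky_factor(rho):
    """Return the lower-triangular L(rho) with L @ L.T = [[1, rho], [rho, 1]]."""
    return np.array([[1.0, 0.0],
                     [rho, np.sqrt(1.0 - rho ** 2)]])


# Illustrative value inside [0.5, 1.0]; it is not an estimate from the paper.
rho = 0.75
L = dz_cholesky_factor(rho)
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# L L^T reproduces the compound symmetric matrix Sigma_i ...
reconstruction_ok = np.allclose(L @ L.T, Sigma)
# ... and agrees with NumPy's own (lower-triangular) Cholesky factor.
matches_numpy = np.allclose(L, np.linalg.cholesky(Sigma))
print(reconstruction_ok, matches_numpy)
```

At the boundary \(\rho = 1.0\) the matrix \(\Sigma_i\) is singular, so `np.linalg.cholesky` would fail there even though the closed-form factor is still defined.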

Thus, model (A.1) can be written as

$$ \hbox{log}\,T_{i}=X_{i} \beta + Z_i^{\ast} u_i + \epsilon _{i}, $$
(A.4)

where \(u_i \sim N(0, \sigma_v^2 I_k)\), with \(Z_i^{\ast} = (1, 1)^T\) and \(I_k = 1\) for an MZ pair \(i\), and \(Z_i^{\ast} = L_i(\rho)\) and \(I_k = I_2\) for a DZ pair \(i\). From (A.2) and (A.3) we obtain \(\sigma_g^2\) and \(\sigma_c^2\) as follows:

$$ \sigma_g^2=\sigma_v^2 - \sigma_c^2\quad\hbox{and}\quad \sigma_c^2=2(\rho-0.5)\sigma_v^2. $$
(A.5)

The \(j\)th element of model (A.4) is then model (2).
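As an illustration of (A.2), (A.3) and (A.5), the mapping between \((\rho, \sigma_v^2)\) and \((\sigma_g^2, \sigma_c^2)\) can be sketched in a few lines. The function names and the numerical values below are hypothetical and serve only to show that the two parameterisations are inverses of each other.

```python
import math


def variance_components(rho, sigma_v2):
    """Recover (sigma_g2, sigma_c2) from (rho, sigma_v2) using (A.5)."""
    if not 0.5 <= rho <= 1.0:
        raise ValueError("rho must lie in [0.5, 1.0]")
    sigma_c2 = 2.0 * (rho - 0.5) * sigma_v2
    sigma_g2 = sigma_v2 - sigma_c2
    return sigma_g2, sigma_c2


def dz_correlation(sigma_g2, sigma_c2):
    """DZ correlation of the random effects as in (A.3)."""
    return (0.5 * sigma_g2 + sigma_c2) / (sigma_g2 + sigma_c2)


# Illustrative inputs only; they are not estimates from the paper.
sigma_g2, sigma_c2 = variance_components(rho=0.75, sigma_v2=2.0)
round_trip_ok = math.isclose(dz_correlation(sigma_g2, sigma_c2), 0.75)
print(sigma_g2, sigma_c2, round_trip_ok)
```

The boundary cases behave as described in the text: `rho=0.5` gives `sigma_c2 = 0` and `rho=1.0` gives `sigma_g2 = 0`.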

Appendix B

Derivation of the score equations (7) and (8) and computation of the variance of \({\widehat \beta}\)

Let \(\mu\) be the \(n \times 1\) vector with \(ij\)th element \(\mu_{ij}\),

$$ \mu=X \beta + Z^{\ast} u, $$

where \(X = (X_1^T, \ldots, X_q^T)^T\) is the \(n \times p\) model matrix for the \(p \times 1\) fixed effects \(\beta\) and \(Z^{\ast} = \hbox{blockdiag}(Z_1^{\ast}, \ldots, Z_q^{\ast})\) is the \(n \times q^{\ast}\) block diagonal matrix for the \(q^{\ast} \times 1\) random effects \(u = (u_1, \ldots, u_q)^T\). Here, \(q^{\ast} = q_1 + 2q_2\), where \(q_1\) is the number of MZ twin pairs and \(q_2\) is the number of DZ twin pairs, so that \(q = q_1 + q_2\). Let \(y^{\ast} = (y_1^{\ast T}, \ldots, y_q^{\ast T})^T\) be the \(n \times 1\) vector with \(i\)th subvector \(y_i^{\ast} = (y_{i1}^{\ast}, y_{i2}^{\ast})^T\). Assume that \(\rho\) is known. Given \(\theta = (\sigma_\epsilon^2, \sigma_v^2)^T\) and \(y^{\ast}\), from (5) and (6) the score equations for the MHLEs of \(\tau = (\beta^T, u^T)^T\) become Henderson’s (1975) mixed-model equations with pseudo-response variables \(y^{\ast}\):

$$ \left( \begin{array}{cc} {X^{T}X} & {X^{T}Z^{\ast}} \\ {Z^{\ast T}X} & {Z^{\ast T}Z^{\ast}+\Lambda } \end{array} \right) \left( \begin{array}{c} {\widehat{\beta }} \\ {\widehat{u}} \end{array} \right) =\left( \begin{array}{c} X^{T}y^{\ast } \\ Z^{\ast T}y^{\ast } \end{array} \right), $$
(B.1)

where \({\Lambda=\lambda I_{q^{\ast}}}\), \(\lambda = \sigma_\epsilon^2/\sigma_v^2\) and \({I_{q^{\ast}}}\) is the \(q^{\ast} \times q^{\ast}\) identity matrix. Equation (B.1) can be written as two equations:

$$ (X^T X) \widehat{\beta}+(X^T Z^{\ast}) \widehat{u}=X^T y^{\ast}, $$
(B.2)
$$ (Z^{\ast T} X) \widehat{\beta}+(Z^{\ast T} Z^{\ast} +\lambda I_{q^{\ast}}) \widehat{u}=Z^{\ast T} y^{\ast}. $$
(B.3)

Substituting \(X = (X_1^T, \ldots, X_q^T)^T\), \(Z^{\ast} = \hbox{blockdiag}(Z_1^{\ast}, \ldots, Z_q^{\ast})\) and \(y^{\ast} = (y_1^{\ast T}, \ldots, y_q^{\ast T})^T\) into (B.2) and (B.3) reduces them to Eqs. (7) and (8).
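For a small data set, the system (B.1) can be solved directly. The sketch below assumes NumPy and treats the pseudo-responses \(y^{\ast}\) as given; the function name `solve_mme` and the toy design (one MZ pair followed by one DZ pair, made-up covariates and responses) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def solve_mme(X, Z, y_star, lam):
    """Solve the mixed-model equations (B.1) for (beta_hat, u_hat).

    X: n x p fixed-effects matrix, Z: n x q* random-effects matrix (Z*),
    y_star: length-n pseudo-response, lam: ratio sigma_eps^2 / sigma_v^2.
    """
    p = X.shape[1]
    q_star = Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(q_star)]])
    rhs = np.concatenate([X.T @ y_star, Z.T @ y_star])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]


# Toy design: rows 0-1 are an MZ pair, rows 2-3 are a DZ pair (values made up).
rho = 0.75
L = np.array([[1.0, 0.0],
              [rho, np.sqrt(1.0 - rho ** 2)]])
Z = np.zeros((4, 3))
Z[0:2, 0:1] = 1.0   # MZ block: Z*_i = (1, 1)^T
Z[2:4, 1:3] = L     # DZ block: Z*_i = L_i(rho)
X = np.array([[1.0, 0.2],
              [1.0, -0.4],
              [1.0, 1.1],
              [1.0, 0.3]])
y_star = np.array([4.1, 4.3, 3.9, 4.4])

beta_hat, u_hat = solve_mme(X, Z, y_star, lam=0.5)
print(beta_hat, u_hat)
```

Forming the full left-hand side is only practical for small \(q^{\ast}\); for large registries the block diagonal structure of \(Z^{\ast}\) is what makes the pair-by-pair form of Eqs. (7) and (8) attractive.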

The asymptotic covariance matrix for \({\widehat{\tau }-\tau }\) is given by \(D^{-1}\) with

$$ D(h, \tau)=- \frac{\partial ^{2}h}{\partial \tau ^{2}}=\frac{1}{\sigma _{\epsilon }^{2}}H, $$
(B.4)

where

$$ H={\left( \begin{array}{cc} {X^{T}WX} & {X^{T}WZ^{\ast}} \\ {Z^{\ast T}WX} & {Z^{\ast T}WZ^{\ast}+\Lambda} \end{array} \right) }. $$

Here, \(W = \hbox{diag}(w_{ij})\) is the \(n \times n\) diagonal matrix with \(ij\)th element \(w_{ij} = \delta_{ij} + (1 - \delta_{ij})\xi(m_{ij}) - \xi(m_{ij}^{\ast})\) and \(\xi(x) = V(x)\{V(x) - x\}\). The upper left-hand corner of \(D^{-1}\) in (B.4) therefore gives the variance matrix of \({\widehat{\beta }}\), which is also easy to compute for large samples, as follows. Let \(H^{11}\) be the upper left-hand corner of \(H^{-1}\) in (B.4). Then

$$ \hbox{var}(\widehat{\beta})=\sigma_\epsilon^2 {H}^{11}, $$
(B.5)

where

$$ \begin{array}{lll} {H}^{11} &= & \left\{ (X^T W X)-(X^T W Z^{\ast})(Z^{\ast T} W Z^{\ast} + \lambda I_{q^{\ast}})^{-1} (Z^{\ast T} W X) \right\}^{-1} \\ & = & \left\{ \sum\limits_i X_i^T W_i X_i-\sum\limits_i (X_i^T W_i Z_i^{\ast})(Z_i^{\ast T} W_i Z_i^{\ast} + \lambda I_{k})^{-1} (Z_i^{\ast T} W_i X_i) \right\}^{-1}. \end{array} $$

Here \(W_i\) is the \(i\)th diagonal block of \(W\), corresponding to twin pair \(i\).
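The second line of the expression for \(H^{11}\) only needs one small matrix per twin pair, which is what makes it cheap for large samples. A minimal sketch of that pair-by-pair accumulation follows, assuming NumPy; the function name `h11_blockwise`, the simulated covariates and the weights are hypothetical, and the weights \(w_{ij}\) are taken as already computed.

```python
import numpy as np


def h11_blockwise(X_blocks, Z_blocks, W_blocks, lam):
    """Compute H^{11} by summing per-pair contributions (no n x n matrices)."""
    p = X_blocks[0].shape[1]
    acc = np.zeros((p, p))
    for X_i, Z_i, W_i in zip(X_blocks, Z_blocks, W_blocks):
        k = Z_i.shape[1]                    # 1 for an MZ pair, 2 for a DZ pair
        xtwz = X_i.T @ W_i @ Z_i            # X_i^T W_i Z*_i
        inner = Z_i.T @ W_i @ Z_i + lam * np.eye(k)
        # W_i is diagonal, so Z*_i^T W_i X_i is the transpose of xtwz.
        acc += X_i.T @ W_i @ X_i - xtwz @ np.linalg.solve(inner, xtwz.T)
    return np.linalg.inv(acc)


# Made-up example: two MZ and two DZ pairs, p = 2 covariates.
rng = np.random.default_rng(0)
rho, lam, sigma_eps2 = 0.7, 0.5, 1.3
L = np.array([[1.0, 0.0],
              [rho, np.sqrt(1.0 - rho ** 2)]])
Z_blocks = [np.ones((2, 1)), L, np.ones((2, 1)), L]
X_blocks = [rng.normal(size=(2, 2)) for _ in Z_blocks]
W_blocks = [np.diag(rng.uniform(0.2, 1.0, size=2)) for _ in Z_blocks]

H11 = h11_blockwise(X_blocks, Z_blocks, W_blocks, lam)
var_beta = sigma_eps2 * H11                 # equation (B.5)
print(var_beta)
```

Each iteration inverts at most a \(2 \times 2\) matrix, so the cost grows linearly in the number of twin pairs rather than cubically in \(q^{\ast}\).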

About this article

Cite this article

Ha, I.D., Lee, Y. & Pawitan, Y. Genetic Mixed Linear Models for Twin Survival Data. Behav Genet 37, 621–630 (2007). https://doi.org/10.1007/s10519-007-9150-7
