Genetic Mixed Linear Models for Twin Survival Data

Ha, Il Do; Lee, Youngjo; Pawitan, Yudi

doi:10.1007/s10519-007-9150-7

Original Paper
Published: 31 March 2007

Volume 37, pages 621–630, (2007)

Behavior Genetics

Abstract

Twin studies are useful for assessing the relative importance of the genetic (heritable) component compared with the environmental component. In this paper we develop a methodology to study the heritability of age-at-onset or lifespan traits, with application to the analysis of twin survival data. Because of the limited period of observation, the data can be left truncated and right censored (LTRC). Under the LTRC setting we propose a genetic mixed linear model, which allows general fixed predictors and random components to capture genetic and environmental effects. Inferences are based on the hierarchical likelihood (h-likelihood), which provides a statistically efficient and unified framework for various mixed-effect models. We also propose a simple and fast computation method for dealing with large data sets. The method is illustrated with survival data from the Swedish Twin Registry. Finally, a simulation study is carried out to evaluate its performance.

References

Agresti A, Caffo B, Ohman-Strickland P (2004) Example in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653
Chernoff H (1954) On the distribution of the likelihood ratio. Ann Math Stat 25:573–578
Ha ID, Lee Y (2005a) Multilevel mixed linear models for survival data. Lifetime Data Anal 11:131–142
Ha ID, Lee Y (2005b) Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models. Biometrika 92:717–723
Ha ID, Lee Y, Song J-K (2002) Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Anal 8:163–176
Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447
Hougaard P (2000) Analysis of multivariate survival data. Springer-Verlag, New York
Klein JP, Pelz C, Zhang M (1999) Modelling random effects for censored data by a multivariate normal regression model. Biometrics 55:497–506
Lambert P, Collett D, Kimber A, Johnson R (2004) Parametric accelerated failure time models with random effects and an application to kidney transplant survival. Stat Med 23:3177–3192
Lai TZ, Ying Z (1994) A missing information principle and M-estimators in regression analysis with censored and truncated data. Ann Stat 22:1222–1255
Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J Roy Stat Soc Ser B 58:619–678
Lee Y, Nelder JA (2001a) Modelling and analysing correlated non-normal data. Stat Modell 1:3–16
Lee Y, Nelder JA (2001b) Hierarchical generalised linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88:987–1006
Neale MC, Cardon LR (1992) Methodology for genetic studies of twin and families. Kluwer Academic, Dordrecht
Noh M, Lee Y (2007) REML estimation for binary data in GLMMs. J Multivariate Anal 98:896–915
Pawitan Y, Reilly M, Nilsson E, Cnattingius S, Lichtenstein P (2004) Estimation of genetic and environmental factors for binary traits using family data. Stat Med 23:449–465
Self SG, Liang KY (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
Sham PC (1998) Statistics in human genetics. Arnold, London
Stram DO, Lee JW (1994) Variance components testing in the longitudinal mixed effects model. Biometrics 50:1171–1177
Tobin J (1958) Estimation of relationship for limited dependent variables. Econometrica 26:24–36
Vu HTV, Knuiman MW (2002) A hybrid ML-EM algorithm for calculation of maximum likelihood estimates in semiparametric shared frailty models. Comput Stat Data Anal 40:173–187
Wienke A, Holm NV, Christensen K, Skytthe A, Vaupel JW, Yashin AI (2003) The heritability of cause-specific mortality: a correlated gamma-frailty model applied to mortality due to respiratory disease in Danish twins born 1870–1930. Stat Med 22:3873–3887
Wolfinger RD (1993) Covariance structure selection in general mixed models. Communications in Statistics-Simulation and Computation 22:1079–1106
Xue X (2001) Analysis of childhood brain tumour data in New York city using frailty. Stat Med 20:3459–3473
Yashin AI, Iachine IA (1995) How long can humans live? Lower bound for biological limit of human longevity calculated from Danish twin data using correlated frailty model. Mech Ageing Dev 80:147–169
Yashin AI, Iachine IA, Harris JR (1999) Half of the variation in susceptibility to mortality is genetic: findings from Swedish twin survival data. Behav Genet 29:11–19
Zhi X, Grambsch PM, Eberly LE (2005) Likelihood ratio test for the variance component in a semi-parametric shared gamma frailty model. Research Report 2005–005, Division of Biostatistics, University of Minnesota

Acknowledgements

We are grateful to the Swedish Twin Registry for providing us with the dataset used in this paper. This work was supported by a Korea Research Foundation Grant (KRF-2003-002-C00045).

Author information

Authors and Affiliations

Department of Asset Management, Daegu Haany University, Gyeongsan, 712-715, Korea
Il Do Ha
Department of Statistics, Seoul National University, Seoul, 151-742, Korea
Youngjo Lee
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 171 77, Stockholm, Sweden
Yudi Pawitan

Corresponding author

Correspondence to Yudi Pawitan.

Additional information

Edited by Pak Sham

Appendices

Appendix A

Derivation of model (2)

From $v_{ij} = g_{ij} + c_{i0}$ for $j = 1, 2$, model (1) can be expressed in a simple matrix form:

$$ \hbox{log}\,T_{i}=X_{i} \beta + Z_i v_i + \epsilon _{i}, $$

(A.1)

where $T_i = (T_{i1}, T_{i2})^T$, $X_i = (x_{i1}, x_{i2})^T$ is the $2 \times p$ model matrix for $\beta$, $Z_i$ is the model matrix for $v_i$, $\epsilon_i = (\epsilon_{i1}, \epsilon_{i2})^T \sim N(0, \sigma_\epsilon^2 I_2)$, and $I_2$ is the $2 \times 2$ identity matrix. For an MZ pair $i$, $Z_i = (1,1)^T$ and $v_i (= v_{i1} = v_{i2}) \sim N(0, \sigma_v^2)$, whereas for a DZ pair $i$, $Z_i = I_2$ and $v_i = (v_{i1}, v_{i2})^T \sim N(0, \sigma_v^2 \Sigma_i)$ with a compound symmetric structure

$$ \Sigma_i=\left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right). $$

Here

$$ \sigma_v^2=\sigma_g^2+\sigma_c^2 $$

(A.2)

and

$$ \rho=\hbox{corr}(v_{i1}, v_{i2})=\frac{0.5\sigma_g^2+\sigma_c^2}{\sigma_g^2+\sigma_c^2}, $$

(A.3)

where $\rho \in [0.5, 1.0]$. Working with $\rho$ makes the two limiting cases easy to read off. From (A.3) we see that $\sigma_g^2 \gg \sigma_c^2$ as $\rho$ approaches 0.5, whereas $\sigma_g^2 \ll \sigma_c^2$ as $\rho$ approaches 1.0. In particular, model (A.1) reduces to model (1) without random-environment effects $c_{ij}$ if $\rho = 0.5$ (i.e., $\sigma_c^2 = 0$), while it becomes model (1) without random-genetic effects $g_{ij}$ if $\rho = 1.0$ (i.e., $\sigma_g^2 = 0$).

Following Lee and Nelder (2001a), the random effects $v_i$ for a DZ pair $i$ are assumed to have the form $L_i(\rho) u_i$, where $u_i \sim N(0, \sigma_v^2 I_2)$. By the Cholesky decomposition there is a lower triangular matrix $L_i$ such that $\Sigma_i = L_i L_i^T$. Here, we choose

$$ L_i(\rho)=\left( \begin{array}{cc} {1} & {0} \\ {\rho} & \sqrt{1-\rho^2} \end{array} \right), $$

and so the random effects $v_i = L_i u_i \sim N(0, \sigma_v^2 L_i L_i^T)$.
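As a quick numerical check of the factorisation above, the following NumPy sketch builds $L_i(\rho)$, confirms that $L_i L_i^T$ reproduces $\Sigma_i$, and compares the closed form with NumPy's own Cholesky routine. The function name `chol_factor` and the value $\rho = 0.75$ are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

def chol_factor(rho):
    """Lower-triangular L(rho) with L @ L.T = [[1, rho], [rho, 1]]."""
    return np.array([[1.0, 0.0],
                     [rho, np.sqrt(1.0 - rho**2)]])

rho = 0.75  # illustrative value inside [0.5, 1.0], not taken from the paper
L = chol_factor(rho)
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# L L^T recovers the compound symmetric matrix Sigma_i
assert np.allclose(L @ L.T, Sigma)

# For rho < 1 the closed form agrees with NumPy's lower Cholesky factor
assert np.allclose(L, np.linalg.cholesky(Sigma))
```

At the boundary $\rho = 1.0$ the matrix $\Sigma_i$ is singular, so the library routine would fail there even though the closed form remains well defined.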

Thus, model (A.1) can be written as

$$ \hbox{log}\,T_{i}=X_{i} \beta + Z_i^{\ast} u_i + \epsilon _{i}, $$

(A.4)

where $u_i \sim N(0, \sigma_v^2 I_k)$, with $Z_i^* = (1,1)^T$ and $I_k = 1$ for an MZ pair $i$, and $Z_i^* = L_i(\rho)$ and $I_k = I_2$ for a DZ pair $i$. Note that from (A.2) and (A.3) we obtain $\sigma_g^2$ and $\sigma_c^2$ as follows:

$$ \sigma_g^2=\sigma_v^2 - \sigma_c^2\quad\hbox{and}\quad \sigma_c^2=2(\rho-0.5)\sigma_v^2. $$

(A.5)

Then the $j$th element of model (A.4) becomes model (2).
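The mapping in (A.5) between $(\sigma_v^2, \rho)$ and $(\sigma_g^2, \sigma_c^2)$ can be sketched in a few lines of Python. The function names and the numerical inputs below are hypothetical and serve only to illustrate the algebra; the round trip through (A.3) checks that the two parameterisations agree.

```python
def variance_components(sigma_v2, rho):
    """Return (sigma_g2, sigma_c2) from (A.5); rho must lie in [0.5, 1.0]."""
    if not 0.5 <= rho <= 1.0:
        raise ValueError("rho must lie in [0.5, 1.0]")
    sigma_c2 = 2.0 * (rho - 0.5) * sigma_v2   # environmental variance
    sigma_g2 = sigma_v2 - sigma_c2            # genetic variance
    return sigma_g2, sigma_c2

def rho_from_components(sigma_g2, sigma_c2):
    """Correlation (A.3) implied by the genetic and environmental variances."""
    return (0.5 * sigma_g2 + sigma_c2) / (sigma_g2 + sigma_c2)

# Illustrative inputs only (not estimates from the paper)
sigma_g2, sigma_c2 = variance_components(sigma_v2=2.0, rho=0.75)

# Round trip: (A.2) and (A.3) recover the inputs
assert abs((sigma_g2 + sigma_c2) - 2.0) < 1e-12
assert abs(rho_from_components(sigma_g2, sigma_c2) - 0.75) < 1e-12
```

The two boundary cases behave as described for (A.3): $\rho = 0.5$ gives $\sigma_c^2 = 0$ and $\rho = 1.0$ gives $\sigma_g^2 = 0$.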

Appendix B

Proofs of score equations (7) and (8) and computation of the variance of ${\widehat \beta}$

Let $\mu$ be the $n \times 1$ vector with $ij$th element $\mu_{ij}$,

$$ \mu=X \beta + Z^{\ast} u, $$

where $X = (X_1^T, \ldots, X_q^T)^T$ is the $n \times p$ model matrix for the $p \times 1$ fixed effects $\beta$ and $Z^* = \mathrm{blockdiag}(Z_1^*, \ldots, Z_q^*)$ is the $n \times q^*$ block diagonal matrix for the $q^* \times 1$ random effects $u = (u_1^T, \ldots, u_q^T)^T$. Here, $q^* = q_1 + 2q_2$, where $q_1$ is the number of MZ twin pairs and $q_2$ is the number of DZ twin pairs, so that $q = q_1 + q_2$. Let $y^* = (y_1^{*T}, \ldots, y_q^{*T})^T$ be the $n \times 1$ vector with $i$th subvector $y_i^* = (y_{i1}^*, y_{i2}^*)^T$. Assume that $\rho$ is known. Given $\theta = (\sigma_\epsilon^2, \sigma_v^2)^T$ and $y^*$, from (5) and (6) the score equations for the MHLEs of $\tau = (\beta^T, u^T)^T$ become Henderson's (1975) mixed-model equations with pseudo-response variables $y^*$:

$$ \left( \begin{array}{cc} {X^{T}X} & {X^{T}Z^{\ast}} \\ {Z^{\ast T}X} & {Z^{\ast T}Z^{\ast}+\Lambda } \end{array} \right) \left( \begin{array}{c} {\widehat{\beta }} \\ {\widehat{u}} \end{array} \right) =\left( \begin{array}{c} X^{T}y^{\ast } \\ Z^{\ast T}y^{\ast } \end{array} \right), $$

(B.1)

where $\Lambda = \lambda I_{q^{\ast}}$, $\lambda = \sigma_\epsilon^2/\sigma_v^2$ and $I_{q^{\ast}}$ is the $q^* \times q^*$ identity matrix. Equation (B.1) can be expressed as the two equations:

$$ (X^T X) \widehat{\beta}+(X^T Z^{\ast}) \widehat{u}=X^T y^{\ast}, $$

(B.2)

$$ (Z^{\ast T} X) \widehat{\beta}+(Z^{\ast T} Z^{\ast} +\lambda I_{q^{\ast}}) \widehat{u}=Z^{\ast T} y^{\ast}. $$

(B.3)

Substituting $X = (X_1^T, \ldots, X_q^T)^T$, $Z^* = \mathrm{blockdiag}(Z_1^*, \ldots, Z_q^*)$ and $y^* = (y_1^{*T}, \ldots, y_q^{*T})^T$ into (B.2) and (B.3) reduces them to Eqs. (7) and (8).
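To make (B.1) concrete, the sketch below assembles $Z^*$ for a small mix of MZ and DZ pairs and solves the mixed-model equations directly with NumPy. Everything numerical here is an assumption for illustration: the pair counts, $\rho$, $\lambda$, the covariate, and in particular `y_star`, which is simulated noise standing in for the pseudo-responses $y^*$ (the paper treats $y^*$ as given at this step). The helper names are hypothetical. The final check uses the standard identity that, for fixed $y^*$, $\widehat{\beta}$ from (B.1) equals generalised least squares under a covariance proportional to $I + Z^* Z^{*T}/\lambda$.

```python
import numpy as np

def z_star_blocks(n_mz, n_dz, rho):
    """List of Z*_i: (1,1)^T for MZ pairs and L_i(rho) for DZ pairs."""
    mz = np.ones((2, 1))
    dz = np.array([[1.0, 0.0],
                   [rho, np.sqrt(1.0 - rho**2)]])
    return [mz] * n_mz + [dz] * n_dz

def block_diag(blocks):
    """Assemble a dense block diagonal matrix from a list of 2-D arrays."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def solve_mme(X, Z, y_star, lam):
    """Solve (B.1) for (beta_hat, u_hat) given pseudo-responses y_star."""
    p, q_star = X.shape[1], Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(q_star)]])
    rhs = np.concatenate([X.T @ y_star, Z.T @ y_star])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]

rng = np.random.default_rng(0)
n_mz, n_dz, rho, lam = 3, 4, 0.75, 2.0       # illustrative values only
Z = block_diag(z_star_blocks(n_mz, n_dz, rho))
n = Z.shape[0]                                # n = 2 * (n_mz + n_dz)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = rng.normal(size=n)                   # stand-in for y*
beta_hat, u_hat = solve_mme(X, Z, y_star, lam)

# Cross-check against generalised least squares with V = I + Z Z^T / lam
V_inv = np.linalg.inv(np.eye(n) + Z @ Z.T / lam)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y_star)
assert np.allclose(beta_hat, beta_gls)
```

The dense solve is only practical for a toy example; it is shown to fix ideas about the structure of (B.1), not as a recommendation for large data sets.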

The asymptotic covariance matrix for $\widehat{\tau} - \tau$ is given by $D^{-1}$ with

$$ D(h, \tau)=- \frac{\partial ^{2}h}{\partial \tau ^{2}}=\frac{1}{\sigma _{\epsilon }^{2}}H, $$

(B.4)

where

$$ H={\left( \begin{array}{cc} {X^{T}WX} & {X^{T}WZ^{\ast}} \\ {Z^{\ast T}WX} & {Z^{\ast T}WZ^{\ast}+\Lambda} \end{array} \right) }. $$

Here, $W = \mathrm{diag}(w_{ij})$ is the $n \times n$ diagonal matrix with $ij$th element $w_{ij} = \delta_{ij} + (1 - \delta_{ij})\xi(m_{ij}) - \xi(m_{ij}^*)$ and $\xi(x) = V(x)\{V(x) - x\}$. Thus the upper left-hand corner of $D^{-1}$ in (B.4) gives the variance matrix of $\widehat{\beta}$, which can also be computed easily for large samples as follows. Let $H^{11}$ be the upper left-hand corner of $H^{-1}$ in (B.4). Then we have that

$$ \hbox{var}(\widehat{\beta})=\sigma_\epsilon^2 {H}^{11}, $$

(B.5)

where

$$ \begin{array}{lll} {H}^{11} &= & \left\{ (X^T W X)-(X^T W Z^{\ast})(Z^{\ast T} W Z^{\ast} + \lambda I_{q^{\ast}})^{-1} (Z^{\ast T} W X) \right\}^{-1} \\ & = & \left\{ \sum\limits_i X_i^T W_i X_i-\sum\limits_i (X_i^T W_i Z_i^{\ast})(Z_i^{\ast T} W_i Z_i^{\ast} + \lambda I_{k})^{-1} (Z_i^{\ast T} W_i X_i) \right\}^{-1}. \end{array} $$

Here $W_i$ is the $i$th component matrix of $W$, that is, the $2 \times 2$ diagonal block corresponding to the $i$th twin pair.
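The second expression for $H^{11}$ only requires inverting matrices of size $k \le 2$ per pair plus one $p \times p$ matrix, which is what makes it attractive for large samples. The sketch below accumulates the sum pair by pair and checks it against the upper-left block of the dense $H^{-1}$ on a tiny example. The function name `h11_blockwise`, the random positive weights used for $W_i$, and all numerical settings are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def h11_blockwise(X_blocks, W_blocks, Z_blocks, lam):
    """Upper-left block of H^{-1}, accumulated pair by pair."""
    p = X_blocks[0].shape[1]
    acc = np.zeros((p, p))
    for Xi, Wi, Zi in zip(X_blocks, W_blocks, Z_blocks):
        k = Zi.shape[1]                        # 1 for MZ, 2 for DZ
        A = Xi.T @ Wi @ Xi
        B = Xi.T @ Wi @ Zi
        C = Zi.T @ Wi @ Zi + lam * np.eye(k)
        acc += A - B @ np.linalg.solve(C, B.T)  # B.T = Zi^T Wi Xi (Wi symmetric)
    return np.linalg.inv(acc)

rng = np.random.default_rng(1)
rho, lam, p = 0.75, 2.0, 2                     # illustrative values only
mz = np.ones((2, 1))
dz = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])
Z_blocks = [mz, mz, dz, dz, dz]
X_blocks = [np.column_stack([np.ones(2), rng.normal(size=2)]) for _ in Z_blocks]
W_blocks = [np.diag(rng.uniform(0.2, 1.0, size=2)) for _ in Z_blocks]

# Dense H for comparison (feasible only because the example is tiny)
n = 2 * len(Z_blocks)
q_star = sum(Zi.shape[1] for Zi in Z_blocks)
X = np.vstack(X_blocks)
W = np.zeros((n, n))
Zs = np.zeros((n, q_star))
col = 0
for i, (Wi, Zi) in enumerate(zip(W_blocks, Z_blocks)):
    W[2 * i:2 * i + 2, 2 * i:2 * i + 2] = Wi
    Zs[2 * i:2 * i + 2, col:col + Zi.shape[1]] = Zi
    col += Zi.shape[1]
H = np.block([[X.T @ W @ X, X.T @ W @ Zs],
              [Zs.T @ W @ X, Zs.T @ W @ Zs + lam * np.eye(q_star)]])
H11_dense = np.linalg.inv(H)[:p, :p]
H11_block = h11_blockwise(X_blocks, W_blocks, Z_blocks, lam)
assert np.allclose(H11_block, H11_dense)
```

Multiplying `H11_block` by an estimate of $\sigma_\epsilon^2$ would then give $\hbox{var}(\widehat{\beta})$ as in (B.5).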

About this article

Cite this article

Ha, I.D., Lee, Y. & Pawitan, Y. Genetic Mixed Linear Models for Twin Survival Data. Behav Genet 37, 621–630 (2007). https://doi.org/10.1007/s10519-007-9150-7

Received: 28 February 2006
Accepted: 01 March 2007
Published: 31 March 2007
Issue Date: July 2007
DOI: https://doi.org/10.1007/s10519-007-9150-7
