Elsevier

Medical Image Analysis

Volume 18, Issue 6, August 2014, Pages 891-902

Correspondence between fMRI and SNP data by group sparse canonical correlation analysis

https://doi.org/10.1016/j.media.2013.10.010

Highlights

  • A group sparse canonical correlation analysis (CCA) model was developed, including several sparse models as special cases.

  • The model can overcome the difficulty of conventional CCA in analysing high dimensional data with group structures.

  • The model is validated by studying a biologically significant problem on how genetic variations influence brain activities.

Abstract

Both genetic variants and brain region abnormalities are recognized as important factors for complex diseases (e.g., schizophrenia). In this paper, we investigated the correspondence between single nucleotide polymorphisms (SNPs) and brain activity measured by functional magnetic resonance imaging (fMRI) to understand how genetic variation influences brain activity. A group sparse canonical correlation analysis method (group sparse CCA) was developed to explore the correlation between these two datasets, which are high dimensional: the number of SNPs/voxels is far greater than the number of samples. Unlike existing sparse CCA methods (sCCA), our approach can exploit structural information in the correlation analysis by introducing group constraints. A simulation study demonstrates that it outperforms the existing sCCA. We applied this method to real data and identified two pairs of significant canonical variates with average correlations of 0.4527 and 0.4292 respectively, which were used to identify genes and voxels associated with schizophrenia. The selected genes are mostly from 5 schizophrenia (SZ)-related signalling pathways. The brain mappings of the selected voxels also indicate the abnormal brain regions susceptible to schizophrenia. A gene and brain region of interest (ROI) correlation analysis was further performed to confirm the significant correlations between genes and ROIs.

Graphical abstract

In this paper, we investigated the correspondence between single nucleotide polymorphisms (SNPs) and brain activity measured by functional magnetic resonance imaging (fMRI). Such a study is biologically significant for understanding how genetic variation influences brain activity (see figure). A group sparse canonical correlation analysis method (group sparse CCA) was developed to explore the correlation between these two data sets, which are high dimensional: the number of SNPs/voxels is far greater than the number of samples. A simulation study demonstrates that it outperforms the existing sCCA. We applied this method to real data and identified two pairs of significant canonical variates, which were then used to identify genes and voxels associated with schizophrenia. A gene and brain region of interest (ROI) correlation analysis was further performed to confirm the significant correlations between genes and ROIs.


Introduction

Schizophrenia is a complex disease that is considered to be caused by the interplay of a number of genetic factors (e.g., changes in gene regulation, and alterations of mRNA and SNPs) and environmental effects. Genetic factors play an important role in causing schizophrenia: people born into a family with a history of schizophrenia have a higher risk of the disease than those without such a family history. In recent years, many studies have focused on exploring critical genes associated with schizophrenia. Many potential genetic variants have been reported as possible risk factors, such as the G72/G30 gene locus on chromosome 13q (Badner and Gershon, 2002; Abecasis et al., 2004), variation in the gene DISC1 (Callicott et al., 2005, Porteous et al., 2006) and copy number variations in the genes GRIK3, EFNA5, AKAP5 and CACNG2 (Wilson et al., 2006, Sutrala et al., 2007). In addition to genetic studies, fMRI has also been widely used for the study of schizophrenia because of its capability to identify functional abnormalities within brain regions of schizophrenic patients (Jansma et al., 2004, Li et al., 2007, Meda et al., 2008, Szycik et al., 2009).

Genetic variants and brain region abnormalities are both important markers for the study of schizophrenia. Combining the two types of data can not only contribute to a better understanding of the biological mechanisms underlying brain structure and function but also has the potential to improve the diagnosis and treatment of complex diseases. However, current imaging genetics studies either take brain imaging measurements as endophenotypes to study the associated genetic variants or investigate the effects of a small set of candidate genetic variants on whole-brain measurements (Hamid et al., 2009, Le Cao et al., 2009, Wiley, 2011). It is still challenging to explore the relationship between a large number of genetic variants and a large number of brain imaging measurements. Therefore, correlative analysis approaches for large-scale multimodal data are in high demand.

In this work, we aim to study the effects of multiple SNPs or genes on functional brain activity in schizophrenia, which requires an effective multivariate statistical method. Canonical correlation analysis (CCA; Hotelling, 1936) and partial least squares regression (PLSR; Le Cao et al., 2008) have been proposed to analyze multimodal datasets. CCA aims to maximize the correlation between linear combinations of variables from two data sets, e.g., a linear combination of SNPs and a linear combination of voxels. However, the method suffers from over-fitting when analyzing high dimensional data such as SNP and brain imaging data, as shown in Fig. 1. Thousands of SNPs with linkage disequilibrium (LD) are detected to reflect genetic variants at different loci. The number of voxels in a whole-brain fMRI image is also very large (e.g., 53 × 63 × 46). Traditional CCA performs poorly in such cases because of the multi-collinearity (linear dependence) problem, which causes computational difficulty (Parkhomenko et al., 2009). To address this issue, sparse CCA (sCCA) methods have been developed by introducing sparse penalties into the traditional CCA model, mostly using the l1 norm (CCA-l1) or a combination of the l1 and l2 norms (CCA-elastic net) (Waaijenborg et al., 2008, Le Cao et al., 2009, Parkhomenko et al., 2009, Witten et al., 2009, Witten and Tibshirani, 2009, Boutte and Liu, 2010). Despite their success, these methods do not account for group structures within the data (e.g., multiple SNPs within the same gene, or a group of voxels within the same region or ROI), which often exist or are implied by the biological mechanism. For example, SNPs within the same gene have similar functions and act together at the gene or pathway level to affect brain activity. These SNP effects can add up to a larger difference (Tyekucheva et al., 2011).

Several previous works have shown the benefit of accounting for the group effect of features in sCCA models (Chen and Liu, 2012, Chen et al., 2013, Lin et al., 2013). However, to our knowledge, little work has been reported on incorporating the group effect into the sCCA model for fMRI and SNP data integration. Motivated by this, we developed a group sparse CCA based integration method that imposes the sparse group lasso penalty on the CCA model for the integrative analysis of SNP and fMRI data; see Fig. 1 for an illustration. This method has the following advantages: (1) groups of features (voxels/SNPs) are inspected during the correlation analysis, which makes it possible to study the joint effects of multiple SNPs on regions of voxels; (2) feature selection is performed at both the group level and the single-feature level, so that irrelevant groups of features as well as irrelevant single features within each group can be removed. Our group sparse CCA method can thus exploit group information in the correlation analysis while simultaneously filtering out noisy features within each group.
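The over-fitting behaviour of conventional CCA in the high dimensional setting can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names `orthonormal_basis` and `first_canonical_correlation` are hypothetical, and the data are independent Gaussian noise chosen only for illustration. The sketch relies on the standard fact that the canonical correlations equal the singular values of the product of orthonormal bases of the two centered data matrices.

```python
import numpy as np

def orthonormal_basis(A, tol=1e-10):
    """Orthonormal basis of the column space of A (hypothetical helper)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]          # drop numerically null directions
    return U[:, keep]

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)         # center each feature
    Y = Y - Y.mean(axis=0)
    Qx = orthonormal_basis(X)
    Qy = orthonormal_basis(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))

rng = np.random.default_rng(0)
n = 50                              # number of samples (illustrative)

# Few features: independent noise yields a modest sample correlation.
low_dim = first_canonical_correlation(rng.normal(size=(n, 3)),
                                      rng.normal(size=(n, 3)))

# More features than samples: the correlation is driven to about 1
# even though the two data sets are independent.
high_dim = first_canonical_correlation(rng.normal(size=(n, 200)),
                                       rng.normal(size=(n, 200)))

print(f"p = q = 3:   {low_dim:.3f}")
print(f"p = q = 200: {high_dim:.3f}")
```

When the number of features exceeds the number of samples, both centered data matrices span the same (n - 1)-dimensional subspace, so a perfect sample correlation can always be found. This is the motivation for adding sparsity-inducing penalties.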

The group sparse CCA can estimate the correlation between canonical variates, corresponding to a set of significant SNPs or brain imaging voxels. Based on the estimates, we provided a gene-ROI correlation analysis to further confirm the significance of the correlations between genes and brain functions in ROIs.

The rest of the paper is organized as follows. The proposed group sparse CCA model and algorithm are introduced in the Theory section. The group sparse CCA based integration method for SNP and fMRI data is described in the Method section. The validation of our model and its comparison with other sCCA models on both simulated and real data are presented in the Results section. Finally, the pathway analysis results and the limitations of the proposed method are discussed.

Section snippets

Theory

In this section, we first introduce the CCA model, based on which the group sparse model is presented. Then we propose a numerical algorithm based on block coordinate descent to solve the model. Finally, we show that the general model we propose includes several existing sCCA models as special cases, and hence the numerical algorithm can also be applied to solve them efficiently.
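As a rough illustration of how a sparse group lasso penalty acts on a weight vector, the Python sketch below implements the standard proximal (thresholding) operator for an l1 penalty plus an unweighted group l2 penalty. This is a generic building block of block coordinate descent schemes, not the authors' algorithm: the function names, the penalty parameters `lam_l1` and `lam_group`, and the absence of group-size weights are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_group_threshold(v, groups, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||v||_1 + lam_group * sum_g ||v_g||_2.

    `groups` is a list of index arrays, one per non-overlapping group
    (e.g. the SNPs of one gene or the voxels of one ROI).
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for idx in groups:
        z = soft_threshold(v[idx], lam_l1)     # within-group sparsity
        norm = np.linalg.norm(z)
        if norm > lam_group:                    # otherwise the whole group is zeroed
            out[idx] = (1.0 - lam_group / norm) * z
    return out

# Hypothetical weight vector with two groups of two features each.
v = np.array([3.0, -0.5, 0.2, 0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
print(sparse_group_threshold(v, groups, lam_l1=0.3, lam_group=0.5))
```

In this sketch, setting `lam_group = 0` reduces the operator to plain l1 soft-thresholding and setting `lam_l1 = 0` reduces it to group-wise shrinkage, which mirrors the idea that a general group sparse model can contain l1-only and group-only penalties as special cases.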

Method

We applied group sparse CCA to investigate the association of functional brain regions with genetic variations as shown in Fig. 1. Components extracted from fMRI represent brain regions expressing the functional difference in different subjects. Components from SNP data are linear combinations of SNPs from different genes that may have associations with the disease. After preprocessing, the collected SNPs and ROI-based voxels are both still high dimensional with a large number of features

Simulation

To assess the performance of the proposed group sparse CCA method, we first simulated two correlated data sets and then compared group sparse CCA with other penalized CCA methods, such as CCA-group and CCA-l1, on these simulated data.

Two data sets of SNP data X with p SNPs and fMRI data Y with q voxels were simulated. To correlate the SNPs with the voxels, a latent model similar to (Parkhomenko et al., 2009) was used. We first set a latent variable ϒ = {γi|i = 1, …, n} with normal
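A latent-variable simulation of this general kind can be sketched in Python as follows. Because the description above is truncated, every specific here is an assumption made for illustration rather than the paper's actual design: the function name `simulate_latent_pair`, the standard normal latent variable, the unit loadings on the active features, the noise level, and the sizes used in the example.

```python
import numpy as np

def simulate_latent_pair(n, p, q, active_x, active_y, noise_sd=1.0, seed=0):
    """Simulate two data sets that share one latent variable (hypothetical setup).

    Only the features listed in `active_x` / `active_y` load on the latent
    variable; all remaining features are pure noise.
    """
    rng = np.random.default_rng(seed)
    gamma = rng.normal(size=n)                 # one latent value per sample
    alpha = np.zeros(p)
    alpha[active_x] = 1.0                      # loadings for X (SNP-like data)
    beta = np.zeros(q)
    beta[active_y] = 1.0                       # loadings for Y (voxel-like data)
    X = np.outer(gamma, alpha) + noise_sd * rng.normal(size=(n, p))
    Y = np.outer(gamma, beta) + noise_sd * rng.normal(size=(n, q))
    return X, Y, alpha, beta

# Example: 500 samples, first 5 features of each data set are correlated.
X, Y, alpha, beta = simulate_latent_pair(
    n=500, p=40, q=60, active_x=np.arange(5), active_y=np.arange(5)
)
r_active = np.corrcoef(X[:, 0], Y[:, 0])[0, 1]      # both features are active
r_inactive = np.corrcoef(X[:, 20], Y[:, 30])[0, 1]  # both features are noise
print(f"active pair: {r_active:.2f}, inactive pair: {r_inactive:.2f}")
```

Placing the active indices in contiguous blocks, as in this example, is one simple way to mimic group structure, so that a method's recovery of the true nonzero loadings can be checked against `alpha` and `beta`.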

Discussion and conclusion

In this paper, we proposed a novel method to explore the relationship between genomic data and fMRI brain imaging data by considering the group effects of the variables in the data. We introduced the group sparse CCA method and its numerical implementation based on the regularized SVD and the block coordinate descent algorithm. The performance of the group sparse CCA model was compared with other sCCA models in the simulation study, showing that our group sparse CCA method could better recover the true

Acknowledgement

This work is partially supported by both NSF and NIH. It is also supported by Shanghai Eastern Scholarship Program.

References (54)

  • V. Kumari et al.

    Procedural learning in schizophrenia: a functional magnetic resonance imaging investigation

    Schizophr. Res.

    (2002)
  • X. Li et al.

    FMRI study of language activation in schizophrenia, schizoaffective disorder and in individuals genetically at high risk

    Schizophr. Res.

    (2007)
  • M.S. Lidow

    Calcium signaling dysfunction in schizophrenia: a unifying approach

    Brain Res. Brain Res. Rev.

    (2003)
  • S.A. Meda et al.

    An fMRI study of working memory in first-degree unaffected relatives of schizophrenia patients

    Schizophr. Res.

    (2008)
  • D.J. Porteous et al.

    The genetics and biology of DISC1 – an emerging role in psychosis and cognition

    Biol. Psychiatr.

    (2006)
  • M.E. Shenton et al.

    A review of MRI findings in schizophrenia

    Schizophr. Res.

    (2001)
  • J. Sui et al.

    Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model

    Neuroimage

    (2011)
  • S.R. Sutrala et al.

    Gene copy number variation in schizophrenia

    Schizophr. Res.

    (2007)
  • G.R. Szycik et al.

    Audiovisual integration of speech is disturbed in schizophrenia: an fMRI study

    Schizophr. Res.

    (2009)
  • E.F. Torrey

    Schizophrenia and the inferior parietal lobule

    Schizophr. Res.

    (2007)
  • N. Tzourio-Mazoyer et al.

    Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain

    Neuroimage

    (2002)
  • J.A. Badner et al.

    Meta-analysis of whole-genome linkage scans of bipolar disorder and schizophrenia

    Mol. Psychiatr.

    (2002)
  • M. Bellani et al.

    The potential role of the parietal lobe in schizophrenia

    Epidemiol. Psichiatr. Soc.

    (2010)
  • Boutte, D., Liu, J., 2010. Sparse canonical correlation analysis applied to fMRI and genetic data fusion. In: IEEE...
  • J.H. Callicott et al.

    Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia

    Proc. Natl. Acad. Sci. USA

    (2005)
  • X. Chen et al.

    An efficient optimization algorithm for structured sparse CCA, with applications to eQTL Mapping

    Stat. Biosci.

    (2012)
  • J. Chen et al.

    Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis

    Biostatistics

    (2013)