Correspondence between fMRI and SNP data by group sparse canonical correlation analysis

doi:10.1016/j.media.2013.10.010

Medical Image Analysis

Volume 18, Issue 6, August 2014, Pages 891-902

https://doi.org/10.1016/j.media.2013.10.010

Highlights

• A group sparse canonical correlation analysis (CCA) model was developed, including several sparse models as special cases.
• The model can overcome the difficulty of conventional CCA in analysing high dimensional data with group structures.
• The model is validated by studying a biologically significant problem on how genetic variations influence brain activities.

Abstract

Both genetic variants and brain region abnormalities are recognized as important factors in complex diseases (e.g., schizophrenia). In this paper, we investigated the correspondence between single nucleotide polymorphisms (SNPs) and brain activity measured by functional magnetic resonance imaging (fMRI) to understand how genetic variation influences brain activity. A group sparse canonical correlation analysis method (group sparse CCA) was developed to explore the correlation between these two datasets, which are high dimensional: the number of SNPs/voxels is far greater than the number of samples. Unlike existing sparse CCA methods (sCCA), our approach can exploit structural information in the correlation analysis by introducing group constraints. A simulation study demonstrates that it outperforms the existing sCCA. We applied this method to real data and identified two pairs of significant canonical variates with average correlations of 0.4527 and 0.4292, respectively, which were used to identify genes and voxels associated with schizophrenia. The selected genes are mostly from 5 schizophrenia (SZ)-related signalling pathways. The brain mappings of the selected voxels also indicate the abnormal brain regions susceptible to schizophrenia. A gene and brain region of interest (ROI) correlation analysis was further performed to confirm the significant correlations between genes and ROIs.

Graphical abstract

In this paper, we investigated the correspondence between single nucleotide polymorphisms (SNPs) and brain activity measured by functional magnetic resonance imaging (fMRI). Such a study is biologically significant for understanding how genetic variation influences brain activity (see figure). A group sparse canonical correlation analysis method (group sparse CCA) was developed to explore the correlation between these two data sets, which are high dimensional: the number of SNPs/voxels is far greater than the number of samples. A simulation study demonstrates that it outperforms the existing sCCA. We applied this method to real data and identified two pairs of significant canonical variates, which were then used to identify genes and voxels associated with schizophrenia. A gene and brain region of interest (ROI) correlation analysis was further performed to confirm the significant correlations between genes and ROIs.

Introduction

Schizophrenia is a complex disease considered to be caused by the interplay of a number of genetic factors (e.g., changes in gene regulation and alterations of mRNA and SNPs) and environmental effects. Genetic factors play an important role in causing schizophrenia: people born into a family with a history of schizophrenia have a higher risk of the disease than those without such a family history. In recent years, many studies have focused on exploring critical genes associated with schizophrenia. Many potential genetic variants have been reported as possible risk factors, such as the G72/G30 gene locus on chromosome 13q (Badner and Gershon, 2002; Abecasis et al., 2004), variation in the gene DISC1 (Callicott et al., 2005, Porteous et al., 2006) and copy number variations in the genes GRIK3, EFNA5, AKAP5 and CACNG2 (Wilson et al., 2006, Sutrala et al., 2007). In addition to genetic studies, fMRI has also been widely used for the study of schizophrenia because of its capability to identify functional abnormalities within brain regions of schizophrenic patients (Jansma et al., 2004, Li et al., 2007, Meda et al., 2008, Szycik et al., 2009).

Genetic variants and brain region abnormalities are both important markers for the study of schizophrenia. Combining the two types of data can not only contribute to a better understanding of the biological mechanisms underlying brain structure and function but also has the potential to improve the diagnosis and treatment of complex diseases. However, current imaging genetics studies either take brain imaging measurements as endophenotypes to study the associated genetic variants or investigate the effects of a small set of candidate genetic variants on whole-brain measurements (Hamid et al., 2009, Le Cao et al., 2009, Wiley, 2011). It is still challenging to explore the relationship between a large number of genetic variants and a large number of brain imaging measurements. Therefore, correlative analysis approaches for large-scale multimodal data are in high demand.

In this work, we aim to study the effects of multiple SNPs or genes on functional brain activity in schizophrenia, which calls for an effective multivariate statistical method. Canonical correlation analysis (CCA; Hotelling, 1936) and partial least squares regression (PLSR; Le Cao et al., 2008) have been proposed to analyze multimodal datasets. CCA aims to maximize the correlation between linear combinations of variables from two data sets, e.g., a linear combination of SNPs and a linear combination of voxels. However, the method over-fits when analyzing high dimensional data such as SNP and brain imaging data as shown in Fig. 1. Thousands of SNPs in linkage disequilibrium (LD) are detected to reflect genetic variation at different loci. The number of voxels in a whole-brain fMRI image is also very large (e.g., 53 × 63 × 46). Traditional CCA performs poorly in such a case because of the multi-collinearity (linear dependence) problem, which also causes computational difficulty (Parkhomenko et al., 2009). To address this issue, sparse CCA (sCCA) methods have been developed by introducing sparse penalties into the traditional CCA model, mostly using the l1 norm (CCA-l1) or a combination of the l1 and l2 norms (CCA-elastic net) (Waaijenborg et al., 2008, Le Cao et al., 2009, Parkhomenko et al., 2009, Witten et al., 2009, Witten and Tibshirani, 2009, Boutte and Liu, 2010). Despite their success, these methods do not account for group structures within the data (e.g., multiple SNPs within the same gene, a group of voxels within the same region, a group of voxels within the same ROI, etc.), which often exist or are implied by the biological mechanism. For example, SNPs within the same gene have similar functions and act together at the gene or pathway level to affect brain activity. These SNP effects can add up to a larger difference (Tyekucheva et al., 2011).
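
The over-fitting of conventional CCA when variables outnumber samples can be illustrated with a small numerical sketch. It assumes two independent Gaussian data sets (so the true canonical correlation is zero) and hypothetical sizes `n`, `p` and `q` that are not taken from the paper. Because the centered `X` has more columns than rows, a weight vector `u` can be found that reproduces any centered projection of `Y` exactly.

```python
import numpy as np

def overfit_demo(n=20, p=50, q=40, seed=0):
    """Sample correlation reachable on pure noise when p > n (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # "SNP-like" block, independent of Y
    Y = rng.standard_normal((n, q))   # "voxel-like" block
    X -= X.mean(axis=0)
    Y -= Y.mean(axis=0)
    v = rng.standard_normal(q)        # arbitrary weights on the Y side
    t = Y @ v                         # centered target projection
    # With p >= n - 1, the centered X generically spans every centered vector,
    # so least squares fits the target (almost) exactly.
    u, *_ = np.linalg.lstsq(X, t, rcond=None)
    return np.corrcoef(X @ u, t)[0, 1]

rho = overfit_demo()
print(f"in-sample correlation on independent data: {rho:.4f}")
```

A correlation this close to one on unrelated data is what motivates the sparse penalties discussed in the text.
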
Several previous works have shown the benefit of accounting for the group effect of features in sCCA models (Chen and Liu, 2012, Chen et al., 2013, Lin et al., 2013). However, to our knowledge, little work has been reported on incorporating the group effect into the sCCA model for fMRI and SNP data integration. Motivated by this, we developed a group sparse CCA based integration method that imposes the sparse group lasso penalty on the CCA model for the integrative analysis of SNP and fMRI data; please refer to Fig. 1 for an illustration. This method has the following advantages: (1) a group of features (voxels/SNPs) is inspected during the correlation analysis, which makes it possible to study the joint effects of multiple SNPs on regions of voxels; (2) feature selection is performed at both the group and the single-feature level, so irrelevant groups of features as well as irrelevant single features within each group can be removed. Our group sparse CCA method can thus exploit group information in the correlation analysis while simultaneously filtering out noisy features within each group.
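
How a sparse group lasso penalty selects features at both levels can be sketched with its proximal operator, which first soft-thresholds single features and then shrinks each group as a whole. This is a minimal sketch of the standard operator, not the paper's exact formulation: the function names, the penalty weights `lam_l1` and `lam_group`, and the absence of group-size weights are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Element-wise soft-thresholding (l1 part of the penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_group_prox(z, groups, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||w||_1 + lam_group * sum_g ||w_g||_2.

    `groups` is a list of index lists, e.g. the SNPs of one gene or the
    voxels of one ROI (hypothetical grouping, unweighted by group size).
    """
    out = np.zeros_like(z, dtype=float)
    for idx in groups:
        s = soft_threshold(z[idx], lam_l1)      # drop weak single features
        norm = np.linalg.norm(s)
        if norm > lam_group:                    # otherwise drop the whole group
            out[idx] = (1.0 - lam_group / norm) * s
    return out

z = np.array([3.0, 0.2, -2.0, 0.3, -0.1, 0.2])
groups = [[0, 1, 2], [3, 4, 5]]
w = sparse_group_prox(z, groups, lam_l1=0.5, lam_group=1.0)
print(w)
```

In this toy input the second group is removed entirely, while the first group survives with its weak middle feature set to zero, which mirrors the two levels of selection described in point (2).
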

The group sparse CCA can estimate the correlation between canonical variates, corresponding to a set of significant SNPs or brain imaging voxels. Based on the estimates, we provided a gene-ROI correlation analysis to further confirm the significance of the correlations between genes and brain functions in ROIs.

The rest of the paper is organized as follows. The proposed group sparse CCA model and algorithm are introduced in the Theory section. The group sparse CCA based integration method for SNP and fMRI data is described in the Method section. The validation of our model and its comparison with other sCCA models on both simulated and real data are presented in the Results section. Finally, the pathway analysis results and the limitations of the proposed method are discussed.

Section snippets

Theory

In this section, we first introduce the CCA model, on which the group sparse model is based. We then propose a numerical algorithm based on block coordinate descent to solve the model. Finally, we show that the general model we propose includes several existing sCCA models as special cases, so the numerical algorithm can also be applied to solve them efficiently.
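
The snippet above does not give the update rules, so the following is only a sketch of what an alternating (block coordinate) scheme for a group sparse CCA could look like. It assumes the common simplification of working on the cross-covariance matrix `K`, initializing from its leading singular vectors, and applying a sparse group lasso proximal step to each weight vector in turn. The function names, the penalty pairs `lam_u` and `lam_v`, and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def group_soft(z, groups, lam_l1, lam_grp):
    """Soft-threshold single entries, then shrink or drop each group."""
    out = np.zeros_like(z)
    for idx in groups:
        s = np.sign(z[idx]) * np.maximum(np.abs(z[idx]) - lam_l1, 0.0)
        nrm = np.linalg.norm(s)
        if nrm > lam_grp:
            out[idx] = (1.0 - lam_grp / nrm) * s
    return out

def unit(w):
    nrm = np.linalg.norm(w)
    return w / nrm if nrm > 0 else w

def group_sparse_cca(X, Y, gx, gy, lam_u=(0.0, 0.0), lam_v=(0.0, 0.0),
                     n_iter=100, tol=1e-8):
    """Alternate penalized updates of u (for X) and v (for Y); first pair only."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    K = X.T @ Y / (X.shape[0] - 1)                 # sample cross-covariance
    U, _, Vt = np.linalg.svd(K, full_matrices=False)
    u, v = U[:, 0], Vt[0]                          # SVD initialization
    for _ in range(n_iter):
        u_new = unit(group_soft(K @ v, gx, *lam_u))        # update u, v fixed
        v_new = unit(group_soft(K.T @ u_new, gy, *lam_v))  # update v, u fixed
        done = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v

# Toy data: only the first group on each side carries a shared latent signal.
rng = np.random.default_rng(0)
n = 200
g = rng.standard_normal(n)
X = 0.5 * rng.standard_normal((n, 12))
X[:, :4] += g[:, None]
Y = 0.5 * rng.standard_normal((n, 9))
Y[:, :3] += g[:, None]
gx = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
gy = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
u, v = group_sparse_cca(X, Y, gx, gy, lam_u=(0.3, 0.3), lam_v=(0.3, 0.3))
print(np.round(u, 3), np.round(v, 3))
```

Setting the group weight to zero leaves a plain l1-penalized update, and setting the l1 weight to zero leaves a group-only update, which is one way to read the statement that the general model covers existing sCCA variants.
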

Method

We applied group sparse CCA to investigate the association of functional brain regions with genetic variations as shown in Fig. 1. Components extracted from fMRI represent brain regions expressing the functional difference in different subjects. Components from SNP data are linear combinations of SNPs from different genes that may have associations with the disease. After preprocessing, the collected SNPs and ROI-based voxels are both still high dimensional with a large number of features

Simulation

To assess the performance of the proposed group sparse CCA method, we first simulated two correlated data sets and then we compared group sparse CCA with the other penalized CCA methods such as the CCA-group and CCA-l1 on these simulated data.

Two data sets of SNP data X with p SNPs and fMRI data Y with q voxels were simulated. To correlate the SNPs with the voxels, a latent model similar to that of Parkhomenko et al. (2009) was used. We first set a latent variable $\Upsilon = \{\gamma_i \mid i = 1, \ldots, n\}$ with normal
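
A latent-variable simulation of this kind can be sketched as follows. Since the description above is cut off, every specific here is an assumption: a standard normal latent variable, unit loadings on the first `n_sig_x` SNPs and `n_sig_y` voxels, additive Gaussian noise with level `noise`, and continuous values in place of discrete genotypes. The function name and all parameter names are hypothetical.

```python
import numpy as np

def simulate_pair(n=100, p=200, q=150, n_sig_x=20, n_sig_y=15,
                  noise=0.5, seed=0):
    """Two data sets that share one latent variable on a subset of features."""
    rng = np.random.default_rng(seed)
    gamma = rng.standard_normal(n)        # latent variable, one value per sample
    a = np.zeros(p)
    a[:n_sig_x] = 1.0                     # SNPs driven by the latent variable
    b = np.zeros(q)
    b[:n_sig_y] = 1.0                     # voxels driven by the latent variable
    X = np.outer(gamma, a) + noise * rng.standard_normal((n, p))
    Y = np.outer(gamma, b) + noise * rng.standard_normal((n, q))
    return X, Y, a, b

X_sim, Y_sim, a_true, b_true = simulate_pair()
r_signal = np.corrcoef(X_sim[:, 0], Y_sim[:, 0])[0, 1]    # both carry signal
r_null = np.corrcoef(X_sim[:, -1], Y_sim[:, -1])[0, 1]    # both pure noise
print(f"signal pair: {r_signal:.2f}, noise pair: {r_null:.2f}")
```

The known loading vectors `a_true` and `b_true` give the ground truth against which the features selected by each penalized CCA method can be compared.
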

Discussion and conclusion

In this paper, we proposed a novel method to explore the relationship between genomic data and fMRI brain imaging data by considering the group effects of the variables in the data. We introduced the group sparse CCA method and its numerical implementation based on the regularized SVD and a block coordinate descent algorithm. The performance of the group sparse CCA model was compared with other sCCA models in the simulation study, showing that our group sparse CCA method could better recover the true

Acknowledgement

This work is partially supported by both NSF and NIH. It is also supported by Shanghai Eastern Scholarship Program.

References (54)

G.R. Abecasis et al. Genomewide scan in families with schizophrenia from the founder population of Afrikaners reveals evidence for linkage and uniparental disomy on chromosome 1. Am. J. Hum. Genet. (2004)
N.C. Andreasen et al. The role of the cerebellum in schizophrenia. Biol. Psychiatr. (2008)
J.R. Bishop et al. Association between the polymorphic GRM3 gene and negative symptom improvement during olanzapine treatment. Schizophr. Res. (2005)
A. Buonanno. The neuregulin signaling pathway and schizophrenia: from genes to synapses and neural circuits. Brain Res. Bull. (2010)
S.M. Clinton et al. Thalamic dysfunction in schizophrenia: neurochemical, neuropathological, and in vivo imaging abnormalities. Schizophr. Res. (2004)
P. Fransson et al. The precuneus/posterior cingulate cortex plays a pivotal role in the default mode network: evidence from a partial correlation network analysis. Neuroimage (2008)
J.M. Jansma et al. Working memory capacity in schizophrenia: a parametric fMRI study. Schizophr. Res. (2004)
R. Karlsson et al. MAGI1 copy number variation in bipolar affective disorder and schizophrenia. Biol. Psychiatr. (2012)
K.A. Kiehl et al. An event-related functional magnetic resonance imaging study of an auditory oddball task in schizophrenia. Schizophr. Res. (2001)
K.A. Kiehl et al. Abnormal hemodynamics in schizophrenia during an auditory oddball task. Biol. Psychiatr. (2005)
V. Kumari et al. Procedural learning in schizophrenia: a functional magnetic resonance imaging investigation. Schizophr. Res. (2002)
X. Li et al. FMRI study of language activation in schizophrenia, schizoaffective disorder and in individuals genetically at high risk. Schizophr. Res. (2007)
M.S. Lidow. Calcium signaling dysfunction in schizophrenia: a unifying approach. Brain Res. Brain Res. Rev. (2003)
S.A. Meda et al. An fMRI study of working memory in first-degree unaffected relatives of schizophrenia patients. Schizophr. Res. (2008)
D.J. Porteous et al. The genetics and biology of DISC1 – an emerging role in psychosis and cognition. Biol. Psychiatr. (2006)
M.E. Shenton et al. A review of MRI findings in schizophrenia. Schizophr. Res. (2001)
J. Sui et al. Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model. Neuroimage (2011)
S.R. Sutrala et al. Gene copy number variation in schizophrenia. Schizophr. Res. (2007)
G.R. Szycik et al. Audiovisual integration of speech is disturbed in schizophrenia: an fMRI study. Schizophr. Res. (2009)
E.F. Torrey. Schizophrenia and the inferior parietal lobule. Schizophr. Res. (2007)
N. Tzourio-Mazoyer et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage (2002)
J.A. Badner et al. Meta-analysis of whole-genome linkage scans of bipolar disorder and schizophrenia. Mol. Psychiatr. (2002)
M. Bellani et al. The potential role of the parietal lobe in schizophrenia. Epidemiol. Psichiatr. Soc. (2010)
Boutte, D., Liu, J., 2010. Sparse canonical correlation analysis applied to fMRI and genetic data fusion. In: IEEE...
J.H. Callicott et al. Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia. Proc. Natl. Acad. Sci. USA (2005)
X. Chen et al. An efficient optimization algorithm for structured sparse CCA, with applications to eQTL Mapping. Stat. Biosci. (2012)
J. Chen et al. Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics (2013)

