Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

doi:10.1016/j.neuroimage.2016.10.045

NeuroImage

Volume 147, 15 February 2017, Pages 736-745

https://doi.org/10.1016/j.neuroimage.2016.10.045

Highlights

• We propose a fully-automatic pipeline to extract biomarkers from resting state fMRI.
• We demonstrate prediction in a clinical setting, on subjects coming from unseen sites.
• On 871 subjects of the ABIDE dataset we achieve prediction accuracy better than state of the art (68%).
• A post-hoc analysis of the pipeline steps sketches an ideal pipeline for prediction.
• Extracted autism biomarkers are stable across training sets and consistent with literature.

Abstract

Resting-state functional Magnetic Resonance Imaging (R-fMRI) holds the promise to reveal functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multi-faceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These R-fMRI pipelines build participant-specific connectomes from functionally-defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or from different, unseen ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the full ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction. First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large R-fMRI datasets outperform reference atlases in the classification tasks.

Introduction

In psychiatry, as in other fields of medicine, both the standardized observation of signs and the symptom profile are critical for diagnosis. However, compared to other fields of medicine, psychiatry lacks accompanying objective markers that could lead to more refined diagnoses and targeted treatment (Kapur et al., 2012). Advances in non-invasive brain imaging techniques and analyses (e.g. Craddock et al., 2013, Van Essen and Ugurbil, 2012) are showing great promise for uncovering patterns of brain structure and function that can be used as objective measures of mental illness. Such neurophenotypes are important for clinical applications such as disease staging, determination of risk prognosis, prediction and monitoring of treatment response, and aid towards diagnosis (e.g. Castellanos et al., 2013).

Among the many imaging techniques available, resting-state fMRI (R-fMRI) is a promising candidate to define functional neurophenotypes (Kelly et al., 2008, Van Essen and Ugurbil, 2012). In particular, it is non-invasive and, unlike conventional task-based fMRI, it requires neither a constrained experimental setup nor the active and focused participation of the subject. It has been proven to capture interactions between brain regions that may lead to neuropathology diagnostic biomarkers (Greicius, 2008). Numerous studies have linked variations in brain functional architecture measured from R-fMRI to behavioral traits and mental health conditions such as Alzheimer disease (e.g. Greicius et al., 2004, Chen et al., 2011), schizophrenia (e.g. Garrity et al., Zhou et al., 2007, Jafri et al., 2008, Calhoun et al.), ADHD, autism (e.g. Plitt et al., 2015), and others (e.g. Anderson et al., 2011). Extending these findings, predictive modeling approaches have revealed patterns of brain functional connectivity that could serve as biomarkers for classifying depression (e.g. Craddock et al.), ADHD (e.g. Consortium et al.), autism (e.g. Anderson et al., 2011), and even age (Dosenbach et al., 2010). This growing number of studies has shown the feasibility of using R-fMRI to identify biomarkers. However, questions about the readiness of R-fMRI to detect clinically useful biomarkers remain (Plitt et al., 2015). In particular, the reproducibility and generalizability of these approaches in research or clinical settings are debatable. Given the modest sample size of most R-fMRI studies, the effect of cross-study differences in data acquisition, image processing, and sampling strategies (Desmond and Glover, 2002, Murphy and Garavan, 2004, Thirion et al., 2007) has not been quantified.

Using larger datasets is commonly cited as a solution to challenges in reproducibility and statistical power (Button et al., 2013). They are considered a prerequisite to R-fMRI-based classifiers for the detection of psychiatric illness. Recent efforts have accelerated the generation of large databases through sharing and aggregating independent data samples (Fair et al., Mennes et al., 2013, Di Martino et al., 2014). However, a number of concerns must be addressed before accepting the utility of this approach. Most notably, many potential sources of uncontrolled variation can exist across studies and sites. They range from MRI acquisition protocols (e.g. scanner type, imaging sequence, see Friedman et al. (2008)), to participant instructions (e.g. eyes open vs. closed, see Yan et al. (2013)), to recruitment strategies (age-group, IQ-range, level of impairment, treatment history and acceptable comorbidities). Such variation in aggregate samples is often viewed as dissuasive, as its effect on diagnosis and biomarker extraction is unknown. It commonly motivates researchers to limit the number of sites included in their analyses at the cost of sample size.

Cross-validated results obtained from predictive models are more robust to inhomogeneities: they measure model generalizability by applying the model to unseen data, i.e., data not used to train it. In particular, leave-out cross-validation strategies, which remove single individuals (or random subsets), are common in biomarker studies. However, these strategies do not measure the effect of potential site-specific confounds. In the present study we leverage aggregated R-fMRI samples to address this problem. Instead of leaving out random subsamples as test sets, we left out entire sites to measure performance in the presence of uncontrolled variability.
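The leave-site-out idea described above can be illustrated with a minimal sketch. It assumes only that each participant carries a site label; the function name `leave_one_site_out` and the toy labels `site_A`, `site_B`, `site_C` are hypothetical and do not come from the study. The sketch produces index splits only and does not reproduce the authors' implementation.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one split per site.

    `sites` holds one site label per participant. Every participant of the
    held-out site goes to the test set, so no site contributes to both sets.
    """
    by_site = defaultdict(list)
    for index, site in enumerate(sites):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test_idx = list(by_site[held_out])
        train_idx = sorted(
            i for site, indices in by_site.items() if site != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Toy, hypothetical site labels for six participants.
sites = ["site_A", "site_A", "site_B", "site_B", "site_B", "site_C"]
splits = list(leave_one_site_out(sites))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
# site_A [2, 3, 4, 5] [0, 1]
# site_B [0, 1, 5] [2, 3, 4]
# site_C [0, 1, 2, 3, 4] [5]
```

In contrast, a random leave-out split would typically place participants from the same site on both sides of the split, which is why it cannot expose site-specific confounds.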

Beyond challenges due to inter-site data heterogeneity, choices in the functional-connectivity data-processing pipeline further add to the variability of results (Carp, 2012, Yan et al., 2013, Shirer et al.). While preprocessing procedures are now standard, the different steps of the prediction pipeline vary from one study to another. These entail specifying regions of interest, extracting regional time courses, computing connectivity between regions, and identifying connections that relate to subjects' phenotypes (Craddock et al., Richiardi et al., 2011, Shirer et al., 2012, Eickhoff et al., 2015).

Lack of ground truth for brain functional architecture undermines the validation of R-fMRI data-processing pipelines. The use of functional connectivity for individual prediction suggests a natural figure of merit: prediction accuracy. We contribute quantitative evaluations to help settle on a parameter-free pipeline for R-fMRI. Using efficient implementations, we were able to evaluate many pipeline options and select the best method to estimate atlases, extract connectivity matrices, and predict phenotypes.

To demonstrate that pipelines to extract R-fMRI neuro-phenotypes can reliably learn inter-site biomarkers of psychiatric status on inhomogeneous data, we analyzed R-fMRI in the Autism Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2014). It compiles a dataset of 1112 R-fMRI participants by gathering data from 17 different sites. After preprocessing, we selected 871 participants who met quality criteria for MRI and phenotypic information. Our inter-site prediction methodology reproduced conditions found under most clinical settings, by leaving out whole sites and using them as newly seen test sets. To validate the robustness of our approach, we performed nested cross-validation and varied samples per inclusion criteria (e.g. sex, age). Finally, to assess losses in predictive power associated with using a heterogeneous aggregate dataset instead of uniformly defined samples, we included a comparison of intra- and inter-site prediction strategies.

Section snippets

Material and methods

A connectome is a functional-connectivity matrix between a set of brain regions of interest (ROIs). We call such a set of ROIs an atlas, even though some of the methods we consider extract the regions from the data rather than relying on a reference atlas (see Fig. 1). We investigate here pipelines that discriminate individuals based on the connection strength of this connectome (Varoquaux and Craddock, 2013), with a classifier on the edge weights (Craddock et al.).
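As an illustration of the definition above, the following sketch builds a connectome from per-ROI time series and flattens it into the edge-weight vector a classifier would consume. Pearson correlation is used here as one common connectivity measure; that choice is an assumption for illustration, since this excerpt does not state which measures the study retained. The function names and the toy signals are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectome(roi_signals):
    """Symmetric ROI-by-ROI correlation matrix (one row per ROI signal)."""
    n = len(roi_signals)
    return [
        [pearson(roi_signals[i], roi_signals[j]) for j in range(n)]
        for i in range(n)
    ]


def edge_weights(matrix):
    """Upper-triangle entries (diagonal excluded) as a flat feature vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy signals for three hypothetical ROIs, four time points each.
roi_signals = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # scaled copy of ROI 0: correlation +1
    [4.0, 3.0, 2.0, 1.0],  # reversed ROI 0: correlation -1
]
conn = connectome(roi_signals)
features = edge_weights(conn)
print([round(w, 6) for w in features])
```

For an atlas of n ROIs the feature vector has n(n-1)/2 entries, one per undirected edge, so each participant is represented by a fixed-length vector regardless of scan duration.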

Specifically, we consider

Results

Here we present the most notable trends emerging from our analyses. More details are provided in supplementary materials. First, we compared inter- and intra-site prediction of ASD diagnostic labels while varying the number of subjects in the training set. Second, in a post-hoc analysis, we identified the pipeline choices most relevant for prediction and proposed a good choice of pipeline steps for prediction. Finally, by analyzing the weights of the classifiers, we highlighted functional

Discussion

We studied pipelines that extract neurophenotypes from aggregate R-fMRI datasets through the following steps: (1) region definition from R-fMRI, (2) extraction of regions' activity time series, (3) estimation of functional interactions between regions, and (4) construction of a discriminant model for brain-based diagnostic classification. The proposed pipelines can be built with the Nilearn neuroimaging-analysis software, and the atlas computed with MSDL is available for download.

Acknowledgments

We acknowledge funding from the NiConnect project (ANR-11-BINF-0004 NiConnect) and the SUBSample project from the DIGITEO Institute, France. The effort of Adriana Di Martino was partially supported by the NIMH grant 1R21MH107045-01A1. Support for M.P. Milham was made possible by gifts to the Child Mind Institute from Phyllis Green, Randolph Cowen, and Joseph Healey. Finally, we thank the anonymous reviewers for their feedback, as well as all sites and investigators who have worked to share their

References (76)

Y. Behzadi et al. A component based noise correction method (CompCor) for bold and perfusion based fMRI. NeuroImage (2007)
J. Carp. The secret lives of experiments: methods reporting in the fMRI literature. NeuroImage (2012)
F.X. Castellanos et al. Clinical applications of the functional connectome. NeuroImage (2013)
R.S. Desikan et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage (2006)
J.E. Desmond et al. Estimating sample size in functional MRI (fMRI) neuroimaging studies: statistical power analyses. J. Neurosci. Methods (2002)
A. Di Martino et al. Aberrant striatal functional connectivity in children with autism. Biol. Psychiatry (2011)
I. Dinstein et al. Disrupted neural synchronization in toddlers with autism. Neuron (2011)
D.H. Geschwind et al. Autism spectrum disorders: developmental disconnection syndromes. Curr. Opin. Neurobiol. (2007)
C. Goutte et al. On clustering fMRI time series. NeuroImage (1999)
M.J. Jafri et al. A method for functional network connectivity among spatially independent resting-state components in schizophrenia. NeuroImage (2008)
A.C. Kelly et al. Competition between functional brain networks mediates behavioral variability. NeuroImage (2008)
N.M. Kleinhans et al. Atypical functional lateralization of language in autism spectrum disorders. Brain Res. (2008)
D. Lashkari et al. Discovering structure in the space of fMRI selectivity profiles. NeuroImage (2010)
O. Ledoit et al. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. (2004)
T.E. Lund et al. Motion or activity: their role in intra-and inter-subject variation in fMRI. NeuroImage (2005)
P. Martinsson et al. A randomized algorithm for the decomposition of matrices. Appl. Comput. Harmon. Anal. (2011)
M. Mennes et al. Making data sharing work: the fcp/indi experience. NeuroImage (2013)
C.S. Monk et al. Abnormalities of intrinsic functional connectivity in autism spectrum disorders. NeuroImage (2009)
K. Murphy et al. An empirical investigation into the number of subjects required for an event-related fmri study. NeuroImage (2004)
R. Saxe et al. People thinking about thinking people: the role of the temporo-parietal junction in theory of mind. NeuroImage (2003)
S. Smith et al. Network modelling methods for fMRI. NeuroImage (2011)
B. Thirion et al. Analysis of a large fMRI cohort: statistical and methodological issues for group analyses. NeuroImage (2007)
D.C. Van Essen et al. The future of the human connectome. NeuroImage (2012)
G. Varoquaux et al. Learning and comparing functional connectomes across subjects. NeuroImage (2013)
G. Varoquaux et al. A group model for stable multi-subject ICA on fMRI datasets. NeuroImage (2010)
M.E. Vissers et al. Brain connectivity and high functioning autism: a promising path of research that needs refined models, methodological convergence, and stronger behavioral links. Neurosci. Biobehav. Rev. (2012)
C.-G. Yan et al. Standardizing the intrinsic brain: towards robust measurement of inter-individual variation in 1000 functional connectomes. NeuroImage (2013)
Y. Zhou et al. Functional disintegration in paranoid schizophrenia using resting-state fMRI. Schizophr Res. (2007)
Abraham, A., Dohmatob, E., Thirion, B., Samaras, D., Varoquaux, G., Extracting brain regions from rest fMRI with...
J.S. Anderson et al. Decreased interhemispheric functional connectivity in autism. Cereb. Cortex (2010)
J.S. Anderson et al. Functional connectivity magnetic resonance imaging classification of autism. Brain (2011)
Anderson, J.S., Nielsen, J.A., Ferguson, M.A., Burback, M.C., Cox, E.T., Dai, L., Gerig, G., Edgin, J.O., Korenberg,...
B.B. Avants et al. Advanced normalization tools (ants). Insight J (2009)
C.F. Beckmann et al. Probabilistic independent component analysis for functional magnetic resonance imaging. Trans. Med Im. (2004)
K.S. Button et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. (2013)
V.D. Calhoun et al. A method for making group inferences from fMRI data using independent component analysis. Hum. Brain Mapp. (2001)
Calhoun, V.D., Sui, J., Kiehl, K., Turner, J., Allen, E., Pearlson, G., Exploring the psychosis functional connectome:...
G. Chen et al. Classification of alzheimer disease, mild cognitive impairment, and normal cognitive status with large-scale network analysis based on resting-state functional mr imaging. Radiology (2011)

