Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example
Introduction
In psychiatry, as in other fields of medicine, both the standardized observation of signs and the symptom profile are critical for diagnosis. However, compared to other fields of medicine, psychiatry lacks accompanying objective markers that could lead to more refined diagnoses and targeted treatment (Kapur et al., 2012). Advances in non-invasive brain imaging techniques and analyses (e.g. Craddock et al., 2013, Van Essen and Ugurbil, 2012) show great promise for uncovering patterns of brain structure and function that can be used as objective measures of mental illness. Such neurophenotypes are important for clinical applications such as disease staging, determination of risk prognosis, prediction and monitoring of treatment response, and aid towards diagnosis (e.g. Castellanos et al., 2013).
Among the many imaging techniques available, resting-state fMRI (R-fMRI) is a promising candidate to define functional neurophenotypes (Kelly et al., 2008, Van Essen and Ugurbil, 2012). In particular, it is non-invasive and, unlike conventional task-based fMRI, it requires neither a constrained experimental setup nor the active and focused participation of the subject. It has been shown to capture interactions between brain regions that may lead to diagnostic biomarkers of neuropathology (Greicius, 2008). Numerous studies have linked variations in brain functional architecture measured from R-fMRI to behavioral traits and mental health conditions such as Alzheimer disease (e.g. Greicius et al., 2004, Chen et al., 2011), schizophrenia (e.g. Garrity et al.; Zhou et al., 2007; Jafri et al., 2008; Calhoun et al.), ADHD, autism (e.g. Plitt et al., 2015), and others (e.g. Anderson et al., 2011). Extending these findings, predictive modeling approaches have revealed patterns of brain functional connectivity that could serve as biomarkers for classifying depression (e.g. Craddock et al.), ADHD (e.g. Consortium et al.), autism (e.g. Anderson et al., 2011), and even age (Dosenbach et al., 2010). This growing number of studies has shown the feasibility of using R-fMRI to identify biomarkers. However, questions about the readiness of R-fMRI to detect clinically useful biomarkers remain (Plitt et al., 2015). In particular, the reproducibility and generalizability of these approaches in research or clinical settings are debatable. Given the modest sample size of most R-fMRI studies, the effect of cross-study differences in data acquisition, image processing, and sampling strategies (Desmond and Glover, 2002, Murphy and Garavan, 2004, Thirion et al., 2007) has not been quantified.
Using larger datasets is commonly cited as a solution to challenges in reproducibility and statistical power (Button et al., 2013). Large datasets are considered a prerequisite for R-fMRI-based classifiers that detect psychiatric illness. Recent efforts have accelerated the generation of large databases through sharing and aggregating independent data samples (Fair et al.; Mennes et al., 2013; Di Martino et al., 2014). However, a number of concerns must be addressed before accepting the utility of this approach. Most notably, many potential sources of uncontrolled variation can exist across studies and sites. These range from MRI acquisition protocols (e.g. scanner type, imaging sequence, see Friedman et al. (2008)), to participant instructions (e.g. eyes open vs. closed, see Yan et al. (2013)), to recruitment strategies (age-group, IQ-range, level of impairment, treatment history and acceptable comorbidities). Such variation in aggregate samples is often viewed as dissuasive, as its effect on diagnosis and biomarker extraction is unknown. It commonly motivates researchers to limit the number of sites included in their analyses, at the cost of sample size.
Cross-validated results obtained from predictive models are more robust to inhomogeneities: they measure model generalizability by applying the model to unseen data, i.e., data not used to train it. In particular, leave-out cross-validation strategies, which remove single individuals (or random subsets), are common in biomarker studies. However, these strategies do not measure the effect of potential site-specific confounds. In the present study we leverage aggregated R-fMRI samples to address this problem. Instead of leaving out random subsamples as test sets, we left out entire sites to measure performance in the presence of uncontrolled variability.
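The difference between leaving out random subsets and leaving out entire sites can be made concrete with a minimal sketch of a leave-one-site-out split. The function name `leave_one_site_out` and the site labels below are hypothetical and chosen for illustration only; they are not the study's code or ABIDE data.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one split per site.

    `sites` holds one site label per participant. All participants from the
    held-out site form the test set, so no data from that site is used for
    training.
    """
    by_site = defaultdict(list)
    for index, site in enumerate(sites):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test = by_site[held_out]
        train = [i for s in sorted(by_site) if s != held_out for i in by_site[s]]
        yield held_out, train, test


# Hypothetical site labels for eight participants (illustrative only).
sites = ["A", "A", "B", "B", "B", "C", "C", "C"]
splits = list(leave_one_site_out(sites))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Because every test participant comes from a site absent from training, the resulting score reflects site-specific confounds that a random split would hide. In practice a library splitter such as scikit-learn's `LeaveOneGroupOut` could play the same role, with the site label passed as the group.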
Beyond challenges due to inter-site data heterogeneity, choices in the functional-connectivity data-processing pipeline further add to the variability of results (Carp, 2012; Yan et al., 2013; Shirer et al.). While preprocessing procedures are now standard, the different steps of the prediction pipeline vary from one study to another. These entail specifying regions of interest, extracting regional time courses, computing connectivity between regions, and identifying connections that relate to subjects' phenotypes (Craddock et al.; Richiardi et al., 2011; Shirer et al., 2012; Eickhoff et al., 2015).
The lack of ground truth for brain functional architecture undermines the validation of R-fMRI data-processing pipelines. The use of functional connectivity for individual prediction suggests a natural figure of merit: prediction accuracy. We contribute quantitative evaluations to help settle on a parameter-free pipeline for R-fMRI. Using efficient implementations, we were able to evaluate many pipeline options and select the best method to estimate atlases, extract connectivity matrices, and predict phenotypes.
To demonstrate that pipelines to extract R-fMRI neurophenotypes can reliably learn inter-site biomarkers of psychiatric status on inhomogeneous data, we analyzed R-fMRI in the Autism Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2014). ABIDE compiles R-fMRI data from 1112 participants gathered across 17 different sites. After preprocessing, we selected 871 participants who met quality criteria for MRI and phenotypic information. Our inter-site prediction methodology reproduced conditions found under most clinical settings, by leaving out whole sites and using them as newly seen test sets. To validate the robustness of our approach, we performed nested cross-validation and varied samples per inclusion criteria (e.g. sex, age). Finally, to assess losses in predictive power associated with using a heterogeneous aggregate dataset instead of uniformly defined samples, we included a comparison of intra- and inter-site prediction strategies.
Material and methods
A connectome is a functional-connectivity matrix between a set of brain regions of interest (ROIs). We call such a set of ROIs an atlas, even though some of the methods we consider extract the regions from the data rather than relying on a reference atlas (see Fig. 1). We investigate here pipelines that discriminate individuals based on the connection strength of this connectome (Varoquaux and Craddock, 2013), with a classifier on the edge weights (Craddock et al.).
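As a minimal sketch of what "a classifier on the edge weights" takes as input, the snippet below builds a connectome from per-ROI time series and flattens it into a feature vector. It assumes Pearson correlation as the connectivity measure, which is only one possible choice and is used here for illustration. The helper names and the toy signals are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signals.

    Assumes neither signal is constant (a constant signal has zero variance
    and would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectome(time_series):
    """Return the ROI-by-ROI correlation matrix for a list of ROI signals."""
    n = len(time_series)
    return [
        [1.0 if i == j else pearson(time_series[i], time_series[j]) for j in range(n)]
        for i in range(n)
    ]


def edge_weights(matrix):
    """Flatten the strict upper triangle: one feature per pair of ROIs."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy signals for three hypothetical ROIs (illustrative only).
roi_signals = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = connectome(roi_signals)
features = edge_weights(matrix)
print(len(features))
```

Only the upper triangle is kept because the matrix is symmetric and its diagonal carries no information, so an atlas with n ROIs yields n(n-1)/2 features per subject.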
Specifically, we consider
Results
Here we present the most notable trends emerging from our analyses. More details are provided in supplementary materials. First, we compared inter- and intra-site prediction of ASD diagnostic labels while varying the number of subjects in the training set. Second, in a post-hoc analysis, we identified the pipeline choices most relevant for prediction and proposed a good choice of pipeline steps for prediction. Finally, by analyzing the weights of the classifiers, we highlighted functional
Discussion
We studied pipelines that extract neurophenotypes from aggregate R-fMRI datasets through the following steps: (1) region definition from R-fMRI, (2) extraction of regional activity time series, (3) estimation of functional interactions between regions, and (4) construction of a discriminant model for brain-based diagnostic classification. The proposed pipelines can be built with the Nilearn neuroimaging-analysis software, and the atlas computed with MSDL is available for download.
Acknowledgments
We acknowledge funding from the NiConnect project (ANR-11-BINF-0004 NiConnect) and the SUBSample project from the DIGITEO Institute, France. The effort of Adriana Di Martino was partially supported by the NIMH grant 1R21MH107045-01A1. Support for M.P. Milham was made possible by gifts to the Child Mind Institute from Phyllis Green, Randolph Cowen, and Joseph Healey. Finally, we thank the anonymous reviewers for their feedback, as well as all sites and investigators who have worked to share their
References (76)
- et al.
A component based noise correction method (CompCor) for BOLD and perfusion based fMRI
Neuroimage
(2007)
The secret lives of experiments: methods reporting in the fMRI literature
Neuroimage
(2012)
- et al.
Clinical applications of the functional connectome
Neuroimage
(2013)
- et al.
An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest
Neuroimage
(2006)
- et al.
Estimating sample size in functional MRI (fMRI) neuroimaging studies: statistical power analyses
J. Neurosci. Methods
(2002)
- et al.
Aberrant striatal functional connectivity in children with autism
Biol. Psychiatry
(2011)
- et al.
Disrupted neural synchronization in toddlers with autism
Neuron
(2011)
- et al.
Autism spectrum disorders: developmental disconnection syndromes
Curr. Opin. Neurobiol.
(2007)
- et al.
On clustering fMRI time series
NeuroImage
(1999)
- et al.
A method for functional network connectivity among spatially independent resting-state components in schizophrenia
Neuroimage
(2008)
Competition between functional brain networks mediates behavioral variability
Neuroimage
Atypical functional lateralization of language in autism spectrum disorders
Brain Res.
Discovering structure in the space of fMRI selectivity profiles
NeuroImage
A well-conditioned estimator for large-dimensional covariance matrices
J. Multivar. Anal.
Motion or activity: their role in intra-and inter-subject variation in fMRI
Neuroimage
A randomized algorithm for the decomposition of matrices
Appl. Comput. Harmon. Anal.
Making data sharing work: the FCP/INDI experience
Neuroimage
Abnormalities of intrinsic functional connectivity in autism spectrum disorders
Neuroimage
An empirical investigation into the number of subjects required for an event-related fMRI study
Neuroimage
People thinking about thinking people: the role of the temporo-parietal junction in theory of mind
Neuroimage
Network modelling methods for fMRI
Neuroimage
Analysis of a large fMRI cohort: statistical and methodological issues for group analyses
Neuroimage
The future of the human connectome
Neuroimage
Learning and comparing functional connectomes across subjects
NeuroImage
A group model for stable multi-subject ICA on fMRI datasets
NeuroImage
Brain connectivity and high functioning autism: a promising path of research that needs refined models, methodological convergence, and stronger behavioral links
Neurosci. Biobehav. Rev.
Standardizing the intrinsic brain: towards robust measurement of inter-individual variation in 1000 functional connectomes
Neuroimage
Functional disintegration in paranoid schizophrenia using resting-state fMRI
Schizophr Res.
Decreased interhemispheric functional connectivity in autism
Cereb. Cortex
Functional connectivity magnetic resonance imaging classification of autism
Brain
Advanced normalization tools (ants)
Insight J
Probabilistic independent component analysis for functional magnetic resonance imaging
Trans. Med Im.
Power failure: why small sample size undermines the reliability of neuroscience
Nat. Rev. Neurosci.
A method for making group inferences from fMRI data using independent component analysis
Hum. Brain Mapp.
Classification of Alzheimer disease, mild cognitive impairment, and normal cognitive status with large-scale network analysis based on resting-state functional MR imaging
Radiology