Elsevier

NeuroImage

Volume 84, 1 January 2014, Pages 245-253

Integrative analysis of the connectivity and gene expression atlases in the mouse brain

https://doi.org/10.1016/j.neuroimage.2013.08.049

Highlights

  • We study the correlation between gene expression and brain connectivity.

  • We predict connectivity in the mouse brain using ensemble methods.

  • Gene expressions can predict brain connectivity.

  • We identify genes that generate brain connectivity.

  • Gene expressions are correlated with brain connectivity.

Abstract

Brain function is the result of signal transmission between neurons, controlled by the fundamental biochemistry of each neuron. The biochemical content of a neuron is in turn determined by spatiotemporal gene expression and regulation encoded into the genomic regulatory networks. It is thus of particular interest to elucidate the relationship between gene expression patterns and connectivity in the brain. However, systematic studies of this relationship in a single mammalian brain are lacking to date. Here, we investigate this relationship in the mouse brain using the Allen Brain Atlas data. We employ computational models for predicting brain connectivity from gene expression data. In addition to giving competitive predictive performance, these models can rank the genes according to their predictive power. We show that gene expression is predictive of connectivity in the mouse brain when the connectivity signals are discretized. When the expression patterns of 4084 genes are used, we obtain a predictive accuracy of 93%. Our results also show that a small number of genes can provide nearly the full predictive power of thousands of genes: we achieve a prediction accuracy of 91% by using only 25 genes. Gene ontology analysis of the highly ranked genes shows that they are enriched for connectivity-related processes.

Introduction

The mammalian brain contains a large number of cells connected into an interaction network that controls the information flow among neurons (Swanson, 2003, Watson et al., 2010, Watson et al., 2011). The brain connectome plays a pivotal role in generating the cognition, emotion, and perception of an organism. Neurological diseases, such as autism and schizophrenia, are commonly found to be associated with abnormal brain connectivity (Geschwind and Levitt, 2007, Just et al., 2007, Lawrie et al., 2002). Hence, understanding the brain's functional circuitry has become one of the central research themes in neuroscience. At the cellular level, each neuron is largely unique in the sense that it contains a unique combination of proteins that determine how the neuron functions. At the molecular level, the proteins in a neuron are encoded by the genome, which also contains regulatory sequences that control when, where, and at what level each gene is expressed. In other words, the fundamental biochemistry of neurons is determined by spatiotemporal gene expression and regulation encoded into the genomic regulatory networks. This has prompted research efforts to characterize the cellular localization of gene expression in the brain and to investigate the relationship between genome and connectome (Boguski and Jones, 2004, Bota et al., 2003, Carson et al., 2005, Dong et al., 2009, Ji, 2011, Lichtman and Sanes, 2008, Swanson, 2011, Thompson et al., 2008, Toledo-Rodriguez et al., 2004, Zheng and Rajapakse, 2006).

The initial attempts to investigate the relationship between gene expression and neuronal connectivity focused on the nervous system of Caenorhabditis elegans, because the synaptic connectivity in this organism is known. In Varadan et al. (2006), computational techniques were presented to link gene expression and neuronal connectivity. In addition, sets of synergistically interacting genes were identified based on entropy minimization and Boolean parsimony. Experimental results showed that the synergistic expressions of a subset of genes are predictive of neuronal connectivity. Kaufman et al. (2006) used correlation and prediction analysis assays to study the relationship between gene expression and neuronal connectivity in C. elegans. They showed that the expression signature of a neuron carries significant information about its synaptic connectivity. They also identified a list of putative genes that retain high predictive power. Baruch et al. (2008) studied the molecular markers and logic that direct synapse formation in C. elegans. They built a probabilistic model and attempted to explain the neuronal connectivity diagram of C. elegans as a function of the expression patterns of its neurons. Their results showed that the synaptic connections in C. elegans can be predicted by using the expression patterns of only a small number of genes.

Motivated by prior research results on C. elegans, a few recent studies have attempted to investigate the relationship between gene expression and brain connectivity in the rodent brain. Since gene expression and brain connectivity data were not available for a single rodent species when those studies were performed, they usually fused data from two different species (French and Pavlidis, 2011, Wolf et al., 2011). Specifically, French and Pavlidis (2011) used the gene expression data of the mouse brain from the Allen Brain Atlas (Sunkin et al., 2013) and the connectivity data of the rat brain from the Brain Architecture Management System (Bota and Swanson, 2008) to study the relationship between gene expression and brain connectivity. By using a series of covariation analysis techniques, they reported that gene expression in the mouse brain is correlated with connectivity in the rat brain. In addition, they identified a set of genes that are most correlated with connectivity. Wolf et al. (2011) used the same data sets and tried to predict regional connectivity in the rat brain by using gene expression data from the mouse brain. They also identified a set of highly predictive genes whose functional roles in disease conditions were evaluated.

In this work, we study the relationship between gene expression and brain connectivity in a single rodent brain, namely the mouse brain. Our investigation is made possible by the recent release of the mouse brain connectivity data from the Allen Mouse Brain Connectivity Atlas (Allen Institute for Brain Science, 2012d). By integrating this resource with the Allen Mouse Brain Atlas data (Allen Institute for Brain Science, 2012b, Lein et al., 2007), we attempt to systematically study the relationship between gene expression and brain connectivity in a single mammalian brain. We employ ensemble learning methods (Zhou, 2012) for predicting the brain connectivity using gene expression data. These methods generate many base models by randomly sampling the original training data, thereby yielding accurate and robust predictions (Geremia et al., 2011, Gray et al., 2013, Liu et al., 2012, Yuan et al., 2012b). We consider two types of base models in this work, that is, tree and sparse models, which have been commonly used in neurological applications (Cribben et al., 2012, de Brecht and Yamagishi, 2012, Geremia et al., 2011, Gray et al., 2013, Ryali et al., 2010, Ye et al., 2012). One common and appealing property of these models is that they can perform feature selection and prediction simultaneously, thereby enabling us to identify genes that retain high predictive power.
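The ensemble idea described above (many base models fitted on random resamples of the training data, with the selected features doubling as a gene ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bootstrap resampling, one-split decision stumps as a stand-in for the tree base models, and selection frequency as the ranking score. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split tree: the (feature, threshold, polarity) with fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Errors when predicting class 1 for values above the threshold.
            errors = sum((1 if row[j] > t else 0) != label for row, label in zip(X, y))
            # Polarity 0 flips the prediction, so its error count is the complement.
            for polarity, err in ((1, errors), (0, len(y) - errors)):
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    j, t, polarity = stump
    above = row[j] > t
    return int(above) if polarity == 1 else int(not above)

def fit_bagged_stumps(X, y, n_models=25, seed=0):
    """Fit each base model on a bootstrap resample of the training data (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in sample], [y[i] for i in sample]))
    return models

def predict_ensemble(models, row):
    """Score a sample as the fraction of base models voting for class 1."""
    return sum(predict_stump(m, row) for m in models) / len(models)

def rank_features(models):
    """Rank features (genes) by how often the base models selected them."""
    return [j for j, _ in Counter(m[0] for m in models).most_common()]
```

Here each row of `X` would hold the expression values of the candidate genes for one sample and `y` the discretized connectivity label; `rank_features` returns feature indices ordered from most to least frequently selected.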

Our results show that gene expression is predictive of connectivity in the mouse brain when the connectivity signals are discretized. When the expression patterns of 4084 genes are used, we obtain a predictive accuracy of 93%. Our results also show that the expression patterns of a small number of genes can provide nearly the full predictive power of thousands of genes: we achieve a prediction accuracy of 91% by using the expression patterns of only 25 genes. Gene ontology analysis of the highly ranked genes shows that they are significantly enriched for connectivity-related processes. We also performed covariation analysis on the gene expression and connectivity data. Our results show that gene expression and connectivity are correlated in the mouse brain. We show that our results on prediction and covariation analysis remain significant when spatial autocorrelation effects are taken into account.
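Discretizing a connectivity signal means replacing each continuous value with a class label before training a classifier. The text here does not state the rule that was used, so the sketch below is only an illustration: the median cut-off and the function names are assumptions, not the authors' procedure.

```python
from statistics import median

def discretize(signals, threshold):
    """Label a signal 1 (connected) if it exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in signals]

def discretize_at_median(signals):
    """Hypothetical choice of cut-off: the median of the observed signals."""
    return discretize(signals, median(signals))

# Illustrative values only, not data from the atlas.
labels = discretize_at_median([0.0, 0.02, 0.4, 0.9, 0.05])
# labels == [0, 0, 1, 1, 0]
```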

Allen Mouse Brain Connectivity Atlas

The Allen Mouse Brain Connectivity Atlas (the Connectivity Atlas) provides 3-D, high-resolution maps of neural connections in the adult mouse brain (Allen Institute for Brain Science, 2012d). In this atlas, axonal projections mapped from major anatomical regions are labeled by recombinant adeno-associated virus tracers and visualized using serial two-photon tomography. The primary data consist of high-resolution images that capture the axonal projections from anatomic regions throughout the

Results and discussion

In this section, we report the results of brain connectivity prediction using ensemble methods. For each prediction task, we apply five-fold cross-validation and use the area under the ROC curve (AUC) as the performance measure (Kaufman et al., 2006, Wolf et al., 2011). In this procedure, the samples are split into five (approximately) equally sized subsets. Four subsets are used to train a model, and the fifth subset is used for performance evaluation. This process is iterated five times so
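The evaluation protocol named above, five-fold cross-validation scored by AUC, can be sketched as follows. This is a generic illustration rather than the authors' code: the `fit` and `predict` callables are hypothetical placeholders for any model, and averaging the per-fold AUC values is one common convention that is assumed here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average AUC over the k folds."""
    fold_aucs = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(model, X[i]) for i in held_out]
        fold_aucs.append(auc([y[i] for i in held_out], scores))
    return sum(fold_aucs) / k
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive-negative pairs are ordered correctly.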

Conclusions

In this work, we investigate the relationship between gene expression and structure-level connectivity in the mouse brain. We employ two types of ensemble models, i.e., ensemble of trees and ensemble of sparse models, for predicting brain connectivity using gene expression data. Our results show that gene expression is predictive of connectivity in the mouse brain when the connectivity signals are discretized. In addition, we show that the expression data for a small number of genes can achieve

Acknowledgments

We thank the Allen Institute for Brain Science for making the Allen Brain Atlas data available. We thank Chinh Dang, David Feng, Leon French, Terri Gilbert, Chen Goldberg, Michael Hawrylycz, Nathan Manor, Luis Puelles, Carol Thompson, and Lior Wolf for assistance in interpreting the data and results. This work is supported by the National Science Foundation grant DBI-1147134 and by the Old Dominion University Office of Research.

References (70)

  • J. Ye et al. Sparse learning and stability selection for predicting MCI to AD conversion using baseline ADNI data. BMC Neurol. (2012)

  • L. Yuan et al. Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage (2012)

  • D. Zhang et al. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. Neuroimage (2012)

  • X. Zheng et al. Learning functional structure from fMR images. Neuroimage (2006)

  • Allen Institute for Brain Science. Allen Brain Atlas API

  • Allen Institute for Brain Science. Allen Mouse Brain Atlas [Internet]

  • Allen Institute for Brain Science. Allen Mouse Brain Atlas: Technical White Paper: Informatics Data Processing

  • Allen Institute for Brain Science. Allen Mouse Brain Connectivity Atlas [Internet]

  • Allen Institute for Brain Science. Allen Mouse Brain Connectivity Atlas: Technical White Paper: Informatics Data Processing

  • A. Altmann et al. Permutation importance: a corrected feature importance measure. Bioinformatics (2010)

  • Y. Amit et al. Shape quantization and recognition with randomized trees. Neural Comput. (1997)

  • M. Ashburner et al. Gene Ontology: tool for the unification of biology. Nat. Genet. (2000)

  • O. Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. (2008)

  • L. Baruch et al. Using expression profiles of Caenorhabditis elegans neurons to identify genes that mediate synaptic connectivity. PLoS Comput. Biol. (2008)

  • M.S. Boguski et al. Neurogenomics: at the intersection of neurobiology and genome sciences. Nat. Neurosci. (2004)

  • M. Bota et al. BAMS neuroanatomical ontology: design and implementation. Front. Neuroinformatics (2008)

  • M. Bota et al. From gene networks to brain networks. Nat. Neurosci. (2003)

  • S. Boyd et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. (2011)

  • E.I. Boyle et al. GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics (2004)

  • L. Breiman. Random forests. Mach. Learn. (2001)

  • P. Bühlmann. Bagging, boosting and ensemble methods

  • C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. (1998)

  • J.D. Cahoy et al. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J. Neurosci. (2008)

  • J.P. Carson et al. A digital atlas to characterize the mouse brain transcriptome. PLoS Comput. Biol. (2005)

  • R. Caruana et al. An empirical comparison of supervised learning algorithms
