12 March 2021 | Brief Communication | Issue 5/2021 | Open Access

Cognitive Neurodynamics 5/2021

The database with more than 1000 robust human connectomes in five resolutions

Bálint Varga, Vince Grolmusz
Important notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s11571-021-09670-5.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Connectomes, or braingraphs, are compact and focused derivatives of diffusion magnetic resonance images (MRIs) of the brain: their vertices are labeled by anatomical areas, and two vertices are connected by a weighted graph edge if a tractography workflow (Besson et al. 2014) finds neural tracts between the areas corresponding to the vertices. By focusing on the connections between cerebral areas instead of analyzing the whole MR image, we can make use of the rich and refined resources of graph theory, which was born with Leonhard Euler's famous article on the problem of the Königsberg bridges (Euler 1741).
Our research group has previously prepared several undirected and directed braingraph sets (Kerepesi et al. 2016, 2017; Szalkai et al. 2015a, 2017a, 2019a) from the 500 Subjects Data Release (McNab et al. 2013) of the Human Connectome Project (HCP). The resulting graphs were made available at the site https://braingraph.org, and were applied in several structural studies of the human brain (Szalkai et al. 2015b; Kerepesi et al. 2018a; Szalkai et al. 2019a; Kerepesi et al. 2018b; Szalkai et al. 2019b, 2018, 2017b, 2016; Fellner et al. 2019, 2020a, 2020b).
In the present contribution we describe a new braingraph set, computed from the 1200 Subjects Data Release of the Human Connectome Project (McNab et al. 2013). The set contains 1064 connectomes, each in five resolutions, and each edge is weighted by three different weight functions. Our dataset may serve as a robust resource for the computational neuroscience community in the coming years.


Methods

The data source of the workflow is the 1200 Subjects Data Release of the Human Connectome Project (HCP) (McNab et al. 2013), documented at the site https://www.humanconnectome.org/study/hcp-young-adult/document/1200-subjects-data-release. The present study used the “re-preprocessed” 3T diffusion data, as detailed at the HCP site.
The Connectome Mapper Tool Kit (CMTK) workflow (Daducci et al. 2012) was used for the graph computation on the HCP data. For each subject, we applied the segmentation and parcellation steps only once, but ran the probabilistic tractography part of the workflow 10 times. The parcellation scheme was the Lausanne2008 atlas; the labels applied are listed at https://github.com/LTS5/cmp_nipype/blob/master/cmtklib/data/parcellation/lausanne2008/ParcellationLausanne2008.xls.
The graph construction was performed in the following steps:
1. For each subject the MRtrix 0.3 tractography algorithm (Tournier et al. 2012) was run, with probabilistic seeding and probabilistic tractography. The number of streamlines was set to 1 million. To define the graph edges, consider two distinct, anatomically labeled areas of the cortical or sub-cortical gray matter of the brain, denoted by A and B. If the tractography algorithm found at least one streamline between areas A and B, then vertex a, representing area A, was connected to vertex b, representing area B, by a graph edge. The three weights of \(\{a,b\}\) give the number of streamlines (fibers) found between areas A and B, the average length of the streamlines, and the mean fractional anisotropy of the streamlines.
2. Step 1 was repeated 10 times for each subject. We accepted \(\{a,b\}\) as an edge of the connectome of the subject only if it was present in all ten graphs computed in the repetitions. Next, for each edge we determined the maximum and the minimum of the ten fiber numbers defining that edge, and deleted those two extremal values. Consequently, 8 fiber numbers remained for each edge. The three weights of the edge are the mean of those fiber numbers, the mean of the streamline lengths, and the mean of the fractional anisotropies.
   In other words, the probabilistic tractography was performed 10 times and a graph was constructed after each run (i.e., 10 graphs were constructed for each subject); only the edges present in all 10 graphs were included in the resulting graph, and for each such edge the two extremal fiber numbers were deleted and the remaining 8 values were averaged.
3. Steps 1 and 2 were performed only in the highest (i.e., the finest) resolution, with 1015 vertices. For the lower resolutions, the graphs were computed from the 1015-vertex graph by contracting vertices: the multiple edges that arise between two contracted vertices were merged into a single edge, and their fiber numbers were summed.
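The consensus rule described above (keep an edge only if every run found it, drop the largest and the smallest fiber number, average the rest) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the dictionary representation of a run and the function name `consensus_fiber_counts` are assumptions made here, and only the fiber-number weight is handled.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]      # undirected edge stored as (u, v) with u < v
Run = Dict[Edge, float]     # edge -> fiber (streamline) number in one tractography run


def consensus_fiber_counts(runs: List[Run]) -> Dict[Edge, float]:
    """Keep edges present in every run; average their fiber numbers
    after discarding one maximum and one minimum value per edge."""
    if len(runs) < 3:
        raise ValueError("at least 3 runs are needed to drop both extremes")

    # Edges that were found in all runs.
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)

    result: Dict[Edge, float] = {}
    for edge in common:
        counts = sorted(run[edge] for run in runs)
        trimmed = counts[1:-1]              # drop the minimum and the maximum
        result[edge] = sum(trimmed) / len(trimmed)
    return result


if __name__ == "__main__":
    # Toy input (hypothetical numbers): ten runs, edge (1, 2) found every time,
    # edge (2, 3) missed by one run and therefore rejected.
    runs = [{(1, 2): float(c), (2, 3): 4.0} for c in range(1, 11)]
    del runs[3][(2, 3)]
    print(consensus_fiber_counts(runs))
```

With ten runs, the trimmed list holds eight values per edge, which is why the resulting fiber numbers are generally not integers.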
On the choice of 10 as the repetition number of the probabilistic tractography we refer to the detailed analysis in the “ Discussion and results” section below.
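The vertex contraction used for the lower resolutions can be illustrated with a short Python sketch. It assumes a hypothetical mapping `parent` from each fine-resolution vertex to the coarse vertex that contains it; the source does not say how edges inside one merged area are treated, so dropping them here is an assumption, and only the fiber-number weight is handled.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[int, int]


def contract(fibers: Dict[Edge, float], parent: Dict[int, int]) -> Dict[Edge, float]:
    """Merge fine vertices into coarse ones and sum the fiber numbers
    of the parallel edges that arise between two coarse vertices."""
    coarse: Dict[Edge, float] = defaultdict(float)
    for (u, v), count in fibers.items():
        cu, cv = parent[u], parent[v]
        if cu == cv:
            continue                        # assumption: intra-area edges are dropped
        key = (min(cu, cv), max(cu, cv))    # canonical form of an undirected edge
        coarse[key] += count
    return dict(coarse)


if __name__ == "__main__":
    # Toy example: fine vertices 1 and 2 are merged into coarse vertex 10.
    fine = {(1, 3): 2.0, (2, 3): 3.5, (1, 2): 7.0}
    mapping = {1: 10, 2: 10, 3: 20}
    print(contract(fine, mapping))
```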
From the dataset of the HCP website we were able to finish the graph computations for 1064 subjects.
The computation was done on our 24-member Intel i7 cluster (each with 6 physical and 12 virtual CPU cores and 16 GB of RAM) within 3 weeks running time.

Data records

The data source of this work was published at the Human Connectome Project’s website at http://www.humanconnectome.org/ (McNab et al. 2013) as the 1200 Subjects Public Release. The parcellation data, containing the anatomically labeled ROIs, is listed in the CMTK nipype GitHub repository https://github.com/LTS5/cmp_nipype/blob/master/cmtklib/data/parcellation/lausanne2008/ParcellationLausanne2008.xls.
The braingraphs computed by us can be accessed at the https://braingraph.org/cms/download-pit-group-connectomes/ site, by selecting one of the download options denoted by “X nodes set, 1064 brains, 1 000 000 streamlines, 10x repeated”, where \(X=86, 129, 234, 463, 1015\).
The graphs are given in GraphML format, described at https://cmtk.org (Daducci et al. 2012). Each file begins with an attribute definition section, then the nodes are described with their coordinates and anatomical labels, corresponding to the parcellation at https://github.com/LTS5/cmp_nipype/blob/master/cmtklib/data/parcellation/lausanne2008/ParcellationLausanne2008.xls.
Next, the (undirected) edges are listed. The edges carry three weights:
  • The number of fibers;
  • The mean value of the fiber lengths in the edge;
  • The mean fractional anisotropy of the fibers.
Note that the edge weights are averages over eight of the ten tractography runs; therefore, even the fiber number is typically a non-integer.
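As an illustration of the file layout described above, the following sketch reads the edge weights of a GraphML document using only the Python standard library. The embedded sample and the attribute names (`number_of_fibers`, `fiber_length_mean`, `FA_mean`) are placeholders assumed for this example; the names actually used should be taken from the attribute definition section of a downloaded file.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny hand-written sample; attribute names and values are hypothetical.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="number_of_fibers" attr.type="double"/>
  <key id="d1" for="edge" attr.name="fiber_length_mean" attr.type="double"/>
  <key id="d2" for="edge" attr.name="FA_mean" attr.type="double"/>
  <graph edgedefault="undirected">
    <node id="1"/>
    <node id="2"/>
    <edge source="1" target="2">
      <data key="d0">12.625</data>
      <data key="d1">31.4</data>
      <data key="d2">0.41</data>
    </edge>
  </graph>
</graphml>
"""


def read_edges(graphml_text):
    """Return a list of (source, target, {attribute name: value}) tuples."""
    root = ET.fromstring(graphml_text)
    # Map key ids (e.g. "d0") to the attribute names declared for edges.
    names = {k.get("id"): k.get("attr.name")
             for k in root.findall("g:key", NS) if k.get("for") == "edge"}
    edges = []
    for e in root.findall("g:graph/g:edge", NS):
        weights = {names[d.get("key")]: float(d.text)
                   for d in e.findall("g:data", NS) if d.get("key") in names}
        edges.append((e.get("source"), e.get("target"), weights))
    return edges


if __name__ == "__main__":
    for source, target, weights in read_edges(SAMPLE):
        print(source, target, weights)
```

For a downloaded file, the same function could be applied to the file contents, for example `read_edges(open(path, encoding="utf-8").read())`, where `path` is a local file name.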

Discussion and results

Here we describe the workflow that led to the choice of 10 repetitions of step 1 in the graph construction above. We note that the present section describes only the process that resulted in the specific choice of the repetition number 10, and not the actual graph construction (which was already described in the “Methods” section).
Implementations of deterministic tractography algorithms also contain a probabilistic seeding step; i.e., two runs of these tractography computations almost always yield different results. When we use probabilistic tractography (Girard et al. 2014; Buchanan et al. 2014), it is evident that distinct runs yield different results.
For generating reproducible results in the graph construction with a probabilistic tractography phase, it is a natural idea to repeat the probabilistic tractography algorithm for the very same input several times, and to average the results of the tractography in a careful way.
Let us fix two vertices, and let the random variable X denote the number of fibers discovered between them. Then, clearly, for any X: \(E(X-E(X))=E(X)-E(X)=0\), that is, the expectation of the difference of X from its expected value \(E(X)\) is 0. The deviations of the individual runs from the expected value therefore cancel on average, which suggests that repeating the tractography and averaging the results will increase the reliability of the tractography results.
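This argument can be made quantitative under an additional assumption that is not stated in the text: that the k runs \(X_1,\ldots ,X_k\) are independent and identically distributed with finite variance \(\sigma ^2\). Writing \(\bar{X}_k\) for the mean of the k runs, the expectation is unchanged while the variance shrinks by a factor of k:
$$\begin{aligned} E(\bar{X}_k)=E(X), \qquad \mathrm {Var}(\bar{X}_k)=\mathrm {Var}\left( {1\over k}\sum _{j=1}^{k}X_j\right) ={1\over k^2}\sum _{j=1}^{k}\mathrm {Var}(X_j)={\sigma ^2\over k}. \end{aligned}$$
Under this assumption the standard deviation of the averaged fiber number decreases like \(1/\sqrt{k}\), so each additional repetition yields a smaller gain than the previous one.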
To determine the number of repetitions k as a trade-off between practical computability and robustness, we followed the strategy described below. In short, we determined the number of necessary repetitions by comparing the deviations of 10 average values, each computed from k repetitions, for \(k=1,2,\ldots ,50\).
More exactly, we chose 9 subjects: for each non-zero leading digit of the ID numbers, one subject was chosen randomly (the choices were: 136631, 200008, 300618, 401422, 500222, 601127, 700634, 800941, 901038). For a given subject and a given positive integer k, we generated the following ten braingraphs:
$$\begin{aligned} {G_k}_1, {G_k}_2, \ldots {G_k}_{10}, \end{aligned}$$
where \({G_k}_i\) was calculated by k repetitions of the tractography phase, and averaging the numbers of fibers for each edge on the k runs.
For \(i=1,2,\ldots ,10\), we generated k independent instances and averaged the k fiber numbers for each edge. Next, we discarded those edges that were not present in all ten copies of the averaged graphs. Then, for each remaining edge \(\{u,v\}\) of the graph G, we computed the average fiber number over the k repetitions: one average value \(w^{(k)}_i(u,v)\) for each \({G_k}_i\), \(i=1,2,\ldots ,10\). For readability, we omit \((u,v)\) from \(w^{(k)}_i(u,v)\) in what follows.
For these ten \(w^{(k)}_i\) values we computed the relative standard deviation (also called the coefficient of variation):
$$\begin{aligned} c_v(w^{(k)})={\sigma (w^{(k)})\over \mu (w^{(k)})}, \end{aligned}$$
where
$$\begin{aligned} \mu (w^{(k)})={ 1\over 10}\sum _{i=1}^{10}w^{(k)}_i, \ \ \sigma (w^{(k)})=\sqrt{{1\over 9}\sum _{i=1}^{10} (w^{(k)}_i-\mu (w^{(k)}))^2} \end{aligned}$$
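The relative standard deviation defined above can be computed per edge with a few lines of Python. This is a sketch under assumptions: each averaged graph is represented as a dictionary from edges to average fiber numbers, and the function names are chosen here for illustration. `statistics.stdev` uses the \(n-1\) denominator, which matches the factor \(1/9\) for ten values.

```python
import statistics
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation (n - 1 denominator) divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def edge_cv(graphs: List[Dict[Edge, float]]) -> Dict[Edge, float]:
    """Coefficient of variation of the fiber number for every edge that is
    present in all of the given averaged graphs; other edges are discarded."""
    common = set(graphs[0]).intersection(*graphs[1:])
    return {edge: coefficient_of_variation([g[edge] for g in graphs])
            for edge in common}


if __name__ == "__main__":
    # Toy input with three graphs instead of ten (hypothetical numbers).
    graphs = [{(1, 2): 1.0, (3, 4): 5.0},
              {(1, 2): 2.0},
              {(1, 2): 3.0, (3, 4): 5.0}]
    print(edge_cv(graphs))
```

Averaging the returned values over all edges gives one number per k, which corresponds to the kind of summary that is plotted as a function of k.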
Figure 1 displays the change of the relative standard deviation of the fiber number of a given edge (the edge connecting vertex 19 and vertex 21 in the 463-vertex resolution, in the case of subject No. 901038) for \(k=1,2,\ldots ,50\).
Figure 2 shows the change of the relative standard deviations, averaged over all edges, as a function of k, in the case of a given braingraph in the 234-vertex resolution. Supporting Figures 1, 2, 3 and 4 show the same for graphs of different resolutions.
Based on the visual examination of Figure 2 (and the related figures for other resolutions and subjects, cf. Supporting Figs. 1, 2, 3 and 4), we have chosen the value \(k=10\) for the number of repetitions as a good trade-off between deviation and practical computability: for repetitions \(k>10\) the decrease of the red horizontal lines, showing the median relative standard deviations, is very small in Fig. 2 and Supporting Figs. 1 and 2, and still small in Supporting Figs. 3 and 4.


Data were provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. VG and BV were partially supported by the VEKOP-2.3.2-16-2017-00014 program, supported by the European Union and the State of Hungary, co-financed by the European Regional Development Fund, and the NKFI-127909 grant of the National Research, Development and Innovation Office of Hungary. VG and BV were supported in part by the EFOP-3.6.3-VEKOP-16-2017-00002 grant, supported by the European Union, co-financed by the European Social Fund.


Conflicts of interest

The authors declare no conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
