Published in: Vietnam Journal of Computer Science 3/2016

Open Access 01.08.2016 | Regular Paper

Ontology-based disease similarity network for disease gene prediction

Authors: Duc-Hau Le, Vu-Tung Dang


Abstract

Finding the molecular mechanisms underlying diseases is an important issue in biomedical research, and much of this work focuses on predicting novel disease-associated genes. Many methods based on biological networks have been proposed and shown to be effective for this problem. These network-based methods usually rely on the “disease module” principle, which states that functionally similar genes are associated with similar phenotypes or diseases. Among them, methods based solely on gene/protein networks exploit that principle only through structural modules in those networks. Methods that integrate these networks with a disease similarity network exploit the principle better and consequently achieve higher prediction performance. In these studies, the disease similarity network is extracted from a disease similarity matrix that was calculated using text mining techniques on OMIM records. However, diseases have recently been well annotated with the human phenotype ontology (a controlled vocabulary database), and semantic similarity measures can be used to calculate similarities among them. It should therefore be more accurate to construct the disease similarity network using semantic similarity measures on the phenotype ontology database. In this study, we constructed such a network and integrated it with several kinds of gene/protein networks. Experimental results show that the ontology-based disease similarity network considerably improves prediction performance compared with the one based on OMIM records, irrespective of the gene/protein network. In addition, we demonstrate the ability of our method to predict novel Alzheimer’s disease-associated genes: 19 of the top 100 ranked candidate genes are supported by evidence from the literature.
Notes

Electronic supplementary material

The online version of this article (doi:10.1007/s40595-016-0063-3) contains supplementary material, which is available to authorized users.

1 Introduction

Disease gene prediction, the task of identifying the most plausible candidate disease genes, is an important issue in biomedical research, and many studies have addressed it [1, 2]. Identifying disease-associated genes also leads to more effective research on therapies for genetic diseases and brings personalized medicine gradually closer [3–5]. In past decades, linkage analysis was usually used to identify novel disease genes. This approach investigates susceptible loci that include hundreds of genes, so it requires many costly wet-lab experiments. Ranking/prioritization methods for such candidate genes were therefore introduced, in which genes are ranked by their relevance to a disease of interest. Highly ranked genes are then investigated further to find supporting biomedical evidence. The goal of gene ranking/prioritization is thus to predict novel disease-associated genes.
The prediction of novel disease-associated genes is usually approached in three main ways: (1) functional annotation based; (2) machine learning based; and (3) network based. Functional annotation-based methods prioritize candidate genes by measuring how similar each candidate gene is to a set of known disease genes, using profiles built from many functional annotation data sources [6–8]. Those methods therefore mostly focus on integrating various biological datasets to obtain more accurate similarity. However, they are limited in that functional annotation data sources do not yet cover the whole human genome. In the second approach, many learning techniques have been applied to predict disease-associated genes. Here the problem is treated as classification: a classifier is learned from training data and then used to predict whether or not a test/candidate gene is a disease gene. Early machine learning-based studies usually approached disease gene prediction as a binary classification problem [9], in which the learning samples comprise positive and negative training samples [9], using techniques such as decision trees (DT) [10, 11], k-nearest neighbor (kNN) [12], naive Bayesian classifiers [13, 14], binary support vector machine classifiers [15–17], artificial neural networks (ANN) [18] and random forests (RF) [9]. In these binary classifier-based methods, positive training samples are constructed from known disease genes, whereas negative training samples are the remaining genes, which are not known to be associated with diseases. This is the limitation of binary classifier-based solutions for disease gene prediction: the negative training set should consist of actual non-disease genes, but constructing such a set is nearly impossible in biomedical research.
More advanced machine learning techniques, which do not require a negative training set to be defined, have therefore recently been introduced for this problem [19]. However, the problem was still formulated as classification, whereas it should be formulated as ranking/prioritization. Methods for predicting disease-associated genes have consequently been extended to network-based ones [20, 21], which have been shown to outperform functional annotation- and machine learning-based ones [22, 23]. These network-based methods mostly rely on biological networks constructed from various kinds of biomedical data, so they are not limited by the coverage of functional annotation data sources. In addition, they can be regarded as positive and unlabeled learning techniques, in which the rankings of candidate genes are estimated from their relative similarities to known disease genes and to other genes. The dominance of network-based methods also stems from the “disease module” principle on which they are based (i.e., functionally similar genes are associated with similar phenotypes or diseases). Among methods based solely on gene/protein networks, a method using the random walk with restart (RWR) algorithm [22, 24, 25] outperforms other methods such as nearest neighbor, shortest path and clustering [26]. This is because the algorithm calculates a global similarity between candidate and known disease genes over the whole network, so it considers not only genes directly connected to disease genes but also indirectly connected ones. The algorithm has been successfully applied to other problems such as the prediction of disease-associated miRNAs [27] and protein complexes [28]. However, this method can only exploit the “disease module” in the gene/protein network (i.e., genes/proteins associated with the same or similar diseases usually form functional/physical modules in gene/protein interaction networks [29–31]).
Recently, a variant of the RWR algorithm, namely RWRH, was proposed for heterogeneous networks. This algorithm was then applied to predict disease-associated genes on a heterogeneous network of proteins and disease phenotypes [32]. This network was constructed by integrating a protein interaction network with a disease similarity network derived by text mining algorithms on OMIM records [33]. It was reported that RWRH exploits the “disease module” principle better than RWR [22], because the OMIM-based disease similarity network was additionally integrated [32]. More importantly, the RWRH algorithm can be extended to use any gene/protein network as well as any disease similarity network. Indeed, a recent RWRH-based method used a semantic similarity network of genes instead of the protein interaction network [34] and was shown to outperform the original one [32]. We also note that a disease similarity network can be constructed based on shared disease genes [30], shared pathways [35], shared miRNAs [36], shared protein complexes [37], shared disease ontology [38] and disease comorbidity [39]. Like RWR, the RWRH algorithm has been successfully applied to other problems such as the prediction of novel drug–target interactions [40], novel disease-associated miRNAs [41] and long non-coding RNAs [42].
In this study, we extended the use of the RWRH algorithm for the prediction of disease-associated genes by integrating semantic similarities among diseases with a gene/protein network. More specifically, disease phenotypes have recently been annotated with the human phenotype ontology (hereafter HPO) [43], a controlled vocabulary database, and a number of semantic similarity measures have been proposed to calculate the similarity between annotated biomedical objects [44]. It should therefore be more accurate to calculate the similarity among diseases using such measures. We constructed a disease similarity network using a semantic similarity measure on HPO. This network was then integrated with a gene/protein network through known disease phenotype–gene associations. We compared our method with the one that relies on the OMIM-based disease similarity network, as in [32, 34]. The gene/protein network can be the protein interaction network as in [32], the gene semantic similarity network as in [34], or one constructed from gene expression profiles. Experimental results show that our method performs better than the one based on the OMIM-based disease similarity network, irrespective of the gene/protein network. This indicates that HPO-based calculation of disease similarity improves the performance of the RWRH algorithm for the prediction of disease-associated genes. In addition, we used our method to find novel genes associated with Alzheimer’s disease. A literature search for evidence of associations between the 100 highest-ranked candidate genes and Alzheimer’s disease confirmed 19 of them, none of which is yet recorded in the public disease–gene association database.
Table 1
Size of gene/protein networks and number of testing disease phenotypes for the corresponding heterogeneous networks

| # | Gene/protein network | Size (number of genes/proteins, number of interactions) | Number of testing disease phenotypes |
|---|---|---|---|
| 1 | PPINet | (10,486, 50,791) | 2639 |
| 2 | GENet | (9852, 49,404) | 2533 |
| 3 | GONet | (7897, 41,466) | 2345 |

2 Methods

2.1 Construction of heterogeneous networks of diseases and genes

To build heterogeneous networks of diseases and genes, we constructed two kinds of networks: (1) a gene/protein network, which connects genes/proteins by functional interactions; and (2) a disease similarity network, in which a link between two diseases is weighted by their similarity. We then connected these two networks through a bipartite network consisting of known disease–gene associations. Figure 1 shows the construction of such heterogeneous networks of genes/proteins and diseases.
Gene/protein networks
Protein–protein interaction network
First, we collected a human protein interaction network (hereafter PPINet) containing 10,486 genes and 50,791 interactions from the NCBI FTP repository.1 Proteins in this network are connected by physical interactions. Therefore, we treated PPINet as an unweighted network.
Gene expression-based similarity network
Second, we constructed a weighted gene network based on gene expression data (hereafter GENet). More specifically, a gene co-expression database comprising 19,777 human genes was downloaded from COXPRESSdb [45]. To measure the similarity between a pair of genes, we employed the mutual rank method, which evaluates the strength of co-expression [46]. The mutual rank ranges from 0 to 19,776, and the normalized value is \(w_{ij} =\frac{19{,}776-MR(v_i ,v_j )}{19{,}776}\), where \(MR(v_{i}, v_{j})\) denotes the mutual rank between genes \(v_{i}\) and \(v_{j}\). GENet was constructed by replacing the original weight of each link in PPINet with the normalized mutual rank value of the gene pair forming that link.
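As a small worked illustration of the normalization above, the following sketch maps a mutual rank to a link weight. The function name and the default gene count are illustrative choices, not part of the original pipeline.

```python
def normalized_mutual_rank(mr, n_genes=19_777):
    """Map a mutual rank in [0, n_genes - 1] to a weight in [0, 1].

    A smaller mutual rank (stronger co-expression) gives a larger weight.
    """
    max_rank = n_genes - 1  # 19,776 for the 19,777 genes quoted in the text
    if not 0 <= mr <= max_rank:
        raise ValueError("mutual rank out of range")
    return (max_rank - mr) / max_rank
```

Under this mapping, the most strongly co-expressed pair (mutual rank 0) receives weight 1 and the weakest pair (mutual rank 19,776) receives weight 0.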
Gene ontology-based similarity network
Third, we constructed another weighted gene network based on gene ontology data (shortly called GONet). To construct this network, we used the UniProtKB [47] corpus in the GO annotation database [48]. There were 18,245 Homo sapiens proteins in total. Among them, there were 15,576 proteins annotated with molecular function terms, 14,911 proteins annotated with biological process terms, and 16,983 proteins annotated with cellular component terms. Then, to construct the network, we first needed to introduce the information content (IC). The IC of a term e in the corpus is defined as follows:
$$\begin{aligned} \mathrm{IC}(e)= -\mathrm{log}\left( p(e)\right) , \end{aligned}$$
where p(e) is the probability of e occurring in the corpus, i.e., \(p(e)=\frac{f(e)}{f(\mathrm{root})}\) with \(f(e)=\mathrm{Annot}(e)+\sum \nolimits _{c \in \mathrm{Children}(e)} {f(c)} \). In this formula, Annot(e) is the number of proteins annotated with e in the corpus, Children(e) is the set of child terms of e in the GO graph and root is the root term of the GO graph. The semantic similarity between two GO terms, \(e_{i}\) and \(e_{j}\), based on the most informative common ancestor approach [49], is then calculated as follows:
$$\begin{aligned} \mathrm{simTerm}(e_i ,e_j )=\mathop {\max }\limits _{c\in P(e_i ,e_j )} (\mathrm{IC}(c)), \end{aligned}$$
where \(P(e_{i}, e_{j})\) is the set of shared ancestors of \(e_{i}\) and \(e_{j}\). The functional similarity between a pair of genes \(v_{i}\) and \( v_{j}\) is calculated as the maximum of simTerm values between all possible pairs of terms as follows:
$$\begin{aligned} \mathrm{simGene}(v_i ,v_j )=\mathop {\max }\limits _{e_i \in T(v_i ),\;e_j \in T(v_j )} \left( \mathrm{simTerm}(e_i ,e_j )\right) , \end{aligned}$$
where T(v) represents the set of terms annotating v. This value is normalized in range [0, 1] to account for an unequal number of GO terms for both genes as follows:
$$\begin{aligned} w_{ij} =\frac{2\times \mathrm{simGene}(v_i ,v_j )}{\mathrm{simGene}(v_i ,v_i )+ \mathrm{simGene}(v_j ,v_j )}. \end{aligned}$$
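The four formulas above can be traced on a toy ontology. The sketch below is only an illustration under stated assumptions: the ontology `PARENTS` and the annotation counts `ANNOT` are hypothetical, the ontology is a tree (so the recursive frequency does not count a term twice), and a term is treated as its own ancestor so that \(\mathrm{simGene}(v, v)\) is well defined.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Hypothetical number of objects annotated directly with each term.
ANNOT = {"root": 0, "A": 1, "B": 4, "A1": 2, "A2": 1}


def children(term):
    return [c for c, parents in PARENTS.items() if term in parents]


def freq(term):
    """f(e) = Annot(e) + sum of f(c) over the children c of e."""
    return ANNOT[term] + sum(freq(c) for c in children(term))


def ic(term):
    """IC(e) = -log(f(e) / f(root))."""
    return -math.log(freq(term) / freq("root"))


def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    seen = {term}
    for parent in PARENTS[term]:
        seen |= ancestors(parent)
    return seen


def sim_term(e_i, e_j):
    """IC of the most informative common ancestor."""
    return max(ic(c) for c in ancestors(e_i) & ancestors(e_j))


def sim_gene(terms_i, terms_j):
    """Maximum simTerm over all pairs of annotating terms."""
    return max(sim_term(a, b) for a in terms_i for b in terms_j)


def weight(terms_i, terms_j):
    """Normalized similarity w_ij in [0, 1]."""
    denom = sim_gene(terms_i, terms_i) + sim_gene(terms_j, terms_j)
    return 2 * sim_gene(terms_i, terms_j) / denom if denom > 0 else 0.0
```

In this toy example, `weight(["A1"], ["A2"])` evaluates to 0.4: the most informative common ancestor is A with IC = log 2, while the self-similarities are log 4 and log 8.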
By employing the sub-ontology databases of biological process, cellular component and molecular function individually (i.e., the root terms of these gene sub-ontology graphs are biological process, cellular component and molecular function, respectively), three GO-based weighted networks were constructed, in which the original weight of each link in PPINet was replaced by the normalized similarity value \(w_{ij}\) of the two genes forming that link. We refer to these as the BPNet, CCNet and MFNet networks, respectively. Finally, we integrated them using the “per-edge average” method to construct the GONet network as follows:
$$\begin{aligned} \bar{w}_{ij} =\frac{1}{M}\mathop \sum \limits _{k=1}^M (w_{ij} )_k \end{aligned}$$
where M is the number of networks containing an interaction between genes \(v_{i}\) and \(v_{j}\), and \((w_{ij} )_k\) is the weight of the interaction between \(v_{i}\) and \(v_{j}\) in network k.
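The per-edge average can be sketched as follows. The representation of each network as a dictionary from gene pairs to weights, and the function name, are assumptions made for illustration only.

```python
from collections import defaultdict


def per_edge_average(networks):
    """Average each edge weight over the networks that contain that edge.

    Each network is a dict mapping a pair (u, v) to a weight. An edge missing
    from a network does not contribute a zero; it is simply skipped, so M in
    the formula is the number of networks in which the edge is present.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for net in networks:
        for (u, v), w in net.items():
            key = tuple(sorted((u, v)))  # treat edges as undirected
            totals[key] += w
            counts[key] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}
```

For example, an edge with weights 0.8, 0.6 and 0.4 in three networks receives 0.6, while an edge present in only two networks with weights 0.4 and 0.6 receives 0.5.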
After selecting the largest connected component of each network, we obtained the final PPINet, GENet and GONet networks, whose sizes are shown in Table 1.

2.2 Disease similarity networks

OMIM-based disease similarity network
First, following the same procedure as in [32, 34], we collected a phenotypic disease similarity matrix from [50], in which each element represents the degree of similarity between two phenotypes. The similarities in this matrix were calculated using various text mining algorithms on OMIM records, which describe diseases in natural language [33]. By selecting, for each node, only the five neighbors with the largest similarities, we constructed a phenotypic disease similarity network (hereafter OMIMNet) consisting of 19,791 interactions among 5080 phenotypes.
HPO-based disease similarity network
Second, to construct another disease similarity network, we calculated the similarity among disease phenotypes based on the human phenotype ontology (HPO, a controlled vocabulary database) [43], whose ontology graph has the root term All. More specifically, we collected HPO terms and the corresponding annotation data from the Human Phenotype Ontology database2 [43]. We then followed the same procedure as for the gene ontology-based similarity networks to calculate the similarity between every pair of disease phenotypes. Similarly, by selecting, for each node, only the five neighbors with the largest similarities, we constructed an HPO-based disease similarity network (hereafter HPONet) consisting of 34,476 interactions among 6521 phenotypes.
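The neighbor selection step used for both OMIMNet and HPONet can be sketched as below. This is an illustration rather than the original code: it assumes a dense similarity matrix and assumes that the selected links are merged into one undirected edge set, which the text does not state explicitly.

```python
import numpy as np


def top_k_neighbor_edges(sim, k=5):
    """Link each node to its k most similar other nodes.

    Returns a dict mapping an undirected edge (i, j) with i < j to its
    similarity. An edge chosen from both ends is stored only once.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    k = min(k, n - 1)  # a node cannot have more than n - 1 neighbors
    edges = {}
    for i in range(n):
        scores = sim[i].copy()
        scores[i] = -np.inf  # never pick the node itself
        for j in np.argsort(-scores)[:k]:
            j = int(j)
            edges[(min(i, j), max(i, j))] = float(sim[i, j])
    return edges
```

Because two nodes can select each other, the resulting number of edges is at most k times the number of nodes and is usually smaller.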

2.3 A bipartite network

The bipartite network consists of known disease–gene associations collected from the NCBI FTP repository.3 It connects a total of 3284 diseases and 2761 genes.

2.4 RWRH-based method

Consider a connected weighted graph G(V, E) with a set of nodes \(V=\{v_{1}, v_{2}, {\ldots }, v_{N}\}\), a set of links \(E=\{(v_{i}, v_{j})\vert v_{i}, v_{j}\in V\}\), a set of source/seed nodes \(S\subseteq V\) and an \(N\times N \) adjacency matrix W of link weights. Here, we introduce algorithms for measuring the relative importance of a node \(v_{i}\) to S. When a heterogeneous network of genes and diseases is modeled as a graph, ranking/prioritization of candidate genes/diseases amounts to predicting novel genes/diseases associated with a disease of interest (d). Candidate genes/diseases are ranked by their relative importance to d and to the set of known d-associated genes. This value also measures how strongly a candidate gene/disease is associated with d.

2.5 Random walk with restart (RWR) algorithm

Random walk with restart (RWR) is a variant of the random walk and it mimics a walker that moves from a current node to a randomly selected adjacent node or goes back to source nodes with a back-probability \(\gamma \in \) (0, 1). RWR can be formally described as follows:
$$\begin{aligned} P^{t+1}=( {1-\gamma }){W^{'}}P^t+\gamma P^0, \end{aligned}$$
where \(P^t\) is an \(N \times 1\) probability vector over the \(\vert V\vert \) nodes at time step t, whose ith element is the probability of the walker being at node \(v_{i}\in V\), and \(P^0\) is the \(N\times 1\) initial probability vector. \({W^{'}}\) is the transition matrix of the graph; its (i, j) element denotes the probability with which a walker at \(v_{i}\) moves to \(v_{j}\) among \(V\backslash \{v_{i}\}\). All nodes in the network are eventually ranked according to the steady-state probability vector \(P^\infty \). The steady-state probability of each node represents its relative importance to the set of source nodes S.
This algorithm was used for disease gene prediction on a homogeneous network of genes/proteins [22, 24]. In that setting, the transition matrix \({W^{'}}\) is defined as follows:
$$\begin{aligned} ({W^{'}})_{ij} =\frac{(W_\mathrm{G} )_{ij} }{\mathop \sum \nolimits _j (W_\mathrm{G})_{ij} }, \end{aligned}$$
where \(W_\mathrm{G}\) is adjacency matrix of the network of genes/proteins.
In addition, the set of source nodes (S) was specified by genes known to be associated with d. Therefore, the initial probability vector was defined as follows:
$$\begin{aligned} P^0=\left\{ {{\begin{array}{ll} {\frac{1}{\left| S \right| } \qquad \quad \mathrm{if} \, v_i \in S} \\ {0 \qquad \qquad \mathrm{otherwise}.} \\ \end{array} }} \right. \end{aligned}$$
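The RWR iteration can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions, not the original implementation: the function name, the convergence tolerance and the iteration cap are illustrative, and every node is assumed to have at least one link. Because \({W^{'}}\) is row-normalized here (row i holds the probabilities of leaving \(v_{i}\)), the update on the column vector uses its transpose.

```python
import numpy as np


def rwr(W, seeds, gamma=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a graph with adjacency matrix W.

    seeds is a list of node indices forming S; gamma is the back-probability.
    Returns the (approximate) steady-state probability vector.
    """
    W = np.asarray(W, dtype=float)
    # Row-normalize: entry (i, j) is the probability of moving from i to j.
    W_prime = W / W.sum(axis=1, keepdims=True)
    p0 = np.zeros(W.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * W_prime.T @ p + gamma * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

On a four-node path graph 0-1-2-3 with node 0 as the only seed and \(\gamma = 0.5\), the steady-state probabilities decrease with distance from the seed (26/45, 14/45, 4/45 and 1/45), which illustrates how indirectly connected nodes still receive a nonzero score.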

2.6 Random walk with restart on heterogeneous network (RWRH) algorithm

This algorithm can be considered a variant of the RWR algorithm, since it uses the same update formula as RWR. The difference lies in the construction of the transition matrix \({W^{'}}\). More specifically, \({W^{'}}\) was defined as follows:
$$\begin{aligned} {W^{'}}=\left[ {{\begin{array}{l@{\quad }l} {W_\mathrm{G}^{'}} &{}{W_{\mathrm{GD}}^{'}} \\ {W_{\mathrm{DG}}^{'}} &{} {W_\mathrm{D}^{'}} \\ \end{array} }} \right] , \end{aligned}$$
where \(W_\mathrm{G}^{'}\) and \(W_\mathrm{D}^{'}\) are the intra-subnetwork transition matrices of the gene/protein network and the disease similarity network, respectively, and \(W_{\mathrm{GD}}^{'}\) and \(W_{\mathrm{DG}}^{'}\) are the inter-subnetwork transition matrices. Let \(\lambda \) be the jumping probability, i.e., the probability that the random walker jumps from the gene/protein network to the disease similarity network or vice versa. Then, these matrices were defined as follows:
$$\begin{aligned} (W_{\mathrm{GD}}^{'} )_{i,j} =p({d_j \vert g_i })=\left\{ {{\begin{array}{l@{\quad }l} {\frac{(\lambda {W_{\mathrm{GD}} })_{ij} }{\sum _\mathrm{j} ( {W_{\mathrm{GD}} })_{ij}} \qquad \mathrm{if}\, \sum _{j} ( {W_{\mathrm{GD}} })_{ij} \ne 0}\\ {0 \qquad \qquad \qquad \mathrm{otherwise,}} \\ \end{array} }} \right. \end{aligned}$$
$$\begin{aligned} (W_{\mathrm{DG}}^{'} )_{i,j} =p( {g_j \vert d_i })=\left\{ {{\begin{array}{l@{\quad }l} {\frac{\lambda ( {W_{\mathrm{GD}} })_{ji} }{\sum _\mathrm{j}( {W_{\mathrm{GD}} })_{ji}} \qquad \mathrm{if} \, \sum _{j} ( {W_{\mathrm{GD}} })_{ji} \ne 0} \\ {0 \qquad \qquad \qquad \mathrm{otherwise,}} \\ \end{array} }} \right. \end{aligned}$$
$$\begin{aligned} (W_\mathrm{G}^{'} )_{i,j} =\left\{ {{\begin{array}{l@{\quad }l} {\frac{( {W_\mathrm{G} })_{ij} }{\sum _\mathrm{j} ( {W_\mathrm{G} })_{ij}} \qquad \qquad \qquad \mathrm{if} \, \sum _j ( {W_{\mathrm{GD}} })_{ij} =0} \\ {\frac{(1-\lambda )( {W_\mathrm{G} })_{ij} }{\sum _j ( {W_\mathrm{G} })_{ij} } \qquad \qquad \quad \mathrm{otherwise,}} \\ \end{array} }} \right. \end{aligned}$$
$$\begin{aligned} (W_\mathrm{D}^{'} )_{i,j} =\left\{ {{\begin{array}{l@{\quad }l} {\frac{( {W_\mathrm{D} })_{ij} }{\sum _\mathrm{j} ( {W_\mathrm{D}})_{ij} } \qquad \qquad \qquad \mathrm{if} \,\sum _j ( {W_{\mathrm{GD}} })_{ji} =0}\\ {\frac{(1-\lambda )( {W_\mathrm{D} })_{ij} }{\sum _j ( {W_\mathrm{D} })_{ij} } \qquad \qquad \quad \mathrm{otherwise,}}\\ \end{array} }} \right. \end{aligned}$$
where \(W_\mathrm{D}\) and \(W_{\mathrm{GD}}\) are adjacency matrices of the disease similarity and the bipartite networks.
By letting \(\eta \) be the parameter to weight the importance of each network, the initial probability vector was defined as follows:
$$\begin{aligned} P^0=\left\{ {{\begin{array}{ll} {( {1-\eta })\frac{1}{\left| S \right| }} &{} {\mathrm{if}\, v_i \in S} \\ {\eta } &{} {\mathrm{if}\, v_i \equiv d}\\ {0} &{} {\mathrm{otherwise.}} \\ \end{array} }} \right. \end{aligned}$$
If we are interested in a disease class/group that contains a set of diseases (D), \(P^0\) was defined as follows:
$$\begin{aligned} P^0=\left\{ {{\begin{array}{ll} {( {1-\eta })\frac{1}{\left| S \right| } \qquad \mathrm{if}\, v_i \in S}\\ {\eta \frac{1}{\left| D \right| }\quad \qquad \qquad \mathrm{if}\, v_i \in D} \\ {0 \qquad \qquad \qquad \,\, \mathrm{otherwise.}}\\ \end{array} }} \right. \end{aligned}$$
For these two algorithms, all remaining genes in the networks, which are not known to be associated with d or D, were selected as candidates for ranking.
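The block transition matrix and the initial vector for RWRH can be assembled as in the following sketch. It is an illustration under assumptions rather than the original implementation: function names and the toy matrices are hypothetical, every gene and every disease is assumed to have at least one intra-subnetwork link, and, as for RWR, the update uses the transpose of the row-normalized matrix.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum; rows that sum to zero stay all zero."""
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums != 0)


def rwrh_transition(W_G, W_D, W_GD, lam=0.6):
    """Build the block transition matrix [[W_G', W_GD'], [W_DG', W_D']]."""
    W_G, W_D, W_GD = (np.asarray(M, dtype=float) for M in (W_G, W_D, W_GD))
    gene_linked = W_GD.sum(axis=1, keepdims=True) != 0  # gene has a disease link
    dis_linked = W_GD.sum(axis=0)[:, None] != 0         # disease has a gene link
    T_GD = lam * row_normalize(W_GD)
    T_DG = lam * row_normalize(W_GD.T)
    # Nodes with a cross-network link keep only (1 - lam) for intra-network moves.
    T_G = np.where(gene_linked, 1 - lam, 1.0) * row_normalize(W_G)
    T_D = np.where(dis_linked, 1 - lam, 1.0) * row_normalize(W_D)
    return np.block([[T_G, T_GD], [T_DG, T_D]])


def rwrh(W_G, W_D, W_GD, seed_genes, seed_diseases,
         gamma=0.5, lam=0.6, eta=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state scores for genes and for diseases."""
    T = rwrh_transition(W_G, W_D, W_GD, lam)
    n_g = np.asarray(W_G).shape[0]
    p0 = np.zeros(T.shape[0])
    p0[list(seed_genes)] = (1 - eta) / len(seed_genes)
    p0[[n_g + d for d in seed_diseases]] = eta / len(seed_diseases)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * T.T @ p + gamma * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p[:n_g], p[n_g:]


# Hypothetical toy network: 3 genes on a path, 2 linked diseases,
# gene 0 associated with disease 0 and gene 2 with disease 1.
W_G = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W_D = [[0, 1], [1, 0]]
W_GD = [[1, 0], [0, 0], [0, 1]]
gene_scores, disease_scores = rwrh(W_G, W_D, W_GD, seed_genes=[0], seed_diseases=[0])
```

With this construction each row of the block matrix sums to one, so the total probability is preserved at every step. Passing several disease indices in `seed_diseases` corresponds to the disease class/group case, where each disease in D receives \(\eta /\vert D\vert \).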

3 Results and discussion

3.1 Performance comparison

Our method is based on heterogeneous networks constructed by integrating the HPONet network with a gene/protein network. Three heterogeneous networks were therefore constructed for our method, i.e., HPONet-PPINet, HPONet-GENet and HPONet-GONet. Meanwhile, the heterogeneous networks in [32, 34] were OMIMNet-PPINet and OMIMNet-GONet, respectively. In addition to these five heterogeneous networks, we constructed OMIMNet-GENet for the comparison. To compare the performance of our method with that of the others, we used leave-one-out cross-validation (LOOCV) for each disease phenotype that is associated with at least one gene in the gene/protein networks. Because the gene/protein networks differ in size, the number of testing disease phenotypes differed slightly between heterogeneous networks, as shown in Table 1. Based on results of the RWRH algorithm for the prediction of disease-associated genes [32, 34] and disease-associated miRNAs [41], we set the back-probability (\(\gamma \)), jumping probability (\(\lambda \)) and subnetwork importance weight (\(\eta \)) to 0.5, 0.6 and 0.7, respectively. For each disease phenotype (d), in each round of LOOCV, we held out one known d-associated gene. The rest of the known d-associated genes and d were used as seed nodes. The held-out gene and the remaining genes in the homogeneous network, which were not known to be associated with d, were ranked by the methods. Then, we plotted the receiver operating characteristic (ROC) curve and calculated the area under the curve (AUC) to compare the performance of the methods. This curve represents the relationship between sensitivity and (1\(-\)specificity), where sensitivity refers to the percentage of known d-associated genes that were ranked above a particular threshold and specificity refers to the percentage of genes not known to be associated with d that were ranked below this threshold.
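The evaluation procedure can be sketched as follows. This is a hedged illustration: `rank_fn` is a hypothetical callable standing in for any of the ranking methods (it receives the seed genes and returns a score per gene), and pooling the scores of all LOOCV rounds before computing a single AUC is an assumption, since the exact aggregation is not spelled out above.

```python
def auc_from_scores(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv_pool(known_genes, candidates, rank_fn):
    """Hold out each known gene in turn and pool the resulting scores."""
    scores, labels = [], []
    for held_out in known_genes:
        seeds = [g for g in known_genes if g != held_out]
        result = rank_fn(seeds)  # hypothetical: dict mapping gene -> score
        scores.append(result[held_out])
        labels.append(True)
        for g in candidates:
            scores.append(result[g])
            labels.append(False)
    return scores, labels
```

For instance, scores 0.9, 0.8, 0.3 and 0.1 with the first and third items being held-out disease genes give an AUC of 0.75, because three of the four (positive, negative) pairs are ordered correctly.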
Figure 2 shows that our method (i.e., HPONet-PPINet, HPONet-GENet and HPONet-GONet) performed better than study [34] (i.e., OMIMNet-GONet), study [32] (i.e., OMIMNet-PPINet) and OMIMNet-GENet. In addition, the HPO-based heterogeneous networks performed comparably to one another (AUC values for HPONet-PPINet, HPONet-GENet and HPONet-GONet were 0.927, 0.926 and 0.926, respectively). Similarly, the OMIM-based heterogeneous networks performed comparably to one another (AUC values for OMIMNet-PPINet, OMIMNet-GENet and OMIMNet-GONet were 0.736, 0.73 and 0.71, respectively). These results indicate that, for the prediction of disease-associated genes, the HPO-based disease similarity network (i.e., HPONet) reflects functional relations among diseases better than the one based on text mining algorithms on OMIM records.
Table 2
Nineteen evidenced Alzheimer’s disease-associated genes in the top 100 ranked candidate genes

| Rank | Gene Entrez ID | Gene symbol | PubMed ID |
|---|---|---|---|
| 1 | 6622 | SNCA | 19022350, 21056999, 22836259, 23820587 |
| 2 | 348 | APOE | 11803456, 12000192, 12232782, 12498968, 12876259, 12960780, 14741429, 15165699, 15181247, 15184600, 15184629, 15455263, 16165272, 16796589, 17050040, 17089130, 17101827, 17374951, 17474819, 17524782, 17613540, 17659844, 17854398, 18058831, 18083276, 18205760, 18416843, 18505684, 18525129, 19116453, 19199875, 19339712, 19398704, 20198498, 20473139, 20479234, 20535486, 20538374, 21143177, 21283692, 21297273, 21297948, 21409287, 21556001, 21803501, 22016362, 22179327, 22269984, 22383234, 22502727, 22596266, 22712640, 22815080, 22899317, 23050006, 23183136, 23293020, 23571587, 23581910, 23627755, 23663404, 23668794, 23771217, 23948883, 24312462, 24388797, 24446209, 24473795, 24599963, 24603451 |
| 3 | 5621 | PRNP | 18349519, 19556894 |
| 9 | 1312 | COMT | 15488308, 22483294, 23034259, 24477323 |
| 21 | 4137 | MAPT | 15848182, 16165272, 16182262, 17920160, 18431250, 18431254, 18586097, 18806919, 19153649, 19523877, 19524111, 19560101, 20473135, 20678074, 21342022, 21348938, 21442128, 21489990, 23554879, 23597931, 25378699 |
| 24 | 7329 | UBE2I | 19765634 |
| 28 | 1508 | CTSB | 23024364 |
| 29 | 5663 | PSEN1 | 12668610, 15159497, 15622541, 17229472, 17594345, 18028191, 18479822, 18525293, 19667325, 19796846, 22133015, 23850332 |
| 34 | 627 | BDNF | 12192623, 15838855, 15935057, 16054753, 19088493, 19522715, 22212405, 22364688, 24334212 |
| 37 | 5054 | SERPINE1 | 19604604 |
| 38 | 5327 | PLAT | 22027013 |
| 41 | 4035 | LRP1 | 15048651, 18706476, 22027013 |
| 42 | 5329 | PLAUR | 11814408 |
| 50 | 1815 | DRD4 | 23034259 |
| 53 | 7345 | UCHL1 | 16626667, 22660851, 22726800 |
| 73 | 5071 | PARK2 | 19716418 |
| 83 | 6667 | SP1 | 16378688, 23435408 |
| 94 | 5340 | PLG | 22027013 |
| 95 | 3952 | LEP | 21633502 |

3.2 Case study: Alzheimer’s disease

In this experiment, we tried to predict novel genes associated with Alzheimer’s disease (hereafter AD; MIM ID 104300). AD is a multi-factorial and fatal neurodegenerative disorder for which the mechanisms leading to profound neuronal loss are incompletely understood. Sixteen genes are known to be associated with AD [33]; however, only eleven of them are present in the gene/protein networks. To predict novel genes associated with this disease, we selected the heterogeneous network comprising HPONet and GENet. We then used these eleven genes and the MIM ID of AD as source nodes, and the other genes in the homogeneous network as candidates. After all candidate genes were ranked, we selected the 100 highest-ranked candidates and searched the literature on PubMed, using the Entrez Programming Utilities [51], for evidence of their association with AD. Table 2 shows the 19 candidate genes for which evidence was found. For instance, study [52] (PubMed ID: 16378688) showed that SP1 deposition in hyper-phosphorylated tau deposits may have functional consequences in the pathology of AD. In addition, it was suggested that UBE2I polymorphisms might be associated with a risk of AD [53] (PubMed ID: 19765634). Also, low protein levels of UCHL1 are associated with high protein levels of BACE1 in sporadic AD brains [54] (PubMed ID: 22726800). Finally, enhancing CTSB activity could lower Abeta, especially Abeta42, in AD patients with or without familial mutations [55] (PubMed ID: 23024364). The other genes in the top 100, for which no evidence has yet been found, can be good candidates for further investigation by biologists (see Online Resource 1).

4 Conclusions

Previous studies reported that disease similarity improves the prediction of novel disease-associated genes, since it better exploits the “disease module” principle. Accordingly, methods on a heterogeneous network comprising a disease similarity network and a gene/protein network are superior to those based solely on the gene/protein network. However, the construction of the disease similarity network in previous studies is limited, since they mostly relied on an out-of-date disease similarity matrix that was constructed using text mining algorithms on OMIM records. Now that the human phenotype ontology is available and annotates disease phenotypes well, disease similarity can be calculated semantically from this controlled vocabulary using semantic similarity measures. Therefore, in this study, instead of using the OMIM-based disease similarity network, we constructed an HPO-based one using a semantic similarity measure. Using the random walk with restart algorithm on a heterogeneous network, we compared the performance of the heterogeneous network built with our method against that built with the OMIM-based disease similarity network. Experimental results show that our method performs better irrespective of the gene/protein network. This indicates that the HPO-based disease similarity network exposes functional similarities among diseases better than the OMIM-based one. A case study on Alzheimer’s disease was carried out to show the ability of our method to predict novel disease-associated genes. We also note that many other semantic similarity measures proposed for annotated biomedical entities can be used to construct disease similarity networks. In addition, these networks can be constructed based on shared pathways [35], shared miRNAs [36], shared protein complexes [37], shared disease ontology [38] and disease comorbidity [39]. It would therefore be interesting for future studies to test which one is best for the prediction of novel disease-associated genes.

Acknowledgments

This research is funded by the Ministry of Education and Training (MOET) under Grant Number B2014-01-84.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References
1. Kann, M.G.: Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief. Bioinform. 11(1), 96–110 (2009). doi:10.1093/bib/bbp048
2. Tranchevent, L.-C., Capdevila, F.B., Nitsch, D., De Moor, B., De Causmaecker, P., Moreau, Y.: A guide to web tools to prioritize candidate genes. Brief. Bioinform. 12(1), 22–32 (2010). doi:10.1093/bib/bbq007
4. Jones, D.: Steps on the road to personalized medicine. Nat. Rev. Drug Discov. 6(10), 770–771 (2007)
7. Aerts, S., Lambrechts, D., Maity, S., Van Loo, P., Coessens, B., De Smet, F., Tranchevent, L.-C., De Moor, B., Marynen, P., Hassan, B., Carmeliet, P., Moreau, Y.: Gene prioritization through genomic data fusion. Nat. Biotechnol. 24(5), 537–544 (2006)
8. Chen, J., Xu, H., Aronow, B., Jegga, A.: Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinform. 8(1), 392 (2007)
9. Le, D.-H., Xuan Hoai, N., Kwon, Y.-K.: A comparative study of classification-based machine learning methods for novel disease gene prediction. In: Nguyen, V.-H., Le, A.-C., Huynh, V.-N. (eds.) Knowledge and Systems Engineering, vol. 326. Advances in Intelligent Systems and Computing, pp. 577–588. Springer International Publishing (2015)
10. López-Bigas, N., Ouzounis, C.A.: Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res. 32(10), 3108–3114 (2004)
11. Adie, E., Adams, R., Evans, K., Porteous, D., Pickard, B.: Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinform. 6(1), 55 (2005)
13. Calvo, S., Jain, M., Xie, X., Sheth, S.A., Chang, B., Goldberger, O.A., Spinazzola, A., Zeviani, M., Carr, S.A., Mootha, V.K.: Systematic identification of human mitochondrial disease genes through integrative genomics. Nat. Genet. 38(5), 576–582 (2006)
14. Lage, K., Karlberg, E.O., Storling, Z.M., Olason, P.I., Pedersen, A.G., Rigina, O., Hinsby, A.M., Tumer, Z., Pociot, F., Tommerup, N., Moreau, Y., Brunak, S.: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat. Biotechnol. 25(3), 309–316 (2007)
15. Smalter, A., Lei, S.F., Chen, X.: Human disease-gene classification with integrative sequence-based and topological features of protein-protein interaction networks. In: IEEE International conference on bioinformatics and biomedicine (BIBM), pp. 209–216 (2007)
16. Radivojac, P., Peng, K., Clark, W.T., Peters, B.J., Mohan, A., Boyle, S.M., Mooney, S.D.: An integrated approach to inferring gene-disease associations in humans. Proteins Struct. Funct. Bioinform. 72(3), 1030–1037 (2008). doi:10.1002/prot.21989
17. Keerthikumar, S., Bhadra, S., Kandasamy, K., Raju, R., Ramachandra, Y.L., Bhattacharyya, C., Imai, K., Ohara, O., Mohan, S., Pandey, A.: Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach. DNA Res. 16(6), 345–351 (2009)
18. Jiabao, S., Patra, J.C., Yongjin, L.: Functional link artificial neural network-based disease gene prediction. In: International joint conference on neural networks (IJCNN), 14–19 June 2009, pp. 3003–3010 (2009)
19. Le, D.-H., Nguyen, M.-H.: Towards more realistic machine learning techniques for prediction of disease-associated genes. In: Proceedings of the sixth international symposium on information and communication technology, Hue City, 2833269, ACM, pp. 116–120 (2015)
21. Barabasi, A.-L., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12(1), 56–68 (2011)
22. Kohler, S., Bauer, S., Horn, D., Robinson, P.: Walking the Interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. 82(4), 949–958 (2008)
23. Chen, J., Aronow, B., Jegga, A.: Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinform. 10(1), 73 (2009)
24. Le, D.-H., Kwon, Y.-K.: GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection. Comput. Biol. Chem. 37, 17–23 (2012)
28. Le, D.-H.: A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks. Algo. Mol. Biol. 10(1), 14 (2015)
33. Amberger, J., Bocchini, C.A., Scott, A.F., Hamosh, A.: McKusick’s online Mendelian inheritance in man (OMIM). Nucleic Acids Res. 37(suppl 1), D793–D796 (2009). doi:10.1093/nar/gkn665
34. Jiang, R., Gan, M., He, P.: Constructing a gene semantic similarity network for the inference of disease genes. BMC Syst. Biol. 5(Suppl 2), S2 (2011)
35. Li, Y., Agarwal, P.: A pathway-based view of human diseases and disease relationships. PLoS ONE 4(2), e4346 (2009)
36. Lu, M., Zhang, Q., Deng, M., Miao, J., Guo, Y., Gao, W., Cui, Q.: An analysis of human microRNA and disease associations. PLoS ONE 3(10), e3420 (2008)
37. Markou, M., Singh, S.: Novelty detection: a review, part 2: neural network based approaches. Signal Process. 83(12), 2499–2521 (2003)
38. Li, J., Gong, B., Chen, X., Liu, T., Wu, C., Zhang, F., Li, C., Li, X., Rao, S., Li, X.: DOSim: an R package for similarity between diseases based on disease ontology. BMC Bioinform. 12(1), 266 (2011)
39. Lee, D.S., Park, J., Kay, K.A., Christakis, N.A., Oltvai, Z.N., Barabasi, A.L.: The implications of human metabolic network topology for disease comorbidity. Proc. Natl. Acad. Sci. 105(29), 9880–9885 (2008). doi:10.1073/pnas.0802208105
40. Chen, X., Liu, M.-X., Yan, G.-Y.: Drug-target interaction prediction by random walk on the heterogeneous network. Mol. Biosyst. 8(7), 1970–1978 (2012). doi:10.1039/C2MB00002D
41. Le, D.-H.: Disease phenotype similarity improves the prediction of novel disease-associated microRNAs. In: 2015 2nd National Foundation for Science and Technology Development conference on information and computer science (NICS), 16–18 Sept 2015, pp. 76–81 (2015)
42. Zhou, M., Wang, X., Li, J., Hao, D., Wang, Z., Shi, H., Han, L., Zhou, H., Sun, J.: Prioritizing candidate disease-related long non-coding RNAs by walking on the heterogeneous lncRNA and disease network. Mol. Biosyst. 11(3), 760–769 (2015). doi:10.1039/C4MB00511B
43. Köhler, S., Doelken, S.C., Mungall, C.J., Bauer, S., Firth, H.V., Bailleul-Forestier, I., Black, G.C.M., Brown, D.L., Brudno, M., Campbell, J., FitzPatrick, D.R., Eppig, J.T., Jackson, A.P., Freson, K., Girdea, M., Helbig, I., Hurst, J.A., Jähn, J., Jackson, L.G., Kelly, A.M., Ledbetter, D.H., Mansour, S., Martin, C.L., Moss, C., Mumford, A., Ouwehand, W.H., Park, S.M., Riggs, E.R., Scott, R.H., Sisodiya, S., Vooren, S.V., Wapner, R.J., Wilkie, A.O.M., Wright, C.F., Vulto-van Silfhout, A.T., Leeuw, N., de Vries, B.B.A., Washington, N.L., Smith, C.L., Westerfield, M., Schofield, P., Ruef, B.J., Gkoutos, G.V., Haendel, M., Smedley, D., Lewis, S.E., Robinson, P.N.: The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42(D1), D966–D974 (2014). doi:10.1093/nar/gkt1026
44. Pesquita, C., Faria, D., Falcão, A.O., Lord, P., Couto, F.M.: Semantic similarity in biomedical ontologies. PLoS Comput. Biol. 5(7), e1000443 (2009)
45. Obayashi, T., Kinoshita, K.: COXPRESdb: a database to compare gene coexpression in seven model animals. Nucleic Acids Res. 39(suppl 1), D1016–D1022 (2011). doi:10.1093/nar/gkq1147
46. Obayashi, T., Kinoshita, K., Nakai, K., Shibaoka, M., Hayashi, S., Saeki, M., Shibata, D., Saito, K., Ohta, H.: ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Res. 35(suppl 1), D863–D869 (2006). doi:10.1093/nar/gkl783
47. UniProt Consortium: The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142–D148 (2010)
48. Barrell, D., Dimmer, E., Huntley, R.P., Binns, D., O’Donovan, C., Apweiler, R.: The GOA database in 2009: an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 37(suppl 1), D396–D403 (2009). doi:10.1093/nar/gkn803
49. Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. Paper presented at the 14th international joint conference on artificial intelligence, vol. 1, Montreal
50. van Driel, M.A., Bruggeman, J., Vriend, G., Brunner, H.G., Leunissen, J.A.M.: A text-mining analysis of the human phenome. Eur. J. Hum. Genet. 14(5), 535–542 (2006)
51.
53. Ahn, K., Song, J.H., Kim, D.K., Park, M.H., Jo, S.A., Koh, Y.H.: Ubc9 gene polymorphisms and late-onset Alzheimer’s disease in the Korean population: a genetic association study. Neurosci. Lett. 465(3), 272–275 (2009). doi:10.1016/j.neulet.2009.09.017
54. Guglielmotto, M., Monteleone, D., Boido, M., Piras, A., Giliberto, L., Borghi, R., Vercelli, A., Fornaro, M., Tabaton, M., Tamagno, E.: Aβ1-42-mediated down-regulation of Uch-L1 is dependent on NF-κB activation and impaired BACE1 lysosomal degradation. Aging Cell 11(5), 834–844 (2012). doi:10.1111/j.1474-9726.2012.00854.x
55. Wang, C., Sun, B., Zhou, Y., Grubb, A., Gan, L.: Cathepsin B degrades amyloid-β in mice expressing wild-type human amyloid precursor protein. J. Biol. Chem. 287(47), 39834–39841 (2012). doi:10.1074/jbc.M112.371641
Metadata
Title
Ontology-based disease similarity network for disease gene prediction
Authors
Duc-Hau Le
Vu-Tung Dang
Publication date
1 August 2016
Publisher
Springer Berlin Heidelberg
Published in
Vietnam Journal of Computer Science / Issue 3/2016
Print ISSN: 2196-8888
Electronic ISSN: 2196-8896
DOI
https://doi.org/10.1007/s40595-016-0063-3
