Keywords
Cytoscape app, Automation features, CyREST, R, Disease-gene association, Disease-disease association, Random walk with restart algorithm, Heterogeneous network, Gene prioritization, Disease prioritization
This article is included in the Cytoscape gateway.
One of the challenging tasks in biomedicine is to prioritize candidate genes and diseases by their degree of relevance to a disease of interest. This is the starting point for identifying novel disease-gene and disease-disease associations. A large number of computational methods, including network-based and machine learning-based ones, have been proposed for this task1,2. State-of-the-art network-based methods often integrate diseases and genes into a heterogeneous network and then apply a propagation algorithm that exploits the similarity between diseases/genes and known disease-gene associations to predict novel associations3–7. Some tools have also been developed to facilitate the use of these methods. However, most of them focus only on predicting novel disease-gene associations8–10, including some tools developed as Cytoscape apps11. Recently, we developed a Cytoscape app, HGPEC12, to predict both disease-gene and disease-disease associations based on a state-of-the-art method on a heterogeneous network of diseases and genes3. HGPEC was shown to be better than two other network-based Cytoscape apps for the prediction of novel disease-gene associations, GPEC13 and PRINCIPLE14, in terms of prediction performance12. In addition, HGPEC can prioritize candidate genes for diseases without a known molecular basis and collect evidence to support novel predictions from various data resources such as Gene Ontology15, Disease Ontology16, KEGG pathway17, GeneRIF18, PubMed19, protein complexes20 and OMIM21. Being developed as a Cytoscape app, HGPEC can exploit advanced features of Cytoscape such as data visualization and integration. However, Cytoscape is a desktop-based tool, so HGPEC cannot be linked flexibly to other analysis tools such as R and Python. This limits the use of HGPEC because it cannot be run automatically as a component of a complex analysis pipeline in these tools. It also prevents Cytoscape from integrating data from other data resources. Recently, automation features have been added to Cytoscape to facilitate these tasks.
In this study, we upgrade HGPEC by adding automation features and name the new app autoHGPEC. autoHGPEC has the same functions as HGPEC. However, these functions can be called through both CyREST functions and commands, and thus from external environments. To use autoHGPEC, a heterogeneous network of diseases and genes, composed of a disease similarity network, a gene/protein network and known disease-gene associations, has to be provided. Then, a disease of interest must be selected from the disease similarity network. After that, the disease and its known associated genes (if any) are used as training/seed data. A set of candidate genes then has to be defined by selecting from the gene network or a chromosome. These candidate genes and all remaining diseases are then ranked by an RWRH-based method (see the Methods section). Finally, users can select top-ranked genes/diseases for further analyses such as visualization and evidence collection. We show the ability of autoHGPEC to predict novel genes and diseases associated with breast cancer.
autoHGPEC was implemented using a ranking algorithm, random walk with restart on a heterogeneous network (RWRH)12. Briefly, this network-based algorithm propagates the disease information embedded in a disease of interest and its known associated genes (also known as seed/training nodes) to other diseases and genes in the heterogeneous network. This propagation is performed by a random walk starting from the seed nodes. At each node, the random walker either moves to an adjacent node or goes back to the seed nodes with a prior probability. This process is repeated iteratively until a steady state is reached. The score assigned to each node at this state represents its degree of relevance to the seed nodes, and thus to the disease of interest. Finally, candidate genes and diseases are ranked by these scores, and top-ranked candidates can be selected as promising genes and diseases for further investigation.
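The iteration described above can be illustrated with a minimal Python sketch. It is a simplified random walk with restart on a single toy graph, not the heterogeneous RWRH implementation used in autoHGPEC; the function name, the toy graph and the tolerance are hypothetical, and the restart probability of 0.5 is chosen only for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> {neighbour: edge weight} (undirected graph).
    seeds: seed nodes; the restart probability mass is split equally among them.
    """
    nodes = sorted(adjacency)
    seeds = set(seeds)
    # Initial distribution: all probability mass on the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total edge weight per node, used to turn weights into probabilities.
    strength = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability restart_prob the walker jumps back to the seeds.
        nxt = {n: restart_prob * p0[n] for n in nodes}
        # Otherwise it moves to a neighbour, proportionally to edge weight.
        for u in nodes:
            if strength[u] == 0:
                continue
            share = (1.0 - restart_prob) * p[u] / strength[u]
            for v, w in adjacency[u].items():
                nxt[v] += share * w
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:  # steady state reached
            break
    return p


# Hypothetical toy graph: "s" is the seed, "d" is the node farthest from it.
toy = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "b": 1.0},
    "b": {"s": 1.0, "a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(toy, seeds=["s"])
ranking = sorted((n for n in toy if n != "s"), key=lambda n: -scores[n])
```

In this toy example the non-seed nodes are ranked by their steady-state scores, which mirrors how candidates are ordered by relevance to the seed nodes.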
autoHGPEC is an upgraded version of HGPEC12 with added automation features. Therefore, the main functions of HGPEC, such as prioritization, visualization and evidence collection, were kept. In addition, as in HGPEC, a number of databases were preinstalled in autoHGPEC to facilitate the use of the app. These include disease similarity networks, gene/protein networks and known disease-gene associations, as well as annotation data such as Gene Ontology15, Disease Ontology16, KEGG pathways17, GeneRIF18, and protein complexes20. However, users can also supply other networks themselves. To provide automation features for HGPEC, we first refactored the source code of HGPEC to use Cytoscape Tunable annotations, replacing the control panels of HGPEC in the west panel with a menu system. Therefore, all the functions of HGPEC are accessed through the menu system. In addition, the workflow of HGPEC is exposed to users through the CyREST Command API (which can be explored in Swagger UI under the menu autoHGPEC). A CyREST API with the appropriate functions was developed as well. Thus, the result of each step in the workflow can be passed to the caller in JSON format for further analysis in R or Python.
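As an illustration of how an external script might address the CyREST Command API, the following Python sketch builds such a request without sending it. The endpoint pattern /v1/commands/<namespace>/<command> is the general CyREST convention; the namespace autoHGPEC, the command step2_1_select_disease, its diseaseName argument and port 1234 are taken from the example quoted later in this article. Passing the arguments as a JSON body is an assumption that has not been checked against the app, and the helper names are hypothetical.

```python
import json
import urllib.request

# Host and port as they appear in the request URL quoted in this article.
CYREST_BASE = "http://localhost:1234"


def build_command_request(namespace, command, arguments, base=CYREST_BASE):
    """Build (but do not send) a POST request for a CyREST command."""
    url = f"{base}/v1/commands/{namespace}/{command}"
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def run_command(request, timeout=30):
    """Send the request to a running Cytoscape instance and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the request is built here; run_command(request) needs Cytoscape running
# with autoHGPEC installed and the heterogeneous network already constructed.
request = build_command_request(
    "autoHGPEC", "step2_1_select_disease", {"diseaseName": "breast cancer"}
)
```

Separating request construction from sending keeps the sketch usable without a running Cytoscape instance; calling run_command(request) would return the decoded JSON result for further processing.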
autoHGPEC is designed to predict novel disease-gene and disease-disease associations and to collect supporting evidence, based on a random walk on a heterogeneous network, with added automation features. Therefore, it follows the same workflow as HGPEC12. However, in addition to being used in desktop-based Cytoscape through the menu system, its functions can be called using the CyREST Command API and from other analysis tools such as R. Figure 1 shows the workflow of autoHGPEC in three running environments (see user manual in Supplementary File 1). As a Cytoscape app with automation features, autoHGPEC can be run on any computer that satisfies the minimal requirements to run Cytoscape.
To demonstrate the functions of autoHGPEC with automation features, we showed its ability to predict novel genes and diseases associated with breast cancer (OMIM ID: 114480). Here, we briefly describe this case study by following the 5-step workflow in Figure 1 (see user manual in Supplementary File 1 for more detail):
- First, a heterogeneous network of genes and diseases was constructed by connecting a preinstalled disease similarity network (i.e., Disease_Similarity_Network_5) including 5,080 diseases and 19,729 interactions, a preinstalled human protein interaction network (i.e., Default_Human_PPI_Network) including 10,486 genes and 50,791 interactions, and known disease-gene associations collected from OMIM21. This step can be accomplished by the following commands from within R:
- Second, breast cancer (OMIM ID: 114480) was selected for investigation. This disease is known to be associated with 21 genes that are also available in the human protein interaction network. The training set was then built from these genes and the disease of interest. The following two commands can be run within R for this task:
- Third, we selected all 10,465 remaining genes in the protein interaction network as candidate genes. This can be done with the following command:
- Fourth, all genes and diseases in the heterogeneous network were ranked by applying the RWRH-based method with the back-probability, jumping probability and subnetwork importance weight set to 0.5, 0.6 and 0.7, respectively. The following command can be used to accomplish this task:
- Finally, we visualized and collected evidence for the associations between the 20 highest-ranked candidate genes/diseases and breast cancer. Users must highlight the diseases and genes of interest in the corresponding network. These tasks can be performed using the following two commands, respectively:
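The role of the three parameters in the fourth step can be illustrated on a toy network. The Python sketch below assumes the commonly used RWRH formulation. In that formulation the jumping probability splits a node's outgoing probability between its own network and the other network, the subnetwork importance weight splits the restart mass between seed diseases and seed genes, and the back-probability is the restart probability. The sketch is not the code of autoHGPEC; all node names, edge weights and function names are hypothetical, every node is assumed to have at least one neighbour in its own network, and assigning the weight 0.7 to the disease side is an assumption.

```python
def rwrh(gene_net, disease_net, associations, seed_genes, seed_diseases,
         back_prob=0.5, jump_prob=0.6, disease_weight=0.7,
         tol=1e-10, max_iter=1000):
    """Toy random walk with restart on a heterogeneous gene-disease network."""
    # Cross-network neighbours derived from the known gene-disease pairs.
    g2d, d2g = {}, {}
    for gene, disease in associations:
        g2d.setdefault(gene, []).append(disease)
        d2g.setdefault(disease, []).append(gene)

    def transitions(net, cross, own, other):
        """Row-stochastic transition lists for the nodes of one sub-network."""
        rows = {}
        for node, nbrs in net.items():
            links = cross.get(node, [])
            # Nodes with cross links keep only (1 - jump_prob) in their own network.
            stay = 1.0 - jump_prob if links else 1.0
            total = sum(nbrs.values())
            row = [((own, v), stay * w / total) for v, w in nbrs.items()]
            row += [((other, x), jump_prob / len(links)) for x in links]
            rows[(own, node)] = row
        return rows

    rows = transitions(gene_net, g2d, "gene", "disease")
    rows.update(transitions(disease_net, d2g, "disease", "gene"))

    # Restart distribution: split between seed genes and seed diseases.
    p0 = {key: 0.0 for key in rows}
    for g in seed_genes:
        p0[("gene", g)] = (1.0 - disease_weight) / len(seed_genes)
    for d in seed_diseases:
        p0[("disease", d)] = disease_weight / len(seed_diseases)

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {key: back_prob * p0[key] for key in rows}
        for key, row in rows.items():
            for target, prob in row:
                nxt[target] += (1.0 - back_prob) * p[key] * prob
        change = sum(abs(nxt[k] - p[k]) for k in rows)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy data: a chain of four genes and three diseases.
gene_net = {"g1": {"g2": 1.0}, "g2": {"g1": 1.0, "g3": 1.0},
            "g3": {"g2": 1.0, "g4": 1.0}, "g4": {"g3": 1.0}}
disease_net = {"d1": {"d2": 0.8}, "d2": {"d1": 0.8, "d3": 0.3},
               "d3": {"d2": 0.3}}
associations = [("g1", "d1"), ("g3", "d2")]

scores = rwrh(gene_net, disease_net, associations,
              seed_genes=["g1"], seed_diseases=["d1"])
candidate_genes = sorted(["g2", "g3", "g4"],
                         key=lambda g: -scores[("gene", g)])
candidate_diseases = sorted(["d2", "d3"],
                            key=lambda d: -scores[("disease", d)])
```

Candidate genes and the remaining diseases are then ordered by their steady-state scores, as in the workflow described above.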
Visualization results (Figure 2a and b) show that most of the top-ranked candidate genes are directly connected to known breast cancer-associated genes. In addition, highly ranked candidate diseases are directly connected to either known/training genes or the disease of interest. For evidence collection, we annotated and searched for evidence supporting promising associations between the top-ranked candidate genes/diseases and breast cancer. The evidence collection results showed that each of the promising associations is supported by at least two data sources. More detail on the interpretation of the visualization and evidence collection results for these associations can be found in the HGPEC study12. Besides the fact that almost all autoHGPEC commands return results in JSON format, the results of autoHGPEC are also exposed via the CyREST API (menu Help/Automation/CyREST API). For example, the R command commandRun('autoHGPEC step2_1_select_disease diseaseName="breast cancer"') in Step 2 can be performed directly through the CyREST API with the request URL http://localhost:1234/autohgpec/v1/selectDisease/breast%20cancer (this URL is available after the heterogeneous network has been successfully constructed in Step 1). It returns a list of OMIM IDs associated with “breast cancer” in JSON format as follows:
[
{
"name": "BREAST CANCER 1 GENE; BRCA1",
"DiseaseID": "MIM113705",
"MedGenCUI": ""
},
{
"name": "BREAST CANCER",
"DiseaseID": "MIM114480",
"MedGenCUI": "C0346153",
"AssociatedGenes": "5888, 3845, 83990, 8493, 580, 841, 3161, 7517, 9821, 79728, 5245, 5002, 672, 675, 5290, 11200, 207, 472, 4835, 999, 7157, 8438"
},
{
"name": "BREAST-OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1; BROVCA1",
"DiseaseID": "MIM604370",
"MedGenCUI": "C2676676",
"AssociatedGenes": "4978, 2956, 672, 5290, 207, 5071"
},
{
"name": "BREAST CANCER ANTIESTROGEN RESISTANCE 3; BCAR3",
"DiseaseID": "MIM604704",
"MedGenCUI": ""
}
]
Therefore, users can easily call this CyREST API and use this result in their workflow as they need.
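A minimal Python sketch of how such a result might be consumed is given below. It rebuilds the request URL shown above and extracts the genes listed for one disease entry from a shortened copy of the JSON response. The function names are hypothetical, and the field names (DiseaseID, AssociatedGenes) are taken from the example output rather than from a documented schema.

```python
import json
from urllib.parse import quote


def select_disease_url(disease_name, base="http://localhost:1234"):
    """Build the selectDisease request URL, percent-encoding the disease name."""
    return f"{base}/autohgpec/v1/selectDisease/{quote(disease_name)}"


def associated_genes(records, disease_id):
    """Return the gene IDs listed for one disease entry (empty list if none)."""
    for record in records:
        if record["DiseaseID"] == disease_id:
            raw = record.get("AssociatedGenes", "")
            return [g.strip() for g in raw.split(",") if g.strip()]
    return []


# Shortened copy of the response shown above (two of the four entries).
response_text = '''
[
  {"name": "BREAST CANCER 1 GENE; BRCA1",
   "DiseaseID": "MIM113705",
   "MedGenCUI": ""},
  {"name": "BREAST CANCER",
   "DiseaseID": "MIM114480",
   "MedGenCUI": "C0346153",
   "AssociatedGenes": "5888, 3845, 83990, 8493, 580, 841, 3161, 7517, 9821, 79728, 5245, 5002, 672, 675, 5290, 11200, 207, 472, 4835, 999, 7157, 8438"}
]
'''

records = json.loads(response_text)
url = select_disease_url("breast cancer")
breast_cancer_genes = associated_genes(records, "MIM114480")
```

Entries without an AssociatedGenes field, such as MIM113705 in the example, yield an empty list, so downstream code does not need to special-case them.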
The random walk with restart algorithm on a heterogeneous network of diseases and genes has been shown to be a state-of-the-art method for predicting novel disease-gene and disease-disease associations compared to other network-based algorithms3,12. However, its prediction performance depends strongly on the heterogeneous network used, which combines a disease similarity network, a gene/protein interaction network and known disease-gene associations. Indeed, one study showed that prediction performance can be improved by using a gene ontology-based gene similarity network instead of the human protein interaction network22. In addition, we have recently shown that using a disease similarity network constructed from the Human Phenotype Ontology23 improved the prediction of disease-associated genes24 as well as disease-associated non-coding RNAs25,26. Therefore, to facilitate the use of such disease/gene similarity networks, we enable users to provide these networks themselves. For the gene/protein network, users can import the network from various molecular interaction data sources or from other analysis pipelines. Similarly, disease similarity networks can be imported from other analysis tools such as DOSim27 and HPOSim28. Moreover, the ranked candidate genes can be used as input to other annotation and enrichment toolkits, such as DAVID29 and GSEA30, to provide further support for their associations with the disease of interest. Taken together, with added automation features, autoHGPEC can be more useful and reach a wider range of users.
Identification of novel disease-gene and disease-disease associations is an important task in biomedical research. Recently, we developed a Cytoscape app, HGPEC, that uses a state-of-the-art network-based method for this task. This paper describes an upgraded version of HGPEC, named autoHGPEC, with added automation features. With these features, autoHGPEC can be used as a component of other complex analysis pipelines and can make use of other data resources. We demonstrated the use of autoHGPEC by predicting novel breast cancer-associated genes and diseases. Further investigation, by visualizing and collecting evidence for associations between the top 20 ranked genes/diseases and breast cancer, has shown the ability of autoHGPEC.
1. autoHGPEC on Cytoscape Apps: http://apps.cytoscape.org/apps/autohgpec
2. User manual can be downloaded at https://sites.google.com/site/duchaule2011/bioinformatics-tools/autohgpec
3. autoHGPEC can be run from within R using RCy3 (https://www.bioconductor.org/packages/release/bioc/html/RCy3.html)
4. Source code: https://github.com/trangtran86/autoHGPEC
5. Archived source code as at time of publication: http://doi.org/10.5281/zenodo.1228521 (reference 31)
6. License: MIT
All prerequisite data are already included in the app. Refer to the user manual (Supplementary File 1) for additional annotation data such as Gene Ontology.
Is the rationale for developing the new software tool clearly explained?
Partly
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Is the rationale for developing the new software tool clearly explained?
Yes
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Version 1: 24 May 18