1 Introduction
Both physical and genetic interaction networks have provided valuable insights into complex biological systems, ranging from how different cellular processes communicate to the functions of individual proteins [4]. High-throughput technologies, together with traditional small-scale experiments, have enabled the systematic identification of pairwise protein interactions [60, 37] and protein complexes [29, 16]. Public databases, including BioGRID [6], the Human Protein Reference Database (HPRD) [28], IntAct [27], the Database of Interacting Proteins (DIP) [40] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [26], store large amounts of interaction and pathway data across diverse organisms [23]. Together, these data have helped to elucidate the underlying mechanisms of the cell.
In particular, protein interaction networks have provided insight into protein function [29]. Protein–protein interactions (PPIs) play an important role in biological processes: most proteins perform their functions by interacting with other proteins, and PPIs also drive the formation of protein complexes and mediate post-translational protein modifications [54]. Over the past few years, systematic efforts have been made to map the human interactome (the collection of all human PPIs) using high-throughput techniques including yeast two-hybrid (Y2H) [39], mass spectrometry [13, 50] and co-affinity purification [57]. Together with the curation of small-scale experiments and computational approaches [51], these studies have increased the coverage of human interactome maps, reduced their biases and provided an estimate of the interactome size [42]. However, these maps remain incomplete and noisy, which must be taken into account when using these PPIs [2]. Literature-curated data sets, although richer in interactions, are prone to investigative biases [60], as they contain more interactions for the more extensively studied disease proteins [56].
With the emergence of “network medicine”, further development of protein interaction maps is essential. Network medicine, as described by Barabási et al. [2], aims to explore disease complexity through the systematic identification of disease pathways and modules, while also taking into account the molecular relationships between phenotypes. Analyses of network topology and network dynamics have advanced key discoveries, including the identification of novel disease genes and pathways, biomarkers and drug targets [48]. Key work in the area includes the study by Xu et al. [56], who analyzed topological features of a PPI network and observed that, in literature-curated networks, hereditary disease genes from the Online Mendelian Inheritance in Man (OMIM) database [21] have a higher degree and a tendency to interact with other disease genes. These tendencies were not observed in networks constructed from high-throughput experiments. Other studies, such as Chuang et al. [9] and Taylor et al. [46], have indicated that alterations in the physical interaction network may be indicative of breast cancer prognosis. Goh et al. [17] showed that the majority of disease genes are nonessential and are located at the periphery of functional networks. Another study [14] found that genes associated with diseases with similar phenotypes are more likely to interact directly with each other. Network analysis tools such as clustering and graph partitioning have helped to uncover functional and potential disease modules in the interactome [35]. Vanunu et al. [49] applied a diffusion-based method named PRINCE to prioritize genes in prostate cancer, Alzheimer’s disease (AD) and type 2 diabetes.
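PRINCE is only named above. As a rough illustration of what a diffusion-based prioritization does, the following minimal Python sketch implements a generic network-propagation iteration, F ← αW′F + (1 − α)Y, where W′ = D^(−1/2) W D^(−1/2) is the degree-normalized adjacency matrix and Y holds prior scores for known disease genes. This is an assumed, simplified formulation for illustration, not Vanunu et al.'s implementation; the toy network, the gene identifiers G1 to G4 and α = 0.8 are hypothetical.

```python
import math


def propagate(adj, prior, alpha=0.8, tol=1e-9, max_iter=1000):
    """Generic network propagation: F <- alpha * W' F + (1 - alpha) * Y.

    adj   -- dict mapping each protein to the set of its neighbours
             (assumed symmetric, i.e. an undirected PPIN)
    prior -- dict of prior scores Y (e.g. 1.0 for known disease genes)
    W' is the symmetrically normalised adjacency D^-1/2 W D^-1/2.
    """
    nodes = list(adj)
    degree = {n: len(adj[n]) for n in nodes}
    y = {n: prior.get(n, 0.0) for n in nodes}
    f = dict(y)
    for _ in range(max_iter):
        # One synchronous update of every node's score.
        new_f = {
            n: alpha * sum(f[m] / math.sqrt(degree[n] * degree[m]) for m in adj[n])
            + (1 - alpha) * y[n]
            for n in nodes
        }
        converged = max(abs(new_f[n] - f[n]) for n in nodes) < tol
        f = new_f
        if converged:
            break
    return f


# Hypothetical toy PPIN: a path G1 - G2 - G3 - G4, with G1 as the only seed gene.
toy = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2", "G4"}, "G4": {"G3"}}
scores = propagate(toy, {"G1": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Scores decay with network distance from the seed, so the genes rank G1, G2, G3, G4; with a real PPIN the seeds would be known disease genes and the highest-scoring non-seed genes would be the candidates.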
These studies, and future ones, rely on human interactome maps, which are critical to understanding genotype–phenotype relationships [37]. In this study, we investigate whether the choice between experimental and curated data for constructing a human protein–protein interaction network (PPIN) affects disease network analysis. Using our previously proposed integrative network-driven pipeline [5], we integrate diverse heterogeneous data, including gene expression, the PPIN, ontology-based similarity, and degree connectivity and betweenness centrality measures, to uncover potential disease-candidate genes. To investigate the effect of human PPIN selection, we compare the disease-gene candidates obtained when different human PPINs are integrated into the framework. Two PPINs have been selected for this study: (1) the recently published proteome-scale map of the human interactome network by Rolland et al. [37], referred to as PPIN_HTP, and (2) a literature-curated map obtained by extracting binary PPIs from public databases, referred to as PPIN_LIT [51]. To illustrate the impact of the PPIN on disease-gene selection, Alzheimer’s disease (AD) has been selected as a case study. AD is a genetically complex disease in which patients present with progressive dementia [10]. It is the most common form of age-related cognitive impairment [47] and is characterized by the loss of neurons together with axonal dystrophy, mature senile plaques and neurofibrillary tangles [34]. Gene expression profiling studies have identified AD-affected pathways across different brain areas and tissues, including mitochondrial function, intracellular signaling and neuroinflammation [10]. To evaluate the impact of PPIN selection on the disease-gene selection process, we perform biological process enrichment analysis and compare the candidate gene list with a manually curated reference dataset of verified known and susceptibility AD genes. Furthermore, we incorporate tissue-specific expression data to investigate the tissues in which the AD candidate genes are expressed.
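The degree connectivity and betweenness centrality measures mentioned above are standard graph statistics. How the pipeline [5] weights and combines them with gene expression and ontology-based similarity is not reproduced here; under that caveat, the sketch below only shows how the two measures could be computed on an undirected PPIN stored as an adjacency dictionary, using Brandes' algorithm for betweenness. The protein identifiers are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of interaction partners of each protein."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected shortest path was counted once from each endpoint.
    return {n: b / 2.0 for n, b in bc.items()}


# Hypothetical toy PPIN: hub H bridges leaves A, B and C; A and B also interact.
toy = {"H": {"A", "B", "C"}, "A": {"H", "B"}, "B": {"H", "A"}, "C": {"H"}}
deg = degree_centrality(toy)
btw = betweenness_centrality(toy)
```

Here H has both the highest degree (3) and the only non-zero betweenness (2.0, from the A–C and B–C paths), which is the kind of hub protein these measures are meant to highlight.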
The remainder of the paper is organized as follows. Section 2 describes the integrative framework, along with details of the datasets and PPINs used in the analysis. Section 3 summarizes the results, and Sect. 4 presents conclusions and future work.
4 Conclusions
The development of high-throughput techniques, along with the emergence of network medicine, is aiding our understanding of disease and of the interrelatedness of disease-related genes and proteins [2]. Network theory has been useful in the study of complex neurodegenerative diseases such as AD, Parkinson’s disease [36] and multiple sclerosis [48]. In this study, we have used AD as a case study in disease network analysis. AD is the most common neurodegenerative disease, and current AD therapies are only symptomatic; the development of novel therapies that impede its progression is therefore an important health priority [18]. Integrating PPINs with disease datasets is an important tool for unraveling the molecular basis of diseases: it can help identify genes and proteins associated with diseases, characterize disease-network properties, identify subnetworks, and support network-based disease gene classification [43]. However, the binary human PPIN map is still incomplete. Yu et al. [60] suggested that high-throughput Y2H datasets contained more false positives than literature-curated datasets, whereas Rolland et al. [37] observed that literature-curated PPINs are highly biased and cover only a small portion of the interactome.
In this study, we evaluated how PPINs constructed from high-throughput experimental data, compared with curated data, affect the identification of candidate AD genes through network analysis and integration. We first observed limited overlap (305 protein pairs) between the AD-specific PPIN_LIT and PPIN_HTP. Furthermore, when the integrative framework was used to identify significant AD gene candidates, no overlap was observed between the significant candidates identified using the literature-derived PPIN and those identified using the PPIN constructed from high-throughput data. In terms of enrichment analysis, the significant gene hubs identified using the PPIN_LIT performed strongly: compared with the PPIN_HTP, a larger proportion of GO terms and KEGG pathways were enriched. In addition, gene candidates from the literature-based PPIN are involved in processes modulated in AD pathogenesis, such as neuron differentiation, and in KEGG pathways such as the neurotrophin signaling pathway. Interestingly, the AD susceptibility gene TRAF1 was identified in both analyses, using the PPIN_LIT and the PPIN_HTP networks. Through tissue-specific expression analysis, we observed that 48 % of the AD gene candidates obtained from the literature-curated PPIN, and 19 % of those obtained from the high-throughput PPIN, were expressed in the whole brain and prefrontal cortex tissues. In summary, these results suggest that the PPIN_LIT outperforms the PPIN_HTP in terms of enrichment analysis, tissue analysis and comparison with the reference dataset. However, it is important to take into account the limited availability and coverage of tissue-specific data [20], along with the possibility that significant genes identified using the PPIN_HTP may still be meaningful but have not yet been recognized because of sociological or experimental biases [37].
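The enrichment results summarized above depend on the statistical test used, which this section does not state. A common choice for GO and KEGG over-representation analysis is a one-sided hypergeometric test followed by Benjamini–Hochberg correction across terms; the sketch below assumes that choice purely for illustration, and it is not necessarily the procedure used in this study. The function names are hypothetical.

```python
from math import comb


def hypergeometric_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k).

    N -- genes in the background (e.g. all genes in the PPIN)
    K -- background genes annotated with the GO term or KEGG pathway
    n -- candidate genes
    k -- candidate genes annotated with the term
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, if all 5 candidates fall in a term that covers 5 of 10 background genes, `hypergeometric_p(5, 5, 5, 10)` is 1/252 (about 0.004). The proportion of enriched terms would then be the fraction whose adjusted p-value falls below a chosen threshold.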
As more high-throughput experiments such as Y2H screens are performed, the coverage of the human interactome continues to improve. The increased coverage, quality and diversity of human PPIN data will provide further opportunities for the molecular characterization and understanding of human disease [2]. In future work, we aim to integrate high-quality binary pairs obtained from literature curation with experimental binary interaction maps to increase the coverage of the interactome.
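One simple way to combine literature-curated and experimental binary pairs is to take the union of the two edge sets while recording where each interaction came from. The sketch below assumes both inputs already use the same gene identifiers (identifier mapping and quality filtering are not shown) and treats interactions as undirected; the function names and protein identifiers are hypothetical. The intersection size it returns is the same kind of overlap count as the 305 shared protein pairs reported for PPIN_LIT and PPIN_HTP.

```python
def canonical(pair):
    """Order an undirected interaction so (a, b) and (b, a) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def merge_ppins(lit_pairs, htp_pairs):
    """Union of two binary PPINs, tagging each edge with its source(s).

    Returns the merged edge -> sources mapping and the number of shared edges.
    """
    lit = {canonical(p) for p in lit_pairs}
    htp = {canonical(p) for p in htp_pairs}
    merged = {}
    for edge in lit | htp:
        merged[edge] = tuple(
            name for name, edges in (("LIT", lit), ("HTP", htp)) if edge in edges
        )
    return merged, len(lit & htp)


# Hypothetical identifiers; P1-P2 is reported by both sources in opposite orientations.
lit_pairs = [("P1", "P2"), ("P3", "P4")]
htp_pairs = [("P2", "P1"), ("P5", "P6")]
merged, shared = merge_ppins(lit_pairs, htp_pairs)
```

Canonicalizing each pair before comparison matters because the two sources may list the same interaction in either orientation; without it, the overlap would be undercounted.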