Skip to main content
Top

2014 | OriginalPaper | Chapter

Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique

Authors : Pietro Hiram Guzzi, Pierangelo Veltri, Mario Cannataro

Published in: New Frontiers in Mining Complex Patterns

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The functional similarity among terms of an ontology is evaluated by using Semantic Similarity Measures (SSM). In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from Gene Ontology (GO) and the most common application is to find the similarity or dissimilarity among two entities through the application of SSMs to their annotations. More recently, the extensive application of SSMs yielded to the Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs where the nodes are concepts (e.g. proteins) and each edge has an associated weight that represents the semantic similarity among related pairs of nodes. Community detection algorithms that analyse SSNs, such as protein complexes prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs have a high number of arcs with low weight, likened to noise, the application of classical clustering algorithms on raw networks exhibits low performance. To improve the performance of such algorithms, a possible approach is to simplify the structure of SSNs through a preprocessing step able to delete arcs likened to noise. Thus we propose a novel preprocessing strategy to simplify SSNs based on an hybrid global-local thresholding approach based on spectral graph theory. As proof of concept we demonstrate that community detection algorithms applied to filtered (thresholded) networks, have better performances in terms of biological relevance of the results, with respect to the use of raw unfiltered networks.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ala, U., Piro, R.M., Grassi, E., Damasco, C., Silengo, L., Oti, M., Provero, P., Cunto, F.D.: Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput. Biol. 4(3), e1000043 (2008)CrossRef Ala, U., Piro, R.M., Grassi, E., Damasco, C., Silengo, L., Oti, M., Provero, P., Cunto, F.D.: Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput. Biol. 4(3), e1000043 (2008)CrossRef
2.
go back to reference Bader, G.D., Hogue, C.W.V.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 27, 1–27 (2003) Bader, G.D., Hogue, C.W.V.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 27, 1–27 (2003)
3.
go back to reference Bertolazzi, P., Bock, M.E., Guerra, C.: On the functional and structural characterization of hubs in protein-protein interaction networks. Biotechnol. Adv. 31(2), 274–286 (2013)CrossRef Bertolazzi, P., Bock, M.E., Guerra, C.: On the functional and structural characterization of hubs in protein-protein interaction networks. Biotechnol. Adv. 31(2), 274–286 (2013)CrossRef
4.
go back to reference Domany, E., Blatt, M., Wiseman, S.: Superparamagnetic clustering of data. Phys. Rev. Lett. 76(18), 3251–3254 (1996)CrossRef Domany, E., Blatt, M., Wiseman, S.: Superparamagnetic clustering of data. Phys. Rev. Lett. 76(18), 3251–3254 (1996)CrossRef
6.
go back to reference Brohée, S., van Helden, J.: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinform. 7, 488 (2006)CrossRef Brohée, S., van Helden, J.: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinform. 7, 488 (2006)CrossRef
7.
go back to reference Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., Apweiler, R.: The gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology. Nucl. Acids Res. 32(suppl-1), D262–D266 (2004)CrossRef Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., Apweiler, R.: The gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology. Nucl. Acids Res. 32(suppl-1), D262–D266 (2004)CrossRef
8.
go back to reference Cannataro, M., Guzzi, P.H., Veltri, P.: Protein-to-protein interactions: technologies, databases, and algorithms. ACM Comput. Surv. 43, 1:1–1:36 (2010)CrossRef Cannataro, M., Guzzi, P.H., Veltri, P.: Protein-to-protein interactions: technologies, databases, and algorithms. ACM Comput. Surv. 43, 1:1–1:36 (2010)CrossRef
9.
go back to reference Cannataro, M., Guzzi, P.H., Sarica, A.: Data mining and life sciences applications on the grid. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 3(3), 216–238 (2013) Cannataro, M., Guzzi, P.H., Sarica, A.: Data mining and life sciences applications on the grid. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 3(3), 216–238 (2013)
10.
go back to reference Chung, F.: Spectral Graph Theory. Regional Conference Series in Mathematics, vol. 92. American Mathematical Society, Providence (1994) Chung, F.: Spectral Graph Theory. Regional Conference Series in Mathematics, vol. 92. American Mathematical Society, Providence (1994)
11.
go back to reference Cvetković, D., Simić, S.K.: Towards a spectral theory of graphs based on the signless laplacian, ii. Linear Algebra Appl. 432(9), 2257–2272 (2010)CrossRefMATHMathSciNet Cvetković, D., Simić, S.K.: Towards a spectral theory of graphs based on the signless laplacian, ii. Linear Algebra Appl. 432(9), 2257–2272 (2010)CrossRefMATHMathSciNet
12.
go back to reference Ding, C., He, X., Zha, H.: A spectral method to separate disconnected and nearly-disconnected web graph components. In: Proceedings of the Seventh ACM International Conference on Knowledge Discovery and Data Mining, San Francisco, 26–29 August 2001 Ding, C., He, X., Zha, H.: A spectral method to separate disconnected and nearly-disconnected web graph components. In: Proceedings of the Seventh ACM International Conference on Knowledge Discovery and Data Mining, San Francisco, 26–29 August 2001
13.
go back to reference Enright, S., Van Dongen, A.J., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)CrossRef Enright, S., Van Dongen, A.J., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)CrossRef
14.
go back to reference Freeman, T.C., Goldovsky, L., Brosch, M., van Dongen, S., Maziere, P., Grocock, R.J., Freilich, S., Thornton, J., Enright, A.J.: Construction, visualization, and clustering of transcription networks from microarray expression data. PLoS Comput. Biol. 3(10), e206 (2007)CrossRef Freeman, T.C., Goldovsky, L., Brosch, M., van Dongen, S., Maziere, P., Grocock, R.J., Freilich, S., Thornton, J., Enright, A.J.: Construction, visualization, and clustering of transcription networks from microarray expression data. PLoS Comput. Biol. 3(10), e206 (2007)CrossRef
15.
go back to reference Guldener, U., Munsterkotter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H.W., Stumpflen, V.: Mpact: the mips protein interaction resource on yeast. Nucleic Acids Res. 34, D436–D441 (2006)CrossRef Guldener, U., Munsterkotter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H.W., Stumpflen, V.: Mpact: the mips protein interaction resource on yeast. Nucleic Acids Res. 34, D436–D441 (2006)CrossRef
16.
go back to reference Guzzi, P.H., Mina, M., Guerra, C., Cannataro, M.: Semantic similarity analysis of protein data: assessment with biological features and issues. Briefings Bioinform. 13(5), 569–585 (2012)CrossRef Guzzi, P.H., Mina, M., Guerra, C., Cannataro, M.: Semantic similarity analysis of protein data: assessment with biological features and issues. Briefings Bioinform. 13(5), 569–585 (2012)CrossRef
17.
go back to reference Harispe, S., Sanchez, D., Ranwez, S., Janaqi, S., Montmain, J.: A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J. Biomed. Inform. 48, 38–53 (2014)CrossRef Harispe, S., Sanchez, D., Ranwez, S., Janaqi, S., Montmain, J.: A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J. Biomed. Inform. 48, 38–53 (2014)CrossRef
18.
go back to reference Ji, J., Zhang, A., Liu, C., Quan, X., Liu, Z.: Survey: functional module detection from protein-protein interaction networks. IEEE Trans. Knowl. Data Eng. 99(PrePrints), 1 (2013) Ji, J., Zhang, A., Liu, C., Quan, X., Liu, Z.: Survey: functional module detection from protein-protein interaction networks. IEEE Trans. Knowl. Data Eng. 99(PrePrints), 1 (2013)
19.
go back to reference King, A.D., Przulj, N., Jurisica, I.: Bioinformatics (Oxford, England) King, A.D., Przulj, N., Jurisica, I.: Bioinformatics (Oxford, England)
20.
go back to reference Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J., Pavlidis, P.: Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004)CrossRef Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J., Pavlidis, P.: Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004)CrossRef
21.
go back to reference Lin, D.: An Information-Theoretic Definition of Similarity. Morgan Kaufmann, San Francisco (1998) Lin, D.: An Information-Theoretic Definition of Similarity. Morgan Kaufmann, San Francisco (1998)
22.
go back to reference Ma, X., Gao, L.: Biological network analysis: insights into structure and functions. Briefings Funct. Genomics 11(6), 434–442 (2012)CrossRefMathSciNet Ma, X., Gao, L.: Biological network analysis: insights into structure and functions. Briefings Funct. Genomics 11(6), 434–442 (2012)CrossRefMathSciNet
24.
go back to reference Mina, M., Guzzi, P.H.: Alignmcl: comparative analysis of protein interaction networks through markov clustering. In: BIBM Workshops, pp. 174–181. IEEE (2012) Mina, M., Guzzi, P.H.: Alignmcl: comparative analysis of protein interaction networks through markov clustering. In: BIBM Workshops, pp. 174–181. IEEE (2012)
25.
go back to reference Mohar, B.: The laplacian spectrum of graphs. Graph Theor. Comb. Appl. 2, 871–898 (1991)MathSciNet Mohar, B.: The laplacian spectrum of graphs. Graph Theor. Comb. Appl. 2, 871–898 (1991)MathSciNet
26.
go back to reference Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2, 849–856 (2002) Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2, 849–856 (2002)
27.
go back to reference Guzzi, P., Mina, M.: Investigating bias in semantic similarity measures for analysis of protein interactions. In: Proceedings of 1st International Workshop on Pattern Recognition in Proteomics, Structural Biology and Bioinformatics (PR PS BB 2011), pp. 71–80, 13 September 2011 (2012) Guzzi, P., Mina, M.: Investigating bias in semantic similarity measures for analysis of protein interactions. In: Proceedings of 1st International Workshop on Pattern Recognition in Proteomics, Structural Biology and Bioinformatics (PR PS BB 2011), pp. 71–80, 13 September 2011 (2012)
28.
go back to reference Pesquita, C., Faria, D., O Falcão, A., Lord, P., Couto, F.M.: Semantic similarity in biomedical ontologies. PLoS Comput. Biol. 5(7), e1000443 (2009)CrossRef Pesquita, C., Faria, D., O Falcão, A., Lord, P., Couto, F.M.: Semantic similarity in biomedical ontologies. PLoS Comput. Biol. 5(7), e1000443 (2009)CrossRef
29.
go back to reference Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: IJCAI, pp. 448–453 (1995) Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: IJCAI, pp. 448–453 (1995)
30.
go back to reference Rito, T., Wang, Z., Deane, C.M., Reinert, G.: How threshold behaviour affects the use of subgraphs for network comparison. Bioinformatics 26(18), i611–i617 (2010)CrossRef Rito, T., Wang, Z., Deane, C.M., Reinert, G.: How threshold behaviour affects the use of subgraphs for network comparison. Bioinformatics 26(18), i611–i617 (2010)CrossRef
31.
go back to reference Su, G., Kuchinsky, A., Morris, J.H., States, D.J., Meng, F.: Glay: community structure analysis of biological networks. Bioinformatics 26(24), 3135–3137 (2010)CrossRef Su, G., Kuchinsky, A., Morris, J.H., States, D.J., Meng, F.: Glay: community structure analysis of biological networks. Bioinformatics 26(24), 3135–3137 (2010)CrossRef
32.
go back to reference Wang, H., Zheng, H., Azuaje, F.: Ontology- and graph-based similarity assessment in biological networks. Bioinformatics 26(20), 2643–2644 (2010)CrossRef Wang, H., Zheng, H., Azuaje, F.: Ontology- and graph-based similarity assessment in biological networks. Bioinformatics 26(20), 2643–2644 (2010)CrossRef
33.
go back to reference Zhu, X., Gerstein, M., Snyder, M.: Getting connected: analysis and principles of biological networks. Genes Dev. 21(9), 1010–1024 (2007)CrossRef Zhu, X., Gerstein, M., Snyder, M.: Getting connected: analysis and principles of biological networks. Genes Dev. 21(9), 1010–1024 (2007)CrossRef
Metadata
Title
Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique
Authors
Pietro Hiram Guzzi
Pierangelo Veltri
Mario Cannataro
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-08407-7_13

Premium Partner