Towards the assessment of semantic similarity analysis of protein data: main approaches and issues

Authors:
Pietro Hiram Guzzi, University of Catanzaro
Marco Mina, University of Padova

ACM SIGBioinformatics Record, Volume 2, Issue 3, September 2012, pp 17–18, https://doi.org/10.1145/2384691.2384694

Published: 01 September 2012

Abstract

Bioinformatics approaches to the study of proteins have led to the introduction of different methodologies and related tools for analysing different types of protein data, ranging from primary, secondary and tertiary structures to interaction data [1], not to mention functional knowledge.

One of the most advanced tools for encoding and representing functional knowledge in a formal way is the Gene Ontology (GO) [2,3]. It is composed of three ontologies: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC). Each ontology consists of a set of terms (GO terms) representing different functions, biological processes and cellular components within the cell. GO terms are connected to each other to form a hierarchical graph. Terms representing similar functions are close to each other within this graph.
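The hierarchical graph described above can be sketched as a mapping from each term to its direct parents. This is a minimal illustration only: the term names and the `PARENTS` table are hypothetical placeholders, not real GO identifiers, and the helper names are assumptions introduced here.

```python
# Toy fragment of a GO-like hierarchy: each term maps to its direct parents.
# Term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic_activity": ["root"],
    "protein_binding": ["binding"],
    "enzyme_binding": ["binding"],
    "kinase_activity": ["catalytic_activity"],
    # A term may have more than one parent, so the hierarchy is a graph, not a tree.
    "kinase_binding": ["protein_binding", "enzyme_binding"],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def common_ancestors(term_a, term_b, parents=PARENTS):
    """Terms shared by the ancestor sets of two terms."""
    return ancestors(term_a, parents) & ancestors(term_b, parents)
```

In this toy graph, two terms under "binding" share several ancestors, whereas a binding term and a catalytic term share only the root, which is one way to read "close to each other" in the hierarchy.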

Biological molecules are associated with GO terms that represent their functions, biological roles and localization. This process, usually referred to as the annotation process, can be performed under the supervision of an expert or in a fully automated way. Computationally inferred annotations, labeled Inferred from Electronic Annotation (IEA), are generally not as reliable as experimentally determined annotations. For this reason every annotation is labeled with an Evidence Code (EC) that keeps track of the type of process used to produce the annotation. In the April 2010 release of annotations, about 98% of all annotations were IEA annotations [4].
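As a small sketch of how evidence codes can be used in practice, the snippet below treats an annotation as a (protein, term, evidence code) triple, reports the share of each code, and drops electronically inferred entries. The proteins, terms and the `ANNOTATIONS` list are invented for illustration; IEA, IDA and IMP are GO evidence codes, but real annotation files carry many more fields.

```python
from collections import Counter

# Hypothetical annotations: (protein, term, evidence code).
ANNOTATIONS = [
    ("P1", "kinase_activity", "IDA"),
    ("P1", "protein_binding", "IEA"),
    ("P2", "protein_binding", "IEA"),
    ("P2", "kinase_binding", "IMP"),
    ("P3", "binding", "IEA"),
]


def evidence_fractions(annotations):
    """Fraction of annotations carrying each evidence code."""
    counts = Counter(code for _, _, code in annotations)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def without_electronic(annotations):
    """Keep only annotations whose evidence code is not IEA."""
    return [a for a in annotations if a[2] != "IEA"]
```

Filtering out IEA annotations trades coverage for reliability; with a corpus dominated by IEA entries, the filtered set can be much smaller than the original.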

The term annotation corpus is commonly used to identify all the annotations involving a set of proteins or genes, usually referring to whole proteomes and genomes (e.g. the annotation corpus of yeast). For lack of space we do not describe the Gene Ontology further. Comprehensive reviews have been provided by du Plessis et al. [4] and by Guzzi et al. [5].

The availability of well-formalized functional data has enabled the use of computational methods to analyse genes and proteins from a functional point of view. For example, a set of algorithms, known as functional enrichment algorithms, has been developed to determine the statistical significance of the presence (or absence) of a GO term in a set of gene products. A detailed review of these algorithms can be found in [4].
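The text does not name a specific enrichment test. One commonly used choice is the hypergeometric test, sketched below under that assumption: given a population of gene products, some of which carry a GO term, it computes the probability of seeing at least the observed number of annotated products in a sample drawn without replacement. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of gene products considered
    annotated:  how many of them carry the GO term
    sample:     size of the gene set under study
    hits:       how many products in the gene set carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term is over-represented in the gene set; in real analyses many terms are tested at once, so some form of multiple-testing correction is normally applied on top of this.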

An interesting problem is how to express quantitatively the relationships between GO terms. Several measures, referred to as (term) semantic similarity (SS) measures, have been introduced in the last decade. Given two or more GO terms, they try to quantify the similarity of the functional aspects represented by the terms within the cell. By exploiting annotation corpora, semantic similarity measures have been further extended to evaluate the similarity of genes and proteins on the basis of their annotations.
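To make the step from term similarity to protein similarity concrete, here is a minimal sketch. It assumes a Resnik-style measure (the information content of the most informative common ancestor) for terms and a best-match average for proteins; the text does not single out any particular measure, so both are illustrative choices. The toy hierarchy, the `CORPUS` and all function names are hypothetical, and every protein is assumed to have at least one annotation.

```python
import math

# Hypothetical hierarchy: term -> direct parents (not real GO identifiers).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic_activity": ["root"],
    "protein_binding": ["binding"],
    "kinase_activity": ["catalytic_activity"],
    "kinase_binding": ["protein_binding"],
}

# Hypothetical annotation corpus: protein -> directly annotated terms.
CORPUS = {
    "P1": {"kinase_activity", "kinase_binding"},
    "P2": {"kinase_binding"},
    "P3": {"protein_binding"},
    "P4": {"kinase_activity"},
}


def ancestors(term):
    """The term itself plus all terms reachable through parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(corpus):
    """IC(t) = log(N / n_t), with n_t the proteins annotated to t or a descendant."""
    counts = {}
    for terms in corpus.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(len(corpus) / c) for t, c in counts.items()}


IC = information_content(CORPUS)


def term_similarity(t1, t2, ic=IC):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    shared = ancestors(t1) & ancestors(t2)
    return max((ic[a] for a in shared if a in ic), default=0.0)


def protein_similarity(p1, p2, corpus=CORPUS, ic=IC):
    """Best-match average over the two proteins' annotation sets."""
    a, b = corpus[p1], corpus[p2]
    rows = [max(term_similarity(s, t, ic) for t in b) for s in a]
    cols = [max(term_similarity(s, t, ic) for s in a) for t in b]
    return (sum(rows) / len(rows) + sum(cols) / len(cols)) / 2
```

Because the information content is estimated from the corpus itself, the resulting scores depend on which annotations are included, which is one reason the reliability of SS values deserves scrutiny.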

Many works have focused on the following tasks: (i) the definition of ad hoc semantic similarity measures tailored to the characteristics of the Gene Ontology; (ii) the definition of measures for comparing genes and proteins; (iii) the introduction of methodologies for the systematic assessment of semantic similarity measures; (iv) the use of semantic similarity measures in many different contexts and applications. Despite its relevance, the application of semantic similarity to the systematic analysis of protein data is still an open research area. There are, in fact, two main questions that have to be addressed: (i) the systematic assessment of SS with respect to other biological features, i.e. how biologically meaningful a high or a low value of SS is; (ii) the reliability of the SS measures themselves, i.e. whether there is any systematic error or bias in the calculation of SS. Both problems are relevant for the diffusion of SS measures; while for the first several approaches have been proposed, comparing SS measures with a plethora of different biological features, only a few works have dealt with the second in a systematic way [5,6,7].
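The first question can be illustrated by correlating SS scores with another feature measured on the same protein pairs. The sketch below uses Pearson correlation as an assumed, simple choice of statistic; the text does not prescribe one, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Applied to a list of SS scores and, for example, a list of sequence similarity scores for the same pairs, a value near 1 would indicate that the two features rise together. The same function could be pointed at a potential source of bias, such as the number of annotations per protein, to probe the second question.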

References

[1] Mario Cannataro, Pietro Hiram Guzzi and Pierangelo Veltri. Protein-to-protein interactions: Technologies, databases, and algorithms. ACM Comput. Surv. 43(1):1, 2010.
[2] Francisco Azuaje, Haiying Wang and Olivier Bodenreider. Ontology-driven similarity approaches to supporting gene functional assessment. Proc. of The Eighth Annual Bio-Ontologies Meeting, pp. 9–10, 2005.
[3] M. A. Harris, J. Clark, A. Ireland, J. Lomax, M. Ashburner, R. Foulger, K. Eilbeck, S. Lewis, B. Marshall, C. Mungall, J. Richter, G. M. Rubin, J. A. Blake, C. Bult, M. Dolan, H. Drabkin, J. T. Eppig, D. P. Hill, L. Ni, M. Ringwald, R. Balakrishnan, J. M. Cherry, K. R. Christie, M. C. Costanzo, S. S. Dwight, S. Engel, D. G. Fisk, J. E. Hirschman, E. L. Hong, R. S. Nash, A. Sethuraman, C. L. Theesfeld, D. Botstein, K. Dolinski, B. Feierbach, T. Berardini, S. Mundodi, S. Y. Rhee, R. Apweiler, D. Barrell, E. Camon, E. Dimmer, V. Lee, R. Chisholm, P. Gaudet, W. Kibbe, R. Kishore, E. M. Schwarz, P. Sternberg, M. Gwinn, L. Hannick, J. Wortman, M. Berriman, V. Wood, P. Tonellato, P. Jaiswal, T. Seigfried and R. White. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 32 (Database Issue):258–261, 2004.
[4] Louis du Plessis, Nives Škunca and Christophe Dessimoz. The what, where, how and why of gene ontology: a primer for bioinformaticians. Briefings in Bioinformatics, 2011.
[5] Pietro Hiram Guzzi, Marco Mina, Concettina Guerra and Mario Cannataro. Semantic Similarity Measures: Assessment with biological features and Issues. Briefings in Bioinformatics, 10.1093/bib/BBR066, 2012.
[6] Da Wei Huang, Brad T. Sherman and Richard A. Lempicki. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research, 37(1):1–13, 2009.
[7] Young-Rae Cho, Woochang Hwang, Murali Ramanathan and Aidong Zhang. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinformatics, 8:265, 2007.

Published in

ACM SIGBioinformatics Record Volume 2, Issue 3
September 2012
20 pages
ISSN:2331-9291
EISSN:2159-1210
DOI:10.1145/2384691

Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States