Abstract
Cellular phenotypes result from the combined effect of multiple genes, and high-throughput techniques such as DNA microarrays and deep sequencing make it possible to monitor this genomic complexity. The scale of the resulting data, however, makes results hard to interpret, as primary analysis often yields hundreds of genes. Gene Ontology (GO), a controlled vocabulary for describing gene products, enables semantic analysis of such gene sets. GO can be used to define semantic similarity between genes, which in turn allows genes to be clustered semantically to reduce the complexity of a result set. Here, we describe how to compute semantic similarities and perform GO-based gene clustering using csbl.go, an R package for GO semantic similarity. We demonstrate the approach with expression profiles from breast cancer.
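To make the idea of GO-based semantic similarity concrete, the following is a minimal sketch in Python, not the csbl.go implementation or its API. It assumes a tiny hypothetical ontology and hypothetical annotation counts, and it uses the Resnik measure (the information content of the most informative common ancestor) with a maximum over term pairs as one possible way to lift term similarity to gene similarity. All term names, counts, and function names are illustrative assumptions.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (a small DAG).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "signaling": 4,
    "lipid_metabolism": 1,
    "glucose_metabolism": 1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """Probability of a term: annotations to it or its descendants, over all annotations."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for anc in ancestors(term):
            cumulative[anc] += count
    return {term: c / total for term, c in cumulative.items()}


def resnik(term_a, term_b, prob):
    """Resnik similarity: information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(-math.log(prob[t]) for t in common)


def gene_similarity(terms_a, terms_b, prob):
    """Gene-level similarity as the maximum over all term pairs (one possible choice)."""
    return max(resnik(a, b, prob) for a in terms_a for b in terms_b)


prob = term_probabilities()
# Two genes annotated to sibling metabolic terms share the ancestor "metabolism".
sim_related = gene_similarity(["lipid_metabolism"], ["glucose_metabolism"], prob)
# A metabolic gene and a signaling gene share only the uninformative root.
sim_unrelated = gene_similarity(["lipid_metabolism"], ["signaling"], prob)
print(sim_related, sim_unrelated)
```

Pairwise gene similarities computed this way can be converted to distances and passed to a standard hierarchical clustering routine to group a gene list by function.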
Acknowledgements
I thank Tiia Pelkonen for proofreading.
Copyright information
© 2015 Springer Science+Business Media New York
About this protocol
Cite this protocol
Ovaska, K. (2015). Using Semantic Similarities and csbl.go for Analyzing Microarray Data. In: Guzzi, P. (eds) Microarray Data Analysis. Methods in Molecular Biology, vol 1375. Humana Press, New York, NY. https://doi.org/10.1007/7651_2015_241
DOI: https://doi.org/10.1007/7651_2015_241
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3172-9
Online ISBN: 978-1-4939-3173-6
eBook Packages: Springer Protocols