2019 | OriginalPaper | Chapter

Significance of Global Vectors Representation in Protein Sequences Analysis

Authors: Anon George, H. B. Barathi Ganesh, M. Anand Kumar, K. P. Soman

Published in: Computer Aided Intervention and Diagnostics in Clinical and Medical Images

Publisher: Springer International Publishing

Abstract

Understanding the meaning of protein sequences is tedious with human effort alone. In this work, we experiment with an NLP technique to extract features and build an appropriate representation for protein sequences, namely the GloVe (Global Vectors) representation. We use the Swiss-Prot dataset. The resulting representation captures the semantics of protein sequences about as well as existing representations. We evaluated the representation by classifying the protein families in the Swiss-Prot dataset with a machine learning technique, and the analysis demonstrates the significance of this representation.
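
As a rough illustration of the input GloVe works from, the sketch below builds distance-weighted co-occurrence counts over protein "words". The abstract does not say how sequences are tokenized, so the overlapping 3-mer split, the window size of 2, the toy sequences, and the helper names `kmers` and `cooccurrence_counts` are all assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def cooccurrence_counts(sequences, k=3, window=2):
    """Count symmetric co-occurrences between k-mers within a context window.

    A pair of k-mers that are d positions apart contributes 1/d,
    the distance weighting described in the GloVe paper.
    """
    counts = Counter()
    for seq in sequences:
        words = kmers(seq, k)
        for i, center in enumerate(words):
            for d in range(1, window + 1):
                if i + d >= len(words):
                    break
                context = words[i + d]
                counts[(center, context)] += 1.0 / d
                counts[(context, center)] += 1.0 / d
    return counts


# Toy sequences, invented for illustration (not Swiss-Prot entries).
toy_sequences = ["MKVLAA", "MKVLGA"]
counts = cooccurrence_counts(toy_sequences)
```

GloVe would then fit one vector per k-mer so that dot products of vector pairs approximate the logarithm of these counts. A sequence-level feature vector for a family classifier could be obtained, for example, by averaging the vectors of a sequence's k-mers, which is again only one possible design choice.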

Metadata
Title
Significance of Global Vectors Representation in Protein Sequences Analysis
Authors
Anon George
H. B. Barathi Ganesh
M. Anand Kumar
K. P. Soman
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-04061-1_27