
2019 | OriginalPaper | Chapter

Protein Remote Homology Detection Based on Profiles

Authors : Qing Liao, Mingyue Guo, Bin Liu

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

As one of the most important tasks in protein sequence analysis, protein remote homology detection has been extensively studied for decades. Currently, profile-based methods show state-of-the-art performance. The Position-Specific Frequency Matrix (PSFM) is a widely used profile because it contains evolutionary information, which is critical for protein sequence analysis. However, the profile also contains noise introduced by amino acids with low frequencies, which are unlikely to occur at the corresponding sequence positions during evolution. In this study, we propose a method that removes this noise from the PSFM by discarding the low-frequency amino acids, producing a new profile called the Top frequency profile (TFP). An auto-cross covariance (ACC) transformation is then applied to the profile to convert it into a fixed-length feature vector, and the predictor is constructed by combining these vectors with Support Vector Machines (SVMs). Evaluated on a benchmark dataset, the proposed method outperforms other state-of-the-art predictors for protein remote homology detection, indicating that it is a useful tool for protein sequence analysis. Because profiles generated from multiple sequence alignments are important for protein structure and function prediction, the TFP will have many potential applications.
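The TFP and ACC feature-extraction steps described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the PSFM is taken to be an L x 20 list of per-position amino acid frequencies (for example, parsed from PSI-BLAST output), the noise-removal step is modelled as keeping a hypothetical `top_k` highest-frequency amino acids per position, and the ACC step follows the common auto/cross covariance formulation in which, for each lag g, the product of the mean-centred values of column a at position j and column b at position j + g is averaged over the L - g valid positions. The function names are illustrative only.

```python
from typing import List

# A profile is an L x 20 matrix: one row per sequence position, one column per amino acid.
Profile = List[List[float]]


def top_frequency_profile(psfm: Profile, top_k: int = 5) -> Profile:
    """Keep the top_k most frequent amino acids at each position and zero the rest.

    ASSUMPTION: top_k is a hypothetical cut-off; the chapter's exact rule for
    discarding low-frequency amino acids may differ.
    """
    tfp: Profile = []
    for row in psfm:
        # Column indices ranked by frequency, highest first (ties keep column order).
        ranked = sorted(range(len(row)), key=lambda a: row[a], reverse=True)
        keep = set(ranked[:top_k])
        tfp.append([f if a in keep else 0.0 for a, f in enumerate(row)])
    return tfp


def acc_features(profile: Profile, max_lag: int) -> List[float]:
    """Auto-cross covariance of every column pair at lags 1..max_lag.

    Pairs with a == b are auto-covariances and pairs with a != b are
    cross-covariances, giving max_lag * n * n values (400 * max_lag for a
    20-column profile) regardless of the sequence length L.
    """
    length = len(profile)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and the sequence length - 1")
    n = len(profile[0])
    # Mean of each amino acid column over all positions.
    means = [sum(row[a] for row in profile) / length for a in range(n)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for a in range(n):
            for b in range(n):
                total = sum(
                    (profile[j][a] - means[a]) * (profile[j + lag][b] - means[b])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


if __name__ == "__main__":
    # Hypothetical 6-position uniform profile, used only to show the vector length.
    toy_psfm = [[1.0 / 20] * 20 for _ in range(6)]
    vector = acc_features(top_frequency_profile(toy_psfm, top_k=3), max_lag=2)
    print(len(vector))  # 800 = 2 lags * 20 * 20
```

Because the ACC output length depends only on `max_lag` and the number of columns, proteins of different lengths map to vectors of the same size, which is what allows them to be fed to an SVM (for example, scikit-learn's `SVC`). The kernel, the SVM parameters, and the value of `max_lag` are left open here because the abstract does not specify them.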

Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-17938-0_24
