
2019 | OriginalPaper | Book chapter

Protein Remote Homology Detection Based on Profiles

Written by: Qing Liao, Mingyue Guo, Bin Liu

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

As one of the most important tasks in protein sequence analysis, protein remote homology detection has been studied extensively for decades. Currently, profile-based methods achieve state-of-the-art performance. The Position-Specific Frequency Matrix (PSFM) is a widely used profile because it contains evolutionary information, which is critical for protein sequence analysis. However, the profiles contain noise introduced by amino acids with low frequencies, which are unlikely to occur at the corresponding sequence positions during the evolutionary process. In this study, we propose a method that removes this noise from the PSFM by removing the low-frequency amino acids, yielding a new profile called the Top Frequency Profile (TFP). The auto-cross covariance (ACC) transformation is then applied to the profile to convert it into a fixed-length feature vector. The predictor is constructed by combining these feature vectors with Support Vector Machines (SVMs). Evaluated on a benchmark dataset, the proposed method outperforms other state-of-the-art predictors for protein remote homology detection, indicating that it is a useful tool for protein sequence analysis. Because profiles generated from multiple sequence alignments are important for protein structure and function prediction, the TFP will have many potential applications.
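The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how low-frequency amino acids are selected, so the per-position top-n cutoff in the hypothetical `top_frequency_profile` function is an assumption, as are the names `acc_transform`, `top_n`, and `max_lag`. The ACC step follows the usual definition (mean-centered products of two profile columns at a given lag, averaged over positions), which yields a vector whose length depends only on the number of columns and the maximum lag, not on the sequence length.

```python
from typing import List

# A profile has one row per sequence position and one column per amino acid.
Profile = List[List[float]]


def top_frequency_profile(psfm: Profile, top_n: int) -> Profile:
    """Keep the top_n largest frequencies at each position and zero the rest.

    Assumption: the abstract gives no cutoff rule, so a per-position
    top-n filter is used here purely for illustration.
    """
    filtered = []
    for row in psfm:
        # Column indices of the top_n highest frequencies in this row.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        keep = set(ranked[:top_n])
        filtered.append([v if k in keep else 0.0 for k, v in enumerate(row)])
    return filtered


def acc_transform(profile: Profile, max_lag: int) -> List[float]:
    """Auto-cross covariance features of length n_cols * n_cols * max_lag."""
    length = len(profile)
    n_cols = len(profile[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and shorter than the profile")
    # Mean of each column over all sequence positions.
    means = [sum(row[i] for row in profile) / length for i in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for i1 in range(n_cols):
            for i2 in range(n_cols):
                # i1 == i2 gives the auto covariance, i1 != i2 the cross covariance.
                total = sum(
                    (profile[j][i1] - means[i1]) * (profile[j + lag][i2] - means[i2])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


# Toy 3-column profile; a real PSFM would have 20 columns.
psfm = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
    [0.2, 0.1, 0.7],
]
tfp = top_frequency_profile(psfm, top_n=2)
features = acc_transform(tfp, max_lag=2)
```

With a 20-column profile the feature vector has 400 × `max_lag` entries for any sequence longer than `max_lag`, which is what allows sequences of different lengths to be passed to a standard SVM implementation.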

Metadata
Title
Protein Remote Homology Detection Based on Profiles
Written by
Qing Liao
Mingyue Guo
Bin Liu
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-17938-0_24