Published in: Data Mining and Knowledge Discovery 3/2005

1 November 2005

Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

Authors: Jianlin Cheng, Michael J. Sweredoski, Pierre Baldi

Abstract

Intrinsically disordered regions in proteins are relatively frequent and important for our understanding of molecular recognition and assembly, and of protein structure and function. From an algorithmic standpoint, flagging large disordered regions is also important for ab initio protein structure prediction methods. Here we first extract a curated, non-redundant data set of protein disordered regions from the Protein Data Bank and compute relevant statistics on the length and location of these regions. We then develop an ab initio predictor of disordered regions called DISpro, which uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive neural networks. DISpro is trained and cross-validated using the curated data set. The experimental results show that DISpro achieves an accuracy of 92.8% with a false positive rate of 5%. DISpro is a member of the SCRATCH suite of protein data mining tools available through http://www.igb.uci.edu/servers/psss.html.
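The two figures quoted in the abstract, accuracy and false positive rate, can be illustrated with a minimal sketch. It assumes the usual per-residue definitions for a binary labelling (1 = disordered, 0 = ordered); the abstract does not state how the authors computed them. The function names and the toy labels below are hypothetical and are not taken from the paper or from DISpro.

```python
from typing import Sequence, Tuple


def confusion_counts(true_labels: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for per-residue labels (1 = disordered, 0 = ordered)."""
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted):
        if t == 1 and p == 1:
            tp += 1          # disordered residue predicted as disordered
        elif t == 0 and p == 1:
            fp += 1          # ordered residue wrongly flagged as disordered
        elif t == 0 and p == 0:
            tn += 1          # ordered residue predicted as ordered
        else:
            fn += 1          # disordered residue missed
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of residues whose label is predicted correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of truly ordered residues that are predicted as disordered."""
    return fp / (fp + tn)


# Toy example (made-up labels for a 10-residue chain, for illustration only).
true_labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(true_labels, predicted)
acc = accuracy(tp, fp, tn, fn)
fpr = false_positive_rate(fp, tn)
```

Because ordered residues usually far outnumber disordered ones, accuracy alone can look high even for a weak predictor, which is one reason to read it together with the false positive rate.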

Metadata
Title: Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data
Authors: Jianlin Cheng, Michael J. Sweredoski, Pierre Baldi
Publication date: 1 November 2005
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery, Issue 3/2005
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-005-0001-y