Advances in protein structure prediction and de novo protein design: A review

https://doi.org/10.1016/j.ces.2005.04.009

Abstract

This review provides an exposition of two important problems: (i) structure prediction in protein folding and (ii) de novo protein design. Recent advances in protein folding are reviewed based on a classification of the approaches into comparative modeling, fold recognition, and first principles methods with and without database information. Advances towards the challenging problem of loop structure prediction, the first principles method ASTRO-FOLD, and developments in force fields are discussed. Finally, recent progress in de novo protein design is presented with a focus on template flexibility, in silico sequence selection, and successful peptide and protein designs.

Introduction

Proteins are linear chains of amino acids that adopt a unique three-dimensional structure in their native surroundings. It is this native structure that allows the protein to carry out its biochemical function. Levinthal's paradox (Levinthal, 1969; Zwanzig et al., 1992) raised the question of why and how a sequence of amino acids can fold into its functional native structure given the abundance of geometrically possible structures.

The pioneering experiments of Anfinsen (1973) shed light on this problem. According to Anfinsen's thermodynamic hypothesis, proteins are not assembled into their native structures by a biological process; rather, folding is a purely physical process that depends only on the specific amino acid sequence of the protein and the surrounding solvent. Anfinsen's hypothesis implies that, in principle, protein structure can be predicted if a model of the free energy is available and if the global minimum of this function can be identified. This idea makes the protein structure prediction problem well defined, as it allows the macroscopic structure of many proteins to be inferred from a few types of microscopic interactions between the protein's constituents. On the other hand, protein structure prediction remains highly complex, since even short amino acid sequences can form a vast number of geometric structures among which the free energy minimum has to be identified.
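The two points above, a vast conformational space and a global free energy minimum hidden within it, can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the three discrete states per residue, the made-up `toy_energy` function, and the helper names are hypothetical and are not taken from any method discussed in this review.

```python
import itertools

# Hypothetical toy model: each residue takes one of K discrete backbone states.
K = 3


def toy_energy(conf):
    """Made-up energy, not a physical force field.

    State 1 is locally favoured, but adjacent residues that are both in
    state 2 stabilise each other, so the global minimum is cooperative.
    """
    local = sum((state - 1) ** 2 for state in conf)
    coupling = sum(-1.5 for a, b in zip(conf, conf[1:]) if a == b == 2)
    return local + coupling


def n_conformations(n_residues, k=K):
    """Number of conformations of a chain with k states per residue."""
    return k ** n_residues


def global_minimum(n_residues):
    """Exhaustively enumerate all K**n conformations and return the best one."""
    all_confs = itertools.product(range(K), repeat=n_residues)
    best = min(all_confs, key=toy_energy)
    return best, toy_energy(best)


if __name__ == "__main__":
    best, energy = global_minimum(6)
    print("global minimum for 6 residues:", best, energy)
    print("all-1 conformation (a local minimum):", toy_energy((1,) * 6))
    print("conformations for 100 residues:", n_conformations(100))
```

Exhaustive enumeration is only feasible here because the chain is tiny. The count grows as K to the power of the chain length, which is why practical first principles methods rely on global optimization or sampling strategies rather than enumeration.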

A protein is composed of several levels of structure. The primary structure of a protein is described by the specific amino acid sequence. Additionally, patterns of local bonding can be identified as secondary structure. The two most common types of secondary structure are α-helices and β-sheets. Connecting these elements of secondary structure are loop regions. The tertiary structure is then the final three-dimensional structure of these elements after the protein folds into its native state. Fig. 1 illustrates an example protein structure.

The protein structure prediction problem is a fundamental problem treated across disciplines. From a chemical engineering point of view, the structure prediction problem is of great interest, because it is a prerequisite for successfully tackling de novo protein design. In de novo protein design the ultimate objective is to identify amino acid sequences that fold into proteins with desired functions. De novo protein design can be looked upon as a product design problem on the molecular scale.

Many first principles approaches to computational protein structure prediction, all based on Anfinsen's thermodynamic hypothesis, have been developed over the last decade. Section 2 attempts to give an overview of recent developments. Computational structure prediction based on first principles is, however, not the only way to determine protein structure. The number of protein structures that have been determined experimentally continues to grow rapidly. At the end of 2004, the number of structures freely available from the Protein Data Bank (Berman et al., 2000) was approaching 28,000. The availability of experimental data on protein structures has inspired the development of methods for computational structure prediction that are knowledge-based rather than physics-based. In contrast to methods that attempt to minimize the free energy and derive the structure from first principles, these knowledge-based approaches search databases of known structures to infer information about an amino acid sequence of unknown three-dimensional structure. While such database methods have been criticized for not helping to obtain a fundamental understanding of the mechanisms that drive structure formation, they can often successfully predict unknown three-dimensional structures.

Progress for all variants of computational protein structure prediction methods is assessed in the biennial, community-wide Critical Assessment of Protein Structure Prediction (CASP) experiments (Moult et al., 1997; Moult, 1999; Moult et al., 2001; Moult et al., 2003). In the CASP experiments, research groups are invited to apply their prediction methods to amino acid sequences for which the native structure is not yet known but is about to be determined and published. Even though the number of amino acid sequences provided by the CASP experiments is small, these competitions provide a good measure to benchmark methods and progress in the field in an arguably unbiased manner (Murzin, 2004). The overview of computational protein structure prediction methods given in this review will draw on results from recent CASP experiments.

Research on protein structure prediction methods as witnessed in the biennial CASP experiments has been motivated to a large extent by scientific curiosity. Protein structure prediction is, however, interesting not only from a scientific but also from an engineering point of view. It constitutes a major part of the de novo protein design problem, also called the inverse protein folding problem (Pabo, 1983; Drexler, 1981), which requires the determination of an amino acid sequence compatible with a given three-dimensional structure. The de novo protein design problem is the “inverse” of the protein folding problem because it starts with the structure rather than the sequence and looks for all sequences that will fold into that structure. Experimentalists have tackled this problem with mutagenesis, rational design, and directed evolution. These methods are, however, restricted with respect to the number of mutant structures that can be screened experimentally, which is typically in the range of 10³ to 10⁶ sequences (Voigt et al., 2001). Computational protein design methods, in contrast, allow for the screening of overwhelmingly large parts of sequence space. Toward this end, the paper summarizes recent progress in the field of de novo protein design.
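To put the experimental screening limit into perspective, the following sketch compares a library of a thousand to a million sequences with the size of sequence space. It assumes the 20 standard amino acids and full randomization at every designed position; the function names are hypothetical and serve only this illustration.

```python
AMINO_ACIDS = 20  # assumption: the 20 standard amino acids at each position


def sequence_space_size(n_positions, alphabet=AMINO_ACIDS):
    """Number of distinct sequences when n_positions are fully randomized."""
    return alphabet ** n_positions


def screened_fraction(n_positions, library_size, alphabet=AMINO_ACIDS):
    """Fraction of sequence space covered by a library of the given size."""
    return library_size / sequence_space_size(n_positions, alphabet)


def max_exhaustive_positions(library_size, alphabet=AMINO_ACIDS):
    """Largest number of positions whose full sequence space fits in the library."""
    n = 0
    while alphabet ** (n + 1) <= library_size:
        n += 1
    return n


if __name__ == "__main__":
    for library_size in (10 ** 3, 10 ** 6):
        print(library_size, "sequences cover",
              max_exhaustive_positions(library_size), "positions exhaustively")
    print("fraction of a 10-position space covered by 10**6 sequences:",
          screened_fraction(10, 10 ** 6))
```

Under these assumptions, even the largest library in the quoted range exhausts only a handful of fully randomized positions, which is the gap that computational sequence selection aims to close.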

The review is organized as follows. Section 2 provides an overview of methods for protein structure prediction and summarizes recent developments in all categories of approaches to the problem. Section 3 is devoted to methods for loop structure prediction. Section 4 outlines the first principles protein structure prediction method, ASTRO-FOLD. Section 5 discusses advances in force field development as they pertain to fold recognition and de novo protein design. Section 6 focuses on recent progress in de novo protein design.

Section snippets

Protein structure prediction

Numerous approaches to protein structure prediction exist. Methods for structure prediction can be divided into four groups: (1) comparative modeling; (2) fold recognition; (3) first principles methods with database information; and (4) first principles methods without database information. As prediction methods have become more sophisticated, the boundaries between these categories have blurred, and today methods exist that cannot clearly be assigned to any of the four categories.

Loop structure prediction

Ab initio methods have recently received increased attention in the prediction of loops, that is, those structures that join β-strands and helices in proteins. Loops exhibit greater structural variability than strands and helices, since they are often exposed at the surface of a protein and have relatively few contacts with the remainder of the structure. Loop structure therefore is considerably more difficult to predict than the structure of the geometrically highly regular strands and

ASTRO-FOLD protein structure prediction approach

One successful prediction method is the first principles ASTRO-FOLD protein folding approach developed by Floudas and coworkers (Klepeis and Floudas, 2003c). The main thrusts of this approach are (1) α-helical prediction through detailed free energy calculations (Klepeis and Floudas, 2002), (2) a mixed-integer linear optimization formulation for the β-sheet prediction (Klepeis and Floudas, 2003a, Floudas, 1995), (3) derivation of secondary structure restraints and loop modeling, and (4) the

Force fields

Protein structure prediction is one of the most important and difficult problems in computational structural biology. As discussed earlier, different approaches have been developed to address this problem. Various components of the protein folding problem (e.g., fold recognition, ab initio prediction, comparative modelling and de novo design) make use of a force field. In the process of structure prediction, sometimes it is required to select the native structure of a protein from a pool of

De novo protein design

There have been considerable successes in the development of computational algorithms for protein design during the past decade. At the turn of the 1990s, some de novo protein design efforts turned out to be futile as either the target fold was not achieved (Betz et al., 1993) or the engineered protein had a different quaternary structure than expected (Lovejoy et al., 1993). These failures were thought to be caused by the relatively qualitative hierarchic approach adopted on protein design at

Acknowledgements

CAF gratefully acknowledges financial support from the National Science Foundation and the National Institutes of Health (R01 GM52032; R24 GM069736). MM gratefully acknowledges funding by a Deutsche Forschungsgemeinschaft research fellowship (MO 1086/1-1).

References (234)

  • G. Dantas et al.

    A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins

    Journal of Molecular Biology

    (2003)
  • J.R. Desjarlais et al.

    Side-chain and backbone flexibility in protein core design

    Journal of Molecular Biology

    (1999)
  • E.G. Emberly et al.

    Flexibility of α-helices: results of a statistical analysis of database protein structures

    Journal of Molecular Biology

    (2003)
  • V.A. Eyrich et al.

    Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set

    Journal of Molecular Biology

    (1999)
  • G. Ghirlanda et al.

    From synthetic coiled coils to functional proteins: automated design of a receptor for the calmodulin-binding domain of calcineurin

    Journal of Molecular Biology

    (1998)
  • B. Gillespie et al.

    NMR and temperature-jump measurements of de novo designed proteins demonstrate rapid folding in the absence of explicit selection for kinetics

    Journal of Molecular Biology

    (2003)
  • R.F. Goldstein

    Efficient rotamer elimination applied to protein side-chains and related spin glasses

    Biophysical Journal

    (1994)
  • D.B. Gordon et al.

    Energy functions for protein design

    Current Opinion in Structural Biology

    (1999)
  • P. Güntert et al.

    Torsion angle dynamics for NMR structure calculation with the new program DYANA

    Journal of Molecular Biology

    (1997)
  • M. Hao et al.

    Designing potential energy functions for protein folding

    Current Opinion in Structural Biology

    (1999)
  • M. Hendlich et al.

    Identification of native protein folds amongst a large number of incorrect models

    Journal of Molecular Biology

    (1990)
  • B. Honig et al.

    Free energy balance in protein folding

    Advances in Protein Chemistry

    (1995)
  • R.L. Jernigan et al.

    Structure-derived potentials and protein simulations

    Current Opinion in Structural Biology

    (1996)
  • D.T. Jones

    GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences

    Journal of Molecular Biology

    (1999)
  • D.T. Jones

    Protein secondary structure prediction based on position specific scoring matrices

    Journal of Molecular Biology

    (1999)
  • D. Kihara et al.

    The PDB is a covering set of small protein structures

    Journal of Molecular Biology

    (2003)
  • J.L. Klepeis et al.

    ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence

    Biophysical Journal

    (2003)
  • J.L. Klepeis et al.

    A new class of hybrid global optimization algorithms for peptide structure prediction. Integrated hybrids

    Computer Physics Communications

    (2003)
  • J.L. Klepeis et al.

    Hybrid global optimization algorithms for protein structure prediction: alternating hybrids

    Biophysical Journal

    (2003)
  • P. Koehl et al.

    De novo protein design. I. In search of stability and specificity

    Journal of Molecular Biology

    (1999)
  • P. Koehl et al.

    De novo protein design. II. Plasticity in sequence space

    Journal of Molecular Biology

    (1999)
  • M. Allert et al.

    Computational design of receptors for an organophosphate surrogate of the nerve agent soman

    Proceedings of the National Academy of Sciences of the United States of America

    (2004)
  • P. Aloy et al.

    Predictions without templates: new folds, secondary structure, and contacts in CASP5

    Proteins: Structure, Function, and Bioinformatics

    (2003)
  • S.F. Altschul et al.

    Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

    Nucleic Acids Research

    (1997)
  • Y. An et al.

    A novel fold recognition method using composite predicted secondary structures

    Proteins: Structure, Function, and Bioinformatics

    (2002)
  • I.P. Androulakis et al.

    αBB: a global optimization method for general constrained nonconvex problems

    Journal of Global Optimization

    (1995)
  • C.B. Anfinsen

    Principles that govern the folding of protein chains

    Science

    (1973)
  • D.E. Benson et al.

    Rational design of nascent metalloenzymes

    Proceedings of the National Academy of Sciences of the United States of America

    (2000)
  • Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000....
  • D.P. Bertsekas

    Dynamic Programming and Optimal Control—I

    (1995)
  • D.N. Bolon et al.

    Enzyme-like proteins by computational design

    Proceedings of the National Academy of Sciences of the United States of America

    (2001)
  • J.U. Bowie et al.

    A method to identify protein sequences that fold into a known three-dimensional structure

    Science

    (1991)
  • P. Bradley et al.

    Rosetta predictions in CASP5: successes, failures, and prospects for complete automation

    Proteins: Structure, Function, and Bioinformatics

    (2003)
  • S.H. Bryant et al.

    The frequency of ion-pair substructures in proteins is quantitatively related to electrostatic potential. A statistical model for nonbonded interactions

    Proteins: Structure, Function, and Bioinformatics

    (1991)
  • J.W. Bryson et al.

    From coiled coils to small globular proteins: design of a native-like three-helix bundle

    Protein Science

    (1998)
  • C. Chothia

    One thousand families for the molecular biologist

    Nature

    (1992)
  • C. Clementi et al.

    Folding Lennard–Jones proteins by a contact potential

    Proteins: Structure, Function, and Bioinformatics

    (1999)
  • W.D. Cornell et al.

    A second generation force field for the simulation of proteins, nucleic acids, and organic molecules

    Journal of the American Chemical Society

    (1995)
  • J.A. Cuff et al.

    JPred: a consensus secondary structure prediction server

    Bioinformatics

    (1998)
  • C. Czaplewski et al.

    Prediction of the structures of proteins with the UNRES force field, including dynamic formation and breaking of disulfide bonds

    Protein Engineering Design and Selection

    (2004)