Advances in protein structure prediction and de novo protein design: A review
Introduction
Proteins are linear chains of amino acids that adopt a unique three-dimensional structure in their native surroundings. It is this native structure that allows the protein to carry out its biochemical function. Levinthal's paradox (Levinthal, 1969; Zwanzig et al., 1992) raised the question of why and how a sequence of amino acids can fold into its functional native structure, given the vast number of geometrically possible structures.
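The scale of the paradox can be illustrated with a back-of-the-envelope calculation. The sketch below is only an illustration: the chain length, the number of conformational states per residue, and the sampling rate are assumed values chosen for the example, not figures taken from this review.

```python
def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to enumerate them all).

    All three parameters are illustrative assumptions.
    """
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return n_conformations, years


n_conf, years = levinthal_estimate(100)
print(f"conformations: {n_conf:.2e}")
print(f"years for exhaustive enumeration: {years:.2e}")
```

Under these assumptions the enumeration time exceeds any biologically meaningful time scale by many orders of magnitude, which is the core of the paradox: folding cannot proceed by random exhaustive search.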
The pioneering experiments of Anfinsen (1973) shed light on this problem. According to Anfinsen's thermodynamic hypothesis, proteins are not assembled into their native structures by a biological process; rather, folding is a purely physical process that depends only on the specific amino acid sequence of the protein and the surrounding solvent. Anfinsen's hypothesis implies that, in principle, protein structure can be predicted if a model of the free energy is available and if the global minimum of this function can be identified. This idea makes the protein structure prediction problem well defined, as it allows the macroscopic structure of many proteins to be inferred from a few types of microscopic interactions between the protein's constituents. On the other hand, protein structure prediction remains extremely difficult, since even short amino acid sequences can form a vast number of geometric structures, among which the free energy minimum has to be identified.
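The idea of structure prediction as a global minimization can be sketched on a toy problem. The example below is a minimal illustration under strong assumptions: `toy_energy` is a hypothetical one-dimensional function of a single angle, not a real free energy model, and the brute-force grid scan stands in for the far more sophisticated global optimization methods used in practice.

```python
import math


def toy_energy(phi):
    """Hypothetical rugged 1-D energy landscape in arbitrary units."""
    return math.cos(3.0 * phi) + 0.5 * math.cos(phi + 1.0)


def grid_global_minimum(energy, lo, hi, n_points=3600):
    """Scan a uniform grid on [lo, hi] and return the lowest-energy point."""
    best_x, best_e = lo, energy(lo)
    for i in range(1, n_points + 1):
        x = lo + (hi - lo) * i / n_points
        e = energy(x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


phi_min, e_min = grid_global_minimum(toy_energy, -math.pi, math.pi)
print(f"lowest energy on the grid: {e_min:.3f} at phi = {phi_min:.3f} rad")
```

The landscape has several local minima, so a local search started at the wrong point would miss the lowest one. A grid with n points per angle needs n**N evaluations for N angles, which is why exhaustive scanning does not extend to real proteins.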
A protein is composed of several levels of structure. The primary structure of a protein is described by the specific amino acid sequence. Additionally, patterns of local bonding can be identified as secondary structure. The two most common types of secondary structure are α-helices and β-sheets. Connecting these elements of secondary structure are loop regions. The tertiary structure is then the final three-dimensional structure of these elements after the protein folds into its native state. Fig. 1 illustrates an example protein structure.
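One simple way to hold the first two levels of structure in code is as a pair of aligned strings. The sketch below assumes a three-letter labeling (H for helix, E for strand, C for loop), which is a common convention but is not specified in this review; the sequence and labels are made up for illustration.

```python
from itertools import groupby

# Hypothetical example data: one label per residue.
sequence = "MKTAYIAKQR"   # primary structure (one-letter amino acid codes)
secondary = "CHHHHHCEEE"  # secondary structure: H = helix, E = strand, C = loop


def secondary_segments(ss_string):
    """Split a per-residue label string into (label, start, end) runs.

    `start` is inclusive and `end` is exclusive, both 0-based.
    """
    segments = []
    start = 0
    for label, run in groupby(ss_string):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


for label, start, end in secondary_segments(secondary):
    print(label, sequence[start:end])
```

The tertiary structure would additionally require three-dimensional coordinates for each atom or residue, which this representation deliberately leaves out.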
The protein structure prediction problem is a fundamental problem treated across disciplines. From a chemical engineering point of view, the structure prediction problem is of great interest, because it is a prerequisite for successfully tackling de novo protein design. In de novo protein design the ultimate objective is to identify amino acid sequences that fold into proteins with desired functions. De novo protein design can be looked upon as a product design problem on the molecular scale.
Many approaches to computational protein structure prediction using first principles have been developed over the last decade that are based on Anfinsen's thermodynamic hypothesis. Section 2 attempts to give an overview of recent developments. Computational structure prediction based on first principles is, however, not the only way to determine protein structure. The number of protein structures that have been determined experimentally continues to grow rapidly. At the end of 2004, the number of structures freely available from the Protein Data Bank (Berman et al., 2000) is approaching 28,000. The availability of experimental data on protein structures has inspired the development of methods for computational structure prediction that are knowledge-based rather than physics-based. In contrast to methods that attempt to minimize the free energy and derive the structure from first principles, these knowledge-based approaches search databases of known structures to infer information about an amino acid sequence of unknown three-dimensional structure. While such database methods have been criticized for not contributing to a fundamental understanding of the mechanisms that drive structure formation, they can often successfully predict unknown three-dimensional structures.
Progress for all variants of computational protein structure prediction methods is assessed in the biennial, community-wide Critical Assessment of Protein Structure Prediction (CASP) experiments (Moult et al., 2003; Moult et al., 2001; Moult et al., 1997; Moult, 1999). In the CASP experiments, research groups are invited to apply their prediction methods to amino acid sequences whose native structures are not yet known but are about to be determined and published. Even though the number of amino acid sequences provided by the CASP experiments is small, these competitions provide a good way to benchmark methods and progress in the field in an arguably unbiased manner (Murzin, 2004). The overview of computational protein structure prediction methods given in this review will draw on results from recent CASP experiments.
Research on protein structure prediction methods, as witnessed in the biennial CASP experiments, has been motivated to a large extent by scientific curiosity. Protein structure prediction is, however, interesting not only from a scientific but also from an engineering point of view. It constitutes a major part of the de novo protein design problem, also called the inverse protein folding problem (Pabo, 1983; Drexler, 1981), which requires the determination of an amino acid sequence compatible with a given three-dimensional structure. The de novo protein design problem is the “inverse” of the protein folding problem because it starts with the structure rather than the sequence and looks for all sequences that will fold into that structure. Experimentalists have tackled this problem with mutagenesis, rational design, and directed evolution. These methods are, however, restricted with respect to the number of mutant structures that can be screened experimentally, which is typically in the range of – sequences (Voigt et al., 2001). Computational protein design methods, in contrast, allow for the screening of overwhelmingly large parts of sequence space. To this end, the paper summarizes recent progress in the field of de novo protein design.
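The gap between experimental screening and the size of sequence space can be made concrete with a short calculation. The sketch below assumes the 20 standard amino acids at every design position; the library size used in the example is an arbitrary illustrative value and is not the screening range reported by Voigt et al. (2001).

```python
def sequence_space_size(n_positions, alphabet_size=20):
    """Number of distinct sequences when every position is free to vary."""
    return alphabet_size ** n_positions


def max_fully_covered_positions(library_size, alphabet_size=20):
    """Largest number of positions whose full sequence space fits in the library."""
    n = 0
    while sequence_space_size(n + 1, alphabet_size) <= library_size:
        n += 1
    return n


illustrative_library = 10 ** 6  # assumed value, for illustration only
print(sequence_space_size(10))
print(max_fully_covered_positions(illustrative_library))
```

Because the space grows as 20**N, each additional design position multiplies the number of candidate sequences by 20, so any fixed screening capacity covers only a handful of positions exhaustively.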
The review is organized as follows. Section 2 provides an overview of methods for protein structure prediction and summarizes recent developments in all categories of approaches to the problem. Section 3 is devoted to methods for loop structure prediction. Section 4 outlines the first-principles protein structure prediction method ASTRO-FOLD. Section 5 discusses advances in force field development as they pertain to fold recognition and de novo protein design. Section 6 focuses on recent progress in de novo protein design.
Protein structure prediction
Numerous approaches to protein structure prediction exist. Methods for structure prediction can be divided into four groups: (1) comparative modeling; (2) fold recognition; (3) first principles methods with database information; and (4) first principles methods without database information. As prediction methods have become more sophisticated, the boundaries between these categories have blurred, and today there are methods that cannot be clearly assigned to any of the four categories.
Loop structure prediction
Ab initio methods have recently received increased attention in the prediction of loops, that is, those structures that join β-strands and helices in proteins. Loops exhibit greater structural variability than strands and helices, since they are often exposed at the surface of a protein and have relatively few contacts with the remainder of the structure. Loop structure therefore is considerably more difficult to predict than the structure of the geometrically highly regular strands and
ASTRO-FOLD protein structure prediction approach
One successful prediction method is the first principles ASTRO-FOLD protein folding approach developed by Floudas and coworkers (Klepeis and Floudas, 2003c). The main thrusts of this approach are (1) α-helical prediction through detailed free energy calculations (Klepeis and Floudas, 2002), (2) a mixed-integer linear optimization formulation for the β-sheet prediction (Klepeis and Floudas, 2003a, Floudas, 1995), (3) derivation of secondary structure restraints and loop modeling, and (4) the
Force fields
Protein structure prediction is one of the most important and difficult problems in computational structural biology. As discussed earlier, different approaches have been developed to address this problem. Various components of the protein folding problem (e.g., fold recognition, ab initio prediction, comparative modelling and de novo design) make use of a force field. In the process of structure prediction, sometimes it is required to select the native structure of a protein from a pool of
De novo protein design
There have been considerable successes in the development of computational algorithms for protein design during the past decade. At the turn of the 1990s, some de novo protein design efforts turned out to be futile as either the target fold was not achieved (Betz et al., 1993) or the engineered protein had a different quaternary structure than expected (Lovejoy et al., 1993). These failures were thought to be caused by the relatively qualitative hierarchic approach adopted on protein design at
Acknowledgements
CAF gratefully acknowledges financial support from the National Science Foundation and the National Institutes of Health (R01 GM52032; R24 GM069736). MM gratefully acknowledges funding by a Deutsche Forschungsgemeinschaft research fellowship (MO 1086/1-1).
References (234)
- et al. A global optimization method, αBB, for process design. Computers and Chemical Engineering (1996).
- et al. Global optimization of MINLP problems in process synthesis and design. Computers and Chemical Engineering (1997).
- et al. A global optimization method for general twice-differentiable NLPs - II. Implementation and computational results. Computers and Chemical Engineering (1998).
- et al. A global optimization method for general twice-differentiable NLPs - I. Theoretical advances. Computers and Chemical Engineering (1998).
- et al. Basic local alignment search tool. Journal of Molecular Biology (1990).
- et al. Inter-residue potential in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. Journal of Molecular Biology (1997).
- et al. De novo protein design: from molten globules to native-like states. Current Opinion in Structural Biology (1993).
- et al. Mechanical unfolding of a beta-hairpin using molecular dynamics. Biophysical Journal (2000).
- Constructing smooth potential functions for protein folding. Journal of Molecular Graphics and Modelling (2001).
- et al. Improved conformational space annealing method to treat beta-structure with the UNRES force-field and to enhance scalability of parallel implementation. Polymer (2004).