Methods for Macromolecular Modeling (M3): Assessment of Progress and Future Perspectives

The workshop on Methods for Macromolecular Modeling (M3), held at New York University on 12–14 October 2000, provided its 187 participants, who came from Europe, Asia, the Americas, and the Middle East, with a forum for reviewing progress in the field, discussing promising developments for the future, and voicing concerns about multidisciplinary efforts. Inspired by these issues, we review progress in several key areas, discuss challenging problems in structural biology, and address scientific and cultural issues of mathematics/biology interface research. Specifically, we mention opportunities in structural genomics and more broadly structural biology (protein folding, protein folding disorders and disease, and energetic/conformational pathways in proteins); we also highlight emerging mathematical methods, unified molecular force fields, biomolecular dynamics simulations, and free energy computations. Finally, we discuss three obstacles to interdisciplinary research: quantitative problem formulations, formulation of benchmarks, and understanding the biological significance of research topics.
Hin Hark Gan, Tamar Schlick

Biomolecular Dynamics Applications


Mathematics and Molecular Neurobiology

Advances in mathematics and computer technology, together with advances in structural biology, are opening the way to detailed modeling of biology at the molecular and cellular levels. One objective of such studies is the development of a more complete understanding of biological systems, including the emergence of behavior at the cellular level from that at the molecular level. Another objective is the development of more sophisticated models for structure-aided discovery of new pharmaceuticals.
Nathan A. Baker, Kaihsu Tai, Richard Henchman, David Sept, Adrian Elcock, Michael Holst, J. Andrew McCammon

Structural and Dynamical Characterization of Nucleic Acid Water and Ion Binding Sites

Recent methodological developments have led to longer and more accurate molecular dynamics (MD) simulations. In parallel, methods have been designed to characterize water and ion binding features. Here, we outline some of the methods we used to extract structural and dynamical information about the first water and ion coordination shell from MD simulations conducted on RNA and DNA structures. Coordinates for the water and ion binding sites located in the first coordination shell of r(G=C), d(G=C), r(A-U), and d(A-T) base-pairs are provided, along with calculated “pseudo” thermal factors.
Pascal Auffinger, Benoit Masquida, Eric Westhof

Molecular Dynamics Methods


A Test Set for Molecular Dynamics Algorithms

This article describes a collection of model problems for aiding numerical analysts, code developers and others in the design of computational methods for molecular dynamics (MD) simulation. Common types of calculations and desirable features of algorithms are surveyed, and these are used to guide selection of representative models. By including essential features of certain classes of molecular systems, but otherwise limiting the physical and quantitative details, it is hoped that the test set can help to facilitate cross-disciplinary algorithm and code development efforts.
Eric Barth, Benedict Leimkuhler, Sebastian Reich

Internal Coordinate Molecular Dynamics Based on the Spectroscopic B-Matrix

Internal coordinate molecular dynamics (ICMD) has been used in the past in simulations for large molecules as an alternative way of increasing step size with a reduced operational dimension that is not achievable by MD in Cartesian coordinates. A new MD formalism in nonredundant generalized (internal and external) coordinates for flexible molecular systems is presented, which is based on the spectroscopic B-matrix rather than the A-matrix of previous methods. The proposed formalism does not require a direct inversion of a large matrix as in the recursive formulations based on robot dynamics, and takes advantage of the sparsity of the spectroscopic B-matrix, ensuring computational efficiency for flexible molecules. Each molecule’s external rotations about an arbitrary atom center, which may differ from its center of mass, are parameterized by the SU(2) Euler representation, giving singularity free parameterization. Based on the clear separability in the generalized coordinates between fast varying degrees of freedom and slowly varying ones, a multiple time step algorithm is introduced that avoids the nontrivial interaction distance classification inherent in the method in Cartesian coordinates.
Sang-Ho Lee, Kim Palmo, Samuel Krimm

The Sigma MD Program and a Generic Interface Applicable to Multi-Functional Programs with Complex, Hierarchical Command Structure

This article summarizes the Sigma program for molecular dynamics simulation and describes a generic web browser-based interface (“WASP”) applicable to programs with complex, hierarchical command structures. Use of the interface is illustrated with its application to the Sigma program (“Wigma”).
Geoff Mann, R. H. Yun, Lars Nyland, Jan Prins, John Board, Jan Hermans

Overcoming Instabilities in Verlet-I/r-RESPA with the Mollified Impulse Method

The primary objective of this paper is to explain the derivation of symplectic mollified Verlet-I/r-RESPA (MOLLY) methods that overcome linear and nonlinear instabilities that arise as numerical artifacts in Verlet-I/r-RESPA. These methods allow for lengthening of the longest time step used in molecular dynamics (MD). We provide evidence that MOLLY methods can take a longest time step that is 50% greater than that of Verlet-I/r-RESPA, for a given drift, including no drift. A 350% increase in the timestep is possible using MOLLY with mild Langevin damping while still computing dynamic properties accurately. Furthermore, longer time steps also enhance the scalability of multiple time stepping integrators that use the popular Particle Mesh Ewald method for computing full electrostatics, since the parallel bottleneck of the fast Fourier transform associated with PME is invoked less often. An additional objective of this paper is to give sufficient implementation details for these mollified integrators, so that interested users may implement them into their MD codes, or use the program ProtoMol in which we have implemented these methods.
Using simple analysis of a 1-d model problem we show the linear instability present in Verlet-I/r-RESPA at approximately half the period of the fastest motion, and more interestingly, how the mollified methods can be designed to overcome them. The paper also includes an experimental component that shows how these methods overcome instability barriers in practice.
We also present evidence that more complicated instabilities are present in Verlet-I/r-RESPA than linear analysis reveals. In particular, we postulate nonlinear resonance mechanisms that have hitherto been ignored, although these mechanisms are known for leapfrog. This means that Verlet-I/r-RESPA is no better than leapfrog if one wants a simulation with no drift. Currently, we use mild Langevin damping to overcome these nonlinear instabilities, but it is possible to design symplectic MOLLY integrators that are nonlinearly stable as well.
Jesús A. Izaguirre, Qun Ma, Thierry Matthey, Jeremiah Willcock, Thomas Slabach, Branden Moore, George Viamontes
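
The half-period instability described in this abstract can be reproduced on a toy problem. The sketch below is not taken from the paper or from ProtoMol. It assumes a unit-mass particle held by a stiff spring (angular frequency `omega_fast`) and a weak spring (`k_slow`), and it replaces the inner Verlet loop with the exact flow of the fast oscillator, which is the limit of infinitely many inner steps. All function names and parameter values are illustrative assumptions.

```python
import math

def impulse_step(x, p, h, omega_fast, k_slow):
    """One outer step of an impulse (Verlet-I/r-RESPA style) scheme, unit mass."""
    p -= 0.5 * h * k_slow * x                      # half kick from the slow force
    c = math.cos(omega_fast * h)
    s = math.sin(omega_fast * h)
    # Exact flow of the fast harmonic oscillator over the outer step h
    # (assumption: stands in for many small inner Verlet steps).
    x, p = c * x + (s / omega_fast) * p, -omega_fast * s * x + c * p
    p -= 0.5 * h * k_slow * x                      # second half kick
    return x, p

def energy(x, p, omega_fast, k_slow):
    """Total energy of the two-spring model problem."""
    return 0.5 * p * p + 0.5 * (omega_fast ** 2 + k_slow) * x * x

def energy_growth(h, steps=2000, omega_fast=10.0, k_slow=1.0):
    """Largest energy seen along the trajectory, relative to the initial energy."""
    x, p = 1.0, 0.0
    e0 = energy(x, p, omega_fast, k_slow)
    e_max = e0
    for _ in range(steps):
        x, p = impulse_step(x, p, h, omega_fast, k_slow)
        e_max = max(e_max, energy(x, p, omega_fast, k_slow))
    return e_max / e0

def stability_trace(h, omega_fast=10.0, k_slow=1.0):
    """Trace of the one-step map; the linear scheme is stable only if |trace| <= 2."""
    return (2.0 * math.cos(omega_fast * h)
            - k_slow * h * math.sin(omega_fast * h) / omega_fast)

h_safe = 0.2                              # well below half the fast period
h_resonant = (math.pi - 0.015) / 10.0     # just below half the fast period (pi / omega_fast)

if __name__ == "__main__":
    for label, h in (("safe", h_safe), ("resonant", h_resonant)):
        print(label, stability_trace(h), energy_growth(h))
```

With these assumed parameters, `h_resonant` sits just below half the fast period, where the trace of the one-step map drops below -2 and the energy grows geometrically, while `h_safe` keeps the trace inside [-2, 2] and the energy stays bounded. The mollified methods of the abstract are not implemented here; the sketch only shows the artifact they are designed to remove.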

Monte Carlo Methods


On the Potential of Monte Carlo Methods for Simulating Macromolecular Assemblies

A wide variety of Monte Carlo techniques are described to argue that the methodology has a large untapped potential to solve sampling problems for complex systems.
Mihaly Mezei

Structure Calculation of Protein Segments Connecting Domains with Defined Secondary Structure: A Simulated Annealing Monte Carlo Combined with Biased Scaled Collective Variables Technique

A method for modeling segments of proteins that connect regions with defined secondary structure is illustrated with a study of long segments (8–13 amino acids) that include parts of defined secondary structure motifs. Loop structure calculation can be considered a particular case of this more challenging problem. The new algorithm first finds conformations representative of the segment structure tethered to the protein at one terminus only, and subsequently drives the free end of the segment towards its attachment point using a reversed harmonic constrained simulated annealing scheme. An adjustable force constant drives the free terminal towards the attachment point, using the Monte Carlo (MC) technique of scaled collective variables (SCV). Each segment is initially placed in an extended conformation with the N-terminus covalently bound to the protein, and MC simulated annealing is carried out to find the preferred conformations of the segment. The resulting families of conformations prepare the segment for attachment of the C-terminus. In the second stage a hierarchical protocol drives the segment’s C-terminus towards its final position in the protein. The free C-terminus is attached to a dummy residue, identical to the target residue where the segment will be connected. Successive MC simulations are carried out using the SCV method with increasingly larger values of the harmonic force constant that slowly stabilize the free energy surface to ensure the correct orientation of the segment region with the rest of the system. The performance of the method was evaluated for eight segments in the α-subunit of the G protein transducin for which a high-resolution X-ray crystal structure (2.0 Å) is available. The calculation was performed using the all-atom representation and the CHARMM force field with the electrostatic effects of the solvent described implicitly by the new SCP general continuum model. The segments that are most exposed to solvent are found to be represented best with this method.
Sergio A. Hassan, Ernest L. Mehler, Harel Weinstein

Other Conformational Sampling Methods


Hierarchical Uncoupling-Coupling of Metastable Conformations

Uncoupling-coupling Monte Carlo (UCMC) combines uncoupling techniques for finite Markov chains with Markov chain Monte Carlo methodology. UCMC aims at avoiding the typical metastable or trapping behavior of Monte Carlo techniques. From the viewpoint of Monte Carlo, a slowly converging long-time Markov chain is replaced by a limited number of rapidly mixing short-time ones. Therefore, the state space of the chain has to be hierarchically decomposed into its metastable conformations. This is done by means of combining the technique of conformation analysis as recently introduced by the authors, and appropriate annealing strategies. We present a detailed examination of the uncoupling-coupling procedure which uncovers its theoretical background, and illustrates the hierarchical algorithmic approach. Furthermore, application of the UCMC algorithm to the n-pentane molecule allows us to discuss the effect of its crucial steps in a typical molecular scenario.
Alexander Fischer, Christof Schütte, Peter Deuflhard, Frank Cordes

Automatic Identification of Metastable Conformations via Self-Organized Neural Networks

As has been shown recently, the identification of metastable chemical conformations leads to a Perron cluster eigenvalue problem for a reversible Markov operator. Naive discretization of this operator would suffer from combinatorial explosion. As a first remedy, a pre-identification of essential degrees of freedom out of the set of torsion angles had been applied up to now. The present paper suggests a different approach based on neural networks: its idea is to discretize the Markov operator via self-organizing box maps. The thus obtained box decomposition then serves as a prerequisite for the subsequent Perron cluster analysis. Moreover, this approach also permits exploitation of additional structure within embedded simulations. As it turns out, the new method is fully automatic and efficient also in the treatment of biomolecules. This is exemplified by numerical results.
T. Galliat, P. Deuflhard, R. Roitzsch, F. Cordes

Free Energy Methods


Equilibrium and Nonequilibrium Foundations of Free Energy Computational Methods

Statistical mechanics provides a rigorous framework for the numerical estimation of free energy differences in complex systems such as biomolecules. This paper presents a brief review of the statistical mechanical identities underlying a number of techniques for computing free energy differences. Both equilibrium and nonequilibrium methods are covered.
C. Jarzynski
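
One of the nonequilibrium identities in this area, the Jarzynski equality, relates the free energy difference ΔF to an exponential average over the work W done in repeated switching runs: exp(-βΔF) = ⟨exp(-βW)⟩. The sketch below is an illustration rather than code from the paper. It assumes a list of work values in the same energy units as 1/β, and the function name is hypothetical. The minimum work is factored out of the sum so that the exponentials do not overflow.

```python
import math

def jarzynski_free_energy(work_values, beta):
    """Estimate the free energy difference from nonequilibrium work samples.

    Implements dF = -(1/beta) * ln( mean( exp(-beta * W) ) ), shifting by the
    smallest work value for numerical stability.
    """
    if not work_values:
        raise ValueError("need at least one work value")
    w_min = min(work_values)
    shifted_sum = sum(math.exp(-beta * (w - w_min)) for w in work_values)
    return w_min - math.log(shifted_sum / len(work_values)) / beta

if __name__ == "__main__":
    # Made-up work values, purely to exercise the function.
    samples = [2.1, 1.8, 2.6, 1.5, 2.0]
    print(jarzynski_free_energy(samples, beta=1.0))
```

By Jensen's inequality the estimate never exceeds the mean work, which is consistent with the second law. In practice the exponential average is dominated by rare low-work trajectories, so many samples may be needed.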

Free-Energy Calculations in Protein Folding by Generalized-Ensemble Algorithms

We review uses of generalized-ensemble algorithms for free-energy calculations in protein folding. Two of the well-known methods are the multicanonical algorithm and the replica-exchange method; the latter is also referred to as parallel tempering. We present a new generalized-ensemble algorithm that combines the merits of the two methods; it is referred to as the replica-exchange multicanonical algorithm. We also give a multidimensional extension of the replica-exchange method. Its realization as an umbrella sampling method, which we refer to as replica-exchange umbrella sampling, is a powerful algorithm that can give the free energy over a wide region of reaction coordinate space.
Yuji Sugita, Yuko Okamoto
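
The core of the replica-exchange method (parallel tempering) is a Metropolis test for swapping configurations between two replicas held at different temperatures. The sketch below shows that standard acceptance rule only. It is not the authors' code, the function names are hypothetical, and energies and inverse temperatures are assumed to be in consistent units.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging configurations between two replicas.

    Replica i is at inverse temperature beta_i with potential energy energy_i,
    and likewise for replica j. The rule is min(1, exp((beta_i - beta_j) *
    (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(beta_i, beta_j, energy_i, energy_j)

if __name__ == "__main__":
    # Cold replica (beta = 1.0) holding the higher energy: the swap is always accepted.
    print(swap_probability(1.0, 0.5, -1.0, -3.0))
    # Cold replica already holding the lower energy: accepted with probability exp(-1).
    print(swap_probability(1.0, 0.5, -3.0, -1.0))
```

The multicanonical and umbrella-sampling variants mentioned in the abstract change what is exchanged or how the ensembles are weighted; they are not covered by this sketch.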

Ab Initio QM/MM and Free Energy Calculations of Enzyme Reactions

A recently developed computational approach to studying enzyme reactions is reviewed. This approach consists of three major components: a pseudobond ab initio QM/MM method, which provides a consistent and well-defined potential energy surface; an efficient iterative optimization procedure, which determines the reaction paths in a realistic enzyme environment; and free energy calculations, which take account of the fluctuations of the enzyme system. The review describes applications of this QM/MM free energy approach to simulating reactions in two enzymes: enolase and triosephosphate isomerase (TIM). The calculations on enolase provide insight into how the structure of the enolase active site is organized to catalyze two different reactions and so achieve overall catalytic efficiency. The study of TIM indicated a dual pathway mechanism and a low-barrier hydrogen bond (LBHB) formed in the enediol intermediate. The LBHB is found to be very short, with a distance of 2.46 Å between the two oxygen donor atoms, but it is only about 3–4 kcal/mol stronger than a normal hydrogen bond and is even weaker than the ionic asymmetric hydrogen bond. That is much less than the value of 10–20 kcal/mol proposed in the LBHB hypothesis.
Yingkai Zhang, Haiyan Liu, Weitao Yang

Long Range Interactions and Fast Electrostatics Methods


Treecode Algorithms for Computing Nonbonded Particle Interactions

Two new algorithms are described for computing nonbonded particle interactions in classical molecular systems: (1) a particle-cluster treecode for the real space Ewald sum in a system with periodic boundary conditions, and (2) a cluster-cluster treecode for the total potential energy in a system with vacuum boundary conditions. The first algorithm treats electrostatic interactions and the second algorithm treats general power-law interactions. Both algorithms use a divide-and-conquer strategy, adapted rectangular clusters, and Taylor approximation in Cartesian coordinates. The necessary Taylor coefficients are computed efficiently using recurrence relations. The second algorithm implements variable order approximation, and a run-time choice between Taylor approximation and direct summation. Test results are presented for an equilibrated water system, and random and sparse particle systems.
Robert Krasny, Zhong-Hui Duan

A New Reciprocal Space Based Method for Treating Long Range Interactions in Ab Initio and Force-Field Based Calculations for Surfaces, Wires, and Clusters

A new formalism is presented for treating long range forces in calculations of surfaces (systems that are infinitely replicated in two spatial directions and have a finite extent in the third), wires (systems that are infinitely replicated in one spatial direction and have a finite extent in the other two), and clusters (systems that are finite in all three spatial directions). The new formalism is based in reciprocal space and, therefore, permits straightforward extension of plane-wave based density functional theory, Ewald summation, and smooth particle-mesh Ewald methods to handle systems with less than three-dimensional periodicity. The new method is very easily implemented and will be demonstrated to yield a numerically accurate and efficient algorithm for performing calculations on both model and realistic examples of systems with reduced periodicity.
Mark E. Tuckerman, Peter Minary, Katianna Pihakari, Glenn J. Martyna

Efficient Computational Algorithms for Fast Electrostatics and Molecular Docking

Efficient computational techniques provide advantageous solutions for complex problems in molecular modeling and related fields. These computational algorithms are useful where “wet biology” cannot be carried out or is too expensive; they also help to remove the computational bottlenecks that arise in direct calculation. Here we illustrate these ideas by presenting two computational methods. The first algorithm provides a linear-complexity multiscale computation of the many-body problem of calculating long-range electrostatics in charge and dipolar systems [1,2]. The second method brings a Computer Vision approach to a biomolecular structural recognition problem, namely, an automated method for molecular docking [38]. We conclude by demonstrating a possible implementation of electrostatic docking, i.e., combining the use of our multiscale fast electrostatics method in molecular docking.
B. Sandak

Statistical Approaches to Protein Structures


Fold Recognition using the OPLS All-Atom Potential and the Surface Generalized Born Solvent Model

Protein decoy data sets provide a benchmark for testing scoring functions designed for fold recognition and protein homology modeling problems. It is commonly believed that statistical potentials based on reduced atomic models are better able to discriminate native-like from misfolded decoys than scoring functions based on more detailed molecular mechanics models. Recent benchmark tests, however, suggest otherwise. This report presents further analysis of the effectiveness of all-atom molecular mechanics scoring functions for detecting misfolded decoys, together with a direct comparison with results obtained using a statistical potential derived for a reduced atomic model. The OPLS all-atom force field is used as a scoring function to detect native protein folds among the Park & Levitt large decoy sets. Solvent electrostatic effects are included through the Surface Generalized Born (SGB) model. The OPLS potential with SGB solvation (OPLS-AA/SGB) provides good discrimination between native-like structures and non-native decoys. From an analysis of the individual energy components of the OPLS-AA/SGB potential for the native and the best-ranked decoy, it is determined that a roughly even balance of the terms of the potential is responsible for distinguishing the native from the misfolded conformations. Different combinations of individual energy terms provide less discrimination than the total energy. The effects of scoring decoys using several dielectric models are also compared. With the SGB solvation model, close to 100% of the structures with energies within 100 kcal/mol of the native state minimum are native-like. In contrast, only 20% of the low energy structures are found to be native-like when a distance dependent dielectric is used instead of SGB to model solvent electrostatic effects. The results are consistent with observations that all-atom molecular potentials coupled with intermediate level solvent dielectric models are competitive with knowledge-based potentials for decoy detection and protein modeling problems such as fold recognition.
Anthony K. Felts, Anders Wallqvist, Emilio Gallicchio, Donna Bassolino, Stanley R. Krystek, Ronald M. Levy

Identification of Sequence-Specific Tertiary Packing Motifs in Protein Structures using Delaunay Tessellation

An approach to recognizing recurrent sequence-structure patterns in proteins has been developed, based on Delaunay tessellation of protein structure. Starting with a united residue (side chain centroids) representation of a protein structure, tessellation partitions the structure into a unique set of irregular tetrahedra, or simplices, whose vertices correspond to four nearest-neighbor residues. Tetrahedral clusters composed of residues not adjacent along the polypeptide chain have been classified according to their amino acid composition and the three distances separating the residues along the sequence, these distances being defined as the sequence lengths from first to second, second to third, and third to fourth residue. An elementary tertiary packing motif is defined as a Delaunay simplex with a specific amino acid composition, together with three sequence distances (i.e., number of residues along the sequence) between vertex residues. Analysis of three databases of diverse protein structures (< 30% sequence identity between any pair, 1922 structures total) identified 224 motifs, each found in at least two proteins from different fold families. To further substantiate the methodology, three groups of proteins representing unique structural and functional families were analyzed, and packing motifs characteristic of each of them have been identified. The proposed methodology is termed Simplicial Neighborhood Analysis of Protein Packing (SNAPP). SNAPP can be used to locate recurrent tertiary structural motifs as well as sequence-specific, functionally relevant patterns similar to Prosite (Hofmann et al., 1999) signatures. We anticipate that the SNAPP methodology will be useful in automating the analysis and comparison of protein structures determined in structural and functional genomics projects.
Stephen A. Cammer, Charles W. Carter, Alexander Tropsha
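
The motif definition in this abstract (amino acid composition plus three sequence distances) can be expressed as a small lookup key. The sketch below is an illustration, not the SNAPP implementation. It assumes the Delaunay simplices have already been computed elsewhere and are given as four `(sequence_index, residue_type)` pairs, it reads "not adjacent along the chain" as "no two vertex residues are sequence neighbors", and it groups by protein rather than by fold family. All function names are hypothetical.

```python
from collections import defaultdict

def motif_key(simplex):
    """Build a motif key from four (sequence_index, residue_type) vertices.

    Returns (composition, gaps), where composition is the sorted tuple of
    residue types and gaps are the sequence distances first-to-second,
    second-to-third, and third-to-fourth. Returns None if any two vertices
    are adjacent (or identical) along the sequence.
    """
    ordered = sorted(simplex)                       # order vertices by sequence position
    positions = [pos for pos, _ in ordered]
    gaps = tuple(b - a for a, b in zip(positions, positions[1:]))
    if min(gaps) <= 1:                              # assumed reading of "not adjacent"
        return None
    composition = tuple(sorted(res for _, res in ordered))
    return composition, gaps

def shared_motifs(simplices_by_protein, min_proteins=2):
    """Map each motif key to the set of proteins containing it, keeping only
    motifs seen in at least min_proteins proteins."""
    seen = defaultdict(set)
    for protein, simplices in simplices_by_protein.items():
        for simplex in simplices:
            key = motif_key(simplex)
            if key is not None:
                seen[key].add(protein)
    return {key: names for key, names in seen.items() if len(names) >= min_proteins}

if __name__ == "__main__":
    # Made-up simplices for two hypothetical proteins.
    data = {
        "protA": [[(3, "A"), (10, "L"), (17, "L"), (25, "V")]],
        "protB": [[(40, "L"), (33, "A"), (47, "L"), (55, "V")],
                  [(5, "G"), (6, "S"), (20, "K"), (31, "E")]],
    }
    print(shared_motifs(data))
```

A fuller treatment would build the tessellation from side chain centroids with a computational geometry library and would apply the fold-family criterion described in the abstract.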

