Heuristic energy landscape paving for protein folding problem in the three-dimensional HP lattice model

doi:10.1016/j.compbiolchem.2012.02.001

Computational Biology and Chemistry

Volume 38, June 2012, Pages 17-26

https://doi.org/10.1016/j.compbiolchem.2012.02.001 Get rights and content

Abstract

The protein folding problem, i.e., the prediction of the tertiary structures of protein molecules from their amino acid sequences is one of the most important problems in computational biology and biochemistry. However, the extremely difficult optimization problem arising from energy function is a key challenge in protein folding simulation. The energy landscape paving (ELP) method has already been applied very successfully to off-lattice protein models and other optimization problems with complex energy landscape in continuous space. By improving the ELP method, and subsequently incorporating the neighborhood strategy with the pull-move set into the improved ELP method, a heuristic ELP algorithm is proposed to find low-energy conformations of 3D HP lattice model proteins in the discrete space. The algorithm is tested on three sets of 3D HP benchmark instances consisting 31 sequences. For eleven sequences with 27 monomers, the proposed method explores the conformation surfaces more efficiently than other methods, and finds new lower energies in several cases. For ten 48-monomer sequences, we find the lowest energies so far. With the achieved results, the algorithm converges rapidly and efficiently. For all ten 64-monomer sequences, the algorithm finds lower energies within comparable computation times than previous methods. Numeric results show that the heuristic ELP method is a competitive tool for protein folding simulation in 3D lattice model. To the best of our knowledge, this is the first application of ELP to the 3D discrete space.

Highlights

► We demonstrate the application of a new optimization technique ELP to the 3D discrete space. ► We propose a heuristic ELP algorithm for 3D HP lattice model. ► The heuristic ELP method combines the improved ELP and a new neighborhood strategy. ► Numeric results show that the proposed method is a competitive tool for protein folding simulation.

Introduction

Protein folding, one of the most important and challenging problems of computational biology and biochemistry, refers to the determination of the tertiary structures of protein molecules from their amino acid sequences. Due to rapid advances in DNA analysis, the number of known sequences has increased enormously, but progress in understanding their three-dimensional (3D) structures and their functions has lagged behind owing to the difficulty of solving the folding problem. The difficulty arises mainly from two aspects: the design of appropriate energy functions which can generally distinguish the native states from non-native states of protein molecules, and the extremely difficult optimization problems which arise from energy functions used in folding simulations.

To overcome at least some of these difficulties, various lattice models and simplified energy functions have been introduced. One of the most popular lattice models of protein folding is the so-called hydrophobic–polar (or hydrophobic–hydrophilic, HP) model (Dill, 1985, Lau and Dill, 1989), where only two types of monomers, hydrophobic (H) and polar (or hydrophilic, P) ones, are considered. It is assumed that the main driving force of the formation of the tertiary structure is the interaction between hydrophobic amino acids that tend to form a core in the spatial structure shielded from the surrounding solvent by hydrophilic ones. Given an amino acid sequence s = s₁s₂⋯s_n, s_i ∈ {H, P}, i ∈ {1, 2, …, n}, the protein is folded into a conformation with the minimum free energy. The conformation is a self-avoiding path whose monomers are on the vertices of the regular cubic lattice. The free energy function is defined as the negative number of nonconsecutive hydrophobic–hydrophobic (H–H) bonds, which means that two hydrophobic monomers not adjacent in sequence occupy adjacent vertices in the lattice. All other interactions between hydrophobic–hydrophilic (H–P) bonds and between hydrophilic–hydrophilic (P–P) bonds are defined as zero. In other words, if a conformation s has exactly l such H–H bonds, its free energy E(s) = l·(−1). Hence, a conformation with the lowest free energy corresponds to a conformation with the largest number of H–H bonds.

The interest in this model comes from the fact that the 3D HP model is very simple and can exhibit many features of real protein folding (Lau and Dill, 1990). Chemists have used this model to evaluate new hypothesis of protein structure formation. In addition, the simplicity of the model permits a rigorous analysis of the efficiency for a folding algorithm. Despite its simplification, the corresponding protein folding problem based on the HP model has proven to be NP-complete (Berger and Leighton, 1998). In recent years, a wide variety of approximate computational methods have been employed to simulate and analyze this model, including genetic algorithm (GA) (Unger and Moult, 1993, Custódio et al., 2004, Huang et al., 2010), self-organizing map (SOM) (Zhang et al., 2005), the pruned-enriched Rosenbluth method (PERM) (Grassberger, 1997) and its variations (nPERMis (Hsu et al., 2003) and nPERMh (Huang et al., 2005)), elastic net algorithm with local search (EN) (Guo and Feng, 2006), contact interactions method (CI) (Toma and Toma, 1996), core-directed chain growth method (CG) (Beutler and Dill, 1996), constraint-based hydrophobic core construction (CHCC) (Yue et al., 1995), ant colony optimization (ACO) (Shmygelska and Hoos, 2005), simulated annealing algorithm (SA) (Albrecht et al., 2008), replica exchange Monte Carlo method (REMC) (Thachuk et al., 2007), particle swarm optimization (PSO) (Kanj et al., 2009), and others.

In this paper, the modified energy landscape paving (ELP) (Liu et al., 2009, Liu and Li, 2010) is introduced for the protein folding problem in 3D HP lattice model. ELP was originally proposed by Hansmann and Wille (2002) to simulate protein folding of all-atom proteins, such as pentapeptide Met-enkephalin and 36-residue peptide (HP-36) modeled by the ECEPP/2 force field, and hereafter was introduced for the off-lattice model proteins (Schug et al., 2005, Liu and Huang, 2006), and the circular packing problems (Liu et al., 2009, Liu and Li, 2010). These are applications of ELP in rough energy landscape in continuous space for finding low-energy conformations. To the best of our knowledge, now no researches have applied ELP to discrete combinatorial optimization problem. In our current work, to demonstrate the efficiency of ELP in discrete space, we improve further the ELP method and use it as a tool to fold up given sequences on a 3D cubic lattice. To address this problem, the neighborhood strategy based on the pull-move set is employed. The computational results, on three sets of instances taken from the literature, show the effectiveness of the proposed method.

The paper is organized as follows. Section 2 introduces the improved ELP method, the neighborhood strategy with pull-move set, and gives the description of the folding algorithm. Section 3 is devoted to demonstrate and compare the experimental results, and give performance analysis of different parameter values of the algorithm. Section 4 concludes the article with discussion.

Section snippets

Improved ELP method

The energy landscape paving (ELP) (Hansmann and Wille, 2002) is an improved Monte Carlo (MC) global optimization method, and combines ideas from energy landscape deformation (Besold et al., 1999) and tabu search (Cvijovic and Klinowski, 1995). It modifies the energy function so that the search avoids revisiting recently visited regions. This means that if a conformation c is hit at a certain “time” t, the energy E(c, t) is increased by a “penalty”, and replaced by energy $\tilde{E} (c, t) = E (c, t) + f (H (q, t))$

Numeric results and performance analysis

We implement the heuristic energy landscape paving (hELP) algorithm in Java language and run it on a Notebook PC with Intel Core 2 Duo, 1.6 GHz processor and 1.0 GB of RAM. We perform different computational experiments both to compare the results of the problem under investigation and to analyze the selection of the parameter values for hELP in an appropriate way.

Conclusions

We have described a general global optimization approach, the energy landscape paving (ELP) method. As all good stochastic global optimizers, it is designed to explore low-energy conformations while avoiding at the same time entrapment in local minima. The Metropolis sampling and the accumulated histogram function helps ELP escape local minima. By proposing a new update mechanism of the histogram function in ELP and the generation of a valid initial conformation based on the heuristic strategy,

Acknowledgments

This work is supported by Special Foundation of China Postdoctoral Science Foundation (201104572), the Natural Science Foundation of Jiangsu Province (BK2010570), Jiangsu Planned Projects for Postdoctoral Research Funds (1001030B), the National Public Benefit Research Foundation of China (GYHY200906006, GYHY201106037), the China Postdoctoral Science Foundation funded Project (20100471350), Natural Science Foundation of Education Committee of Jiangsu Province (09KJB520008), and Qing Lan Project.

References (28)

A.A. Albrecht et al.
Stochastic protein folding simulation in the three-dimensional HP-model
Comput. Biol. Chem.
(2008)
G. Besold et al.
Efficient Monte Carlo sampling by direct flattening of free energy barriers
Comput. Mater. Sci.
(1999)
C.H. Huang et al.
Protein folding simulations of 2D HP model by the genetic algorithm based on optimal secondary structures
Comput. Biol. Chem.
(2010)
J.F. Liu et al.
An improved energy landscape paving algorithm for the problem of packing circles into a larger containing circle
Comput. Ind. Eng.
(2009)
B. Berger et al.
Protein folding in the hydrophobic–hydrophilic (HP) model is NP-complete
J. Comput. Biol.
(1998)
T.C. Beutler et al.
A fast conformational search strategy for finding low energy structures of model proteins
Protein Sci.
(1996)
F.L. Custódio et al.
Investigation of the three-dimensional lattice HP protein folding model using a genetic algorithm
Genet. Mol. Biol.
(2004)
D. Cvijovic et al.
Taboo search: an approach to the multiple minima problem
Science
(1995)
K.A. Dill
Theory for the folding and stability of globular protein
Biochemistry
(1985)
P. Grassberger
The pruned-enriched Rosenbluth method: simulations of theta polymers of chain length up to 1000000
Phys. Rev. E
(1997)

Y.Z. Guo et al.

The simulation of the three-dimensional lattice hydrophobic–polar protein folding

J. Chem. Phys.

(2006)

U.H.E. Hansmann et al.

Global optimization by energy landscape paving

Phys. Rev. Lett.

(2002)

H.P. Hsu et al.

Growth algorithms for lattice heteropolymers at low temperatures

J. Chem. Phys.

(2003)

X.Q. Hu et al.

A gradient-directed Monte Carlo method for global optimization in a discrete space: application to protein sequence design and folding

J. Chem. Phys.

(2009)

Cited by (11)

An integer programming model for protein structure prediction using the 3D-HP side chain model
2016, Discrete Applied Mathematics
Citation Excerpt :
Despite these simplifications, the exhaustive search of the conformational space of a protein using the simplest model (HP in two dimensions—2D) leads to a problem which complexity was proved to be NP-hard [9,26]. Consequently, many heuristic methods have been proposed for solving instances of 2D and 3D discrete models [15,22,24,27], as well as for continuous models [20,21,33,18]. However, there are few studies in the literature involving methods to solve the protein folding problem considering models with side chains, such as those suggested by Bromberg and Dill [6] and Hart and Istrail [11].
In spite of the fact that many simplified model variants of protein structure prediction have been widely studied in the past years, few attention has been given to discrete models with side chains, for which there is no specific benchmark. In this paper, we propose an integer programming model for the 3D-HP side chain protein structure prediction problem. The model accounts for the energy resulting from all types of interactions, between pairs of backbone elements, hydrophilic side chains and hydrophobic side chains. Three sets of instances, modified from the literature, were used in the experiments, and the maximum number of non-local hydrophobic contact was found using the ILOG CPLEX optimization package. We offer the optimal solution found for several instances of the benchmark. It is expected that the mathematical model allow further studies of the protein structure prediction with side chains and may, for some cases, provide new optimal values or new bounds that would rekindle the interest to this fascinating problem domain.
A Novel Branch-and-Bound Algorithm for the Protein Folding Problem in the 3D HP Model
2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics
A self-adaptive differential evolution with fragment insertion for the protein structure prediction problem
2019, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Hybrid method to solve HP model on 3D lattice and to probe protein stability upon amino acid mutations
2017, BMC Systems Biology
3D protein structure prediction with BSA-TS algorithm
2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Parallel ant colony optimization for the HP protein folding problem
2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

View all citing articles on Scopus

View full text

Heuristic energy landscape paving for protein folding problem in the three-dimensional HP lattice model

Abstract

Highlights

Introduction

Section snippets

Improved ELP method

Numeric results and performance analysis

Conclusions

Acknowledgments

Comput. Biol. Chem.

Comput. Mater. Sci.

Comput. Biol. Chem.

Comput. Ind. Eng.

Protein folding in the hydrophobic–hydrophilic (HP) model is NP-complete

J. Comput. Biol.

A fast conformational search strategy for finding low energy structures of model proteins

Protein Sci.

Investigation of the three-dimensional lattice HP protein folding model using a genetic algorithm

Genet. Mol. Biol.

Taboo search: an approach to the multiple minima problem

Science

Theory for the folding and stability of globular protein

Biochemistry

The pruned-enriched Rosenbluth method: simulations of theta polymers of chain length up to 1000000

Phys. Rev. E

The simulation of the three-dimensional lattice hydrophobic–polar protein folding

J. Chem. Phys.

Global optimization by energy landscape paving

Phys. Rev. Lett.

Growth algorithms for lattice heteropolymers at low temperatures

J. Chem. Phys.

A gradient-directed Monte Carlo method for global optimization in a discrete space: application to protein sequence design and folding

J. Chem. Phys.