Heuristic energy landscape paving for protein folding problem in the three-dimensional HP lattice model

https://doi.org/10.1016/j.compbiolchem.2012.02.001Get rights and content

Abstract

The protein folding problem, i.e., the prediction of the tertiary structures of protein molecules from their amino acid sequences is one of the most important problems in computational biology and biochemistry. However, the extremely difficult optimization problem arising from energy function is a key challenge in protein folding simulation. The energy landscape paving (ELP) method has already been applied very successfully to off-lattice protein models and other optimization problems with complex energy landscape in continuous space. By improving the ELP method, and subsequently incorporating the neighborhood strategy with the pull-move set into the improved ELP method, a heuristic ELP algorithm is proposed to find low-energy conformations of 3D HP lattice model proteins in the discrete space. The algorithm is tested on three sets of 3D HP benchmark instances consisting 31 sequences. For eleven sequences with 27 monomers, the proposed method explores the conformation surfaces more efficiently than other methods, and finds new lower energies in several cases. For ten 48-monomer sequences, we find the lowest energies so far. With the achieved results, the algorithm converges rapidly and efficiently. For all ten 64-monomer sequences, the algorithm finds lower energies within comparable computation times than previous methods. Numeric results show that the heuristic ELP method is a competitive tool for protein folding simulation in 3D lattice model. To the best of our knowledge, this is the first application of ELP to the 3D discrete space.

Highlights

► We demonstrate the application of a new optimization technique ELP to the 3D discrete space. ► We propose a heuristic ELP algorithm for 3D HP lattice model. ► The heuristic ELP method combines the improved ELP and a new neighborhood strategy. ► Numeric results show that the proposed method is a competitive tool for protein folding simulation.

Introduction

Protein folding, one of the most important and challenging problems of computational biology and biochemistry, refers to the determination of the tertiary structures of protein molecules from their amino acid sequences. Due to rapid advances in DNA analysis, the number of known sequences has increased enormously, but progress in understanding their three-dimensional (3D) structures and their functions has lagged behind owing to the difficulty of solving the folding problem. The difficulty arises mainly from two aspects: the design of appropriate energy functions which can generally distinguish the native states from non-native states of protein molecules, and the extremely difficult optimization problems which arise from energy functions used in folding simulations.

To overcome at least some of these difficulties, various lattice models and simplified energy functions have been introduced. One of the most popular lattice models of protein folding is the so-called hydrophobic–polar (or hydrophobic–hydrophilic, HP) model (Dill, 1985, Lau and Dill, 1989), where only two types of monomers, hydrophobic (H) and polar (or hydrophilic, P) ones, are considered. It is assumed that the main driving force of the formation of the tertiary structure is the interaction between hydrophobic amino acids that tend to form a core in the spatial structure shielded from the surrounding solvent by hydrophilic ones. Given an amino acid sequence s = s1s2sn, si  {H, P}, i  {1, 2, …, n}, the protein is folded into a conformation with the minimum free energy. The conformation is a self-avoiding path whose monomers are on the vertices of the regular cubic lattice. The free energy function is defined as the negative number of nonconsecutive hydrophobic–hydrophobic (H–H) bonds, which means that two hydrophobic monomers not adjacent in sequence occupy adjacent vertices in the lattice. All other interactions between hydrophobic–hydrophilic (H–P) bonds and between hydrophilic–hydrophilic (P–P) bonds are defined as zero. In other words, if a conformation s has exactly l such H–H bonds, its free energy E(s) = l·(−1). Hence, a conformation with the lowest free energy corresponds to a conformation with the largest number of H–H bonds.

The interest in this model comes from the fact that the 3D HP model is very simple and can exhibit many features of real protein folding (Lau and Dill, 1990). Chemists have used this model to evaluate new hypothesis of protein structure formation. In addition, the simplicity of the model permits a rigorous analysis of the efficiency for a folding algorithm. Despite its simplification, the corresponding protein folding problem based on the HP model has proven to be NP-complete (Berger and Leighton, 1998). In recent years, a wide variety of approximate computational methods have been employed to simulate and analyze this model, including genetic algorithm (GA) (Unger and Moult, 1993, Custódio et al., 2004, Huang et al., 2010), self-organizing map (SOM) (Zhang et al., 2005), the pruned-enriched Rosenbluth method (PERM) (Grassberger, 1997) and its variations (nPERMis (Hsu et al., 2003) and nPERMh (Huang et al., 2005)), elastic net algorithm with local search (EN) (Guo and Feng, 2006), contact interactions method (CI) (Toma and Toma, 1996), core-directed chain growth method (CG) (Beutler and Dill, 1996), constraint-based hydrophobic core construction (CHCC) (Yue et al., 1995), ant colony optimization (ACO) (Shmygelska and Hoos, 2005), simulated annealing algorithm (SA) (Albrecht et al., 2008), replica exchange Monte Carlo method (REMC) (Thachuk et al., 2007), particle swarm optimization (PSO) (Kanj et al., 2009), and others.

In this paper, the modified energy landscape paving (ELP) (Liu et al., 2009, Liu and Li, 2010) is introduced for the protein folding problem in 3D HP lattice model. ELP was originally proposed by Hansmann and Wille (2002) to simulate protein folding of all-atom proteins, such as pentapeptide Met-enkephalin and 36-residue peptide (HP-36) modeled by the ECEPP/2 force field, and hereafter was introduced for the off-lattice model proteins (Schug et al., 2005, Liu and Huang, 2006), and the circular packing problems (Liu et al., 2009, Liu and Li, 2010). These are applications of ELP in rough energy landscape in continuous space for finding low-energy conformations. To the best of our knowledge, now no researches have applied ELP to discrete combinatorial optimization problem. In our current work, to demonstrate the efficiency of ELP in discrete space, we improve further the ELP method and use it as a tool to fold up given sequences on a 3D cubic lattice. To address this problem, the neighborhood strategy based on the pull-move set is employed. The computational results, on three sets of instances taken from the literature, show the effectiveness of the proposed method.

The paper is organized as follows. Section 2 introduces the improved ELP method, the neighborhood strategy with pull-move set, and gives the description of the folding algorithm. Section 3 is devoted to demonstrate and compare the experimental results, and give performance analysis of different parameter values of the algorithm. Section 4 concludes the article with discussion.

Section snippets

Improved ELP method

The energy landscape paving (ELP) (Hansmann and Wille, 2002) is an improved Monte Carlo (MC) global optimization method, and combines ideas from energy landscape deformation (Besold et al., 1999) and tabu search (Cvijovic and Klinowski, 1995). It modifies the energy function so that the search avoids revisiting recently visited regions. This means that if a conformation c is hit at a certain “time” t, the energy E(c, t) is increased by a “penalty”, and replaced by energy E˜(c,t)=E(c,t)+f(H(q,t))

Numeric results and performance analysis

We implement the heuristic energy landscape paving (hELP) algorithm in Java language and run it on a Notebook PC with Intel Core 2 Duo, 1.6 GHz processor and 1.0 GB of RAM. We perform different computational experiments both to compare the results of the problem under investigation and to analyze the selection of the parameter values for hELP in an appropriate way.

Conclusions

We have described a general global optimization approach, the energy landscape paving (ELP) method. As all good stochastic global optimizers, it is designed to explore low-energy conformations while avoiding at the same time entrapment in local minima. The Metropolis sampling and the accumulated histogram function helps ELP escape local minima. By proposing a new update mechanism of the histogram function in ELP and the generation of a valid initial conformation based on the heuristic strategy,

Acknowledgments

This work is supported by Special Foundation of China Postdoctoral Science Foundation (201104572), the Natural Science Foundation of Jiangsu Province (BK2010570), Jiangsu Planned Projects for Postdoctoral Research Funds (1001030B), the National Public Benefit Research Foundation of China (GYHY200906006, GYHY201106037), the China Postdoctoral Science Foundation funded Project (20100471350), Natural Science Foundation of Education Committee of Jiangsu Province (09KJB520008), and Qing Lan Project.

References (28)

  • Y.Z. Guo et al.

    The simulation of the three-dimensional lattice hydrophobic–polar protein folding

    J. Chem. Phys.

    (2006)
  • U.H.E. Hansmann et al.

    Global optimization by energy landscape paving

    Phys. Rev. Lett.

    (2002)
  • H.P. Hsu et al.

    Growth algorithms for lattice heteropolymers at low temperatures

    J. Chem. Phys.

    (2003)
  • X.Q. Hu et al.

    A gradient-directed Monte Carlo method for global optimization in a discrete space: application to protein sequence design and folding

    J. Chem. Phys.

    (2009)
  • Cited by (11)

    • An integer programming model for protein structure prediction using the 3D-HP side chain model

      2016, Discrete Applied Mathematics
      Citation Excerpt :

      Despite these simplifications, the exhaustive search of the conformational space of a protein using the simplest model (HP in two dimensions—2D) leads to a problem which complexity was proved to be NP-hard [9,26]. Consequently, many heuristic methods have been proposed for solving instances of 2D and 3D discrete models [15,22,24,27], as well as for continuous models [20,21,33,18]. However, there are few studies in the literature involving methods to solve the protein folding problem considering models with side chains, such as those suggested by Bromberg and Dill [6] and Hart and Istrail [11].

    • A Novel Branch-and-Bound Algorithm for the Protein Folding Problem in the 3D HP Model

      2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics
    • A self-adaptive differential evolution with fragment insertion for the protein structure prediction problem

      2019, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    • 3D protein structure prediction with BSA-TS algorithm

      2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    • Parallel ant colony optimization for the HP protein folding problem

      2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    View all citing articles on Scopus
    View full text