PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework

doi:10.1016/j.jtbi.2018.01.023

Journal of Theoretical Biology

Volume 443, 14 April 2018, Pages 125-137

https://doi.org/10.1016/j.jtbi.2018.01.023

Highlights

• PREvaIL is a new method for inferring catalytic residues based on a comprehensive set of features.
• Benchmarking experiments showed that PREvaIL achieved competitive performance.
• It was able to capture useful signals to improve catalytic residue prediction.
• PREvaIL can facilitate characterization and functional annotation of proteins.

Abstract

Determining the catalytic residues in an enzyme is critical to understanding the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods for identifying catalytic residues are resource- and labor-intensive, computational approaches are highly desirable as a complement to experimental studies, helping to identify catalytic residues and bridge the sequence–structure–function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve ranging from 0.896 to 0.973 and an area under the precision-recall curve ranging from 0.294 to 0.523. We demonstrated that this method was able to capture useful signals arising from different levels, and that leveraging these distinct but complementary types of features significantly improved the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence–structure–function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.

Graphical abstract

The flowchart of PREvaIL for inferring catalytic residues based on the integration of sequence, structural, and residue-contact-network features using the random forest machine-learning framework.

Introduction

As powerful biological catalysts, enzymes catalyze biochemical reactions at extremely high rates and are thus indispensable for many biological processes and pathways (Khosla and Harbury, 2001). Many important findings acquired from enzyme fast reaction systems (Chou and Zhou, 1982, Kuo-chen and Shou-ping, 1974, Zhou and Zhong, 1982) both significantly impact basic research (Gardner et al., 2015) and drive changes in medicinal chemistry (Chou, 2017). However, the residues comprising an enzyme differ greatly in functional significance, with only a small number directly involved in catalytic activity (Furnham et al., 2014). Accordingly, identifying which residues are catalytic is critical for determining the relationships between protein sequence, structure, and function, and for enhancing our ability to design novel inhibitors and enzymes. This has important implications in the post-genomic era, with its challenge of bridging the widening protein sequence–structure gap. Although sequence information for many enzymes is known, relatively few enzymes have been functionally characterized. Therefore, detailed information regarding catalytic residues and enzyme active sites explicitly involved in catalysis remains lacking. Because experimental methods for identifying catalytic residues are resource- and labor-intensive, high-throughput in silico approaches are highly desirable for complementing experimental efforts in identifying catalytic residues and helping to bridge the sequence–structure–function gap.

In recent years, a variety of computational methods have been developed for predicting catalytic residues or functional residues involved in catalytic reactions (Chou and Cai, 2004). These methods differ in several ways, including in the machine-learning or statistical-scoring technique used, the types of sequence features used, whether or not structural features are used in addition to sequence features, and in the sources of training and testing datasets. According to the types of features used for constructing prediction models, existing methods can be generally categorized into four major groups.

The first group of methods was primarily developed based on protein sequence and typically relied upon extracting useful sequence features for inputs used to train the prediction models. Commonly used sequence features include evolutionary information in the form of position-specific scoring matrices (PSSMs) or sequence conservation inferred from multiple sequence alignments (Capra and Singh, 2007, Fischer et al., 2008, La et al., 2005, Pai et al., 2015, Youn et al., 2007, Zhang et al., 2008) or other sequence-derived features, such as Jensen-Shannon divergence scores, relative entropies (Dou et al., 2012, Dou et al., 2010, Fischer et al., 2008), and predicted structural information inferred from sequences, including secondary structure and solvent accessibility (Dou et al., 2012, Kauffman and Karypis, 2009, Shen et al., 2009).
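To make the idea of a sequence-conservation feature concrete, the following is a minimal sketch of a Jensen-Shannon divergence score for a single alignment column. It is not taken from any of the cited methods: the uniform background distribution, the pseudocount value, and the function names are assumptions chosen for illustration (published scores typically use an empirical background and sequence weighting).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a uniform background; real methods usually use an empirical one.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_distribution(column, pseudocount=1e-6):
    """Amino acid frequencies in one alignment column (gaps are ignored)."""
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (0 to 1 in bits)."""
    mixture = {aa: 0.5 * (p[aa] + q[aa]) for aa in AMINO_ACIDS}
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


def conservation_score(column, background=UNIFORM_BACKGROUND):
    """Higher values indicate a column that deviates more from the background."""
    return js_divergence(column_distribution(column), background)
```

Under these assumptions, a fully conserved column such as "HHHHHH" scores higher than a variable column such as "HAKLDE", which is the kind of signal a conservation-based predictor relies on.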

Recently, many research groups exploited the increasing quantity of structural data deposited in the Protein Data Bank (PDB) (Rose et al., 2017), prompting the proliferation of the second group of methods, which leverage structural information to build the prediction models (Alterovitz et al., 2009, Chea and Livesay, 2007, Cilia and Passerini, 2010, Gutteridge et al., 2003, Han et al., 2012, Kirshner et al., 2013, Panchenko et al., 2004, Petrova and Wu, 2006, Sun et al., 2016, Xin et al., 2010, Youn et al., 2007). Xin et al. (2010) proposed a structure-based kernel algorithm for the prediction of catalytic residues by explicitly modeling the similarity between residue-centered neighborhoods in protein structures (Xin et al., 2010). They showed that geometry, physicochemical properties, and evolutionary conservation play an important role in determining catalytic residue activity. In a recent study, Sun et al. (2016) developed the CRHunter method, which combined both sequence and structural information in an SVM framework and achieved stable performance when compared with other template-based predictors (Sun et al., 2016). Chien and Huang proposed an approach, EXIA, based on residue side chain orientation and backbone flexibility of protein structure, which achieved a performance comparable to that of evolutionary sequence conservation (Chien and Huang, 2012). In another study, Kirshner et al. (2013) developed the Catsid (Catalytic site identification) search engine, which enables rapid searches for structural matches to a user-specified catalytic site among all PDB structures. Its capacity to rapidly search all known protein structures in the PDB is enabled by a logistic regression-based model that allows for systematic identification of true positives based on a set of feature descriptors (Kirshner et al., 2013).

The third group of methods (Chea and Livesay, 2007, del Sol et al., 2006, del Sol and O'Meara, 2005, Li et al., 2011) involves graph-theoretical methods that rely on representing protein three-dimensional (3D) structures as small-world networks (Watts and Strogatz, 1998), in which amino acid residues are the vertices of a graph and an edge connects two residues that lie in spatial proximity. Zhou et al. provided a comprehensive review on recent progress in this area (Zhou et al., 2016). Previous studies showed that representing protein structure as a topological residue-contact network can provide novel insights into protein folding mechanisms, stability, and function (del Sol et al., 2006, del Sol and O'Meara, 2005, Jiao and Ranganathan, 2017, Song et al., 2010, Tang et al., 2008, Wang et al., 2012, Zheng et al., 2012). Chea and Livesay (2007) benchmarked the performance of one particular network measure called closeness centrality and showed that it provided statistically significant predictive power for catalytic residue predictions. They also demonstrated that solvent accessibility or residue identity could be used as an efficient filter in combination with this network feature to further improve its predictive performance (Chea and Livesay, 2007).
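The residue-contact-network representation and the closeness centrality measure can be sketched as follows. This is an illustrative example only: the 7.0 Å distance cutoff, the use of one representative coordinate per residue, and the function names are assumptions, not parameters reported by the cited studies.

```python
import math
from collections import deque


def build_contact_network(coords, cutoff=7.0):
    """Connect residues i and j when their coordinates lie within `cutoff`.

    `coords` is a list of (x, y, z) tuples, one per residue. The cutoff
    value is a hypothetical choice made for this sketch.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def closeness_centrality(adjacency, source):
    """Closeness of `source`: reachable residues divided by summed path length."""
    distance = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total > 0 else 0.0


# Toy example: five residues spaced 5 Å apart on a line form a chain graph.
toy_coords = [(5.0 * i, 0.0, 0.0) for i in range(5)]
toy_network = build_contact_network(toy_coords)
central = closeness_centrality(toy_network, 2)
terminal = closeness_centrality(toy_network, 0)
```

In this toy chain the middle residue has a higher closeness than a terminal one, which mirrors the intuition that residues near the topological center of a structure tend to score highly on this measure.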

The fourth group of methods uses heterogeneous features through the integration or fusion of sequence, structure, and other types of features (Li et al., 2011, Sankararaman et al., 2010, Tang et al., 2008). Because the extracted features are heterogeneous, redundant, and noisy, feature-selection and dimensionality-reduction algorithms are often used in combination with the learning algorithms to remove irrelevant features and improve model training in order to increase prediction accuracy. In terms of the algorithms used for training these prediction models, the machine-learning or statistical-scoring approaches employed include neural networks (Gutteridge et al., 2003), information-theoretic algorithms (Capra and Singh, 2007, Fischer et al., 2008), genetic algorithms (Izidoro et al., 2015), support vector machines (SVMs) (Chea and Livesay, 2007, Li et al., 2011, Pai et al., 2015, Petrova and Wu, 2006, Sun et al., 2016, Youn et al., 2007), kernel-based algorithms (Xin et al., 2010), AdaBoost (Alterovitz et al., 2009), and logistic regression (Dou et al., 2012, Kirshner et al., 2013, Sankararaman et al., 2010). The consensus of these studies has been that evolutionary information, sequence conservation, and the structural neighborhood of catalytic residues are important predictive features, with machine learning-based approaches often providing competitive performance, making them particularly suitable for dealing with high-dimensional heterogeneous feature spaces.

Despite the development and increasing availability of such a wide range of methods, three main challenges need to be overcome to predict catalytic residues by machine learning-based approaches: (1) Sequence and structural features are still not sufficient to predict the catalytic residues of certain proteins. Accordingly, it is necessary to find and exploit other novel and complementary groups or types of features that can be used to further improve prediction performance. (2) Methods for quantifying and characterizing the relative importance and contribution of each group of features according to model performance are needed. (3) It is necessary to determine which machine learning algorithm provides the overall highest and most reliable prediction performance.

To address these questions, in this study, we present a new machine learning-based approach called PREvaIL (PRotEin various Information-based cataLytic site predictor) for predicting catalytic residues based on a random forest (RF) algorithm. In terms of input features, this approach combines a variety of sequence and structural features, as well as residue-contact-network properties, and uses an efficient feature-selection technique to select a subset of more useful features for catalytic residue prediction. We performed extensive benchmark experiments using eight different test datasets to evaluate the performance of this approach and compared it with other competing methods. The results showed that this new approach performed favorably as compared with other methods, thereby illustrating its effectiveness.
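The two evaluation metrics reported in the abstract, the area under the ROC curve and the area under the precision-recall curve, can be illustrated with a small sketch. The functions below are not the authors' evaluation code: the ROC area is computed from pairwise rank comparisons, and the precision-recall area is approximated by average precision, which is an assumption of this example. The labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy data: 2 catalytic residues (label 1) among 8 residues.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

Because catalytic residues are a small minority of all residues, a single highly ranked non-catalytic residue lowers the precision-based metric more than the ROC-based one, which is one reason the two reported ranges differ so much in magnitude.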

Materials and methods

According to the 5-step rule (Chou, 2011), the first important step in developing a new predictor involves construction or selection of an effective benchmark dataset. In this study, we addressed this problem as follows.

Feature ranking by the MDGI Z-score

We calculated and ranked the MDGI Z-scores of all initial 3424 features (see Table 1 for a summary of these features) using the randomForest R package in order to assess the relative importance and contribution of each feature type. As a result, we identified a total of 127 feature-vector elements with MDGI Z-score > 1.0, of which 41 had an MDGI Z-score > 2.0. The relative importance and ranking of these feature vectors are plotted in Fig. 2. A detailed list of these feature vectors according
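As a rough illustration of ranking features by a standardized importance score, the sketch below converts raw importance values into Z-scores and keeps those above a threshold. The source computes MDGI values with the randomForest R package; the exact standardization is not shown in this excerpt, so the plain Z-score over all features used here, along with the feature names and values, is an assumption made purely for illustration.

```python
import statistics


def importance_z_scores(importances):
    """Standardize raw importance values: (value - mean) / standard deviation."""
    values = list(importances.values())
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return {name: 0.0 for name in importances}
    return {name: (value - mean) / spread for name, value in importances.items()}


def select_features(z_scores, threshold=1.0):
    """Names of features whose Z-score exceeds `threshold`, highest first."""
    kept = [name for name, z in z_scores.items() if z > threshold]
    return sorted(kept, key=lambda name: -z_scores[name])


# Hypothetical importance values for four made-up features.
toy_importances = {
    "conservation": 0.9,
    "closeness": 0.1,
    "solvent_accessibility": 0.1,
    "secondary_structure": 0.1,
}
toy_z = importance_z_scores(toy_importances)
toy_selected = select_features(toy_z, threshold=1.0)
```

With these toy values only the first feature exceeds the 1.0 threshold; the thresholds of 1.0 and 2.0 mentioned in the text would be applied in the same way to the real importance values.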

Conclusions

In this study, we demonstrated that the combinatorial application of machine learning techniques on multi-level protein features involving sequence-derived, structural, and residue-contact-network features allowed the development of a powerful bioinformatics predictor, PREvaIL. Previous methods explored these different levels of features separately; however, we illustrated their effective integration into a machine-learning framework to provide complementary information to collectively help

Acknowledgments

We would like to thank Drs. Tuo Zhang and Lukasz Kurgan who graciously made publicly available the datasets and extracted features used in the benchmarking testing of CRpred.

References (104)

R. Alterovitz. ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinf. (2009)
S.F. Altschul. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997)
E. Amanzadeh et al. Classification of DNA minor and major grooves binding proteins according to the NLSS by data analysis methods. Appl. Biochem. Biotechnol. (2014)
G. Amitai. Network analysis of protein structures identifies functional residues. J. Mol. Biol. (2004)
M. Behbahani et al. Analysis and comparison of lignin peroxidases between fungi and bacteria using three different modes of Chou's general pseudo amino acid composition. J. Theor. Biol. (2016)
M.M. Beigi et al. Prediction of metalloproteinase family based on the concept of Chou's pseudo amino acid composition using a machine learning approach. J. Struct. Funct. Genomics (2011)
L. Breiman. Random forests. Mach. Learn. (2001)
D.S. Cao et al. propy: a tool to generate various modes of Chou's PseAAC. Bioinformatics (2013)
J.A. Capra et al. Predicting functionally important residues from sequence conservation. Bioinformatics (2007)
P. Carter et al. Dissecting the catalytic triad of a serine protease. Nature (1988)
E. Chea et al. How accurate and statistically robust are catalytic site predictions based on closeness centrality? BMC Bioinf. (2007)
Z. Chen. ZincExplorer: an accurate hybrid method to improve the prediction of zinc-binding sites from protein sequences. Mol. Biosyst. (2013)
Y.-T. Chien et al. Accurate prediction of protein catalytic residues by side chain orientation and residue contact density. PLoS One (2012)
K.C. Chou. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins Struct. Funct. Genet. (2001)
K.C. Chou. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics (2005)
K.C. Chou. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. (2011)
K.C. Chou. Impacts of Bioinformatics to Medicinal Chemistry. Med. Chem. (2015)
K.C. Chou. An Unprecedented Revolution in Medicinal Chemistry Driven by the Progress of Biological Science. Curr. Top. Med. Chem. (2017)
K.C. Chou et al. A novel approach to predict active sites of enzyme molecules. Proteins Struct. Funct. Bioinf. (2004)
K.C. Chou et al. Role of the protein outside active-site on the diffusion-controlled reaction of enzyme. J. Am. Chem. Soc. (1982)
E. Cilia et al. Automatic prediction of catalytic residues by modeling residue structural neighborhood. BMC Bioinf. (2010)
P.J. Cock. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics (2009)
G. Csardi et al. The igraph software package for complex network research. Int. J. Complex Syst. (2006)
A. del Sol. Residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families. Protein Sci. (2006)
A. del Sol et al. Small-world network approach to identify key residues in protein-protein interaction. Proteins (2005)
F.M. Disfani. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics (2012)
Y. Dou. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One (2012)
Y. Dou. Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids (2010)
P.F. Du et al. PseAAC-General: fast building various modes of general form of Chou's pseudo-amino acid composition for large-scale protein datasets. Int. J. Mol. Sci. (2014)
P.F. Du. PseAAC-Builder: A cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions. Anal. Biochem. (2012)
M. Esmaeili et al. Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. J. Theor. Biol. (2010)
J.D. Fischer et al. Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics (2008)
K. Fritz-Wolf. Structure of mitochondrial creatine kinase. Nature (1996)
L. Fu. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics (2012)
N. Furnham. The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Res. (2014)
P.R. Gardner et al. Globins Scavenge Sulfur Trioxide Anion Radical. J. Biol. Chem. (2015)
A. Gutteridge et al. Using a neural network and spatial clustering to predict the location of active sites in enzymes. J. Mol. Biol. (2003)
Z. Hajisharifi. Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. J. Theor. Biol. (2014)
T. Hamelryck. An amino acid has two sides: a new 2D measure provides a different view of solvent exposure. Proteins (2005)
L. Han. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One (2012)
S.J. Hubbard et al. (1993)
S.C. Izidoro et al. GASS: identifying enzyme active sites with genetic algorithms. Bioinformatics (2015)
J.H. Jia. iPPI-Esml: an ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J. Theor. Biol. (2015)
J.H. Jia. pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J. Theor. Biol. (2016)
X. Jiao et al. Prediction of interface residue based on the features of residue interaction network. J. Theor. Biol. (2017)
L. Jin et al. Crystal structure at 2.8 Å resolution of anabolic ornithine transcarbamylase from Escherichia coli. Nat. Struct. Biol. (1997)
D.T. Jones. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. (1999)
D.T. Jones et al. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics (2015)
W. Kabsch et al. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers (1983)
C. Kauffman et al. LIBRUS: combined machine learning and homology information for sequence-based ligand-binding residue prediction. Bioinformatics (2009)

