Article

QSAR Models for CXCR2 Receptor Antagonists Based on the Genetic Algorithm for Data Preprocessing Prior to Application of the PLS Linear Regression Method and Design of the New Compounds Using In Silico Virtual Screening

by Tahereh Asadollahi 1, Shayessteh Dadfarnia 1, Ali Mohammad Haji Shabani 1, Jahan B. Ghasemi 2,* and Maryam Sarkhosh 2

1 Department of Chemistry, Faculty of Science, Yazd University, Yazd 89195, Iran
2 Department of Chemistry, Faculty of Science, K. N. Toosi University of Technology, Tehran, Iran
* Author to whom correspondence should be addressed.
Molecules 2011, 16(3), 1928-1955; https://doi.org/10.3390/molecules16031928
Submission received: 4 January 2011 / Revised: 31 January 2011 / Accepted: 15 February 2011 / Published: 25 February 2011
(This article belongs to the Section Medicinal Chemistry)

Abstract

The CXCR2 receptors play a pivotal role in inflammatory disorders, and CXCR2 receptor antagonists can in principle be used to treat inflammatory and related diseases. In this study, quantitative relationships between the structures of 130 CXCR2 receptor antagonists and their activities were investigated by the partial least squares (PLS) method. A genetic algorithm (GA) was used to improve the performance of the PLS modeling by choosing the most relevant descriptors. The results of the factor analysis show that eight latent variables describe about 86.77% of the variance in the experimental activity of the molecules in the training set. The predictive power of the QSAR models developed with the stepwise multiple linear regression (SMLR), PLS and GA-PLS methods was evaluated by cross-validation and by validation on an external prediction set. The results showed satisfactory goodness-of-fit, robustness and good external predictive performance. A comparison of the developed methods indicates that GA-PLS can be chosen as the best model because its prediction ability is better than that of the other two methods. The applicability domain was used to define the area of reliable predictions. Furthermore, in silico screening was applied with the proposed QSAR model, and the structures and potencies of new compounds were predicted. The developed models were found to be useful for estimating the pIC50 of CXCR2 receptor antagonists for which no experimental data are available.

1. Introduction

The chemokine receptor CXCR2, a seven-transmembrane G-protein-coupled receptor, was cloned and identified in the early 1990s [1,2,3]. Chemokines play key roles in inflammation, wound healing, hematopoiesis and metastasis. They comprise a large protein family that can be divided into subfamilies on the basis of structural motifs, and they mediate their biological effects through interaction with a large family of seven-transmembrane G-protein-coupled receptors. Depending on the positions of the N-terminal cysteine residues, chemokines are divided into four subgroups, CC, C, CX3C and CXC (where X represents an amino acid), and their receptors are classified accordingly. The chemokine receptors CXCR1 and CXCR2 are activated by IL-8 (CXCL8) [4,5]. Interleukin 8 (IL-8, CXCL8) and growth-related oncogene α (GRO-α) are members of the CXC chemokine subfamily and, acting through the CXCR2 receptor, mediate the activation and recruitment of neutrophils to sites of inflammation. When CXCL8 interacts with CXCR2 and CXCR1 on neutrophils, an intracellular response occurs, including calcium flux, degranulation and, subsequently, chemotaxis. Elevated levels of CXCL8 have been observed in diseases such as arthritis and chronic obstructive pulmonary disease (COPD) [6]. In light of these findings, the CXCR2 receptor is an attractive target for drug discovery, and small-molecule CXCR2 antagonists are of considerable interest [7].
During the past decades, different approaches have been used to develop QSAR models. They differ mainly in the structural parameters (descriptors) used to characterize the molecules and/or in the mathematical methods used to correlate the descriptor values with the biological activities. Modeling of quantitative structure-activity/property relationships (QSAR/QSPR) is one of the most successful approaches for predicting chemical properties from molecular structure information. The main goal of QSAR/QSPR is to predict complex physical, chemical and biological properties of compounds from their molecular structures [8,9]. The close relationship between the bulk properties of compounds and their molecular structures provides a clear connection between the macroscopic and microscopic properties of matter. QSAR methodologies can substantially decrease the time and effort required to discover new medicines or to improve the efficiency of current ones. The success of the QSAR approach can be explained by the insight it offers into the structural determinants of chemical properties and by the possibility of estimating the properties of new chemical compounds without first synthesizing and testing them. However, the success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools, and, most importantly, the validation of the developed model [10,11,12,13]. A major step in constructing QSAR models is finding a set of molecular descriptors that represents the variation in the structural properties of the molecules.
QSAR analysis employs statistical methods to derive quantitative mathematical relationships between chemical structure and biological activity. Developing a theoretical QSAR model to calculate the IC50 (half maximal inhibitory concentration) of a diverse set of compounds is therefore of clear interest.
The strategy used to build the QSAR models includes the following steps: (1) selection of a data set; (2) generation of the molecular structures; (3) optimization of the molecular geometries by an appropriate method; (4) generation of various structural descriptors; (5) application of variable selection and/or data reduction methods to the calculated descriptors; (6) regression analysis; and finally (7) evaluation of the validity and predictability of the developed QSAR models.
QSAR models have previously been built for several classes of chemokine receptor antagonists, including those of CCR1 [14], CCR5 [15,16], CXCR3 [17], CXCR4 [18] and one class of CXCR2 antagonists [19,20]. In this work, linear methods, namely stepwise multiple linear regression (SMLR), PLS and GA-PLS, are used to find quantitative relationships between the structures of several classes of CXCR2 antagonists and their biological activities, and the results obtained by these methods are compared. Furthermore, in silico screening is applied with the QSAR model to predict the structures of new, potentially active compounds.

2. Data and Methods

2.1. Data Set

The biological and chemical data of 130 CXCR2 antagonists, taken from the literature, were selected for the QSAR study [19,21,22,23]. The data set was heterogeneous and involved several main classes of CXCR2 antagonists, including N,N’-diphenylureas, nicotinamide N-oxides, quinoxalines, triazolethiols, acylsulfonamide carboxylic acid bioisosteres, N-linked sulfonylureas and furyl-3,4-diamino-3-cyclobut-3-ene-1,2-diones. The general structures and biological activities of the CXCR2 antagonists are provided in Tables 1–7.
To ensure that the training and prediction sets cover the whole space occupied by the original data set, the data set was divided into a training set and a prediction set according to the Kennard-Stone algorithm [24]. The Kennard-Stone algorithm is known as one of the best ways of building training and prediction sets [25,26] and has recently been used in many QSAR studies [27,28]. The training set, which contains 108 compounds with pIC50 values in the range 4.96–9.00, was used to build the QSAR model, whereas the prediction set, containing 22 compounds (about 17% of the 130 compounds) with pIC50 values in the range 5.70–8.70, was used to evaluate the model’s predictive ability. The distribution of the pIC50 values of the 130 CXCR2 antagonists is shown in Figure 1; these values cover a wide range, from 4.96 to 9.00.
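The Kennard-Stone split described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distances on an already scaled descriptor matrix; the function name `kennard_stone` is hypothetical, and the authors' MATLAB implementation may differ in details such as the distance metric or tie-breaking.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone algorithm.

    X: (n_samples, n_features) descriptor matrix (ideally autoscaled).
    n_train: number of samples to assign to the training set.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        # Move the remaining sample farthest from the selected set into training.
        pick = remaining[int(np.argmax(d_min))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

For the data set in this study, one would call `kennard_stone(X, 108)` on the 130-row descriptor matrix to obtain the 108/22 split.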
Furthermore, to assess the homogeneity of the data set and to recognize potential outliers among the molecules under study, principal component analysis (PCA) [29] was performed on the calculated structural descriptors of the selected data set. Figure 2 shows that, in the space of the first two principal components, which explain 68.47% of the variation in the data set (59.86% by PC1 and 8.61% by PC2), the molecules are distributed homogeneously. The score plot is therefore a reliable representation of the spatial distribution of the points in the data set.
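A sketch of how the percentages of variance explained by PC1 and PC2 can be obtained. It assumes that the descriptors are autoscaled before PCA (the text does not state the preprocessing) and that constant descriptors have already been removed, since they would cause a division by zero. `pca_scores` is a hypothetical helper built on a singular value decomposition.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Autoscale X, then return PC scores and percent variance explained per PC."""
    X = np.asarray(X, dtype=float)
    # Autoscaling: zero mean, unit variance per descriptor (assumed preprocessing).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Squared singular values are proportional to the variance captured by each PC.
    explained = 100.0 * s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]
```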

2.2. Computer Hardware and Software

A Dell personal computer running the Windows® Vista operating system was used. HyperChem Release 7 (Hypercube, Inc., Gainesville, FL, USA, 2002) was used to draw the molecular structures, and Dragon (Todeschini and Consonni, 2003 [30]) was used to calculate the molecular structural descriptors. Selecting the significant descriptors, which relate the biological activity to the molecular structure, is an important step in QSAR modeling; for this purpose, stepwise multiple linear regression and a genetic algorithm were used. The modeling was carried out with PLS Toolbox 3.5 (Eigenvector Research, Inc., Manson, WA, USA) in MATLAB, and the other calculations were performed in the MATLAB environment (version 7.5, MathWorks, Inc., Natick, MA, USA, 2007).

2.3. Structural Descriptors

The theoretical molecular descriptors were derived from the chemical structures of the compounds. The 3D structures of all compounds were drawn with HyperChem, refined with the semiempirical AM1 method and optimized with the Polak-Ribière algorithm until the root mean square gradient reached 0.1 kJ/(mol·Å). The optimized structures were then transferred to the Dragon program package (version 3) [30] to obtain different classes of molecular descriptors, including constitutional, topological, RDF, 3D-MoRSE and geometrical descriptors [31]. Finally, to reduce redundant and uninformative information, constant or near-constant descriptors were omitted, and of any two descriptors with an inter-correlation greater than 0.95, one was removed.
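The two filtering steps can be sketched as below. The variance tolerance used to flag near-constant descriptors and the rule of keeping the first descriptor of a highly correlated pair are assumptions, since the text does not say which member of a pair was removed; `filter_descriptors` is a hypothetical name.

```python
import numpy as np

def filter_descriptors(X, names, var_tol=1e-8, r_max=0.95):
    """Drop (near-)constant columns, then one column of every pair with |r| > r_max."""
    X = np.asarray(X, dtype=float)
    # Step 1: remove constant / near-constant descriptors (assumed std threshold).
    keep = [j for j in range(X.shape[1]) if X[:, j].std() > var_tol]
    # Step 2: scan in column order; keep a descriptor only if it is not highly
    # correlated with any descriptor already kept (the later one is dropped).
    kept = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]
```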

2.4. Model Validation

Evaluating a model’s stability and predictive ability is another key step in QSAR modeling. Different statistical parameters have been used to evaluate the suitability of the developed models for predicting the activity of the studied compounds [32]; these include the cross-validation coefficient (Q² or R²cv), the relative error percent of prediction for the prediction set (REPPred), the root mean square error of prediction (RMSEP), the root mean square error of cross-validation (RMSECV), validation through an external prediction set and Y-randomization. However, a high Q² does not necessarily mean high predictability of the developed model [31]. In other words, a high Q² value is a necessary, but not a sufficient, condition for a model to have high predictability.
To assess the predictive ability and statistical significance of the developed models, the proposed models were applied to predict the pIC50 values of an external set that was not used in model development; that is, the regression models built on the training set were evaluated on the basis of their predictions for the prediction set. The resulting parameters are listed in Table 8 and show good statistical quality and low prediction errors.
The REP is calculated according to the following equation:
$$\mathrm{REP}(\%) = \frac{100}{\bar{y}}\left[\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2\right]^{0.5} \tag{1}$$
where $\hat{y}_i$, $y_i$, $\bar{y}$ and $n$ are the predicted value, the experimental value, the mean of the experimental values in the prediction set and the number of samples, respectively.
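As a small worked illustration, REP is the root mean square error of the prediction set expressed as a percentage of the mean experimental activity. The helper name `rep_percent` is hypothetical.

```python
import numpy as np

def rep_percent(y_true, y_pred):
    """Relative error of prediction (%) = 100 / mean(y) * RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 / y_true.mean() * rmse
```

For example, experimental values 5 and 7 predicted as 6 and 6 give an RMSE of 1 and a mean of 6, so REP is about 16.7%.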
The root mean square error of cross-validation (RMSECV) is a frequently used measure of the difference between the values predicted by a model or estimator and the values actually observed for the objects being modeled or estimated. It is defined as follows:
$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{n}} \tag{2}$$
where $\hat{y}_i$, $y_i$ and $n$ are the predicted value, the measured value and the number of measurements, respectively. RMSECV measures a model’s ability to predict new samples and is calculated by leave-one-out cross-validation, in which each sample is in turn left out of the model formulation and then predicted. The RMSEP is the analogous measure of the average difference between the predicted and experimental values at the prediction stage; it is calculated by applying Eq. (2) to the prediction set.
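The leave-one-out procedure behind RMSECV, and the RMSEP computed with the same formula on the prediction set, can be sketched as follows. For concreteness this assumes an ordinary least-squares (MLR) model with an intercept; the same loop applies to a PLS model by swapping the fit and predict functions. All function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def loo_predictions(X, y):
    """Predict each sample from a model fitted without that sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef = fit_mlr(X[mask], y[mask])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return preds
```

RMSECV is then `rmse(y_train, loo_predictions(X_train, y_train))`, and RMSEP is `rmse(y_test, predict_mlr(fit_mlr(X_train, y_train), X_test))`.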
Most QSAR modeling methods implement the leave-one-out (LOO) or leave-some-out (LSO) cross-validation procedure [13]. The outcome of cross-validation is evaluated by the cross-validation coefficient (Q² or R²CV), which is used as a criterion of both the robustness and the predictive ability of the model. The LOO-Q² is computed from the left-out predictions of the training set; the analogous coefficient for the external prediction set, Q²ext, is calculated according to the following formula:
$$Q^2_{\mathrm{ext}} = 1 - \frac{\sum_{i=1}^{n_{\mathrm{pred}}}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n_{\mathrm{pred}}}\left(y_i - \bar{y}_{\mathrm{tr}}\right)^2}$$
where $\hat{y}_i$ and $y_i$ are the predicted and experimental values over the prediction set, respectively, and $\bar{y}_{\mathrm{tr}}$ is the mean value of the dependent variable for the training set. Tropsha used the following criteria for external validation on the prediction set:
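A direct transcription of the Q²ext formula; note that the denominator uses the training-set mean, not the prediction-set mean. `q2_ext` is a hypothetical helper name.

```python
import numpy as np

def q2_ext(y_pred_set, yhat_pred_set, y_train):
    """External Q^2: 1 - PRESS / sum((y - mean(y_train))^2) over the prediction set."""
    y = np.asarray(y_pred_set, dtype=float)
    yhat = np.asarray(yhat_pred_set, dtype=float)
    ybar_tr = float(np.mean(y_train))          # training-set mean, by definition
    press = np.sum((y - yhat) ** 2)
    ss = np.sum((y - ybar_tr) ** 2)
    return float(1.0 - press / ss)
```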
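$Q^2 > 0.5$
$R^2 > 0.6$
$0.85 < k < 1.15$ or $0.85 < k' < 1.15$
$\dfrac{R^2 - R_0^2}{R^2} < 0.1$ or $\dfrac{R^2 - R_0'^2}{R^2} < 0.1$
where
$$k = \frac{\sum_{i=1}^{n_{\mathrm{test}}} y_i \hat{y}_i}{\sum_{i=1}^{n_{\mathrm{test}}} \hat{y}_i^2}, \qquad k' = \frac{\sum_{i=1}^{n_{\mathrm{test}}} y_i \hat{y}_i}{\sum_{i=1}^{n_{\mathrm{test}}} y_i^2}$$
$$R_0^2 = 1 - \frac{\sum_{i=1}^{n_{\mathrm{test}}}\left(y_i - y_i^{r0}\right)^2}{\sum_{i=1}^{n_{\mathrm{test}}}\left(y_i - \bar{y}\right)^2}, \qquad y_i^{r0} = k\,\hat{y}_i$$
$$R_0'^2 = 1 - \frac{\sum_{i=1}^{n_{\mathrm{test}}}\left(\hat{y}_i - \hat{y}_i^{r0}\right)^2}{\sum_{i=1}^{n_{\mathrm{test}}}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}, \qquad \hat{y}_i^{r0} = k'\,y_i$$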
In these equations, $R^2$ is the squared correlation coefficient between the experimental and predicted activities of the compounds in the training and prediction sets. $R_0^2$ and $R_0'^2$ are the coefficients of determination of the regressions through the origin of the experimental activities against the predicted activities and of the predicted activities against the experimental activities, respectively, whereas $k$ and $k'$ are the slopes of these regression lines [33]. When all of these criteria are satisfied, the model can be considered predictive.
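The criteria above can be checked programmatically. This is a sketch following Tropsha's definitions of k, k', R0² and R0'² (regressions through the origin); `tropsha_check` is a hypothetical name, and the Q² passed in is assumed to be the LOO value from the training set.

```python
import numpy as np

def tropsha_check(y, yhat, q2):
    """Evaluate Tropsha's external-validation criteria on a prediction set."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    r2 = np.corrcoef(y, yhat)[0, 1] ** 2
    k = np.sum(y * yhat) / np.sum(yhat ** 2)       # slope of y vs yhat through origin
    k_p = np.sum(y * yhat) / np.sum(y ** 2)        # slope of yhat vs y through origin
    r0 = 1 - np.sum((y - k * yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    r0_p = 1 - np.sum((yhat - k_p * y) ** 2) / np.sum((yhat - yhat.mean()) ** 2)
    passed = (q2 > 0.5 and r2 > 0.6
              and (0.85 < k < 1.15 or 0.85 < k_p < 1.15)
              and ((r2 - r0) / r2 < 0.1 or (r2 - r0_p) / r2 < 0.1))
    return {"R2": r2, "k": k, "k_prime": k_p,
            "R0_2": r0, "R0p_2": r0_p, "passed": bool(passed)}
```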
Furthermore, the Y-randomization test was applied to assess the robustness of the model: the dependent variable vector (inhibitory activity) was randomly shuffled and a new QSAR model was developed using the original independent variable matrix. As expected, the resulting QSAR models (over several repetitions) have low R² and Q² values; the results are shown in Table 9.
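A minimal Y-randomization sketch, assuming an MLR model and reporting only R² (Q² for each shuffled model could be obtained with a leave-one-out loop). The number of rounds and the seed are arbitrary choices, and the function names are hypothetical.

```python
import numpy as np

def r2_mlr(X, y):
    """Training R^2 of an MLR model with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1 - resid @ resid / np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_rounds=50, seed=0):
    """Return the original R^2 and the R^2 values of models fitted to shuffled y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r2_orig = r2_mlr(X, y)
    # Same descriptor matrix, permuted activities: any fit here is chance correlation.
    r2_rand = np.array([r2_mlr(X, rng.permutation(y)) for _ in range(n_rounds)])
    return r2_orig, r2_rand
```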

3. Results and Discussion

The predictive ability of QSAR/QSPR models is affected by two factors: the descriptors, which must carry enough molecular structure information to explain the activity/property, and the modeling method employed. With too many descriptors, however, the statistical model may overfit. Thus, in QSAR/QSPR studies it is important to identify and select descriptors that carry maximum information about activity variations and have minimum collinearity. PLS usually yields well-fitted, stable models with high predictive ability, but its estimates are not always accurate and stable. Therefore, a genetic algorithm (GA) [34] was combined with PLS regression to select appropriate descriptors and improve the model accuracy.

3.1. Stepwise Multiple Linear Regression (MLR)

On the basis of the Kennard-Stone algorithm, 108 of the 130 compounds were selected as the training set and the remaining 22 as the prediction set. Stepwise regression was applied to the training set to select significant descriptors from the 733 calculated descriptors. MATS5v (Moran autocorrelation of lag 5 weighted by atomic van der Waals volumes), GATS8p (Geary autocorrelation of lag 8 weighted by atomic polarizabilities), MATS2m (Moran autocorrelation of lag 2 weighted by atomic masses) and BEHp2 (highest eigenvalue n. 2 of the Burden matrix weighted by atomic polarizabilities) gave the best model, and there was no significant correlation among these descriptors (Table 10); they were therefore retained for further study. The selected physicochemical descriptors serve as a first guideline for the design of novel, potent CXCR2 antagonists. The selected parameters used to develop the QSAR model are listed in Table 11. The model was built by applying multiple linear regression (MLR) to the training set. The relative importance and contribution of each descriptor in the model were determined by calculating the mean effect (MF) [35] of each descriptor with the following equation:
$$\mathrm{MF}_j = \frac{\beta_j \sum_{i=1}^{n} d_{ij}}{\sum_{j=1}^{m} \beta_j \sum_{i=1}^{n} d_{ij}}$$
where MFj is the mean effect of descriptor j, βj is the coefficient of descriptor j, dij is the value of descriptor j for molecule i, n is the number of molecules in the training set and m is the number of descriptors in the model. The MF value shows the importance of each descriptor relative to the others. The MF values of MATS5v, GATS8p, MATS2m and BEHp2 are also shown in Table 11 and indicate that MATS2m (Moran autocorrelation of lag 2 weighted by atomic masses) is the most important of the selected descriptors, since it has the highest mean effect and therefore the largest effect on the pIC50 of the compounds. Figure 3 shows the standardized regression coefficients of MATS5v, GATS8p, MATS2m and BEHp2, which reflect the significance of each descriptor in the model: the greater the absolute value of a coefficient, the greater the weight of the variable in the model.
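The mean-effect calculation is a one-line computation once the regression coefficients and the training-set descriptor matrix are available. A sketch, with the hypothetical name `mean_effect`; by construction the MF values sum to one, and a descriptor with a negative coefficient can have a negative MF.

```python
import numpy as np

def mean_effect(beta, D):
    """MF_j = beta_j * sum_i d_ij / sum_j (beta_j * sum_i d_ij).

    beta: regression coefficients of the m descriptors (intercept excluded).
    D: (n_molecules, m) descriptor values of the training set.
    """
    beta = np.asarray(beta, dtype=float)
    D = np.asarray(D, dtype=float)
    contrib = beta * D.sum(axis=0)     # beta_j times column sum of descriptor j
    return contrib / contrib.sum()
```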
Using the descriptors selected by the stepwise regression method, a new MLR equation was developed on the basis of the training set:
pIC50 = −8.92 − 5.41MATS5v − 1.34GATS8p + 31.53MATS2m − 3.54BEHp2
n = 122, R² = 0.78, Q² = 0.66, F = 51.2
where n and F are the number of compounds and the Fisher F-ratio, respectively.
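The text does not specify the entry and removal criteria of the stepwise procedure. The sketch below is a simplified forward-selection variant that adds, at each step, the descriptor with the largest partial F statistic while that F exceeds an assumed threshold of 4.0; a full stepwise method would also re-test the selected descriptors for removal. All names and the threshold are assumptions.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an MLR model (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def forward_stepwise(X, y, f_in=4.0, max_vars=None):
    """Greedy forward selection by partial F-to-enter (no removal step)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    max_vars = max_vars or p
    selected, current = [], rss(X, y, [])
    while len(selected) < max_vars:
        best = None
        for j in range(p):
            if j in selected:
                continue
            new = rss(X, y, selected + [j])
            dof = n - len(selected) - 2        # n minus (intercept + enlarged model size)
            if dof <= 0:
                continue
            if new > 1e-12:
                f = (current - new) / (new / dof)
            else:                              # perfect fit: enter only if it helped
                f = np.inf if current > 1e-12 else 0.0
            if best is None or f > best[0]:
                best = (f, j, new)
        if best is None or best[0] < f_in:
            break
        selected.append(best[1])
        current = best[2]
    return selected
```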
The model constructed from the training set was then used to predict the pIC50 values of the prediction set in order to evaluate its predictive ability. The results are given in Table 12 and Figure 4.

3.2. Interpretation of the Selected Descriptors

The binding of a ligand to a target depends on the shape of the ligand and on a variety of factors such as molecular electrostatic potential, polarizability, hydrophobicity and lipophobicity. Therefore, in a QSAR study the strategy for encoding molecular information, either explicitly or implicitly, should account for these physicochemical effects. Furthermore, since data sets usually include molecules of different sizes with different numbers of atoms, the structural encoding scheme must allow such molecules to be compared. The descriptors MATS5v, GATS8p and MATS2m are 2D autocorrelation descriptors of the topological structure. 2D autocorrelation descriptors describe how the values of certain atomic properties are correlated at topological intervals equal to the lag. They represent the topological structure of the compounds but are more complex than the classical topological descriptors: their computation involves summing autocorrelation functions over different structural lags, which yields autocorrelation vectors corresponding to sub-structural fragments of different lengths. The pool of 2D autocorrelation descriptors thus defines a wide 2D space. To broaden their applicability, physicochemical properties (atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities and atomic polarizabilities) are introduced as weighting components. As a result, these descriptors address the topology of the structure, or parts of it, in association with a specific physicochemical property. Bearing this in mind, the interpretation of 2D autocorrelation descriptors is not straightforward.
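To make the definition concrete, the sketch below computes a Moran autocorrelation at a given lag from an atomic property vector and the topological distance matrix of a hydrogen-depleted molecular graph. It follows the usual Moran form (mean-centred property products averaged over atom pairs at a distance equal to the lag, divided by the property variance). Dragon's exact conventions, such as property scaling and hydrogen handling, are not reproduced, although the Moran value itself is unchanged by rescaling the property. Function names are hypothetical, and returning 0 when no atom pair exists at the requested lag is an assumed convention.

```python
import numpy as np

def topological_distances(n_atoms, bonds):
    """All-pairs shortest-path lengths (number of bonds) via Floyd-Warshall."""
    D = np.full((n_atoms, n_atoms), np.inf)
    np.fill_diagonal(D, 0.0)
    for i, j in bonds:
        D[i, j] = D[j, i] = 1.0
    for k in range(n_atoms):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

def moran(weights, D, lag):
    """Moran autocorrelation of an atomic property at a given topological lag."""
    w = np.asarray(weights, dtype=float)
    dev = w - w.mean()
    mask = D == lag                       # ordered atom pairs (i, j) at distance `lag`
    n_pairs = mask.sum()
    if n_pairs == 0:
        return 0.0                        # assumed convention when no pair exists
    num = np.outer(dev, dev)[mask].sum() / n_pairs
    den = (dev ** 2).sum() / len(w)       # property variance over all atoms
    return float(num / den)
```

For a descriptor such as MATS2m, `weights` would hold the atomic masses and `lag` would be 2.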
BCUT descriptors were designed to encode atomic properties relevant to intermolecular interactions; three standard BCUT descriptor types, based on atomic charge, polarizability and hydrogen-bonding properties, are supported. The BCUT (Burden-CAS-University of Texas eigenvalues) descriptors are the eigenvalues of a modified connectivity matrix known as the Burden matrix [17]. The BCUT metrics are extensions of parameters originally developed by Burden, which combine the atomic number of each atom with a description of the nominal bond type for adjacent and nonadjacent atoms. Among the eigenvalues of the Burden (B) matrix, the highest ones have been shown to reflect relevant aspects of molecular structure and are therefore useful for similarity searching. The eigenvalues of B encode structural information about the molecule, such as the number of atoms, the number of bonds and the electronic distribution of the whole molecule. This may explain why BEHp2, one of these eigenvalues, contributes to the prediction of activity.
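A sketch of a Burden-type matrix and its sorted eigenvalues. The diagonal holds an atomic property (polarizabilities for the BEHp descriptors); the off-diagonal values used here (the square root of the bond order for bonded atoms and 0.001 for non-bonded pairs) follow one common convention and are an assumption, not necessarily the scheme implemented in Dragon. With the eigenvalues sorted in descending order, BEHp2 would correspond to the second entry.

```python
import numpy as np

def burden_eigenvalues(atom_props, bonds, nonbonded=0.001):
    """Eigenvalues of a Burden-type matrix, sorted from highest to lowest.

    atom_props: per-atom property placed on the diagonal (e.g. polarizability).
    bonds: iterable of (i, j, bond_order) tuples.
    nonbonded: off-diagonal value for atom pairs that are not bonded (assumed).
    """
    n = len(atom_props)
    B = np.full((n, n), nonbonded, dtype=float)
    np.fill_diagonal(B, atom_props)
    for i, j, order in bonds:
        B[i, j] = B[j, i] = np.sqrt(order)     # assumed bonded-pair convention
    # B is symmetric, so eigvalsh returns real eigenvalues (ascending); reverse them.
    return np.sort(np.linalg.eigvalsh(B))[::-1]
```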

3.3. Partial Least Squares (PLS)

The general purpose of linear regression is to quantify the relationship between several independent (predictor) variables and a dependent variable. The independent variables can be physicochemical descriptors of the molecules, their principal components or latent variables. The partial least squares (PLS) method establishes relationships between the dependent variables in the Y matrix and the descriptors in the X matrix (the independent variables) through a small number of “latent” variables, which are linear combinations of the descriptors [34]. The procedure resembles a principal component analysis of the X matrix, except that the components are extracted so as to maximize their covariance with the dependent variables. The appropriate number of latent variables (LVs) was determined by monitoring the root mean square error of cross-validation (RMSECV) as the number of LVs was varied.
As shown in Figure 5, RMSECV reaches its minimum at 7 LVs and increases significantly when more than 11 LVs are used. The optimum number of LVs for the PLS model was therefore chosen to be 7. The resulting PLS model shows a good correlation between the experimental and predicted pIC50 values of the training set (R² = 0.74, RMSECV = 0.6).
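The PLS model and the choice of the number of latent variables from the RMSECV curve can be sketched with a NIPALS PLS1 implementation. This is an illustrative sketch, not the PLS Toolbox routine used by the authors. It assumes a single response (pIC50) and only mean-centres the data, so any autoscaling must be applied to X beforehand. Picking the LV count at the minimum of the curve is one simple rule; the authors chose 7 LVs by inspecting Figure 5.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1. Returns (x_mean, y_mean, B) with y_hat = y_mean + (X - x_mean) @ B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean              # centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)                 # weight vector
        t = E @ w                              # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                       # X loadings
        qa = (f @ t) / tt                      # y loading
        E = E - np.outer(t, p)                 # deflate X
        f = f - qa * t                         # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)        # coefficients in the original X space
    return x_mean, y_mean, B

def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ B

def rmsecv_curve(X, y, max_lv):
    """Leave-one-out RMSECV for 1..max_lv latent variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    curve = []
    for a in range(1, max_lv + 1):
        press = 0.0
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            model = pls1_fit(X[keep], y[keep], a)
            press += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
        curve.append(np.sqrt(press / len(y)))
    return np.array(curve)
```

For example, `int(np.argmin(rmsecv_curve(X_train, y_train, 11))) + 1` would return the LV count with the lowest RMSECV in the range examined.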
Finally, the predictive ability of the developed model was evaluated with the Q² value and by external validation. High Q² and R² values (Q² > 0.5) were taken as evidence of the high predictive ability of the model. For external validation, the original data set was divided into training and prediction sets, and the pIC50 values of the molecules in the prediction set were predicted with the developed model. The calculated R², Q², REP%, RMSECV and other statistics for the prediction set are reported in Table 2.
It should be noted that the LOO cross-validated R² (Q²) and the R² obtained for a prediction set with known biological activities are not necessarily correlated, which is why a model should be validated on such an external set before it is used to predict the activities/properties of new chemicals [33,36]. Furthermore, as the results reveal, PLS is an efficient approach for modeling many complex processes and can reduce a high-dimensional, cross-correlated data set to a smaller, interpretable set of principal components or latent variables.

3.4. Partial Least Squares combined with Genetic Algorithm (GA-PLS)

As mentioned before, one of the problems in choosing a set of molecular descriptors is the collinearity among them. To overcome this problem, several authors have combined genetic algorithms (GA) with PLS [37,38,39]. GA-PLS consists of three basic steps: (1) creation of an initial population of chromosomes, each of which is a binary bit string in which each bit represents the presence or absence of a variable; (2) evaluation of the fitness of each chromosome in the population by the internal predictivity of PLS, for which the squared predictive correlation coefficient (Q²) obtained by leave-one-out cross-validation is used [40]; and (3) reproduction of the population of chromosomes for the next generation through selection, crossover and mutation. Steps 2 and 3 are repeated until the designated number of generations has been reached. The GA parameters, such as the repetition rate, mutation rate, number of chromosomes and number of generations, were optimized.
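The three GA steps listed above can be sketched as follows. To keep the example short, MLR is used in place of PLS as the internal model, with the LOO Q² obtained from the hat-matrix shortcut e_i / (1 - h_ii). Tournament selection, single-point crossover with an assumed rate of 0.5 and an initial bit density of 0.3 are likewise assumptions; the population size of 64 and mutation rate of 0.003 are the values reported in this study. Function names are hypothetical.

```python
import numpy as np

def q2_loo_mlr(X, y):
    """LOO Q^2 of an MLR model (with intercept) via the PRESS shortcut e_i / (1 - h_ii)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                      # hat matrix
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, pop_size=64, n_gen=50, p_cross=0.5, p_mut=0.003, seed=0):
    """Return (best_mask, best_q2) found by a simple binary GA.

    Assumes n_samples comfortably exceeds the number of selected descriptors + 1;
    otherwise some h_ii reach 1 and the LOO shortcut breaks down.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    # Step 1: random initial population; each bit marks a descriptor as in or out.
    pop = rng.random((pop_size, n_feat)) < 0.3

    def fitness(mask):
        return q2_loo_mlr(X[:, mask], y) if mask.any() else -np.inf

    best_mask, best_fit = None, -np.inf
    for _ in range(n_gen):
        # Step 2: evaluate every chromosome by its LOO Q^2.
        fit = np.array([fitness(m) for m in pop])
        i = int(np.argmax(fit))
        if fit[i] > best_fit:
            best_fit, best_mask = float(fit[i]), pop[i].copy()
        # Step 3a: binary tournament selection of parents.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # Step 3b: single-point crossover between consecutive parents.
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_feat)
                children[k, cut:] = parents[k + 1, cut:]
                children[k + 1, cut:] = parents[k, cut:]
        # Step 3c: bit-flip mutation.
        pop = children ^ (rng.random(children.shape) < p_mut)
    return best_mask, best_fit
```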
Rogers and Hopfinger first applied the GA-PLS method in QSAR analysis and reported that it is very effective and superior to the PLS method. In this work, a GA-PLS analysis was performed to find a more suitable set of descriptors [41,42,43].
All descriptors were autoscaled before GA-PLS was performed. The GA was optimized through variation and selection based on the fitness values, with the fitness function defined as:
$$\mathrm{Fitness} = 100 - \left\{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 / n}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 / k}\right\} \times 100$$
where $\hat{y}_i$ is the predicted value of sample i, $y_i$ its experimental value, $\bar{y}$ the mean experimental value, n the number of samples and k = n − 1 the number of samples used in cross-validation. The definitions and types of the selected descriptors are given in Table 13. The QSAR model was derived by GA analysis combined with PLS regression, using a population size of 64 and a mutation rate of 0.003; the other parameters are summarized in Table 14. The R², REP%, RMSEP and Q² values for the prediction set of the GA-PLS study are also reported in Table 2. These results are similar to those obtained by the PLS method, but the Q² and R² values of GA-PLS were improved compared with the MLR and PLS methods. However, the chemical interpretation of these descriptors is difficult because their definitions are mathematical; details are given in the Dragon software handbook and literature [30]. Furthermore, although these results show that the GA is a satisfactory method for variable selection, more experiments are needed to establish the general superiority of GA-PLS over other techniques.
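Given cross-validated predictions, the fitness function above reduces to a few lines. This sketch assumes the form of the equation reconstructed above (the percentage of the response variance explained by the cross-validated predictions) and uses a hypothetical function name.

```python
import numpy as np

def ga_fitness(y, y_cv):
    """Fitness (%) = 100 - [(PRESS / n) / (SS_y / k)] * 100, with k = n - 1."""
    y = np.asarray(y, dtype=float)
    y_cv = np.asarray(y_cv, dtype=float)
    n = len(y)
    k = n - 1
    mse_cv = np.sum((y - y_cv) ** 2) / n          # cross-validated mean squared error
    var_y = np.sum((y - y.mean()) ** 2) / k       # sample variance of the response
    return float(100.0 - (mse_cv / var_y) * 100.0)
```

Perfect cross-validated predictions give a fitness of 100, and predicting every sample as the mean gives a value close to 0 for large n.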

3.5. In Silico Screening

The in silico screening procedure is a useful tool for predicting and identifying new biologically active compounds with improved characteristics prior to their actual synthesis [44,45]. The in silico procedure can thus be applied as a physicochemical filter to reduce the number of compounds to be tested experimentally for hit/lead generation; in other words, it minimizes the time and cost associated with identifying new leads. A virtual screening was performed by insertion, deletion and substitution of different substituents on the original molecules [46,47], and the effects of the structural modifications on the biological activity were investigated. The domain of application of the QSAR model was then defined so that the model could be used to screen new compounds. The applicability domain (AD) of the QSAR model was used to verify the reliability of the predictions, to identify problematic compounds and to predict compounds with acceptable activity that fall within this domain. Several methods have been used to determine the AD of QSAR models [48], but the most common one is that described by Gramatica [49], which uses the leverage value of each compound. The leverage approach determines the position of a new chemical relative to the QSAR model, i.e., whether it lies within the structural domain of the model or outside it. The leverage approach, together with the Williams plot, is widely used to determine the applicability domain of QSAR models.
To construct the Williams plot, the leverage hi of each compound whose activity was predicted with the QSAR model was calculated according to the following equation:
$$h_i = \mathbf{x}_i^{\mathrm{T}}\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{x}_i$$
where xi is the descriptor vector of the considered compound and X is the descriptor matrix derived from the training-set descriptor values. The warning leverage (h*) was determined as [48]:
$$h^{*} = \frac{3(p+1)}{n}$$
where n is the number of training compounds and p is the number of predictor variables. The applicability domain (AD) was then visualized by a Williams plot, i.e., a plot of the standardized residuals versus the leverage values (h). A compound with hi > h* strongly influences the regression and is considered to fall outside the applicability domain, although it need not be an outlier, because its standardized residual may be small. In addition, a standardized residual of ±3 is commonly used as the cut-off for accepting predictions, because about 99.7% of normally distributed data lie within ±3 standard deviations of the mean [50]. The leverage and the standardized residual were therefore combined to characterize the applicability domain.
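The leverage calculation, the warning leverage h* and the Williams-plot flags can be sketched as below. Whether the descriptor matrix was centred or augmented with an intercept column is not stated in the text, so the sketch simply uses the matrices as given; the pseudo-inverse guards against a near-singular XᵀX. The function name is hypothetical.

```python
import numpy as np

def williams_analysis(X_train, X_query, std_residuals):
    """Leverages, warning leverage h*, and Williams-plot flags for query compounds."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    n, p = X_train.shape
    XtX_inv = np.linalg.pinv(X_train.T @ X_train)
    # h_i = x_i^T (X^T X)^-1 x_i for every query row.
    h = np.einsum("ij,jk,ik->i", X_query, XtX_inv, X_query)
    h_star = 3.0 * (p + 1) / n
    r = np.asarray(std_residuals, dtype=float)
    high_leverage = h > h_star          # outside the structural applicability domain
    outlier = np.abs(r) > 3.0           # response outliers (|standardized residual| > 3)
    return h, h_star, high_leverage, outlier
```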
The Williams plot for the QSAR model is illustrated in Figure 6. The warning leverage (h*) was found to be 0.25 for the developed QSAR model. Chemicals with standardized residuals greater than three standard deviation units were considered outliers, while chemicals with leverage values higher than h* were considered influential (high-leverage) chemicals. On the basis of the leverages (h > 0.25), one compound was found to lie outside the defined AD of the QSAR model (Figure 6), and it was therefore identified as a structurally influential chemical.
Next, in silico screening was applied to design new structures with potential CXCR2 inhibitory activity according to the developed QSAR model, and the designed compounds were evaluated with the developed GA-PLS model. For this purpose, compound 66 of the N,N’-diphenylurea derivatives listed in Tables 1–7 (pIC50 = 8.22) was selected as a template because of its good inhibitory activity. The molecule was modified only in ways expected to be synthetically feasible. Different groups were then substituted at the X and Y positions of the ring; the results of this investigation are given in Table 15. The model tolerated various N,N’-diphenylurea substituents, since all of the studied derivatives were within the applicability domain. Among the designed molecules, compound 10c showed the best predicted activity (pIC50 = 8.50). To clarify the relation between activity and the functional groups present, this compound was selected for further structural modification: the oxygen of its amide group was replaced by different functional groups, and the results are shown in Table 16. All compounds designed on the basis of molecule 10c fell within the applicability domain of the model, and the best predicted activity was found for compound 9d (X = S). This demonstrates that, with a simple QSAR model, it is possible both to identify compounds with improved predicted activity and to recognize structural modifications that fall outside the applicability domain. Finally, these results support the reliability of the models and show that a QSAR model combined with in silico screening can help identify new synthetic targets for drug discovery.

4. Conclusions

In this study, three modeling methods (SMLR, PLS and GA-PLS) were used to construct QSAR models for CXCR2 antagonists, and the resulting models were compared. Performing GA-based descriptor selection prior to calibration yielded a regression model with improved predictive power. The accuracy and predictive ability of the proposed models were assessed with several criteria, including cross-validation, the relative error of prediction (%) for the prediction set (REPPred), the root mean square error of prediction (RMSEP), the root mean square error of cross-validation (RMSECV), validation through an external prediction set, and Y-randomization. The proposed method may therefore help reduce the time and cost of synthesizing CXCR2 receptor antagonists and determining their activity. Among the models constructed, GA-PLS gave the best predictions of the IC50 of CXCR2 antagonists. Future work will focus on the validation of putative CXCR2 antagonists identified by virtual screening.

References

  1. Holmes, W.E.; Lee, J.; Kuang, W.J.; Rice, G.C.; Wood, W.I. Structure and functional expression of a human interleukin-8 receptor. Science 1991, 253, 1278–1280.
  2. Murphy, P.M.; Tiffany, H.L. Cloning of complementary DNA encoding a functional human interleukin-8 receptor. Science 1991, 253, 1280–1283.
  3. Murphy, P.M.; Baggiolini, M.; Charo, I.F.; Hebert, C.A.; Horuk, R.; Matsushima, K.; Miller, L.H.; Oppenheim, J.J.; Power, C.A. International Union of Pharmacology. XXII. Nomenclature for Chemokine Receptors. Pharmacol. Rev. 2000, 52, 145–176.
  4. Loetscher, P.; Seitz, M.; Clark-Lewis, I.; Baggiolini, M.; Moser, B. Both interleukin-8 receptors independently mediate chemotaxis: Jurkat cells transfected with IL-8R1 or IL-8R2 migrate in response to IL-8, GROα and NAP-2. FEBS Lett. 1994, 341, 187–192.
  5. Ahuja, S.K.; Lee, J.C.; Murphy, P.M. The CXC chemokines growth-regulated oncogene (GRO) α, GROβ, GROγ, neutrophil-activating peptide-2, and epithelial cell-derived neutrophil-activating peptide-78 are potent agonists for the type B, but not the type A, human Interleukin-8 Receptor. J. Biol. Chem. 1996, 271, 20545–20550.
  6. Bizzarri, C.; Allegretti, M.; Di Bitondo, R.; Cervellera, M.N.; Collota, F.; Bertini, R. Pharmacological inhibition of Interleukin-8 (CXCL8) as a new approach for the prevention and treatment of several human diseases. Curr. Med. Chem. Anti-inflamm. Anti-Allergy Agents 2003, 2, 67–79.
  7. Busch-Petersen, J. Small molecule antagonists of the CXCR2 and CXCR1 chemokine receptors as therapeutic agents for the treatment of inflammatory diseases. Curr. Med. Chem. 2006, 6, 1345–1352.
  8. Ribeiro, F.A.L.; Ferreira, M.M.C. QSPR models of boiling point, octanol-water partition coefficient and retention time index of polycyclic aromatic hydrocarbons. J. Mol. Struct. Theochem. 2003, 663, 109–126.
  9. Molfetta, F.A.; Bruni, A.T.; Rosseli, F.P.; Silva, A.B.F. A partial least squares and principal component regression study of quinone compounds with trypanocidal activity. Struct. Chem. 2007, 18, 49–57.
  10. Tong, W.; Hong, H.; Xie, Q.; Shi, L.; Fang, H.; Perkins, R. Assessing QSAR limitations: A regulatory perspective. Curr. Comput. Aided Drug Des. 2005, 1, 195–205.
  11. He, L.; Jurs, P.C. Assessing the reliability of a QSAR model’s predictions. J. Mol. Graph. Model. 2005, 23, 503–523.
  12. Ghafourian, T.; Cronin, M.T.D. The impact of variable selection on the modelling of oestrogenicity. SAR QSAR Environ. Res. 2005, 16, 171–190.
  13. Tropsha, A.; Gramatica, P.; Gombar, V.K. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 2003, 22, 69–77.
  14. Shahlaei, M.; Fassihi, A.; Saghaie, L. Application of PC-ANN and PC-LS-SVM in QSAR of CCR1 antagonist compounds: A comparative study. Eur. J. Med. Chem. 2010, 45, 1572–1582.
  15. Aher, Y.D.; Agrawal, A.; Bharatam, P.V.; Garg, P. 3D-QSAR studies of substituted 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas as CCR5 receptor antagonists. J. Mol. Model. 2007, 13, 519–529.
  16. Afantitis, A.; Melagraki, G.; Sarimveis, H.; Koutentis, P.A.; Markopoulos, J.; Igglessi-Markopoulou, O. Investigation of substituent effect of 1-(3,3-diphenylpropyl)-piperidinyl phenylacetamides on CCR5 binding affinity using QSAR and virtual screening techniques. J. Comput. Aided Mol. Des. 2006, 20, 83–95.
  17. Afantitis, A.; Melagraki, G.; Sarimveis, H.; Igglessi-Markopoulou, O.; Kollias, G. A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4] diazepane ureas. Eur. J. Med. Chem. 2009, 44, 877–884.
  18. Bhonsle, J.B.; Wang, Z.X.; Tamamura, H.; Fujii, N.; Peiper, S.C.; Trent, J.O. A simple, automated Quasi-4D-QSAR, Quasi-multi way PLS approach to develop highly predictive QSAR models for highly flexible CXCR4 inhibitor cyclic pentapeptide ligands using scripted common molecular modeling tools. QSAR Comb. Sci. 2005, 24, 620–630.
  19. Khelebnikov, A.I.; Schepetkin, I.A.; Quinn, M.T. Quantitative structure activity relationships for small non-peptide antagonists of CXCR2: Indirect 3D approach using the frontal polygon method. Bioorg. Med. Chem. Lett. 2006, 14, 352–365.
  20. Ghasemi, J.B.; Zohrabi, P.; Khajehsharifi, H. Quantitative structure-activity relationship study of nonpeptide antagonists of CXCR2 using stepwise multiple linear regression analysis. Monatsh. Chem. 2010, 141, 111–118.
  21. Yu, Y.; Dwyer, M.P.; Chao, J.; Aki, C.; Chao, J.; Purakkattle, B.; Rindgen, D.; Bond, R.; Mayer-Ezel, R.; kway, J.; et al. Synthesis and structure-activity relationships of heteroaryl substituted-3,4-diamino-3-cyclobut-3-ene-1,2-dione CXCR2/CXCR1 receptor antagonists. Bioorg. Med. Chem. Lett. 2008, 18, 1318–1322.
  22. Winters, M.P.; Crysler, C.; Subasinghe, N.; Ryan, D.; Leong, L.; Zhao, S.; Donatelli, R.; Yurkow, E.; Mazzulla, M.; Boczon, L.; et al. Carboxylic acid bioisosteres acylsulfonamides, acylsulfamides, and sulfonylureas as novel antagonists of the CXCR2 receptor. Bioorg. Med. Chem. Lett. 2008, 18, 1926–1930.
  23. Walters, I.; Austin, C.; Austin, R.; Bonnert, R.; Cage, P.; Christie, M.; Ebden, M.; Gardiner, S.; Grahames, C.; Hill, S.; et al. Evaluation of a series of bicyclic CXCR2 antagonists. Bioorg. Med. Chem. Lett. 2008, 18, 798–803.
  24. Kennard, R.W.; Stone, L.A. Computer aided design of experiments. Technometrics 1969, 11, 137–148.
  25. Melagraki, G.; Afantitis, A.; Makridima, K.; Sarimveis, H.; Igglessi-Markopoulou, O. Prediction of toxicity using a novel RBF neural network training methodology. J. Mol. Model. 2006, 12, 297–305.
  26. Wu, W.; Walczak, B.; Massart, D.L.; Heuerding, S.; Erni, F.; Last, I.R.; Prebble, K.A. Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometr. Intell. Lab. Syst. 1996, 33, 35–46.
  27. Ghosh, P.; Ghosh, M.; Bagchi, M.C. On an aspect of calculated molecular descriptors in QSAR studies of quinolone antibacterials. Mol. Divers. 2006, 10, 415–427.
  28. Chakraborti, A.K.; Gopalakrishnan, B.; Sobhia, M.E.; Malde, A. 3D-QSAR studies of indole derivatives as phosphodiesterase IV inhibitors. Eur. J. Med. Chem. 2003, 38, 975–982.
  29. Agrawal, V.K.; Sohgaura, R.; Khadikar, P.V. QSAR studies on biological activity of piritrexim analogues against pc DHFR. Bioorg. Med. Chem. 2002, 10, 2919–2926.
  30. Todeschini, R. Milano Chemometrics, QSPR Group. http://michem.disat.unimib.it/chm/.
  31. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, Germany, 2000.
  32. Wold, S.; Eriksson, L. Chemometric Methods in Molecular Design; van de Waterbeemd, H., Ed.; VCH: Weinheim, Germany, 1995; pp. 312–317.
  33. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269–276.
  34. Leardi, R.; Boggia, R.; Terrile, M. Genetic algorithms as a strategy for feature selection. J. Chemometr. 1992, 6, 267–281.
  35. Jalali, H.M.; Konuze, E. Use of quantitative structure property relationships in predicting the Krafft point of anionic surfactants. Int. Electron. J. Mol. Des. 2002, 1, 410–417.
  36. Acevedo-Martínez, J.; Escalona-Arranz, J.C.; Villar-Rojas, A.; Téllez-Palmero, F.; Pérez-Rosés, R.; González, L.; Carrasco-Velar, R. Quantitative study of the structure-retention index relationship in the imine family. J. Chromatogr. A 2006, 1102, 238–244.
  37. Hou, T.J.; Wang, J.M.; Liao, N.; Xu, X.J. Application of Genetic algorithms on the structure-activity relationship analysis of some cinnamamides. J. Chem. Inf. Comput. Sci. 1999, 39, 775–781.
  38. Hasegawa, K. GA strategy for variable selection in QSAR studies: application of GA-Based region selection to a 3D-QSAR study of acetylcholinesterase inhibitors. J. Chem. Inf. Comput. Sci. 1999, 39, 112–120.
  39. Goicoechea, H.C.; Olivieri, A.C. Wavelength selection for multivariate calibration using a genetic algorithm: A novel initialization strategy. J. Chem. Inf. Comput. Sci. 2001, 42, 1146–1153.
  40. van de Waterbeemd, H. Chemometric Methods in Molecular Design, Methods and Principles in Medicinal Chemistry; Verlag Chemie: Weinheim, Germany, 1995; Volume 2.
  41. Rogers, D.; Hopfinger, A.J. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854–866.
  42. Hasegawa, K.; Kimura, T.; Funatsu, K. GA strategy for variable selection in QSAR studies: Enhancement of comparative molecular binding energy analysis by GA-based PLS method. Quant. Struct. Act. Relat. 1999, 18, 262–272.
  43. Sagrado, S.; Cronin, M.T.D. Application of the modelling power approach to variable subset selection for GA-PLS QSAR models. Anal. Chim. Acta 2008, 609, 169–174.
  44. Tropsha, A.; Golbraikh, A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr. Pharm. Des. 2007, 13, 3494–3504.
  45. Muegge, I.; Oloff, S. Advances in virtual screening. Drug Discov. Today Technol. 2006, 3, 405–411.
  46. Melagraki, G.; Afantitis, A.; Sarimveis, H.; Koutentis, P.A.; Markopoulos, J.; Igglessi-Markopoulou, O. Optimization of biaryl piperidine and 4-amino-2-biarylurea MCH1 receptor antagonists using QSAR modeling, classification techniques and virtual screening. J. Comput. Aided Mol. Des. 2007, 21, 251–267.
  47. Melagraki, G.; Afantitis, A.; Sarimveis, H.; Koutentis, P.A.; Kollias, G.A.; Igglessi-Markopoulou, O. Predictive QSAR workflow for the in silico identification and screening of novel HDAC inhibitors. Mol. Divers. 2009, 13, 301–311.
  48. Eriksson, L.; Jaworska, J.; Worth, A.P.; Cronin, M.T.D.; McDowell, R.M. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ. Health Perspect. 2003, 111, 1361–1375.
  49. Gramatica, P. Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 2007, 26, 694–701.
  50. Jaworska, J.S.; Nikolova, J.N.; Aldenberg, T. QSAR applicability domain estimation by projection of the training set in descriptor space: a review. ATLA Altern. Lab. Anim. 2005, 33, 445–459.
Figure 1. Distribution of pIC50 values for the whole data set.
Figure 2. Score-score plot.
Figure 3. Standardized coefficients versus descriptors in the MLR model.
Figure 4. Predicted pIC50 values by (a) MLR; (b) PLS and (c) GA-PLS modeling vs. experimental pIC50 values.
Figure 5. RMSECV versus the number of latent variables (LVs).
Figure 6. Williams plot of standardized residual versus leverage.
Table 1. Structures and biological activities of the acylsulfonamide derivatives.
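Compound | R1 | R2 | R3 | R4 | IC50 for CXCR2 (µM) | pIC50
1 | Me | CN | H | H | 0.07 | 7.14
2 | Me | Br | H | H | 0.17 | 6.77
3 | Et | CN | H | H | 0.06 | 7.19
4 | n-Pr | CN | H | H | 1.30 | 5.89
5 | Bn | CN | H | H | 1.40 | 5.85
6 | i-Pr | CN | H | H | 0.22 | 6.66
7 | Ph | CN | H | H | 0.26 | 6.58
8 | CF3 | CN | H | H | 0.09 | 7.06
9 | Me | CN | OMe | H | 0.16 | 6.80
10 | Me | CN | Me | H | 0.02 | 7.72
11 | Me | Br | - | - | 0.25 | 6.60
12 | Me | CN | - | - | 0.64 | 6.19
13 | Ph | Br | - | - | 0.12 | 6.92
14 | Ph | CN | - | - | 0.14 | 6.85
15 | o-Cl-Phenyl | CN | - | - | 0.40 | 6.40
16 | p-F-Phenyl | CN | - | - | 0.52 | 6.28
17 | Me | Me | H | - | 0.05 | 7.30
18 | Me | H | H | - | 0.12 | 6.92
19 | H | H | H | - | 0.07 | 7.18
20 | Et | Et | H | - | 1.10 | 5.96
21 | n-Butyl | H | H | - | 1.10 | 5.96
22 | Ph | H | H | - | 0.88 | 6.05
23 | -CH2CH2OMe | H | H | - | 0.26 | 6.58
24 | Me | Me | OMe | - | 0.06 | 7.24
25 | Me | Me | Me | - | 0.02 | 7.62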
Table 2. Structures and biological activities of the furyl and heterocyclic-3,4-diamino-3-cyclobut-3-ene-1,2-dione derivatives.
Compound | R | IC50 for CXCR2 (µM for 26–46; nM for 47–58) | pIC50
26 | 5-H | 0.005 | 8.3
27 | 5-Me | 0.006 | 8.24
28 | 5-Et | 0.004 | 8.39
29 | 5-Br | 0.005 | 8.33
30 | 5-Cl | 0.005 | 8.32
31 | 5-CF3 | 0.017 | 7.76
32 | 5-CF2H | 0.007 | 8.17
33 | 5-CH2OH | 0.003 | 8.55
34 | 5-CH2N(Me)2 | 0.094 | 7.03
35 | 5-CON(Me)2 | 0.171 | 6.77
36 | 5-(2-Cl)Ph | 0.049 | 7.31
37 | 5-(2-CF3)Ph | 0.15 | 6.82
38 | 5-(3-Cl)Ph | 0.058 | 7.24
39 | 5-(3-CF3)Ph | 0.087 | 7.06
40 | 4-Cl | 0.0045 | 8.35
41 | 4-Br | 0.005 | 8.30
42 | 4-(4-Pyridyl) | 0.009 | 8.02
43 | 4-(3-Thienyl) | 0.008 | 8.09
44 | 4-(3,5-Dimethyl-4-isoxazolyl) | 0.008 | 8.12
45 | 2,3-Benzofuran | 0.003 | 8.46
46 | 3-Br | 0.016 | 7.78
47 | [structure] | 8.6 | 8.06
48 | [structure] | 10.9 | 7.96
49 | [structure] | 9.8 | 8.01
50 | [structure] | 9.8 | 8.01
51 | [structure] | 7.5 | 8.12
52 | [structure] | 8.2 | 8.10
53 | [structure] | 8.0 | 8.10
54 | [structure] | 5.8 | 8.24
55 | [structure] | 6.2 | 8.21
56 | [structure] | 6.2 | 8.21
57 | [structure] | 21 | 7.68
58 | [structure] | 50 | 7.30
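The pIC50 columns in Tables 1 to 7 are the negative base-10 logarithm of IC50 expressed in mol/L, so the concentration unit has to be converted before taking the logarithm. A minimal helper (the name `pic50` is hypothetical) makes the conversion explicit; for example, 0.005 µM and 5 nM both correspond to a pIC50 of about 8.30.

```python
import math

# Molar scale factors for the concentration units used in the tables.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "nM": 1e-9}


def pic50(ic50, unit="nM"):
    """pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


print(round(pic50(6, "nM"), 2))      # 8.22 (compound 66, Table 3)
print(round(pic50(0.005, "µM"), 2))  # 8.3 (same as 5 nM)
print(round(pic50(2400, "nM"), 2))   # 5.62 (compound 83, Table 5)
```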
Table 3. Structures and biological activities of the N,N’-diphenylurea derivatives.
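Compound | R1 | R2 | R3 | R4 | R5 | R6 | IC50 for CXCR2 (nM) | pIC50
59 | OH | H | Cl | H | Br | H | 906 | 6.04
60 | OH | Cl | Cl | H | Br | H | 63 | 7.20
61 | OH | CONH2 | Cl | H | Br | H | 10 | 8.00
62 | OH | CH2NH2 | Cl | H | Br | H | 114 | 6.94
63 | OH | SO2NH2 | Cl | H | Br | H | 7 | 8.15
64 | OH | SO2NMe2 | Cl | H | Br | H | 12 | 7.92
65 | OH | H | CN | H | Br | H | 25 | 7.60
66 | OH | Br | CN | H | Br | H | 6 | 8.22
67 | OH | Cl | CN | H | Br | H | 22 | 7.66
68 | OH | CN | Cl | H | Br | H | 57 | 7.24
69 | OH | H | NO2 | H | Br | H | 22 | 7.66
70 | OH | H | NO2 | H | H | H | 320 | 6.49
71 | OH | NO2 | H | H | H | H | 860 | 6.07
72 | OH | H | H | NO2 | H | H | 10900 | 4.96
73 | OH | H | CN | H | H | H | 200 | 6.70
74 | OH | SO2NH2 | Cl | H | Cl | Cl | 9.3 | 8.03
75 | –N=N–NH– (R1, R2) | CN | H | Br | H | 39 | 7.49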
Table 4. Structures and biological activities of the nicotinamide N-oxide derivatives.
Compound | R | IC50 for CXCR2 (nM) | pIC50
76 | -SO2C2H5 | 130 | 6.87
77 | -SO2CH(CH3)2 | 400 | 6.40
78 | [structure] | 460 | 6.34
79 | -SO2C6H5 | 90 | 7.05
80 | [structure] | 32 | 7.49
81 | -SO2CH2C6H5 | 280 | 6.55
82 | Cl | 1000 | 6.00
Table 5. Structures and biological activities of the triazolethiol derivatives.
Compound | R1 | R2 | IC50 for CXCR2 (nM) | pIC50
83 | C6H5CH2 | C6H5 | 2400 | 5.62
84 | 3-OHC6H4CH2 | C6H5 | 4400 | 5.36
85 | C6H5CH2 | 4-Pyridinyl | 7700 | 5.11
86 | C6H5CH2 | 2-Furanyl | 4200 | 5.38
87 | C6H5CH2 | 4-CNC6H4 | 3500 | 5.46
88 | C6H5CH2 | 3-CF3C6H4 | 3500 | 5.46
89 | C6H5CH2 | 4-CF3C6H4 | 2800 | 5.55
90 | C6H5CH2 | 4-CH3OC6H4 | 2300 | 5.64
91 | C6H5CH2 | 3,5-diClC6H3 | 2000 | 5.70
92 | C6H5CH2 | 2-Thienyl | 2000 | 5.70
93 | C6H5CH2 | 2-CH3C6H4 | 1400 | 5.85
94 | C6H5CH2 | 2-CH3OC6H4 | 1400 | 5.85
95 | C6H5CH2 | 3-ClC6H4 | 1000 | 6.00
96 | C6H5CH2 | 2-FC6H4 | 890 | 6.05
97 | C6H5CH2 | 4-ClC6H4 | 830 | 6.08
98 | C6H5CH2 | 3,4-diClC6H3 | 800 | 6.10
99 | C6H5CH2 | 2,5-diClC6H3 | 670 | 6.17
100 | C6H5CH2 | 2-ClC6H4 | 450 | 6.35
101 | C6H5CH2 | 2,4-diClC6H3 | 410 | 6.39
102 | C6H5CH2 | 2-BrC6H4 | 350 | 6.46
103 | C6H5CH2 | 2,3-diClC6H3 | 350 | 6.46
104 | 4-CH3OC6H4CH2 | 2,4-diClC6H3 | 10000 | 5.00
105 | 3-CH3OC6H4CH2 | 2,4-diClC6H3 | 4200 | 5.38
106 | 3-CH3C6H4CH2 | 2,4-diClC6H3 | 730 | 6.14
107 | 4-ClC6H4CH2 | 2,4-diClC6H3 | 300 | 6.52
108 | 3-C6H5OC6H4CH2 | 2,4-diClC6H3 | 170 | 6.77
109 | 3-ClC6H4CH2 | 2,4-diClC6H3 | 92 | 7.04
110 | 3-ClC6H4CH2 | 2-ClC6H4 | 28 | 7.55
Table 6. Structures and biological activities of the bicyclic CXCR2 antagonists.
Compound | Structure | IC50 for CXCR2 (nM) | pIC50
111 | [structure] | 160 | 6.80
112 | [structure] | 4 | 8.40
113 | [structure] | 13 | 7.89
114 | [structure] | 630 | 6.20
115 | [structure] | 7 | 8.15
116 | [structure] | 280 | 6.55
117 | [structure] | 140 | 6.85
118 | [structure] | 280 | 6.55
119 | [structure] | 850 | 6.07
120 | [structure] | 5 | 8.30
121 | [structure] | 350 | 6.46
122 | [structure] | 16 | 7.80
123 | [structure] | 2 | 8.70
124 | [structure] | 45 | 7.35
125 | [structure] | 2500 | 5.60
126 | [structure] | 220 | 6.66
Table 7. Structures and biological activities of the bicyclic CXCR2 antagonists.
Compound | R | IC50 for CXCR2 (nM) | pIC50
1a / 10a | [structure] | 3 / 1 | 8.52 / 9.00
1b / 10b | [structure] | 4 / 1 | 8.40 / 9.00
1c / 10c | [structure] | 13 / 2 | 7.89 / 8.70
1d / 10d | [structure] | 13 / 5 | 7.89 / 8.30
1e / 10e | [structure] | 35 / 5 | 7.46 / 8.30
1f / 10f | [structure] | 120 / 60 | 6.92 / 7.22
Table 8. Statistical parameters obtained with the PLS, GA-PLS and SMLR models.
Parameter | PLS | GA-PLS | SMLR
RMSEP | 0.50 | 0.51 | 0.56
AREPred. | 5.98 | 5.53 | 1.3
R2 | 0.748 | 0.779 | 0.78
R2 (training set) | 0.727 | 0.88 | 0.68
Q2 | 0.68 | 0.713 | 0.66
SEP | 0.50 | 0.51 | 0.53
(R2 − Ro2)/R2 | −0.291 | −0.254 | −0.254
k | 1.019 | 1.035 | 0.962
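The external-validation parameters in Table 8 can be computed along the lines of the sketch below, which follows one common form of the criteria proposed by Golbraikh and Tropsha: k is the slope of the regression of observed on predicted values through the origin, Ro2 is the coefficient of determination of that through-origin fit, and the ratio (R2 − Ro2)/R2 is reported alongside k. The acceptance thresholds (ratio < 0.1 and 0.85 ≤ k ≤ 1.15) are the values commonly quoted in the literature and are assumptions here; the exact variant used by the authors may differ, so the printed numbers are not expected to reproduce Table 8.

```python
import numpy as np


def external_validation_stats(y_obs, y_pred):
    """R2, Ro2, k and (R2 - Ro2)/R2 for an external prediction set."""
    y = np.asarray(y_obs, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, yp)[0, 1] ** 2               # squared Pearson correlation
    k = np.sum(y * yp) / np.sum(yp ** 2)             # slope of the through-origin fit
    ss_tot = np.sum((y - y.mean()) ** 2)
    r0_2 = 1.0 - np.sum((y - k * yp) ** 2) / ss_tot  # R2 of the through-origin fit
    return {"R2": r2, "R0_2": r0_2, "k": k, "ratio": (r2 - r0_2) / r2}


def passes_criteria(stats, max_ratio=0.1, k_low=0.85, k_high=1.15):
    """Commonly quoted acceptance thresholds (assumed, not taken from this study)."""
    return bool(stats["ratio"] < max_ratio and k_low <= stats["k"] <= k_high)


# Experimental and GA-PLS predicted pIC50 values of the test set (Table 12).
exp = [7.24, 6.50, 7.50, 7.20, 6.34, 6.00, 8.70, 6.58,
       5.70, 6.00, 5.96, 6.14, 6.45, 6.85, 8.39, 7.92]
ga_pls = [6.79, 6.71, 7.82, 8.31, 6.67, 6.52, 8.72, 6.57,
          6.00, 5.78, 5.60, 6.70, 6.30, 6.61, 7.31, 7.64]
stats = external_validation_stats(exp, ga_pls)
print({name: round(float(v), 3) for name, v in stats.items()}, passes_criteria(stats))
```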
Table 9. R2 and Q2 values after several Y-randomization tests.
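Iteration | PLS R2 | PLS Q2 | GA-PLS R2 | GA-PLS Q2
1 | 0.0047 | −0.949 | 0.010 | −0.577
2 | 0.005 | −0.423 | 0.010 | −0.919
3 | 0.039 | −0.467 | 0.036 | −0.417
4 | 0.12 | −0.198 | 0.019 | −0.506
5 | 0.005 | −0.955 | 0.006 | −0.878
6 | 0.005 | −0.955 | 0.153 | −0.063
7 | 0.006 | −0.967 | 0.084 | −0.245
8 | 0.186 | −1.601 | 0.001 | −0.699
9 | 0.002 | −0.753 | 0.073 | −1.21
10 | 0.171 | −1.57 | 0.147 | −0.41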
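Y-randomization, summarized in Table 9, refits the model after shuffling the activity values; chance correlations should then give low R2 and low or negative Q2. The sketch below illustrates the idea on hypothetical data, with ordinary least squares (MLR) standing in for PLS and GA-PLS to keep it short, and computes leave-one-out Q2 exactly from the hat-matrix identity e_i / (1 − h_ii) for least-squares residuals.

```python
import numpy as np


def fit_r2_q2(X, y):
    """Fit an MLR model and return R2 and leave-one-out Q2 (via the hat matrix)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    h = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)
    press = np.sum((resid / (1.0 - h)) ** 2)  # exact LOO residuals for least squares
    return 1.0 - np.sum(resid ** 2) / ss_tot, 1.0 - press / ss_tot


def y_randomization(X, y, n_iter=10, seed=0):
    """Refit after shuffling the activities; chance models should score poorly."""
    rng = np.random.default_rng(seed)
    return [fit_r2_q2(X, rng.permutation(y)) for _ in range(n_iter)]


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                                     # hypothetical descriptors
y = X @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=40)   # hypothetical activities
r2, q2 = fit_r2_q2(X, y)
scrambled = y_randomization(X, y)
print(f"original: R2={r2:.3f} Q2={q2:.3f}")
for i, (r2_s, q2_s) in enumerate(scrambled, start=1):
    print(f"iteration {i}: R2={r2_s:.3f} Q2={q2_s:.3f}")
```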
Table 10. Correlation matrix for the MLR model.
Descriptor | pIC50 | MATS5v | GATS8p | MATS2m | BEHp2
pIC50 | 1
MATS5v | −0.26863 | 1
GATS8p | −0.16055 | −0.00856 | 1
MATS2m | 0.001149 | −0.08958 | −0.0286 | 1
BEHp2 | 0.214723 | −0.04342 | −05904 | 0.000615 | 1
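A correlation matrix such as Table 10 is typically used to check that the descriptors retained in the MLR model are not strongly intercorrelated. The sketch below computes the Pearson correlation matrix of a descriptor table and lists pairs whose absolute correlation exceeds a cut-off; the 0.9 threshold, the descriptor names and the data are hypothetical.

```python
import numpy as np


def intercorrelation(X, names, threshold=0.9):
    """Pearson correlation matrix of the columns of X plus pairs with |r| above threshold."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    flagged = [
        (names[i], names[j], round(float(R[i, j]), 3))
        for i in range(len(names))
        for j in range(i)  # lower triangle only, as in Table 10
        if abs(R[i, j]) > threshold
    ]
    return R, flagged


rng = np.random.default_rng(2)
a = rng.normal(size=50)
# Hypothetical descriptors: d3 is almost a copy of d1, d2 is independent noise.
X = np.column_stack([a, rng.normal(size=50), a + 0.01 * rng.normal(size=50)])
R, flagged = intercorrelation(X, ["d1", "d2", "d3"])
print(np.round(np.tril(R), 3))
print(flagged)
```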
Table 11. Details of the constructed MLR model.
Descriptor (a) | Coefficient | MF (b)
MATS5v | −8.9918 (±8.729) | −0.254
GATS8p | −5.409 (±0.463) | −0.063
MATS2m | −1.337 (±0.349) | 1.484
BEHp2 | 31.527 (±7.936) | −0.166
Constant | −3.539 (±1.156) |
(a) The names and chemical meanings of the descriptors are explained in the text; (b) MF refers to the mean effect value.
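The mean effect (MF) values in Table 11 are commonly defined as MF_j = b_j Σ_i d_ij / Σ_k (b_k Σ_i d_ik), where b_j is the regression coefficient of descriptor j and d_ij its value for training compound i. By construction the MF values sum to 1, which is consistent with the MF column of Table 11 (−0.254 − 0.063 + 1.484 − 0.166 ≈ 1.00). The definition is assumed here, and the small example uses hypothetical numbers: coefficients 2.0, −1.0 and 0.5 with descriptor column sums of 4 give MF values of 4/3, −2/3 and 1/3.

```python
import numpy as np


def mean_effects(coefs, X):
    """Mean effect MF_j = b_j * sum_i x_ij / sum_k (b_k * sum_i x_ik); values sum to 1."""
    contrib = np.asarray(coefs, dtype=float) * np.asarray(X, dtype=float).sum(axis=0)
    return contrib / contrib.sum()


# Hypothetical coefficients and descriptor values for two training compounds.
coefs = [2.0, -1.0, 0.5]
X = [[1.0, 2.0, 4.0],
     [3.0, 2.0, 0.0]]
mf = mean_effects(coefs, X)
print(np.round(mf, 3))
```

A positive MF means the descriptor, weighted by its average value in the training set, pushes the predicted activity up; a negative MF means it pulls the prediction down.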
Table 12. Comparison of experimental and predicted pIC50 values for the test set obtained with the SMLR, PLS and GA-PLS models.
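No. | pIC50 (Exp.) | PLS pIC50 (Pred.) | PLS Residual | GA-PLS pIC50 (Pred.) | GA-PLS Residual | SMLR pIC50 (Pred.) | SMLR Residual
10 | 7.24 | 7.34 | 0.10 | 6.79 | −0.45 | 7.42 | 0.18
12 | 6.50 | 6.32 | −0.17 | 6.71 | 0.22 | 6.35 | −0.14
17 | 7.50 | 7.44 | −0.06 | 7.82 | 0.32 | 7.26 | −0.24
2 | 7.20 | 7.80 | 0.60 | 8.31 | 1.11 | 7.67 | 0.47
21 | 6.34 | 6.64 | 0.30 | 6.67 | 0.33 | 6.68 | 0.35
25 | 6.00 | 6.51 | 0.51 | 6.52 | 0.52 | 6.10 | 0.10
25a | 8.70 | 7.81 | −0.89 | 8.72 | 0.02 | 7.85 | −0.84
37b | 6.58 | 6.46 | −0.13 | 6.57 | −0.01 | 6.16 | −0.43
40 | 5.70 | 5.73 | 0.03 | 6.00 | 0.30 | 5.28 | −0.42
43 | 6.00 | 5.52 | −0.48 | 5.78 | −0.22 | 5.65 | −0.35
45b | 5.96 | 5.22 | −0.73 | 5.60 | −0.36 | 6.55 | 0.59
47 | 6.14 | 6.80 | 0.62 | 6.70 | 0.52 | 5.60 | −0.57
51 | 6.45 | 6.58 | 0.12 | 6.30 | −0.15 | 6.10 | −0.35
53b | 6.85 | 6.45 | −0.41 | 6.61 | −0.24 | 6.30 | −0.56
58c | 8.39 | 7.60 | −0.79 | 7.31 | −1.08 | 7.67 | −0.71
6 | 7.92 | 8.50 | 0.58 | 7.64 | −0.28 | 8.21 | 0.29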
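RMSEP and a relative error of prediction can be recomputed directly from Table 12. The sketch below uses the experimental and PLS-predicted columns; it assumes RMSEP = sqrt(Σ(ŷ − y)²/n) and the common definition REP(%) = 100 × RMSEP / ȳ, which may not match the exact REPPred formula used by the authors. With these tabulated (rounded) values it gives an RMSEP of about 0.49 and a REP of about 7.2%.

```python
import math

# Experimental and PLS-predicted pIC50 values of the test set, copied from Table 12.
EXP = [7.24, 6.50, 7.50, 7.20, 6.34, 6.00, 8.70, 6.58,
       5.70, 6.00, 5.96, 6.14, 6.45, 6.85, 8.39, 7.92]
PLS = [7.34, 6.32, 7.44, 7.80, 6.64, 6.51, 7.81, 6.46,
       5.73, 5.52, 5.22, 6.80, 6.58, 6.45, 7.60, 8.50]


def rmsep(y_obs, y_pred):
    """Root mean square error of prediction: sqrt(mean((pred - obs)^2))."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def rep_percent(y_obs, y_pred):
    """Relative error of prediction (%) = 100 * RMSEP / mean(observed)."""
    return 100.0 * rmsep(y_obs, y_pred) / (sum(y_obs) / len(y_obs))


print(f"RMSEP = {rmsep(EXP, PLS):.3f}, REP = {rep_percent(EXP, PLS):.2f}%")  # RMSEP = 0.495, REP = 7.23%
```

Swapping in the GA-PLS or SMLR columns of Table 12 gives the corresponding values for those models.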
Table 13. Physicochemical, topological and structural descriptors.
ID | Descriptors | Group
1 | RBN, RBF | Constitutional
2 | D/D, J, MAXDN, MAXDP, X5, X0v, X1v, X3v, X4Av, X5Av, X0sol, X0sol, X1sol, X2sol, X3sol, X4sol, X5sol, S0K, S1K, IDDE, IVDE, SIC0, CIC0, IC1, SIC1, CIC1, IC2, BIC4, BIC5, D/Dr05, D/Dr06, T(N..O), T(N..S), T(O..O) | Topological
3 | BEHm1, BEHm2, BEHm3, BEHm4, BEHm5, BEHm6, BEHv6, BEHv7, BEHe3, BEHe4, BELe5, BELe6 | BCUT
4 | GGI2, GGI3, GGI10, JGI1 | Galvez topological charge indices
5 | ATS8m, ATS8v, MATS5e, MATS6e, GATS4e, GATS5e | 2D autocorrelations
6 | qnmax, Qpos | Charge descriptors
7 | FDI, PJI3, DISPv, QYYv | Geometrical
8 | RDF06u, RDF065u, RDF120u, RDF125u, RDF130u, RDF135u, RDF030m, RDF035m, RDF080m, RDF085m, RDF120m, RDF125m, RDF105v, RDF110v | RDF
9 | Mor17u, Mor18u, Mor29u, Mor30u, Mor08m, Mor09m, Mor14m, Mor15m, Mor22m, Mor23m, Mor24m, Mor25m, Mor30m, Mor31m, Mor17v, Mor18v, Mor19v, Mor20v, Mor21v, Mor22v, Mor27v, Mor28v, Mor18e, Mor28e, Mor11p, Mor12p | 3D-MoRSE
10 | E2u, E3u, E3e, G1p, G2p, E1p, L2s, L3s, G1s, G2s, Au, Am | WHIM
11 | HIC, HGM, H3u, H4u, H3m, H4m, H7m, H8m, HATS2m, HATS3m, HATS1e, HATS2e, HATS7p, HATS8p, RARS, REIG, R5u, R6u, R3u+, R4u+, RTu+, R2m, RTm, R1m+, R8m+, RTm+, R1v, R2v, RTv, R1v+, R2e, R3e, RTp, R1p+ | GETAWAY
12 | MR, PSA, MLOGP | Properties
* Descriptions of the descriptors are given in [30].
Table 14. Parameters of the genetic algorithm (GA).
Cross-validation | Random subset
Number of subsets | 4
Window width | 2
Initial terms (%) | 20
Maximum generations | 100
Convergence (%) | 80
Cross-over | Double
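The GA settings in Table 14 can be read as a GA-PLS descriptor-selection loop. The sketch below is one possible interpretation, not the software the authors used: each gene switches a window of two adjacent descriptors, about 20% of genes are on initially, fitness is the RMSECV of a small NIPALS PLS1 model over four fixed random subsets, children are produced by double (two-point) cross-over, and the run stops after the maximum number of generations or once 80% of the population share the best chromosome. Population size, mutation rate, number of PLS components and the example data are hypothetical choices.

```python
import numpy as np


def pls1_predict(X_tr, y_tr, X_te, n_comp):
    """Minimal NIPALS PLS1: fit on (X_tr, y_tr), then predict y for X_te."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    X, y = X_tr - x_mean, y_tr - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        if np.linalg.norm(w) < 1e-12:
            break  # no covariance left to model
        w = w / np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        q = (y @ t) / (t @ t)
        X, y = X - np.outer(t, p), y - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.full(len(X_te), y_mean)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # regression vector in the original X space
    return (X_te - x_mean) @ b + y_mean


def rmsecv(X, y, mask, folds, n_comp):
    """Root mean square error of cross-validation for the descriptors selected by mask."""
    Xs = X[:, mask]
    sq = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred = pls1_predict(Xs[train], y[train], Xs[test], min(n_comp, Xs.shape[1]))
        sq += np.sum((pred - y[test]) ** 2)
    return np.sqrt(sq / len(y))


def ga_pls(X, y, window=2, pop_size=30, init_frac=0.20, max_gen=100,
           convergence=0.80, n_subsets=4, p_mut=0.05, n_comp=3, seed=0):
    """GA descriptor selection with PLS RMSECV as the fitness (lower is better)."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    n_win = -(-n_var // window)  # one gene per window of adjacent descriptors
    folds = np.array_split(rng.permutation(len(y)), n_subsets)  # fixed random subsets

    def expand(genes):
        return np.repeat(genes, window)[:n_var]

    def fitness(genes):
        return rmsecv(X, y, expand(genes), folds, n_comp) if genes.any() else np.inf

    pop = rng.random((pop_size, n_win)) < init_frac
    pop[np.arange(pop_size), rng.integers(n_win, size=pop_size)] = True  # no empty models
    best_genes, best_score = pop[0].copy(), np.inf
    for _ in range(max_gen):
        scores = np.array([fitness(g) for g in pop])
        order = np.argsort(scores)
        pop, scores = pop[order], scores[order]
        if scores[0] < best_score:
            best_genes, best_score = pop[0].copy(), scores[0]
        if np.mean((pop == pop[0]).all(axis=1)) >= convergence:
            break  # most of the population has converged on one chromosome
        parents = pop[: pop_size // 2]  # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            i, j = sorted(rng.choice(n_win + 1, size=2, replace=False))
            child = a.copy()
            child[i:j] = b[i:j]                 # double (two-point) cross-over
            child ^= rng.random(n_win) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    return expand(best_genes), float(best_score)


rng = np.random.default_rng(3)
X = rng.normal(size=(48, 20))  # hypothetical descriptor matrix
y = 2.0 * X[:, 0] + X[:, 1] - 1.5 * X[:, 6] + 0.1 * rng.normal(size=48)
mask, score = ga_pls(X, y, max_gen=30)
print(np.flatnonzero(mask), round(score, 3))
```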
Table 15. Structural modification of CXCR2 receptor antagonists and predicted activities.
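ID | X | Y | GA-PLS (pIC50 predicted) | Leverage-limit
1c | H | Br | 7.10 | 0.07
2c | H | Cl | 5.63 | 0.05
3c | H | NO2 | 6.17 | 0.05
4c | H | OMe | 6.01 | 0.04
5c | H | Me | 5.50 | 0.03
6c | H | Et | 5.50 | 0.04
7c | Br | NO2 | 5.48 | 0.04
8c | Br | Me | 7.20 | 0.05
9c | Br | OMe | 6.67 | 0.04
10c | Br | Et | 8.50 | 0.06
11c | H | H | 6.49 | 0.04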
Table 16. Structural modification of CXCR2 receptor antagonists and predicted activities.
ID | X | GA-PLS (pIC50 predicted) | Leverage-limit
10c | O | 8.50 | 0.04
2d | NH | 7.74 | 0.07
3d | NMe | 8.82 | 0.05
4d | NOH | 7.91 | 0.07
5d | NOMe | 8.42 | 0.06
6d | NNH2 | 7.99 | 0.06
7d | NNHMe | 8.39 | 0.05
8d | NNMe2 | 8.10 | 0.08
9d | S | 8.98 | 0.05

MDPI and ACS Style

Asadollahi, T.; Dadfarnia, S.; Shabani, A.M.H.; Ghasemi, J.B.; Sarkhosh, M. QSAR Models for CXCR2 Receptor Antagonists Based on the Genetic Algorithm for Data Preprocessing Prior to Application of the PLS Linear Regression Method and Design of the New Compounds Using In Silico Virtual Screening. Molecules 2011, 16, 1928-1955. https://doi.org/10.3390/molecules16031928
