
2023 | Book

QSPR/QSAR Analysis Using SMILES and Quasi-SMILES


About this book

This contributed volume surveys recent approaches to QSPR/QSAR analysis that use the simplified molecular input-line entry system (SMILES) to represent molecular structure. In contrast to traditional SMILES, quasi-SMILES is a sequence of special symbols (codes) that reflect both molecular features and experimental conditions. SMILES and quasi-SMILES serve as a basis for developing QSPR/QSAR as well as Nano-QSPR/QSAR models via Monte Carlo calculations that provide the so-called optimal descriptors. The book presents a reliable technology for developing Nano-QSPR/QSAR and describes the algorithms of the Monte Carlo optimization. It discusses the theory and practice of variational autoencoders (VAEs) based on SMILES and analyses in detail the index of ideality of correlation (IIC) and the correlation intensity index (CII), which are new criteria for the predictive potential of a model. The mathematical apparatus used is simple, so students of relevant specializations can easily follow it. This volume is a valuable contribution to the field and will be of great interest to developers of models of physicochemical properties and biological activity, chemical technologists, and toxicologists involved in the area of drug design.

Table of contents

Frontmatter

Theoretical Conceptions

Frontmatter
Chapter 1. Fundamentals of Mathematical Modeling of Chemicals Through QSPR/QSAR
Abstract
The evolution of mathematical chemistry in its applications to establish the quantitative structure–property/activity relationships (QSPRs/QSARs) between molecular structure and the physicochemical and biochemical behavior of substances is discussed. The gradual improvement of molecular descriptors and the statistically validated methods developed for the above general task are described. The possible ways of applying and extending OECD principles are demonstrated via computational experiments to build QSPR/QSAR models. The leading role of validation in obtaining applicable models is noted. Stochastic procedures able to improve the reliability of QSPR/QSAR models are demonstrated.
Andrey A. Toropov, Maria Raskova, Ivan Raska Jr., Alla P. Toropova
Chapter 2. Molecular Descriptors in QSPR/QSAR Modeling
Abstract
Molecular descriptors are mathematical representations of a molecule obtained by a well-specified algorithm applied to a defined molecular representation or by a well-specified experimental procedure. Molecular descriptors serve as the core features (independent parameters) used to predict the biological activity or molecular properties of compounds in quantitative structure–property/activity relationship (QSPR/QSAR) models. Over the years, more than 5000 molecular descriptors have been introduced and calculated using different software. In this chapter, the main classes of theoretical molecular descriptors, including 0D, 1D, 2D, 3D, and 4D descriptors, are described. The most significant progress over the last few years in chemometrics, cheminformatics, and bioinformatics has led to new strategies for finding new molecular descriptors. The different approaches for deriving molecular descriptors are reviewed here, and some of the important new molecular descriptors and their applications are presented.
Shahin Ahmadi, Sepideh Ketabi, Marjan Jebeli Javan
Chapter 3. Application of SMILES to Cheminformatics and Generation of Optimum SMILES Descriptors Using CORAL Software
Abstract
This chapter uses a simplified molecular input-line entry system (SMILES) to solve diverse problems in science, technology, and medicine. SMILES can be useful to model quantitative structure–property/activity relationships (QSPRs/QSARs). The evolution of the applications of SMILES and the evolution of SMILES descriptors are discussed. The construction of so-called optimal descriptors based on SMILES using the CORAL software is described. These optimal descriptors are useful for training QSPR/QSAR models for a wide range of diverse properties.
Andrey A. Toropov, Alla P. Toropova
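
The idea of an optimal descriptor tuned by Monte Carlo search can be illustrated with a toy sketch. It assumes that each SMILES character is one attribute, that the descriptor is the sum of per-character weights, and that a random perturbation is kept only when it raises the correlation with the endpoint on a training set. The tokenization, the data in `TOY_TRAIN`, and the acceptance rule are illustrative assumptions and not the CORAL algorithm itself.

```python
import random
from statistics import mean

def dcw(smiles, weights):
    """Toy descriptor: sum of per-character weights (assumed attribute scheme)."""
    return sum(weights.get(ch, 0.0) for ch in smiles)

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def optimize_weights(train, n_steps=2000, step=0.1, seed=0):
    """Random-perturbation search: keep a change only if correlation improves."""
    rng = random.Random(seed)
    symbols = sorted({ch for smi, _ in train for ch in smi})
    weights = {s: 1.0 for s in symbols}
    ys = [y for _, y in train]

    def score(w):
        return pearson([dcw(smi, w) for smi, _ in train], ys)

    best = score(weights)
    for _ in range(n_steps):
        s = rng.choice(symbols)
        old = weights[s]
        weights[s] = old + rng.uniform(-step, step)
        new = score(weights)
        if new > best:
            best = new          # accept the perturbation
        else:
            weights[s] = old    # reject and restore
    return weights, best

# Made-up (SMILES, endpoint) pairs, for illustration only.
TOY_TRAIN = [("CCO", 1.0), ("CCCO", 1.5), ("CCN", 0.8), ("CCCC", 2.1), ("CO", 0.4)]
weights, r = optimize_weights(TOY_TRAIN)
```

Optimizing on the training set alone invites overfitting, which is one reason the chapters in this volume split data into several sets and monitor separate calibration and validation sets.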

SMILES Based Descriptors

Frontmatter
Chapter 4. All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization
Abstract
Variational autoencoders (VAEs) defined over SMILES string and graph-based representations of molecules promise to improve the optimization of molecular properties, thereby revolutionizing the pharmaceuticals and materials industries. However, these VAEs are hindered by the non-unique nature of SMILES strings and the computational cost of graph convolutions. To efficiently pass messages along all paths through the molecular graph, we encode multiple SMILES strings of a single molecule using a set of stacked recurrent neural networks, harmonizing hidden representations of each atom between SMILES representations, and use attentional pooling to build a final fixed-length latent representation. By then decoding to a disjoint set of SMILES strings of the molecule, our All SMILES VAE learns an almost bijective mapping between molecules and latent representations near the high probability mass subspace of the prior. Our SMILES-derived but molecule-based latent representations significantly surpass the state of the art in a variety of fully and semi-supervised property regression and molecular property optimization tasks.
Zaccary Alperstein, Artem Cherkasov, Jason Tyler Rolfe
Chapter 5. SMILES-Based Bioactivity Descriptors to Model the Anti-dengue Virus Activity: A Case Study
Abstract
The present work aims to demonstrate the significance of the newly suggested bioactivity descriptors (so-called signaturizers) for developing predictive 2D-QSAR models. As a case study, we examined the development of 2D-QSAR models based on a dataset containing 77 compounds with inhibitory activity reported in a DENV2ProHeLa assay, a cell-based assay that estimates the Dengue virus-2 (DENV-2) protease inhibitory potential within the cellular environment. Although dengue is a well-known neglected tropical disease, its global incidence has risen sharply in recent years. Moreover, DENV infections may lead to serious and life-threatening diseases such as haemorrhagic fever and dengue shock syndrome. The DENV protease may therefore be a potential target for discovering anti-DENV agents. Interestingly, our initial attempts to set up QSAR models based solely on chemical descriptors from a range of different software packages and programs failed completely, since none of these yielded satisfactory statistical results. Hybrid QSAR models were also generated by combining chemical and biological descriptors. Notably, the predictive quality of the 2D-QSAR models improved significantly when using bioactivity descriptors alone or in combination with chemical descriptors. The comparative analysis carried out in this work shows that bioactivity descriptors can be useful for setting up predictive models to characterise complex biological activity data, albeit at the expense of mechanistic interpretation. At the same time, this work provides important guidelines for exploiting different linear and non-linear model development strategies in a systematic and consistent manner. Furthermore, it is based on non-commercial open-access tools, programs and webservers, so that the models can be reproduced and the proposed model development strategies can be followed easily and productively in the near future.
Soumya Mitra, Sumit Nandi, Amit Kumar Halder, M. Natalia D. S. Cordeiro

SMILES for QSPR/QSAR with Optimal Descriptors

Frontmatter
Chapter 6. QSPR Models for Prediction of Redox Potentials Using Optimal Descriptors
Abstract
The redox potential is an important physicochemical property widely used for the characterization of chemical species, and, as a characteristic constant of a given chemical species, it is also useful for predicting various other properties of the species. In this chapter, we review and discuss the pros and cons of QSPR models for the prediction of redox potentials using optimal descriptors calculated from SMILES, as well as the so-called hybrid descriptors calculated from both SMILES and molecular graphs of atomic orbitals.
Karel Nesměrák, Andrey A. Toropov
Chapter 7. Building Up QSPR for Polymers Endpoints by Using SMILES-Based Optimal Descriptors
Abstract
The general scheme of QSPR analysis of endpoints related to polymers is described. The basic idea of the approach is to build a model of a polymer as a mathematical function of the monomer structure represented by the simplified molecular input line-entry system (SMILES). The suitability of so-called hybrid optimal descriptors in QSPR analysis of polymer systems is suggested and discussed. QSPR models for glass transition temperature and refractive index are presented in detail. Possible ways of evolution of the QSPR for polymers are listed and discussed.
Valentin O. Kudyshkin, Alla P. Toropova

Quasi-SMILES for QSPR/QSAR

Frontmatter
Chapter 8. Quasi-SMILES-Based QSPR/QSAR Modeling
Abstract
Quantitative structure–property/activity relationships (QSPRs/QSARs) have been used to predict the physicochemical properties and biological activity of different substances, on the basis that the property or activity of a new or untested substance can be inferred from the molecular structure or other properties of similar compounds whose properties/activities have already been assessed. Traditional QSPR/QSAR models based on physicochemical properties and molecular information are less successful in predicting endpoints of substances such as nanomaterials, because data obtained under the same conditions are scarce. A new approach that uses eclectic information as descriptors to predict the endpoints of such materials was developed in the CORAL software (http://www.insilico.eu/coral). In this approach, the physicochemical properties and experimental conditions of a substance are represented by so-called quasi-SMILES, which are character-based representations derived from the traditional Simplified Molecular Input Line Entry System (SMILES). Thus, a main advantage of quasi-SMILES is that it increases the amount of usable data by incorporating eclectic data into quasi-SMILES-based QSPR/QSAR models. This chapter provides instructions on how to use the CORAL software for building QSPR/QSAR models based on quasi-SMILES.
Shahin Ahmadi, Neda Azimi
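
The way a quasi-SMILES string combines structure with experimental conditions can be sketched as follows. The bracketed code format, the bin edges, and the names `bin_code`, `quasi_smiles`, and `SCHEMES` are hypothetical choices made for this illustration; actual coding schemes are defined per study and are not specified in the abstract above.

```python
def bin_code(prefix, value, edges):
    """Map a numeric condition to a discrete code such as '[T1]'.

    The index is the number of bin edges the value has reached.
    """
    index = sum(value >= edge for edge in edges)
    return f"[{prefix}{index}]"

def quasi_smiles(smiles, conditions, schemes):
    """Append one condition code per experimental condition to a SMILES string."""
    parts = [smiles]
    for name in sorted(conditions):          # fixed order keeps strings comparable
        prefix, edges = schemes[name]
        parts.append(bin_code(prefix, conditions[name], edges))
    return "".join(parts)

# Hypothetical coding scheme: condition name -> (code prefix, bin edges).
SCHEMES = {
    "temperature_C": ("T", [25, 50, 75]),
    "size_nm": ("S", [10, 50]),
}

example = quasi_smiles("O=[Ti]=O", {"temperature_C": 37, "size_nm": 30}, SCHEMES)
print(example)
```

Discretizing continuous conditions into a small set of codes keeps the symbol vocabulary finite, so each code can be treated like any other attribute of the string when a model is built.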
Chapter 9. Quasi-SMILES-Based Mathematical Model for the Prediction of Percolation Threshold for Conductive Polymer Composites
Abstract
The traditional method for creating conductive polymer composites (CPCs) involves mixing carbon black, metal powder, or carbon fibre into a polymer matrix. Since the polymer matrix acts as an insulator, the conductivity of these composites can increase sharply once a threshold filler level is reached. This phenomenon is commonly called ‘percolation’. As the conductive filler content in the insulating polymer matrix increases, conductive routes form, and a rise in electrical conductivity is observed at a critical volume fraction Φ. That critical volume fraction Φ, responsible for the transition of the composite from insulating to conducting, is called the ‘percolation threshold’. Diverse curated experimental percolation threshold data for 45 conductive polymer composite systems were classified into four sets: A = active training set; P = passive training set; C = calibration set; V = validation set. The eclectic conditions of these systems are crucial for obtaining the desired properties: the mixing process (such as dry mixing, latex technology, and melt blending), the polymer matrix (such as high-density polyethylene (HDPE), low-density polyethylene (LDPE), maleic anhydride (MA), and polyamide (PA)), and the conducting filler (such as multi-wall carbon nanotube (MWNT), single-wall carbon nanotube (SWNT), and polyaniline (PANI)). Unique quasi-SMILES codes for the different CPCs were suggested, taking these eclectic conditions into consideration. These quasi-SMILES codes were the basis for building mathematical models for predicting the percolation threshold of CPCs.
Swayam Aryam Behera, Alla P. Toropova, Andrey A. Toropov, P. Ganga Raju Achary
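
The classification of records into the four sets named above (A, P, C, V) can be sketched as a seeded random split. The equal fractions and the function name `split_four` are assumptions for illustration; the abstract does not state how the 45 systems were distributed among the sets.

```python
import random

def split_four(items, fractions=(0.25, 0.25, 0.25, 0.25), seed=0):
    """Shuffle items and cut them into A, P, C and V sets.

    fractions are assumed proportions; the last set takes whatever remains,
    so every item is assigned exactly once.
    """
    labels = ("A", "P", "C", "V")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    sets, start, cumulative = {}, 0, 0.0
    for i, (label, frac) in enumerate(zip(labels, fractions)):
        cumulative += frac
        end = n if i == len(labels) - 1 else round(cumulative * n)
        sets[label] = shuffled[start:end]
        start = end
    return sets

# 45 placeholder record indices, matching the number of systems in the abstract.
splits = split_four(range(45))
print({label: len(members) for label, members in splits.items()})
```

Changing `seed` gives a different split, which is how several independent splits of one dataset can be generated and compared.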
Chapter 10. On the Possibility to Build up the QSAR Model of Different Kinds of Inhibitory Activity for a Large List of Human Intestinal Transporter Using Quasi-SMILES
Abstract
Membrane transporters play a significant role in pharmacokinetics and drug resistance and mediate many biological effects of substances. For biologically active chemicals, it is necessary to evaluate their transporter interaction profiles in order to identify potential medication candidates. The constraints and predictive capability of models for substances with heterogeneous physicochemistry and variable permeability/absorption are explored in this communication using the largest diverse permeability and absorption dataset, covering 3199 compounds. Here, we offer a classification-based QSAR model of different inhibitory activities for an extensive list of human intestinal transporters using quasi-SMILES. The classification-based models rest on the extraction of attributes from quasi-SMILES and the computation of so-called correlation weights for these attributes using Monte Carlo techniques. The classification model was tested using the statistical validation criteria sensitivity (= 0.86), specificity (= 1), accuracy (= 0.96), and Matthews correlation coefficient (MCC = 0.90). The computational experiments described confirm that applying the so-called Index of Ideality of Correlation can improve the predictive potential of the models.
P. Ganga Raju Achary, P. Kali Krishna, Alla P. Toropova, Andrey A. Toropov
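
The four validation criteria quoted above follow from a binary confusion matrix using their standard definitions, sketched below. The counts passed in are hypothetical, chosen only so that the rounded results land on the quoted values; the chapter's actual confusion matrix is not given in the abstract.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # true positive rate
    specificity = tn / (tn + fp)                   # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, not taken from the chapter.
metrics = classification_metrics(tp=6, tn=18, fp=0, fn=1)
print({name: round(value, 2) for name, value in metrics.items()})
```

MCC is reported alongside accuracy because it stays informative when the two classes have very different sizes.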
Chapter 11. Quasi-SMILES as a Tool for Peptide QSAR Modelling
Abstract
Peptides have played an important role for several decades in the discovery of new drugs in various areas involving hormones, antimicrobials, cytokines, etc. Peptides are a very suitable alternative to small molecules and biological therapeutics. Different modelling approaches can be applied to accelerate the design of peptide-based molecules. The simplified molecular input line entry system (SMILES) is a sequence of symbols used to describe the molecular structure of compounds. This notation helps in the development of QSAR models that describe the physicochemical properties of compounds. In contrast to SMILES, quasi-SMILES encodes information about both the molecular structure and the specific experimental conditions (biological and physicochemical). Quasi-SMILES uses eclectic information to build an extended representation of the data. It can represent peptides by the abbreviations of their constituent amino acids and can be applied in the field of peptide-based QSAR modelling. In this chapter, we discuss the different modelling approaches, including the quasi-SMILES approach, for the development of QSAR models of peptides. The different models and their success in peptide QSAR modelling are covered in detail.
Md. Moinul, Samima Khatun, Sk. Abdul Amin, Tarun Jha, Shovanlal Gayen

SMILES and Quasi-SMILES for QSPR/QSAR

Frontmatter
Chapter 12. SMILES and Quasi-SMILES Descriptors in QSAR/QSPR Modeling of Diverse Materials Properties in Safety and Environment Application
Abstract
A brief summary of QSAR/QSPR methodology, together with an explanation of the approach using SMILES and quasi-SMILES descriptors to study diverse hazardous characteristics of diverse materials, is given. Studies of several properties of importance to safety and environment application are described including (i) the cytotoxicity of heterogeneous single metal oxide-based engineered nanoparticles, (ii) the cytotoxicity of a series of metal oxide nanoparticles, (iii) the flammability properties of chemicals and their mixture, (iv) thermal hazards properties of ionic liquids and their mixture and (v) the toxicity of ionic liquids and their mixtures. The limitations and outlook of this field in safety and environment are discussed.
Yong Pan, Xin Zhang, Juncheng Jiang
Chapter 13. SMILES and Quasi-SMILES in QSAR Modeling for Prediction of Physicochemical and Biochemical Properties
Abstract
QSAR modeling of diverse physicochemical and biochemical properties of organic chemicals and nanomaterials using the simplified molecular-input line-entry system (SMILES) and quasi-SMILES representations is a popular approach nowadays. Along with SMILES, the quasi-SMILES approach offers the possibility of identifying and weighing the statistical importance of the various eclectic data available for computational systematization and analysis. Quasi-SMILES can therefore be helpful as a tool for drug design, environmental risk assessment, and regulation related to the use of nanomaterials and organic chemicals, since the method makes it possible to build the corresponding models. The Monte Carlo method is applied to build QSAR models from the information collected from SMILES and quasi-SMILES. The models can be freely developed using the open-access CORrelation And Logic (CORAL) software. Quasi-SMILES is well suited to complex chemical systems such as nanomaterials, since there is no limitation on the choice of eclectic data used to build a reliable, efficient, and predictive QSAR model. In this chapter, we discuss the fundamentals of SMILES- and quasi-SMILES-based QSAR models and their major applications in the prediction of physicochemical and biochemical properties.
Siyun Yang, Supratik Kar, Jerzy Leszczynski

Possible Ways of Nano-QSPR/Nano-QSAR Evolution

Frontmatter
Chapter 14. The CORAL Software as a Tool to Develop Models for Nanomaterials’ Endpoints
Abstract
This chapter discusses the evolution of the so-called quasi-SMILES. The traditional simplified molecular-input line-entry system (SMILES) is a string of characters conveying information about the structure of molecules. Quasi-SMILES is a string of characters that can convey codes reflecting the structure of molecules and the conditions for conducting chemical or biochemical experiments. Several examples demonstrate the similarity in reporting data on individual nanomaterials and data on two or more nanomaterials subjected to the same type of experiment. The possibility of gradual expansion of the scope of application of quasi-SMILES, as well as the possibility of using quasi-SMILES as input information for the CORAL software (abbreviation CORrelation And Logic) when building models of physicochemical and biochemical phenomena for nanomaterials, is shown.
Alla P. Toropova, Andrey A. Toropov
Chapter 15. Employing Quasi-SMILES Notation in Development of Nano-QSPR Models for Nanofluids
Abstract
Nowadays, various strategies are proposed and evaluated to find the best way of improving the accuracy of QSAR/QSPR modeling, particularly at the nanoscale. Nanofluids are among the most interesting systems because of their high potential in heat transfer applications. In nano-QSPR, empirical conditions and characteristic features (e.g., nanoparticle size and temperature) strongly influence the properties of nanofluids. The quasi-simplified molecular input-line entry system (quasi-SMILES) is proposed as a valuable linear notation for representing nanofluids, covering both chemical structure and defined conditions. Nano-QSPR modeling of nanofluids with quasi-SMILES not only makes it possible to combine molecular structure with experimental conditions in the modeling process but also reveals the influence of some molecular features on the thermophysical properties studied. Here, recent studies on the development of predictive models of nanofluids using quasi-SMILES, a new trend in estimating the properties of nanofluids, are discussed comprehensively. Notably, the statistical evaluation of the proposed models confirmed their predictive power and reliability in all reported cases. As scholars continue to work on improving QSAR/QSPR modeling, quasi-SMILES offers an opportunity to overcome the limitations of conventional molecular representation.
Kimia Jafari, Mohammad Hossein Fatemi

Possible Ways of QSPR/QSAR Evolution in the Future

Frontmatter
Chapter 16. On Complementary Approaches of Assessing the Predictive Potential of QSPR/QSAR Models
Abstract
This chapter covers an overview of recent studies performed to improve the statistical tools to assess and compare different QSPR/QSAR models. The critical analysis of existing approaches to assess the predictive potential is briefly presented. The disadvantages of the systems of self-consistent models are also discussed. The potential advantages of the systems of self-consistent models are defined. A series of successful applications of the approach for several endpoints are discussed in order to confirm the potential of the approach as a tool to validate QSAR models.
Andrey A. Toropov, Alla P. Toropova, Danuta Leszczynska, Jerzy Leszczynski
Chapter 17. CORAL: Predictions of Quality of Rice Based on Retention Index Using a Combination of Correlation Intensity Index and Consensus Modelling
Abstract
The purpose of this study is to utilize the Monte Carlo technique of CORAL software for establishing a quantitative structure-retention relationship (QSRR) for the retention indices of 136 primary flavour volatile organic molecules. SMILES notations of volatile organic compounds were used to compute the descriptor of correlation weight (DCW). Eight splits have been constructed from the dataset of 136 volatile organic chemicals, each of which was further divided into four sets: training, invisible training, calibration and validation. Two target functions, i.e. TF1 (\(\text{CII}_{\text{weight}} = 0.0\)) and TF2 (\(\text{CII}_{\text{weight}} = 0.3\)), were applied to build 16 QSRR models. All QSRR models were statistically good. The coefficient of determination derived by TF2 for the validation set of split 4 has the maximum statistical result (\(R^{2}_{\text{validation}} = 0.9532\)), hence it was accepted as the best model. The assignment of correlation intensity index (CII) on QSPR models was thoroughly examined and found to be more consistent and relevant. The common promoters of increase and decrease of endpoint were also extracted from four splits 1, 2, 3 and 4. Furthermore, consensus modelling using the split 4 architecture of dataset distribution enhances prediction accuracy by increasing the numerical value of \(R^{2}_{\text{validation}}\) from 0.9532 to 0.9864.
Parvin Kumar, Ashwani Kumar
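
The effect of consensus modelling on the validation statistic can be illustrated with a minimal sketch. It assumes that the consensus prediction is the plain average of the individual models' predictions and that the coefficient of determination is the squared Pearson correlation between observed and predicted values; the chapter may use a different consensus scheme. The numbers are contrived so that the two models' errors cancel.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return (sxy * sxy) / (sxx * syy)

def consensus(predictions):
    """Element-wise mean of several models' predictions (assumed scheme)."""
    return [mean(column) for column in zip(*predictions)]

# Contrived toy values, not data from the chapter.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 1.9, 3.3, 3.8]
model_b = [0.8, 2.1, 2.7, 4.2]
combined = consensus([model_a, model_b])

for name, pred in (("A", model_a), ("B", model_b), ("consensus", combined)):
    print(name, round(r_squared(observed, pred), 4))
```

Averaging helps here because the two models err in opposite directions; when models share the same bias, a consensus need not improve on the best single model.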
Backmatter
Metadata
Title
QSPR/QSAR Analysis Using SMILES and Quasi-SMILES
Edited by
Alla P. Toropova
Andrey A. Toropov
Copyright year
2023
Electronic ISBN
978-3-031-28401-4
Print ISBN
978-3-031-28400-7
DOI
https://doi.org/10.1007/978-3-031-28401-4
