Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools☆
Introduction
The discovery and optimization of therapeutic agents with desirable pharmacodynamic, pharmacokinetic and toxicological properties is the key focus of drug development efforts [1]. Predictive tools that accurately assess pharmacokinetic, toxicological and pharmacodynamic properties in early development stages are highly useful for increasing productivity in drug discovery [1], [2], [3]. As part of the effort to develop these tools, computational methods have been developed and improved for predicting compound absorption, distribution, metabolism, and excretion (ADME) properties [4], [5]. In particular, machine learning (ML) methods have shown promise in predicting ADME properties by correlating these properties with molecular features and by establishing complex structure–property relationships across diverse molecular structures and mechanisms [6], [7].
More recently, efforts have been directed at developing and refining ML models for improved prediction and more extensive coverage of various ADME properties, particularly excretion [8], [9], [10] and distribution [11], [12] properties, and for predicting the regulators of drug metabolism [13], [14], [15], [16] and excretion [8] implicated in drug–drug interactions and multi-drug resistance, respectively. Efforts have also been made to further explore consensus modeling for improved prediction of the ADME and ADME regulatory properties of drug candidates [8], [13]. Moreover, online ML servers for predicting ADME and ADME regulatory properties have emerged [15], [17]. Here we review this progress and discuss the performance, application prospects and challenges of exploring ML methods as tools for predicting ADME and ADME regulatory properties.
Molecular descriptors for representing compounds in ADME prediction
Molecular descriptors are widely used to represent the structural and physicochemical properties of compounds, as derived from their molecular structures. The compounds associated with a specific ADME property are typically of high structural and mechanistic diversity. Therefore, the prediction of various ADME properties requires different sets of molecular descriptors that adequately cover the relevant molecular features. More than 3000 molecular descriptors can be computed from such
Commonly used machine learning methods for developing classification models
A number of ML methods have been used for developing ADME predictive tools. These include Linear Discriminant Analysis (LDA), k Nearest Neighbor (kNN), Artificial Neural Network (ANN), Probabilistic Neural Network (PNN), Support Vector Machine (SVM), Decision Tree (DT), Recursive Partitioning (RP), Random Forest (RF), Naïve Bayesian (NB), Multiple Linear Regression (MLR), Partial Least Squares Regression (PLSR), kNN Regression (kNNR), Support Vector Regression (SVR), Random Forest Regression
The exploration of machine learning classification methods for predicting ADME properties
ML classification methods assign each compound to one of two opposing classes, one associated with a property (e.g. an ADME property) and the other not associated with it. Because of their ability to classify compounds with diverse structures and physicochemical properties, ML classification methods have been extensively explored for predicting various ADME properties that are typically associated with compounds of diverse structures (e.g. substrates of a drug
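The two-class setup described above can be illustrated with a minimal k nearest neighbor (kNN) sketch. It is only an illustration under stated assumptions: the function name `knn_classify`, the two-element descriptor vectors, and the labels (1 for compounds with the property, 0 for compounds without it) are hypothetical and are not taken from any of the reviewed studies.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign `query` to class 1 (has the property) or 0 (does not).

    `train` is a list of (descriptor_vector, label) pairs. The label of
    the query is the majority label among its k nearest training
    compounds, measured by Euclidean distance in descriptor space.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled descriptor vectors (illustrative values only).
train = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((1.0, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.0, 0.3), 0),
]

print(knn_classify(train, (0.85, 0.85)))
print(knn_classify(train, (0.10, 0.10)))
```

An odd `k` avoids tied votes in the two-class case. In practice the descriptor vectors would be far longer and would normally be scaled before distances are computed.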
The exploration of machine learning classification methods for predicting ADME regulatory properties
ML classification methods have also been extensively used for predicting regulators of drug ADME properties, particularly the inhibitors of drug efflux and influx transporters for regulating multi-drug resistance (Table 3) [8], [64], [65] and the inhibitors of drug metabolism enzymes for assessing drug–drug interactions (Table 4) [13], [14], [66]. These studies have primarily focused on the extended coverage of drug transporters (9 transporters) [8] and metabolism enzymes (5 CYP enzymes CYP
The exploration of machine learning regression methods for predicting ADME and ADME regulatory properties
ML regression methods are intended for estimating the affinity/activity level in addition to the determination of whether or not a compound possesses or regulates a specific ADME property. Table 5 summarises the performance of the recently developed ML regression methods for predicting the affinity/activity level of ADME and ADME regulatory properties. Partly because of the limited availability of experimental affinity/activity levels, ML regression models have been developed for a limited
The trends in the development of machine learning models for predicting ADME and ADME regulatory properties
There are noticeable trends in the recent efforts to develop ML models for predicting ADME and ADME regulatory properties. In developing ML classification models for these properties, three ML methods, support vector machine (SVM, 38 models), random forest (RF, 27 models) and k nearest neighbor (kNN, 25 models), have been used more frequently than other ML regression methods (4 models). These three methods have also been used for developing all the consensus ML
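One simple way to combine several classifiers into a consensus prediction is a majority vote, sketched below. This excerpt does not state which consensus rule the reviewed studies used, so the voting rule, the function name `consensus_predict`, and the example model outputs are all assumptions made for illustration.

```python
def consensus_predict(votes):
    """Return 1 if a strict majority of the individual models predict 1.

    `votes` is a list of 0/1 predictions, one per model (for example the
    outputs of an SVM, an RF and a kNN model for the same compound).
    Ties are resolved conservatively as 0.
    """
    if not votes:
        raise ValueError("at least one model prediction is required")
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical predictions for one compound from three separate models.
model_outputs = {"SVM": 1, "RF": 1, "kNN": 0}
print(consensus_predict(list(model_outputs.values())))
```

Using an odd number of models avoids ties altogether. Weighted voting or averaging of predicted probabilities are common alternatives.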
Application scope of the developed machine learning models
The recently and previously [79] developed ML classification models broadly cover compound metabolism (by 6 different CYP enzymes) [79], efflux (by 6 different transporters) [8] and influx (by 4 different transporters) [8] with reasonably good predictive accuracy. The sensitivities (SEs), specificities (SPs) and overall accuracies (ACs) of the majority of the ML classification models are in the ranges of 74%–92%, 66%–76% and 72%–92%, respectively. The SEs are close to, but the SPs are substantially lower than, the SEs (~90%) and SPs (~90%) of ML virtual
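The three performance measures quoted above can be computed from a model's predictions as in the following sketch. It assumes that SE, SP and AC denote sensitivity, specificity and overall accuracy in their standard confusion-matrix definitions. The function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (SE, SP, AC) for binary labels, where 1 is the positive class.

    SE = TP / (TP + FN)   fraction of true positives correctly identified
    SP = TN / (TN + FP)   fraction of true negatives correctly identified
    AC = (TP + TN) / N    fraction of all compounds correctly classified
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ac = (tp + tn) / len(pairs)
    return se, sp, ac


# Hypothetical test set: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
se, sp, ac = classification_metrics(y_true, y_pred)
print(f"SE={se:.2f} SP={sp:.2f} AC={ac:.2f}")
```

Reporting SE and SP separately matters when the two classes differ in size, because AC alone can hide poor performance on the smaller class.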
Challenges in the exploration of machine learning methods
The performance of ML methods critically depends on the diversity and representativeness of the compounds in the training datasets and on the appropriate representation of their structural and physicochemical properties. The training datasets used in most of the ML models described in Tables 2, 3, 4 and 5 are not expected to be fully representative of the compounds associated with each specific ADME property. This is particularly true for compounds not possessing a specific ADME property,
Perspectives
Both classification-based and regression-based ML methods have consistently shown promising capability in predicting a variety of ADME and ADME regulatory properties for diverse ranges of structures at accuracy levels comparable to those practically used in drug lead discovery and optimization, making the developed ADME and ADME regulatory prediction models potentially useful tools for assessing ADME properties and predicting ADME regulatory properties. In spite of the significant efforts, the
Acknowledgements
We acknowledge the support of the Major State Basic Research Development Program of China (2013CB967204) and the Singapore Academic Research Fund (R148000181112).
References (93)
- Present and future in vitro approaches for drug metabolism. J. Pharmacol. Toxicol. Methods (2000)
- A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. Comput. Biol. Med. (2011)
- Deriving the 3D structure of organic molecules from their infrared spectra. Vib. Spectrosc. (1999)
- Graph theoretical approach to local and overall aromaticity of benzenoid hydrocarbons. Tetrahedron (1975)
- Understanding and using genetic algorithms. Part 1. Concepts, properties and context. Chemom. Intell. Lab. Syst. (1993)
- Comparison of forward selection, backward elimination, and generalized simulated annealing for variable selection. Microchem. J. (1993)
- A method for quantifying and visualizing the diversity of QSAR models. J. Mol. Graph. Model. (2004)
- Probabilistic neural networks. Neural Netw. (1990)
- PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. (2001)
- In silico prediction of unbound brain-to-plasma concentration ratio using machine learning algorithms. J. Mol. Graph. Model. (2011)
- Fingerprint-based in silico models for the prediction of P-glycoprotein substrates and inhibitors. Bioorg. Med. Chem.
- Exploration of (S)-3-aminopyrrolidine as a potentially interesting scaffold for discovery of novel Abl and PI3K dual inhibitors. Eur. J. Med. Chem.
- Heuman indices of hydrophobicity of bile acids and their comparison with a newly developed and conventional molecular descriptors. Biochimie
- Drug discovery: a historical perspective. Science
- High-throughput screening in drug metabolism and pharmacokinetic support of drug discovery. Annu. Rev. Pharmacol. Toxicol.
- ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov.
- Prediction of drug disposition on the basis of its chemical structure. Clin. Pharmacokinet.
- Support vector machines for ADME property classification. QSAR Comb. Sci.
- The use of machine learning and nonlinear statistical tools for ADME prediction. Expert Opin. Drug Metab. Toxicol.
- Human intestinal transporter database: QSAR modeling and virtual profiling of drug uptake, efflux and interactions. Pharm. Res.
- Development of conformation independent computational models for the early recognition of breast cancer resistance protein substrates. Biomed Res. Int.
- Quantitative structure–activity relationship models of clinical pharmacokinetics: clearance and volume of distribution. J. Chem. Inf. Model.
- Prediction of human volume of distribution values for drugs using linear and nonlinear quantitative structure pharmacokinetic relationship models. Interdiscip. Sci.
- Classification of cytochrome P450 inhibitors and noninhibitors using combined classifiers. J. Chem. Inf. Model.
- Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. J. Chem. Inf. Model.
- WhichCyp: prediction of cytochromes P450 inhibition. Bioinformatics
- A unified proteochemometric model for prediction of inhibition of cytochrome p450 isoforms. PLoS One
- Evaluation of drug–human serum albumin binding interactions with support vector machine aided online automated docking. Bioinformatics
- DRAGON
- Virtual computational chemistry laboratory – design and description. J. Comput. Aided Mol. Des.
- Molconn-Z, in, eduSoft, LC
- JOELib/JOELib2
- MODEL-molecular descriptor lab: a web-based server for computing structural and physicochemical features of compounds. Biotechnol. Bioeng.
- PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem.
- Counts of all walks as atomic and molecular descriptors. J. Chem. Inf. Comput. Sci.
- The coding of the three-dimensional structure of molecules by molecular transforms and its application to structure–spectra correlations and studies of biological activity. J. Chem. Inf. Comput. Sci.
- Metric validation and the receptor-relevant subspace concept. J. Chem. Inf. Comput. Sci.
- MS-WHIM, new 3D theoretical descriptors derived from molecular surface properties: a comparative 3D QSAR study in a series of steroids. J. Comput. Aided Mol. Des.
- Charge indexes. New topological descriptors. J. Chem. Inf. Comput. Sci.
- Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 1. Theory of the novel 3D molecular descriptors. J. Chem. Inf. Comput. Sci.
- Molecular profiles. Novel geometry-dependent molecular descriptors. New J. Chem.
- Molecular Structure Description: The Electrotopological State
- Estimation of molecular free energy relation descriptors using a group contribution approach. J. Chem. Inf. Comput. Sci.
- QSAR analysis of drug excretion into human breast milk. J. Clin. Hosp. Pharm.
- Gene selection for cancer classification using support vector machines. Mach. Learn.
- Prediction of P-glycoprotein substrates by a support vector machine approach. J. Chem. Inf. Comput. Sci.
☆ This review is part of the Advanced Drug Delivery Reviews theme issue on “In silico ADMET predictions in pharmaceutical research”.