Graph kernels for chemical informatics

doi:10.1016/j.neunet.2005.07.009

Neural Networks

Volume 18, Issue 8, October 2005, Pages 1093-1110

https://doi.org/10.1016/j.neunet.2005.07.009 Get rights and content

Abstract

Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature reaching accuracies of 91.5% on the Mutag dataset, 65–67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.

Section snippets

Introduction: processing structured data

Many problems in artificial intelligence, data mining, and machine learning involve variable-size structured data, such as strings and sequences, trees, and directed or undirected graphs. These data are to be contrasted with the standard, fixed-length, vectorial data found also in many applications. Examples of structured data include: (1) text and documents in information retrieval; (2) DNA/RNA/protein sequences and evolutionary trees in bioinformatics; and (3) molecular structures in chemical

Kernel methods

Although the basic idea behind kernel methods is quite old (e.g. Kimeldorf & Wahba, 1971), over the last decade it has regained considerable interest spurred by the work of Boser et al., 1992, Cortes and Vapnik, 1995 on support vector machines. Many new kernel methods have been developed and/or successfully applied to challenging problems in recent years (Cristianini and Shawe-Taylor, 2000, Schölkopf and Smola, 2002).

In essence, kernel methods handle non-linear complex tasks using linear

Convolution kernels for chemistry

In this section, we develop classes of graph kernels that can be viewed as spectral kernels derived by counting and comparing walks in the underlying graphs. To circumvent the limitations posed by other graph kernels and address problems of molecular classification, we propose to use and extend the common technique of molecular fingerprinting (Flower, 1998, Raymond and Willett, 2001) used by chemists to build new feature vector representations of molecules. What differentiates our approach from

Data

We apply the kernels introduced in the previous section together with the Voted Perceptron classifier to the problems of predicting mutagenicity, toxicity, and anti-cancer activity on three different data sets (Mutag, PTC, and NCI).

Results

We conducted several experiments to compare the various classes of kernels with different parameter settings. Here, we report representative subsets of results together with the main findings. In addition to the usual accuracy measure, we also measure the ROC score, which is the normalized area under the ROC curve plotting the proportion of true positive (TP) predictions as a function of the proportion of false positive (FP) predictions, as the classification threshold is varied (Hanley &

Discussion

We have reviewed graph kernels and developed new graph kernels for chemical molecules that yield state-of-the art results on several datasets. These kernels measure the similarity between feature vectors, or molecular fingerprints, consisting of binary vectors or vectors of counts associated with all labeled paths of length less or equal to some value d derived by depth-first searches, starting from each vertex of a molecule. The accuracies obtained, for instance, 72% on the NCI dataset, are

Acknowledgements

Work supported by a Laurel Wilkening Faculty Innovation award, an NIH Biomedical Informatics Training grant (LM-07443-01), an NSF MRI grant (EIA-0321390), a Sun Microsystems award, a grant from the University of California Systemwide Biotechnology Research and Education Program to PB and an MD/PhD Harvey Fellowship to S.J.S. We would also like to acknowledge OpenEye Scientific Software for their free academic software license.

References (78)

G. Kimeldorf et al.
Some results on Tchebycheffian spline functions
Journal of Mathematical and Analytical Application
(1971)
A. Srinivasan et al.
Theories for mutagenicity: a study of first-order and feature based induction
Artificial Intelligence
(1996)
M. Aizerman et al.
Theoretical foundations of the potential function method in pattern recognition learning
Automation and Remote Control
(1964)
F.R. Bach et al.
Kernel independent component analysis
Journal of Machine Learning Research
(2002)
Chen, J., Swamidass, S. J., Dou, Y., Bruand, J., & Baldi, P. (2005). ChemDB: a public, database of small molecules and...
P. Baldi et al.
Bioinformatics: the machine learning approach
(2001)
P. Baldi et al.
Hybrid modeling, HMM/NN architectures, and protein applications
Neural Computation
(1996)
P. Baldi et al.
The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem
Journal of Machine Learning Research
(2003)
Baldi, P., & Rosen-Zvi, M. On the relationship between deterministic and probabilistic directed graphical models: from...
B. Boser et al.
A training algorithm for optimal margin classifiers
(1992)

D. Cherqaoui et al.

Use of neural network to determine the boiling point of alkanes

Journal Chemistry Society, Faraday Transactions

(1994)

P.Y. Chou et al.

Prediction of the secondary structure of proteins from their amino acid sequence

Advances Enzymol Related Areas Monecular Biology

(1978)

M. Collins et al.

Convolution Kernels for Natural Language

(2002)

C. Cortes et al.

Support vector networks

Machine Learning

(1995)

N. Cristianini et al.

An Introduction to Support Vector Mechines and other Kernel-Based Learning Methods

(2000)

A.K. Debnath et al.

Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity

Journal of Medicinal Chemistry

(1991)

C.M. Dobson

Chemical space and biology

Nature

(2004)

Dumais, S., Platt, J., Heckerman D., & Sahami, M. (1998). Inductive learning algorithms and representations for text...

M.A. Fligner et al.

A Modification of the Jaccard/ Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings

Technometrics,

(2002)

D.R. Flower

On the properties of bit string-based measures of chemical similarity

Journal of Chemical Information and Computer Science

(1998)

P. Frasconi et al.

A general framework for adaptive processing of data structures

IEEE Transactions on Neural Networks

(1998)

Y. Freund et al.

Large margin classification using the perceptron algorithm

Machine Learning

(1999)

B.J. Frey

Graphical models for machine learning and digital communication

(1998)

T. Friess et al.

The Kernel-Adatron Algorithm: a Fast and Simple Learning Procedure for Support Vector Machines

Gärtner, T. (2003). Exponential and Geomteric Kernels for Graphs. NIPS Workshop on Unreal Data: Principles of Modeling...

Gärtner, T., Flach, P.A. & Wrobel, S. (2003). On Graph Kernels: Hardness Results and Efficient Alternatives. In Proc....

J. Gasteiger et al.

Chemical information in 3D-space

Journal of Chemical Information and Computer Sciences

(1996)

C. Goller et al.

Learning task-dependent distributed structure-representations by backpropagation through structure

IEEE International Conference on Neural Networks, pages

(1996)

J.C. Gower

A general coefficient of similarity and some of its properties

Biometrics

(1971)

J.C. Gower et al.

Metric and euclidean properties of dissimilarity coefficients

Journal of Classification

(1986)

D. Hadjipavlou-Litina et al.

Quantitative structure-activity relationship of the benzodiazepines. A review and reevaluation

Chemical Reviews

(1994)

J.A. Hanley et al.

The meaning and use of the area under a receiver operating characteristic (roc) curve

Radiology

(1982)

Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical Report UCS-CRL-99-10, UC Santa...

D. Heckerman

A tutorial on learning with Bayesian networks

C. Helma et al.

The predictive toxicology challenge 2000-2001

Bioinformatics

(2001)

R. Herbrich

Learning kernel classifiers, theory and algorithms

(2002)

Horváth, T., Gärtner, T., & Wrobel, S. (2004). Cyclic pattern kernels for predictive graph mining. In Proc. of the 2004...

T. Jaakkola et al.

Exploiting generative models in discriminative classifiers

James, C.A., Weininger, D., & Delany, J. (2004). Daylight theory manual. Available at...

Cited by (0)

View full text

2005 Speical IssueGraph kernels for chemical informatics

Abstract

Section snippets

Introduction: processing structured data

Kernel methods

Convolution kernels for chemistry

Data

Results

Discussion

Acknowledgements

Journal of Mathematical and Analytical Application

Artificial Intelligence

Theoretical foundations of the potential function method in pattern recognition learning

Automation and Remote Control

Kernel independent component analysis

Journal of Machine Learning Research

Bioinformatics: the machine learning approach

Hybrid modeling, HMM/NN architectures, and protein applications

Neural Computation

The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem

Journal of Machine Learning Research

A training algorithm for optimal margin classifiers

Use of neural network to determine the boiling point of alkanes

Journal Chemistry Society, Faraday Transactions

Prediction of the secondary structure of proteins from their amino acid sequence

Advances Enzymol Related Areas Monecular Biology

Convolution Kernels for Natural Language

Support vector networks

Machine Learning

An Introduction to Support Vector Mechines and other Kernel-Based Learning Methods

Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity

Journal of Medicinal Chemistry

Chemical space and biology

Nature

A Modification of the Jaccard/ Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings

Technometrics,

On the properties of bit string-based measures of chemical similarity

Journal of Chemical Information and Computer Science

A general framework for adaptive processing of data structures

IEEE Transactions on Neural Networks

Large margin classification using the perceptron algorithm

Machine Learning

Graphical models for machine learning and digital communication

The Kernel-Adatron Algorithm: a Fast and Simple Learning Procedure for Support Vector Machines

Chemical information in 3D-space

Journal of Chemical Information and Computer Sciences

Learning task-dependent distributed structure-representations by backpropagation through structure

IEEE International Conference on Neural Networks, pages

A general coefficient of similarity and some of its properties

Biometrics

Metric and euclidean properties of dissimilarity coefficients

Journal of Classification

Quantitative structure-activity relationship of the benzodiazepines. A review and reevaluation

Chemical Reviews

The meaning and use of the area under a receiver operating characteristic (roc) curve

Radiology

A tutorial on learning with Bayesian networks

The predictive toxicology challenge 2000-2001

Bioinformatics

Learning kernel classifiers, theory and algorithms

Exploiting generative models in discriminative classifiers

2005 Speical Issue
Graph kernels for chemical informatics