Elsevier

Neural Networks

Volume 18, Issue 8, October 2005, Pages 1093-1110
Neural Networks

2005 Speical Issue
Graph kernels for chemical informatics

https://doi.org/10.1016/j.neunet.2005.07.009Get rights and content

Abstract

Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature reaching accuracies of 91.5% on the Mutag dataset, 65–67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.

Section snippets

Introduction: processing structured data

Many problems in artificial intelligence, data mining, and machine learning involve variable-size structured data, such as strings and sequences, trees, and directed or undirected graphs. These data are to be contrasted with the standard, fixed-length, vectorial data found also in many applications. Examples of structured data include: (1) text and documents in information retrieval; (2) DNA/RNA/protein sequences and evolutionary trees in bioinformatics; and (3) molecular structures in chemical

Kernel methods

Although the basic idea behind kernel methods is quite old (e.g. Kimeldorf & Wahba, 1971), over the last decade it has regained considerable interest spurred by the work of Boser et al., 1992, Cortes and Vapnik, 1995 on support vector machines. Many new kernel methods have been developed and/or successfully applied to challenging problems in recent years (Cristianini and Shawe-Taylor, 2000, Schölkopf and Smola, 2002).

In essence, kernel methods handle non-linear complex tasks using linear

Convolution kernels for chemistry

In this section, we develop classes of graph kernels that can be viewed as spectral kernels derived by counting and comparing walks in the underlying graphs. To circumvent the limitations posed by other graph kernels and address problems of molecular classification, we propose to use and extend the common technique of molecular fingerprinting (Flower, 1998, Raymond and Willett, 2001) used by chemists to build new feature vector representations of molecules. What differentiates our approach from

Data

We apply the kernels introduced in the previous section together with the Voted Perceptron classifier to the problems of predicting mutagenicity, toxicity, and anti-cancer activity on three different data sets (Mutag, PTC, and NCI).

Results

We conducted several experiments to compare the various classes of kernels with different parameter settings. Here, we report representative subsets of results together with the main findings. In addition to the usual accuracy measure, we also measure the ROC score, which is the normalized area under the ROC curve plotting the proportion of true positive (TP) predictions as a function of the proportion of false positive (FP) predictions, as the classification threshold is varied (Hanley &

Discussion

We have reviewed graph kernels and developed new graph kernels for chemical molecules that yield state-of-the art results on several datasets. These kernels measure the similarity between feature vectors, or molecular fingerprints, consisting of binary vectors or vectors of counts associated with all labeled paths of length less or equal to some value d derived by depth-first searches, starting from each vertex of a molecule. The accuracies obtained, for instance, 72% on the NCI dataset, are

Acknowledgements

Work supported by a Laurel Wilkening Faculty Innovation award, an NIH Biomedical Informatics Training grant (LM-07443-01), an NSF MRI grant (EIA-0321390), a Sun Microsystems award, a grant from the University of California Systemwide Biotechnology Research and Education Program to PB and an MD/PhD Harvey Fellowship to S.J.S. We would also like to acknowledge OpenEye Scientific Software for their free academic software license.

References (78)

  • G. Kimeldorf et al.

    Some results on Tchebycheffian spline functions

    Journal of Mathematical and Analytical Application

    (1971)
  • A. Srinivasan et al.

    Theories for mutagenicity: a study of first-order and feature based induction

    Artificial Intelligence

    (1996)
  • M. Aizerman et al.

    Theoretical foundations of the potential function method in pattern recognition learning

    Automation and Remote Control

    (1964)
  • F.R. Bach et al.

    Kernel independent component analysis

    Journal of Machine Learning Research

    (2002)
  • Chen, J., Swamidass, S. J., Dou, Y., Bruand, J., & Baldi, P. (2005). ChemDB: a public, database of small molecules and...
  • P. Baldi et al.

    Bioinformatics: the machine learning approach

    (2001)
  • P. Baldi et al.

    Hybrid modeling, HMM/NN architectures, and protein applications

    Neural Computation

    (1996)
  • P. Baldi et al.

    The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem

    Journal of Machine Learning Research

    (2003)
  • Baldi, P., & Rosen-Zvi, M. On the relationship between deterministic and probabilistic directed graphical models: from...
  • B. Boser et al.

    A training algorithm for optimal margin classifiers

    (1992)
  • D. Cherqaoui et al.

    Use of neural network to determine the boiling point of alkanes

    Journal Chemistry Society, Faraday Transactions

    (1994)
  • P.Y. Chou et al.

    Prediction of the secondary structure of proteins from their amino acid sequence

    Advances Enzymol Related Areas Monecular Biology

    (1978)
  • M. Collins et al.

    Convolution Kernels for Natural Language

    (2002)
  • C. Cortes et al.

    Support vector networks

    Machine Learning

    (1995)
  • N. Cristianini et al.

    An Introduction to Support Vector Mechines and other Kernel-Based Learning Methods

    (2000)
  • A.K. Debnath et al.

    Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity

    Journal of Medicinal Chemistry

    (1991)
  • C.M. Dobson

    Chemical space and biology

    Nature

    (2004)
  • Dumais, S., Platt, J., Heckerman D., & Sahami, M. (1998). Inductive learning algorithms and representations for text...
  • M.A. Fligner et al.

    A Modification of the Jaccard/ Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings

    Technometrics,

    (2002)
  • D.R. Flower

    On the properties of bit string-based measures of chemical similarity

    Journal of Chemical Information and Computer Science

    (1998)
  • P. Frasconi et al.

    A general framework for adaptive processing of data structures

    IEEE Transactions on Neural Networks

    (1998)
  • Y. Freund et al.

    Large margin classification using the perceptron algorithm

    Machine Learning

    (1999)
  • B.J. Frey

    Graphical models for machine learning and digital communication

    (1998)
  • T. Friess et al.

    The Kernel-Adatron Algorithm: a Fast and Simple Learning Procedure for Support Vector Machines

  • Gärtner, T. (2003). Exponential and Geomteric Kernels for Graphs. NIPS Workshop on Unreal Data: Principles of Modeling...
  • Gärtner, T., Flach, P.A. & Wrobel, S. (2003). On Graph Kernels: Hardness Results and Efficient Alternatives. In Proc....
  • J. Gasteiger et al.

    Chemical information in 3D-space

    Journal of Chemical Information and Computer Sciences

    (1996)
  • C. Goller et al.

    Learning task-dependent distributed structure-representations by backpropagation through structure

    IEEE International Conference on Neural Networks, pages

    (1996)
  • J.C. Gower

    A general coefficient of similarity and some of its properties

    Biometrics

    (1971)
  • J.C. Gower et al.

    Metric and euclidean properties of dissimilarity coefficients

    Journal of Classification

    (1986)
  • D. Hadjipavlou-Litina et al.

    Quantitative structure-activity relationship of the benzodiazepines. A review and reevaluation

    Chemical Reviews

    (1994)
  • J.A. Hanley et al.

    The meaning and use of the area under a receiver operating characteristic (roc) curve

    Radiology

    (1982)
  • Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical Report UCS-CRL-99-10, UC Santa...
  • D. Heckerman

    A tutorial on learning with Bayesian networks

  • C. Helma et al.

    The predictive toxicology challenge 2000-2001

    Bioinformatics

    (2001)
  • R. Herbrich

    Learning kernel classifiers, theory and algorithms

    (2002)
  • Horváth, T., Gärtner, T., & Wrobel, S. (2004). Cyclic pattern kernels for predictive graph mining. In Proc. of the 2004...
  • T. Jaakkola et al.

    Exploiting generative models in discriminative classifiers

  • James, C.A., Weininger, D., & Delany, J. (2004). Daylight theory manual. Available at...
  • Cited by (0)

    View full text