1 Introduction
- Feature maps of composed kernels. We review closure properties of kernels, the corresponding feature maps, and the size and sparsity of the feature vectors. Based on this, we obtain explicit feature maps for convolution kernels with arbitrary base kernels. This generalizes the result of the conference paper, where only binary base kernels were considered.
- Construction of explicit feature maps. We derive explicit feature maps for weighted vertex kernels and the shortest-path kernel (Borgwardt and Kriegel 2005), supporting base kernels with explicit feature maps for the comparison of attributes. We prove approximation guarantees in the case of approximative feature maps of the base kernels. This contribution is not contained in the conference paper, where only the explicit computation of the shortest-path kernel for graphs with discrete labels was discussed.
- Fixed length walk kernels. We generalize the explicit computation scheme to support arbitrary vertex and edge kernels with explicit feature maps for the comparison of attributes; in the conference paper, only binary kernels were considered. Moreover, we have significantly expanded the section on walk kernels by spelling out all proofs, adding illustrative figures, and clarifying the relation to the k-step random walk kernel as defined by Sugiyama and Borgwardt (2015).
- Experimental evaluation. We have largely extended our evaluation, which now includes experiments for the novel computation schemes of graph kernels as well as a comparison between a graphlet kernel and the subgraph matching kernel (Kriege and Mutzel 2012).
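As a toy illustration of the explicit/implicit dichotomy for fixed-length walk kernels, the following sketch computes the same kernel value once via an explicit feature map (counts of label sequences along walks) and once implicitly via the direct product graph, here with a Dirac kernel on discrete vertex labels. The graphs and all function names are made up for this sketch and are not the paper's implementation.

```python
from collections import Counter

def walk_features(adj, labels, length):
    # Explicit feature map: counts of the label sequences seen along
    # all walks with `length` edges (edge labels omitted for brevity).
    feats = Counter()
    def extend(v, seq, steps):
        if steps == 0:
            feats[seq] += 1
        else:
            for u in adj[v]:
                extend(u, seq + (labels[u],), steps - 1)
    for v in range(len(adj)):
        extend(v, (labels[v],), length)
    return feats

def walk_kernel_explicit(g1, g2, length):
    f1, f2 = walk_features(*g1, length), walk_features(*g2, length)
    return sum(c * f2[s] for s, c in f1.items())

def walk_kernel_implicit(g1, g2, length):
    # Implicit computation: count walks of the given length in the
    # direct product graph (Dirac kernel on vertex labels).
    (a1, l1), (a2, l2) = g1, g2
    verts = [(v, w) for v in range(len(a1)) for w in range(len(a2))
             if l1[v] == l2[w]]
    nbrs = {p: [q for q in verts if q[0] in a1[p[0]] and q[1] in a2[p[1]]]
            for p in verts}
    def count(p, steps):
        return 1 if steps == 0 else sum(count(q, steps - 1) for q in nbrs[p])
    return sum(count(p, length) for p in verts)

# Toy graphs as (adjacency lists, vertex labels) -- made-up data
g1 = ([[1], [0, 2], [1]], ['a', 'b', 'a'])        # path a-b-a
g2 = ([[1, 2], [0, 2], [0, 1]], ['a', 'b', 'b'])  # triangle a-b-b

assert walk_kernel_explicit(g1, g2, 2) == walk_kernel_implicit(g1, g2, 2) == 16
```

The brute-force enumeration is exponential in the walk length, matching the \(\mathcal{O}(\varDelta^\ell)\) number of parts; the paper's schemes avoid this blow-up.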
2 Related work
2.1 Graph kernels
2.2 Embedding techniques for attributed graphs
3 Preliminaries
4 Basic kernels, composed kernels and their feature maps
Kernel | Feature map | Dimension | Sparsity |
---|---|---|---|
\(k^{\alpha }(x,y) = \alpha k(x,y)\) | \(\phi ^{\alpha }(x) = \sqrt{\alpha }\phi (x)\) | d | \(\mathsf {nnz}(\phi (x))\) |
\(k^{+}(x,y) = \sum _{i =1}^{D}k_i(x,y)\) | \(\phi ^{+}(x) = \bigoplus _{i=1}^D \phi _i(x)\) | \(\sum ^D_{i=1} d_i\) | \(\sum _{i=1}^{D} \mathsf {nnz}(\phi _i(x))\) |
\(k^{\bullet }(x,y) = \prod _{i =1}^{D}k_i(x,y)\) | \(\phi ^{\bullet }(x) = \bigotimes _{i=1}^D \phi _i(x)\) | \(\prod ^D_{i=1} d_i\) | \(\prod _{i=1}^{D} \mathsf {nnz}(\phi _i(x))\) |
\(k^\times (X,Y) = \sum _{x \in X} \sum _{y \in Y} k(x,y)\) | \(\phi ^{\times }(X) = \sum _{x \in X} \phi (x)\) | d | \(\left| \bigcup _{x \in X} \mathsf {nz}(\phi (x))\right| \) |
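The constructions in the table can be checked numerically: scaling multiplies the feature map by \(\sqrt{\alpha}\), summation concatenates feature vectors, multiplication takes their tensor product, and the set (cross product) kernel sums the feature vectors of the members. A minimal sketch with made-up base feature maps \(\phi_1, \phi_2\) (all names illustrative):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical base feature maps, for illustration only
phi_1 = lambda x: [x, x * x]             # dimension d_1 = 2
phi_2 = lambda x: [1.0, x, 2.0 * x]      # dimension d_2 = 3
k_1 = lambda x, y: dot(phi_1(x), phi_1(y))
k_2 = lambda x, y: dot(phi_2(x), phi_2(y))

alpha = 3.0
phi_scale = lambda x: [math.sqrt(alpha) * a for a in phi_1(x)]     # scaling
phi_sum   = lambda x: phi_1(x) + phi_2(x)                          # concatenation
phi_prod  = lambda x: [a * b for a in phi_1(x) for b in phi_2(x)]  # tensor product
phi_set   = lambda X: [sum(col) for col in zip(*(phi_1(x) for x in X))]

x, y = 2.0, 5.0
assert math.isclose(dot(phi_scale(x), phi_scale(y)), alpha * k_1(x, y))
assert math.isclose(dot(phi_sum(x), phi_sum(y)), k_1(x, y) + k_2(x, y))
assert math.isclose(dot(phi_prod(x), phi_prod(y)), k_1(x, y) * k_2(x, y))
X, Y = [1.0, 2.0], [3.0, 4.0, 5.0]
assert math.isclose(dot(phi_set(X), phi_set(Y)),
                    sum(k_1(a, b) for a in X for b in Y))
```

Note how the dimensions behave as in the table: `phi_sum` has dimension \(d_1 + d_2 = 5\), while `phi_prod` has dimension \(d_1 d_2 = 6\).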
4.1 Dirac and binary kernels
4.2 Closure properties
4.3 Kernels on sets
4.4 Convolution kernels
5 Computing graph kernels by explicit and implicit feature maps
5.1 Weighted vertex kernels
5.1.1 Weight kernels
5.1.2 Vertex kernels
5.1.3 Computing explicit feature maps
5.2 Fixed length walk kernels
5.2.1 Basic definitions
5.2.2 Walk and convolution kernels
5.2.3 Implicit kernel computation
5.2.4 Explicit kernel computation
5.3 Shortest-path kernel
5.4 Graphlet, subgraph and subgraph matching kernels
Graph kernel | Parts | Dimension | Running time (implicit) |
---|---|---|---|
GraphHopper | \(\mathcal {O}(n)\) | \(\delta ^2 d_V\) | \(\mathcal {O}\left( n^2(m + \log n + {\mathsf {T}}_{V} + \delta ^2 )\right) \) |
GraphInvariant | \(\mathcal {O}(n)\) | \(C d_V\) | \(\mathcal {O}\left( hm + n^2 {\mathsf {T}}_{V}\right) \) |
FixedLengthWalk | \(\mathcal {O}(\varDelta ^\ell )\) | \(d_V+(d_V d_E)^\ell \) | \(\mathcal {O}\left( \ell (n^2+m^2) + n^2{\mathsf {T}}_{V}+m^2{\mathsf {T}}_{E}\right) \) |
ShortestPath | \(\mathcal {O}(n^2)\) | \(d_V^2 d_E\) | \(\mathcal {O}\left( n^2{\mathsf {T}}_{V}+n^4{\mathsf {T}}_{E}\right) \) |
SubgraphMatching | \(\mathcal {O}(n^s)\) | \((d_V d_E^2)^s s!\) | \(\mathcal {O}\left( sn^{2s+2} + n^2{\mathsf {T}}_{V}+n^4{\mathsf {T}}_{E} \right) \) |
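The dimensions above can be huge (e.g., \((d_V d_E^2)^s s!\)) while each individual feature vector remains sparse, so implementations typically store only the non-zero components, e.g., in a hash map. A minimal sketch of such sparse storage and the resulting kernel evaluation (the feature indices are hypothetical):

```python
def sparse_dot(u, v):
    # Kernel value from two sparse feature vectors stored as dicts
    # mapping feature index (e.g. a label triple) -> count.
    if len(u) > len(v):
        u, v = v, u  # iterate over the sparser vector
    return sum(c * v.get(f, 0) for f, c in u.items())

# Hypothetical feature vectors of two graphs: counts of, say,
# shortest-path triples (source label, distance, target label).
phi_G = {('a', 1, 'b'): 4, ('a', 2, 'a'): 2, ('b', 3, 'b'): 1}
phi_H = {('a', 1, 'b'): 3, ('b', 3, 'b'): 2, ('a', 1, 'a'): 5}

assert sparse_dot(phi_G, phi_H) == 4 * 3 + 1 * 2  # = 14
```

With this representation, evaluating the kernel costs time proportional to the number of non-zero entries rather than to the nominal dimension.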
5.5 Discussion
6 Experimental evaluation
6.1 Experimental setup
The HashMap class was used to store feature vectors. Due to the varied memory requirements of the individual series of experiments, different hardware platforms were used in Sects. 6.2, 6.4 and 6.5. In order to compare the running time of both computational strategies systematically, without the dependence on one specific kernel method, we report the running time to compute the quadratic kernel matrices, unless stated otherwise. We performed classification experiments using the C-SVM implementation LIBSVM (Chang and Lin 2011). We report mean prediction accuracies obtained by 10-fold cross-validation repeated 10 times with random fold assignments. Within each fold, all necessary parameters were selected by cross-validation based on the training set. This includes the regularization parameter C, selected from \(\{10^{-3}, 10^{-2}, \dots , 10^3\}\), all kernel parameters, where applicable, and whether to normalize the kernel matrix.

6.1.1 Data sets
Data set | Graphs | Classes | Avg. \(|V|\) | Avg. \(|E|\) | Vertex/edge labels | Attributes |
---|---|---|---|---|---|---|
Mutag | 188 | 2 | 17.9 | 19.8 | \(+\)/\(+\) | − |
U251 | 3 755 | 2 | 23.1 | 24.8 | \(+\)/\(+\) | − |
Enzymes | 600 | 6 | 32.6 | 62.1 | \(+\)/\(+\) | 18 |
Proteins | 1 113 | 2 | 39.1 | 72.8 | \(+\)/− | 1 |
SyntheticNew | 300 | 2 | 100.0 | 196.3 | −/− | 1 |
Synthie | 400 | 4 | 95.0 | 172.9 | −/− | 15 |
6.2 Approximative explicit feature maps of kernels for attributed graphs (Q1)
6.2.1 Method
6.2.2 Results and discussion
6.3 Kernels for graphs with discrete labels (Q2)
Running times for computing the kernel matrix, in seconds unless stated otherwise.

Kernel | Mutag | U251 | Enzymes | Proteins | SyntheticNew | Synthie |
---|---|---|---|---|---|---|
Implicit | | | | | | |
\(\hbox {FLRW}_0\) | 0.618 | 250.3 | 20.67 | 100.8 | 159.5 | 224.2 |
\(\hbox {FLRW}_1\) | 0.606 | 281.6 | 23.45 | 116.4 | 202.3 | 284.1 |
\(\hbox {FLRW}_2\) | 0.652 | 303.4 | 26.03 | 132.0 | 236.2 | 330.7 |
\(\hbox {FLRW}_3\) | 0.617 | 323.8 | 28.18 | 143.0 | 270.4 | 377.1 |
\(\hbox {FLRW}_4\) | 0.653 | 343.8 | 30.63 | 156.2 | 304.3 | 424.2 |
\(\hbox {FLRW}_5\) | 0.693 | 363.7 | 32.65 | 169.5 | 336.9 | 468.6 |
\(\hbox {FLRW}_6\) | 0.733 | 383.5 | 34.86 | 182.5 | 371.2 | 513.6 |
\(\hbox {FLRW}_7\) | 0.779 | 404.4 | 36.94 | 195.8 | 404.4 | 558.9 |
\(\hbox {FLRW}_8\) | 0.870 | 425.3 | 38.16 | 208.8 | 438.0 | 603.9 |
\(\hbox {FLRW}_9\) | 0.877 | 447.7 | 39.97 | 221.3 | 470.1 | 648.1 |
SP | 5.272 | 2 h 6\(^{\prime }\)55\(^{\prime \prime }\) | 9\(^{\prime }\)8\(^{\prime \prime }\) | 4 h 58\(^{\prime }\)40\(^{\prime \prime }\) | 2 h 14\(^{\prime }\)23\(^{\prime \prime }\) | 2 h 39\(^{\prime }\)30\(^{\prime \prime }\) |
CSM | 15.45 | 4 h 0\(^{\prime }\)16\(^{\prime \prime }\) | 34\(^{\prime }\)15\(^{\prime \prime }\) | OOM | \(>\,\)24 h | \(>\,\)24 h |
Explicit | ||||||
\(\hbox {FLRW}_0\) | 0.004 | 0.868 | 0.029 | 0.081 | 0.008 | 0.016 |
\(\hbox {FLRW}_1\) | 0.014 | 2.827 | 0.080 | 0.141 | 0.024 | 0.032 |
\(\hbox {FLRW}_2\) | 0.019 | 7.844 | 0.170 | 0.251 | 0.040 | 0.051 |
\(\hbox {FLRW}_3\) | 0.035 | 14.96 | 0.466 | 0.545 | 0.056 | 0.070 |
\(\hbox {FLRW}_4\) | 0.058 | 31.73 | 1.518 | 1.207 | 0.072 | 0.092 |
\(\hbox {FLRW}_5\) | 0.147 | 64.57 | 4.629 | 2.991 | 0.089 | 0.110 |
\(\hbox {FLRW}_6\) | 0.461 | 107.8 | 13.58 | 6.476 | 0.104 | 0.128 |
\(\hbox {FLRW}_7\) | 1.127 | 170.9 | 37.72 | 12.07 | 0.124 | 0.150 |
\(\hbox {FLRW}_8\) | 2.491 | 346.0 | 95.03 | 24.48 | 0.141 | 0.172 |
\(\hbox {FLRW}_9\) | 4.809 | 646.8 | 278.4 | 56.62 | 0.161 | 0.192 |
SP | 0.120 | 27.82 | 0.907 | 3.121 | 1.332 | 1.459 |
GL | 0.011 | 3.512 | 0.205 | 0.354 | 0.186 | 0.310 |