Introduction
Property | GCL methods | SimGRL |
---|---|---|
Uses data augmentation | Yes | No |
Encoder channels | Two | One |
Number of negative nodes | Large | One |
- We propose SimGRL, a simple and novel self-supervised learning (SSL) paradigm for graph representation learning. Unlike prior graph contrastive methods, which rely on a dual-channel encoder, SimGRL works efficiently with a single-channel encoder.
- We present a distributor that generates triplets as contrastive views of nodes, allowing SimGRL to perform well without data augmentation.
- We design a triplet loss based on adjacency information that uses only one negative node for each object node, considerably reducing memory overhead.
- We empirically show that SimGRL achieves competitive accuracy on both node classification and graph classification tasks while substantially reducing running time and memory overhead.
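To make the distributor and the adjacency-based triplet loss more concrete, here is a minimal sketch under stated assumptions, not the SimGRL implementation. It assumes that the distributor pairs each object (anchor) node with one random neighbor as the positive and one random non-neighbor as the single negative, and that the loss is the standard margin-based triplet loss on Euclidean distances. The helpers `distribute_triplets`, `euclidean`, and `triplet_loss`, and the default `margin=1.0`, are hypothetical.

```python
import math
import random


def distribute_triplets(adj, rng):
    """Hypothetical distributor: one (anchor, positive, negative) triplet per node.

    `adj` maps each node id to a list of its neighbors. The positive is a random
    neighbor and the single negative is a random node that is neither the anchor
    nor one of its neighbors (an assumed rule, not necessarily SimGRL's).
    """
    nodes = list(adj)
    triplets = []
    for anchor in nodes:
        if not adj[anchor]:
            continue  # isolated node: no adjacency-based positive exists
        positive = rng.choice(adj[anchor])
        candidates = [v for v in nodes if v != anchor and v not in adj[anchor]]
        if not candidates:
            continue  # every other node is a neighbor: no valid negative
        negative = rng.choice(candidates)
        triplets.append((anchor, positive, negative))
    return triplets


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(emb, triplets, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over the triplets."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        d_pos = euclidean(emb[a], emb[p])
        d_neg = euclidean(emb[a], emb[n])
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(triplets)


# Toy graph with two disconnected pairs: 0-1 and 2-3.
adj = {0: [1], 1: [0], 2: [3], 3: [2]}
rng = random.Random(0)
triplets = distribute_triplets(adj, rng)

# Embeddings that place neighbors close together and non-neighbors far apart.
emb = {0: (0.0, 0.0), 1: (0.0, 0.1), 2: (5.0, 5.0), 3: (5.0, 5.1)}
loss = triplet_loss(emb, triplets)
print(triplets, loss)  # loss is 0.0: every triplet already satisfies the margin

# Embeddings that separate neighbors instead incur a positive loss.
emb_bad = {0: (0.0, 0.0), 1: (5.0, 5.0), 2: (0.0, 0.1), 3: (5.0, 5.1)}
print(triplet_loss(emb_bad, triplets))
```

With a single negative per anchor, the loss needs only two distances per node. Contrastive objectives that score each anchor against many negatives need far more. This is consistent with the memory savings claimed for the triplet loss.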
Related work
Graph representation learning
Self-supervised graph representation learning
Triplet loss
Problem definition
Proposed approach
Encoder
Distributor
Triplet loss
Datasets | Hom. ratio \(h_{r}\) | #Nodes | #Edges | Avg.#Neighbors per node | #Features | #Classes | Train/Val/Test nodes |
---|---|---|---|---|---|---|---|
Cora | 0.81 | 2708 | 5429 | 2.00 | 1433 | 7 | 140/500/1000 |
Citeseer | 0.74 | 3327 | 4732 | 1.42 | 3703 | 6 | 120/500/1000 |
Pubmed | 0.80 | 19,717 | 44,338 | 2.24 | 500 | 3 | 60/500/1000 |
ogbn-arxiv | 0.61 | 169,343 | 1,166,243 | 13.7 | 128 | 40 | 90941/29799/48603 |
ogbn-products | 0.79 | 2,449,029 | 61,859,140 | 50.5 | 100 | 47 | 196615/39323/2213091 |
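The homophily ratio \(h_{r}\) is not defined in this excerpt. A common choice is edge homophily, the fraction of edges whose endpoints share a class label, \(h_{r} = |\{(u,v) \in E : y_{u} = y_{v}\}| / |E|\). Assuming that definition (the paper's exact formula may differ), the sketch below computes it. `edge_homophily` is a hypothetical helper.

```python
def edge_homophily(edges, labels):
    """Fraction of edges (u, v) whose endpoints share the same class label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph: 4 undirected edges, 3 of which join nodes of the same class.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
labels = {0: "A", 1: "A", 2: "A", 3: "B"}
h_r = edge_homophily(edges, labels)
print(h_r)  # 0.75
```

Under this definition, a value such as Cora's 0.81 means that most edges connect nodes of the same class. This is the property an adjacency-based positive relies on.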
Datasets | #Graphs | #Classes | Avg. #Nodes per graph | Avg. #Edges per graph |
---|---|---|---|---|
MUTAG | 188 | 2 | 17.93 | 19.79 |
IMDB-BINARY | 1000 | 2 | 19.77 | 193.06 |
IMDB-MULTI | 1500 | 3 | 13.00 | 65.93 |
Experiments and analysis
- Q1: How does SimGRL compare with state-of-the-art methods in accuracy on node classification and graph classification tasks?
- Q2: Is SimGRL more efficient than state-of-the-art methods in terms of running time and memory overhead?
- Q3: How does SimGRL perform with different distributor selectors?
- Q4: Is SimGRL robust when noisy nodes are introduced into triplets?
Datasets
Experimental settings
Methods | Input | Cora | Citeseer | Pubmed |
---|---|---|---|---|
Planetoid | X, A, Y | 75.7±0.0 | 64.7±0.0 | 77.2±0.0 |
Chebyshev | X, A, Y | 81.2±0.0 | 69.8±0.0 | 74.4±0.0 |
GCN | X, A, Y | 81.5±0.0 | 70.3±0.0 | 79.0±0.0 |
SGC | X, A, Y | 81.0±0.0 | 71.9±0.0 | 78.9±0.0 |
GAT | X, A, Y | 83.0±0.7 | 72.5±0.7 | 79.0±0.3 |
SimGRL(ours) | X, A | 84.8±0.3 | 72.7±0.4 | 80.7±0.4 |
Methods | Input | Cora | Citeseer | Pubmed |
---|---|---|---|---|
Raw features | X, A | 47.9±0.4 | 49.3±0.2 | 69.1±0.3 |
DeepWalk | X, A | 67.2±0.0 | 43.2±0.0 | 65.3±0.0 |
GAE | X, A | 71.5±0.4 | 65.8±0.4 | 72.1±0.5 |
DGI | X, A | 82.3±0.6 | 71.8±0.7 | 76.8±0.6 |
Grace | X, A | 83.3±0.4 | 72.1±0.5 | 79.5±1.1 |
GraphCL | X, A | 83.6±0.5 | 72.5±0.7 | 79.8±0.5 |
GCA | X, A | 80.4±1.7 | 67.4±0.7 | OOM-A |
BGRL | X, A | 73.5±1.5 | 58.8±1.4 | 73.3±1.5 |
SelfGNN | X, A | 81.0±0.2 | 67.1±0.4 | 80.5±0.2 |
SimGRL(ours) | X, A | 84.8±0.3 | 72.7±0.4 | 80.7±0.4 |
Methods | Input | ogbn-arxiv | ogbn-products |
---|---|---|---|
MLP | X, A, Y | 55.2±0.2 | 61.3±0.2 |
GCN | X, A, Y | 71.7±0.1 | 70.6±0.1 |
Node2vec | X, A | 69.8±0.1 | 68.5±0.1 |
DGI | X, A | OOM-B | OOM-B |
Grace | X, A | OOM-B | OOM-B |
GraphCL | X, A | OOM-B | OOM-B |
GCA | X, A | OOM-B | OOM-B |
BGRL | X, A | OOM-B | OOM-B |
SelfGNN | X, A | 70.2±0.2 | OOM-B |
SimGRL(ours) | X, A | 71.5±0.1 | 72.2±0.1 |
Baselines
Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
---|---|---|---|
SP | 85.2±2.4 | 55.6±0.2 | 38.0±0.3 |
GK | 81.7±2.1 | 65.9±1.0 | 43.9±0.4 |
WL | 80.7±3.0 | 72.3±3.4 | 47.0±0.5 |
DGK | 87.4±2.7 | 67.0±0.6 | 44.6±0.5 |
MLG | 87.9±1.6 | 66.6±0.3 | 41.2±0.0 |
SimGRL(ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
---|---|---|---|
GraphSAGE | 85.1±7.6 | 72.3±5.3 | 50.9±2.2 |
GCN | 85.6±5.8 | 74.0±3.4 | 51.9±3.8 |
GIN-0 | 89.4±5.6 | 75.1±5.1 | 52.3±2.8 |
GIN-\(\epsilon \) | 89.0±6.0 | 74.3±5.1 | 52.1±3.6 |
GAT | 89.4±6.1 | 70.5±2.3 | 47.8±3.1 |
SimGRL (ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
---|---|---|---|
Random walk | 83.7±1.5 | 50.7±0.3 | 34.7±0.2 |
Node2vec | 72.6±10.2 | – | – |
Sub2vec | 61.1±15.8 | 55.3±1.5 | 36.7±0.8 |
Graph2vec | 83.2±9.6 | 71.1±0.5 | 50.4±0.9 |
InfoGraph | 89.0±1.1 | 73.0±0.9 | 49.7±0.5 |
HTC | 91.8±0.5 | 73.3±0.5 | 50.5±0.3 |
SimGRL(ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Methods | Cora running time | Cora memory overhead | Citeseer running time | Citeseer memory overhead | Pubmed running time | Pubmed memory overhead | ogbn-arxiv running time | ogbn-arxiv memory overhead | ogbn-products running time | ogbn-products memory overhead |
---|---|---|---|---|---|---|---|---|---|---|
DGI | 10 s | 3400 MB | 11 s | 7300 MB | – | OOM-A | – | OOM-B | – | OOM-B |
GraphCL | 21 s | 6100 MB | 17 s | 7400 MB | – | OOM-A | – | OOM-B | – | OOM-B |
GCA | 62 s | 1200 MB | 83 s | 1500 MB | – | OOM-A | – | OOM-B | – | OOM-B |
SelfGNN | 61 s | 1400 MB | 129 s | 1600 MB | 170 s | 3800 MB | 2137 s | 15357 MB | – | OOM-B |
SimGRL(ours) | 0.7 s | 700 MB | 0.8 s | 900 MB | 8.1 s | 1100 MB | 97 s | 7427 MB | 720 s | 40017 MB |
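The efficiency gap in the running-time and memory table can be quantified directly from the reported numbers. The snippet below is a plain arithmetic check that uses only the Cora values from the table. It computes how many times faster SimGRL runs than each baseline and how many times lower its memory overhead is. The variable names are illustrative.

```python
# (running time in seconds, memory overhead in MB) on Cora, taken from the table.
cora = {
    "DGI": (10.0, 3400),
    "GraphCL": (21.0, 6100),
    "GCA": (62.0, 1200),
    "SelfGNN": (61.0, 1400),
}
simgrl = (0.7, 700)

ratios = {}
for method, (time_s, mem_mb) in cora.items():
    speedup = time_s / simgrl[0]    # how many times faster SimGRL is
    mem_ratio = mem_mb / simgrl[1]  # how many times less memory SimGRL uses
    ratios[method] = (round(speedup, 1), round(mem_ratio, 1))
    print(f"{method}: {speedup:.1f}x faster, {mem_ratio:.1f}x lower memory overhead")
```

On Cora, this gives speedups from about 14x (DGI) to about 89x (GCA), and memory reductions from about 1.7x (GCA) to about 8.7x (GraphCL).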