SimGRL: a simple self-supervised graph representation learning framework via triplets

Authors: Da Huang, Fangyuan Lei, Xi Zeng

Open Access | 27.02.2023 | Original Article

Published in: Complex & Intelligent Systems, Issue 5/2023

Abstract

Recently, graph contrastive learning (GCL) has achieved remarkable performance in graph representation learning. However, existing GCL methods usually follow a dual-channel encoder network (i.e., Siamese networks), which adds to the complexity of the network architecture. Additionally, these methods depend heavily on varied data augmentation techniques, which corrupt graph information. Furthermore, they rely heavily on large quantities of negative nodes for each object node, which incurs tremendous memory costs. To address these issues, we propose a novel and simple graph representation learning framework, named SimGRL. Firstly, our proposed network architecture contains only one encoder based on a graph neural network instead of a dual-channel encoder, which simplifies the network architecture. Then we introduce a distributor that generates triplets to obtain contrastive views between nodes and their neighbors, avoiding the need for data augmentations. Finally, we design a triplet loss based on adjacency information in graphs that utilizes only one negative node for each object node, significantly reducing memory overhead. Extensive experiments demonstrate that SimGRL achieves competitive performance on node classification and graph classification tasks, especially in terms of running time and memory overhead.

Introduction

Recently, graph representation learning has emerged as an effective way to analyze graph-structured data, which mostly relies on graph neural networks (GNNs). Graph representation learning aims to learn high-quality embeddings that preserve the topological information of graphs and node properties. However, most prior GNN models, such as GCN [20] and GAT [36], still follow the supervised learning paradigm that requires plenty of labeled training data. Obtaining annotated labels for graphs is often prohibitively expensive or impossible, limiting their applicability to real-world scenarios. For example, producing labels for molecular graphs usually requires costly calculations. To address this drawback of supervised learning, self-supervised learning (SSL) has been presented as a method that can effectively utilize vast amounts of unlabeled data during training.
Motivated by the great achievement of SSL based on contrastive learning in computer vision (CV) [4, 5, 13] and natural language processing (NLP) [7, 10], substantial research attention has been given to applying SSL to graph domains. Inspired by Deep InfoMax [13], DGI [37] first employed contrastive learning in the graph domain. Recently, most graph contrastive methods have been adapted from SimCLR [4]; they follow a dual-channel network (i.e., a Siamese network) and rely on data augmentations. According to SimCLR [4], stronger data augmentations for images bring greater benefits. To this end, Grace [49] and GraphCL [47] proposed several kinds of data augmentations applicable to graphs. Moreover, GCA [50] introduced an adaptive augmentation for graphs, a more flexible data augmentation strategy for graph contrastive learning (GCL).
Although the aforementioned methods achieve strong performance in graph representation learning, they still suffer from the following limitations. Firstly, existing GCL methods heavily depend on data augmentations. Due to the complexity of graphs, designing strong data augmentation techniques for graphs is generally non-trivial. At the same time, data augmentation usually damages graph information, including node features and structural information. In addition, there is no standard data augmentation technique for graphs that can effectively adapt to every task; thus, the various kinds of data augmentations are usually selected according to the specific scenario, restricting their scope of application. Then, after obtaining two augmented views via data augmentations, these pairs of augmented representations have to be sent to the dual-channel network, followed by the projection head, for training; this process increases network complexity. Finally, in most existing graph contrastive methods, each object node is compared with large quantities of negative nodes, namely every node in the graph except the node itself, which raises computation and memory costs. Although a recent work, BGRL [34], aims to discard negative nodes during training, it calls for an elaborate asymmetric Siamese network. From these perspectives, we naturally wonder whether one method can overcome these limitations as simply as possible.
In this paper, we propose a simple self-supervised graph representation learning framework, named SimGRL, to address all of the above limitations. Firstly, unlike existing GCL methods with a two-encoder network, we employ only a single encoder based on graph neural networks, aggregating information from node features and the topological structure. Then, we propose a distributor capable of generating triplets to obtain contrastive views of nodes (positive pairs), which consist of nodes and their corresponding neighbor nodes. Because the distributor relies only on adjacency information, our proposed method avoids using diverse data augmentations to obtain contrastive views of nodes. Furthermore, we utilize a triplet loss based on adjacency information, which uses only one negative node for every object node. Compared with the losses of the aforementioned graph contrastive methods, this type of loss requires much less memory overhead. In the meantime, this triplet loss allows our model to maximize the similarity between positive pairs, promoting the formation of high-quality features. We illustrate the main differences between our method and prior graph contrastive methods in Table 1.
Table 1
Comparison between SimGRL and the previous graph contrastive learning (GCL) methods. The first column denotes model components and strategies

|  | GCL methods | SimGRL |
| --- | --- | --- |
| Data augmentations | Yes | No |
| The channel of encoder | Two | One |
| The number of negative nodes | Large | One |
In summary, our main contributions are as follows:
  • We propose a simple and novel SSL paradigm for graph representation learning, called SimGRL. SimGRL can efficiently work with a single-channel encoder compared with the prior graph contrastive methods that have a dual-channel encoder.
  • We present a distributor that generates triplets as contrastive views of nodes, allowing SimGRL to perform well without data augmentations.
  • We design a triplet loss based on adjacency information that only leverages a negative node for every object node, considerably reducing memory overhead.
  • We empirically show that SimGRL achieves competitive performance on both node classification tasks and graph classification tasks, especially on running time and memory overhead.
The rest of the paper is organized as follows: Sect. 2 discusses the related work. Section 3 introduces the problem definition of self-supervised graph representation learning. Section 4 describes the proposed SimGRL framework. Section 5 presents an extensive experimental analysis of the proposed method. Finally, Sect. 6 concludes the paper.

Related work

Graph representation learning

Graph representation learning provides a powerful means of leveraging the rich information in graph-structured data and has been applied efficiently in various domains [29, 39, 45]. As a result, numerous approaches to learning graph representations have been proposed in recent years.
For example, inspired by word2vec [49], DeepWalk [28] and node2vec [11] utilize the skip-gram model to generate node representations by performing random walks across nodes. However, this kind of method cannot fully leverage node features [46] and involves a multi-step pipeline that requires individual optimization for each step, such as random walk generation and semi-supervised training [20]. Graph kernel approaches [21, 23] aim to decompose the graph into substructures and use appropriate kernel functions to measure graph similarity. The mapping function that helps graph kernels embed graphs or nodes into vector spaces is deterministic [41]. Even worse, the computation of all kernel values for graph kernels takes quadratic time [26]. Graph auto-encoders [19, 27] seek to train an encoder that maps input representations to intermediate representations and a decoder that tries to recover the input representations from the intermediate representations. Their key is to minimize the reconstruction error between the original input representations and the recovered input representations [41].
Graph neural networks (GNNs) are popular methods for learning graph representations [16]. In the GNN framework, each GNN layer (i.e., a learnable mapping function) obtains the embedding of a node by combining information from the node's neighbors through a non-linear transformation and an aggregation function [2]. Compared to DeepWalk and node2vec, it is a single-step training framework that can effectively use node features. In addition, the primary benefit of GNNs over graph kernels is that their complexity scales linearly with the number of samples [26]. Moreover, graph neural networks can be considered the encoder component of graph auto-encoders. Hence, rather than employing the others, we choose the graph neural network as our encoder because it is more effective and efficient.

Self-supervised graph representation learning

Recently, motivated by the significant success of self-supervised learning (SSL) methods in CV and NLP, many researchers have sought to apply SSL methods to graph domains. Following the idea of Deep InfoMax [13], DGI [37] and InfoGraph [33] aim to maximize the mutual information (MI) between two different levels of representation. Inspired by SimCLR [4], a series of SSL methods for graph representation learning have been proposed, achieving state-of-the-art performance in graph representation learning. For example, Grace [49] introduced two specific schemes to obtain two augmented views of graphs, namely removing edges and masking features. GraphCL [47] proposed four types of graph augmentation techniques, such as node dropping, edge perturbation, and subgraph sampling. Moreover, GCA [50] provided an adaptive data augmentation for graphs. Their key idea is to maximize the similarity between two augmented views of graphs sent to the dual-channel network.
However, the above methods still follow the dual-channel network architecture and heavily depend on diverse data augmentations. At the same time, these methods also heavily rely on massive negative nodes. In contrast to them, our proposed method avoids using data augmentations and employs a single-channel network as our framework. Most importantly, we only employ a negative node for every object node, considerably decreasing memory overhead.

Triplet loss

As a popular method in supervised visual representation learning, triplet loss attempts to learn discriminative feature representations: it is designed to pull similar sample pairs from the same class closer and push apart dissimilar sample pairs from different classes. Due to its effectiveness, triplet loss has been widely applied to various computer vision tasks in a supervised manner, such as image retrieval [24], face recognition [9], person re-identification [48], and object tracking [8].
In contrast to the above supervised methods in computer vision, we design a new triplet loss for self-supervised graph representation learning, which serves as our objective in this work. Because the original triplet loss relies on label information and thus cannot address self-supervised tasks, our loss is instead built on the adjacency information of graphs to adapt to self-supervised tasks.

Problem definition

In this section, we formally define the problem of self-supervised graph representation learning. Self-supervised graph representation learning aims to obtain good high-level node representations without leveraging any label information about nodes.
To be specific, suppose we are given a set of features, \(\varvec{X}\)=\(\left\{ \varvec{x}_{1},\varvec{x}_{2},...,\varvec{x}_{N}\right\} \in {R^{N\times {M}}}\), where N is the number of nodes in the graph and \(\varvec{x}_{i}\in {R^{M}}\) is an M-dimensional real-valued attribute vector associated with node i. In addition, we are provided the graph-structured information of the undirected graph in the form of an adjacency matrix \(\varvec{A}\in {R^{N\times {N}}}\), where \(\varvec{A}_{ij}=1\) if node i and node j are connected and \(\varvec{A}_{ij}=0\) otherwise. We intend to learn an encoder \(f(\cdot )\) that maps the original node features into a high-level representation space based on the node features and the topological information of the graph, such that \(f(\varvec{X},\varvec{A})=\varvec{H}=\left\{ \varvec{h}_{1},\varvec{h}_{2},...,\varvec{h}_{N}\right\} \in {R^{N\times {d}}}\) is the high-level representation matrix, where \(\varvec{h}_{i}\in {R^{d}}\) denotes the high-level representation of node i and d is the embedding dimension. These high-level representations can then be exploited for downstream tasks, such as node classification and graph classification.
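To make the notation concrete, the following minimal sketch (PyTorch, with illustrative sizes and names of our own rather than the authors' code) sets up inputs with the shapes defined above:

```python
import torch

# Illustrative problem sizes (roughly Cora-scale); these are assumptions, not fixed by the method.
N, M, d = 2708, 1433, 512

X = torch.rand(N, M)                        # node feature matrix: one M-dimensional vector per node
A = (torch.rand(N, N) < 0.002).float()      # random adjacency matrix of an undirected graph
A = ((A + A.T) > 0).float()                 # symmetrize so that A_ij = A_ji
A.fill_diagonal_(0)                         # no self-loops in the raw adjacency matrix

# The encoder f maps (X, A) to H in R^{N x d}; a concrete single-layer GCN encoder
# is sketched in the next section.
```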

Proposed approach

In this section, we propose a simple graph representation learning framework, called SimGRL. The overall architecture of our proposed method is shown in Fig. 1. In what follows, we will introduce its main components in detail.

Encoder

The core component of prior graph contrastive methods is a Siamese network with two encoders. After obtaining two augmented views of graphs via data augmentations, these augmented views (positive pairs) have to be sent to the Siamese network to maximize the agreement between positive pairs. However, we argue that data augmentation is an unnecessary component in our work, and thus there is no need to use a two-encoder network. Hence, we employ only one encoder, based on the vanilla graph convolutional network (GCN) [20], in our proposed network; it gathers information from node features and the topological structure. The encoder f aims to map the node features into a high-level space to generate high-quality features. Formally, we formulate the encoder architecture as follows:
$$\begin{aligned} \varvec{H}={f}(\varvec{X}, \varvec{A})=\sigma \left( \varvec{\hat{D}}^{-\frac{1}{2}} \varvec{\hat{A}} \varvec{\hat{D}}^{-\frac{1}{2}} \varvec{X} \varvec{W}\right) \end{aligned}$$
(1)
where \(\varvec{\hat{A}}=\varvec{A}+\varvec{I}_N\) denotes the adjacency matrix with added self-connections, \(\varvec{I}_N\) is the identity matrix, \(\varvec{\hat{D}}\) represents the degree matrix, \(\varvec{W}\) denotes a trainable weight matrix, \(\varvec{X}\) is the initial feature matrix of nodes, and \(\sigma (\cdot )\) denotes an activation function. In practice, we select the \(PReLU(\cdot )\) as the activation function. \(f(\cdot )\) represents the encoder. \(\varvec{H}\in {R^{N\times {d}}}\) denotes the output node representations through the encoder, where N is the number of nodes and d is the embedding dimension of nodes.
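As a minimal sketch (not the authors' released implementation), Eq. (1) can be written as a single dense GCN layer in PyTorch; the class and variable names are illustrative:

```python
import torch
import torch.nn as nn


class GCNEncoder(nn.Module):
    """Single-layer GCN encoder: H = PReLU(D_hat^{-1/2} (A + I) D_hat^{-1/2} X W), as in Eq. (1)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # trainable weight matrix W
        self.act = nn.PReLU()                                  # PReLU activation, as in the paper

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # A_hat = A + I_N: add self-connections
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        # Symmetric normalization with the degree matrix of A_hat
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        # Propagate and transform: H = sigma(norm_adj @ X @ W)
        return self.act(norm_adj @ self.weight(x))


encoder = GCNEncoder(in_dim=M, out_dim=d)   # M and d follow the problem-definition sketch above
```

A sparse adjacency matrix or additional layers could be substituted without changing the overall structure.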

Distributor

Utilizing the rich topological information of graphs for self-supervised learning is a more natural choice than leveraging data augmentations that corrupt the original information of graphs. Inspired by the triplet network [14], we design a distributor that can generate triplets to form contrastive views of nodes, replacing data augmentations. However, the original definition of triplets relies on label information and therefore cannot generate triplets in unsupervised tasks.
To this end, we follow the assumption of the vanilla graph neural network [20]: connected nodes are likely to share the same labels, which is the key to the vast majority of graph neural networks. Additionally, based on the assumption of community detection [17], neighbors have similar representations. Thus, we extend the idea of triplets to graph domains based on the above assumptions. Firstly, we select the first-order neighbors of anchor nodes (i.e., the object nodes themselves) as positive nodes, which compensates for the information otherwise provided by labels. The straightforward idea for selecting negative nodes is then to select nodes that are not directly connected to the anchor nodes.
In practice, for the positive node, we leverage a positive selector \(SP(\cdot )\), guided by the adjacency matrix, to obtain the positive node. Then, for simplicity, we use a negative selector \(SN(\cdot )\) that randomly and uniformly selects a node from the whole graph as the negative node. The positive selector and the negative selector together constitute the distributor. Using the distributor, we obtain the contrastive views of nodes, which contain the positive pairs constituted by the anchor nodes and their corresponding positive nodes and the negative pairs constituted by the anchor nodes and their corresponding negative nodes. Finally, the positive selector and the negative selector can be formulated as the following expressions:
$$\begin{aligned} SP(v_{i})=Random\left( \left\{ j \mid \varvec{A}_{ij}=1 \right\} \right) \end{aligned}$$
(2)
$$\begin{aligned} SN(v_{i})=Random\left( \left\{ k \mid v_{k}\in \mathcal {V} \right\} \right) \end{aligned}$$
(3)
where \(Random(\cdot )\) is a function that randomly and uniformly selects an element from a set, \(\varvec{A}_{ij}\) is the (i, j) entry of the adjacency matrix \(\varvec{A}\), j is the index of a first-hop neighbor of node \(v_{i}\), \(\mathcal {V}\) is the set of all nodes in the graph, and \(v_{k}\) denotes the k-th node of \(\mathcal {V}\).
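The two selectors of Eqs. (2) and (3) reduce to a few lines of index sampling. Below is a hedged sketch of such a distributor (our own naming; the fallback for isolated nodes is an assumption the paper does not discuss):

```python
import torch


def distributor(adj: torch.Tensor):
    """Return (positive, negative) node indices for every anchor node.

    SP (Eq. 2): uniformly pick one first-order neighbor per anchor node.
    SN (Eq. 3): uniformly pick one node from the whole graph.
    """
    n = adj.size(0)
    pos = torch.empty(n, dtype=torch.long)
    for i in range(n):
        neighbors = adj[i].nonzero(as_tuple=True)[0]              # first-hop neighbors of node i
        if len(neighbors) > 0:
            pos[i] = neighbors[torch.randint(len(neighbors), (1,)).item()]
        else:
            pos[i] = i                                            # isolated node: fall back to itself (assumption)
    neg = torch.randint(0, n, (n,))                               # random SN over all nodes
    return pos, neg
```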

Triplet loss

The prior graph contrastive methods usually utilize a contrastive objective that aims to maximize the similarity between the two augmented views of the same node. However, this type of objective heavily relies on massive negative nodes, namely every node except the object node, requiring huge memory costs. At the same time, the composition of negative nodes in these works may be unreasonable: to easily obtain massive negative nodes, they treat all nodes except the object node as negative nodes, which probably includes many negative nodes sharing the same label as the object node. To alleviate this issue, we intuitively provide a negative sampler that samples only one unconnected node as the negative node for each object node, which reduces the likelihood of sampling nodes with the same label as the object node. Moreover, we empirically find that randomly sampling nodes for object nodes is also effective and simple, and this will be discussed in detail in Sect. 5.6.
Table 2
Statistics of the datasets used for the node classification task

| Datasets | Hom. ratio \(h_{r}\) | #Nodes | #Edges | Avg. #Neighbors per node | #Features | #Classes | Train/Val/Test nodes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cora | 0.81 | 2708 | 5429 | 2.00 | 1433 | 7 | 140/500/1000 |
| Citeseer | 0.74 | 3327 | 4732 | 1.42 | 3703 | 6 | 120/500/1000 |
| Pubmed | 0.80 | 19,717 | 44,338 | 2.24 | 500 | 3 | 60/500/1000 |
| ogbn-arxiv | 0.61 | 169,343 | 1,166,243 | 13.7 | 128 | 40 | 90,941/29,799/48,603 |
| ogbn-products | 0.79 | 2,449,029 | 61,859,140 | 50.5 | 100 | 47 | 196,615/39,323/2,213,091 |
Table 3
Statistics of the datasets used for the graph classification task

| Datasets | #Graphs | #Classes | Avg. #Nodes per graph | Avg. #Edges per graph |
| --- | --- | --- | --- | --- |
| MUTAG | 188 | 2 | 17.93 | 19.79 |
| IMDB-BIN | 1000 | 2 | 19.77 | 193.06 |
| IMDB-MULTI | 1500 | 3 | 13 | 65.93 |
Then, we introduce a triplet loss based on adjacency information (i.e., the distributor) as our objective. Compared with the contrastive objective, this objective utilizes only one negative node for every object node. By achieving this objective, we can also enforce our model to maximize the similarity between the node embeddings of contrastive views and minimize the agreement between the embeddings of negative pairs. Specifically, the triplet objective function of an object node is defined as:
$$\begin{aligned} \ell \left( {u}_{i}\right) \!=ReLU\left( \left\| \varvec{h}_{i}^{a}-\varvec{h}_{i}^{p}\right\| \!-\!\left\| \varvec{h}_{i}^{a}-\varvec{h}_{i}^{n}\right\| +M \right) \end{aligned}$$
(4)
where \(\varvec{h}_{i}^{a}\) is the embedding of the object node, \(\varvec{h}_{i}^{p}\) denotes the embedding of the positive node (i.e., \(\varvec{h}_{j}\)), \(\varvec{h}_{i}^{n}\) represents the embedding of the negative node (i.e., \(\varvec{h}_{k}\)), and j and k are obtained by \(SP(\cdot )\) and \(SN(\cdot )\), respectively. M is the margin value, N is the number of nodes in the graph, and \(ReLU(\cdot )\) denotes the activation function. The overall training objective can then be formulated as the following loss function:
$$\begin{aligned} \mathcal {J}=\frac{1}{N} \sum _{i=1}^{N}\ell \left( {u}_{i}\right) \end{aligned}$$
(5)
In a nutshell, in each iteration, SimGRL first utilizes an encoder to aggregate information about neighbors. Then, it employs a distributor to generate graph triplets. Finally, the parameters of the learning matrix \(\varvec{W}\) are updated by optimizing the triplet loss. The overall algorithm is summarized in Algorithm 1.
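Putting Eqs. (4)–(5) together with the encoder and distributor sketched above, one training iteration might look as follows (a sketch under our own naming and illustrative hyper-parameters, not the authors' code):

```python
import torch
import torch.nn.functional as F


def triplet_loss(h: torch.Tensor, pos: torch.Tensor, neg: torch.Tensor, margin: float) -> torch.Tensor:
    """Eqs. (4)-(5): mean over nodes of ReLU(||h_a - h_p|| - ||h_a - h_n|| + M)."""
    d_pos = torch.norm(h - h[pos], dim=1)    # distance of each anchor to its positive node
    d_neg = torch.norm(h - h[neg], dim=1)    # distance of each anchor to its negative node
    return F.relu(d_pos - d_neg + margin).mean()


# encoder, X and A are assumed to come from the sketches in the previous sections.
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.01)
for epoch in range(200):                     # the number of epochs is illustrative
    encoder.train()
    H = encoder(X, A)                        # aggregate neighbor information (Eq. 1)
    pos, neg = distributor(A)                # generate triplets (Eqs. 2-3)
    loss = triplet_loss(H, pos, neg, margin=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```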

Experiments and analysis

In this section, we conduct extensive experiments to evaluate the proposed method on eight public datasets guided by four research questions.
  • Q1: How does SimGRL compare to state-of-the-art methods on the node classification and graph classification tasks in terms of accuracy?
  • Q2: Is the proposed SimGRL framework efficient in comparison to state-of-the-art methods in terms of running time and memory overhead?
  • Q3: How does SimGRL perform when utilizing different distributor selectors?
  • Q4: Is the proposed SimGRL framework robust when noisy nodes invade triplets?

Datasets

To make a fairly comprehensive comparison, we conducted extensive experiments on various widely used benchmark datasets.
For the node classification task, we use three classical citation networks, Cora, Citeseer, and Pubmed [30], where node features are bag-of-words representations of documents, edges correspond to undirected citations, and node labels indicate the category of documents. Additionally, we select two large-scale datasets from the Open Graph Benchmark [15], i.e., ogbn-arxiv and ogbn-products; ogbn-arxiv is a citation network of ARXIV papers, and ogbn-products is an Amazon product co-purchasing network. For the graph classification task, we conducted experiments on the following datasets. Firstly, the MUTAG [22] dataset consists of chemical compounds treated as graphs, whose labels correspond to their mutagenic effect on a bacterium. As for the IMDB-binary and IMDB-multi datasets [43], their nodes represent actors and actresses, and their edges denote co-appearance in the same movie. Moreover, the label information of these movie networks corresponds to the genre of the movies.
Additionally, we require a metric to evaluate the homophily of each dataset, since our proposed method mainly depends on the homophily assumption that connected nodes are likely to share the same labels. Here, we define an edge homophily ratio as our metric. Formally, the edge homophily ratio is defined as follows:
$$\begin{aligned} h_r=\frac{\left| \left\{ (u, v) \in E \mid y_u=y_v\right\} \right| }{|E|} \end{aligned}$$
(6)
where E is the edge set, (u, v) denotes a pair of connected nodes, and \(y_u\) represents the label of node u. The edge homophily ratio \(h_r\) is the fraction of edges connecting two nodes of the same class. The closer the edge homophily ratio is to one, the stronger the homophily of the graph. Note that this definition relies on node labels and is therefore reported for the node classification datasets. The detailed statistics of the datasets are summarized in Tables 2 and 3, respectively.
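Eq. (6) is straightforward to compute from an edge list and node labels; a small sketch follows (variable names are our own):

```python
import torch


def edge_homophily_ratio(edge_index: torch.Tensor, labels: torch.Tensor) -> float:
    """Eq. (6): fraction of edges whose two endpoints share the same label.

    edge_index is a 2 x |E| tensor of (u, v) pairs; labels holds the class of every node.
    """
    same_label = labels[edge_index[0]] == labels[edge_index[1]]
    return same_label.float().mean().item()
```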

Experimental settings

For the node classification task, we utilize a linear classifier to evaluate the performance of the learned node representations, which follows the previous state-of-the-art approach DGI [37]. We then report the mean classification accuracy along with the standard deviations after 10 runs of employing the linear classifier.
For the graph classification task, we follow the same evaluation procedure as InfoGraph [33] that assesses the classification performance by leveraging a linear SVM. Moreover, we also use 10-fold cross-validation accuracy with the standard deviation to make a fair comparison.
The experiments on the large-scale datasets (i.e., ogbn-arxiv and ogbn-products) were conducted on a machine with one NVIDIA A100 GPU card (40 GB of RAM), and the rest of the experiments were conducted on a machine with one NVIDIA 2080Ti GPU card (11 GB of RAM). All the methods were implemented in Python 3.8 with PyTorch. The hidden dimension of the encoder output is 512. We use the Adam optimizer with a learning rate of 0.01 for all datasets. The \(L_{2}\) regularization value is set to 0.0005 only on the Citeseer dataset. To search for proper margin values, we conduct experiments with different margin values; the results are shown in Fig. 2. Empirically, the margin value M is set to 1 for Cora, 1 for Citeseer, 0.2 for Pubmed, 0.2 for MUTAG, 1 for IMDB-BINARY, 0.6 for IMDB-MULTI, 0.2 for ogbn-arxiv, and 1 for ogbn-products.
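For reference, the linear-evaluation protocol described above can be sketched as follows; the choice of logistic regression and the split variables (train_idx, test_idx, y) are assumptions for illustration rather than the exact setup used in the paper:

```python
import torch
from sklearn.linear_model import LogisticRegression

# Freeze the trained encoder and evaluate its embeddings with a linear classifier.
encoder.eval()
with torch.no_grad():
    H = encoder(X, A).cpu().numpy()          # frozen node embeddings

clf = LogisticRegression(max_iter=1000)      # the linear evaluation head
clf.fit(H[train_idx], y[train_idx])          # fit on the public training split
accuracy = clf.score(H[test_idx], y[test_idx])
```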
Table 4
The performance on node classification in terms of the mean accuracy in percentage with standard deviation compared with supervised methods. X, A and Y denote the node features, adjacency matrix, and label information respectively

| Methods | Input | Cora | Citeseer | Pubmed |
| --- | --- | --- | --- | --- |
| Planetoid | X, A, Y | 75.7±0.0 | 64.7±0.0 | 77.2±0.0 |
| Chebyshev | X, A, Y | 81.2±0.0 | 69.8±0.0 | 74.4±0.0 |
| GCN | X, A, Y | 81.5±0.0 | 70.3±0.0 | 79.0±0.0 |
| SGC | X, A, Y | 81.0±0.0 | 71.9±0.0 | 78.9±0.0 |
| GAT | X, A, Y | 83.0±0.7 | 72.5±0.7 | 79.0±0.3 |
| SimGRL (ours) | X, A | 84.8±0.3 | 72.7±0.4 | 80.7±0.4 |
Table 5
The performance on node classification in terms of the mean accuracy in percentage with standard deviation compared with unsupervised methods. X and A denote the node features and adjacency matrix respectively. OOM-A denotes running out of memory on an NVIDIA 2080Ti GPU (11 GB of RAM)

| Methods | Input | Cora | Citeseer | Pubmed |
| --- | --- | --- | --- | --- |
| Raw features | X, A | 47.9±0.4 | 49.3±0.2 | 69.1±0.3 |
| DeepWalk | X, A | 67.2±0.0 | 43.2±0.0 | 65.3±0.0 |
| GAE | X, A | 71.5±0.4 | 65.8±0.4 | 72.1±0.5 |
| DGI | X, A | 82.3±0.6 | 71.8±0.7 | 76.8±0.6 |
| Grace | X, A | 83.3±0.4 | 72.1±0.5 | 79.5±1.1 |
| GraphCL | X, A | 83.6±0.5 | 72.5±0.7 | 79.8±0.5 |
| GCA | X, A | 80.4±1.7 | 67.4±0.7 | OOM-A |
| BGRL | X, A | 73.5±1.5 | 58.8±1.4 | 73.3±1.5 |
| SelfGNN | X, A | 81.0±0.2 | 67.1±0.4 | 80.5±0.2 |
| SimGRL (ours) | X, A | 84.8±0.3 | 72.7±0.4 | 80.7±0.4 |
Table 6
The performance on node classification in terms of the mean accuracy in percentage with standard deviation on large-scale datasets. X, A and Y denote the node features, adjacency matrix, and label information respectively. OOM-B denotes running out of memory on an NVIDIA A100 GPU (40 GB of RAM)

| Methods | Input | ogbn-arxiv | ogbn-products |
| --- | --- | --- | --- |
| MLP | X, A, Y | 55.2±0.2 | 61.3±0.2 |
| GCN | X, A, Y | 71.7±0.1 | 70.6±0.1 |
| Node2vec | X, A | 69.8±0.1 | 68.5±0.1 |
| DGI | X, A | OOM-B | OOM-B |
| Grace | X, A | OOM-B | OOM-B |
| GraphCL | X, A | OOM-B | OOM-B |
| GCA | X, A | OOM-B | OOM-B |
| BGRL | X, A | OOM-B | OOM-B |
| SelfGNN | X, A | 70.2±0.2 | OOM-B |
| SimGRL (ours) | X, A | 71.5±0.1 | 72.2±0.1 |

Baselines

Node classification. For unsupervised learning methods, we have selected nine baselines: Raw features, DeepWalk [28], GAE [19], DGI [37], Grace [49], GraphCL [47], GCA [50], BGRL [34], and SelfGNN [18]. These approaches cover most unsupervised learning ideas in the graph field, including random walks, autoencoders, and contrastive learning. Graph kernel methods, which are typically used for graph classification, are included in the graph classification comparison below. Aside from unsupervised approaches, we also use five popular semi-supervised methods for comparison: Planetoid [44], Chebyshev [6], GCN [20], SGC [40], and GAT [36].
Table 7
The performance on graph classification in terms of the mean accuracy in percentage with standard deviation compared with graph kernel methods

| Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
| --- | --- | --- | --- |
| SP | 85.2±2.4 | 55.6±0.2 | 38.0±0.3 |
| GK | 81.7±2.1 | 65.9±1.0 | 43.9±0.4 |
| WL | 80.7±3.0 | 72.3±3.4 | 47.0±0.5 |
| DGK | 87.4±2.7 | 67.0±0.6 | 44.6±0.5 |
| MLG | 87.9±1.6 | 66.6±0.3 | 41.2±0.0 |
| SimGRL (ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Table 8
The performance on graph classification in terms of the mean accuracy in percentage with standard deviation compared with supervised graph neural networks methods

| Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
| --- | --- | --- | --- |
| GraphSAGE | 85.1±7.6 | 72.3±5.3 | 50.9±2.2 |
| GCN | 85.6±5.8 | 74.0±3.4 | 51.9±3.8 |
| GIN-0 | 89.4±5.6 | 75.1±5.1 | 52.3±2.8 |
| GIN-\(\epsilon \) | 89.0±6.0 | 74.3±5.1 | 52.1±3.6 |
| GAT | 89.4±6.1 | 70.5±2.3 | 47.8±3.1 |
| SimGRL (ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Table 9
The performance on graph classification in terms of the mean accuracy in percentage with standard deviation compared with unsupervised graph neural networks methods

| Methods | MUTAG | IMDB-BINARY | IMDB-MULTI |
| --- | --- | --- | --- |
| Random walk | 83.7±1.5 | 50.7±0.3 | 34.7±0.2 |
| Node2vec | 72.6±10.2 | – | – |
| Sub2vec | 61.1±15.8 | 55.3±1.5 | 36.7±0.8 |
| Graph2vec | 83.2±9.6 | 71.1±0.5 | 50.4±0.9 |
| InfoGraph | 89.0±1.1 | 73.0±0.9 | 49.7±0.5 |
| HTC | 91.8±0.5 | 73.3±0.5 | 50.5±0.3 |
| SimGRL (ours) | 89.1±0.6 | 74.5±0.6 | 51.4±0.4 |
Graph classification. For a comprehensive comparison, we select three kinds of methods, including graph kernel approaches, supervised methods, and unsupervised methods. Firstly, we compare our proposed method with five popular graph kernel approaches, including the shortest path kernel (SP) [3], Graphlet kernel (GK) [32], Weisfeiler-Lehman sub-tree kernel (WL) [31], deep graph kernel (DGK) [43] and the multi-scale Laplacian kernel (MLG) [21]. In addition, we also utilize 5 supervised GNNs for comparison: GraphSAGE [12], GCN [20], GAT [36], and two variants of GIN [42]: GIN-0 and GIN-\(\epsilon \). Finally, we compare the results with six unsupervised methods: Random Walk [35], Node2vec [11], Sub2vec [1], Graph2vec [25], InfoGraph [33], and HTC [38].
Table 10
Comparisons of running time and memory overhead on Cora, CiteSeer, Pubmed, ogbn-arxiv, and ogbn-products. Each cell reports running time / memory overhead. OOM-A denotes running out of memory on an NVIDIA 2080Ti GPU (11 GB of RAM). OOM-B denotes running out of memory on an NVIDIA A100 GPU (40 GB of RAM)

| Methods | Cora | CiteSeer | Pubmed | ogbn-arxiv | ogbn-products |
| --- | --- | --- | --- | --- | --- |
| DGI | 10 s / 3400 MB | 11 s / 7300 MB | OOM-A | OOM-B | OOM-B |
| GraphCL | 21 s / 6100 MB | 17 s / 7400 MB | OOM-A | OOM-B | OOM-B |
| GCA | 62 s / 1200 MB | 83 s / 1500 MB | OOM-A | OOM-B | OOM-B |
| SelfGNN | 61 s / 1400 MB | 129 s / 1600 MB | 170 s / 3800 MB | 2137 s / 15,357 MB | OOM-B |
| SimGRL (ours) | 0.7 s / 700 MB | 0.8 s / 900 MB | 8.1 s / 1100 MB | 97 s / 7427 MB | 720 s / 40,017 MB |

Comparison with baselines (Q1)

The ultimate purpose of self-supervised representation methods is to learn good high-level representations for downstream tasks, such as node classification and graph classification. Thus, we conduct extensive experiments on node classification and graph classification to evaluate the effectiveness of our proposed method. The results are presented in Tables 4, 5, 6, 7, 8 and 9.
First, we analyze the results of the node classification tasks on the small datasets, i.e., Cora, Citeseer, and Pubmed. As shown in Table 4, our proposed method outperforms the supervised baselines across all datasets. For example, SimGRL obtains improvements of 1.8%, 0.2%, and 1.7% on Cora, Citeseer, and Pubmed, respectively, over the classical supervised graph neural network GAT. By effectively leveraging massive unlabeled samples during training, our proposed self-supervised method can outperform classical supervised graph neural networks. From Table 5, we observe that our method achieves state-of-the-art performance compared to unsupervised methods, which strongly demonstrates that it can effectively exploit graph structure information as a self-supervised signal.
Second, to further evaluate the scalability and performance of our proposed model, we chose two popular large-scale datasets from the Open Graph Benchmark [15], ogbn-arxiv and ogbn-products. The results are reported in Table 6. On ogbn-arxiv, SimGRL outperforms all baselines except GCN; this is due to the relatively low homophily ratio of ogbn-arxiv, which prevents SimGRL from fully exploiting homophily information. Additionally, since ogbn-products has a higher homophily ratio, SimGRL consistently outperforms the baselines by considerable margins. Furthermore, with our computing resources, most self-supervised baselines fail to run on such large-scale datasets, whereas SimGRL can, which implies SimGRL's good scalability.
Third, from Table 7, we find that SimGRL is superior to MLG on MUTAG by 1.2%, to WL on IMDB-BINARY by 2.2%, and to WL on IMDB-MULTI by 4.4%, respectively. This strong performance verifies the superiority of our method over graph kernel methods. From Table 8, although SimGRL fails to exceed all the supervised methods, it is still competitive with several of them. For instance, compared with GraphSAGE, it improves by 4%, 2.2%, and 0.5% on the MUTAG, IMDB-BINARY, and IMDB-MULTI datasets, respectively, which demonstrates that our proposed self-supervised method has the potential to surpass supervised methods in graph classification tasks. Moreover, from Table 9, SimGRL achieves 1.2% and 0.9% improvements over the state-of-the-art unsupervised graph contrastive method (HTC) on the IMDB-BINARY and IMDB-MULTI datasets, respectively. Although SimGRL obtains competitive results on graph classification tasks, it cannot achieve the best performance compared to most supervised methods and several self-supervised methods. One possible explanation is that the homophily assumption is not universally valid in graph-level tasks (e.g., molecular graphs like MUTAG adopted in the experiments).
All in all, on the node classification and graph classification tasks, SimGRL is competitive with not only unsupervised but also supervised methods. These empirical performance improvements demonstrate the superiority of our proposed framework.

Comparison of running time and memory overhead (Q2)

The main goal of our method is to use a simple framework and avoid massive numbers of negative nodes for contrast, thereby significantly reducing computational and memory costs. In Table 1, we qualitatively analyze the advantages of the SimGRL model. The proposed model simplifies the network structure of the encoders from two channels to one channel. More importantly, it avoids the consumption of computing resources caused by massive negative nodes. For instance, given a graph containing N nodes, prior methods require \(N-1\) negative nodes for every object node, while we need only one negative node per object node. Furthermore, they need \(O(N^{2})\) memory to store the negative samples, while we only require O(N) memory. Therefore, we used the same software and hardware configuration to verify our performance in terms of running time and memory overhead. We compare our method with several state-of-the-art graph contrastive methods, including DGI [37], GraphCL [47], SelfGNN [18], and GCA [50]. Table 10 shows the comparison of both running time and memory overhead.
As shown in Table 10, we can observe that our method achieves the minimum of both running time and memory overhead. For instance, on the Cora dataset, the prior best method (DGI) needs 10 s in terms of the running time, while our method only requires 0.7 s. As for the memory overhead, the prior best method (GCA) uses 1200 MB of memory on the Cora dataset, while our method only needs 700 MB. Hence, it convincingly proves that our proposed method can reduce memory overhead and running time considerably compared with the state-of-the-art graph contrastive methods.
Furthermore, according to SimCLR, more negative examples are helpful for contrastive methods in computer vision. Thus, we further explore the impact of the number of negative nodes on accuracy and memory overhead. We fixed the number of positive nodes, and the range of the number of negative nodes is determined by our computing resources. The experimental results are reported in Fig. 3. We can make the following observations. First, the memory overhead increases significantly as the number of negative nodes grows. Second, the performance of the model improves with more negative nodes. Third, this improvement is limited: after more than 200 negative nodes, the accuracy curves for Cora and Citeseer start to converge. Additionally, when the number of negative nodes is greater than 200, the Pubmed accuracy curve still tends to rise. The potential reason is that larger datasets have more nodes and thus may need more negative nodes to obtain discriminative features that improve the model's performance. Finally, since increasing the number of negative nodes significantly increases the memory overhead, we conclude that the cost of this improvement is substantial.

Effect of different selectors (Q3)

We further analyze the effects of different positive selectors and negative selectors. For a fair comparison, the experiment setup follows the same aforementioned settings except for the selector. For this analysis, we devise several different strategies for the positive selector (SP) and the negative selector (SN). Firstly, we employ a mean SP that obtains the average representation of all the first-order neighbors as the positive node. Secondly, we utilize a random SP that randomly selects a positive node from the first-order neighbors. Thirdly, we devise an unconnected SN that only selects a node unconnected with the object node as the negative node. Finally, we use a random SN that randomly and uniformly selects a node from the whole graph as the negative node.
As we can see in Fig. 4, the random SP performs better than the mean SP, but there is only a slight gap between them. The results indicate that the proposed method is not very sensitive to the selection strategy over first-order neighbors. From Fig. 5, we note that the performance of the unconnected SN is also close to that of the random SN. One possible reason is that the number of unconnected nodes for an object node is far larger than the number of its neighbors, and thus the results obtained by the two SNs are very close.

Impact of the intrusion of noise (Q4)

To analyze the robustness of our method, we design an experiment that mixes nodes with nodes of the opposite type, simulating the intrusion of noise. The results are presented in Fig. 6. We define a mix rate as the proportion of replaced nodes among the total nodes. Two mixed situations are considered. Firstly, we define P2N (positive mix), which replaces positive nodes with negative nodes. Secondly, we define N2P (negative mix), which replaces negative nodes with positive nodes.
We can make the following observations from the figure. First, as can be seen from Fig. 6, the accuracy gradually declines as the mix rate increases; the learning ability of the model tends to degenerate as the difference between positive and negative nodes gradually weakens. Generally, the performance fluctuates only slightly when the mix rate is below thirty percent, which suggests that our proposed method can still perform well under a certain range of noise. Moreover, on the Cora and Pubmed datasets, our method still performs well when the effect of noise is low (mixing rate below 50%), demonstrating its robustness. However, the model's performance deteriorates rapidly when the mixing rate exceeds 50% because of the imbalance between positive and negative samples. On Citeseer, because the average number of neighbors is low, the model is more easily affected by noise; as a result, its performance fluctuates more than on the other datasets at lower mixing rates. Furthermore, since negative nodes are sampled from the whole graph, we observe that the performance under N2P is better than under P2N.

T-SNE visualization

To demonstrate the superiority of our model, we visualize the node embeddings of the Cora dataset learned by SimGRL using the t-SNE algorithm. As can be seen from Fig. 7, Fig. 7a shows the raw features and Fig. 7b shows the features learned by the SimGRL model. SimGRL's 2D projection shows a clearer separation, which indicates that the encoder can extract more expressive node representations for downstream tasks via our proposed method.

Conclusions

In this paper, we propose a simple self-supervised graph representation learning framework, called SimGRL. Compared with prior methods, our method achieves strong performance with only one encoder and avoids the requirement of data augmentations. Using the triplet loss based on adjacency information, only one negative node is sufficient for the model to obtain significant performance, while simultaneously reducing memory overhead. We conduct extensive experiments on node classification and graph classification tasks to evaluate our method. Empirical results show that SimGRL achieves competitive performance compared with not only unsupervised but also supervised methods on downstream tasks. Although our method achieves remarkable results based on the homophily assumption, there are graphs with heterophily in the real world. In future work, we intend to extend our method to be applicable in both homophilic and heterophilic settings.

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (U1701266), the Guangdong Provincial Key Laboratory Project of Intellectual Property and Big Data (2018B030322016), Special Projects for Key Fields in Higher Education of Guangdong, China (2020ZDZX3077), Guangdong Province Key Construction Discipline Scientific Research Capacity Improvement Project (2022ZDJS013), the Natural Science Foundation of Guangdong Province, China (2022A1515011146).

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Adhikari B, Zhang Y, Ramakrishnan N, Prakash BA (2018) Sub2vec: feature learning for subgraphs. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin, pp 170–182
2. Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, Tacchetti A, Raposo D, Santoro A, Faulkner R et al (2018) Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261
3. Borgwardt KM, Kriegel HP (2018) Shortest-path kernels on graphs. In: Fifth IEEE International Conference on Data Mining, pp 74–81. IEEE
4. Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp 1597–1607
5. Chen X, He K (2021) Exploring simple siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15750–15758
6. Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the International Conference on Neural Information Processing Systems, pp 3844–3852
7. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT
8. Dong X, Shen J (2018) Triplet loss in siamese network for object tracking. In: Proceedings of the European Conference on Computer Vision, pp 459–474
9. Feng Y, Wang H, Hu HR, Yu L, Wang W, Wang S (2020) Triplet distillation for deep face recognition. In: 2020 IEEE International Conference on Image Processing, pp 808–812. IEEE
10. Gao T, Yao X, Chen D (2021) SimCSE: simple contrastive learning of sentence embeddings. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), pp 6894–6910. Association for Computational Linguistics
11. Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 855–864
12. Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 1025–1035
13. Hjelm RD, Fedorov A, Lavoie-Marchildon S, Grewal K, Bachman P, Trischler A, Bengio Y (2018) Learning deep representations by mutual information estimation and maximization. In: International Conference on Learning Representations
14. Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: International Workshop on Similarity-Based Pattern Recognition, pp 84–92. Springer
15. Hu W, Fey M, Zitnik M, Dong Y, Ren H, Liu B, Catasta M, Leskovec J (2020) Open graph benchmark: datasets for machine learning on graphs. Adv Neural Inf Process Syst 33:22118–22133
16. Huang Q, Yamada M, Tian Y, Singh D, Chang Y (2022) GraphLIME: local interpretable model explanations for graph neural networks. IEEE Trans Knowl Data Eng 2:1–6
17. Jin D, Yu Z, Jiao P, Pan S, He D, Wu J, Yu P, Zhang W (2021) A survey of community detection approaches: from statistical modeling to deep learning. IEEE Trans Knowl Data Eng 2:1
18. Kefato ZT, Girdzijauskas S (2021) Self-supervised graph neural networks without explicit negative sampling. In: The International Workshop on Self-Supervised Learning for the Web (SSL'21), at WWW'21
20. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations
21. Kondor R, Pan H (2016) The multiscale laplacian graph kernel. In: Advances in Neural Information Processing Systems, pp 2990–2998
22. Kriege N, Mutzel P (2012) Subgraph matching kernels for attributed graphs. In: International Conference on Machine Learning, pp 291–298
23. Kriege NM, Giscard PL, Wilson R (2016) On valid optimal assignment kernels and applications to graph classification. In: Advances in Neural Information Processing Systems, pp 1623–1631
24. Lin H, Fu Y, Lu P, Gong S, Xue X, Jiang YG (2019) TC-Net for iSBIR: triplet classification network for instance-level sketch based image retrieval. In: Proceedings of the 27th ACM International Conference on Multimedia, pp 1676–1684
25. Narayanan A, Chandramohan M, Venkatesan R, Chen L, Liu Y, Jaiswal S (2017) graph2vec: learning distributed representations of graphs. arXiv preprint arXiv:1707.05005
27. Pan S, Hu R, Long G, Jiang J, Yao L, Zhang C (2018) Adversarially regularized graph autoencoder for graph embedding. In: International Joint Conference on Artificial Intelligence
28. Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 701–710
29. Rhee S, Seo S, Kim S (2018) Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, pp 3527–3534
30. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in network data. AI Mag 29(3):93–93
31. Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler–Lehman graph kernels. J Mach Learn Res 12(9):2539–2561
32. Shervashidze N, Vishwanathan S, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: Artificial Intelligence and Statistics, pp 488–495. PMLR
33. Sun FY, Hoffman J, Verma V, Tang J (2020) InfoGraph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In: International Conference on Learning Representations
34. Thakoor S, Tallec C, Azar MG, Munos R, Veličković P, Valko M (2021) Bootstrapped representation learning on graphs
35. Thomas G, Flach P, Stefan W (2003) On graph kernels: hardness results and efficient alternatives. In: Proceedings of the 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop, pp 129–143
36. Veličković P, Cucurull G, Casanova A, Romero A, Lió P, Bengio Y (2018) Graph attention networks. In: International Conference on Learning Representations
37. Veličković P, Fedus W, Hamilton WL, Lió P, Bengio Y, Hjelm RD (2019) Deep graph infomax. In: International Conference on Learning Representations
38. Wang C, Liu Z (2021) Learning graph representation by aggregating subgraphs via mutual information maximization. arXiv preprint arXiv:2103.13125
39. Wang D, Zhang Z, Zhou J, Cui P, Fang J, Jia Q, Fang Y, Qi Y (2021) Temporal-aware graph neural network for credit risk prediction. In: Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), pp 702–710. SIAM
40. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019) Simplifying graph convolutional networks. In: International Conference on Machine Learning, pp 6861–6871. PMLR
41. Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32(1):4–24
42. Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? In: International Conference on Learning Representations
43. Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1365–1374
44. Yang Z, Cohen W, Salakhudinov R (2016) Revisiting semi-supervised learning with graph embeddings. In: International Conference on Machine Learning, pp 40–48. PMLR
45. Yao L, Mao C, Luo Y (2019) Graph convolutional networks for text classification. Proc AAAI Conf Artif Intell 33:7370–7377
46. You J, Ying R, Leskovec J (2019) Position-aware graph neural networks. In: International Conference on Machine Learning, pp 7134–7143. PMLR
47. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y (2020) Graph contrastive learning with augmentations. Adv Neural Inf Process Syst 33:5812–5823
48. Zhao D, Chen C, Li D (2021) Multi-stage attention and center triplet loss for person re-identification. Appl Intell 2:1–13
49. Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L (2020) Deep graph contrastive representation learning. arXiv preprint arXiv:2006.04131
50. Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L (2021) Graph contrastive learning with adaptive augmentation. Proc Web Conf 2021:2069–2080
Metadata
Title: SimGRL: a simple self-supervised graph representation learning framework via triplets
Authors: Da Huang, Fangyuan Lei, Xi Zeng
Publication date: 27.02.2023
Publisher: Springer International Publishing
Published in: Complex & Intelligent Systems, Issue 5/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI: https://doi.org/10.1007/s40747-023-00997-6
