1 Introduction
- For the first time, we explore two problems of GNN methods in neighbor aggregation: the disturbance of original attributes during aggregation, and the differing influences of attribute and topology information on node representation.
- We propose a novel API-GNN to address these problems. During neighbor aggregation, it not only preserves the original attributes as much as possible, but also captures the differences and associations between attribute and topology information through an interaction mechanism.
- We conduct extensive experiments on node classification and link prediction tasks over six real-world attributed network datasets. The results demonstrate that API-GNN outperforms all baseline models.
2 Preliminary
Symbols | Definitions |
---|---|
\(\mathcal {G}\) | Input attributed network |
\(\mathbf {A}\), \(\mathbf {X}\), \(\mathbf {Y}\) | Matrices of adjacency, attribute and node label |
N, M | The number of nodes and node attributes |
\(\mathcal {C}\) | The labels set of nodes |
\(\mathscr {N}(i)\) | The neighbor nodes set of node \(V_i\) |
\(\mathbf {x}_{it}\) | t-th original attribute vector of node \(V_i\) |
\({\mathbf {M}}_{F}^{(l)}, {\mathbf {M}}_{T}^{(l)}\) | Layer-specific transformation matrices |
\(\mathbf {a}_{F}\), \(\mathbf {a}_{N}\) | Attribute-level and node-level attention vectors |
\(\mathbf {W}_{*}\), \(\mathbf {b}_{*}\) | Model trainable parameters |
\({\mathbf {h}}_{i}^{(l)}\) | Global representation of node \(V_i\) in l-th layer |
\(\mathbf{of }_{i}^{(l)}, \mathbf{tp }_{i}^{(l)}\) | Low-dimensional original attribute representation and topology information representation of node \(V_i\) in l-th layer |
\(\mathbf{hf }_{i}^{(l)}\), \(\mathbf{ht }_{i}^{(l)}\) | Neighborhood attribute representation and neighborhood topology representation of node \(V_i\) in l-th layer |
\(\mathscr {N}\mathscr {I}(i)\) | Neighborhood information representation of node \(V_i\) in l-th layer |
\(\mathcal {Y}_{\mathcal {C}}, \mathcal {Y}_{+}, \mathcal {Y}_{-}\) | Labeled node set, positive link instance set and negative link instance set |
\(\lambda _{1}, \lambda _{2}, \lambda _{3}\) | Weight coefficients of loss function |
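To make the notation concrete, the following minimal sketch (our own illustration, not part of the model) instantiates the main objects from the table above on a toy attributed network; the toy sizes and the helper `neighbors` are hypothetical, while `A`, `X`, and `Y` mirror the symbols \(\mathbf {A}\), \(\mathbf {X}\), \(\mathbf {Y}\).

```python
import numpy as np

# Hypothetical toy sizes: N nodes, M attributes, |C| label classes.
N, M, num_classes = 5, 8, 3

A = np.zeros((N, N), dtype=np.float32)        # adjacency matrix of the attributed network G
A[[0, 1, 1, 2], [1, 0, 2, 1]] = 1.0           # undirected edges (0,1) and (1,2)

X = np.random.rand(N, M).astype(np.float32)   # attribute matrix; row i holds node V_i's attributes
Y = np.eye(num_classes)[np.random.randint(0, num_classes, N)]  # one-hot node label matrix

# Neighbor set N(i) of node V_i, read off the adjacency matrix.
def neighbors(i):
    return np.nonzero(A[i])[0]

print(neighbors(1))  # -> [0 2]
```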
2.1 Problem definition
2.2 Our solution
3 The proposed model
3.1 Original attribute aggregation module
3.1.1 Attribute-level attention
3.1.2 Node-level attention
3.2 Topology aggregation module
3.2.1 Topology distilling
3.2.2 Topology aggregation
3.3 Interaction module
3.4 Optimization objective
4 Experiments
4.1 Experiment setups
4.1.1 Datasets
- ACM: In this dataset, we select papers as nodes and use the metapath paper-author-paper to construct edges, i.e., two papers are connected if they share an author (a construction sketch follows this list). Paper features are bag-of-words representations of keywords.
- DBLP: Different from ACM, each node in the DBLP network represents an author, and an edge exists when two authors have published papers at the same conference. Author features are bag-of-words representations of keywords.
- Facebook: We regard the following relationship as an edge and convert it into an undirected relation. Note that all node attributes are anonymized. We select birthday, education, name, language, locale, hometown, location, and work as node features.
- Google+: This dataset is similar to Facebook, but the attributes are not anonymized. For node features, we select place, name, institution, university, and job.
- Amazon: We build a subgraph of clothing items on Amazon.com and use the co-rating relation as edges. Rank, price, brand, title, description, dimension, and weight are selected as item features.
- JD: For the JD dataset, we also select clothing items as nodes and construct edges from co-click and co-order relations. Item features include class-id, brand, price, and description.
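As an illustration of the metapath-based construction described for ACM, the sketch below links two papers whenever they share an author (the paper-author-paper metapath) and builds bag-of-words keyword features. This is our own toy example; `paper_authors` and `paper_keywords` are hypothetical inputs, not fields of the released dataset.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical raw inputs: each paper's author ids and keywords.
paper_authors = {0: {"a1", "a2"}, 1: {"a2"}, 2: {"a3"}}
paper_keywords = {0: ["graph", "attention"], 1: ["graph"], 2: ["embedding"]}

# Paper-author-paper metapath: connect two papers that share an author.
author_to_papers = defaultdict(set)
for paper, authors in paper_authors.items():
    for author in authors:
        author_to_papers[author].add(paper)

edges = set()
for papers in author_to_papers.values():
    for u, v in combinations(sorted(papers), 2):
        edges.add((u, v))

# Bag-of-words node features over the keyword vocabulary.
vocab = sorted({w for kws in paper_keywords.values() for w in kws})
features = {p: [kws.count(w) for w in vocab] for p, kws in paper_keywords.items()}

print(edges)     # -> {(0, 1)}, since papers 0 and 1 share author a2
print(features)  # -> {0: [1, 0, 1], 1: [0, 0, 1], 2: [0, 1, 0]} for vocab [attention, embedding, graph]
```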
4.1.2 Baselines
- Conventional Graph Embedding: we select three random-walk-based methods, DeepWalk [14], Node2vec [4], and GraphRNA [7], as well as three proximity-based graph embedding methods, LINE [17], SDNE [21], and DANE [3]. Struc2Vec [15], which considers spatial structure similarity, is another baseline.
- Improved GNN-based Graph Embedding: CS-GNN [6] designs a new mechanism to aggregate information from dissimilar neighbors. GResNet [27] explores variations of residual connections, and ALaGCN [24] uses an adaptive layer to achieve a better aggregation effect. DAGNN [10] decouples representation transformation and propagation in the graph convolution operation.
- Variants of API-GNN: we design three variants of API-GNN, namely A-GNN, T-GNN, and AP-GNN (summarized in the configuration sketch after this list). A-GNN keeps only the original attribute aggregation module and discards the topology aggregation module. T-GNN contains both the original attribute aggregation module and the topology aggregation module, but its attribute aggregation module uses only node-level attention. AP-GNN retains both aggregation modules, but the interaction module is not applied at every layer; interaction is carried out only at the last layer.
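To summarize the ablations, a hypothetical configuration sketch is shown below; it is our own illustration of how the variants differ, and the flag names are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class VariantConfig:
    # Hypothetical flags summarizing the ablation variants; names are ours.
    attribute_module: bool = True           # original attribute aggregation module
    attribute_level_attention: bool = True  # attribute-level attention inside it
    topology_module: bool = True            # topology aggregation module
    interact_every_layer: bool = True       # interaction module applied at each layer

VARIANTS = {
    "A-GNN":   VariantConfig(topology_module=False),
    "T-GNN":   VariantConfig(attribute_level_attention=False),
    "AP-GNN":  VariantConfig(interact_every_layer=False),  # interaction only at the last layer
    "API-GNN": VariantConfig(),  # full model
}
```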
4.1.3 Smoothness metrics
4.1.4 Parameters
Datasets | Nodes | Features | Links | \(\overline{d}\) (avg. degree) | Labels |
---|---|---|---|---|---|
ACM | 3025 | 1870 | 6454 | 4.27 | 3 |
DBLP | 3890 | 334 | 2347227 | 1206.80 | 4 |
Facebook | 3401 | 1024 | 71254 | 41.90 | 2 |
Google\(+\) | 5807 | 640 | 885285 | 304.90 | 2 |
Amazon | 5004 | 354 | 27313 | 10.92 | 5 |
JD | 5289 | 3496 | 551825 | 208.67 | 4 |
Methods | ACM | DBLP | Facebook | Google+ | Amazon | JD |
---|---|---|---|---|---|---|
DeepWalk | 0.713 | 0.920 | 0.633 | 0.826 | 0.457 | 0.416 |
Node2vec | 0.723 | 0.864 | 0.630 | 0.830 | 0.465 | 0.431 |
GraphRNA | 0.934 | 0.884 | 0.654 | 0.826 | 0.475 | 0.493 |
LINE | 0.908 | 0.918 | 0.648 | 0.835 | 0.449 | 0.410 |
SDNE | 0.901 | 0.918 | 0.628 | 0.821 | 0.459 | 0.403 |
Struc2vec | 0.686 | 0.812 | 0.584 | 0.802 | 0.449 | 0.374 |
DANE | 0.799 | 0.794 | 0.663 | 0.804 | 0.477 | 0.342 |
GCN | 0.888 | 0.900 | 0.666 | 0.804 | 0.477 | 0.363 |
GraphSAGE | 0.875 | 0.905 | 0.657 | 0.790 | 0.461 | 0.482 |
GAT | 0.911 | 0.920 | 0.689 | 0.804 | 0.477 | 0.461 |
GResNet | 0.861 | 0.910 | 0.657 | 0.790 | 0.479 | 0.533 |
CS-GNN | 0.901 | 0.918 | 0.669 | 0.793 | 0.479 | 0.509 |
ALaGCN | 0.927 | 0.923 | 0.704 | 0.814 | 0.487 | 0.535 |
DAGNN | 0.927 | 0.906 | 0.666 | 0.828 | 0.487 | 0.488 |
A-GNN | 0.917 | 0.925 | 0.692 | 0.818 | 0.487 | 0.505 |
T-GNN | 0.931 | 0.928 | 0.710 | 0.833 | 0.491 | 0.522 |
AP-GNN | 0.924 | 0.920 | 0.704 | 0.826 | 0.477 | 0.520 |
API-GNN | 0.944 | 0.933 | 0.716 | 0.845 | 0.495 | 0.548 |
4.2 Node classification
Methods | ACM | DBLP | Facebook | Google+ | Amazon | JD |
---|---|---|---|---|---|---|
DeepWalk | 0.555 | 0.537 | 0.574 | 0.584 | 0.586 | 0.559 |
Node2vec | 0.552 | 0.543 | 0.572 | 0.577 | 0.590 | 0.555 |
GraphRNA | 0.687 | 0.547 | 0.693 | 0.764 | 0.774 | 0.746 |
LINE | 0.559 | 0.542 | 0.573 | 0.566 | 0.596 | 0.553 |
SDNE | 0.531 | 0.525 | 0.546 | 0.557 | 0.570 | 0.537 |
Struc2vec | 0.547 | 0.543 | 0.579 | 0.593 | 0.607 | 0.566 |
DANE | 0.640 | 0.606 | 0.651 | 0.651 | 0.606 | 0.635 |
GCN | 0.734 | 0.574 | 0.724 | 0.760 | 0.756 | 0.736 |
GraphSAGE | 0.726 | 0.564 | 0.713 | 0.813 | 0.730 | 0.706 |
GAT | 0.697 | 0.536 | 0.671 | 0.820 | 0.778 | 0.694 |
GResNet | 0.658 | 0.641 | 0.746 | 0.832 | 0.802 | 0.774 |
CS-GNN | 0.706 | 0.675 | 0.777 | 0.882 | 0.787 | 0.777 |
ALaGCN | 0.737 | 0.666 | 0.755 | 0.885 | 0.814 | 0.788 |
DAGNN | 0.744 | 0.689 | 0.775 | 0.894 | 0.822 | 0.797 |
A-GNN | 0.712 | 0.645 | 0.733 | 0.838 | 0.796 | 0.754 |
T-GNN | 0.739 | 0.686 | 0.762 | 0.883 | 0.816 | 0.788 |
AP-GNN | 0.726 | 0.636 | 0.751 | 0.868 | 0.787 | 0.766 |
API-GNN | 0.756 | 0.707 | 0.786 | 0.894 | 0.836 | 0.804 |