1 Introduction
- Our study analyzes the ineffectiveness of existing pseudo-labeling strategies and proposes a novel pseudo-labeling framework for semi-supervised node classification with extremely few labels.
- Our approach offers two unique advantages: it incorporates an MI-based informativeness measure for pseudo-label candidate selection, and it alleviates the negative impact of noisy pseudo-labels via a generalized cross-entropy loss.
- We validate our proposed approach on six real-world graph datasets of various types, demonstrating its superior performance over state-of-the-art baselines.
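The generalized cross-entropy loss mentioned above is commonly instantiated as the \(L_q\) loss, \(\ell_q(f({\textbf{x}}), y) = (1 - f({\textbf{x}})_y^q)/q\) with \(q \in (0, 1]\), which interpolates between standard cross-entropy (as \(q \rightarrow 0\)) and the noise-robust mean absolute error (\(q = 1\)). A minimal NumPy sketch, assuming this form (the function name and default \(q\) are illustrative, not taken from the paper):

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross-entropy loss L_q = (1 - p_y^q) / q.

    Interpolates between standard cross-entropy (q -> 0) and the
    noise-robust mean absolute error (q = 1), down-weighting the
    gradient contribution of low-confidence (likely noisy) labels.

    probs:  (N, C) array of predicted class probabilities.
    labels: (N,) array of (pseudo-)label indices.
    q:      robustness hyperparameter in (0, 1].
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability of the assigned label
    return np.mean((1.0 - p_y ** q) / q)
```

As \(q \rightarrow 0\), \((1 - p^q)/q \rightarrow -\ln p\), recovering cross-entropy; larger \(q\) values trade fitting speed for robustness to mislabeled pseudo-labels.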
2 Related work
2.1 Graph learning with few labels
2.2 Mutual information maximization
3 Problem statement
4 Methodology
4.1 Framework overview
4.2 The GNN encoder
4.3 Candidate selection for pseudo-labeling
4.4 Mitigating noisy pseudo-labels
4.5 Class-balanced regularization
4.6 Model training and computational complexity
5 Experiments
5.1 Datasets
Dataset | Nodes | Edges | Classes | Features |
---|---|---|---|---|
Citeseer | 3,327 | 4,732 | 6 | 3,703 |
Cora | 2,708 | 5,429 | 7 | 1,433 |
Dblp | 17,716 | 105,734 | 4 | 1,639 |
Wikics | 11,701 | 216,123 | 10 | 300 |
Coauthor_CS | 18,333 | 81,894 | 15 | 6,805 |
Coauthor_Phy | 34,493 | 247,962 | 5 | 8,415 |
5.2 Baselines
5.3 Experimental setup
Given labels (per class) | \(\alpha \) | \(\beta \) | k |
---|---|---|---|
\(\{1, 3, 5\}\) | 1.0 | 1.0 | 0.55 |
\(\{10, 15, 20, 30, 40, 50\}\) | 0.2 | 0.2 | 0.55 |
5.4 Comparison with state-of-the-art baselines
Method | Cora | | | | | | Citeseer | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 1 | 3 | 5 | 10 | 15 | 20 | 1 | 3 | 5 | 10 | 15 | 20 |
GCN | 0.418 | 0.616 | 0.685 | 0.742 | 0.784 | 0.797 | 0.381 | 0.504 | 0.569 | 0.602 | 0.660 | 0.682 |
Super-GCN | 0.522 | 0.673 | 0.720 | 0.760 | 0.788 | 0.799 | \(\underline{0.499}\bullet \) | 0.610 | \(\underline{0.665}\bullet \) | \(\underline{0.700}\bullet \) | \(\underline{0.706}\bullet \) | 0.712 |
GMI | 0.502 | 0.672 | 0.715 | 0.757 | 0.783 | 0.797 | 0.497 | 0.568 | 0.621 | 0.632 | 0.670 | 0.683 |
SSGCN-clu | 0.407 | 0.684 | 0.739 | 0.776 | \(\underline{0.797}\bullet \) | \(\underline{0.810}\bullet \) | 0.267 | 0.388 | 0.507 | 0.616 | 0.634 | 0.647 |
SSGCN-comp | 0.451 | 0.609 | 0.676 | 0.741 | 0.772 | 0.794 | 0.433 | 0.547 | 0.638 | 0.682 | 0.692 | 0.709 |
SSGCN-par | 0.444 | 0.649 | 0.692 | 0.734 | 0.757 | 0.770 | 0.457 | 0.578 | 0.643 | 0.693 | 0.705 | \(\underline{0.716}\bullet \) |
Cotraining | 0.533 | 0.661 | 0.689 | 0.741 | 0.764 | 0.774 | 0.383 | 0.469 | 0.563 | 0.601 | 0.640 | 0.649 |
Selftraining | 0.399 | 0.608 | 0.693 | 0.761 | 0.789 | 0.793 | 0.324 | 0.463 | 0.526 | 0.647 | 0.683 | 0.685 |
Union | 0.505 | 0.663 | 0.713 | 0.764 | 0.792 | 0.797 | 0.366 | 0.491 | 0.560 | 0.631 | 0.663 | 0.667 |
Intersection | 0.408 | 0.596 | 0.674 | 0.736 | 0.770 | 0.775 | 0.337 | 0.497 | 0.582 | 0.671 | 0.694 | 0.699 |
M3S | 0.439 | 0.651 | 0.688 | 0.754 | 0.763 | 0.789 | 0.307 | 0.515 | 0.635 | 0.674 | 0.683 | 0.695 |
DSGCN | \(\underline{0.596}\bullet \) | \(\underline{0.712}\bullet \) | \(\underline{0.745}\bullet \) | \(\underline{0.777}\bullet \) | 0.792 | 0.795 | 0.463 | \(\underline{0.613}\bullet \) | 0.652 | 0.674 | 0.681 | 0.684 |
InfoGNN | 0.601 | 0.735 | 0.776 | 0.792 | 0.813 | 0.828 | 0.540 | 0.652 | 0.717 | 0.721 | 0.725 | 0.733 |
Method | Dblp | | | | | | Wikics | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 1 | 3 | 5 | 10 | 15 | 20 | 1 | 3 | 5 | 10 | 15 | 20 |
GCN | 0.472 | 0.583 | 0.627 | 0.652 | 0.688 | 0.718 | 0.384 | 0.550 | 0.638 | 0.682 | 0.712 | 0.720 |
Super-GCN | 0.472 | 0.583 | 0.685 | 0.708 | 0.729 | 0.738 | 0.399 | 0.552 | 0.599 | 0.683 | 0.712 | 0.721 |
GMI | 0.544 | 0.597 | 0.656 | 0.728 | 0.739 | 0.754 | 0.325 | 0.484 | 0.546 | 0.654 | 0.683 | 0.700 |
SSGCN-clu | 0.369 | 0.528 | 0.649 | 0.692 | 0.721 | 0.744 | 0.335 | 0.579 | 0.627 | 0.694 | 0.714 | 0.725 |
SSGCN-comp | 0.458 | 0.525 | 0.598 | 0.634 | 0.674 | 0.707 | 0.224 | 0.261 | 0.358 | 0.381 | 0.343 | 0.356 |
SSGCN-par | 0.418 | 0.545 | 0.639 | 0.683 | 0.708 | 0.733 | 0.332 | 0.593 | 0.659 | 0.706 | 0.732 | 0.740 |
Cotraining | 0.545 | 0.646 | 0.634 | 0.674 | 0.703 | 0.701 | 0.367 | 0.584 | 0.645 | 0.692 | 0.724 | 0.737 |
Selftraining | 0.437 | 0.580 | 0.634 | 0.707 | 0.738 | 0.759 | 0.350 | 0.602 | 0.655\(\circ \) | 0.701 | 0.725 | 0.738 |
Union | 0.485 | 0.618 | 0.652 | 0.712 | 0.737 | 0.746 | 0.351 | 0.584 | 0.646 | 0.694 | 0.723 | \(\underline{0.740}\bullet \) |
Intersection | 0.458 | 0.581 | 0.566 | 0.665 | 0.715 | 0.734 | 0.359 | 0.599 | 0.654 | \(\underline{0.706}\bullet \) | \(\underline{0.726}\bullet \) | \(\underline{0.740}\bullet \) |
M3S | 0.547 | 0.635 | 0.672 | 0.733 | \(\underline{0.749}\bullet \) | 0.752 | 0.401 | 0.593 | 0.621 | 0.685 | 0.711 | 0.734 |
DSGCN | \(\underline{0.587}\bullet \) | 0.671\(\circ \) | \(\underline{0.720}\bullet \) | \(\underline{0.738}\bullet \) | 0.744 | \(\underline{0.764}\bullet \) | \(\underline{0.414}\bullet \) | \(\underline{0.607}\bullet \) | 0.635 | 0.705 | 0.716 | 0.728 |
InfoGNN | 0.596 | \(\underline{0.669}\) | 0.746 | 0.765 | 0.773 | 0.787 | 0.460 | 0.610 | \(\underline{0.650}\) | 0.723 | 0.740 | 0.742 |
Method | Coauthor_CS | | | | | | Coauthor_Phy | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 1 | 3 | 5 | 10 | 15 | 20 | 1 | 3 | 5 | 10 | 15 | 20 |
GCN | 0.640 | 0.799 | 0.847 | 0.893 | 0.901 | 0.909 | 0.700 | 0.849 | 0.868 | 0.901 | 0.912 | 0.918 |
Super-GCN | 0.668 | 0.841 | 0.869 | 0.895 | 0.897 | 0.897 | 0.688 | 0.848 | 0.891 | 0.908 | 0.923 | 0.923 |
GMI | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
SSGCN-clu | 0.770\(\circ \) | 0.886\(\circ \) | \(\underline{0.890}\bullet \) | \(\underline{0.905}\bullet \) | 0.908 | 0.911 | 0.889\(\circ \) | \(\underline{0.923}\bullet \) | \(\underline{0.930}\bullet \) | \(\underline{0.935}\bullet \) | \(\underline{0.936}\bullet \) | \(\underline{0.936}\bullet \) |
SSGCN-comp | 0.711 | 0.858 | 0.888 | 0.904 | 0.907 | 0.909 | 0.798 | 0.892 | 0.904 | 0.927 | 0.921 | 0.928 |
SSGCN-par | 0.737 | 0.860 | 0.881 | 0.898 | 0.901 | 0.903 | 0.824 | 0.915 | 0.919 | 0.925 | 0.931 | 0.931 |
Cotraining | 0.643 | 0.745 | 0.810 | 0.849 | 0.864 | 0.885 | 0.758 | 0.842 | 0.850 | 0.898 | 0.891 | 0.917 |
Selftraining | 0.592 | 0.770 | 0.828 | 0.873 | 0.892 | 0.895 | 0.744 | 0.865 | 0.890 | 0.908 | 0.914 | 0.921 |
Union | 0.621 | 0.772 | 0.812 | 0.856 | 0.864 | 0.885 | 0.750 | 0.855 | 0.870 | 0.908 | 0.902 | 0.910 |
Intersection | 0.650 | 0.775 | 0.851 | 0.887 | 0.893 | 0.898 | 0.612 | 0.763 | 0.854 | 0.901 | 0.904 | 0.926 |
M3S | 0.648 | 0.818 | 0.879 | 0.897 | \(\underline{0.909}\bullet \) | \(\underline{0.912}\bullet \) | 0.828 | 0.868 | 0.895 | 0.914 | 0.922 | 0.930 |
DSGCN | \(\underline{0.743}\circ \) | 0.829 | 0.863 | 0.879 | 0.883 | 0.892 | 0.781 | 0.812 | 0.862 | 0.896 | 0.908 | 0.916 |
InfoGNN | 0.683 | \(\underline{0.865}\) | 0.892 | 0.906 | 0.913 | 0.918 | \(\underline{0.842}\) | 0.924 | 0.934 | 0.938 | 0.942 | 0.942 |
Method | Cora | | | Citeseer | | | Dblp | | |
---|---|---|---|---|---|---|---|---|---|
 | 30 | 40 | 50 | 30 | 40 | 50 | 30 | 40 | 50 |
GCN | 0.816 | 0.825 | 0.829 | 0.695 | 0.708 | 0.716 | 0.743 | 0.753 | 0.770 |
Super-GCN | 0.812 | 0.828 | 0.836 | 0.720 | 0.728 | 0.737 | 0.760 | 0.767 | 0.775 |
GMI | 0.806 | 0.815 | 0.820 | 0.692 | 0.695 | 0.701 | \(\underline{0.784}\bullet \) | 0.794\(\circ \) | \(\underline{0.794}\bullet \) |
SSGCN-clu | \(\underline{0.822}\bullet \) | \(\underline{0.829}\bullet \) | \(\underline{0.837}\bullet \) | 0.682 | 0.683 | 0.680 | 0.756 | 0.766 | 0.775 |
SSGCN-comp | 0.804 | 0.819 | 0.830 | 0.718 | 0.729 | \(\underline{0.739}\bullet \) | 0.744 | 0.752 | 0.761 |
SSGCN-par | 0.784 | 0.791 | 0.798 | \(\underline{0.724}\bullet \) | \(\underline{0.732}\bullet \) | 0.738 | 0.751 | 0.762 | 0.769 |
Cotraining | 0.804 | 0.820 | 0.823 | 0.675 | 0.684 | 0.697 | 0.716 | 0.726 | 0.736 |
Selftraining | 0.807 | 0.821 | 0.818 | 0.696 | 0.706 | 0.710 | 0.777 | 0.775 | 0.782 |
Union | 0.807 | 0.819 | 0.827 | 0.688 | 0.691 | 0.694 | 0.757 | 0.764 | 0.757 |
Intersection | 0.800 | 0.818 | 0.821 | 0.705 | 0.712 | 0.716 | 0.745 | 0.765 | 0.769 |
M3S | 0.792 | 0.807 | 0.815 | 0.713 | 0.716 | 0.721 | 0.765 | 0.769 | 0.774 |
DSGCN | 0.798 | 0.809 | 0.816 | 0.684 | 0.684 | 0.685 | 0.784 | 0.786 | 0.786 |
InfoGNN | 0.835 | 0.848 | 0.853 | 0.735 | 0.737 | 0.742 | 0.789 | \(\underline{0.792}\) | 0.795 |
Method | Wikics | | | Coauthor_CS | | | Coauthor_Phy | | |
---|---|---|---|---|---|---|---|---|---|
 | 30 | 40 | 50 | 30 | 40 | 50 | 30 | 40 | 50 |
GCN | \(\underline{0.752}\bullet \) | \(\underline{0.761}\bullet \) | 0.764 | 0.901 | 0.900 | 0.903 | 0.924 | 0.932 | 0.933 |
Super-GCN | 0.742 | 0.752 | 0.763 | 0.908 | 0.909 | 0.909 | 0.929 | 0.930 | 0.933 |
GMI | 0.713 | 0.730 | 0.746 | OOM | OOM | OOM | OOM | OOM | OOM |
SSGCN-clu | 0.738 | 0.745 | 0.747 | 0.914 | 0.915 | 0.915 | \(\underline{0.938}\bullet \) | \(\underline{0.939}\bullet \) | \(\underline{0.940}\) |
SSGCN-comp | 0.361 | 0.375 | 0.412 | 0.909 | 0.918 | \(\underline{0.922}\bullet \) | 0.928 | 0.933 | 0.937 |
SSGCN-par | 0.741 | 0.750 | 0.755 | 0.906 | 0.908 | 0.908 | 0.933 | 0.933 | 0.934 |
Cotraining | 0.750 | 0.756 | 0.765 | 0.889 | 0.895 | 0.898 | 0.926 | 0.924 | 0.927 |
Selftraining | 0.743 | 0.760 | 0.768\(\circ \) | 0.901 | 0.901 | 0.904 | 0.932 | 0.932 | 0.932 |
Union | \(\underline{0.752}\bullet \) | 0.761 | 0.765 | 0.893 | 0.901 | 0.898 | 0.921 | 0.931 | 0.925 |
Intersection | 0.748 | 0.765 | 0.767 | 0.896 | 0.898 | 0.905 | 0.927 | 0.927 | 0.932 |
M3S | 0.745 | 0.755 | 0.763 | \(\underline{0.916}\bullet \) | \(\underline{0.920}\bullet \) | \(\underline{0.922}\bullet \) | 0.935 | 0.937 | 0.940 |
DSGCN | 0.751 | 0.759 | 0.763 | 0.893 | 0.896 | 0.897 | 0.916 | 0.920 | 0.922 |
InfoGNN | 0.754 | 0.764 | \(\underline{0.766}\) | 0.919 | 0.922 | 0.923 | 0.943 | 0.944 | 0.945 |
5.5 Ablation study
- InfoGNN-I: only \(\ell _I\) is applied on top of GCN; this variant evaluates the role of the contrastive loss;
- InfoGNN-IT: both \(\ell _I\) and \(\ell _T\) are applied; this variant evaluates the impact of the GCE loss by comparison with InfoGNN-I. Note that only model confidence scores are used here for \(\ell _T\), i.e., \({\mathcal {U}}_p = \{v \in {\mathcal {U}} \mid f({\textbf{x}}_v)_j > k\}\);
- InfoGNN-ITS: on top of InfoGNN-IT, the informativeness score, i.e., Eq. (10), is also applied for \(\ell _T\); this variant tests the efficacy of the informativeness score by comparison with InfoGNN-IT. The impact of the \(\ell _{KL}\) loss is revealed by comparing InfoGNN-ITS with the full InfoGNN.
Method | Cora | | Citeseer | | Dblp | | Wikics | | Coauthor_CS | | Coauthor_Phy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 3 | 10 | 3 | 10 | 3 | 10 | 3 | 10 | 3 | 10 | 3 | 10 |
GCN | 0.616 | 0.742 | 0.504 | 0.602 | 0.583 | 0.652 | 0.550 | 0.682 | 0.799 | 0.893 | 0.849 | 0.901 |
InfoGNN-I | 0.681 | 0.764 | 0.583 | 0.694 | 0.598 | 0.739 | 0.549 | 0.695 | 0.824 | 0.887 | 0.885 | 0.928 |
InfoGNN-IT | 0.697 | 0.790 | 0.590 | 0.723 | 0.618 | 0.768 | 0.587 | 0.725 | 0.827 | 0.892 | 0.899 | 0.937 |
InfoGNN-ITS | 0.720 | 0.792 | 0.623 | 0.728 | 0.646 | 0.766 | 0.593 | 0.723 | 0.827 | 0.886 | 0.906 | 0.937 |
InfoGNN | 0.735 | 0.792 | 0.652 | 0.721 | 0.669 | 0.765 | 0.610 | 0.723 | 0.865 | 0.906 | 0.924 | 0.938 |