Abstract

Convolutional neural networks (CNNs) have received widespread attention due to their powerful modeling capabilities and have been successfully applied in natural language processing, image recognition, and other fields. However, traditional CNNs can only deal with Euclidean spatial data, whereas many real-life scenarios, such as transportation networks, social networks, and citation networks, are naturally expressed as graph data. The design of graph convolution operators and graph pooling is at the heart of migrating CNNs to graph data analysis and processing. With the advancement of the Internet and related technology, the graph convolutional network (GCN), as an innovative technology in artificial intelligence (AI), has received more and more attention. GCNs have been widely used in fields such as image processing, intelligent recommender systems, and knowledge graphs because of their excellent ability to process non-Euclidean spatial data. At the same time, communication networks have also embraced AI technology in recent years: AI serves as the brain of the future network and realizes the comprehensive intelligence of the future network. Many complex communication network problems can be abstracted as graph-based optimization problems and solved by GCN, thus overcoming the limitations of traditional methods. This survey briefly describes the definition of graph-based machine learning, introduces different types of graph networks, summarizes the application of GCN in various research fields, analyzes the research status, and gives future research directions.

1. Introduction

AI has drawn the attention of the whole industry in recent years as a frontier field of scientific research and has progressively become a new engine for social and economic development [1]. It has been widely practiced and deployed in natural language processing (NLP), computer vision, intelligent robotics, data mining, cognition and reasoning, and many other areas of society. Today's network data traffic is increasing dramatically due to the rapid expansion of smart devices such as smartphones, smart automobiles, and smart homes. Simultaneously, technologies such as edge computing, virtualization, and network slicing broaden network services and improve user experience, but also create a more complicated network environment. The efficient management of a large number of intelligent devices and the optimization of resource allocation in large-scale and complex network environments have emerged as critical challenges for future network growth. AI, as the future network's brain, is employed for network optimization and decision-making; simultaneously, increasing the computational capability of network nodes adds bones and muscles to the network, allowing intelligent network calculations to be performed [2]. Beyond networking, the image processing field is also improving day by day because of the high-dimensional and complex images generated from diverse data sources. Deep learning applications have attracted great attention in image processing due to their ultrahigh prediction accuracy in recognition tasks, which is bound to improve the performance of existing image processing systems and open up new application fields [35].

The coordinated development of algorithms and computing power will enable future applications to enter a new era of intelligence. Graph data are typical non-Euclidean spatial data with complex correlations and interobject dependencies [6]. Traditional graph-theoretic methods are difficult to adapt to the complex graph problems of future networks. Therefore, finding algorithms that can handle complex graph data and guide the resource allocation, management, and scheduling of communication networks has become an important scientific problem for future networks. As an emerging technology in the AI field in recent years, the GNN has opened up a new space for processing complex graph-structured data. With the help of artificial intelligence techniques such as deep learning and reinforcement learning, GCNs can quickly mine topological information and complex features in graph structures and have solved many major problems in computer vision, recommendation systems, and knowledge graphs [7, 8]. Therefore, combining GCNs with these latest advances is an important way to solve real-world issues efficiently and effectively.

1.1. Significance of This Survey

CNNs have grown dramatically in recent years, attracting worldwide attention due to their remarkable modeling capabilities. Compared with older methods, the introduction of CNNs has resulted in significant advances in image processing and natural language processing, including machine translation, image recognition, and speech recognition [9]. Traditional convolutional neural networks, however, can only deal with data in Euclidean space (such as images, text, and speech), because the data in these domains are translation invariant. Translation invariance allows us to build a globally shared convolution kernel over the input data space, which is what makes it possible to define a convolutional neural network. Taking image data as an example, an image can be represented as a set of regularly distributed pixels in Euclidean space, and translation invariance means that local structures of the same size can be extracted with any pixel as the center [10]. Based on this, a CNN learns convolution kernels shared across pixels to model local connections and thereby builds meaningful hidden-layer representations of images. Figure 1 depicts the distinction between non-Euclidean and Euclidean space.

Although traditional CNNs bring improvements in the text and image domains, they can only handle Euclidean space data. At the same time, non-Euclidean spatial data, namely graph data, have gradually attracted attention due to their ubiquity. Graph data can naturally express data structures in real life, such as transportation networks, the World Wide Web, and social networks. Different from image and text data, the local structure of each node in graph data is different, so translation invariance is no longer satisfied [11]. The lack of translation invariance poses a challenge for defining CNNs on graph data. In recent years, because of the ubiquity of graph data, researchers have begun to focus on how to construct deep-learning models on graphs. Drawing on the CNN's ability to model local structures and on the node dependencies that are ubiquitous in graphs, the GCN has become the most active and important class of such algorithms. Recently, some articles have begun to explore and summarize deep learning on graphs, but for the most important branch, the GCN, an in-depth discussion and summary of its modeling methods and applications is still needed. In this regard, this article deeply organizes and summarizes the development history and future trends of GCN together with applications developed recently in all fields of science. The challenges faced in the construction of GCNs mainly come from the following aspects:

(i) Graph data are non-Euclidean spatial data: graph data do not satisfy translation invariance, which means that each node has a different local structure. The basic operators of traditional convolutional neural networks, convolution and pooling, rely on translation invariance of the data, so defining convolution and pooling operators on graph data is a difficult task.

(ii) Diverse characteristics of graph data: graph data can represent a wide range of real-world applications, such as social networks, citation networks, and political relationship networks, each with unique characteristics; for example, edges may carry signs indicating positive or negative tendencies, directions, weights, or other attributes. GCNs are more difficult to design because they have to model this wider range of graph characteristics.

(iii) Large-scale graph data: practical applications involve large-scale graphs, such as user-commodity networks and user networks in social networks, with millions or even tens of millions of nodes. Building a large-scale graph convolutional neural network within acceptable time and space constraints is therefore a major challenge.

The current survey's primary focus is the GCN, including its variants and the most recent GCN trends. We specifically cover the most recent works that use the GCN in various fields of science and concentrate on works published between 2000 and 2022. We used the PRISMA (preferred reporting items for systematic reviews and meta-analyses) framework guidelines to select GCN-related publications. Papers were found through a variety of publishers, including Springer, IEEE, MDPI, Hindawi, Wiley, Elsevier, and the ACM library. Articles were searched using the following terms: "graph convolutional networks," "graph networks," "GCN," "graph attention network," "attention-based graph," "GAT," "GATnet," and "graph query." Only articles written in English and published in the selected time period were considered. The main contributions of this survey are threefold:

(1) We provide a thorough analysis of GCN deep learning techniques, including variants and advancements of GCN, applications and current trends in various fields of study, performance measures, and so on.

(2) A hierarchical and structured review of recent improvements in deep learning-based GCN techniques is offered, and the benefits and limitations of each component of an effective GCN solution are examined.

(3) We discuss the obstacles and unresolved concerns, as well as new trends and future directions, in order to provide informed guidance to the general public.

Figure 2 gives the complete structure of our survey. This paper first introduces the basic model of the GNN and several important graph neural networks; second, it introduces the specific application methods of GNNs in various research fields such as NLP and computer vision; in the conclusion, it discusses the current research status and gives future research directions.

2. Graph Neural Network (GNN) and Its Variants

GNN was first proposed by Gori et al. [12], and Scarselli et al. [13] elaborated on this model in more detail. The GNN proposed by Gori et al. [12] draws on research results in the field of neural networks and can directly process graph-structured data; its core is the local transfer function and the local output function. The local transfer function generates the state vector of a node, which contains the neighborhood information of the node. The transfer function is shared among all nodes and updates the node's state vector $h_v$ according to the input neighborhood, and its expression is as follows:

$$h_v = \sum_{u \in N(v)} f\left(x_v, x_{(v,u)}, x_u, h_u\right),$$

where $x_v$ is the feature of node $v$, $x_{(v,u)}$ is the feature of the edge connecting node $v$ and its neighbor node $u$, and $x_u$ and $h_u$ are the feature and state of the neighbor node $u$ of node $v$. The local output function generates a new representation of the node, and its expression is as follows:

$$o_v = g\left(h_v, x_v\right).$$

The stacked application of the local transfer function and the local output function to all nodes constitutes the GNN structural model, which eventually reaches a stable state through iteration. The early graph neural network has great limitations: its efficiency is low, its computational cost is high, and it is difficult for the node features to influence the state after many updates. In recent years, in order to process graph-structured data more efficiently, new graph neural networks and application studies have been proposed one after another.
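As a minimal illustration of this iterative scheme (our own Python sketch, not the original authors' implementation; the toy transfer function below is a hypothetical choice), the state of every node is repeatedly recomputed from its neighborhood until the states stop changing:

```python
import numpy as np

def gnn_fixed_point(adj, node_feats, transfer, tol=1e-5, max_iter=100):
    """Iterate a shared local transfer function until the node states converge.

    adj        : (N, N) adjacency matrix of the graph
    node_feats : (N, F) node feature matrix
    transfer   : callable (own feature, neighbor features, neighbor states) -> new state;
                 assumed to be a contraction so that the iteration reaches a fixed point,
                 as required by the original GNN model.
    """
    n, f = node_feats.shape
    states = np.zeros((n, f))
    for _ in range(max_iter):
        new_states = np.zeros_like(states)
        for v in range(n):
            neighbors = np.nonzero(adj[v])[0]
            new_states[v] = transfer(node_feats[v], node_feats[neighbors], states[neighbors])
        if np.max(np.abs(new_states - states)) < tol:
            return new_states
        states = new_states
    return states

# Toy transfer function: blend the node's own feature with the mean of its neighbors' states.
def toy_transfer(x_v, x_nb, h_nb):
    nb_mean = h_nb.mean(axis=0) if len(h_nb) > 0 else np.zeros_like(x_v)
    return 0.5 * x_v + 0.5 * nb_mean
```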

2.1. Graph Convolutional Networks (GCNs)

GCNs introduce the convolution operation into graph structures and are among the most important GNNs at present. According to the feature extraction method, they can be divided into spectral-domain GCNs and spatial-domain graph convolutional networks. Spectral-domain graph convolutional networks are derived from graph signal processing: a filter is introduced to define graph convolution, which can be understood as removing noise through the filter to obtain the classification result of the input signal.

Based on spectral graph theory, Bruna et al. first proposed a convolution layer function to define the spectral-domain GCN [14]. Kipf and Welling [15] then proposed the concept of a semisupervised GCN based on the spectral domain (structure shown in Figure 3). The spectral-domain graph convolution is defined as the product of the signal and the filter function, and its expression is as follows:

$$g_\theta \star x = U g_\theta U^{\top} x,$$

where $g_\theta$ is the filter function, $x$ is the signal of the graph at the nodes, and $U$ is the matrix of eigenvectors of the normalized graph Laplacian. $g_\theta$ can be understood as a function of the eigenvalues of the graph Laplacian matrix, namely, $g_\theta(\Lambda)$, where $\Lambda$ is the diagonal matrix composed of the eigenvalues of the graph Laplacian matrix and $\theta$ is the function parameter. In order to reduce the computational complexity, $g_\theta(\Lambda)$ can be approximated by Chebyshev polynomials, and its expression is as follows:

$$g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k\left(\tilde{\Lambda}\right), \qquad \tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_N,$$

where $T_k(\cdot)$ is the k-th order Chebyshev polynomial, $\theta_k$ is the Chebyshev coefficient vector, $L$ is the graph Laplacian matrix, $\lambda_{\max}$ is the largest eigenvalue of $L$, $I_N$ is the identity matrix, $D$ is the diagonal degree matrix, and $A$ is the adjacency matrix. When limiting $K = 1$, the convolution can be simplified as follows:

$$g_\theta \star x \approx \theta\left(I_N + D^{-1/2} A D^{-1/2}\right) x.$$

Then, the convolutional layer formula of the graph convolutional network is as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right),$$

where σ(·) is the nonlinear activation function, $\tilde{A} = A + I_N$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $W^{(l)}$ is the weight matrix of the l-th graph convolutional layer of the network.
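As a concrete illustration of this propagation rule, the following numpy sketch (our own illustrative code, not an implementation from [15]) computes one layer of the renormalized graph convolution:

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One graph convolutional layer: sigma(D~^-1/2 (A + I) D~^-1/2 H W).

    A : (N, N) adjacency matrix
    H : (N, F_in) node features from the previous layer (H = X for the first layer)
    W : (F_in, F_out) trainable weight matrix
    """
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    d = A_tilde.sum(axis=1)                      # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # renormalized adjacency
    return activation(A_hat @ H @ W)             # propagate and transform

# A two-layer semisupervised GCN classifier then takes the form
# Z = softmax(A_hat @ relu(A_hat @ X @ W0) @ W1).
```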

After the concept of the GCN was proposed, new spectral-domain graph convolutional network models were developed, such as AGCN [16], AGC [17], and so on. However, spectral-domain GCNs cannot handle directed graphs and have poor scalability, while spatial-domain GCNs are more flexible and general. Spatial-domain graph convolutional networks define graph convolutions according to the spatial relationships of nodes. NN4G [17] is the earliest spatial-domain GCN; it realizes graph convolution by directly accumulating the feature information of node neighborhoods. The message passing neural network (MPNN) proposed by Gilmer et al. [18] can be regarded as a general framework for spatial-domain GCNs. MPNN decomposes the spatial-domain convolution into two processes, information transfer and state update, and it takes the feature of each node as the initial hidden state, namely,

$$h_v^{(0)} = x_v,$$

where $x_v$ is the feature of node $v$. The hidden state update formula of MPNN is as follows:

$$m_v^{(l+1)} = \sum_{u \in N(v)} M_l\left(h_v^{(l)}, h_u^{(l)}, e_{vu}\right), \qquad h_v^{(l+1)} = U_l\left(h_v^{(l)}, m_v^{(l+1)}\right),$$

where $l$ is the layer index, $U_l(\cdot)$ is the update function, and $M_l(\cdot)$ is the information transfer (message) function. After obtaining the hidden representations of all nodes in the graph, the representation of the entire graph can be generated by the readout function

$$\hat{y} = R\left(\left\{h_v^{(L)} \mid v \in G\right\}\right),$$

where R(·) is the readout function. By defining different forms of the update function, information transfer function, and readout function, MPNN can represent a variety of spatial-domain graph convolutional networks. Typical spatial-domain graph convolutional networks also include PATCHY-SAN [19], GraphSage [20], and Diffusion CNN [21].
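The message/update/readout decomposition can be sketched in a few lines of Python (an illustrative rendering of the framework in which simple sums and averages stand in for the learned functions $M_l$, $U_l$, and $R$):

```python
import numpy as np

def mpnn_forward(adj, X, num_layers=2):
    """Generic message passing: each node state is updated from its neighbors' messages,
    then a readout produces a graph-level representation."""
    H = X.copy()                       # h_v^(0) = x_v
    for _ in range(num_layers):
        messages = adj @ H             # message function M_l: sum of neighbor states
        H = 0.5 * H + 0.5 * messages   # update function U_l: blend old state and message
    graph_repr = H.sum(axis=0)         # readout R: sum over all node states
    return H, graph_repr
```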

2.2. Graph Attention Network (GAT)

GAT introduces an attention mechanism on the basis of the GCN, which enables the model to focus on the information most relevant to the current task, thereby improving model performance. In spectral-domain GCNs, the filter function depends on the Laplacian matrix, which is derived from the graph structure, so a model trained on a specific graph cannot be directly applied to other graph structures. To solve this problem, Veličković et al. [22] proposed a new type of graph neural network structure, namely GAT, and Figure 4 shows the attention mechanism of GAT.

GAT computes, for each node in the graph, a weighted average of its neighborhood features, where the weights reflect the importance of each neighbor. The graph attention layer is the key structure through which GAT realizes the attention mechanism. The graph attention layer takes the features of the nodes in the graph as input and outputs another set of higher-level node features that may have a different cardinality. The graph attention layer realizes the conversion from input to output through the attention coefficients obtained by an attention mechanism $a$. The attention coefficient represents the importance of node j to node i, and its expression is as follows:

$$e_{ij} = a\left(W h_i, W h_j\right),$$

where W is the weight matrix applied to all nodes, representing the relationship between input features and output features, and $h_i$ and $h_j$ are the features of node i and node j, respectively. The model introduces the attention mechanism into the graph structure by computing attention coefficients only between nodes and their neighbors, without considering the global structural information of the graph. To simplify operations and facilitate comparison, the attention coefficients are normalized and used to generate the output features as follows:

$$\alpha_{ij} = \operatorname{softmax}_j\left(e_{ij}\right) = \frac{\exp\left(e_{ij}\right)}{\sum_{k \in N(i)} \exp\left(e_{ik}\right)}, \qquad h_i' = \sigma\left(\sum_{j \in N(i)} \alpha_{ij} W h_j\right),$$

where σ(·) is the nonlinear activation function and $\alpha_{ij}$ is the normalized attention coefficient. GAT also introduces a multihead attention mechanism similar to the Transformer architecture, which can perform parallel computation over adjacent node pairs and stabilize the learning process. GAT has low complexity, attends only to adjacent nodes rather than requiring information about the whole graph, and does not need to be retrained when applied to a new graph structure. For complex graph structures, some studies have proposed new graph attention networks, such as heterogeneous GAT [23], multirelational GAT [24], and spectral GAT [25]. These models can be used in more complex and information-rich networks to achieve better results.
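A minimal single-head sketch of this scoring-plus-softmax scheme (our own illustration, using the LeakyReLU scoring form of [22]; W and a are assumed to be trained parameters):

```python
import numpy as np

def gat_layer(adj, H, W, a, leaky_slope=0.2):
    """Single-head graph attention layer.

    adj : (N, N) adjacency matrix with self-loops (1 where an edge exists)
    H   : (N, F_in) input node features
    W   : (F_in, F_out) shared linear transformation
    a   : (2 * F_out,) attention vector scoring concatenated node pairs
    """
    Wh = H @ W
    n = Wh.shape[0]
    e = np.full((n, n), -np.inf)                     # unconnected pairs get zero weight
    for i in range(n):
        for j in np.nonzero(adj[i])[0]:
            s = a @ np.concatenate([Wh[i], Wh[j]])   # a^T [W h_i || W h_j]
            e[i, j] = s if s > 0 else leaky_slope * s
    alpha = np.exp(e - e.max(axis=1, keepdims=True)) # softmax over each neighborhood
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ Wh)                       # h_i' = sigma(sum_j alpha_ij W h_j)
```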

2.3. Graph Autoencoder

GAE is an unsupervised learning framework that can convert graph structures into low-dimensional vectors and reconstruct graph structures from the encoded information; it is often used for graph embedding (GE) and graph structure generation [26]. Graph embedding is a graph representation learning (GRL) method that aims to map graph-structured data into low-dimensional dense vectors while preserving node information. Graph embedding enables graph-structured data to be applied more efficiently to traditional machine learning algorithms and to achieve better results in tasks such as recommendation and classification. Typical methods include random walk-based graph embeddings, such as DeepWalk [27] and Node2Vec [28] (Figure 5 shows the difference between DeepWalk and Node2Vec), and graph embeddings based on matrix decomposition, such as singular value decomposition (SVD), locally linear embedding (LLE), and non-negative matrix factorization (NMF). Compared with graph embeddings based on random walks and matrix factorization, graph autoencoders can be applied to highly nonlinear graph structures, preserving the nonlinear structure and complex features of graphs. In 2014, Tian et al. [29] applied autoencoders to graph data for the first time: they took the adjacency matrix of the graph or its variants as the original node features and generated the nonlinear embedding of the graph, that is, the low-dimensional node representation, by stacking sparse autoencoders (SAE). Structural deep network embedding (SDNE) [30] is an important graph autoencoder model that also adopts the stacked autoencoder structure. It preserves the local and global network structure of the graph through the first-order and second-order similarity between nodes, respectively, and generates the graph embedding vectors through multilayer nonlinear functions. The hidden layer expression of SDNE is as follows:

$$h_i^{(l)} = \sigma\left(W^{(l)} h_i^{(l-1)} + b^{(l)}\right), \qquad h_i^{(0)} = x_i,$$

where $x_i$ is the feature of node $i$, $W^{(l)}$ is the weight matrix of the l-th layer, and $b^{(l)}$ is the bias of the l-th layer. After the final hidden layer output is obtained, the reconstructed representation $\hat{x}_i$ can be obtained by inverting the calculation process of the encoder. SDNE contains two loss functions. The first loss function adopts the idea of Laplacian eigenmaps to preserve the first-order similarity, and its expression is as follows:

$$L_{1st} = \sum_{i,j=1}^{n} s_{ij} \left\| y_i - y_j \right\|_2^2,$$

where $s_{ij}$ represents the connection relationship between nodes $i$ and $j$ in the graph, with $s_{ij} > 0$ if and only if node i is connected with node j, and $y_i$ is the embedding of node i. The second loss function is used to maintain the second-order similarity; a penalty vector is introduced to impose a larger penalty on the reconstruction error of nonzero elements than of zero elements. Its expression is as follows:

$$L_{2nd} = \sum_{i=1}^{n} \left\| \left(\hat{x}_i - x_i\right) \odot b_i \right\|_2^2,$$

where $\odot$ represents the Hadamard product, $b_{ij} = 1$ if $s_{ij} = 0$, and otherwise $b_{ij} = \beta > 1$. To maintain both first- and second-order similarity, the joint loss of SDNE is as follows:

$$L = L_{2nd} + \alpha L_{1st} + \nu L_{reg},$$

where $L_{reg}$ is the L2 regularization term, which is used to prevent overfitting.
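A short numpy sketch of the two similarity losses (purely illustrative; the actual SDNE model trains a deep autoencoder end to end, and the variable names here are our own):

```python
import numpy as np

def sdne_loss(S, Y, X_hat, X, beta=5.0, alpha=1.0):
    """Joint first- and second-order SDNE loss (without the L2 regularizer).

    S     : (N, N) adjacency / similarity matrix
    Y     : (N, d) node embeddings from the final encoder layer
    X_hat : (N, N) reconstructed node representations from the decoder
    X     : (N, N) original node representations (rows of the adjacency matrix)
    """
    # First-order proximity: connected nodes should have nearby embeddings.
    diff = Y[:, None, :] - Y[None, :, :]
    l_1st = (S * (diff ** 2).sum(axis=-1)).sum()
    # Second-order proximity: penalize errors on nonzero entries more heavily (beta > 1).
    B = np.where(S > 0, beta, 1.0)
    l_2nd = (((X_hat - X) * B) ** 2).sum()
    return l_2nd + alpha * l_1st
```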

Another type of graph autoencoder uses the variational autoencoder (VAE) [31] to implement graph embedding; the VAE is an important generative model and can improve the generalization ability of the model. VGAE [32] applies the variational autoencoder to the graph structure. Its inference model, namely the encoder, utilizes a 2-layer GCN structure, whose expression is as follows:

$$q(Z \mid X, A) = \prod_{i=1}^{N} q\left(z_i \mid X, A\right), \qquad q\left(z_i \mid X, A\right) = \mathcal{N}\left(z_i \mid \mu_i, \operatorname{diag}\left(\sigma_i^2\right)\right),$$

with $\mu = \mathrm{GCN}_{\mu}(X, A)$ and $\log \sigma = \mathrm{GCN}_{\sigma}(X, A)$, where μ is the mean matrix of the encoder, log σ is the (log) variance matrix, X is the feature matrix, A is the adjacency matrix, and $Z$ is the matrix of random latent variables. The generative model of VGAE, the decoder, is derived from the inner product of the latent variables, and its expression is as follows:

$$p(A \mid Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p\left(A_{ij} \mid z_i, z_j\right), \qquad p\left(A_{ij} = 1 \mid z_i, z_j\right) = \operatorname{sigmoid}\left(z_i^{\top} z_j\right).$$
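A compact sketch of this encoder/decoder pair (our own illustration; W0, W_mu, and W_sigma are assumed to be trained weight matrices):

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D~^-1/2 (A + I) D~^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def vgae_forward(A, X, W0, W_mu, W_sigma, rng=np.random.default_rng(0)):
    """VGAE: 2-layer GCN encoder with reparameterization, inner-product decoder."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0)               # shared first GCN layer (ReLU)
    mu = A_hat @ H @ W_mu                           # mean matrix
    log_sigma = A_hat @ H @ W_sigma                 # log standard deviation matrix
    Z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)  # sample latent variables
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))        # decoder: sigmoid(Z Z^T)
    return Z, A_rec
```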

2.4. Other Graph Neural Networks

In addition to graph convolutional networks and graph attention networks, commonly used graph neural networks include gated graph neural networks (GGNNs) and spatial-temporal graph neural networks (STGNNs). The gated graph neural network improves on the traditional graph neural network architecture: by introducing the gated recurrent unit (GRU) into the graph neural network, the performance of the model in long-range information propagation is improved. The gated graph sequence neural network proposed by Ruiz et al. [33] introduced the gated recurrent unit into the information propagation process and limited the iterative loop to a fixed number of steps, so parameter constraints are no longer needed to ensure convergence. Gated graph neural network models also include GAAN [34] and others. The spatiotemporal graph [30] is a graph structure that depicts the interaction between entities in the spatial and temporal dimensions. It has three basic elements: nodes, spatiotemporal edges, and temporal edges. The feature matrix in the high-dimensional feature space changes with time. The spatiotemporal graph neural network can learn the hidden patterns in the spatiotemporal graph and obtain the feature information of the time domain and the spatial domain of the graph structure at the same time. Spatiotemporal graph neural networks can be divided into methods based on recurrent neural networks (RNNs) and methods based on convolutional neural networks (CNNs). RNN-based STGNNs capture spatiotemporal correlations by combining recurrent units with graph convolution. Compared with RNN-based methods, CNN-based STGNNs process the spatiotemporal graph in a nonrecursive way, which allows parallel computation and avoids the problem of gradient explosion or vanishing gradients; CGCN is one such model.

2.4.1. Spectral Methods for Graph Convolutional Neural Networks

The lack of translation invariance on graphs makes it difficult to define convolutional neural networks in the node domain. The spectral method instead uses the convolution theorem to define graph convolution in the spectral domain. We first give some background on the convolution theorem.

(1) Graph signal processing. Convolution theorem: the Fourier transform of the convolution of two signals is equivalent to the product of their Fourier transforms [35]:

$$\mathcal{F}(f * g) = \mathcal{F}(f) \cdot \mathcal{F}(g),$$

where f and g are the two original signals, $\mathcal{F}(f)$ is the Fourier transform of f, · is the product operator, and * is the convolution operator. Applying the inverse Fourier transform to both sides, we obtain

$$f * g = \mathcal{F}^{-1}\left(\mathcal{F}(f) \cdot \mathcal{F}(g)\right),$$

where $\mathcal{F}^{-1}$ is the inverse Fourier transform. Using the convolution theorem, we can multiply the signals in the spectral space and then use the inverse Fourier transform to map the result back to the original space, thereby realizing graph convolution and avoiding the difficulty of defining convolution directly on graph data that do not satisfy translation invariance. The Fourier transform on a graph depends on the graph Laplacian matrix; in the following, we give its definition.

The Fourier transform on the graph is defined via the eigenvectors of the Laplacian matrix. Taking the eigenvectors as a set of bases of the spectral space, the Fourier transform of a signal x on the graph is

$$\hat{x} = U^{\top} x,$$

where x is the original representation of the signal in the node domain, $\hat{x}$ is the representation of the signal x transformed into the spectral domain, and $U^{\top}$ is the transpose of the eigenvector matrix, which performs the Fourier transform. The inverse Fourier transform of the signal is

$$x = U \hat{x}.$$

Using the Fourier transform and its inverse on the graph, we can implement the graph convolution operator based on the convolution theorem as follows:

$$x *_G y = U\left(\left(U^{\top} x\right) \odot \left(U^{\top} y\right)\right),$$

where $*_G$ is the graph convolution operator, x and y are signals in the node domain of the graph, and ⊙ is the Hadamard product, i.e., the elementwise multiplication of two vectors. If we replace the vector $U^{\top} y$ with a diagonal matrix $g_\theta$, the Hadamard product can be transformed into a matrix multiplication.

(2) Graph convolutional neural network based on the convolution theorem. The spectral convolutional neural network (Spectral CNN) [36] is the earliest method for constructing a convolutional neural network on graphs. It uses the convolution theorem to define the graph convolution operator at each layer, learns the convolution kernels by back-propagating gradients under the guidance of the loss function, and stacks multiple layers to form a neural network. The structure of the m-th layer of the spectral convolutional neural network is

$$x_{:,j}^{(m+1)} = h\left(U \sum_{i=1}^{p} \Theta_{i,j}^{(m)} U^{\top} x_{:,i}^{(m)}\right), \qquad j = 1, \ldots, q,$$

where p and q are the dimensions of the input and output features, respectively, $x_{:,i}^{(m)}$ is the i-th input feature of the nodes in the m-th layer of the graph, $\Theta_{i,j}^{(m)}$ is the convolution kernel in the spectral space, and h is the nonlinear activation function. In the spectral convolutional neural network, such a layer structure transforms the features from p dimensions to q dimensions, and graph convolution is realized, based on the convolution theorem, by learning the convolution kernels.
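The graph Fourier transform and the convolution-theorem-based graph convolution can be sketched directly in numpy (illustrative only; the dense eigendecomposition below is exactly the expensive step that later methods try to avoid):

```python
import numpy as np

def graph_fourier_basis(A):
    """Eigenvalues and eigenvectors of the normalized graph Laplacian (the Fourier basis)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    eigvals, U = np.linalg.eigh(L)
    return eigvals, U

def spectral_graph_conv(A, x, spectral_kernel):
    """Graph convolution via the convolution theorem: U (g_theta ⊙ (U^T x))."""
    _, U = graph_fourier_basis(A)
    x_hat = U.T @ x                        # graph Fourier transform of the signal
    return U @ (spectral_kernel * x_hat)   # filter in the spectral domain, transform back

# Example: a smoothing (low-pass) filter that attenuates high-frequency components.
# eigvals, _ = graph_fourier_basis(A)
# y = spectral_graph_conv(A, x, spectral_kernel=np.exp(-eigvals))
```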

The spectral convolutional neural network applies the convolution kernel to the input signal in the spectral space, uses the convolution theorem to realize graph convolution and complete the information aggregation between nodes, applies the nonlinear activation function to the aggregation result, and stacks multiple layers to form a neural network. The model does not guarantee locality; that is, the nodes whose information is aggregated are not necessarily adjacent nodes.

The original intention of modeling a GCN is to use the graph structure to describe the information aggregation of adjacent nodes, and the spectral convolutional neural network introduced previously does not satisfy locality. Recently, the graph wavelet neural network (GWNN) [37] proposed using the wavelet transform instead of the Fourier transform to realize the convolution theorem.

The wavelet neural network pointed out that, similar to the Fourier transform, the wavelet transform also defines a way to transform a signal from the node domain to the spectral domain. Here, we use $\psi_s$ to represent the basis of the wavelet transform, where the i-th wavelet $\psi_{si}$ represents the energy diffusion from the i-th node and describes the local structure of the i-th node. The definition of the wavelet basis depends on the eigenvectors of the Laplacian matrix, namely, $\psi_s = U G_s U^{\top}$, where $G_s = \operatorname{diag}\left(g_s(\lambda_1), \ldots, g_s(\lambda_n)\right)$, and the diagonal elements are obtained by applying the function $g_s$ to the eigenvalues. Different functions $g_s$ endow the wavelet basis with different properties. In the wavelet neural network, the authors use the heat kernel function $g_s(\lambda) = e^{-s\lambda}$.

Taking $\psi_s$ as the basis of the spectral space, the transformation matrix of the inverse wavelet transform on the graph is $\psi_s^{-1} = U G_{-s} U^{\top}$, where $G_{-s}$ is obtained by replacing the heat kernel $g_s(\lambda) = e^{-s\lambda}$ with $g_{-s}(\lambda) = e^{s\lambda}$.

Compared with the Fourier basis, the wavelet basis has several good properties: (1) the wavelet basis can be obtained by Chebyshev polynomial approximation, avoiding the high cost of the eigendecomposition of the Laplacian matrix; (2) the wavelet basis is local; and (3) the locality of the wavelet basis makes the wavelet transform matrix very sparse, which greatly reduces the computational complexity and makes the calculation process more efficient. The parameter s represents the range of heat diffusion, and the model can be flexibly adapted to different task scenarios by adjusting this hyperparameter.
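A minimal sketch of constructing the heat-kernel wavelet basis and its inverse from the Laplacian eigendecomposition (illustrative only; in practice GWNN approximates these matrices with Chebyshev polynomials precisely to avoid this eigendecomposition):

```python
import numpy as np

def heat_kernel_wavelet_basis(A, s=1.0):
    """Wavelet basis psi_s = U G_s U^T and its inverse, with heat kernel g_s(lam) = exp(-s * lam)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    eigvals, U = np.linalg.eigh(L)
    psi_s = U @ np.diag(np.exp(-s * eigvals)) @ U.T       # wavelet transform basis
    psi_s_inv = U @ np.diag(np.exp(s * eigvals)) @ U.T    # inverse wavelet transform
    return psi_s, psi_s_inv

# A smaller s concentrates the diffusion around each node, giving a more local, sparser basis.
```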

Using the wavelet transform on the graph to replace the Fourier transform, the m-th layer structure of the wavelet neural network is defined as follows:

$$x_{:,j}^{(m+1)} = h\left(\psi_s \sum_{i=1}^{p} \Theta_{i,j}^{(m)} \psi_s^{-1} x_{:,i}^{(m)}\right), \qquad j = 1, \ldots, q.$$

Compared with the spectral convolutional neural network, the wavelet neural network replaces the Fourier transform with the wavelet transform; that is, it replaces $U$ and $U^{\top}$ with $\psi_s$ and $\psi_s^{-1}$. Under such a set of wavelet bases, the graph convolutional neural network satisfies locality, and its computational complexity is greatly reduced thanks to the accelerated computation and sparsity of the wavelet basis. In addition to wavelet neural networks, some works are dedicated to realizing locality and accelerating the computation of graph convolutional neural networks; unlike the way wavelet neural networks replace the basis, these works achieve locality by parameterizing the convolution kernels, which also reduces parameter complexity and computational complexity.

GraphHeat [38] analyzed the previous spectral methods from the perspective of filters and pointed out that spectral convolutional neural networks correspond to nonparametric filters, while Chebyshev networks and first-order graph convolutional neural networks correspond to high-pass filters. However, this is inconsistent with the smoothness prior of the graph semisupervised learning task. Based on this observation, GraphHeat uses the heat kernel function to parameterize the convolution kernel and thus implements a low-pass filter.

2.4.2. Spatial Methods for Graph Convolutional Neural Networks

In contrast to the previous methods, which all start from the convolution theorem to define graph convolution in the spectral domain, the spatial method starts from the node domain and aggregates each central node with its adjacent nodes through an aggregation function defined on the neighborhood. The definition of a general framework draws attention to the fundamental problems of graph convolutional networks and provides a platform for comparative analysis of previously published work in the field. Two recent papers aim to define a general framework for graph convolutional networks, each with its own contribution. Hybrid convolutional networks (MoNet) [20] concentrate on the lack of translation invariance on graphs and define a mapping function that maps the local structure around each node into a common coordinate system, which is then applied to every node. Message passing neural networks (MPNNs) [18], on the other hand, start from the aggregation of information propagated between nodes and propose a framework by defining a general form of the aggregation function. The lack of translation invariance makes it difficult to define graph convolutional neural networks, and overcoming it is a necessary but not sufficient condition for doing so. When applied to a graph, the hybrid convolutional network defines an orthogonal coordinate system and represents the relationship between nodes as a low-dimensional vector in this new coordinate system. At the same time, the hybrid convolutional network defines a family of trainable weight functions. Each weight function acts on all the adjacent nodes centered on a node; its input is the relationship between nodes (a low-dimensional vector), and its output is a scalar value. With this family of weight functions, the hybrid convolutional network obtains a vector representation of the same size for each node as follows:

$$D_j(x) f = \sum_{y \in N(x)} w_j\left(u(x, y)\right) f(y), \qquad j = 1, \ldots, J,$$

where N(x) is the set of adjacent nodes of x, f(y) is the value of the signal f at node y, u(x, y) is the low-dimensional vector representing the relationship between x and y in the coordinate system u, $w_j$ is the j-th weight function, and J is the number of weight functions. This step gives each node a J-dimensional representation that integrates the local structure information of the node. The hybrid convolutional model then defines a shared convolution kernel on this J-dimensional representation.
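The weight-function aggregation above can be sketched as follows (our own illustration; Gaussian weight functions over a hypothetical 2-dimensional pseudo-coordinate u(x, y) stand in for the learned functions):

```python
import numpy as np

def monet_aggregate(adj, signal, pseudo_coords, mus, sigmas):
    """For each node, aggregate neighbor signal values with J Gaussian weight functions.

    adj           : (N, N) adjacency matrix
    signal        : (N,) signal value f(y) at each node
    pseudo_coords : (N, N, 2) pseudo-coordinates u(x, y) for each node pair
    mus, sigmas   : (J, 2) means and scales of the J Gaussian weight functions
    """
    n, J = adj.shape[0], mus.shape[0]
    out = np.zeros((n, J))
    for x in range(n):
        for y in np.nonzero(adj[x])[0]:
            u = pseudo_coords[x, y]
            w = np.exp(-0.5 * (((u - mus) / sigmas) ** 2).sum(axis=1))  # w_j(u(x, y))
            out[x] += w * signal[y]
    return out  # (N, J): a J-dimensional representation per node, one entry per weight function
```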

Different from the hybrid convolutional network, the message passing network points out that the core of graph convolution is to define the aggregation function between nodes. Based on the aggregation function, each node can be represented as the superposition of information from surrounding nodes and itself. Therefore, this model proposes a general framework for graph convolutional networks by defining a general aggregation function. Message passing proceeds in two steps: first, the aggregation function is applied to each node and its adjacent nodes to obtain the local structural expression of the node; then, the update function is applied to the node itself and this local structural expression to obtain the new expression:

$$m_x^{(t+1)} = \sum_{y \in N(x)} M_t\left(h_x^{(t)}, h_y^{(t)}, e_{xy}\right), \qquad h_x^{(t+1)} = U_t\left(h_x^{(t)}, m_x^{(t+1)}\right),$$

where $h_x^{(t)}$ is the hidden-layer representation of node x at step t, $e_{xy}$ is the edge feature of nodes x and y, $M_t$ is the aggregation function of step t, $m_x^{(t+1)}$ is the local structural expression obtained by node x after the aggregation function, and $U_t$ is the update function of step t. Using the aggregation function and update function above to define each layer of the neural network, each node can continuously update itself with information from itself and its neighboring nodes as the source, thereby obtaining a new expression that depends on the local structure of the node. Methods such as these no longer rely on Laplacian matrices but instead design neural networks for learning aggregation functions within the spatial framework. Aggregation functions learned through these methods can be tailored to specific tasks and graph structures, resulting in greater adaptability and flexibility. The different graph-based methods are summarized in Table 1.

3. Applications of GCN

Researchers have paid close attention to the graph convolutional neural network since it was first proposed, particularly in the fields of network analysis, recommender systems, biochemistry, traffic prediction, computer vision, and natural language processing, among others. Not only traditional machine learning fields such as computer science, artificial intelligence, and signal processing can benefit from graph convolutional neural networks, but also interdisciplinary research fields such as physics, biology, chemistry, and the social sciences. Different fields contain a variety of different graph data, in which the relationships between nodes and edges also differ. How to combine domain knowledge to model given graph data using GCN is therefore a key issue in the application of GCN.

3.1. GCN in Communication Network

Graph neural network methods have the ability to deal with complex communication network problems and have been applied to network function virtualization, wireless network resource allocation, network modeling, and performance analysis. Software-defined networking (SDN) and network functions virtualization (NFV) have been research hotspots in the field of communication networks in recent years. SDN separates the control plane and forwarding plane of the network so that the controller can obtain the topology and resource information of the entire network. NFV uses virtualization technology to separate network functions from traditional hardware devices, which improves the flexibility of network configuration. GNNs can be used to solve problems in SDN and NFV that require exploring graph structures, such as dynamic resource allocation, service function chain (SFC) establishment, and virtual network embedding (VNE). Rafiq et al. proposed a supervised learning method for SFC traffic prediction. The method uses a GNN to map the input historical traffic to the output predicted traffic and adjusts the resource allocation accordingly. The graph neural network of this model trains two functions: the node transfer function and the output function. The transfer function of node n takes as input the features of n, all adjacent edge features, and all adjacent node features and states, and outputs the state of node n; the output function computes the output of node n based on its state and features [39]. Li et al. [40] used a GNN to predict NFV resource requirements so as to obtain advance information about upcoming requests and improve the effectiveness of SFC reconstruction algorithms based on deep reinforcement learning. Network traffic migration is also an important branch of dynamic resource allocation. Sun et al. [41] proposed a method for NFV network traffic migration using GNN and deep reinforcement learning. The method maps the input network topology to the output network topology after migration and is used to realize network traffic expansion, reduction, and load balancing. The essence of the SFC dynamic resource allocation problem is a transformation of the topology structure; the optimization goal is the total end-to-end delay, and there are no complex constraints, so it is easy to solve with a GNN.

The virtual network embedding problem is similar to the SFC establishment problem, but the network requests and resource constraints are more complicated. VNE problems are divided into node mapping and link mapping, and existing GNN-based methods for VNE mainly focus on node mapping. Habibi et al. [42] proposed a method of using GAE to assist the classification of VNE physical nodes. The input of the model is the adjacency matrix and the resource feature matrix, and a supervised learning model that can reconstruct the network topology is trained through the graph neural network. Yan et al. [43] proposed using GCN combined with deep reinforcement learning to complete the node classification task. This method uses actor-critic reinforcement learning, in which GCN is used to extract physical node features; the features extracted from physical nodes and virtual network requests are fused through feed-forward neural networks (FF), and finally the probability of mapping each node is obtained. In fact, for SFC establishment and VNE problems in large-scale complex networks, considering the complexity of node and link resources and optimization objectives, graph neural networks are powerful tools for extracting topological information, with the potential to provide faster and more optimized solutions.

With the rapid development and application of technologies such as 5G, the Internet of Things, and edge computing, the problem of resource allocation in wireless networks has become increasingly important. Through the effective allocation of resources, various optimization goals can be achieved in different application scenarios, and the utilization of network resources can be improved. The wireless power control problem is to determine the transmit power of each transmitter so that the network achieves the overall optimal signal-to-noise ratio. Its basic model is a constrained optimization problem: the optimization target is the weighted sum of the signal-to-interference-plus-noise ratios, and the constraint is the transmit power of the base station or equipment. Shen et al. [44] proposed representing the multiuser wireless channel with a complete graph and used a GNN to solve the power control problem. Each node of the complete graph is a transceiver pair, and the node features include the direct channel state and weight; each link of the graph is an interference channel, and the link feature is the interference channel state. The method trains the transfer function and output function through GCN to output the optimal transmit power of each transmitter. Considering the roles of base stations and users in practical problems, Guo and Yang [45] proposed a method to solve the power control problem in heterogeneous networks. The nodes of this model include two kinds of heterogeneous nodes, base stations and users; heterogeneous nodes use different transfer functions and use parameter sharing to obtain the output results. The wireless power control problem is not an intuitive graph structure problem, so it is necessary to first transform the problem into a graph structure through modeling and then use the GNN model to solve it.

Nakashima et al. [46] used GCN based on deep reinforcement learning to extract the features of the channel vector with topological information, and then generate the channel deployment strategy. This method can perform channel allocation in densely deployed wireless local area networks, thereby improving system throughput. Yan et al. [43] proposed an energy-saving topology control algorithm based on GCN, which uses GCN to imitate the maximum spanning tree algorithm for link prediction, and introduces new edges into the topology according to the probability graph, which optimizes the wireless network in 5G and B5G environments.

Network modeling and performance analysis is a fundamental problem in realizing an efficient communication network. As mentioned previously, GNNs can be used for resource optimization of wired and wireless networks, where various resources in the network are allocated to devices through optimization strategies. Therefore, an efficient network model is urgently needed to evaluate the quality of resource allocation. Rusek et al. [47] proposed RouteNet, which uses GNN to accurately evaluate the end-to-end delay and packet loss of network paths. RouteNet takes the network topology, traffic matrix, and end-to-end paths as input and outputs performance evaluation indicators (delay, jitter, packet loss, etc.) according to the network state. RouteNet contains a multilayer message passing neural network, uses an RNN as the transfer function, compresses the link and path information into hidden state vectors, and finally obtains the evaluation index value of the path through the output function. RouteNet has been used for the following two example problems: (1) routing optimization based on network delay and packet loss and (2) network topology upgrades under budget constraints.

Routing is a long-standing, core optimization problem in the field of communication networks, and artificial intelligence algorithms have been used for network routing. Geyer et al. [48] proposed using GNN to learn a distributed routing algorithm. This method abstracts each router interface as a node in the topology and uses the GNN to train the hidden node information so that each node has a local representation of the graph topology; it is a rare distributed-oriented GNN application. Secure network communication relies heavily on encrypted network traffic, which helps protect sensitive data and maintain its integrity. However, encryption obscures the data's characteristics, makes it more difficult to identify malicious traffic, and shields such activity from detection. Consequently, encryption alone cannot guarantee fundamental information security, and it remains important to watch for suspicious activity by monitoring traffic. Traffic classification methods based on statistical features and on graphs are currently the most widely used. The limitations of these two approaches make them unreliable for detecting malicious traffic that encrypts its contents: the former does not consider the external connections between network flows at all, while the latter largely ignores the internal statistical information of each flow. A GCN model called GCN-ETA is proposed in [49], which considers both the statistical features (internal information) of network flows and the structural information (external connections) between them to identify malicious traffic. GCN-ETA has two parts: an improved GCN feature extractor and a decision tree classifier. The effectiveness and speed of detecting malicious encrypted traffic can be enhanced by modifying the traditional GCN, and this can serve as a model for the implementation of GCN in similar scenarios [50]. The design of poisoning-resistant graph neural networks is extremely difficult, and several attempts have been made in the past. Existing research attempts to reduce the negative impact of adversarial edges using only the poisoned graph, which is suboptimal because it fails to distinguish between adversarial and normal edges. Tang et al. developed PA-GNN, which relies on a penalized aggregation mechanism that directly restricts the negative impact of adversarial edges by assigning them lower attention coefficients [51]. Pan et al. proposed a traffic classification method using GCN and LSTM, which requires few labeled samples for model classification and achieves better accuracy using GCN [52].

3.2. GCN in Medical Imaging

Chest computed tomography (CT) scans of coronavirus disease 2019 (COVID-19) are typically derived from multiple datasets gathered from various medical centers, with images sampled using a variety of acquisition protocols. While combining datasets from multiple sites increases the sample size, it is hampered by intercenter heterogeneity, which makes comparisons difficult. To address this issue, Song et al. [53] proposed diagnosing COVID-19 using an augmented multicenter graph convolutional network (AM-GCN). AM-GCN extracts features from the initial CT scans using a 3D CNN, which is supplemented by a ghost module and a multitask framework to improve the network's performance. The extracted features are used to construct a multicenter graph that takes into account intercenter heterogeneity as well as the disease status of the training samples. In addition, AM-GCN employs an augmentation mechanism to increase the number of training samples, resulting in an augmented multicenter graph. This method achieved a mean accuracy of 97.76 percent on 2223 COVID-19 subjects and 2221 normal controls from seven medical centers.

Given the high cost of exhaustively annotating 3D data, a more sustainable approach is to develop diagnosis algorithms using only patient-level labels. Chen et al. [54] propose the Instance Importance-aware GCN (I2GCN) for multi-instance learning (MIL), motivated by the fact that 2D slices of 3D data exhibit explicit diagnostic efficacy. More precisely, the study first calculates the instance importance of each slice for diagnosis using a preliminary MIL classifier, which is then used to promote the refined diagnosis branch. The instance importance-aware graph convolutional layer (I2GCLayer) is devised in the refined diagnosis branch to exploit complementary features in both importance-based and feature-based topologies. Additionally, to address the deficient supervision of 3D datasets, an importance-based subgraph augmentation (SGA) technique was proposed to effectively regularize framework training.

Zhu et al. [55] developed an Interpretable Dynamic GCN (IDGCN) to enhance the performance of personalized Alzheimer's disease diagnosis and to generate interpretable results. This is accomplished by incorporating interpretable feature learning and dynamic graph learning into a GCN architecture. More precisely, interpretable feature learning ensures that diagnosis results are interpretable, and preclassification ensures that the selected features are classification-oriented. Additionally, by adjusting the similar and dissimilar correlations of all objects, dynamic graph learning dynamically updates the graph structure used by the GCN to produce superior diagnosis results. Thus, by optimizing feature learning, graph learning, and the GCN simultaneously, the proposed disease diagnosis method not only generates reliable personalized diagnoses but also provides interpretability for the diagnosis results. Similarly, Jiang et al. [56] proposed a hierarchical GCN framework (called hi-GCN) to learn the graph feature embedding while considering the network topology information and subject associations at the same time. Memory, thinking, behavior, and emotion are all affected by dementia, a term used to describe progressive brain syndromes. A dementia patient's ability to carry out everyday tasks may deteriorate, and they become increasingly dependent on their caregivers. As a result, spotting the early signs of cognitive decline and alerting caregivers and doctors would be beneficial. Arfoglu et al. [57] used GCN to recognize activities and flag abnormal behavior related to dementia.

Figure 6 shows a standardized approach implemented by different studies for the classification of medical images. For medication recommendation and lab test imputation, Mao et al. [58] developed MedGCN, a machine learning framework based on MedGraph that can be applied to a wide range of medical procedures. MedGCN builds a graph to associate four different types of medical entities, namely patients, encounters, lab tests, and medications, and then uses a graph neural network to learn node embeddings for medication recommendation and lab test imputation. Shi et al. [59] proposed a method called the cell-graph convolutional neural network (CGC-Net) that converts each large histology image into a graph in which each node is represented by a nucleus within the original image and cellular interactions are denoted as edges between these nodes based on node similarity. To improve performance, CGC-Net employs nuclear appearance features in addition to spatial node locations. Zhang et al. [60] proposed a BDR-CNN-GCN model using batch normalization with CNN and GCN to obtain accurate classification of breast disease. Yin et al. [61] created a novel multi-instance deep learning method for building a robust classifier by treating multiple 2D ultrasound images of each individual subject as multiple instances of the same bag. Convolutional neural networks (CNNs) are used in this method to learn instance-level features from 2D US kidney images, and GCNs are used to further optimize the instance-level features by exploring potential correlations among instances of the same bag. This study also uses fully-connected neural networks (FCNs) to learn bag-level features through gated attention-based MIL pooling. Table 2 gives a detailed comparison of different approaches using graphs.

3.3. GCN for Recommendation and Prediction

In today's web platforms and applications, recommender systems are widely deployed as important tools to alleviate information overload and improve user experience. Taking more user preferences into account when making recommendations is currently a hot topic. Although real-world information systems often choose "click" or "purchase" as the optimization target, there are also various other types of user behaviors, such as view and add-to-cart: users have the option of viewing an item, adding it to a cart, and ultimately purchasing it. In order to create a more precise recommender system, data on a user's diverse behaviors are crucial. Before this line of work, researchers would typically use a default value (i.e., "other") to represent a missing attribute, which resulted in suboptimal performance. To address this issue, Liu et al. propose an attribute-aware attentive graph convolution network (A2-GCN) that is both fast and accurate [20]. A2-GCN begins by constructing a graph in which users, items, and attributes are represented as nodes. A2-GCN then makes use of the graph convolution network to characterize the complex interactions among them. The model also employs a message-passing strategy to learn the representation of a node (e.g., a user or an attribute) by aggregating the messages passed from directly linked nodes of other types. Guo et al. used a similar approach and developed a domain-aware GCN (DA-GCN) model, which links users and items in each domain as a graph [67]. Shehnepoor et al. used GCN for fraudster detection in user rating profiles and proposed the HIR-RNN algorithm [68]. This algorithm performs two tasks, i.e., prediction of user ratings and fraudster detection based on user behavior.

Knowledge graphs (KGs), when combined with a recommendation system, are helpful for providing explainable recommendations. Ma et al. proposed knowledge-aware reasoning with a graph convolution network (KR-GCN), which integrates user-item interactions and knowledge graphs into a heterogeneous graph on which the GCN is applied [69]. Social data are also valuable for product recommendation, and Yu et al. proposed an enhanced social recommendation system based on GCN, which addresses the problems of limited neighbors, noisy social relationships, and heterogeneous neighbors [70]. This model uses an autoencoder to augment the data by encoding high-order and complex connectivity patterns [71]. To extract indirect relationships between users and items, a Hamming similarity model named Hamming spatial graph convolutional networks (HS-GCN) was proposed by Liu et al. Xiao et al. proposed a GCN model for recommendation using a deep graph neural network named the DeepFM graph convolutional network (DFM-GCN) [72]. DFM-GCN mainly focuses on solving the cold start and data sparseness problems, which it does by capturing the interaction information between nodes and representing items as vector nodes in the GCN. Users can generate many types of interaction data, but traditional studies on recommender systems tend to focus on just one type of user behavior as the optimization target (for example, purchasing) while ignoring others (e.g., view, click, and add-to-cart). Well-structured information can be derived from such heterogeneous multirelational data and used to make excellent recommendations. However, early attempts to leverage these heterogeneous data fail to capture the high-hop structure of user-item interactions, are insufficient to make full use of them, and may deliver only limited recommendation performance. Graph heterogeneous collaborative filtering (GHCF) explores high-hop heterogeneous user-item interactions; this work takes advantage of the graph convolutional network (GCN) and further improves it to jointly embed the representations of both nodes (users and items) and relations for multirelational prediction. The data sparsity issue is further addressed by Tang et al., who developed the multigraph collaborative filtering (DMGCF) model to mine and reuse side information. This method generates multiple graphs with a dynamic evolution mechanism to simulate side information for better performance, especially when side information is unavailable [73].

Monti et al. [74] combined a multigraph convolutional neural network with a recurrent neural network, in which the multigraph convolutional neural network was used to extract locally stationary features and the recurrent neural network diffused the score values and reconstructed the rating matrix. Zhang et al. [75] modeled the recommender system as a link prediction problem on graphs and proposed a graph autoencoder framework based on differentiable message propagation to model the bipartite graph of the recommender system, achieving good results on datasets that include social networks. Yang et al. [76] applied graph convolutional neural networks to recommender systems and proposed a data-efficient graph convolutional neural network algorithm, MultiSage, to generate embedded representations for commodity nodes; these representations contain both graph structure and node feature information. Compared with traditional graph convolution methods, it proposes an efficient random walk strategy to model convolution, designs a new training strategy, and successfully applies the graph convolutional neural network to a superlarge-scale recommendation system with one billion nodes. Wang et al. [77] proposed the RippleNet framework, which introduces knowledge graph information to improve the performance of the recommender system. Liao et al. [78] proposed the SocialLGN framework, which includes three parts: user modeling, commodity modeling, and score prediction; using the attention mechanism, the user's interaction information and the user's social network information are effectively modeled.

Graph convolutional neural networks are well suited to modeling both the structural properties and the node feature information of a graph, and the recommendation task can be regarded either as a matrix completion problem or as a link prediction problem on the bipartite graph of users and items. Compared with traditional methods, graph convolutional neural networks can better exploit the user and item attribute information that is ubiquitous in recommender systems, which is why they have attracted widespread attention in recommender system tasks. Table 3 highlights the latest progress in recommendation and prediction using graph-based methods.
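To make the bipartite-graph view concrete, the following minimal sketch (NumPy, with a made-up 3-user x 4-item interaction matrix) propagates user and item embeddings over the normalized interaction graph and scores pairs by inner products. It illustrates the general LightGCN-style propagation idea rather than any specific model surveyed above.

```python
import numpy as np

# Hypothetical interaction matrix R: 3 users x 4 items (1 = interaction observed).
R = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

n_users, n_items = R.shape
# Build the (users+items) x (users+items) bipartite adjacency matrix.
A = np.zeros((n_users + n_items, n_users + n_items))
A[:n_users, n_users:] = R
A[n_users:, :n_users] = R.T

# Symmetric normalization D^{-1/2} A D^{-1/2} (small epsilon guards isolated nodes).
deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Random initial embeddings for users and items.
rng = np.random.default_rng(0)
E = rng.normal(size=(n_users + n_items, 8))

# Light propagation: average the embeddings obtained after each hop.
layers = [E]
for _ in range(2):
    layers.append(A_hat @ layers[-1])
E_final = np.mean(layers, axis=0)

# Score every user-item pair by the inner product of propagated embeddings.
scores = E_final[:n_users] @ E_final[n_users:].T
print(scores.round(3))
```

In a trained model the initial embeddings would be learned from the observed interactions; here they are random, so only the propagation mechanics are demonstrated.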

3.4. GCN for Hyperspectral Data

Hyperspectral techniques have improved greatly with the rapid growth of optics and spectroscopy. A considerable quantity of important information can be captured by hyperspectral images, which contain many contiguous spectral bands. Over the past few decades, they have been used in a variety of disciplines, including military target identification, vegetation monitoring, and disaster prevention and control [4]. Various algorithms have been proposed for categorizing the pixels of a hyperspectral image into specific land-cover categories. Early approaches rely heavily on traditional pattern recognition methods such as K-nearest neighbor and linear classifiers. K-nearest neighbor has been frequently employed among these traditional methods because of its simplicity in both theory and practice, and the support vector machine (SVM) works stably and satisfactorily on high-dimensional hyperspectral data. These approaches, however, all rely on handcrafted spectral-spatial features that depend largely on professional skill and are highly empirical. Deep learning has been adopted to address this flaw. Mou et al. were the first to use a recurrent neural network (RNN) to classify hyperspectral images [81]. The convolutional neural network (CNN) has recently emerged as a potent method for hyperspectral image classification, and Lou et al. developed a high-performance novel HSI classification algorithm based on CNN. Figure 7 shows the basic implementation approach used in different studies.

It has been shown that convolutional neural networks (CNNs) are excellent at representing and classifying hyperspectral images. However, convolution in traditional CNN models can only be performed on regular square image regions with fixed sizes and weights, so these models cannot adapt to distinct local regions with varying object distributions and geometric appearances, and they need improvement in classification, especially at class boundaries [82]. To address this shortcoming, Luo et al. [83] propose using the graph convolutional network (GCN) for hyperspectral image classification, as it can perform convolution on arbitrarily structured non-Euclidean data and is applicable to irregular image regions represented by spatial graph information. Mou et al. [84] proposed a graph-based semisupervised network called nonlocal GCN. Unlike existing CNNs and RNNs, which take pixels or patches of a hyperspectral image as input, this network takes the entire image (including both labeled and unlabeled data) into account. More specifically, a nonlocal graph is first computed, a pair of graph convolutional layers is then used to extract features from this graph representation, and finally the network is trained in a semisupervised manner using a cross-entropy error over all labeled instances. Ding et al. [32] adopt GraphSAGE for feature extraction in local regions of the graph, which helps obtain more accurate and effective node information; using the MSAGE-CAL attention method with GraphSAGE improves the classification accuracy of HSI.
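As an illustration of this semisupervised pipeline, the sketch below runs a two-layer GCN forward pass over a small, hypothetical region graph and computes the cross-entropy only over labeled nodes. The adjacency, features, labels, and weights are all made up, so this is a generic illustration rather than a reimplementation of nonlocal GCN [84].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical graph over 6 image regions, 5 spectral features per region,
# 3 land-cover classes; only regions 0 and 3 are labeled.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(6, 5))
labels = np.array([0, -1, -1, 2, -1, -1])   # -1 = unlabeled
mask = labels >= 0

# Renormalized adjacency: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(6)
d = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_hat = A_hat * d[:, None] * d[None, :]

# Two-layer GCN forward pass with random (untrained) weights.
W1, W2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 3))
H = np.maximum(A_hat @ X @ W1, 0)           # layer 1 + ReLU
logits = A_hat @ H @ W2                     # layer 2
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Semisupervised objective: cross-entropy over the labeled regions only.
loss = -np.log(probs[mask, labels[mask]]).mean()
print("masked cross-entropy:", round(loss, 4))
```

Training would minimize this masked loss with respect to W1 and W2; unlabeled regions still contribute through the graph structure during propagation.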

Guo et al. [79] found that GCN models are shallow and that their feature extraction is not effective. To solve this issue, DGU-HSI is proposed. DGU-HSI constructs two separate graphs for spatial and spectral data and extracts features from both simultaneously; once the features are extracted, graph U-Nets are used to fuse them for classification. Yang et al. [85] used a similar approach with spectral (Se-GCN) and spatial (Sa-GCN) branches to develop an adaptive cross-attention-driven spatial-spectral graph convolutional network (ACSS-GCN), which is further improved by applying an attention mechanism in both the spectral and spatial blocks. Qu et al. [86] were the first to use GCN for change detection in HSI data and proposed the dual-branch difference amplification GCN (D2AGCN), which is highly efficient with small sample sizes. The dual-branch structure can effectively extract sufficient difference features to facilitate the detection of changed areas.

The methods described previously use GCN to investigate large-range spatial relations in HSI, whereas local spatial information is more important when training samples are limited. S2RGANet (spectral-spatial residual graph attention network), a novel method for HSI classification, has been developed to address these shortcomings. The spectral residual modules in S2RGANet are designed to extract spectrally discriminative features, while graph attention convolutions are introduced to explore the local geometric structure. In contrast to existing GNNs, which are designed to learn large-range spatial relations between samples in HSI, the proposed graph convolutions capture the distribution pattern of land cover in a given local patch [86]. Sha et al. [87] used GAT for HSI classification by assigning different weights to different nodes according to their attention coefficients during the convolution process. Table 4 highlights the latest progress in hyperspectral data classification using graph-based methods.

3.5. GCN for Computer Vision

With the development of science and technology, image processing has entered almost every field. Its scope is broad: image information can be optimized, and tasks such as image recognition, detection, data encoding, enhancement, and restoration can be completed. It not only conveys the information people need but also penetrates many areas of production, including transportation, agriculture, communication technology, and aviation. AI plays a vital role in image processing and efficiently supports tasks such as image segmentation, change detection, denoising, image enhancement, and 3D imaging. After traditional approaches such as SVM and KNN, CNN provides a wide range of applications in image processing, but research is now shifting toward GCN because of its better results and lower computational complexity. GCN research in image processing spans remote sensing images, medical images, 3D images, and other modalities, together with various denoising and image enhancement techniques.

Saha et al. [90] used GCN to develop a change detection (CD) mechanism for remote sensing images. This semisupervised CD method encodes multitemporal images as a graph via multiscale parcel segmentation, which effectively captures the spatial and spectral aspects of the multitemporal images. Ismail et al. [91] proposed the BLDNet algorithm for estimating building damage caused by disasters or earthquakes; the model combines a Siamese CNN with a GCN trained in a semisupervised manner to predict earthquake-induced damage. In recent years, deep learning-based image denoising methods have outperformed traditional denoising techniques. Most deep learning-based denoising methods train a convolutional neural network to infer clean images from cropped small patches. In practice, however, real-world noisy images tend to be of high resolution rather than small patches, and these vanilla training strategies ignore the cross-patch contextual dependency within the whole image. Li et al. [92] used a cross-patch GCN together with a CNN to perform image denoising, and the results show that denoising is 95% accurate. Shen et al. [93] extended the denoising work using GCNs and proposed GCN-Denoiser, a novel feature-preserving mesh denoising approach that performs graph convolution operations in the dual space of mesh triangles.

Semantic segmentation of remote sensing (RS) images, as a fundamental task of GIS, serves as the foundation for other RS research and applications such as natural resource protection, land cover mapping, and land-use change detection. Despite receiving significant attention over the last decade, semantic segmentation of high-resolution RS images remains difficult because the structural complexity of RS images leads to interclass similarity and intraclass variability. Ouyang et al. [94] proposed the DSSN-GCN framework, which combines a deep semantic segmentation network with GCN: an attention residual U-shaped network (AttResUNet) is used as the feature extractor, graph nodes are defined by superpixels, and the graph edge weights are calculated from both spectral and spatial information. Kim et al. [95] proposed the Split-GCN model, which outlines objects by grouping nodes with similar features and highlighting them in a specified region. This model consists of two parts: an encoder (feature extraction network) that extracts the boundary information of an object and a decoder (a novel graph composition network) that captures the object's shape. The model uses a polygon-based approach to detect object boundaries with uniformly spaced points.
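One common way to combine spectral and spatial information into a superpixel edge weight is a product of Gaussian kernels, as in the hypothetical sketch below; the exact weighting used in DSSN-GCN [94] may differ, and the superpixel features, centroids, and bandwidths are made up.

```python
import numpy as np

def superpixel_edge_weight(feat_i, feat_j, cent_i, cent_j,
                           sigma_spec=1.0, sigma_spat=10.0):
    """Combine spectral similarity and spatial proximity into one edge weight."""
    spectral = np.exp(-np.sum((feat_i - feat_j) ** 2) / (2 * sigma_spec ** 2))
    spatial = np.exp(-np.sum((cent_i - cent_j) ** 2) / (2 * sigma_spat ** 2))
    return spectral * spatial

# Hypothetical superpixels: mean spectral vectors and (row, col) centroids.
feats = np.array([[0.20, 0.50, 0.10],
                  [0.25, 0.48, 0.12],
                  [0.90, 0.10, 0.70]])
cents = np.array([[10.0, 12.0],
                  [14.0, 15.0],
                  [80.0, 90.0]])

n = len(feats)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            W[i, j] = superpixel_edge_weight(feats[i], feats[j], cents[i], cents[j])
print(W.round(4))   # nearby, spectrally similar superpixels get large weights
```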

Computer vision is a long-running research topic because it can perceive and recognize the world without human aid by gathering data from sensors. Reverse engineering, intelligent surveillance, and remote sensing all rely on target recognition as a critical component. In practical application scenarios such as unmanned systems and augmented reality, three-dimensional (3D) object recognition is more relevant than two-dimensional (2D) target recognition. Zhan et al. [96] proposed a 3D point cloud model named minimum bounding box oversegmentation GCN (MBBOS-GCN). This model uses a minimum bounding box algorithm, and the farthest point sampling (FPS) algorithm samples within each small region to reduce sampling randomness; the model achieves more than 90% accuracy for segmentation of 3D objects. Wang et al. [97] used GCN models for activity recognition in 3D space and proposed a spatial-temporal graph convolutional network (ST-GCN). In this method, semantically close points are treated as neighbors, and a graph is constructed from motion capture data as follows: intrabody edges between skeleton joints are defined based on the natural connections of the human body, and interframe edges connect the same skeleton joints between consecutive frames. For 3D shape analysis, Wei et al. [98] proposed View-GCN, which recognizes 3D shapes from a graph representation of multiple views in flexible view configurations, e.g., cameras located on circles, at the corners of a dodecahedron, or even at irregular positions around objects. By using this view-graph representation, the study can take advantage of GCN to aggregate multiview features while considering the relations between graph nodes. Table 5 highlights some of the latest innovations for image processing using graphs.
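The skeleton-graph construction described for ST-GCN can be sketched directly: intrabody edges within each frame plus interframe edges linking the same joint across consecutive frames are assembled into one spatiotemporal adjacency matrix. The joint set and edges below are a toy example, not the actual skeleton definition used in [97].

```python
import numpy as np

# Toy skeleton with 4 joints and their natural (intrabody) connections.
n_joints, n_frames = 4, 3
intrabody_edges = [(0, 1), (1, 2), (1, 3)]  # e.g., head-torso, torso-left arm, torso-right arm

N = n_joints * n_frames                      # one node per (joint, frame) pair
A = np.zeros((N, N))

for t in range(n_frames):
    base = t * n_joints
    # Intrabody edges inside frame t.
    for i, j in intrabody_edges:
        A[base + i, base + j] = A[base + j, base + i] = 1
    # Interframe edges connect the same joint in consecutive frames.
    if t + 1 < n_frames:
        for j in range(n_joints):
            A[base + j, base + n_joints + j] = A[base + n_joints + j, base + j] = 1

print(A.astype(int))   # spatiotemporal adjacency over all (joint, frame) nodes
```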

3.6. GCN for Transport and Traffic System

The traffic prediction problem is another task in which graph convolutional neural networks are widely used. Its purpose is to predict future traffic speed given historical traffic speeds and the road map. In this problem, nodes represent sensors placed on the road, edges represent the physical distances between node pairs, and each node carries a temporal feature. Compared with traditional graph analysis problems, traffic prediction involves both temporal and spatial modeling, and using graph convolutional neural networks to better model the road network brings both opportunities and challenges. Li et al. [101] proposed the diffusion convolutional recurrent neural network (DCRNN) for traffic forecasting. It regards traffic flow as a diffusion process on a directed graph, uses diffusion convolution to model the graph-structured data, and uses recurrent neural networks to model the temporal dependencies, achieving a 12%-15% improvement on two large-scale road network traffic datasets.
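A common way to turn pairwise road distances between sensors into a weighted graph, used in DCRNN-style pipelines, is a thresholded Gaussian kernel. The sketch below uses hypothetical distances and an arbitrary 0.1 sparsification threshold.

```python
import numpy as np

# Hypothetical pairwise road-network distances (in meters) between 4 sensors.
dist = np.array([[   0,  400, 1200, 3000],
                 [ 400,    0,  900, 2500],
                 [1200,  900,    0, 1800],
                 [3000, 2500, 1800,    0]], dtype=float)

sigma = dist[dist > 0].std()            # kernel bandwidth from the distances themselves
W = np.exp(-(dist ** 2) / (sigma ** 2)) # Gaussian kernel weights
W[W < 0.1] = 0.0                        # sparsify: drop weak connections
np.fill_diagonal(W, 0.0)                # no self-loops in the raw weight matrix
print(W.round(3))
```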

Cui et al. [102] proposed a traffic graph convolutional long short-term memory network (TGC-LSTM) to learn road networks and time-varying traffic patterns, defining the graph convolution on the physical road network topology. The experimental results show that the method can effectively capture the complex spatiotemporal dependencies in vehicle traffic networks. Zhang et al. [1] proposed a graph gated recurrent unit (GGRU) to solve the traffic flow prediction problem, applying it within the encoder-decoder architecture of a recurrent neural network and evaluating it on a Los Angeles highway dataset. Zhang et al. [38] proposed a new deep learning framework, the spatio-temporal graph convolutional network (STGCN), for time series prediction in the traffic domain. This framework first formalizes the problem on a graph and models it with convolutional structures, achieving significant improvements over traditional machine learning methods in short-term and mid- to long-term traffic prediction thanks to better utilization of the topology.

Zhu et al. [103] developed AST-GCN for traffic forecasting in intelligent transportation systems. This model treats external factors as dynamic and static attributes and designs an attribute-augmented unit to encode and integrate those factors into the spatiotemporal graph convolution model for traffic speed prediction. In another approach, Zhu et al. [104] proposed a BRB-based RNN-GCN model for traffic flow prediction, which addresses existing issues of traffic flow prediction models such as saturation and speed. In traffic prediction scenarios, handling spatiotemporal dependence is an important research direction. Since the graph convolutional neural network provides a solution for modeling graph data, combining it with time series models such as recurrent neural networks yields a good solution to the traffic forecasting problem [70]. However, how to model spatiotemporal data at a finer granularity remains a focus of future research.
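The generic recipe behind these models, graph convolution for the spatial dimension and a recurrent unit for the temporal dimension, can be sketched in a few lines of PyTorch. The cell below is a deliberately simplified stand-in (random stand-in adjacency, hypothetical sensor readings), not a faithful reimplementation of any surveyed model.

```python
import torch
import torch.nn as nn

class GCNGRUCell(nn.Module):
    """One step: smooth node features over the graph, then update each node's
    hidden state with a shared GRUCell."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, x, h, a_hat):
        # x: (num_nodes, in_dim), h: (num_nodes, hidden_dim),
        # a_hat: normalized adjacency (num_nodes, num_nodes)
        spatial = torch.relu(self.lin(a_hat @ x))   # graph convolution step
        return self.gru(spatial, h)                 # temporal update per node

# Toy usage: 5 sensors, 1 input feature (speed), 8 hidden units, 12 time steps.
num_nodes, hidden = 5, 8
a_hat = torch.softmax(torch.rand(num_nodes, num_nodes), dim=1)  # stand-in for D^-1/2 A D^-1/2
cell = GCNGRUCell(1, hidden)
h = torch.zeros(num_nodes, hidden)
for t in range(12):
    x_t = torch.rand(num_nodes, 1)   # hypothetical observed speeds at time t
    h = cell(x_t, h, a_hat)
print(h.shape)  # torch.Size([5, 8]); a readout layer would map h to predicted speeds
```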

3.7. GCN for NLP

Graph convolutional neural networks have many applications in the field of natural language processing. The most common graph data in this field are knowledge graphs, syntactic dependency graphs, abstract meaning representation graphs, word co-occurrence graphs, and graphs constructed by other methods. Entity relation extraction (RE) encodes the meaning of a sentence as a rooted directed graph [105]. Sun et al. [106] applied graph convolutional neural networks to dependency syntax trees for machine translation tasks between English and German, and Zhou et al. [107] used a graph convolutional neural network for event extraction, where the graph is again a dependency syntax tree. Table 6 provides a further description of these methods.

In addition to the graphs above, word co-occurrence networks have also been applied to text classification tasks, where nodes are non-stop words and edges are word co-occurrence relationships within a given window. Defferrard et al. [112] proposed a convolutional neural network defined with tools from spectral graph theory, providing the necessary mathematical background and an efficient numerical scheme for designing fast localized convolutional filters on graphs. Reference [113] used a weighting approach with GCN for text categorization. Pal et al. [114] used graph convolutional neural networks for text classification on the Reuters dataset. Yao et al. [115] applied a graph convolutional neural network to text classification by constructing a word co-occurrence network and a document relation network, achieving the best results without using external knowledge or word representations.
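The co-word construction behind these methods can be sketched directly: slide a window over each document and connect words that co-occur inside it. The toy corpus and window size below are made up; TextGCN-style models additionally weight edges (e.g., by PMI), which is omitted here.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

docs = ["graph convolution networks model graphs",
        "convolution networks classify text",
        "text classification with graph models"]

window = 3
cooc = defaultdict(int)
vocab = {}
for doc in docs:
    words = doc.split()
    for w in words:
        vocab.setdefault(w, len(vocab))        # assign an index on first sight
    for start in range(len(words)):            # sliding window over the document
        for wi, wj in combinations(words[start:start + window], 2):
            if wi != wj:
                cooc[tuple(sorted((wi, wj)))] += 1

# Adjacency matrix over the vocabulary: edge weight = co-occurrence count.
n = len(vocab)
A = np.zeros((n, n))
for (wi, wj), c in cooc.items():
    i, j = vocab[wi], vocab[wj]
    A[i, j] = A[j, i] = c

print(list(vocab))
print(A.astype(int))
```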

Many studies have shown that the results of various natural language processing tasks improve to a certain extent after adopting graph convolutional neural network models [116]. The use of graph structure enables complex semantic relationships between objects to be mined effectively. Compared with traditional serialized modeling of natural language, graph convolutional neural networks can mine nonlinear, complex semantic relationships.

3.8. GCN for Bioinformatics

In addition to traditional graph data, graph convolutional neural networks have also received much attention from researchers in fields such as biochemistry. Compared with traditional graph data research, in biochemistry a chemical structure or a protein is usually regarded as a graph in which the nodes are atoms or smaller molecules and the edges represent bonds or interactions. Figure 8 shows the molecular graph of a drug, where the nodes are carbon, hydrogen, and oxygen atoms and the edges are chemical bonds. Researchers focus on the chemical function of such a graph; that is, the object of study is no longer the individual nodes but the entire graph itself.
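Graph-level prediction differs from node-level prediction mainly in the readout step: after the convolution layers, node embeddings are pooled into a single vector for the whole molecule. The sketch below does this for a toy molecular graph with made-up atom features and random weights.

```python
import numpy as np

# Toy molecular graph: 4 atoms, one-hot atom-type features (C, O, H), bonds as edges.
X = np.array([[1, 0, 0],   # C
              [1, 0, 0],   # C
              [0, 1, 0],   # O
              [0, 0, 1]])  # H
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# One graph convolution with self-loops and symmetric normalization.
A_hat = A + np.eye(4)
d = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_hat = A_hat * d[:, None] * d[None, :]

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 8))
H = np.maximum(A_hat @ X @ W, 0)      # node embeddings after one GCN layer

# Graph-level readout: pool node embeddings into a single vector for the molecule.
graph_embedding = H.mean(axis=0)
print(graph_embedding.shape)          # (8,) - fed to a classifier/regressor in practice
```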

Most methods for determining gene-gene interactions from expression data focus on intracellular interactions. High-throughput spatial expression data enable methods that can infer such interactions both between and within cells. Yuan et al. [117] developed the graph convolutional neural network for genes (GCNG) to accomplish this. It uses supervised training to combine spatial information with expression data. GCNG improves on prior methods for analyzing spatial transcriptomics data and can propose new extracellular interacting gene pairs; its output can also be used in downstream analyses such as functional gene assignment. One of the primary goals of genomic medicine is to identify the genetic variations in a patient that are responsible for their clinical phenotypes and to determine their relationship to those phenotypes. Prioritizing genomic variants using genotype information alone typically yields a few hundred candidate variants, and narrowing these down further to identify the disease genes responsible for the observed clinical phenotypes remains a significant challenge, especially for rare diseases. Motivated by recent progress in spectral graph convolutions, Rao et al. [118] developed the graph convolution-based technique HANRD (Heterogeneous Association Network for Rare Diseases) to infer new phenotype-gene associations from an initial set of associations.

Chemical compound prediction is one of the fundamental tasks in bioinformatics and cheminformatics because it contributes to various applications in metabolic engineering and drug discovery. Harada et al. [119] proposed a new graph convolutional neural network architecture called the dual graph convolutional network, which learns compound representations from both the compound graphs and the intercompound network in an end-to-end manner. For DNA-protein binding prediction, Zhang et al. [120] build a sequence k-mer graph for the whole dataset based on k-mer co-occurrence and k-mer sequence relationships and then learn a DNA graph convolutional network (DNA-GCN) over it. Whether advanced graph network methods can be used to identify functional protein complexes from protein-protein interaction networks (PPIs) has not yet been thoroughly investigated. To improve the detection of protein complexes, Zaki et al. [121] propose a variety of graph convolutional network (GCN) methods. The protein complex detection problem is first formulated as a node classification problem; once the model is developed and a complex affiliation matrix is in place, the model uses it to group the nodes (proteins). In addition, a multiclass GCN feature extractor and a mean shift clustering algorithm are used to extract node features and perform clustering. Further applications are summarized in Table 7.

Appropriate gene prioritization is critical for genome-based diagnostics of a variety of genetic diseases. However, it is a difficult task that relies on limited and noisy knowledge of genes, diseases, and their associations. Although several computational methods have been developed for disease gene prioritization, their performance is largely constrained by manually crafted features, network topology, or predefined data fusion rules [127].

Li et al. [50] define convolutional neural networks directly on graphs. Their neural network model takes molecules of any size or shape as input and learns molecular fingerprints end-to-end, which can better support the design of molecules with specific functions. Torng et al. [128] used a graph convolutional neural network to encode atoms, bonds, and distances, making better use of the information in the graph structure and providing a new paradigm for ligand-based virtual screening. Gilmer et al. [129] proposed the message passing model MPNN to predict the chemical properties of a given molecule. Zitnik et al. [130] used graph convolutional neural networks to model multiple drug side effects: they first construct a multimodal graph of protein-protein interactions, drug-protein target interactions, and drug-drug interactions, in which each side effect is treated as a different type of edge, and then cast side effect modeling as a link prediction problem, providing a new research direction for pharmacology. Xiao et al. [131] and Reau et al. [132] proposed applying graph convolutional neural networks to protein interaction prediction; in this task, proteins are chains of amino acid residues that fold into three-dimensional structures giving them biochemical functions, and proteins exert these functions through complex networks of interactions with other proteins. You et al. [133] proposed the graph convolutional policy network (GCPN), a model based on general graph convolution and reinforcement learning for generating target graphs. In this model, the hidden state of each node is computed by message propagation, and a policy pi is then generated.
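A single round of message passing in the MPNN spirit of [129] can be sketched as follows: messages are computed along edges, summed at each receiving node, and combined with the node's current state. The message and update functions here are simple hypothetical stand-ins for the learned functions of a real MPNN.

```python
import numpy as np

def message_passing_round(h, edges, W_msg, W_upd):
    """m_v = sum over incoming edges of h_u @ W_msg; h_v <- relu([h_v, m_v] @ W_upd)."""
    m = np.zeros_like(h)
    for u, v in edges:               # directed edges u -> v
        m[v] += h[u] @ W_msg
    return np.maximum(np.concatenate([h, m], axis=1) @ W_upd, 0)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 6))                               # 4 atoms, 6-dim states
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 3), (3, 0)]  # bonds in both directions
W_msg = rng.normal(size=(6, 6))
W_upd = rng.normal(size=(12, 6))

for _ in range(3):                   # three message-passing rounds
    h = message_passing_round(h, edges, W_msg, W_upd)
print(h.shape)                       # (4, 6); a readout over h would give a molecule-level prediction
```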

4. Prospects for Future Research Directions and Limitations of GCN

Although graph convolutional networks have succeeded in recent years, there are still some unsolved problems and directions worthy of further research.

4.1. Deep Network Structure

After stacking many network layers, traditional deep learning models have achieved remarkable results on many problems owing to their powerful representation ability [38]. In graph convolutional neural network models, however, the best results are obtained after stacking only a small number of layers, and adding more graph convolutional layers makes the results worse. Because graph convolution aggregates the features of neighboring nodes, when the network stacks many layers the node representations become overly smooth and lose their discriminative power. The experimental results of GCN show that when the number of layers exceeds two, the performance on the semisupervised node classification problem decreases as more layers are added [134]. At the same time, as layers keep being stacked, all nodes eventually learn the same representation. Whether graph neural networks need deep structures, and whether deep architectures can be designed that avoid the over-smoothing problem, are urgent research questions.
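The over-smoothing effect is easy to observe numerically: repeatedly applying the renormalized adjacency to random node features makes the node representations increasingly similar. The toy graph below is hypothetical; the shrinking spread across nodes is the point.

```python
import numpy as np

# A small connected graph (5 nodes) and random node features.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(5)
d = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_hat = A_hat * d[:, None] * d[None, :]   # D^{-1/2} (A + I) D^{-1/2}

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))
for layer in [1, 2, 5, 20, 50]:
    H_k = np.linalg.matrix_power(A_hat, layer) @ H
    spread = H_k.std(axis=0).mean()   # how different the node representations still are
    print(f"{layer:>2} layers: mean feature std across nodes = {spread:.4f}")
```

The printed spread shrinks rapidly as more propagation steps are applied, which is the over-smoothing behavior described above (here without any learned weights or nonlinearities, which do not change the qualitative trend).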

4.2. Multiscale Tasks on Graphs

Graph mining tasks can be divided into node-level problems, graph- and subgraph-level problems, and signal-level problems according to the object of interest. The key point of node-level tasks is to learn efficient representations for each node, while learning representations of entire graphs is the key to graph-level tasks [135]. The key point of signal-level tasks is to learn effective representations for different graph signals while the network structure remains unchanged. At present, most graph convolutional neural networks are designed for node-level tasks, and less attention is paid to graph-level and signal-level tasks.

4.3. Dynamically Changing Graph Data

In practical scenarios, the network is often dynamic: the features of nodes and edges change over time, and the structure of the network itself changes (new edges and nodes join the network, and nodes and edges disappear from it) [136]. Accounting for this dynamism is a trend in graph mining algorithms. Current graph convolutional neural networks are designed for static networks, so designing graph convolutional neural networks that can model the dynamic evolution of a network is an important direction for the future.

4.4. The Complex Nature of Graph Data

In practical scenarios, networks often have complex structural characteristics, for example, multiple node types, complex edge features, and community structure. Although many works have proposed solutions, each is designed for one particular characteristic [137]. Whether a network can be designed to simultaneously model various complex characteristics of a graph is also a question worth discussing. GCN employs mean pooling, so it cannot distinguish aggregation over, say, the two different multisets (a, b) and (a, a, b, b): mean pooling produces the same result for both, so it is not injective. Because of mean pooling, GCN cannot distinguish a node receiving messages from two neighbors from a node receiving the same mix of messages from four neighbors, so this structural distinction is lost [133, 138-140].
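The non-injectivity of mean aggregation can be checked directly: the multisets {a, b} and {a, a, b, b} yield the same mean but different sums.

```python
import numpy as np

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
multiset_1 = np.stack([a, b])          # neighbors {a, b}
multiset_2 = np.stack([a, a, b, b])    # neighbors {a, a, b, b}

print(multiset_1.mean(axis=0), multiset_2.mean(axis=0))  # identical: [0.5 0.5] twice
print(multiset_1.sum(axis=0), multiset_2.sum(axis=0))    # different: [1. 1.] vs [2. 2.]
```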

4.5. Adversarial Attacks on Graph Neural Networks

Neural networks shine in various tasks but still suffer from instability. For example, adding a certain amount of noise to an image may not change its class to the human eye, yet the neural network already misclassifies it. Designing targeted samples that cause a machine learning model to make such misjudgments is called an adversarial attack. In the field of GNNs, constructing adversarial samples that exploit node features and network structure, and designing graph neural networks that can defend against adversarial attacks, are important directions for future development.

5. Conclusion

Graphs are a powerful and rich structured data type with strengths and challenges that differ greatly from those of images and text. In this study, we have outlined some of the milestones that researchers have reached in developing neural network-based models that process graphs. We have gone over some of the key design decisions that must be made when employing these architectures, and hopefully the GNN playground can provide some insight into the empirical outcomes of these decisions. The recent success of GNNs opens the door to a wide range of new problems, and we are excited to see what the field will bring. The key points of this survey are as follows:

(i) Existing applications in the computing field mainly use GN, GCN, and MPNN models, rarely use the GAE model, and do not use the GAT model. Most existing applications use FF, RNN, CNN, and similar networks as aggregation functions to transmit node and topology information and output predicted values, so their application scope is limited. Owing to their own limitations, it is difficult for GN, GCN, and MPNN to solve complex communication network problems.

(ii) The learning methods are mainly divided into supervised learning and reinforcement learning. Supervised learning is mostly used for traffic/resource/index prediction, node classification, and similar problems; reinforcement learning is mostly used for path selection, topology transformation/mapping, and similar problems.

(iii) Existing applications mainly target node-level tasks; the output features are mostly node features or overall network indicators and are rarely used for link-level tasks.

(iv) Almost all existing applications are based on centralized learning and require information from all nodes before learning.

The authors consent to publication after acceptance.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Hao Tang, Uzair Aslam Bhatti, Guilu Wu, Shah Marjan, and Aamir Hussain were involved in the data analysis; Hao Tang, Uzair Aslam Bhatti, and Aamir Hussain were responsible for manuscript writing; Uzair Aslam Bhatti was responsible for project management; and Hao Tang was responsible for funding.

Acknowledgments

This work was supported by the Hainan University Research Fund (project nos. KYQD (ZR)-22064, KYQD (ZR)-22063, and KYQD (ZR)-22065).