Article

VulEye: A Novel Graph Neural Network Vulnerability Detection Approach for PHP Application

School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(2), 825; https://doi.org/10.3390/app13020825
Submission received: 24 November 2022 / Revised: 25 December 2022 / Accepted: 29 December 2022 / Published: 6 January 2023
(This article belongs to the Special Issue AI for Cybersecurity)

Abstract

Following advances in machine learning and deep learning, cyber security experts are committed to creating intelligent approaches for automatically detecting software vulnerabilities. Most existing practices target C and C++ programs, and methods rarely target PHP applications. Moreover, many of these methods use LSTM (Long Short-Term Memory) networks rather than GNNs (Graph Neural Networks) to learn the token dependencies within the source code through different transformations, which can lose a great deal of semantic information in the code representation. This article presents VulEye, a novel Graph Neural Network vulnerability detection approach for PHP applications. VulEye can assist security researchers in finding vulnerabilities in PHP projects quickly. VulEye first constructs the PDG (Program Dependence Graph) of the PHP source code, slices the PDG around the sensitive functions contained in the source code into sub-graphs called SDGs (Sub-Dependence Graphs), and then feeds the SDGs into a Graph Neural Network model consisting of three stacked units, each with a GCN layer, a top-k pooling layer, and an attention layer; finally, it uses an MLP (Multi-Layer Perceptron) and softmax as a classifier to predict whether an SDG is vulnerable. We evaluated VulEye on the PHP vulnerability test suite in the Software Assurance Reference Dataset. The experiments show that VulEye reached a best macro-average F1 score of 99% on the binary classification task and 95% on the multi-class classification task. VulEye achieved the best results compared with existing open-source vulnerability detection implementations and other state-of-the-art deep learning models. Moreover, VulEye can also locate the precise area of a flaw, since our SDG contains code slices closely related to vulnerabilities with a key triggering sensitive/sink function.

1. Introduction

PHP is a general-purpose scripting language for web development [1]. Since its birth, PHP has been hailed by many developers as the world's best programming language, and it still dominates server-side web programming. According to the W3TECHS report, PHP topped the list of programming languages used by web applications as of September 2022, with a market share of 77.3% [2]. Because it is free and extensible, PHP's open-source ecosystem has made vast and rapid progress, and there are now over 138K popular PHP repositories on GitHub [3].
Nowadays, open-source software develops fast. Building on various existing PHP frameworks, developers can implement a PHP application more rapidly and conveniently than before and focus more on the business logic. Unfortunately, the open-source code introduced by the developer may contain buggy code that causes vulnerabilities such as cross-site scripting (XSS), SQL injection (SQLI), OS command injection (OSCI), etc. If developers copy and paste portions, or the entirety, of library code from open-source software without security audits and modification, they will propagate the vulnerable code into the ongoing project [4]. Vulnerabilities contained in open-source projects are easily discovered and exploited by hackers, posing a colossal threat [5]. Therefore, detecting vulnerabilities in web applications as early as possible is critical and worth paying more attention to.
With the spreading use of PHP in server-side web application programming, even a small security bug may cause a colossal disaster [6]. BackupBuddy [7] is currently one of WordPress's best-selling site backup plugins. It allows users to back up their WordPress website from the dashboard, including users, posts, widgets, media files, and theme files. On 9 September 2022, the WordPress security company Wordfence disclosed BackupBuddy's zero-day flaw CVE-2022-31474 [8], with a CVSS score of 7.5. The flaw makes it possible for unauthorized users to access and view arbitrary files on the vulnerable website, which may contain secret information, such as the WordPress "wp-config.php" file, the "/etc/passwd" file, etc.
Analyzing source code characteristics to discover weaknesses has consistently been one of the most effective methods of vulnerability detection [9]. In the past, people used manual auditing to test the security of PHP programs. It was not only complex but also wasted time and labor, and it cannot keep up with today's large open-source codebases. The traditional automated method uses regex patterns to search for flaw patterns in PHP source code, which depends on vulnerability features extracted from human experts' experience, so it is often not feasible for complicated vulnerabilities. As machine learning technology develops rapidly, more and more researchers have turned to artificial intelligence to find vulnerabilities automatically, and this has been proven feasible.
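As a concrete illustration of this traditional regex-based style of detection (not part of VulEye), the sketch below flags PHP lines where a user-input superglobal co-occurs with a sink function. The sink list and the single-line heuristic are our own simplified assumptions:

```python
import re

# Hypothetical sink patterns a naive regex-based scanner might use.
SINK_PATTERNS = {
    "XSS": r"\becho\b|\bprint\b",
    "SQLI": r"\bmysqli_query\b|\bmysql_query\b",
    "OSCI": r"\bsystem\b|\bexec\b|\bshell_exec\b",
}

def scan_php(source: str) -> list[tuple[int, str]]:
    """Return (line_number, vulnerability_class) pairs where a sink
    co-occurs with raw user input on the same line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not re.search(r"\$_(GET|POST|REQUEST|COOKIE)\b", line):
            continue  # no user-controlled input on this line
        for vuln, pattern in SINK_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, vuln))
    return findings

demo = 'echo "Hello " . $_POST["name"];\nsystem("ping " . $_GET["ip"]);'
print(scan_php(demo))  # [(1, 'XSS'), (2, 'OSCI')]
```

Such a scanner misses multi-line taint flows and sanitizer calls entirely, which is exactly the weakness the paper attributes to pattern-based approaches.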
The traditional ways of PHP vulnerability detection [10,11,12] are based on static, flow, and taint analysis tools, along with manual auditing. They need much labor and are inefficient at finding flaws in large-scale PHP projects. With the development of artificial intelligence, machine learning and deep learning approaches have proven suitable for software vulnerability detection. The data mining methods [13,14,15,16] require manually extracting the characteristics of vulnerabilities, which relies on rich expert knowledge, costs much labor time, and suffers from poor detection performance, with higher false negatives and false positives. The deep-learning-based approaches [6,17,18] transform the source code into a custom token sequence and feed it into natural language models to extract the features of vulnerabilities, then classify the code as vulnerable or not. All these solutions can automatically capture the context information of the vulnerability in the PHP source code. However, they do not consider the structural characteristics of PHP scripts, which loses a lot of semantic knowledge in the code representation. In contrast, GNNs (Graph Neural Networks) [19] can learn well from the graph characteristics of source code. Existing methods transform the source code into a particular graph representation, such as an AST (Abstract Syntax Tree), CFG (Control Flow Graph), DFG (Data Flow Graph), PDG (Program Dependence Graph), CPG (Code Property Graph), etc., which lacks the grammatical features of the source code [20]. Moreover, there are few methods for PHP vulnerability detection using Graph Neural Networks. DeepTective [21] first investigated using a GNN to detect vulnerabilities in PHP source code but used only the CFG (Control Flow Graph) to represent the source code, lacking data-dependence information, which is not conducive to vulnerability detection.
To address the above problems, we propose a Graph Neural Network vulnerability detection approach, VulEye, based on a particular structure called the SDG (Sub-Dependence Graph), a sub-graph of the PDG. We combine the source code's control- and data-dependence features and put each source code line into the corresponding SDG node attribute. The SDG integrates the token-level characteristics of the taint propagation chain into the corresponding graph nodes to realize vulnerability detection. In addition to the comprehensive structural features of the source code, our SDG-based approach retains rich contextual semantic knowledge and achieves stronger detection ability.
Our contributions are as follows:
  • We propose a novel Graph Neural Network vulnerability detection approach, VulEye, for PHP applications based on the SDG, which contains the source code skeleton information and code token features embedded with Doc2Vec [22]. It uses a GCN (Graph Convolutional Network) [23] with a global attention mechanism to learn the characteristics of vulnerabilities and classifies them with an MLP and softmax. This approach combines the source code's graph-level and token-level features, so it feeds the GNN (Graph Neural Network) model more details about the code and achieves better detection ability.
  • We propose an improved PHP-oriented code-slicing technique. It can automatically extract all the code slices related to the taint propagation chain of a given sensitive API function in the source code, with no need to manually mark the entry-point line number of the slice [24]. Moreover, VulEye can also locate the precise position of a vulnerability, since the SDG contains code slices closely related to vulnerabilities with a critical triggering sensitive/sink function.
  • The experimental results show that the detection capability of VulEye is the best compared with existing methods and other open-source detection implementations. On the SARD PHP vulnerability test suite dataset [25], VulEye achieved a best macro-average F1 score of 99% on the binary classification task and 95% on the multi-class classification task. Compared with the state-of-the-art approach DeepTective, VulEye achieves improvements in all indicators. We further evaluated the generalization capability of VulEye on SQL-Labs [26] and XSS-Labs [27] vulnerable PHP files, achieving accuracies of 88% and 72%, respectively.

2. Related Work

Currently, there are three main categories of vulnerability detection methods for PHP source code: tool-based manual auditing, machine-learning-based methods, and deep-learning-based approaches.

2.1. Tool-Based Manual Auditing

The well-known tools developed by cyber security experts can reduce the workload of manual analysis and improve the efficiency of vulnerability detection. Many traditional approaches focus on using static, semantic, and taint analysis to locate vulnerabilities. Pixy [10,28] is the first open-source static vulnerability detection tool for PHP. It combines data-flow analysis and cross-function context-aware techniques to detect XSS (cross-site scripting) vulnerabilities in PHP application components. However, it requires engineering modification before it can detect other vulnerabilities. Rips [11,29] is a static source code analysis tool based on PHP's built-in "tokenizer" functions. It combines taint analysis and static analysis methods to locate vulnerabilities in PHP applications. phpSAFE [14] is a static analysis tool that detects denial-of-service (DoS) and unauthorized-access vulnerabilities. It first extracts the program's AST (Abstract Syntax Tree), analyzes the control dependencies, and performs an inter-procedural semantic analysis, so it can identify and verify all the conformance security checks between context calls. PHPStan [30] analyzes source code by customizing different levels of rules to look for explicit and implicit bugs (such as common parameter errors, type incompatibilities, use of undefined variables, and so on) and allows the user to add new rules easily.

2.2. Machine-Learning-Based Method

In recent years, data mining and machine learning have made massive progress in vulnerability detection. WAP [31] and WAPe [15] introduce machine learning and data mining to reduce the false alarm rate of the vulnerabilities detected by taint analysis. They can detect eight types of vulnerabilities, including cross-site scripting, SQL injection, OS command injection, etc. In follow-up work, DEKANT [32] introduced the Hidden Markov Model (HMM) [33] to describe vulnerabilities based on a set of source code snippets marked as tainted or untainted. These code fragments are then fed into a natural language processing model to predict vulnerabilities. All the approaches above can detect several vulnerability categories. WIRECAML [34] combines machine learning and taint analysis to find XSS (cross-site scripting) and SQLI (SQL injection) vulnerabilities in PHP source code. The tool extracts meaningful data-flow features by analyzing reachable variable definitions, taint points, and reachable constants, optimizing the learning process of the ML model. It obtains its best results with a decision tree classifier, with precision-recall curve scores of 88% for SQLI and 82% for XSS. Compared with the previous static analysis methods Pixy, Rips, and WAP, WIRECAML achieves higher detection capability in terms of accuracy, recall, and F1 score.

2.3. Deep-Learning-Based Method

Using deep learning to detect source code vulnerabilities has become a hot research topic. Most research targets C and C++ programs [5,9,35]; work on PHP code vulnerability detection is rare. TAP [6,36] presents a static analysis method for PHP vulnerability detection based on code tokens and deep learning. The tool proposes a custom tokenizer to generate PHP code tokens and uses data-flow analysis to find the relevant function calls. TAP implements a sequence-based deep learning model that uses Word2Vec to generate token embeddings and an LSTM (Long Short-Term Memory) network to train and classify vulnerabilities. Compared with RIPS and WIRECAML, TAP achieves the best results in terms of accuracy, F1 score, and AUC (Area Under the Curve) on both safe and unsafe test cases. Vulhunter [18] proposed a different way to represent vulnerabilities using bytecode features. It generates potentially suspicious code fragments by analyzing the CFG (Control Flow Graph) and DFG (Data Flow Graph) and then converts each code snippet to bytecode. Vulhunter vectorizes the bytecode fragment with Word2Vec and passes it, along with its tag, into a BiLSTM (Bi-directional Long Short-Term Memory) model. The evaluation results indicate that Vulhunter can detect SQLI and XSS vulnerabilities and outperforms RIPS in terms of recall and F1 score. DeepTective [21] is the first approach that uses Graph Neural Network techniques to identify vulnerabilities in PHP source code. It combines a GRU (Gated Recurrent Unit) and a GCN (Graph Convolutional Network) to detect cross-site scripting, SQL injection, and OS command injection vulnerabilities. DeepTective transforms source code into a sequence of tokens for the GRU to analyze syntactic structural properties and extracts the CFG for the GCN to learn the semantic properties. The paper claims that this hybrid architecture achieves a 99.92% F1 score on synthetic data from SARD.

3. Background

This section introduces some concepts relevant to PHP vulnerabilities, source code representation, and graph neural networks.

3.1. PHP Common Vulnerability

A software vulnerability is an application error made by software developers, which malicious actors can use to access the system or network [37]. The "2022 CWE Top 25" ranking of the most dangerous software vulnerabilities [38] includes the common vulnerabilities in the current web application ecosystem. The top five web application vulnerabilities are cross-site scripting (XSS, CWE-79), SQL injection (SQLI, CWE-89), OS command injection (OSCI, CWE-78), path traversal (PTR, CWE-22), and cross-site request forgery (CSRF, CWE-352), ranked second, third, sixth, eighth, and ninth, respectively. We pick the three top-ranking vulnerabilities for illustration.
XSS (cross-site scripting) is a vulnerability caused by absent or poorly implemented filtering when data are input or output. An attacker can inject arbitrary JavaScript into HTML pages through this vulnerability. A successful cross-site scripting attack can lead to various malicious actions, including leaking personal information such as session cookies or requesting a harmful web application. The example code in Listing 1 shows an XSS vulnerability from an actual web application. In this example, the flaw occurs on code line 6: the $_POST["name"] input is not filtered before being printed out. Therefore, an attacker can induce the victim's browser to render attacker-controlled <script> tags in the page, thereby enabling any malicious client-side behavior.
Listing 1. XSS example.
[Code listing shown as an image in the original article.]
SQLI (SQL injection) happens when attacker-controlled input changes a SQL query before it is passed to the backend database server. An attacker can exploit it to read or write personal data in the backend database without permission or, under some conditions, execute system commands on the database server. The example code in Listing 2 uses $id to render the SQL query dynamically on code line 2. A skilled attacker can execute SQL injection by supplying a maliciously formatted string in the input variable $_GET["id"], which may take the form "'UNION [evil SQL query]". The database structure and personal data can then be leaked or modified by posting malicious SQL queries.
Listing 2. SQLI example.
[Code listing shown as an image in the original article.]
OSCI (OS command injection) allows an attacker to execute malicious commands on the operating system running the vulnerable web application. The severity of OSCI depends on the privilege level obtained by the attacker through the injection attack. The example code in Listing 3 shows an actual operating system command injection vulnerability, where the $_GET["ip"] input is concatenated with a string to form a complete shell command. It is designed to check whether a host is alive by executing the system command ping. However, because the IP address input is not filtered, anyone can exploit this vulnerability by supplying a malicious IP (such as "localhost; [any shell command here]") and running arbitrary commands on the target server.
Listing 3. OSCI example.
[Code listing shown as an image in the original article.]

3.2. Source Code Representation

The source code is the list of instructions the programmer writes when developing software. Source code representation helps the computer understand the code instructions so that it can process the source code correctly for machine learning or deep learning. Before a program is executed, the compiler or script interpreter must convert the source code into operation codes that the machine can recognize. Similarly, it is also necessary to convert the source code into an intermediate representation when using machine learning or deep learning to detect vulnerabilities. This representation should contain as much of the source code's vulnerability feature information as possible, including syntactic and semantic features such as specific character sequences, control dependence, and data dependence during program execution, and should also be easy to convert into vectors and feed directly into machine learning models. At present, the mainstream source code representation technologies include:
  • Token Sequence.
  • Abstract Syntax Tree (AST).
  • Control Flow Graph (CFG).
  • Data Flow Graph (DFG).
  • Program Dependence Graph (PDG).
  • Code Property Graph (CPG).
The appropriate intermediate representation retains more knowledge of the source code and has a richer ability to express vulnerability features [39]. In this article, we use a sub-graph of PDG to represent source code, which is discussed below.
A Program Dependence Graph (PDG) captures the control and data dependences during program execution, which is crucial for accurately analyzing vulnerability characteristics and can improve the effectiveness of a vulnerability detection system. A PDG is a directed graph in which each node represents an instruction (program statement) and each edge represents a data or control dependence between two program instructions. A control-dependence edge $N_i \rightarrow N_j$ indicates that the execution of $N_j$ is determined by $N_i$, reflecting the control flow or execution order between the two instructions. A data-dependence edge $N_i \rightarrow N_j$ indicates that data defined at $N_i$ act on $N_j$, reflecting the "definition-use" relationship of variables between the two statements.

3.3. Graph Neural Network

A Graph Neural Network (GNN) is a deep learning method for processing graph-structured data. It performs excellently in spatial non-Euclidean data analysis domains, such as social networks [40], knowledge graphs [41], traffic networks [42], etc. A GNN uses a deep neural network to learn a graph's node and structural features. In short, the primary process of a GNN model includes two procedures: propagation and output. In the first procedure, the GNN model updates the node representations over time steps; in the second, it produces the target output (such as node classifications).
Consider a graph structure $G = (V, E)$, where $V$ is the node set, $E$ is the set of all edges, and $h_v \in \mathbb{R}^d$ is a state embedding for each node $v$ that encodes the features of its neighborhood and of the node itself. The state embedding $h_v$ is a $d$-dimensional vector. In the propagation procedure, the GNN projects the graph data into a lower-dimensional space to obtain node representations by iteration. It initializes the node representation $h_v^{(0)}$ with the model input $X_v$, then updates each node representation by the propagation Formula (1), where $l$ is the iteration number, which increases from 0 up to the maximum iteration number specified before training.

$$h_v^{(l)} = f\left(X_v,\; X_{CO(v)},\; X_{NB(v)},\; h_{NB(v)}^{(l-1)}\right) \quad (1)$$

$X_{CO(v)}$ is the feature vector of the edges incident to node $v$, $X_{NB(v)}$ is the feature of the neighboring vertices of node $v$, and $h_{NB(v)}$ is the embedding matrix of the neighboring vertices of node $v$. $f(\cdot)$ is a projection function that maps the input data into the $d$-dimensional space. According to Banach's fixed-point theorem [43], iterative updating yields a unique solution $h_v$, as in (2), where $H$ and $X$ are the concatenations of all $h$ and $x$, respectively.

$$H^{(l+1)} = F\left(H^{(l)}, X\right) \quad (2)$$

The update function repeatedly uses the hidden states of the neighboring nodes at the current step as part of the input to generate the hidden state of the central node at the next step, until the state of each node changes very little. At that point, each node has "learned" its neighbors' information. Finally, GNNs use an output function to adapt to different downstream tasks, which computes the output from the state $h_v$ and the feature $x_v$, as in (3).

$$o_v = g\left(x_v, h_v\right) \quad (3)$$
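This propagation-to-fixed-point procedure can be traced with a toy example. The update rule below, $h_v \leftarrow \tfrac{1}{2}X_v + \tfrac{1}{2}\,\mathrm{mean}(h_{NB(v)})$, is our own illustrative choice of $f(\cdot)$, picked so that it is a contraction and Banach's theorem guarantees convergence to a unique fixed point:

```python
# Toy synchronous GNN propagation on a 3-node path graph 0-1-2.
# The update is a contraction (half the input, half the neighbor mean),
# so repeated iteration converges to a unique fixed point.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = {0: 1.0, 1: 0.0, 2: -1.0}   # node input features X_v
h = dict(x)                     # initialize h_v^(0) = X_v

for _ in range(50):             # propagation iterations
    h = {
        v: 0.5 * x[v] + 0.5 * sum(h[u] for u in neighbors[v]) / len(neighbors[v])
        for v in neighbors
    }

# Converges to (0.5, 0.0, -0.5) up to floating-point error.
print({v: round(val, 4) for v, val in h.items()})
```

Solving the fixed-point equations by hand gives $h_0 = 0.5$, $h_1 = 0$, $h_2 = -0.5$, which the iteration reaches after a few dozen steps.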

4. Proposed Approach

In this section, we discuss our proposed VulEye, a new PHP vulnerability detection method using recent Graph Neural Network techniques, including how to represent the unstructured source code as a graph, how to apply the deep learning algorithms, and finally how to realize vulnerability detection.

4.1. Overview

Figure 1 shows our overall solution. VulEye includes three stages: Graph Composition, Deep Learning, and Vulnerability Detection. VulEye retains the high-level control- and data-dependence information of PHP code in the SDG (Sub-Dependence Graph), a sub-graph of the PDG. It then uses Doc2Vec [22] to embed the statement-level feature information of each PHP code line into the node feature representation of the SDG, takes all the node and edge features of the SDG as the input of the GNN model for feature learning, and finally uses an MLP (Multi-Layer Perceptron) and softmax for classification.

4.2. Stage I: Graph Composition

Most common web vulnerabilities, including but not limited to XSS (cross-site scripting), OSCI (OS command injection), and SQLI (SQL injection), are caused by user input that is fed to a dangerous function without safe sanitization. Conditional statements can change the execution path, making the vulnerable code line run only under specific conditions. To this end, we introduce a new graph structure named the SDG, a sub-graph of the PDG (Program Dependence Graph) that contains only the nodes and edges closely related to a vulnerability. The SDG construction procedure includes two steps: PDG extraction and PDG slicing.
PDG extraction. We use the open-source tool PHP-parser [44] to obtain the AST of the PHP source code and analyze the control and data dependences to construct the PDG, as in paper [12]. Figure 2 shows an example of PDG extraction for the code in Listing 1. We extract the CFG (Control Flow Graph) of the source code according to the code execution order and branch relationships. Then, we analyze the "define-use" relationships of the variables in the code to obtain the DFG (Data Flow Graph) of the source code, where the uses of the $name variable at nodes 3, 4, and 6 are defined by node 1. Finally, we combine the two diagrams into one graph, the so-called PDG.
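The "define-use" analysis that produces the data-dependence edges can be sketched as follows. The statements and the regex-based variable matching are simplified illustrations consistent with the Listing 1 description, not the PHP-parser-based implementation VulEye actually uses:

```python
import re

# Toy "define-use" analysis over a few statements of the Listing 1 example:
# a data-dependence edge (i, j, var) is added when a variable assigned at
# line i is read at line j.
stmts = {
    1: '$name = $_POST["name"];',
    3: 'if (preg_match("/^[A-Za-z]+$/", $name)) {',
    6: 'echo "Hello " . $name;',
}

def data_edges(stmts):
    defs, edges = {}, []           # variable -> defining line; collected edges
    for lineno, code in sorted(stmts.items()):
        m = re.match(r"\s*(\$\w+)\s*=", code)
        target = m.group(1) if m else None
        for var in set(re.findall(r"\$\w+", code)):
            if var != target and var in defs:
                edges.append((defs[var], lineno, var))
        if target:
            defs[target] = lineno  # record the (re)definition
    return sorted(edges)

print(data_edges(stmts))  # [(1, 3, '$name'), (1, 6, '$name')]
```

The resulting edges from line 1 to lines 3 and 6 match the $name define-use relationship described for Figure 2.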
In this step, we extend the PDG with a graph attribute sensitive_call_funcs that stores all the sensitive call functions in the source code. Table 1 shows the dangerous API functions associated with different PHP vulnerabilities. There are also API functions that sanitize the special characters of user input and thereby prevent exploitation; Table 2 shows the standard sanitizer functions. These two types of API functions are the sensitive functions we focus on, because they are the key to deciding whether the source code is vulnerable.
PDG slicing. When slicing the vulnerable code, we focus on the functions or API calls that can trigger vulnerability execution, usually called sink points/functions. Unlike the traditional way of manually slicing source code at a specified code line, we adopt an improved automatic code-slicing method. In the previous AST (Abstract Syntax Tree) analysis step, we record the names and types of all API calls in the source code, collect the sensitive function calls into a list, and save it as the PDG graph attribute sensitive_call_funcs. With this list of sensitive call functions, we can quickly locate dangerous sink points during subsequent slicing. Once a sink point is determined, we search backward for all the paths that can reach the sink point using the connectivity of the nodes in the PDG, then collect all the nodes on these paths into a set V. We use this set to segment the PDG obtained in the previous step into an SDG, leaving only the nodes in V and the corresponding edges. Moreover, we retain the source code represented by each node in the SDG node attributes, because these code slices are closely related to vulnerabilities.
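The backward reachability step can be sketched with a hand-built PDG whose line numbers and edges mirror the Listing 1 slicing example discussed in this section (the concrete edge set is our illustrative assumption):

```python
from collections import defaultdict

# Hypothetical PDG for the Listing 1 example: node IDs are line numbers,
# "D" = data dependence, "C" = control dependence.
edges = [
    (1, 3, "D"), (2, 3, "D"),               # variables defined, then used at line 3
    (1, 4, "D"), (1, 6, "D"),               # $name flows into both echo calls
    (3, 4, "C"), (3, 5, "C"), (5, 6, "C"),  # branch structure
]
sinks = [4, 6]  # lines calling the XSS-sensitive function `echo`

def backward_slice(edges, sink):
    """Collect every node with a PDG path to `sink` (plus the sink itself)."""
    preds = defaultdict(set)
    for src, dst, _ in edges:
        preds[dst].add(src)
    sliced, stack = {sink}, [sink]
    while stack:
        for p in preds[stack.pop()]:
            if p not in sliced:
                sliced.add(p)
                stack.append(p)
    return sorted(sliced)

for s in sinks:
    print(s, backward_slice(edges, s))
# 4 [1, 2, 3, 4]
# 6 [1, 2, 3, 5, 6]
```

The two slices reproduce the node sets {1, 2, 3, 4} and {1, 2, 3, 5, 6} that the text derives for the two echo sinks.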
The PHP code in Listing 1 is an XSS (CWE-79) vulnerable sample. It contains two calls to the sensitive function echo related to XSS, on lines 4 and 6, and the actual vulnerability trigger is on line 6. According to the slicing principle above, we obtain two SDG slices, covering code lines 1, 2, 3, 4 and 1, 2, 3, 5, 6, respectively, as shown in Figure 3. The fourth code line has a conditional judgment before execution: only a $name variable consisting of uppercase and lowercase letters is allowed as input, so it cannot trigger the XSS vulnerability. The sixth code line, however, has no such restriction and leads to the vulnerability.

4.3. Stage II: Normalization and Embedding

Normalization. Normalizing the original code slice is necessary before importing the SDG into the deep learning model. Programmers have personal habits when naming functions and variables, which generate noise in the code, enlarge the dimensionality of the vectorization, and may cause errors. Therefore, we standardize the source code into a unified form so that the semantic information of the original code is preserved while redundant information is reduced. Referring to the method in the article, we replace user-defined variable and function names with a public identifier plus a separate index (VAR0, VAR1, ..., FUN0, FUN1, ...) and replace all printable strings and strings in fixed splicing formats with null. However, we do not replace PHP built-in API functions, such as the sensitive functions closely related to vulnerabilities (e.g., echo, query, and system) and the filter functions (e.g., addslashes, mysql_escape_string, and htmlspecialchars).
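The renaming rules can be sketched with regular expressions as below; the keep-list, the superglobal handling, and the patterns are simplified assumptions for illustration rather than VulEye's actual implementation:

```python
import re

# Built-in names we must NOT rename: sinks/sanitizers (per Tables 1-2)
# plus a few language keywords. This list is an illustrative subset.
KEEP = {"echo", "print", "system", "exec", "query", "addslashes",
        "htmlspecialchars", "mysql_escape_string", "preg_match",
        "if", "while", "for", "foreach", "isset", "empty"}

def normalize(php_line, var_map, fun_map):
    """Rename user variables to $VARi, user functions to FUNi, blank strings."""
    def sub_var(m):
        name = m.group(0)
        if name in ("$_GET", "$_POST", "$_REQUEST", "$_COOKIE"):
            return name                              # keep superglobals
        if name not in var_map:
            var_map[name] = f"$VAR{len(var_map)}"
        return var_map[name]
    def sub_fun(m):
        name = m.group(1)
        if name in KEEP:
            return m.group(0)
        if name not in fun_map:
            fun_map[name] = f"FUN{len(fun_map)}"
        return fun_map[name] + "("
    line = re.sub(r"\$\w+", sub_var, php_line)
    line = re.sub(r"\b([A-Za-z_]\w*)\s*\(", sub_fun, line)
    line = re.sub(r"'[^']*'|\"[^\"]*\"", '""', line)  # blank string literals
    return line

vars_, funs_ = {}, {}
print(normalize('$name = my_filter($_POST["name"]);', vars_, funs_))
# $VAR0 = FUN0($_POST[""]);
```

The shared var_map/fun_map dictionaries keep the indices consistent across all the lines of one slice.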
According to the normalization rules, we transform the PHP code in Listing 1 into Listing 4, replacing the name and age variables with VAR0 and VAR1, respectively. The string output by the echo function in a fixed splicing format is replaced with an empty string. However, the double quotation marks around $VAR0 on the fourth line are preserved, because single and double quotation marks play a crucial role in triggering the vulnerability. For example, htmlspecialchars is a filter function that escapes special strings contained in the input and is one way to prevent XSS. By default, only the characters <, >, &, and double quotes are escaped. Therefore, if the output string is spliced with single quotation marks, a single quote can bypass the escaping rule and cause a vulnerability, while double quotation marks are safe.
Listing 4. Normalized XSS example.
[Code listing shown as an image in the original article.]
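The quoting subtlety above can be demonstrated by mimicking the default escaping behavior described in the text (escape <, >, &, and double quotes, but not single quotes). The helper below is our own stand-in, not PHP's htmlspecialchars itself:

```python
# Mimic the htmlspecialchars default described in the text: escape <, >, &,
# and double quotes, but NOT single quotes.
def htmlspecialchars_default(s: str) -> str:
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))

payload_sq = "' onmouseover='alert(1)"   # breaks out of single-quoted splicing
payload_dq = '" onmouseover="alert(1)'   # would break out of double quotes

print(htmlspecialchars_default(payload_sq))  # unchanged: single quotes survive
print(htmlspecialchars_default(payload_dq))  # &quot; ... : double quotes neutralized
```

When attribute values are spliced with single quotes, the first payload passes the filter intact and injects a new attribute; with double quotes, the second payload is neutralized.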
Doc2Vec [22] is a common technique for characterizing documents as low-dimensional vectors. It is an unsupervised model that encodes an entire code statement, instead of a single code token, into a fixed-length vector. Compared with a simple combination of vectors generated by embedding single word tokens, Doc2Vec can more accurately represent the characteristics of a whole code statement. The primary algorithm of Doc2Vec is the Distributed Memory version of Paragraph Vectors (PV-DM), in which a statement topic vector is used to represent the semantics of the statement. Suppose a token sequence $w_1, w_2, \ldots, w_N$ is given; it includes both single-word tokens and a sentence token that reflects the topic of the whole sentence. Doc2Vec's task is to use the tokens $w_{i-k}, \ldots, w_{i+k}$ around the current token $w_i$ to predict the occurrence probability of $w_i$.
$$\frac{1}{N} \sum_{i=k}^{N-k} \log p\left(w_i \mid w_{i-k}, \ldots, w_{i+k}\right) \quad (4)$$

$$p\left(w_i \mid w_{i-k}, \ldots, w_{i+k}\right) = \frac{\exp(y_{w_i})}{\sum_j \exp(y_{w_j})} \quad (5)$$

$$y = U\, q\left(w_{i-k}, \ldots, w_{i+k}; V\right) + b \quad (6)$$

To make vectors with similar meanings close to each other in the latent vector space, Doc2Vec uses the average log probability (4) as the objective function to train a multi-category classifier, such as softmax (5), where $y_{w_j}$ is the element of the unnormalized log-probability vector $y$ corresponding to token $w_j$. The $y$ vector is computed by (6). Each column of $V$ is the unique vector of a token $w_i$, initialized randomly. $U$ and $b$ are the softmax parameters, and $q(\cdot)$ is the concatenation of the token vectors extracted from $V$. Stochastic gradient descent (SGD) is then used to automatically train and update the representation vector of each token [33]. After training, each representation vector is the feature vector of its token $w_i$, and the representation vector of the statement topic token reflects the overall characteristics of the whole statement.
Figure 4 shows an example of Doc2Vec's procedure for predicting the "[" character in the second code line. First, through word segmentation, we obtain a corpus consisting of the complete code statement token and the 6 tokens adjacent to the "[" character to be predicted ("$VAR1=$_POST ['VAR1 '];", "$VAR1", "=", "$_POST", "VAR1", "]", ";"). Then, a vector $V_T$ is randomly initialized according to the configured size of the embedding matrix, the corresponding vectors are spliced into a fixed-length vector, and softmax is used for prediction. $V_T$ is updated automatically during training. The final $V_s$ is the statement feature vector of the current code line.
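Equations (5) and (6) can be traced with a toy, untrained PV-DM forward step. The vocabulary, embedding size, and the randomly initialized parameters below are all illustrative:

```python
import math
import random

random.seed(0)
EMB = 4
tokens = ["<stmt>", "$VAR1", "=", "$_POST", "[", "]", ";"]  # toy vocabulary

# V: one randomly initialized vector per token; "<stmt>" is the topic vector.
V = {t: [random.uniform(-0.5, 0.5) for _ in range(EMB)] for t in tokens}

def q(context):       # Equation (6): concatenate the context vectors
    return [x for t in context for x in V[t]]

def softmax(y):       # Equation (5)
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    s = sum(exps)
    return [v / s for v in exps]

# Predict the "[" token from the statement topic and its surrounding tokens.
context = ["<stmt>", "$VAR1", "=", "$_POST", "]", ";"]
U = [[random.uniform(-0.5, 0.5) for _ in range(EMB * len(context))]
     for _ in tokens]          # softmax weights, one row per vocabulary token
b = [0.0] * len(tokens)
y = [sum(u * x for u, x in zip(row, q(context))) + bi for row, bi in zip(U, b)]
p = softmax(y)
print(abs(sum(p) - 1.0) < 1e-9)  # → True: a valid probability distribution
```

Training would adjust $V$, $U$, and $b$ by SGD so that the probability of the true middle token rises; here only the forward computation of (5)-(6) is shown.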

4.4. Stage III: Deep Learning Algorithm

Figure 5 shows the details of our deep learning model, which comprises three basic module units, each consisting of a graph convolution layer, a graph pooling layer, and an attention layer. We stack the three units, with the output of each unit serving as the input of the next. Finally, we sum the outputs of all units; the result is the feature vector the model learns from the input SDG sample.
Graph Convolutional Network (GCN) layer. GCN [23] is a feature extractor that automatically learns node features and the correlation features between nodes [45]. It implements an efficient method for extracting features from graph data and is well suited to node classification, graph classification, and link prediction; it can also produce graph embeddings.
Suppose a graph has N nodes, each with its own feature vector. The features of these nodes form an N × D matrix X, and the relationships between the nodes form an N × N matrix A, known as the adjacency matrix. X and A are the inputs of the GCN model. The propagation rule between the layers of the GCN is as follows (7):
$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \qquad (7)$$
$H^{(l)}$ is the feature matrix of each hidden layer, with $H^{(0)} = X$ at the input layer. $\sigma(\cdot)$ is a nonlinear activation function, and $W^{(l)}$ is the layer's trainable weight matrix. $\tilde{A} = A + I$, where $I$ is the identity matrix. $\tilde{D}$ is the degree matrix of $\tilde{A}$, calculated as (8):
$$\tilde{D}_{ii} = \sum_j \tilde{A}_{ij} \qquad (8)$$
Finally, the normalized propagation $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H$ can be calculated elementwise by (9):

$$\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H\right)_i = \sum_j \frac{1}{\sqrt{\tilde{D}_{ii}\tilde{D}_{jj}}}\,\tilde{A}_{ij} H_j \qquad (9)$$
The core idea of GCN is to aggregate node information along edges to generate new node representations, thereby extracting the graph's spatial topological characteristics. In other words, it is a feature extractor that automatically learns the node features of the input graph and the association features between nodes, which makes it well suited to graph-level classification tasks. Following (7), we use ReLU as the activation function in our training model.
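The propagation rule (7)–(9) can be sketched in a few lines of NumPy. The toy graph, features, and weights below are illustrative only, not part of VulEye's implementation:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step, Eq. (7): ReLU(D~^-1/2 A~ D~^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])          # A~ = A + I
    d = A_tilde.sum(axis=1)                   # degrees of A~, Eq. (8)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D~^-1/2
    return np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

# Toy 3-node path graph with 4-dimensional node features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 4))                   # node feature matrix X
W = rng.normal(size=(4, 2))                   # layer weight matrix
H_next = gcn_layer(A, H, W)                   # new node representations
```

Each output row aggregates the node's own features with those of its neighbors, weighted by the symmetric normalization of Eq. (9).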
Top-k Graph Pooling layer. In a CNN (Convolutional Neural Network), the pooling layer effectively reduces the number of model parameters and thus helps avoid overfitting. Similarly, in a Graph Neural Network, a hierarchical pooling layer plays an analogous role, especially in complex, large-scale graph classification tasks, because such graphs usually contain rich hierarchies that are important for graph classification. In this paper, we choose the Top-k pooling model [45] as the graph pooling layer to learn the hierarchical structure of the nodes and to further extract the node features most closely related to the graph-level representation. This reduces the dimension of the output feature vector, lowers the computational load of subsequent layers, and, to some extent, prevents overfitting.
The model sorts the nodes in descending order of the projection score of each node feature vector on a global basis vector, which is learned automatically. It then selects the top $\lceil kN \rceil$ nodes, where $k \in (0, 1)$ is the pooling ratio and N is the total number of nodes in the original graph. The calculation is given by (10)–(13):
$$y = \frac{H^{(l)} p^{(l)}}{\left\lVert p^{(l)} \right\rVert} \qquad (10)$$

$$\mathrm{idx} = \mathrm{toprank}(y, k) \qquad (11)$$

$$A^{(l+1)} = A^{(l)}_{\mathrm{idx},\,\mathrm{idx}} \qquad (12)$$

$$H^{(l+1)} = \left(H^{(l)} \odot \tanh(y)\right)_{\mathrm{idx}} \qquad (13)$$
where $p$ is a learnable basis vector, $y$ is the vector of projection scores of the nodes on $p$, $\lVert\cdot\rVert$ denotes the L2 norm, and toprank selects the indexes of the $\lceil kN \rceil$ largest entries of the input vector. $\odot$ denotes (broadcast) elementwise multiplication, and $\tanh(\cdot)$ is a nonlinear activation function. The subscript idx denotes an index operation that selects the rows (and columns) at the indexes specified by idx.
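A minimal NumPy sketch of Equations (10)–(13); the random adjacency matrix, features, and basis vector are illustrative inputs only:

```python
import numpy as np

def topk_pool(A, H, p, k):
    """Top-k pooling, Eqs. (10)-(13): keep the ceil(k*N) highest-scoring nodes."""
    y = (H @ p) / np.linalg.norm(p)                       # projection scores, Eq. (10)
    idx = np.argsort(-y)[: int(np.ceil(k * H.shape[0]))]  # toprank(y, k), Eq. (11)
    A_pooled = A[np.ix_(idx, idx)]                        # induced adjacency, Eq. (12)
    H_pooled = (H * np.tanh(y)[:, None])[idx]             # gated features at idx, Eq. (13)
    return A_pooled, H_pooled, idx

rng = np.random.default_rng(2)
A = (rng.random((5, 5)) > 0.5).astype(float)  # toy 5-node graph
H = rng.normal(size=(5, 4))                   # toy node features
p = rng.normal(size=4)                        # learnable basis vector (random here)
A2, H2, idx = topk_pool(A, H, p, k=0.5)       # keeps ceil(0.5 * 5) = 3 nodes
```

The `tanh(y)` gate keeps the pooling differentiable with respect to `p`, so the basis vector can be learned by backpropagation.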
Attention layer. To unify the output of each unit module and facilitate the final aggregation operation, we introduce a global soft attention layer. In this layer, we calculate an attention score by mapping the node features, then use softmax to normalize the scores and weight the node features in the graph-level output, highlighting the vital nodes that affect the graph-level classification task. The weight coefficients are calculated as follows (14):
$$Z = \mathrm{softmax}\!\left(\sigma\!\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}} X W_{att}\right)\right) \qquad (14)$$
$Z \in \mathbb{R}^{N \times 1}$ is the normalized attention score vector. $\sigma$ is the activation function. $\tilde{A} = A + I$ ($I$ denotes the identity matrix). $\tilde{D} \in \mathbb{R}^{N \times N}$ is the degree matrix of $\tilde{A}$. $X \in \mathbb{R}^{N \times F}$ is the input feature matrix ($N$ is the total number of nodes and $F$ the dimension of the input features). $W_{att} \in \mathbb{R}^{F \times 1}$ is the parameter to be learned.
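Equation (14) can be sketched as follows. ReLU is assumed for σ (the paper does not pin down this layer's activation), and the toy inputs are illustrative:

```python
import numpy as np

def attention_weights(A, X, W_att):
    """Global soft attention, Eq. (14); ReLU is assumed for sigma."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    scores = np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W_att).ravel()
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # Z: one weight per node, sums to 1

rng = np.random.default_rng(3)
A = np.array([[0., 1.], [1., 0.]])            # toy 2-node graph
X = rng.normal(size=(2, 4))                   # toy node features
W_att = rng.normal(size=(4, 1))               # learnable projection (random here)
Z = attention_weights(A, X, W_att)
weighted = Z[:, None] * X                     # attention-weighted node features
```

The weighted node features are what each unit passes on to the graph-level readout.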
Model output. Considering both model effectiveness and computational cost, we designed a three-unit structure with a convolution layer, graph pooling layer, and attention layer as the basic unit, and we aggregate the outputs of all units into the graph-level feature. This design not only thoroughly learns the node and edge features of the graph but also maps the high-dimensional feature vectors to a low-dimensional space, improving the efficiency of model training and detection. The output of each basic unit is expressed as (15):
$$s^{(l)} = \frac{1}{N_l}\sum_{i=1}^{N_l} x_i^{(l)} \;\Big\Vert\; \max_{i=1}^{N_l} x_i^{(l)} \qquad (15)$$
where $N_l$ is the number of nodes in the $l$-th basic unit layer, $x_i^{(l)}$ is the feature vector of node $i$ in that layer, and $\Vert$ denotes concatenation of the mean-pooled and max-pooled node features. The final graph-level feature is obtained by summing the output features $s^{(l)}$ of all basic units (16):
$$s = \sum_{l=1}^{L} s^{(l)} \qquad (16)$$
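The readout in (15) and (16) can be sketched as follows; the per-unit node counts and feature dimension are toy values:

```python
import numpy as np

def unit_readout(H):
    """Eq. (15): concatenate the node-wise mean and node-wise max of a unit's features."""
    return np.concatenate([H.mean(axis=0), H.max(axis=0)])

rng = np.random.default_rng(4)
# One feature matrix per stacked unit; pooling shrinks the node count each time.
unit_outputs = [rng.normal(size=(n, 4)) for n in (6, 3, 2)]
s = sum(unit_readout(H) for H in unit_outputs)   # Eq. (16): sum over the L units
```

With a feature dimension of 4, each unit contributes an 8-dimensional mean‖max vector, and `s` is their elementwise sum.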

4.5. Stage IV: Vulnerability Detection

Since the SDG contains the PHP script's control dependence, data dependence, and statement-level code semantic information, the final graph-level features incorporate the semantic, syntactic, and structural knowledge of the PHP source code and cover multiple types of flaws. After feature extraction, we use an MLP classifier combined with softmax to predict the label probability distribution from the graph-level features. The classification is given by (17):
$$p(\mathrm{label}_i) = \frac{\exp(s \cdot \mathrm{label}_i)}{\sum_{\mathrm{label}_j \in Y} \exp(s \cdot \mathrm{label}_j)} \qquad (17)$$
where $p(\mathrm{label}_i)$ is the predicted probability of $\mathrm{label}_i$, $s$ is the graph feature vector, and $\mathrm{label}_i$ denotes the vector representation of each label obtained from model training. Finally, the label with the highest probability determines whether the test source code has vulnerabilities.
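A minimal sketch of the softmax prediction (17), with randomly initialized stand-ins for the learned graph vector and label vectors:

```python
import numpy as np

def classify(s, label_vecs):
    """Eq. (17): softmax over dot products between graph vector s and label vectors."""
    scores = label_vecs @ s                   # s . label_j for every label
    e = np.exp(scores - scores.max())         # shift for numerical stability
    p = e / e.sum()
    return int(np.argmax(p)), p               # highest-probability label wins

rng = np.random.default_rng(5)
s = rng.normal(size=8)                        # graph-level feature vector (random here)
label_vecs = rng.normal(size=(10, 8))         # one learned vector per class (safe + 9 CWEs)
pred, probs = classify(s, label_vecs)
```

In the binary setting there would be only two label vectors (safe/unsafe); the multi-class setting uses one per CWE class plus safe.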

5. Evaluation

To evaluate the effectiveness of VulEye in detecting common web application vulnerabilities, we comprehensively assessed the model through comparative experiments on a public vulnerability benchmark dataset. We then selected the best model for further evaluation on actual vulnerabilities within experimental environments such as SQL-Labs and XSS-Labs.

5.1. Datasets

We obtained a benchmark dataset of PHP application vulnerabilities from the Software Assurance Reference Dataset (SARD) [25]. Each source program (known as a test case) corresponds to one or more CWE IDs, which serve as vulnerability tags. The dataset contains 42,212 PHP script files covering many common security vulnerabilities in PHP applications, including 29,258 safe samples and 12,954 unsafe (vulnerable) samples. The unsafe samples cover 12 common CWE vulnerability classes, such as CWE-79 cross-site scripting, CWE-78 OS command injection, and CWE-89 SQL injection. In the experiment, we ignored classes with fewer than 10 samples in total, namely CWE-311, CWE-327, and CWE-209. The statistics of the datasets used in this experiment are shown in Table 3.

5.2. Experimental Setups

Preprocessing. The source programs collected by SARD are marked with the corresponding CWE ID at the statement level and carry one of three labels: safe, unsafe, and mixed. "Safe" denotes that the sample contains no security defects. "Unsafe" means the sample contains one or more specific security defects. "Mixed" means the sample contains security defects together with their fixed security patches. We label samples without security defects as 0 and those with one or more specific security defects as 1. The vulnerability detection model in this paper operates at the slice level, not at the method or file level, and reports whether an SDG slice is vulnerable. As described in Section 4.2, we first slice each source program into one or more SDG slices, according to the number of sensitive functions the program contains. Then, following the rules above, we mark each slice as safe or unsafe depending on whether it contains vulnerable code lines. Specifically, if the slice is extracted from an "unsafe" or "mixed" program and contains a vulnerable statement marked by SARD, we label it "1" (unsafe); otherwise, we label it "0" (safe). Table 4 shows the statistics of the SDG-sliced datasets, where *_N denotes statistics over SDG nodes and *_E over SDG edges.
Configuration. We carried out all experiments on a high-performance host with a 24-core CPU, 128 GB of memory, and two NVIDIA RTX 3090 graphics cards (only one was used in training). The detailed software and hardware configuration is given in Table 5. We use cross-entropy as the loss function, since this is a classification problem. We set the dropout rate to 0.5 and the learning rate to 0.002 to prevent over-fitting during training. After experimental comparison, we chose Adam as the optimizer to accelerate network convergence. Training uses a batch size of 64. We shuffled the dataset and split it into training, validation, and test sets at a ratio of 8:1:1.
Criteria. We used the confusion matrix shown in Table 6 during evaluation. TP (True Positive) is the number of samples whose predicted and actual labels are both positive. FP (False Positive) is the number of samples whose actual label is negative but whose predicted label is positive. TN (True Negative) is the number of samples whose predicted and actual labels are both negative. FN (False Negative) is the number of samples whose actual label is positive but whose predicted label is negative. To make full use of the datasets and objectively compare the performance of different methods and models, we chose the following evaluation metrics: accuracy, precision, recall, and F1 score. Their definitions are as follows:
Accuracy: Reflects the correct proportion of prediction in all samples.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision: Reflects the proportion of correctly predicted vulnerabilities in the samples judged as vulnerabilities by the system.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall: Reflects the proportion of actual vulnerability samples that the detection system correctly identifies as vulnerable.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
F1 score: Comprehensively considers the precision and recall and more evenly reflects the vulnerability detection performance of the system.
$$F1\ \mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
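These four metrics follow directly from the confusion-matrix counts; the counts in the usage example below are illustrative, not taken from the paper's experiments:

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts: 50 true positives, 10 false positives,
# 30 true negatives, 10 false negatives.
acc, prec, rec, f1 = binary_metrics(tp=50, fp=10, tn=30, fn=10)
# acc = 0.8; prec = rec = f1 = 50/60
```

Note that when precision equals recall, the F1 score equals both, since the harmonic mean of equal values is that value.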

5.3. Result

To present the evaluation clearly and intuitively, we assess the advantages of VulEye by posing and answering four questions.
Q1: How well does VulEye detect vulnerabilities?
We first evaluated VulEye on the binary classification task and chose the Receiver Operating Characteristic (ROC) curve to show its performance. Figure 6 shows VulEye's ROC curve; the area under the curve (AUC) reaches 0.9993, indicating that VulEye has strong detection capability. Figure 7 and Figure 8 show the trends of accuracy and loss during training and validation. Accuracy rises and then stabilizes as training proceeds, reaching 97% at 8000 training steps, while the loss gradually decreases from 0.5 to about 0.02. The classification reports in Table 7 and Table 8 show that the model's best macro-average accuracy reached 99%, so VulEye performs very well on the binary classification task. To further confirm the model's capability of detecting multiple vulnerability classes, we present the results as a multi-class confusion matrix in Figure 9. The results show that only CWE-78 and CWE-95 are detected with accuracy lower than 90%.
Q2: Is the SDG slicing method beneficial to vulnerability detection?
We compared VulEye's SDG representation with the traditional CFG (Control Flow Graph) and PDG (Program Dependence Graph) representations to verify the effectiveness of the SDG slicing method proposed in this paper. For each representation, we trained both binary and multi-class models and observed the vulnerability detection performance in both scenarios. Table 9 shows that VulEye works with all three representation methods, with detection accuracy and precision above 87%. Among them, the PDG representation performs better than CFG, with accuracy exceeding 96%, precision for vulnerable programs exceeding 95%, and an F1 score of 93.47%. The SDG representation proposed in this paper, based on PDG sub-graph slices, further improves detection: all of its indicators are optimal, with an accuracy of 98.43%, a precision of 95.99% for vulnerable programs, and an F1 score of 96.29%.
Q3: How does VulEye's detection ability compare with other models and tools?
Since many outstanding detection models and mature tools for PHP vulnerability detection exist, we selected six advanced models and tools for comparison: the machine-learning-based models Wirecaml [46] and TAP [36]; the static-analysis-based tools Progpilot [47], RIPS [29], and WAP; and the Graph-Neural-Network-based model DeepTective. We used these tools and models to conduct PHP vulnerability detection experiments and measured their detection performance against VulEye.
Table 10 shows the performance results. The detection ability of VulEye significantly outperformed the existing open-source detection tools Progpilot, RIPS, and WAP. These tools generally rely on simple rules to parse code and check for vulnerabilities, and it is difficult to write rules covering all vulnerability-triggering patterns. Moreover, rule generation depends on expert knowledge, and flawed rules can inflate both False Negatives and False Positives. The machine learning tools TAP and Wirecaml-SQLI performed better than the static-analysis-based open-source tools; the TAP model achieved an accuracy of 0.9385 and an F1 score of 0.8788. As far as we know, DeepTective is the latest Graph-Neural-Network-based PHP vulnerability detection method, combining gated recurrent units and graph convolutional networks to detect SQLI (SQL injection), XSS (cross-site scripting), and OSCI (OS command injection) vulnerabilities. Since it is not open source, we implemented the model ourselves and trained it as faithfully to the authors' description as possible. The results show that VulEye also outperforms the DeepTective model.
Q4: How is the generalization capability of VulEye?
We evaluated the generalization ability of VulEye on actual PHP vulnerability programs from two open-source vulnerability experimental environments, SQL-Labs [26] and XSS-Labs [27]. In the experiment, we used our best model to detect the vulnerable files in SQL-Labs and XSS-Labs and counted how many files VulEye could successfully discover. There are 69 SQLI-vulnerable files in SQL-Labs and 20 XSS-vulnerable files in XSS-Labs. We removed two vulnerable files from XSS-Labs, shown in Listings 5 and 6, because their vulnerabilities are caused by JavaScript rather than PHP code. Table 11 shows the results of VulEye's vulnerability detection on these actual PHP projects. VulEye detected 61 of the 69 SQLI-vulnerable files and 13 of the 18 XSS-vulnerable files, for a detection accuracy of 0.8841 on SQLI and 0.7222 on XSS.
Listing 5. XSS-Labs sample 6.
Listing 6. XSS-Labs sample 8.

6. Future Work

VulEye still has limitations. Currently, it only handles SDG slicing within a single PHP file and cannot handle function calls across two or more files. In future work, we will address this problem and conduct more tests on real production projects.

7. Conclusions

In this paper, we proposed VulEye, a novel Graph Neural Network vulnerability detection approach for PHP scripts. VulEye can help security researchers find vulnerabilities in PHP projects quickly. VulEye first constructs the Program Dependence Graph (PDG) of the PHP source code and slices the PDG, around the sensitive functions contained in the code, into Sub-Dependence Graphs (SDGs), which are sub-graphs of the PDG. The SDGs are then fed to a Graph Neural Network model consisting of three stacked units, each with a GCN layer, a Top-k pooling layer, and an attention layer. Finally, an MLP with softmax serves as the classifier to predict whether an SDG is vulnerable. We evaluated VulEye on the SARD datasets and found that its best macro-average F1 score reached 99% in the binary classification task and 95% in the multi-class classification task. VulEye achieved the best results compared with existing open-source vulnerability detection implementations and other state-of-the-art deep learning models. Moreover, VulEye can locate the specific position of a vulnerability, since each SDG contains code slices closely related to the vulnerability through its triggering sensitive/sink function.

Author Contributions

Conceptualization, Y.X.; data curation, Z.L.; formal analysis, C.L.; funding acquisition, Y.X.; investigation, Y.X.; methodology, C.L.; software, C.L.; supervision, Y.F.; validation, Z.L.; visualization, C.L.; writing—original draft, C.L.; writing—review and editing, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the National Natural Science Foundation of China (U20B2045).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank the reviewers for their helpful comments. This research is funded by the National Natural Science Foundation of China (U20B2045).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. PHP Official Site. Available online: https://www.php.net/ (accessed on 4 September 2022).
  2. Usage of Server-Side Programming Languages for Websites. Available online: https://w3techs.com/technologies/overview/programming_language/all (accessed on 4 September 2022).
  3. GitHut: A Small Place to Discover Languages in GitHub. Available online: https://githut.info/ (accessed on 4 September 2022).
  4. Sun, H.; Cui, L.; Li, L.; Ding, Z.; Hao, Z.; Cui, J.; Liu, P. VDSimilar: Vulnerability detection based on code similarity of vulnerabilities and patches. Comput. Secur. 2021, 110, 102417. [Google Scholar] [CrossRef]
  5. Zhang, H.; Wang, S.; Li, H.; Chen, T.H.; Hassan, A.E. A Study of C/C++ Code Weaknesses on Stack Overflow. IEEE Trans. Softw. Eng. 2022, 48, 2359–2375. [Google Scholar] [CrossRef]
  6. Fang, Y.; Han, S.; Huang, C.; Wu, R. TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology. PLoS ONE 2019, 14, 1–19. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. The WordPress Backup Plugin. Available online: https://ithemes.com/backupbuddy/ (accessed on 17 September 2022).
  8. Hackers Exploit Zero-Day in WordPress BackupBuddy Plugin in 5 Million Attempts. Available online: https://thehackernews.com/2022/09/hackers-exploit-zero-day-in-wordpress.html (accessed on 17 September 2022).
  9. Guo, W.; Fang, Y.; Huang, C.; Ou, H.; Lin, C.; Guo, Y. HyVulDect: A hybrid semantic vulnerability mining system based on Graph Neural Network. Comput. Secur. 2022, 121, 102823. [Google Scholar] [CrossRef]
  10. Jovanovic, N.; Kruegel, C.; Kirda, E. Pixy: A static analysis tool for detecting Web application vulnerabilities. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P’06), Berkeley, CA, USA, 21–24 May 2006; pp. 6–263. [Google Scholar] [CrossRef]
  11. Dahse, J.; Schwenk, J. RIPS-A static source code analyser for vulnerabilities in PHP scripts. In Seminar Work (Seminer Çalismasi); Horst Görtz Institute Ruhr-University Bochum: Bochum, Germany, 2010. [Google Scholar]
  12. Dahse, J.; Holz, T. Simulation of Built-in PHP Features for Precise Static Code Analysis. In Proceedings of the NDSS, San Diego, CA, USA, 23–26 February 2014; Volume 14, pp. 23–26. [Google Scholar]
  13. Son, S.; Shmatikov, V. SAFERPHP: Finding semantic vulnerabilities in PHP applications. In Proceedings of the ACM SIGPLAN 6th Workshop on Programming Languages and Analysis for Security, San Jose, CA, USA, 5 June 2011; pp. 1–13. [Google Scholar]
  14. Nunes, P.J.C.; Fonseca, J.; Vieira, M. phpSAFE: A security analysis tool for OOP web application plugins. In Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Washington, DC, USA, 22–25 June 2015; pp. 299–306. [Google Scholar]
  15. Medeiros, I.; Neves, N.; Correia, M. Equipping wap with weapons to detect vulnerabilities: Practical experience report. In Proceedings of the 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France, 28 June–1 July 2016; pp. 630–637. [Google Scholar]
  16. Huang, J.; Li, Y.; Zhang, J.; Dai, R. UChecker: Automatically detecting php-based unrestricted file upload vulnerabilities. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA, 24–27 June 2019; pp. 581–592. [Google Scholar]
  17. Fidalgo, A.; Medeiros, I.; Antunes, P.; Neves, N. Towards a deep learning model for vulnerability detection on web application variants. In Proceedings of the 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Porto, Portugal, 24–28 October 2020; pp. 465–476. [Google Scholar]
  18. Guo, N.; Li, X.; Yin, H.; Gao, Y. Vulhunter: An automated vulnerability detection system based on deep learning and bytecode. In Proceedings of the International Conference on Information and Communications Security, Beijing, China, 15–17 December 2019; pp. 199–218. [Google Scholar]
  19. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  20. Backes, M.; Rieck, K.; Skoruppa, M.; Stock, B.; Yamaguchi, F. Efficient and flexible discovery of php application vulnerabilities. In Proceedings of the 2017 IEEE European Symposium on Security and Privacy (EuroS&P), Paris, France, 26–28 April 2017; pp. 334–349. [Google Scholar]
  21. Rabheru, R.; Hanif, H.; Maffeis, S. A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities. In Proceedings of the 2022 IEEE Conference on Dependable and Secure Computing (DSC), Edinburgh, UK, 22–24 June 2022; pp. 1–9. [Google Scholar] [CrossRef]
  22. Le, Q.; Mikolov, T. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on International Conference on Machine Learning (ICML’14), Bejing, China, 22–24 June 2014; Volume 32, pp. II–1188–II–1196. [Google Scholar]
  23. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  24. Weiser, M. Program slicing. IEEE Trans. Softw. Eng. 1984, 4, 352–357. [Google Scholar] [CrossRef]
  25. Stivalet, B. PHP Vulnerability Test Suite. Available online: https://samate.nist.gov/SARD/view.php?tsID=103 (accessed on 4 June 2022).
  26. Audi. SQLI Labs. Available online: https://github.com/Audi-1/sqli-labs (accessed on 6 October 2022).
  27. BLACKHAT-SSG. XSS Labs. Available online: https://github.com/BLACKHAT-SSG/XSS-Labs (accessed on 6 October 2022).
  28. OliverKlee. Pixy. Available online: https://github.com/oliverklee/pixy (accessed on 4 September 2022).
  29. RipsScanner. Ripsscanner/Rips: RIPS—A Static Source Code Analyzer for Vulnerabilities in PHP Scripts. Available online: https://github.com/ripsscanner/rips (accessed on 4 June 2022).
  30. Phpstan. Phpstan. Available online: https://phpstan.org/ (accessed on 4 June 2022).
  31. Medeiros, I.; Neves, N.; Correia, M. Detecting and removing web application vulnerabilities with static analysis and data mining. IEEE Trans. Reliab. 2015, 65, 54–69. [Google Scholar] [CrossRef]
  32. Medeiros, I.; Neves, N.; Correia, M. DEKANT: A static analysis tool that learns to detect web application vulnerabilities. In Proceedings of the 25th International Symposium on Software Testing and Analysis, Saarbrücken, Germany, 18–20 July 2016; pp. 1–11. [Google Scholar]
  33. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef]
  34. Kronjee, J.; Hommersom, A.; Vranken, H. Discovering software vulnerabilities using data-flow analysis and machine learning. In Proceedings of the 13th International Conference on Availability, Reliability and Security, Hamburg, Germany, 27–30 August 2018; pp. 1–10.
  35. Li, Z.; Zou, D.; Xu, S.; Ou, X.; Jin, H.; Wang, S.; Deng, Z.; Zhong, Y. Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv 2018, arXiv:1801.01681. [Google Scholar]
  36. Das-Lab. TAP. Available online: https://github.com/das-lab/TAP (accessed on 4 June 2022).
  37. Lin, G.; Wen, S.; Han, Q.L.; Zhang, J.; Xiang, Y. Software Vulnerability Detection Using Deep Neural Networks: A Survey. Proc. IEEE 2020, 108, 1825–1848. [Google Scholar] [CrossRef]
  38. MITRE Corporation. 2022 CWE Top 25 Most Dangerous Software Weaknesses. Available online: https://cwe.mitre.org/top25/archive/2022/2022_cwe_top25.html (accessed on 4 September 2022).
  39. Li, Y.; Wang, S.; Nguyen, T.N.; Van Nguyen, S. Improving bug detection via context-based code representation learning and attention-based neural networks. Proc. ACM Program. Lang. 2019, 3, 1–30. [Google Scholar] [CrossRef]
  40. Fan, W.; Ma, Y.; Li, Q.; He, Y.; Zhao, E.; Tang, J.; Yin, D. Graph Neural Networks for social recommendation. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 417–426. [Google Scholar]
  41. Shang, C.; Tang, Y.; Huang, J.; Bi, J.; He, X.; Zhou, B. End-to-end structure-aware convolutional networks for knowledge base completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3060–3067. [Google Scholar]
  42. Zheng, C.; Fan, X.; Wang, C.; Qi, J. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1234–1241. [Google Scholar]
  43. Khamsi, M.A.; Kirk, W.A. An Introduction to Metric Spaces and Fixed Point Theory; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
  44. Nikic. PHP-Parser. Available online: https://github.com/nikic/PHP-Parser (accessed on 4 June 2022).
  45. Fey, M. Pytorch Geometric Documentation. Available online: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html (accessed on 16 June 2022).
  46. Jorkro. jorkro/wirecaml: Weakness Identification Research Employing CFG Analysis and Machine Learning. Available online: https://github.com/jorkro/wirecaml (accessed on 4 June 2022).
  47. DesignSecurity. Progpilot. Available online: https://github.com/designsecurity/progpilot (accessed on 4 June 2022).
Figure 1. VulEye overview.
Figure 2. PDG extraction procedure of code listing 1.
Figure 3. PDG slice procedure of code listing 1.
Figure 4. Doc2vec.
Figure 5. VulEye deep learning model.
Figure 6. VulEye ROC curve.
Figure 7. VulEye training performance.
Figure 8. VulEye valid performance.
Figure 9. VulEye confusion matrix for multi-classes classification results.
Table 1. Common PHP Sensitive API functions.

| Vulnerability | Sensitive Functions |
| --- | --- |
| OSCI (CWE-78) | system, exec, passthru, shell_exec, popen, proc_open, pcntl_exec |
| XSS (CWE-79) | echo, print, printf, vprintf, trigger_error, user_error, die, exit, var_dump, odbc_result_all, ifx_htmltbl_result |
| SQLI (CWE-89) | mysql_query, odbc_exec, mysqli_query, mysql_db_query, pg_query, PDO::query, pg_send_query, mysqli::query, mysql_unbuffered_query, sqlsrv_query, SQLite3::query, pg_query_params, pg_send_query_params, SQLite3::exec |
Table 2. Common PHP Sanitize functions.

| Type | Functions |
| --- | --- |
| Sanitize Functions | addslashes, mysql_escape_string, strip_tags, mysql_real_escape_string, htmlspecialchars, htmlentities, escapeshellarg, escapeshellcmd, preg_quote, pg_escape_string, sqlite_escape_string, cubrid_real_escape_string, mssql_escape, ldap_escape, PDO->prepare |
Table 3. Original Dataset.

| CWE | Safe | Unsafe | Total |
| --- | --- | --- | --- |
| OS Command Injection (CWE-78) | 1872 | 624 | 2496 |
| Cross-site Scripting (CWE-79) | 5728 | 4352 | 10,080 |
| SQL Injection (CWE-89) | 8640 | 912 | 9552 |
| LDAP Injection (CWE-90) | 1728 | 2112 | 3840 |
| XML Injection (CWE-91) | 4784 | 1264 | 6048 |
| Eval Injection (CWE-95) | 1296 | 336 | 1632 |
| PHP Remote File Inclusion (CWE-98) | 2592 | 672 | 3264 |
| Open Redirect (CWE-601) | 2208 | 2592 | 4800 |
| Missing Authorization (CWE-862) | 400 | 80 | 480 |
| Totals | 29,248 | 12,944 | 42,192 |
Table 4. Preprocessed Dataset.

| CWE | Safe | Unsafe | Total | Total_N | Max_N | Avg_N | Min_N | Total_E | Max_E | Avg_E | Min_E |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CWE-78 | 3204 | 888 | 4092 | 30,060 | 19 | 7.3 | 2 | 47,292 | 33 | 11.6 | 1 |
| CWE-79 | 4358 | 1984 | 6342 | 43,474 | 18 | 6.9 | 2 | 65,940 | 31 | 10.4 | 1 |
| CWE-89 | 13,797 | 1320 | 15,117 | 172,596 | 23 | 11.4 | 2 | 272,106 | 40 | 18 | 1 |
| CWE-90 | 4464 | 2112 | 6576 | 51,840 | 19 | 7.9 | 2 | 83,136 | 33 | 12.6 | 1 |
| CWE-91 | 9833 | 2386 | 12,219 | 79,211 | 21 | 8.1 | 2 | 127,549 | 37 | 12.9 | 1 |
| CWE-95 | 870 | 144 | 1014 | 8034 | 19 | 7.9 | 2 | 12,744 | 33 | 12.6 | 1 |
| CWE-98 | 2028 | 1078 | 3106 | 14,388 | 18 | 7.1 | 2 | 22,128 | 31 | 10.9 | 1 |
| CWE-601 | 3806 | 4078 | 7884 | 51,012 | 18 | 6.5 | 2 | 77,316 | 31 | 9.8 | 1 |
| CWE-862 | 766 | 64 | 830 | 6436 | 21 | 7.8 | 3 | 10,343 | 37 | 12.5 | 2 |
| Total | 43,126 | 10,590 | 53,716 | 457,051 | / | / | / | 718,554 | / | / | / |
Table 5. Experimental Environment Configuration.

| Type | Config |
| --- | --- |
| CPU | 12th Gen Intel(R) Core(TM) i9-12900KF (24 cores) |
| RAM | 128 GB |
| GPU | NVIDIA RTX 3090 |
| OS | Ubuntu 22.04.1 LTS |
| Software | Python 3.9.12, Torch 1.12, CUDA 11.6, Torch Geometric 2.0.4 |
Table 6. Confusion matrix of binary classification.

| Real \ Predicted | Unsafe | Safe |
| --- | --- | --- |
| Unsafe | TP | FN |
| Safe | FP | TN |
Table 7. Binary classification report of VulEye.

| | Precision | Recall | F1 Score | Support |
| --- | --- | --- | --- | --- |
| Safe | 1.00 | 0.99 | 1.00 | 2111 |
| Unsafe | 0.97 | 1.00 | 0.99 | 565 |
| Accuracy | | | 0.99 | 2676 |
| Macro avg | 0.99 | 1.00 | 0.99 | 2676 |
| Weighted avg | 0.99 | 0.99 | 0.99 | 2676 |
Table 8. Multi-class classification report of VulEye.

| | Precision | Recall | F1 Score | Support |
| --- | --- | --- | --- | --- |
| Safe | 1.00 | 1.00 | 1.00 | 2111 |
| CWE-78 | 1.00 | 1.00 | 1.00 | 1 |
| CWE-79 | 1.00 | 1.00 | 1.00 | 193 |
| CWE-89 | 0.95 | 1.00 | 0.97 | 35 |
| CWE-90 | 0.98 | 0.97 | 0.97 | 100 |
| CWE-91 | 1.00 | 1.00 | 1.00 | 15 |
| CWE-95 | 0.67 | 0.50 | 0.57 | 4 |
| CWE-98 | 1.00 | 1.00 | 1.00 | 42 |
| CWE-601 | 1.00 | 1.00 | 1.00 | 172 |
| CWE-862 | 1.00 | 1.00 | 1.00 | 3 |
| Accuracy | | | 1.00 | 2676 |
| Macro avg | 0.96 | 0.95 | 0.95 | 2676 |
| Weighted avg | 1.00 | 1.00 | 1.00 | 2676 |
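The macro average in Table 8 is the unweighted mean of the per-class scores, which is why the rare CWE-95 class (support 4, F1 0.57) pulls it below the support-weighted average. Recomputing from the table's per-class F1 values:

```python
# Per-class F1 scores transcribed from Table 8.
f1_scores = {
    "Safe": 1.00, "CWE-78": 1.00, "CWE-79": 1.00, "CWE-89": 0.97,
    "CWE-90": 0.97, "CWE-91": 1.00, "CWE-95": 0.57, "CWE-98": 1.00,
    "CWE-601": 1.00, "CWE-862": 1.00,
}

# Macro average: plain mean over classes, ignoring support.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(round(macro_f1, 2))  # → 0.95, matching the macro-avg row
```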
Table 9. Results comparison of different representation methods.

| Representation | Class | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- | --- |
| VulEye-CFG | Safe | 0.8741 | 0.959 | 0.9146 | 0.8756 |
| | Unsafe | 0.8805 | 0.6865 | 0.7715 | |
| VulEye-PDG | Safe | 0.9628 | 0.9824 | 0.9725 | 0.9613 |
| | Unsafe | 0.9576 | 0.913 | 0.9347 | |
| VulEye-SDG | Safe | 0.9909 | 0.9892 | 0.99 | 0.9843 |
| | Unsafe | 0.9599 | 0.9659 | 0.9629 | |
Table 10. Tools' results comparison.

| Tool | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| VulEye | 0.9843 | 0.9599 | 0.9659 | 0.9629 |
| DeepTective | 0.9726 | 0.9331 | 0.9373 | 0.9352 |
| TAP | 0.9385 | 0.925 | 0.837 | 0.8788 |
| Wirecaml-XSS | 0.3864 | 0.2274 | 0.8851 | 0.3618 |
| Wirecaml-SQLI | 0.7171 | 0.1269 | 1 | 0.2252 |
| Progpilot | 0.413 | 0.2227 | 0.4839 | 0.305 |
| WAP | 0.657 | 0.2843 | 0.1902 | 0.2279 |
| RIPS | 0.6923 | 0.3333 | 0.1562 | 0.212 |
Table 11. Generalization capability test results.

| Test Suite | Contained | Detected | Accuracy |
| --- | --- | --- | --- |
| SQL-Labs | 69 | 61 | 0.8841 |
| XSS-Labs | 18 | 13 | 0.7222 |
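The accuracy figures in Table 11 are simply the fraction of contained vulnerabilities that VulEye detected:

```python
# (contained, detected) per test suite, from Table 11.
suites = {"SQL-Labs": (69, 61), "XSS-Labs": (18, 13)}

for name, (contained, detected) in suites.items():
    print(name, round(detected / contained, 4))
# SQL-Labs 0.8841, XSS-Labs 0.7222 — matching the Accuracy column
```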

Share and Cite

Lin, C.; Xu, Y.; Fang, Y.; Liu, Z. VulEye: A Novel Graph Neural Network Vulnerability Detection Approach for PHP Application. Appl. Sci. 2023, 13, 825. https://doi.org/10.3390/app13020825
