1 Introduction
-
The higher error rate of the obtained results.
-
Insufficient accuracy of the created SQLi detectors by the ML algorithms.
-
Insufficient precision of the created SQLi detectors in detecting the SQLi attacks.
-
The higher number of selected features is the other drawback of the metaheuristic algorithms used for feature selection in the SQLi detectors.
-
Lower convergence speed and consequently insufficient performance of the feature selectors used in the previously suggested SQLi detectors.
-
Enhancing the accuracy of the SQLi detection method.
-
Enhancing the precision of the SQLi detection method.
-
Enhancing the sensitivity of the SQLi detection Method.
-
Reducing the Error rate of the SQLi detection method.
-
Finding the minimum number of most effective features in the SQLi detection method.
-
Creating an effective numeric training dataset that includes 13 numeric features from the existing SQLi datasets.
-
Developing two different binary versions of the gray wolf optimization algorithm to optimally select the features of the dataset.
-
Creating an effective and efficient classification model to detect SQLi attacks.
-
Increasing the efficiency of the SQLi detection methods by selecting 20% of the most effective features.
2 Related works
2.1 Blacklist-based SQLi detection method
2.2 Static analysis method
2.3 Dynamic analysis method
2.4 A hybrid analysis method
2.5 Query profiling methods
2.6 Machine learning-based methods
3 Suggested method
3.1 Dataset creation
select * from customer where cu_id = 1 or 'a' = 'a' | 1 |
SELECT * FROM customer UNION SELECT * clear FROM result ORDER BY habit | 2 |
select customer1.cu_id, customer1.name, customer1.family, factor.fact_id, factor.fdate, factor.amount from (select * from customer where cu_id = @cust_id) as customer1 inner join factor on customer1.cu_id = factor.cust_id; | 1 |
select name, family, tell, sum1 from customer inner join (select cust_id, sum(amount) as sum1 from payment group by cust_id) as payment1 on customer.cu_id = payment1.cust_id where cu_id = @cu_id; | 1 |
Length | Nesting | Unionnum | Constantnum | Orconstant | Spacifical character | Type | Andnum | “Num | Num | Null num | () num | Attack |
---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
30 | 2 | 0 | 4 | 0 | 15 | 1 | 1 | 0 | 0 | 0 | 4 | NO |
Feature | Description |
---|---|
1 | Length of SQL Query |
2 | Nesting Level Query |
3 | Num. of union Operator in the SQL Query |
4 | Num. of Constant Value in the SQL Query |
5 | Num. of OR Operator in the SQL Query |
6 | Num. of Specific Character in the SQL Query |
7 | Type of SQL Query in the SQL Query |
8 | Num. of AND Operator in the SQL Query |
9 | Num. of double quotation Character (“) in the SQL Query |
10 | Num. of double quotation Character (‘) in the SQL Query |
11 | Num. of Null Value in the SQL Query |
12 | Num. Parenthesis in the SQL Query |
13 | Class (Attack or Not Attack) |
3.2 Feature selection by gray wolf optimization algorithm
3.3 Binary GWO
3.4 Adaption of binary gray wolf optimizer
3.5 Creating the classifier
4 Experiments platform and results
4.1 Experiment platform
HW/SW | Specification |
---|---|
Central processing unit (CPU) | Intel Core i7 |
Frequency | 3.4 GHz |
RAM | 8 GB |
Operating system (OS) | Microsoft co. Windows 10 |
Software platform | MATLAB 2020b |
4.2 Datasets
Dataset type | Total number of samples | Number of normal samples | Number of malicious samples |
---|---|---|---|
Healthy and malicious query dataset | 1027 | 473 | 554 |
Parameter name | Value |
---|---|
No. of agents (wolf) | 12 |
Dimension | No. of elected features |
Iteration | 100 |
\(\alpha\) | 0.9 |
β | 0.1 |
4.3 Evaluation criteria and research questions
-
Accuracy
-
Precision
-
Sensitivity
-
Error Rate
-
Number of selected features
-
RQ1: How effective is the use of the proposed first feature selector (bGWO1) on the accuracy precision and error rate of the SQLi detectors?
-
RQ2: How effective is the use of the proposed second feature selector (bGWO2) on the accuracy precision and error rate of the SQLi detectors?
-
RQ3: What is the effect of using the proposed feature selector in reducing the number of features of the training dataset?
-
RQ4: What is the stability of the proposed feature selectors as a stochastic-based method?
-
RQ5: What is the success rate of the proposed feature selectors in finding the most effective features in the SQLi dataset?
4.4 Results and discussion
4.4.1 SQLi detectors without feature selector
ANN accuracy | ANN error rate | ANN sensitivity | ANN precision |
---|---|---|---|
0.9708 | 0.0292 | 0.9856 | 0.9514 |
0.9773 | 0.0227 | 0.9867 | 0.9673 |
0.9805 | 0.0195 | 0.9893 | 0.9660 |
DT accuracy | DT error rate | DT sensitivity | DT precision |
---|---|---|---|
0.8939 | 0.1061 | 0.8213 | 0.9440 |
0.9123 | 0.0877 | 0.9346 | 0.8830 |
0.9329 | 0.0671 | 0.9294 | 0.9251 |
4.4.2 The effects of bGWO1 on the SQLi detectors (RQ1)
Number of features | Selected features | |||
---|---|---|---|---|
bGWO1 + ANN | bGWO1 + DT | bGWO1 + ANN | bGWO1 + DT | |
Run1 | 4 | 6 | ||
Run2 | 3 | 7 | ||
Run3 | 4 | 6 |
4.4.3 The effects of bGWO2 on the SQLi detectors (RQ2)
4.4.4 The effects of bGWO in reducing the number of features (RQ3)
Number of features | Selected features | |||
---|---|---|---|---|
bGWO2 + ANN | bGWO2 + DT | bGWO2 + ANN | bGWO2 + DT | |
Run1 | 2 | 3 | ||
Run2 | 3 | 3 | ||
Run3 | 2 | 3 |
4.4.5 Evaluating the stability of the proposed method (RQ4)
4.4.6 Evaluating the success rate of the proposed method (RQ5)
Methods | Success rate |
---|---|
DT | 0.8795 |
ANN | 0.9866 |
bGWO1 + DT | 0.9917 |
bGWO2 + DT | 0.9863 |
bGWO1 + ANN | 0.991 |
bGWO2 + ANN | 0.9802 |
Methods | Training by the dataset with all Features | Feature selection using GWO1 | Feature selection using GWO2 | Training by the dataset with selective features |
---|---|---|---|---|
DT | 0.1057 s | 9.8611 s | 11.6230 s | 0.0184 s |
ANN | 3.6185 s | 845.2082s | 930.5194 s | 1.0520 s |
bGWO1 + ANN | bGWO1 + DT | bGWO2 + DT | bGWO2 + ANN |
---|---|---|---|
0.970588 | 0.972775 | 0.96874 | 0.97606 |
-
Num. of Constant Value in the SQL Query
-
Num. Parenthesis in the SQL Query
-
Nesting Level Query
-
Length of SQL Query
-
Num. of Specific Character in the SQL Query
-
Num. of union Operator in the SQL Query
-
Num. of double quotation Character (‘) in the SQL Query