
Open Access 29.08.2024

A hybrid approach for efficient feature selection in anomaly intrusion detection for IoT networks

Authors: Aya G. Ayad, Nehal A. Sakr, Noha A. Hikal

Published in: The Journal of Supercomputing | Issue 19/2024


Abstract

The exponential growth of Internet of Things (IoT) devices underscores the need for robust security measures against cyber-attacks. Extensive research in the IoT security community has centered on effective traffic detection models, with a particular focus on anomaly intrusion detection systems (AIDS). This paper specifically addresses the preprocessing stage for IoT datasets and feature selection approaches to reduce the complexity of the data. The goal is to develop an efficient AIDS that strikes a balance between high accuracy and low detection time. To achieve this goal, we propose a hybrid feature selection approach that combines filter and wrapper methods. This approach is integrated into a two-level anomaly intrusion detection system. At level 1, our approach classifies network packets into normal or attack, with level 2 further classifying the attack to determine its specific category. One critical aspect we consider is the imbalance in these datasets, which is addressed using the Synthetic Minority Over-sampling Technique (SMOTE). To evaluate how the selected features affect the performance of the machine learning model across different algorithms, namely Decision Tree, Random Forest, Gaussian Naive Bayes, and k-Nearest Neighbor, we employ benchmark datasets: BoT-IoT, TON-IoT, and CIC-DDoS2019. Evaluation metrics encompass detection accuracy, precision, recall, and F1-score. Results indicate that the decision tree achieves high detection accuracy, ranging between 99.82 and 100%, with short detection times ranging between 0.02 and 0.15 s, outperforming existing AIDS architectures for IoT networks and establishing its superiority in achieving both accuracy and efficient detection times.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The Internet of Things (IoT) encompasses a wide range of objects integrated with sensors and actuators that collect, process, and share data with other objects, software, and platforms. This groundbreaking technology trend is spurring an unprecedented information revolution, making it one of the most disruptive technologies in recent history and capturing the attention of both society and academia [1]. IoT networks consist of everyday objects like smart converters, lamps, ovens, and refrigerators, as well as temperature sensors, IP cameras, smoke detectors, and even more advanced devices like RFID, heartbeat detectors, and parking sensors [2, 3]. However, building robust IoT networks presents numerous challenges, including limited resources, low energy efficiency, device heterogeneity, handling massive amounts of data, ensuring high-bandwidth data transport, scalability, and most importantly, the security of user data and privacy [4]. This paper aims to detect and mitigate potential threats, unauthorized access, and other anomalous activities within IoT networks, ultimately enhancing their security and protecting the integrity of data [5].
The Intrusion Detection System (IDS) is a prominent mechanism widely proposed to defend networks. An IDS is a software program that monitors a network or system to detect anomalous traffic or policy deviations [6]. IDSs can be classified according to different aspects, such as scope and detection approach. Regarding scope, there are Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS). A HIDS resides on each host in the network and uses that host's resources, whereas a NIDS resides on a server or network tap at the network layer to monitor device communications. According to the detection approach, there is the Signature-based Intrusion Detection System (SIDS), also called a Misuse Detection System, and the Anomaly-based Intrusion Detection System (AIDS) [7]. SIDS detects attacks by matching the attack pattern with previously stored patterns in a database. It is highly effective at identifying known attacks by matching patterns with pre-defined signatures stored in a database [8], which leads to fewer false positives. However, it is ineffective against new, unknown attacks or zero-day exploits, as it relies on existing signatures. In addition, the database of attack signatures must be regularly updated to include new attack patterns, requiring continuous maintenance, and attackers can slightly modify known attack patterns to evade detection by SIDS.
On the other hand, AIDS is based on a set of rule-based mechanisms rather than pattern recognition. AIDS aims to identify any deviation from normal system operation by monitoring system activity and categorizing it as either normal or abnormal. Detecting attack traffic requires AIDS to be trained to recognize abnormal activity. AIDS operates in two stages: training and testing [8]. During the training stage, the system learns from a dataset to discern the normal and abnormal patterns of network traffic based on distinct features. Subsequently, the testing stage solely evaluates the system's ability to classify current traffic based on what it learned during the training phase. Anomalies are typically identified through various techniques, with artificial intelligence techniques being the most common approach. In particular, machine learning techniques have the greatest potential for detecting unknown anomalous behavior [9].
These characteristics make AIDS effective against zero-day exploits that are not covered by SIDS. Additionally, AIDS can adjust to evolving network environments over time, continuously learning and improving its detection capabilities, and the use of machine learning and Artificial Intelligence (AI) techniques can enhance the accuracy and efficiency of anomaly detection. However, AIDS can generate a significant number of false positives, as any deviation from the norm is flagged as a potential threat, which can overwhelm administrators. Besides, AIDS requires a training phase to learn normal behavior patterns, which can be time-consuming and resource-intensive.
In this paper, we address several key challenges in enhancing the effectiveness of AIDS for IoT networks. One significant hurdle in developing efficient AIDS for IoT networks is the high consumption of IoT resources and the need to support real-time applications. Proper preprocessing and feature selection are crucial for reducing data complexity, facilitating faster training, and improving detection efficiency. These steps enhance the performance of machine learning models and accelerate the anomaly detection process. We propose a hybrid feature selection approach that combines filter and wrapper methods to identify the most relevant features. IoT datasets often suffer from class imbalance, where the number of instances in one class significantly outweighs that of the others. This imbalance can adversely affect the performance of machine learning models. To address this, we employ the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class, thereby balancing the dataset. Our proposed system is structured into two levels. At level 1, the system classifies network packets as normal or attack. At level 2, the system further classifies the detected attack to determine its specific category. This hierarchical approach improves the accuracy and specificity of identifying attack categories. We evaluate the performance of our proposed feature selection and anomaly detection approach using multiple machine learning algorithms, including Decision Tree, Random Forest, Gaussian Naive Bayes, and k-Nearest Neighbor. This comprehensive evaluation allows us to identify the most effective algorithm for our proposed system. We use three benchmark datasets, namely BoT-IoT, TON-IoT, and CIC-DDoS2019, to evaluate our system.
The main contributions of this paper are as follows:
  • Propose a lightweight model that results in a significant reduction in detection time.
  • Propose a hybrid feature selection method aimed at enhancing the efficiency of IDS by selecting the relevant features.
  • Propose a two-level real-time AIDS model for comprehensive attack detection.
  • Handle the imbalanced dataset problem effectively by applying SMOTE.
  • Assess the proposed model’s robustness using three benchmark datasets, ensuring its performance generalizes effectively across diverse scenarios.
The paper is structured as follows: Sect. 2 offers a synopsis of recent relevant works. Section 3 discusses our proposed model. Section 4 discusses the comprehensive tests and findings, while Sect. 5 outlines the discussion. Finally, Sect. 6 proposes a conclusion and future work.
2 Related work

In this section, we discuss related work in the field of attack detection and feature selection, providing an overview of existing research on developing AIDS models for IoT networks and analyzing their methodologies, strengths, and drawbacks. To conclude this section, a summary of these methods is presented in Table 1.
Habeeb and Babu [10] employed a two-step approach for feature selection. First, they calculated the correlation between features to identify potential redundancies. Second, they utilized a hybrid optimization algorithm, combining the Whale Optimization Algorithm (WOA) with a Genetic Algorithm (GA), resulting in a final set of 32 features. They trained their model using K-Nearest Neighbors (K-NN) on the BoT-IoT dataset, achieving an accuracy of 99.5%.
Sun et al. [11] addressed the limitations of intrusion detection in the Internet of Medical Things (IoMT) by proposing an IDS. Their approach leverages Particle Swarm Optimization (PSO) to select the most relevant features from the data and then utilizes the AdaBoost algorithm to classify potential attacks. The model was evaluated on the NSL-KDD dataset and achieved an accuracy of 98.5% with 12 selected features.
Dey et al. [12] introduced a hybrid feature selection approach that combines statistical test-based filter methods, such as Chi-Square, Pearson’s Correlation Coefficient (PCC), and Mutual Information (MI), with a metaheuristic technique called the Non-Dominated Sorting Genetic Algorithm (NSGA-II) for feature optimization. The effectiveness of the approach was assessed using the TON-IoT dataset and evaluated with a Support Vector Machine (SVM). By utilizing 13 features out of 43, the model achieved an accuracy of up to 99.48%. Mohy-eddine et al. [13] employed a combination of univariate statistical tests, principal component analysis (PCA), and genetic algorithms (GA) to select the most relevant features for their model, resulting in ten features. Their model was evaluated with K-NN on the BoT-IoT dataset, achieving an accuracy, precision, recall, and F1-score of 99.99%, with a detection time of 57.73 s.
Azar et al. [14] proposed four hybrid IDS for satellite-terrestrial systems using Random Forest (RF) and Sequential Forward Feature Selection (SFS). The systems include RF-SFS, RF-SFS-ANN (RF-SFS with Artificial Neural Network), RF-SFS-LSTM (RF-SFS with Long Short-Term Memory), and RF-SFS-GRU (RF-SFS with Gated Recurrent Unit). Evaluated on the STIN dataset, RF-SFS achieved 90.5% accuracy, and RF-SFS-GRU reached 87%. On the UNSW-NB15 dataset, RF-SFS obtained 78.52% accuracy, while RF-SFS-GRU achieved 79.00%.
Sharma et al. [15] also proposed a Deep Neural Network (DNN) as a detection classifier, trained on UNSW-NB15. They selected the most relevant features with PCC and applied Generative Adversarial Networks (GANs) to address the class imbalance problem within the dataset. The model achieved a 91.00% accuracy rate. Dina et al. [16] addressed the data imbalance problem using a focal loss function, which was used to train a Convolutional Neural Network (CNN) and a Feed-forward Neural Network (FNN) instead of cross-entropy. To assess the effectiveness of their model, they utilized the Bot-IoT, WUSTL-IIoT-2021, and WUSTL-EHMS-2020 datasets. The CNN achieved accuracies of up to 86.77%, 98.21%, and 93.08% on these datasets, respectively, while the FNN achieved 91.55%, 98.95%, and 93.26%, respectively.
Kareem et al. [17] introduced a feature selection method that improved upon the Gorilla Troops Optimizer (GTO) with the integration of the Bird Swarm Algorithm (BSA). They evaluated their approach by applying the K-NN classifier to four datasets: NSL-KDD, UNSW-NB15, CICIDS-2017, and BoT-IoT. The results indicated that the K-NN classifier achieved high detection accuracy and specificity in the CICIDS-2017 dataset, with values of 98.79% and 99.68%, respectively. The BoT-IoT dataset yielded a sensitivity of 99.28% and a detection time of 145.75 s. Sharma et al. [18] conducted a study where they computed the correlation between features and removed highly correlated ones. In their evaluation, they utilized a DNN classifier and employed the KDDCUP99 dataset. They achieved a detection rate of 98.25%.
Adeniyi et al. [19] implemented a deep feedforward neural network (DFFNN) to detect attacks and a deep autoencoder (DAE) to reduce dimensionality. They evaluated their model using NF-ToN-IoT, where it achieved 89.00% accuracy.
Hikal and Elgayar [20] proposed a lightweight model for botnet attack detection using RF, Decision Tree (DT), SVM, and Back-propagation Neural Networks (BPNN). Their study utilized a dataset collected from three types of IoT cameras connected via Wi-Fi. The authors selected the most relevant and important features using the PCC, Spearman Correlation Coefficient (SCC), and Jaccard Index. The proposed framework achieved detection times of 30–80 s with a detection accuracy of 99.70%.
Ullah and Mahmoud [21] proposed a binary and multi-classification model that was based on flow features extracted from the BoT-IoT dataset. The model leveraged a DT at the first level and an RF at the second level. DT achieved an accuracy of up to 99.99% at the first level, while RF achieved an accuracy of up to 99.68% at the second level.
In the realm of IoT systems, researchers have extensively investigated machine learning and deep learning methods to enhance intrusion detection capabilities. These efforts have led to notable advancements, including enhanced accuracy, reduced false alarms, and improved detection of IoT-related attacks. However, some current methods exhibit several limitations. One is high computational complexity: methods such as WOA combined with GA [10] involve high computational overhead, making them impractical for real-time applications. Several studies, including those by Sun et al. [11], Mohy-eddine et al. [22], and Ullah and Mahmoud [21], evaluate their models on a single dataset, limiting the generalizability of their results. Methods such as those proposed by Sharma et al. [15] and Dina et al. [16] have tackled class imbalance issues, but there is still room for improvement in the effectiveness of these solutions. Methods like those of Dey et al. [12] and Azar et al. [14] utilize advanced feature selection techniques but suffer from low accuracy in certain contexts.
To address these issues, this study introduces a unified IDS targeting real-time applications. It comprises a hybrid feature selection technique that blends speed, simplicity, and quality through a fusion of filter and wrapper techniques. The proposed method is evaluated using state-of-the-art machine learning algorithms and the latest datasets through multilevel detection and by addressing the imbalance problem, aiming to overcome the aforementioned limitations.
Table 1
An overview of the assessed anomaly-based intrusion detection systems for Internet of Things networks

Study | Classifier | Feature selection | Dataset | Accuracy (%) | Detection time (s) | Strengths | Limitations
Habeeb and Babu [10] | KNN | Correlation, WOA, GA | BoT-IoT | 99.50 | N/A | GA enhanced the model | Model is trained on one dataset; the high number of retained features increases computational complexity and slows training
Sun et al. [11] | AdaBoost | PSO | NSL-KDD | 98.50 | N/A | Low number of features; high accuracy | Only one dataset used; the dataset is general rather than IoMT-specific
Dey et al. [12] | SVM | Filter approach | TON-IoT | 99.48 | N/A | Hybrid of filter approaches (Chi-Square, Pearson’s Correlation Coefficient, Mutual Information) with Non-Dominated Sorting Genetic Algorithm | Only the accuracy metric reported; only one dataset used
Mohy-eddine et al. [13] | KNN | PCA, univariate statistical tests, GA | BoT-IoT | 99.99 | 57.73 | Primary objective is to improve accuracy and detection rate; holistic approach to selecting features | Only one dataset used; long detection time
Azar et al. [14] | RF, GRU, ANN, LSTM | SFS | STIN, UNSW-NB15 | STIN: (90.5, 87.00, 71.47, 86.00); UNSW-NB15: (78.52, 79.00, 78.23, 78.00) | N/A | Evaluation on two distinct dataset domains; four hybrid IDS for satellite-terrestrial communication systems | Low accuracy results
Sharma et al. [15] | DNN | PCC | UNSW-NB15 | 91.00 | N/A | Imbalance problem solved using GAN | Only one dataset used; features selected through statistics only
Dina et al. [16] | CNN, FNN | N/A | Bot-IoT, WUSTL-IIoT-2021, WUSTL-EHMS-2020 | CNN: (86.77, 98.21, 93.08); FNN: (91.55, 98.95, 93.26) | N/A | Focal loss used to solve the imbalance problem; distinct domains of used datasets | F1-score was low on WUSTL-IIoT-2021
Kareem et al. [17] | K-NN | GTO, BSA | NSL-KDD, CICIDS-2017, UNSW-NB15, BoT-IoT | CICIDS-2017: 98.79; BoT-IoT: 99.28 | BoT-IoT: 145.5 | Works across multiple datasets | Long detection time
Sharma et al. [18] | DNN | Correlation | KDDCUP99 | 98.25 | N/A | L2 regularization technique applied | Only one dataset used
Adeniyi et al. [19] | DFFNN | DAE | NF-ToN-IoT | 89.00 | N/A | Tested on a recently introduced IoT/IIoT dataset | Low accuracy results
Hikal and Elgayar [20] | DT, SVM, RF, BPNN | PCC, SCC, Jaccard index | Generated | 99.70 | 30–80 | Attention paid to detection time | High detection time
Ullah and Mahmoud [21] | DT (binary level), RF (multi-label level) | Flow features | BoT-IoT | Binary level: 99.99; Multi-class level: 99.68 | N/A | Two consecutive detection levels | Only one dataset used

3 The proposed model

This study aims to create an efficient Anomaly Intrusion Detection System (AIDS) specifically designed for IoT networks. The system targets high detection accuracy and efficiency while being able to run in real time, by integrating advanced feature selection techniques with robust classification algorithms. This necessitates the application of suitable preprocessing and feature selection approaches to address challenges specific to IoT networks, as mentioned in recent literature. Figure 1 illustrates the main stages of the proposed approach: (1) IoT dataset, (2) data preprocessing, (3) hybrid feature selection, and (4) multi-level detection. A comprehensive discussion of each of these stages is elaborated in the subsequent subsections.

3.1 Data pre-processing

In general, data preprocessing, or data engineering, is usually the first step in any experiment. Feeding raw data to the classification model without first resolving its issues might result in misleading predictions [23]. The data preprocessing stage involves data cleaning, categorical feature encoding, dataset balancing, and feature normalization. These steps are summarized in Algorithm 1.

3.1.1 Data cleaning

Data cleaning is a critical preprocessing step that ensures the integrity and reliability of the dataset used for building an IDS in IoT environments. This process involves several technical procedures aimed at refining the raw data to enhance the performance and accuracy of the IDS. The primary steps include:
  • Duplicate row removal: Each entry in the dataset must be unique to avoid redundant information, which can bias the model. This can be achieved using comparison-based methods to detect and remove identical rows across the dataset.
  • Handling missing values: Preventing skewed analysis and model predictions involves addressing incomplete data entries. Strategies such as imputation, where missing values are filled with the mean, median, or mode of the respective feature, are employed. In cases where missing values are significant, rows with missing values beyond a certain threshold can be excluded.
  • Outlier detection and exclusion: Identifying and managing data points that deviate significantly from the rest of the data is crucial, as these could indicate noise or rare events. Outliers may be excluded or flagged for further investigation, as in the sketch following this list.
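To make these steps concrete, the following is a minimal Pandas sketch of the cleaning stage; the missing-value threshold, the IQR multiplier, and the function name are illustrative assumptions rather than the exact procedure used in the paper.

```python
# Minimal sketch of the cleaning steps above; thresholds are illustrative assumptions.
import pandas as pd


def clean(df: pd.DataFrame, missing_threshold: float = 0.5) -> pd.DataFrame:
    # Duplicate row removal: keep only the first occurrence of identical rows.
    df = df.drop_duplicates()

    # Handling missing values: drop rows missing more than `missing_threshold`
    # of their columns, then impute remaining numeric gaps with the column median.
    min_non_null = int((1 - missing_threshold) * df.shape[1])
    df = df.dropna(thresh=min_non_null)
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Outlier detection: flag rows whose numeric values fall far outside the
    # interquartile range and exclude them.
    q1 = df[numeric_cols].quantile(0.25)
    q3 = df[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    is_outlier = ((df[numeric_cols] < q1 - 3 * iqr) |
                  (df[numeric_cols] > q3 + 3 * iqr)).any(axis=1)
    return df.loc[~is_outlier]
```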

3.1.2 Feature encoding

To effectively apply feature selection and classification models, it is essential to convert categorical features into numerical data. This conversion process, known as feature encoding, ensures that categorical features are represented by corresponding numerical values. To accomplish this, the Label Encoder is utilized, which assigns a unique integer to each categorical value based on alphabetical order. By converting categorical features into numerical values, we enable the feature selection and classification models to process the data efficiently [24].
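A minimal sketch of this step with scikit-learn's LabelEncoder is shown below; the helper name and the column list are illustrative assumptions.

```python
# Minimal sketch of label encoding with scikit-learn; column names are illustrative.
from sklearn.preprocessing import LabelEncoder


def encode_categorical(df, categorical_cols):
    encoders = {}
    for col in categorical_cols:
        le = LabelEncoder()
        # LabelEncoder maps the sorted unique values of the column to 0..k-1.
        df[col] = le.fit_transform(df[col].astype(str))
        encoders[col] = le  # keep fitted encoders in case decoding is needed later
    return df, encoders
```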

3.1.3 Balancing the dataset

Class imbalance, which is common in IDS datasets, can affect prediction quality [25]. In this situation, it is challenging to maintain good generalization for the minority classes. To balance these datasets, there are two main techniques: oversampling and undersampling. Oversampling techniques increase the number of samples in the minority class, while undersampling techniques decrease the number of samples in the majority class until they equal the samples in the other classes. Because the used datasets suffer from a large difference between the numbers of samples in the classes, it was preferable to use oversampling to increase the number of samples in the minority class rather than decreasing the number of samples in the majority class. To mitigate these issues, we applied the Synthetic Minority Over-sampling Technique (SMOTE) [26], as introduced in Algorithm 2 and proposed by Chawla et al. [27]. SMOTE generates new samples rather than replicating existing ones, which helps in maintaining a larger and more balanced dataset.
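A minimal sketch of this balancing step with the imbalanced-learn implementation of SMOTE is shown below; the toy dataset, random seed, and class weights are illustrative assumptions standing in for the preprocessed IoT traffic.

```python
# Minimal sketch of oversampling the minority class with SMOTE.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Toy imbalanced dataset: 95% majority, 5% minority.
X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)

smote = SMOTE(random_state=42)                 # k_neighbors defaults to 5
X_balanced, y_balanced = smote.fit_resample(X, y)
print(Counter(y), "->", Counter(y_balanced))   # class counts before and after
```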

3.1.4 Feature normalization

Some datasets’ features have significantly varying magnitudes, ranges, and units. Because the datasets are so diverse, feature scaling is frequently employed to standardize the range of independent variables and to ensure they contribute equally to the model. A Min–Max normalization is used to normalize the data in the range [0, 1]. Equation (1) shows the mathematical equation needed to calculate the Min–Max normalization [28].
$$\begin{aligned} \bar{X}=\frac{x-x_{\min }}{x_{\max }-x_{\min }} \end{aligned}$$
(1)
where \(\bar{X}\) is the normalized value, x is an original feature value, and \(x_{\max }\) and \(x_{\min }\) are the maximum and minimum values of this feature.
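Equation (1) corresponds to scikit-learn's MinMaxScaler; a minimal sketch follows, assuming training and test splits `X_train` and `X_test` from the preprocessed data, with fitting on the training split only to avoid information leakage.

```python
# Minimal sketch of Eq. (1) using MinMaxScaler.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics
```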

3.2 Feature selection

The dataset comprises numerous features, some relevant and others irrelevant. The objective at this stage is to select a subset of features to reduce the complexity of the dataset. Feature selection techniques fall into three categories: filter, wrapper, and embedded. Filters, being statistical methods, evaluate feature relevance based on characteristics like correlation or information gain. Features are selected depending on their correlation with the target variable. Correlation analysis is employed to determine whether features exhibit positive or negative correlations with the target variable. While computationally efficient, they may overlook relevant feature combinations due to a lack of consideration for feature interaction [29].
Conversely, wrapper methods assess feature subset quality by evaluating specific classification model performance, aiming to maximize classification accuracy while considering feature interaction [30]. However, they can be computationally expensive and prone to overfitting [31]. Embedded techniques blend the benefits of filter and wrapper methods, incorporating feature interactions while maintaining low computational costs, resulting in superior performance compared to other techniques [32].
This paper proposes a hybrid approach that combines filter and wrapper methods for feature selection. Previous research has predominantly focused on either filters or wrappers individually. By merging these approaches, we aim to enhance feature selection efficiency, simplicity, and effectiveness.
Initially, a correlation analysis is conducted to assess the relationships between features. The Pearson Correlation Coefficient (PCC) [33] and Spearman Correlation Coefficient (SCC) [34] are suitable for numerical datasets. PCC measures linear relationships, while SCC assesses monotonic, possibly nonlinear, ones. The correlation analysis helps identify highly correlated features, indicating potential redundancy. In such cases, only one of the highly correlated features needs to be included in the model. Correlation between features is calculated using the PCC and SCC, Eqs. (2) and (3), respectively.
$$\begin{aligned} C_{f_{x} f_{y}} = \frac{\sum _{i=1}^{n} (f_{x_{i}}-\bar{f_{x}}) (f_{y_{i}}-\bar{f_{y}})}{\sqrt{\sum _{i=1}^{n}(f_{x_{i}}-\bar{f_{x}})^2}\sqrt{\sum _{i=1}^{n} (f_{y_{i}}-\bar{f_{y}})^2}} \end{aligned}$$
(2)
$$\begin{aligned} R_s = 1-\frac{6\sum _{i=1}^{n}(f_{x_{i}}-f_{y_{i}})^2}{n(n^2-1)} \end{aligned}$$
(3)
where \(f_{x}\) and \(f_{y}\) are two feature vectors, \(f_{x_{i}}\) and \(f_{y_{i}}\) are their i-th values, n is the number of samples, and \(\bar{f_{x}}\) and \(\bar{f_{y}}\) are the mean values of the features.
Next, the Recursive Feature Elimination (RFE) method is applied as described in Algorithm 3. This method leverages a classifier to predict the importance of each feature. By iteratively eliminating less important features, RFE selects the top-ranked essential features based on their accuracy [35]. To ensure robustness, RFE is trained using two different classifiers: Decision Tree (DT) and Random Forest (RF). This approach ensures that the selected features are consistently important across multiple classification models.
To illustrate the process, let’s consider an example where we have features labeled as f1, f2, f3, f4, and so on. After applying the correlation analysis, let’s assume that f1 and f2 are highly correlated, as well as f4 and f5. In this case, we can choose either f1 or f2 and either f4 or f5, but not both from each pair. This selection serves to reduce redundancy and simplify the feature set.
Next, we take the shared features or intersection between the output obtained from the PCC and SCC analysis, denoted as S1. The intersection between two filter feature selection techniques offers several advantages in identifying shared and strong features. By combining the strengths of multiple techniques, this approach increases the likelihood of capturing feature sets that are rich in information and relevant. This can lead to improved model accuracy, generalization capability, and robustness. Moreover, this hybrid approach allows for a more comprehensive analysis of the input feature space, reducing the risk of missing important variables and structures within the data. Harnessing the power of shared strong features from multiple filter methods, this technique enhances the overall effectiveness of feature selection.
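A minimal sketch of this filter stage is given below: each correlation method drops one feature from every highly correlated pair, and S1 is the intersection of the two surviving sets. The 0.9 threshold and the helper name are illustrative assumptions, and `X` denotes the preprocessed feature DataFrame with the label column excluded.

```python
# Minimal sketch of the filter stage: PCC and SCC redundancy removal and their intersection (S1).
import numpy as np
import pandas as pd


def correlation_filter(X: pd.DataFrame, method: str, threshold: float = 0.9) -> set:
    corr = X.corr(method=method).abs()          # method is "pearson" or "spearman"
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop one feature from every pair whose absolute correlation exceeds the threshold.
    to_drop = {col for col in upper.columns if (upper[col] > threshold).any()}
    return set(X.columns) - to_drop


s1 = correlation_filter(X, "pearson") & correlation_filter(X, "spearman")
```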
We then train the RFE method using both DT and RF classifiers. Based on the accuracy of these classifiers, the RFE method predicts the importance of each feature. We select the top-ranked essential features, as determined by RFE, for further analysis. Following this, we find the shared features between the RFE-DT and RFE-RF sets, denoted as S2. After using filter and wrapper feature selection methods to obtain different subsets, we combine them to form the set S by taking their distinct features through the union operation. This allows us to gather unique features from each subset. The feature selection approach steps are described in Algorithm 4.
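The wrapper stage and the final combination can be sketched as follows, continuing from the `s1` set above; the number of features to keep and the estimator settings are illustrative assumptions, and `X`, `y` are the feature DataFrame and labels.

```python
# Minimal sketch of the wrapper stage (RFE with DT and RF) and the final union S = S1 ∪ S2.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier


def rfe_features(estimator, X, y, n_features=15):
    selector = RFE(estimator=estimator, n_features_to_select=n_features)
    selector.fit(X, y)
    return set(X.columns[selector.support_])    # features ranked as essential


s2 = (rfe_features(DecisionTreeClassifier(criterion="entropy"), X, y)
      & rfe_features(RandomForestClassifier(n_estimators=10), X, y))

selected = s1 | s2                              # union of the filter and wrapper subsets
X_selected = X[sorted(selected)]
```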

3.3 Multi-level detection

While anomaly detection is effective at identifying deviations from normal patterns, it does not provide detailed information about the type of anomaly. Knowing the specific type of attack is critical for implementing appropriate countermeasures. Our approach ensures not only that anomalies are detected but also that they are accurately classified into specific categories, enhancing the overall security posture of the network.
The flowchart of the detection process is shown in Fig. 2. Our detection process operates on two levels. First level (anomaly detection): this level ensures the accurate identification of normal packets within the network, distinguishing them from attack packets. It serves as an initial filter to separate anomalies from normal behavior. Second level (anomaly category detection, or multi-class classification): this level focuses on the packets flagged as attacks by the first level. Here, we classify these attacks into specific types. This step is crucial because identifying the exact nature of the attack allows for more precise and effective responses. Additionally, if any normal packet is initially misclassified as an attack, resulting in a False Positive (FP), the second level of detection reassesses and correctly classifies it as normal. To implement this process, we forward the selected features to a variety of state-of-the-art machine learning algorithms: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (K-NN), and Gaussian Naive Bayes (GNB).
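The two-level flow can be sketched as follows; `clf_bin` and `clf_multi` stand for any two fitted classifiers from the list above, the input `X` is assumed to be a NumPy array, and the label conventions (0 = normal, 1 = attack) are assumptions for illustration.

```python
# Minimal sketch of the two-level detection flow.
import numpy as np


def two_level_predict(clf_bin, clf_multi, X):
    # Level 1: binary anomaly detection (0 = normal, 1 = attack).
    level1 = clf_bin.predict(X)
    labels = np.full(len(X), "Normal", dtype=object)

    # Level 2: only packets flagged as attacks are classified into categories;
    # a first-level false positive can still be re-labelled "Normal" here.
    attack_idx = np.where(level1 == 1)[0]
    if attack_idx.size > 0:
        labels[attack_idx] = clf_multi.predict(X[attack_idx])
    return labels
```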

4 Experimental evaluation and results

This section describes the implementation and evaluation of the proposed model. The upcoming subsections cover the datasets used, the methodological implementation, the experimental settings, and the performance evaluation. This is followed by a discussion of the experimental findings and a comparative study. Finally, the time complexity of the proposed feature selection model is discussed.

4.1 Dataset specification

In order to improve the capability of the supervised models, labeled network traffic datasets are used. This is accomplished by providing the essential information to efficiently train AIDS for high accuracy and reliability in detecting as many network attacks as possible. Due to the diversity and differences in IoT devices and their daily increase, vast amounts of unstructured data are produced. To build a model that works efficiently in real time, we used datasets produced from realistic environments. These datasets contain a variety of attacks, ensuring that our model can accurately detect and classify different types of intrusions. By utilizing these datasets, we can train our model to effectively identify patterns and characteristics of network traffic data in real-time scenarios. Several new flow-based benchmark datasets, such as BoT-IoT [36], TON-IoT [37], and CIC-DDoS2019 [38], have lately become available. We use these datasets as recent and realistic network traffic data for efficient anomaly-based network intrusion detection. Table 2 describes some of the main characteristics of these datasets.
Table 2
Description of BoT-IoT, TON-IoT, and CIC-DDoS2019 datasets

Dataset | No. of features | Size (Normal / Attack) | Attack labels
Bot-IoT [36] | 46 | 477 / 3,668,045 | DoSHTTP, DoSTCP, DoSUDP, DDoSHTTP, DDoSTCP, DDoSUDP, OS_Fingerprint, Service scan, Data Exfiltration, Theft Keylogging
TON-IoT [37] | 45 | 500,000 / 2,500,000 | DoS, DDoS, MITM, Scanning, XSS
CIC-DDoS2019 [38] | 79 | 94,568 / 331,354 | NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, UDP, UDP-Lag, WebDDoS, SYN, TFTP, Portmap

4.1.1 BoT-IoT dataset

BoT-IoT dataset [36] was established by building a realistic network environment in the Cyber Range Lab at UNSW Canberra. This included data from various smart home devices such as refrigerators, garage doors, thermostats, lights, and weather monitoring systems. Out of the total dataset of 72 million records, 5% (3.6 million records) were used in the experiments as described in Table 2.

4.1.2 TON-IoT dataset

The TON-IoT dataset [37] was created from a realistic and large-scale network constructed at UNSW Canberra's Cyber IoT Lab. It consists of telemetry data from IoT services, operating system logs, and network traffic from IoT networks. For our experiments, the network traffic data were used. The TON-IoT dataset contains 22,339,021 records, and 7% (3 million records) of the dataset were used in the experiments. Some attacks that did not affect the IoT network were excluded from the analysis, as described in Table 2.

4.1.3 CIC-DDoS2019 dataset

The CIC-DDoS2019 dataset [38], provided by the Canadian Institute for Cybersecurity, offers a comprehensive collection of different DDoS attacks. It was collected on two separate days for training and testing purposes. The training set, captured on January 12th, 2019, includes 12 different types of DDoS attacks, including Network Time Protocol (NTP), Lightweight Directory Access Protocol (LDAP), Domain Name System (DNS), Microsoft SQL Server (MSSQL), Simple Network Management Protocol (SNMP), Network Basic Input Output System (NetBIOS), User Datagram Protocol (UDP), SYN, UDP-Lag, WebDDoS, and TFTP. The test set, on the other hand, consists of seven attacks, namely NetBIOS, MSSQL, PortScan, LDAP, UDP-Lag, UDP, and SYN.

4.2 Methodological implementation

The practical implementation of each stage is depicted in Fig. 3.
Initially, the datasets BoT-IoT and TON-IoT are obtained in CSV format from their respective websites. These datasets consist of multiple files, which are merged into a single CSV file. Subsequently, during the data cleaning stage, it is observed that the BoT-IoT dataset is well structured, with no duplicate rows or null values. However, from its visualization, we noticed that it contains a notable outlier: the Data_Exfiltration attack, which is underrepresented compared to other attack types. To mitigate its disproportionate impact during preprocessing, these instances are removed to ensure efficiency in time and resource utilization. Conversely, both the TON-IoT and CIC-DDoS2019 datasets exhibit duplicate rows, which are consequently eliminated; however, they do not contain null values or significant outliers. Categorical features, such as "protocol type", are encoded using label encoding (e.g., mapping ["TCP", "UDP", "ICMP"] to [0, 1, 2]).
Upon analyzing the data, an imbalance issue is identified, and to address this imbalance, SMOTE is employed. It is important to ensure that there is no redundancy in the new samples and that these samples are reasonable representations of the original dataset, following the same distribution. Figure 4 consists of histograms comparing the original data distribution to the data distribution after applying SMOTE. The original histograms, represented by the three upper histograms, display significant skewness and imbalance, particularly toward zero, indicating an imbalanced dataset. In contrast, the post-SMOTE histograms exhibit a more uniform distribution of feature values compared to the original skewed distribution. This indicates that the synthetic samples are effectively filling in the gaps and balancing the dataset, resulting in a varied and representative dataset. The absence of sharp peaks in the post-SMOTE histograms, which were present in the original data, further supports that SMOTE creates new data points instead of duplicating existing ones.
Following data balancing, features undergo min–max normalization to standardize their values within a uniform range of 0 to 1. A hybrid feature selection model is subsequently applied to select features. Table 3 details the selected features post-selection, revealing significant reductions in feature counts across datasets (BoT-IoT: 46 to 12, TON-IoT: 45 to 15, CIC-DDoS2019: 79 to 35).
Table 3
The selected features from the datasets

Dataset | Selected features
BoT-IoT | {Proto_number, State_number, Bytes, AR_P_Proto_P_SrcIP, Seq, Flgs_number, Saddr, Dport, Dur, Sport, Daddr, N_IN_Conn_P_DstIP}
TON-IoT | {ts, Src_ip, Src_port, Dst_ip, Dst_port, proto, duration, Src_bytes, Dst_bytes, Conn_State, Missed_bytes, Src_pkts, Src_ip_bytes, Dst_pkts, Dst_ip_bytes}
CIC-DDoS2019 | {URG Flag Count, SYN Flag Count, PSH Flag Count, Fwd URG Flags, Fwd Packet Length Max, Bwd IAT Min, Bwd Packet Length Max, Total Backward Packets, Bwd Packets/s, Bwd Avg Bulk Rate, Fwd Avg Bytes/Bulk, Bwd URG Flags, Bwd Avg Bytes/Bulk, ECE Flag Count, Flow Duration, Flow IAT Min, Fwd Packets Length Total, Bwd Header Length, Protocol, Fwd Packet Length Std, Fwd Header Length, FIN Flag Count, Fwd PSH Flags, Init Bwd Win Bytes, Fwd Avg Bulk Rate, Flow Bytes/s, Active Mean, Init Fwd Win Bytes, Down/Up Ratio, Bwd Avg Packets/Bulk, Total Fwd Packets, Bwd Packet Length Min, Bwd PSH Flags, Active Std, Fwd Avg Packets/Bulk}
The processed features are then utilized in a multilevel detection approach. Input features and targets are defined for each level, and parameters of machine learning classifiers are selected, as detailed in Table 4.
Table 4
The training parameters for the used datasets and ML classifiers (values apply to both binary and multi-classification)

ML classifier | BoT-IoT | TON-IoT | CIC-DDoS2019
DT | criterion="entropy" | criterion="entropy" | criterion="entropy"
RF | n_estimators=10, criterion="gini" | n_estimators=5, criterion="gini" | n_estimators=5, criterion="entropy"
K-NN | n_neighbors=5 | n_neighbors=7 | n_neighbors=3
GNB | No tuning needed | No tuning needed | No tuning needed
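For illustration, the Table 4 settings for the BoT-IoT experiments translate directly into scikit-learn constructor arguments, as in the sketch below; only these parameters change for the other two datasets.

```python
# Minimal sketch of instantiating the classifiers with the BoT-IoT settings from Table 4.
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "DT": DecisionTreeClassifier(criterion="entropy"),
    "RF": RandomForestClassifier(n_estimators=10, criterion="gini"),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "GNB": GaussianNB(),   # no tuning needed
}
```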

4.3 Experimental settings

The experiments were conducted on an HP notebook with Windows 10 Pro Enterprise 64-bit, an Intel(R) Core(TM) i7-5500 CPU with two cores and four logical processors, 16 GB of RAM, and 14.6 GB of virtual memory. The experiments used the PyCharm editor version 2022.2 and the Python programming language version 3.10. Data preprocessing utilized the Pandas and NumPy frameworks, while machine learning algorithms were implemented using the Scikit-Learn software version 1.1.1.

4.4 Performance evaluation metrics

For evaluation, the K-fold cross-validation method with random training/testing splits was applied. The K-fold cross-validation approach is commonly used to validate models and prevent overfitting. It involves shuffling and dividing the dataset into k folds, where one fold is used as the test set while the others are used for training [39]. By averaging the evaluation results of each fold, an overall accuracy value can be obtained [40]. To implement a stratified k-fold, we ensure that each fold preserves the proportion of samples for each class [41]. This helps maintain the balance of class representation throughout the training and testing process.
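A minimal sketch of this evaluation protocol with scikit-learn is shown below; the number of folds and the classifier are illustrative choices, and `X_selected`, `y` denote the selected features and labels.

```python
# Minimal sketch of stratified k-fold cross-validation.
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(criterion="entropy"),
                         X_selected, y, cv=skf, scoring="accuracy")
print(scores.mean())   # accuracy averaged over the folds
```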
In terms of measuring the impact of the proposed model on network resources in IoT networks, specific details about device characteristics, such as battery, energy, and bandwidth, are not available in the datasets used. One measurable resource is the detection time, which is valuable information: a shorter detection time implies a reduction in the consumption of other resources, which is a positive outcome. Regarding the flexibility of the model, it can handle large datasets and the different types of IoT devices reflected in the descriptions of the used datasets. Having a model that can adapt to various input data is crucial when dealing with the diverse range of devices in IoT networks.
To assess the performance of the classifiers, we utilized the standard performance evaluation metrics of Accuracy [42, 43], Precision, Recall, and F1-score [42], as detailed below and specified in Eqs. (4)-(7). In our evaluation, True Positive (TP) is the number of attack instances correctly classified as attacks, True Negative (TN) is the number of normal instances correctly classified as normal, False Positive (FP) is the number of normal instances incorrectly classified as attacks, and False Negative (FN) is the number of attack instances incorrectly classified as normal [42]. In addition, the detection time (D_Time) is calculated using Eq. (8). It represents the time taken by the AIDS to classify a test sample as either normal or an intrusion, including its specific class type.
$$\begin{aligned} {\text{Accuracy}}= \frac{({\text{TP}}+{\text{TN}})}{({\text{TP}}+{\text{TN}}+{\text{FP}}+{\text{FN}})} \end{aligned}$$
(4)
$$\begin{aligned} {\text{Precision}}= \frac{{\text{TP}}}{({\text{TP}}+{\text{FP}})} \end{aligned}$$
(5)
$$\begin{aligned} {\text{Recall}} = \frac{{\text{TP}}}{({\text{TP}}+{\text{FN}})} \end{aligned}$$
(6)
$$\begin{aligned} F1{\text{-score}}= 2\times \frac{{\text{Precision}}\times {\text{Recall}}}{{\text{Precision}}+{\text{Recall}}} \end{aligned}$$
(7)
$$\begin{aligned} {\text{D\_Time}}= {\text{end}}-{\text{start}} \end{aligned}$$
(8)
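The metrics in Eqs. (4)-(8) map directly onto scikit-learn helpers, as in the sketch below; it assumes a fitted classifier `clf` and a held-out split `X_test`, `y_test`, and the weighted averaging for the multi-class level is an illustrative choice.

```python
# Minimal sketch of computing Eqs. (4)-(8); timing brackets only the prediction call.
import time

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

start = time.time()
y_pred = clf.predict(X_test)
d_time = time.time() - start                                      # Eq. (8)

accuracy = accuracy_score(y_test, y_pred)                         # Eq. (4)
precision = precision_score(y_test, y_pred, average="weighted")   # Eq. (5)
recall = recall_score(y_test, y_pred, average="weighted")         # Eq. (6)
f1 = f1_score(y_test, y_pred, average="weighted")                 # Eq. (7)
```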

4.5 Experimental evaluation

To guarantee the effectiveness of our model, we conducted three experiments using the BoT-IoT, TON-IoT, and CIC-DDoS2019 datasets. These experiments, labeled Experiment I, Experiment II, and Experiment III, were each split into two parts. First, we trained the AIDS model using all features and imbalanced data. Second, we trained the AIDS model using the proposed feature selection model combined with SMOTE.

4.5.1 Experiment I: using the BoT-IoT dataset

In order to assess how effective the proposed FS model is, we compare the experimental results obtained by using the entire feature set to those obtained using the proposed feature selection approach during the two-level classification.
Case 1: AIDS based on all features and imbalanced data
Table 5 presents the performance of different machine learning models using all features of the unbalanced BoT-IoT dataset. In the Level-1 detection, all classifiers achieved high accuracy, precision, recall, and F1-score values. The DT and RF achieved almost perfect scores in all metrics, while the GNB obtained slightly lower scores. The K-NN had excellent scores but took a significantly longer detection time compared to the other models.
Moving on to Level 2 of detection, the accuracy, precision, recall, and F1-score values remained high for all models, demonstrating their effectiveness. However, there was a slight decrease in these metrics compared to level 1. Notably, GNB experienced a drop in accuracy, precision, recall, and F1-score, while DT, RF, and K-NN maintained consistently high scores.
Regarding detection time, DT, RF, and GNB performed faster in Level 2 than in Level 1, with DT being the fastest and K-NN by far the slowest. Specifically, in Level 1, DT takes 0.72 s, RF takes 2.10 s, GNB takes 2.18 s, and K-NN takes a significantly longer 18628.84 s. In Level 2, the times decrease to 0.17, 0.77, and 0.60 s for DT, RF, and GNB, respectively, while K-NN takes 40182.80 s.
Overall, the results indicate that the DT and RF models consistently perform exceptionally well, while GNB shows slightly lower performance in Level 2. Additionally, K-NN offers excellent scores but has a significantly longer detection time compared to other models.
Table 5
Performance of various machine learning models using unbalanced data and all features on the BoT-IoT dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 99.99 / 100 / 100 / 99.99 / 0.72 | 99.99 / 99.90 / 98.15 / 98.94 / 0.17
RF | 100 / 100 / 100 / 100 / 2.10 | 99.99 / 99.87 / 98.15 / 98.93 / 0.77
GNB | 99.97 / 100 / 99.98 / 99.99 / 2.18 | 74.18 / 77.85 / 88.33 / 78.68 / 0.60
K-NN | 99.99 / 100 / 99.99 / 99.99 / 18628.84 | 99.98 / 99.98 / 99.98 / 99.98 / 40182.80
Case 2: AIDS based on selected features combined with SMOTE
The outcomes for Level 1 and Level 2 detection using the proposed model can be viewed in Table 6. Here we implemented our proposed approach by selecting features and solving the imbalance problem with SMOTE, and we observed that the GNB classifier achieved impressive results compared to the previous case. Moreover, the DT and RF classifiers attained perfect results (100%) at the first level, with the DT classifier requiring a detection time of 0.13 s. On the other hand, the K-Nearest Neighbors (K-NN) classifier had the slowest detection time of 55.18 s. At the second level of detection, the K-NN classifier continued to exhibit the slowest detection time (41.04 s), while the DT classifier maintained its short detection time (0.04 s).
Table 6
Performance of various machine learning models after applying feature selection and SMOTE on the BoT-IoT dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 100 / 100 / 100 / 100 / 0.13 | 99.99 / 99.99 / 99.99 / 99.99 / 0.04
RF | 100 / 100 / 100 / 100 / 0.51 | 99.99 / 99.99 / 99.99 / 99.99 / 0.38
GNB | 99.98 / 100 / 99.98 / 99.99 / 0.32 | 99.96 / 99.96 / 99.96 / 99.96 / 0.31
K-NN | 99.99 / 100 / 99.99 / 99.99 / 55.18 | 99.99 / 99.99 / 99.99 / 99.99 / 41.04
This provides a clearer understanding of the effectiveness of SMOTE, particularly in showing the false positive rate in detail. However, models trained on datasets with all features and imbalance may exhibit higher metrics such as accuracy, but this can be misleading as they are often biased toward the majority class. When comparing the confusion matrix in the first level shown in Fig. 5, it is evident that DT, when using selected features, correctly detected three attack samples, as did RF. However, the impact of our proposed feature selection technique with SMOTE was more pronounced in GNB, which successfully reduced false negatives from 271 to 242.
Furthermore, the impact of our proposed model is evident at the second level across all models, as depicted in Fig. 6. The figure shows the attack classes 0: DDoSHTTP, 1: DDoSTCP, 2: DDoSUDP, 3: DoSHTTP, 4: DoSTCP, 5: DoSUDP, 6: OS Fingerprint, 7: Service scan, 8: Data Exfiltration, and 9: Theft Keylogging. Because all classifiers classify all normal traffic correctly at the first level, the Normal class does not appear at the second level. As shown, the Decision Tree and Random Forest classify all attacks correctly except for one DoSUDP sample. The Gaussian Naive Bayes improved compared to using all features: for the DDoSTCP class, it went from 0 correctly classified samples to 113,966 correct samples with only 33 misclassified, instead of 85,218.

4.5.2 Experiment II: using the TON-IoT dataset

In this section, we explore the experiments conducted to evaluate the model using TON-IoT. The experimental setup follows the sequence established in the previous section, that is, utilizing all features and then employing the proposed feature selection.
Case 1: AIDS based on all features and imbalanced data
In Table 7, we can see that both the DT and K-NN classifiers achieve up to 100% performance at the first level. However, it is worth noting that K-NN takes significantly more time, specifically 15506.95 s. In terms of time efficiency, the DT classifier achieves the shortest detection times at both levels, 0.31 and 0.03 s, respectively. At the second level, all classifiers show acceptable results except for the GNB, which did not achieve positive results: it has an accuracy of 19.97% and an F1-score of 22.24%.
Table 7
Performance of various machine learning models using unbalanced data and all features on the TON-IoT dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 100 / 100 / 100 / 100 / 0.31 | 99.99 / 99.99 / 99.99 / 99.99 / 0.03
RF | 99.99 / 100 / 99.99 / 99.99 / 0.60 | 99.99 / 99.99 / 99.99 / 99.99 / 0.11
GNB | 99.99 / 99.99 / 100 / 99.99 / 1.007 | 19.97 / 53.33 / 33.35 / 22.24 / 0.22
K-NN | 100 / 100 / 100 / 100 / 15506.95 | 99.98 / 99.98 / 99.98 / 99.98 / 578.17
Case 2: AIDS based on selected features combined with SMOTE
Building on the prior case, each classifier already exhibited commendable performance. Our objective is to maintain comparably good results, even if not identical, while the selected features help achieve this within the shortest possible time.
In Table 8, the DT demonstrated exceptional results across all evaluation metrics, achieving a perfect score of 100% in each metric within 0.15 s at the first level. Upon progressing to the second level, the DT model exhibited a slight decrease in performance, with a 99.99% accuracy rate and a detection time of 0.05 s.
However, the most noteworthy aspect lies in the significant improvement of the GNB model at the second level. Prior to our implementation, the GNB model displayed an accuracy of only 19.97% at this stage. With our proposed modifications, it achieved an impressive accuracy rate of 94.16% at the second level. This improvement marks a tremendous leap in performance for our proposed approach.
Table 8
Performance of various machine learning models after applying feature selection and SMOTE on the TON-IoT dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 100 / 100 / 100 / 100 / 0.15 | 99.99 / 99.99 / 99.99 / 99.99 / 0.05
RF | 100 / 100 / 100 / 100 / 0.41 | 99.99 / 99.99 / 99.99 / 99.99 / 0.18
GNB | 96.72 / 100 / 95.83 / 97.87 / 0.53 | 94.16 / 94.58 / 94.16 / 94.17 / 0.12
K-NN | 99.99 / 100 / 99.99 / 99.99 / 22678.82 | 99.99 / 99.99 / 99.99 / 99.99 / 469.92
The preceding outcomes were generated from the given confusion matrices. As illustrated in Fig. 7, the disparities among the classifiers are shown for the first case, which considers all features without solving the imbalance problem, while the second case shows AIDS detection based on the selected features and SMOTE. It is worth noting that the RF classifier correctly identifies one attack sample that had been classified as negative when employing all features, whereas the K-NN classifier classifies one attack sample as negative when using the proposed model. Furthermore, the GNB classifier, when utilizing the selected features and SMOTE, correctly identifies 27 normal samples but incorrectly classifies 31,274 attack samples as negative.
Examining the confusion matrices in Fig. 8, focusing on the second level, we observe the presence of five different classes: 0: DDoS, 1: DoS, 2: password, 3: scanning, and 4: XSS. These classes represent various types of attacks. However, the GNB classifier presents an additional class, "Normal," which is encoded as 2 and shifts the labels of the subsequent classes accordingly. This occurs because the GNB classifier identifies some samples from the Normal class as attacks at the first level; consequently, these samples are included in the second-level analysis, unlike with the other classifiers.

4.5.3 Experiment III: using the CIC-DDoS2019 dataset

Case 1: AIDS based on all features and imbalanced data
The results in Table 9 indicate that the DT and RF models perform consistently well in both Level 1 and Level 2. They achieve high scores in Accuracy, Precision, Recall, and F1-score while also having low detection times. These models seem to be effective and efficient in detecting anomalies in the CIC-DDoS2019 dataset. On the other hand, the GNB model has lower scores in most metrics, indicating that it might struggle to classify instances in this dataset accurately. Similarly, the K-NN model has a relatively lower Accuracy and Recall score in Level 1, although it performs well in Level 2. Overall, the DT and RF models provide the best trade-off between performance metrics and detection time in both levels of detection.
Table 9
Performance of various machine learning models using unbalanced data and all features on the CIC-DDoS2019 dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 99.94 / 99.96 / 99.97 / 99.96 / 0.03 | 98.74 / 86.59 / 82.65 / 82.55 / 0.006
RF | 99.94 / 99.97 / 99.95 / 99.96 / 0.056 | 96.69 / 71.41 / 77.03 / 67.05 / 0.016
GNB | 77.47 / 77.47 / 100 / 87.30 / 0.12 | 22.86 / 1.76 / 7.70 / 2.86 / 0.16
K-NN | 93.53 / 93.97 / 97.93 / 95.91 / 195.57 | 99.17 / 95.45 / 92.09 / 93.59 / 28.77
Case 2: AIDS based on selected features combined with SMOTE
Table 10 illustrates that, among the classifiers, the DT and RF show high accuracy and precision, with F1-scores close to 99.98%. Both classifiers have low detection times of 0.020 and 0.06 s, respectively. Compared to the previous case, the accuracy of DT and RF increased from 99.94 to 99.97, with a decrease in detection time. At the second level, the F1-scores of DT and RF increased significantly from 82.55 to 99.82 and from 67.05 to 96.25, respectively. The GNB classifier also performs more effectively than in the prior case at both levels, reaching 98.32% and 52.67%, respectively. The K-NN classifier has relatively good accuracy; however, its first-level accuracy decreases compared to the previous case, while at the second level its F1-score increases from 93.59 to 99.62 with a detection time of 13.50 s.
Table 10
The performance of different machine learning models after applying feature selection and SMOTE on the CIC-DDoS2019 dataset

Classifier | Level-1: Accuracy / Precision / Recall / F1-score / D_Time (s) | Level-2: Accuracy / Precision / Recall / F1-score / D_Time (s)
DT | 99.97 / 99.99 / 99.98 / 99.98 / 0.020 | 99.82 / 99.82 / 99.82 / 99.82 / 0.004
RF | 99.97 / 99.99 / 99.97 / 99.98 / 0.06 | 96.14 / 96.57 / 96.14 / 96.25 / 0.018
GNB | 98.32 / 99.33 / 98.48 / 98.91 / 0.068 | 52.67 / 46.87 / 52.68 / 43.36 / 0.034
K-NN | 88.85 / 95.33 / 90.01 / 92.60 / 174.25 | 99.61 / 99.64 / 99.61 / 99.62 / 13.50
In Fig. 9, the DT_All classifier initially misclassifies 42 normal samples as an attack, but the DT_select classifier reduces this number to 14. Similarly, the misclassification of attack samples is reduced from 30 to 21 by the DT_select classifier. As a result, the DT_select classifier performs better than the DT_All classifier by reducing the number of false positives. Additionally, with the RF_All and RF_select methods, the misclassification of normal and attack samples reduces from 26 to 7 and from 45 to 34, respectively. In general, the proposed model demonstrates enhancements in GNB. Initially, all normal samples were misclassified as false due to the model’s bias toward the majority class. However, upon implementing the model, the samples were redistributed, resulting in 651 false positives out of all normal samples. Nonetheless, it should be noted that the model predicted 1482 attacks as normal. Regarding K-NN, the model successfully decreased the count of false positives from 6162 to 4322. However, this improvement was accompanied by an increase in the number of false negatives.
When examining the confusion matrices in Fig. 10, with a focus on the second level, we can observe the presence of the classes 0: DNS, 1: LDAP, 2: MSSQL, 3: NTP, 4: NetBIOS, 5: Normal, 6: Portmap, 7: SNMP, 8: SYN, 9: TFTP, 10: UDP, 11: UDP-Lag, and 12: WebDDoS. The DT, RF, and K-NN algorithms perform better across multiple classes when using the proposed subset of features compared to using all features. The proposed model has a particularly significant impact on the GNB algorithm, which previously predicted most of the classes incorrectly; when the selected features are applied, its detection improves.

4.6 Comparison with other approaches

We evaluate the proposed model’s robustness by comparing it with existing intrusion detection models. These models can be categorized into machine learning-based and deep learning-based approaches. Specifically, we refer to the studies conducted by [13, 17, 21, 44–46] as representative machine learning-based solutions, while the deep learning-based approaches we consider include the works in [19, 47–50]. The comparison is presented in Table 11, where the results at the two detection levels are summarized. At the first level, among the machine learning approaches, our model demonstrates comparable performance in Accuracy, Precision, Recall, and F1-score with the works in [21, 44]. Additionally, our model outperforms the work in [17] in terms of Accuracy and Recall. It is worth mentioning that the aforementioned works utilized the K-NN algorithm, which is more time-consuming than the decision tree algorithm employed in our proposed model. Mohy-eddine et al. [13] also evaluated their model with K-NN and achieved an accuracy of up to 99.99% with a detection time of 57.73 s. Rihan et al. [50] employed ensemble feature selection and deep learning (DL) models. Although the authors individually implemented five filter selection methods and refined the output with RFE, their accuracy is lower than that of our proposed model by around 2%.
Table 11
Comparison with other models based on the BoT-IoT dataset (L1 = Level-1 of Detection, binary classification; L2 = Level-2 of Detection, multi-classification; "–" = not reported)

Ref (year)  | Classifier type | L1 Accuracy | L1 Precision | L1 Recall | L1 F1-score | L2 Accuracy | L2 Precision | L2 Recall | L2 F1-score
[21] (2020) | ML              | 99.99       | 99.99        | 99.99     | 99.99       | 99.68       | 99.70        | 99.67     | 99.69
[44] (2021) | ML              | 99.99       | 99.99        | 99.99     | 99.99       | 98.923      | 98.91        | 98.90     | 98.90
[17] (2022) | ML              | 94.85       | 99.28        | 96.23     | 99.99       | –           | –            | –         | –
[13] (2023) | ML              | 99.99       | 99.99        | 99.99     | 99.99       | –           | –            | –         | –
[10] (2024) | ML              | 99.50       | –            | –         | –           | –           | –            | –         | –
[51] (2024) | ML              | 99.00       | –            | –         | –           | 99.04       | –            | –         | –
[48] (2021) | DL              | 94.00       | 95.00        | 93.00     | 94.00       | –           | –            | –         | –
[47] (2022) | DL              | 92.85       | 95.55        | –         | –           | –           | –            | –         | –
[50] (2023) | DL              | 97.37       | 97.32        | 99.52     | 98.62       | –           | –            | –         | –
[52] (2024) | DL              | 99.68       | 99.30        | 99.11     | 99.21       | –           | –            | –         | –
Proposed    | ML (DT)         | 100         | 100          | 100       | 100         | 99.99       | 99.99        | 99.99     | 99.99

Bold values highlight the results of the proposed model
Table 12 compares the performance of our proposed model with recent works [12, 19, 45, 46, 49, 53, 54] on the TON-IoT dataset. We observe a considerable gap between the results of [45] and ours. The hybrid framework in [12] achieved an accuracy of 99.48% by using the hybrid NSGA-II technique for feature selection and SVM for detection; this accuracy is 0.52% lower than that of our proposed model. Moreover, the TON-IoT dataset is highly imbalanced, and the authors did not describe how this problem was handled; in addition, their work relies on a single detection level and reports no other evaluation metrics. El Hajla et al. [54] employed voting and stacking ensembles to improve prediction accuracy, using correlation to select the features; nevertheless, their results are lower than those of our proposed model, particularly in multiclass classification. Regarding the deep learning-based works, our proposed model outperformed [19] at the first level by 11% in accuracy and 10% in F1-score, while at the second level, [53] falls below our proposed model by 9% and 11% in accuracy and F1-score, respectively.
Table 12
Comparison with other models based on the TON-IoT dataset (L1 = Level-1 of Detection, binary classification; L2 = Level-2 of Detection, multi-classification; "–" = not reported)

Ref (year)  | Classifier type | L1 Accuracy | L1 Precision | L1 Recall | L1 F1-score | L2 Accuracy | L2 Precision | L2 Recall | L2 F1-score
[46] (2021) | ML              | 98.20       | 98.90        | 95.90     | 97.40       | 97.80       | 97.80        | 97.80     | 97.80
[45] (2022) | ML              | 97.86       | 99.00        | 98.05     | 98.00       | –           | –            | –         | –
[12] (2023) | ML              | 99.48       | –            | –         | –           | –           | –            | –         | –
[54] (2024) | ML              | 94.488      | 88.960       | 97.131    | 92.866      | 96.321      | 93.119       | 84.555    | 88.631
[54] (2024) | ML              | 97.313      | 95.804       | 96.988    | 96.393      | 95.757      | 98.199       | 86.515    | 87.836
[49] (2021) | DL              | 99.47       | 99.00        | –         | –           | –           | –            | –         | –
[19] (2022) | DL              | 89.00       | 91.00        | 90.00     | 90.00       | –           | –            | –         | –
[53] (2023) | DL              | –           | –            | –         | –           | 90.57       | 89.59        | –         | 88.87
Proposed    | ML (DT)         | 100         | 100          | 100       | 100         | 99.99       | 99.99        | 99.99     | 99.99

Bold values highlight the results of the proposed model
Table 13 presents the comparison on the CIC-DDoS2019 dataset. Thiyam et al. [55] addressed the class imbalance, yet our proposed model outperformed theirs by 11% in accuracy and 18% in F1-score. Aktar et al. [56] introduced a one-class Deep Contractive Autoencoder (DCAE). For multi-classification, our proposed model achieved higher accuracy than the model in [57], which was trained on all features except the timestamp, and than [58], which combined a random feature selection and optimization algorithm with the Light Gradient Boosting Machine (LGBM) to detect the various attack classes. Finally, although the CNN model in [59] achieves high accuracy in binary classification, its performance drops in multiclassification, trailing the proposed model by 6.2%.
Table 13
Comparison with other models based on the CIC-DDoS2019 dataset (L1 = Level-1 of Detection, binary classification; L2 = Level-2 of Detection, multi-classification; "–" = not reported)

Ref (year)  | Classifier type | L1 Accuracy | L1 Precision | L1 Recall | L1 F1-score | L2 Accuracy | L2 Precision | L2 Recall | L2 F1-score
[57] (2023) | ML              | –           | –            | –         | –           | 68.90       | –            | –         | –
[55] (2023) | ML              | 99.86       | 99.78        | 99.81     | 99.80       | –           | –            | –         | –
[58] (2024) | ML              | –           | –            | –         | –           | 99.7        | –            | –         | –
[60] (2022) | DL              | 99.99       | 99.99        | 99.99     | 99.99       | 99.54       | 98.00        | 99.00     | 98.00
[56] (2023) | DL              | 93.41–97.58 | –            | –         | –           | –           | –            | –         | –
[59] (2024) | DL              | 99.99       | –            | –         | –           | 93.62       | –            | –         | –
Proposed    | ML (DT)         | 99.97       | 99.99        | 99.98     | 99.98       | 99.82       | 99.82        | 99.82     | 99.82

Bold values highlight the results of the proposed model
As detection time is an important measure, we compared the proposed model across the three experiments and summarize the results in Fig. 11. For the BoT-IoT dataset, the two-level anomaly detection approach of Aldaej et al. [51] requires 1.23 s for multiclassification, longer than our method. Sarhan et al. [45] trained their model on all features, which also results in a longer computation time than our proposed method. For the CIC-DDoS2019 dataset, although Ramzan et al. [60] achieved accuracy comparable to ours, our model is superior in detection time.

4.7 Time complexity of the proposed feature selection model

The time complexity of our feature selection model comprises several components, each contributing to the overall computational demand. In the filter stage, calculating the PCC for \(m\) features over \(n\) samples requires \(O(n \cdot m)\) [61, 62], while computing the SCC, which involves sorting, costs \(O(m \cdot n \log n)\) [62, 63]. In the wrapper stage, RFE has a complexity of \(O(T \cdot P)\), where \(T\) is the cost of training the underlying model and \(P\) is the number of features eliminated [64]. For a DT, training costs \(O(m \cdot n \log n)\) [65], and training an RF of \(k\) decision trees costs \(O(k \cdot T_{DT}) = O(k \cdot m \cdot n \log n)\). Finally, the model performs two intersections and one union over the feature sets, each with \(O(m)\) complexity. Combining these terms gives \(O(n \cdot m) + O(m \cdot n \log n) + O(P \cdot m \cdot n \log n) + O(P \cdot k \cdot m \cdot n \log n) + O(m)\), which simplifies to \(O(P \cdot k \cdot m \cdot n \log n)\), the overall computational complexity of the feature selection model.
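To make the components of this analysis concrete, the following sketch shows one way such a filter-wrapper hybrid could be assembled with scikit-learn and SciPy. It is a minimal illustration under stated assumptions: the correlation threshold, the number of features retained by RFE, and the particular combination rule (two intersections followed by one union) are hypothetical choices for demonstration, not the exact configuration used in this paper.

import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def hybrid_select(X: pd.DataFrame, y, corr_thresh=0.2, n_keep=15):
    # Filter stage: score every feature by |PCC| and |SCC| against the
    # (numerically encoded) target, costing O(n*m) and O(m*n log n) respectively.
    pcc = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
    scc = X.apply(lambda col: abs(spearmanr(col, y)[0]))
    f_pcc = set(pcc[pcc >= corr_thresh].index)
    f_scc = set(scc[scc >= corr_thresh].index)

    # Wrapper stage: recursive feature elimination with a DT and an RF,
    # the dominant O(P*k*m*n log n) term of the analysis above.
    rfe_dt = RFE(DecisionTreeClassifier(random_state=42),
                 n_features_to_select=n_keep).fit(X, y)
    rfe_rf = RFE(RandomForestClassifier(n_estimators=50, random_state=42),
                 n_features_to_select=n_keep).fit(X, y)
    w_dt = set(X.columns[rfe_dt.support_])
    w_rf = set(X.columns[rfe_rf.support_])

    # Assumed combination rule: two intersections and one union, each O(m).
    return sorted((f_pcc & w_dt) | (f_scc & w_rf))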

5 Discussion

The choice of the best model depends on the specific requirements of the intrusion detection system, considering factors such as accuracy, computational efficiency, and the trade-offs acceptable for IoT. The Decision Tree, in particular, performs well while consuming less time than the Random Forest; this efficiency stems from the simplicity of a single tree and its faster testing time compared with the ensemble nature of Random Forest. K-NN, being a lazy learner, defers computation to the prediction phase, where it must compute distances to the stored training samples; its prediction time therefore grows with the size of the dataset, making it computationally expensive for larger IoT datasets. The simplicity of GNB and its assumption of feature independence make it computationally efficient, but its accuracy suffers when the features are not truly independent, as is the case for our selected features. For these reasons, we used the Decision Tree as the model for comparing our approach with other recent works. Our proposed model addresses several limitations of related works through a unified approach that balances accuracy and detection time, making it suitable for real-time applications and feasible for deployment in resource-constrained IoT environments. It handles the class imbalance that many state-of-the-art methods struggle with, and its hybrid feature selection retains the important features while reducing data dimensionality, thereby lowering resource consumption and processing time. Finally, it not only detects an attack but also identifies its type, which is important for administrators to take suitable countermeasures.
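A small benchmarking sketch of the kind used to obtain such accuracy/time trade-offs is given below. It times only the prediction (detection) phase of each classifier on a held-out split; the data variables and hyperparameters are placeholders, assuming the preprocessing, feature selection, and SMOTE steps have already been applied.

import time
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

MODELS = {
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "GNB": GaussianNB(),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}

def benchmark(X_train, y_train, X_test, y_test):
    # Train each model, then time only the detection (prediction) phase.
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        start = time.perf_counter()
        y_pred = model.predict(X_test)
        d_time = time.perf_counter() - start
        print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.4f}, "
              f"detection time={d_time:.3f} s")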

6 Conclusion

This paper presents the design and development of an Anomaly Intrusion Detection System (AIDS) with high efficiency and minimal detection time, making it well suited to real-time applications. To this end, this research proposes a unified model that follows four main steps: data acquisition, preprocessing, feature selection, and classification. To ensure efficient preprocessing, the BoT-IoT, TON-IoT, and CIC-DDoS2019 datasets undergo a thorough preparation process. Subsequently, a filter-wrapper hybridization approach is applied to select the features and optimize resource usage for IoT devices. In the classification stage, attacks are categorized at two levels: the first level determines whether a packet is normal or an attack, while the second level identifies the type of the detected attack. To achieve accurate classification, state-of-the-art machine learning classifiers, namely Decision Tree, Random Forest, K-Nearest Neighbor, and Gaussian Naive Bayes, are implemented. The proposed model's effectiveness is validated using stratified tenfold cross-validation, which demonstrates its high accuracy within a short time. While the proposed approach has shown promising results, there is ample opportunity for enhancement through the exploration and evaluation of dimensionality reduction techniques. Additionally, deep learning-based approaches will be a key area of future investigation, leveraging their capability to extract high-level discriminating features. These endeavors aim to refine and advance the model, ensuring its adaptability to evolving threats and strengthening its overall performance in IoT network security.

Declarations

Conflict of interest

The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
2. Hussain F (2017) Internet of things: building blocks and business models. Springer, Berlin
10. Habeeb MS, Babu TR (2024) Coarse and fine feature selection for network intrusion detection systems (IDS) in IoT networks. Trans Emerg Telecommun Technol 35(4):4961
11. Sun Z, An G, Yang Y, Liu Y (2024) Optimized machine learning enabled intrusion detection system for internet of medical things. Frankl Open 6:100056
12. Dey AK, Gupta GP, Sahu SP (2023) Hybrid meta-heuristic based feature selection mechanism for cyber-attack detection in IoT-enabled networks. Procedia Comput Sci 218:318–327
13. Mohy-eddine M, Guezzaz A, Benkirane S, Azrour M (2023) An efficient network intrusion detection model for IoT security using K-NN classifier and feature selection. Multimed Tools Appl 82:1–19
14. Azar AT, Shehab E, Mattar AM, Hameed IA, Elsaid SA (2023) Deep learning based hybrid intrusion detection systems to protect satellite networks. J Netw Syst Manag 31(4):82
15. Sharma B, Sharma L, Lal C, Roy S (2023) Anomaly based network intrusion detection for IoT attacks using deep learning technique. Comput Electr Eng 107:108626
16. Dina AS, Siddique A, Manivannan D (2023) A deep learning approach for intrusion detection in internet of things using focal loss function. Internet Things 22:100699
19. Adeniyi EA, Folorunso SO, Jimoh RG (2022) A deep learning-based intrusion detection technique for a secured IoMT system. In: Informatics and Intelligent Applications: First International Conference, ICIIA 2021, Ota, Nigeria, November 25–27, 2021, Revised Selected Papers. Springer Nature, p 50. https://doi.org/10.1007/978-3-030-95630-1_4
22. Mohy-eddine M, Guezzaz A, Benkirane S, Azrour M (2024) Malicious detection model with artificial neural network in IoT-based smart farming security. Cluster Comput 2024:1–16
27. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
28. Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, University of Illinois at Urbana-Champaign
30. Kumari B, Swarnkar T (2011) Filter versus wrapper feature subset selection in large dimensionality micro array: a review. Int J Comput Sci Inf Technol
38. Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST). IEEE, pp 1–8
40.
41. Olson DL, Delen D (2008) Advanced data mining techniques. Springer, Berlin
43. Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
50. Rihan SDA, Anbar M, Alabsi BA (2023) Approach for detecting attacks on IoT networks based on ensemble feature selection and deep learning models. Sensors 23(17):7342
51. Aldaej A, Ullah I, Ahanger TA, Atiquzzaman M (2024) Ensemble technique of intrusion detection for IoT-edge platform. Sci Rep 14(1):11703
52. Geetha R, Jegatheesan A, Dhanaraj RK, Vijayalakshmi K, Nayyar A, Arulkumar V, Velmurugan J, Thavasimuthu R (2024) CVS-FLN: a novel IoT-IDS model based on metaheuristic feature selection and neural network classification model. Multimed Tools Appl 2024:1–35
53. Ding W, Abdel-Basset M, Mohamed R (2023) DeepAK-IoT: an effective deep learning model for cyberattack detection in IoT networks. Inf Sci 634:157–171
54. El Hajla S, El Mahfoud Ennaji YM, Mounir S (2024) Enhancing IoT network defense: advanced intrusion detection via ensemble learning techniques. Indones J Electr Eng Comput Sci 35(3):2010–2020
55. Thiyam B, Dey S (2023) Efficient feature evaluation approach for a class-imbalanced dataset using machine learning. Procedia Comput Sci 218:2520–2532
56. Aktar S, Nur AY (2023) Towards DDoS attack detection using deep learning approach. Comput Secur 129:103251
57. Hamarshe A, Ashqar HI, Hamarsheh M (2023) Detection of DDoS attacks in software defined networking using machine learning models. In: International Conference on Advances in Computing Research. Springer, pp 640–651
58. Ramesh Kumar M, Sudhakaran P (2024) Securing IoT networks: a robust intrusion detection system leveraging feature selection and LGBM. Peer-to-Peer Netw Appl 2024:1–23
59. Anley MB, Genovese A, Agostinello D, Piuri V (2024) Robust DDoS attack detection with adaptive transfer learning. Comput Secur 144:103962
60. Ramzan M, Shoaib M, Altaf A, Arshad S, Iqbal F, Castilla ÁK, Ashraf I (2023) Distributed denial of service attack detection in network traffic using deep learning algorithm. Sensors 23(20):8642
62. Choi D, Li L, Liu H, Zeng L (2020) A recursive partitioning approach for subgroup identification in brain-behaviour correlation analysis. Pattern Anal Appl 23(1):161–177
63. Knight WR (1966) A computer method for calculating Kendall's tau with ungrouped data. J Am Stat Assoc 61(314):436–439
64. Huang X, Zhang L, Wang B, Li F, Zhang Z (2018) Feature clustering based support vector machine recursive feature elimination for gene selection. Appl Intell 48:594–607
65. Sani HM, Lei C, Neagu D (2018) Computational complexity analysis of decision tree algorithms. In: Artificial Intelligence XXXV: 38th SGAI International Conference on Artificial Intelligence, AI 2018, Cambridge, UK, December 11–13, 2018, Proceedings, vol 38. Springer, pp 191–197