Network intrusion detection system based on recursive feature addition and bigram technique
Introduction
A defence system against ever-emerging network attacks has become a necessity for all participants of the Internet. Every day there are new attacks, called “zero-day exploits”, for which no vendor has yet discovered or developed a solution (Wook Jang et al., 2016). The damage caused by zero-day attacks has been shown to be difficult to mitigate because of the lack of information about them (Zhang et al., 2016). Therefore, there is always a need to defend against these zero-day attacks before they cause major damage to networks.
Data mining is a technique that can be used with intrusion detection to detect characteristic patterns from the data features that depict system and user behaviour (Lee and Stolfo, 1998) and, ideally, examples of malevolent activity. Machine learning algorithms have been used extensively with intrusion detection to improve detection accuracy and to make the IDS model resilient to zero-day or novel attacks. To build a fast and accurate IDS, it is very important to select informative features from the input data. Feature selection has proven its ability to reduce computational demands, overfitting, and model size, and to increase accuracy (Sahu et al., 2014). The difficulty facing a developer of such systems is the scarcity of attack examples that can be used to train a learning machine to build a model for detecting that particular attack. Even effective machine learning algorithms struggle when there are few examples, or unbalanced examples, and large numbers of features. The number of informative features available also affects performance (i.e. the more the better). Previous IDSs often neglected the payload features although they contain useful information. Therefore, we decided to utilize the payload features and extract useful information from them for intrusion detection purposes. To improve the detection capability of the system, we used the bigram technique to encode the payload features into a form that can be used by machine learning algorithms.
The bigram technique is an established technique, especially in Deep Packet Inspection (DPI), and has been studied for decades. In this paper, however, we present a new combination of feature selection and the bigram technique, applied to the particular problem of intrusion detection. We made the problem of intrusion detection harder by focusing on the “zero-day attack” scenario. To simulate this scenario, we intentionally built a learning machine using small numbers of examples and large numbers of features. The purpose is to check whether attacks can still be detected from a data set with these characteristics. The contributions of the paper are highlighted in Table 1:
As Table 1 shows, this work makes several contributions. We used a novel combination of the bigram technique and feature extraction to encode long payload features extracted from the network traffic. Although the bigram technique is not new, utilizing it in this context is novel and has proven useful. Moreover, the work employed recursive feature addition (RFA) feature selection to find interdependent features within the data. Another major contribution is a new Combined metric for evaluating IDSs, which combines accuracy, F-measure and FAR to make comparing different IDSs easier. In addition, the work incorporated the bigram technique with RFA feature selection to overcome the overfitting that results from the scarcity of data in the case of “zero-day exploits”. A further contribution is that, whenever several features obtain an equal ranking coefficient, one of them is selected at random in each run, so that the tie carries no statistical significance. Finally, we conducted a thorough analysis of the feature selection behaviour with the bigram and non-bigram features.
The rest of the paper is organized as follows. Section 2 presents background on the topic. Intrusion detection is explained in Section 3, and the challenges that face it are discussed in Section 3.2. Section 4 discusses previous work in this area. Section 5 explains the types of feature selection. The employed feature selection method and the approach are presented in Section 6. Our methodology is explained in Section 7, and its details are described in Section 8. In Section 9, we explain the feature extraction and the data set preparation. Section 10 explains the feature selection on the ISCX 2012 data set. The results of applying RFA on the ISCX 2012 data set are presented in Section 11. The conclusion and future work are presented in Section 12.
Background
In the last decade, the area of feature selection has received a great amount of attention by machine learning researchers (Shanab et al., 2011). It can be noticed that, in many pattern recognition and machine learning applications, the range of features has grown from tens to hundreds and thousands of features. These features may contain many irrelevant features which may affect application performance. Therefore, researchers have been looking for techniques to handle the problem of reducing
Intrusion detection
An intrusion is any group of actions that try to violate one or more of the computer security goals: confidentiality, integrity, and availability. The key elements of intrusion detection are (Lee and Stolfo, 1998):
1. Resources that need to be protected by the intrusion detection system, such as user accounts, file systems, system kernels, etc.
2. Models that describe the legitimate behaviour of the resources.
3. Techniques that match the current system activities with the constructed models to recognize
Previous work
Many studies have been conducted on applying feature selection to improve the IDS performance. Those studies used different IDS data sets for testing their models. However, in this paper we use the ISCX 2012 intrusion detection data set. In (Vasudevan and Selvakumar, 2015), the authors applied the intraclass correlation coefficient and interclass correlation coefficient to obtain a class-specific subset of features. The interclass and intraclass correlation coefficients were used to measure the
Feature selection types
In general, feature selection methods are divided into three types: Filter methods, Wrapper methods, and Embedded methods (Saeys et al., 2007).
(a) Filter methods perform feature selection independently of the classifier and do not incorporate learning. They ignore any interaction with the classifier, as illustrated in Fig. 1a. These methods determine feature importance by inspecting the intrinsic properties of the data. Commonly, filter methods calculate a feature
The employed feature selection method and approach
In this section, we explain the employed feature selection method in detail. The algorithm of the feature selection method is explained in the next section.
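One detail of the method that the introduction mentions, choosing at random among features that obtain an equal ranking coefficient, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `pick_top_feature`, the example scores, and the use of exact equality to detect ties are all assumptions.

```python
import random

def pick_top_feature(scores, rng):
    """Return the index of the highest-scoring feature.

    When several features share the top score, one of them is drawn
    uniformly at random so that index order never decides the winner.
    Ties are detected by exact equality (an assumption of this sketch).
    """
    best = max(scores)
    tied = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(tied)

rng = random.Random(0)          # seeded only to make the example repeatable
scores = [0.2, 0.7, 0.7, 0.1]   # hypothetical ranking coefficients
picks = {pick_top_feature(scores, rng) for _ in range(50)}
# `picks` can only ever contain the tied indices 1 and 2
```

Drawing a fresh winner in each run means that, over repeated experiments, no tied feature is systematically favoured by its position in the data set.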
Methodology
The overall model of our approach is depicted in Fig. 3. The input data are the training records of the data set, including their targets. These records have been collected and organized as a standard benchmark data set (in this work, the ISCX 2012 data set). The training records are then entered into the feature selection module, which selects the best subset of features according to the employed feature selection method, RFA.
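The role of the selection module can be pictured with a generic greedy forward-addition loop, sketched below. This is an illustration only: the exact RFA procedure is not reproduced in this excerpt, and the nearest-centroid scorer `centroid_accuracy`, the separate validation split, and the name `forward_addition` are stand-ins chosen to keep the example self-contained (the results section refers to an SVM classifier instead).

```python
def centroid_accuracy(X_train, y_train, X_test, y_test, feats):
    """Accuracy of a nearest-centroid classifier restricted to `feats`."""
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def predict(row):
        # Choose the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((row[f] - c) ** 2
                                for f, c in zip(feats, centroids[lab])),
        )

    hits = sum(predict(r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)

def forward_addition(X_train, y_train, X_val, y_val, ranked_feats):
    """Greedy addition: keep a candidate only if validation accuracy improves."""
    selected, best = [], 0.0
    for f in ranked_feats:
        score = centroid_accuracy(X_train, y_train, X_val, y_val, selected + [f])
        if score > best:
            selected.append(f)
            best = score
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 do not.
X_train = [[0.0, 5.0, 0.3], [0.1, 5.0, 0.9], [1.0, 5.0, 0.4], [0.9, 5.0, 0.8]]
y_train = [0, 0, 1, 1]
X_val = [[0.2, 5.0, 0.7], [0.8, 5.0, 0.2]]
y_val = [0, 1]
selected, best = forward_addition(X_train, y_train, X_val, y_val, [0, 1, 2])
```

In this toy run only feature 0 is retained, because adding either of the other two features does not raise the validation accuracy.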
Next, the training phase works on the selected features by training a
Methodology description
In this section, we will explain the measurements that we used in our experiments as well as the detailed steps of our methodology.
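For orientation, the three measurements named in the introduction (accuracy, F-measure and FAR) can be computed from confusion-matrix counts using their standard definitions, as in the sketch below. The function name `ids_metrics` and the example counts are assumptions, and the paper's Combined metric is deliberately not reproduced because its formula does not appear in this excerpt.

```python
def ids_metrics(tp, fp, tn, fn):
    """Accuracy, F-measure and false alarm rate from confusion-matrix counts.

    Attacks are the positive class: tp = attacks flagged, fp = normal
    traffic flagged, tn = normal traffic passed, fn = attacks missed.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    far = fp / (fp + tn) if fp + tn else 0.0   # false alarm rate
    return accuracy, f_measure, far

# Hypothetical counts, for illustration only.
acc, f1, far = ids_metrics(tp=40, fp=10, tn=90, fn=10)
```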
Feature extraction and data set preparation
In this section, we explain how we reduced the size of the data set in terms of the number of features. We start with how we encoded the string features into 4k features. Next, we explain how we generated data sets of different sizes (in terms of the number of examples).
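As an illustration of the string encoding step, the sketch below counts character bigrams in a payload string and maps them onto a fixed-length vector. The function name `bigram_vector`, the toy alphabet, and the choice of raw counts are assumptions made for this example; the excerpt does not specify the alphabet or the exact encoding that yields the 4k features.

```python
from collections import Counter
from itertools import product

def bigram_vector(payload, alphabet):
    """Count adjacent character pairs of `payload` over a fixed alphabet.

    Returns one count per ordered pair in `alphabet`, so the vector length
    is len(alphabet) ** 2. Pairs containing other characters are ignored.
    """
    counts = Counter(payload[i:i + 2] for i in range(len(payload) - 1))
    return [counts[a + b] for a, b in product(alphabet, repeat=2)]

# Pairs are ordered aa, ab, ba, bb; "abab" contains ab twice and ba once.
vec = bigram_vector("abab", "ab")
print(vec)  # [0, 2, 1, 0]
```

A fixed alphabet keeps every payload at the same vector length, which is what allows strings of different lengths to be fed to the same learning algorithm.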
The ISCX data set consists of different types of features: numeric, categorical, datetime, and strings. Usually the packet header information is represented by a mixture of the above types, but the payload features are usually
Feature selection on ISCX data set
As mentioned before, this step involves applying a feature selection method to the ISCX data set. In the next sections, we apply RFA to the ISCX data set and analyze the results of that method on the four generated data sets. For each data set, we repeated the experiment 30 times and used 3-fold cross-validation for testing.
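The evaluation protocol (30 repetitions of 3-fold cross-validation) can be sketched as an index generator. The name `repeated_kfold`, the reshuffling before every repetition, and the absence of class stratification are assumptions of this sketch; the excerpt does not say how the folds were drawn.

```python
import random

def repeated_kfold(n_samples, k=3, repeats=30, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Indices are reshuffled at the start of every repeat, then dealt into
    k folds round-robin; each fold serves once as the test set.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f
                         for i in fold]
            yield r, f, train_idx, test_idx

# 30 repeats x 3 folds gives 90 train/test splits in total.
splits = list(repeated_kfold(n_samples=9))
```

Averaging a metric over all 90 splits reduces the influence of any single partition, which matters when the data sets are small.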
Results of RFA application on ISCX data set
In order to observe the effect of including the payload features in improving the detection accuracy, we conducted a crucial experiment. We measured the SVM's classification accuracy and F-measure on all the ISCX data sets without the payload features and with payload features. We measured the performance metrics after converting the payload features to bigram features and applying RFA feature selection. The goal of this experiment is to show that these payload features include important and
Conclusions and future work
This paper presents a new feature selection-based network intrusion detection system. The proposed model is tested on the ISCX 2012 data set. Since new attacks now try to deceive NIDSs by distributing the attack packets over a long period of time, the system is designed to deal with a small number of examples and a large number of features. We prepared the data set for intrusion detection by encoding the payload features (long strings) using a bigram technique. While
Tarfa Hamed has obtained his PhD recently from the School of Computer Science at the University of Guelph. His research interests are Intrusion detection, Machine learning, Pattern recognition, and Feature selection.
References (54)
- et al. A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput Secur (2017)
- et al. A review of microarray datasets and applied feature selection methods. Inf Sci (Ny) (2014)
- et al. A survey on feature selection methods. Comput Electr Eng (2014)
- et al. Toward an efficient and scalable feature selection approach for internet traffic classification. Comput Netw (2013)
- et al. Effect of label noise in the complexity of classification problems. Neurocomputing (2015)
- et al. A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion. Neurocomputing (2015)
- et al. Feature selection based hybrid anomaly intrusion detection system using K means and RBF kernel function. Procedia Comput Sci (2015)
- et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur (2012)
- et al. Subspace learning for unsupervised feature selection via matrix factorization. Pattern Recognit (2015)
- et al. Feature selection for unsupervised learning through local learning. Pattern Recognit Lett (2015)
- A novel feature selection method considering feature interaction. Pattern Recognit
- Feature selection for intrusion detection system using ant colony optimization. IJ Netw Secur
- Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput
- Towards effective feature selection in machine learning-based botnet detection approaches
- Classification and feature selection techniques in data mining. Int J Eng Res Technol
- Quality of similarity rankings in time series. Adv Spat Temporal Databases
- A review of feature selection methods on synthetic data. Knowl Inf Syst
- A training algorithm for optimal margin classifiers
- LIBSVM: a library for support vector machines. ACM Trans Intelligent Syst Technol
- An introduction to intrusion-detection systems
- Overfitting and undercomputing in machine learning. ACM Comput Surv
- Monte Carlo feature selection for supervised classification. Bioinformatics
- A new feature selection IDS based on genetic algorithm and SVM
- Practical feature selection: from correlation to causality. NATO Sci Peace Secur
- An introduction to variable and feature selection. J Mach Learn Res
- Gene selection for cancer classification using support vector machines. Mach Learn
- An accurate, fast embedded feature selection for SVMs
Rozita Dara is an Assistant Professor at the School of Computer Science, University of Guelph, Canada, where she has established the Data Management and Privacy Governance Laboratory. Prior to her academic position, she served as a research scientist in industry and government. She received her PhD from the University of Waterloo in 2007.
Stefan C. Kremer has recently become a Professor in the School of Computer Science at the University of Guelph. His research interests include Machine Learning, Deep Learning, and Dynamical Recurrent Networks, and he has applied his work to the domains of genomics, bioinformatics, proteomics, and natural language processing.