Elsevier

Computers & Security

Volume 73, March 2018, Pages 137-155
Network intrusion detection system based on recursive feature addition and bigram technique

https://doi.org/10.1016/j.cose.2017.10.011

Abstract

Network and Internet security is a critical, universal issue. The rising rate of cyber terrorism has put national security at risk. In addition, Internet attacks have caused severe damage to many sectors (individuals, the economy, enterprises, organizations and governments). Network Intrusion Detection Systems (NIDS) are one of the solutions against these attacks. However, NIDS continually need to improve their performance by increasing accuracy and decreasing false alarms. Integrating feature selection with intrusion detection has been shown to be a successful approach, since feature selection helps select the most informative features from the entire feature set.

Stealthy, low-profile attacks (zero-day attacks) usually consist of a few neatly concealed packets distributed over a long period of time to mislead firewalls and NIDS. Moreover, many features are extracted from those packets, which may cause some machine learning-based feature selection methods to overfit, especially when the data have a large number of features and relatively few examples.

In this paper, we propose a NIDS based on a feature selection method called Recursive Feature Addition (RFA) and a bigram technique. The system has been designed, implemented and tested. We tested the model on the ISCX 2012 data set, one of the most well-known and recent data sets for intrusion detection. Furthermore, we propose a bigram technique to encode payload string features into a representation that can be used in feature selection. In addition, we propose a new evaluation metric, called Combined, that combines accuracy, detection rate and false alarm rate in a way that helps in comparing different systems and selecting the best among them. The designed feature selection-based system shows a noticeable improvement in performance across different metrics.

Introduction

The need for a defence system against ever-emerging network attacks has become pressing for all participants of the Internet. Every day there are new attacks, called “zero-day exploits”, for which no vendor has yet discovered or developed a solution (Wook Jang et al., 2016). The damage caused by zero-day attacks has proven difficult to mitigate because of the lack of information about them (Zhang et al., 2016). Therefore, there is always a need to defend against these zero-day attacks before they cause major damage to networks.

Data mining is a technique that can be used with intrusion detection to detect characteristic patterns in the data features that depict system and user behaviour (Lee and Stolfo, 1998) and, ideally, examples of malevolent activity. Machine learning algorithms have been used extensively in intrusion detection to improve detection accuracy and to make the IDS model robust against zero-day or novel attacks. To build a fast and accurate IDS, it is very important to select informative features from the input data. Feature selection has been shown to reduce computational demands, overfitting and model size, and to increase accuracy (Sahu et al., 2014). The difficulty facing a developer of such systems is the scarcity of attack examples that can be used to train a learning machine to build a model for detecting a particular attack. Even effective machine learning algorithms struggle when there are few or unbalanced examples and a large number of features. The number of available informative features also affects performance (i.e. the more the better). Previous IDSs often neglected the payload features, although they contain useful information. Therefore, we decided to utilize the payload features and extract useful information from them for intrusion detection. To improve the detection capability of the system, we used the bigram technique to encode the payload features into a form that can be used by machine learning algorithms.

The bigram technique is well established, especially in Deep Packet Inspection (DPI), and has been studied for decades. This paper, however, presents a new combination of feature selection and the bigram technique applied to the problem of intrusion detection. We made the intrusion detection problem harder by focusing on the “zero-day attack” scenario. To simulate this scenario, we intentionally built a learning machine using a small number of examples and a large number of features. The purpose is to check whether attacks can still be detected with a data set that has these characteristics. The paper makes several contributions, which are highlighted in Table 1.

As can be seen from Table 1, this work makes several contributions. We used a novel combination of the bigram technique and feature extraction to encode long payload features extracted from the network traffic. Although the bigram technique is not new, using it in this context is novel and has proven useful. Moreover, the work employed RFA feature selection to find interdependent features within the data. Another major contribution is a new Combined metric for evaluating IDSs, which combines accuracy, F-measure and false alarm rate (FAR) to make comparing different IDSs easier. In addition, the work incorporates the bigram technique with RFA feature selection to overcome the overfitting that results from the scarcity of data in the case of “zero-day exploits”. The last contribution is selecting, in each run, a random feature from among the features that obtained an equal ranking coefficient, so that any such tie has no statistically significant effect on the results. In addition, we conducted a thorough analysis of the feature selection behaviour with the bigram and non-bigram features.
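The random tie-breaking among equally ranked features can be illustrated with a minimal sketch. The function name `pick_top_feature` and the score values are hypothetical, and exact float equality is used only for brevity; this is not the authors' implementation.

```python
import random

def pick_top_feature(scores, rng):
    """Return the index of the best-ranked feature, breaking ties at random.

    `scores` holds one ranking coefficient per feature (higher is better).
    """
    best = max(scores)
    tied = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(tied)  # a uniform choice among the tied features

rng = random.Random(0)               # seeded so that a run is repeatable
scores = [0.2, 0.9, 0.9, 0.1]        # made-up values: features 1 and 2 are tied
picks = [pick_top_feature(scores, rng) for _ in range(1000)]
print(sorted(set(picks)))            # only the tied indices are ever chosen
```

Drawing a fresh random choice in every run keeps a tied feature from being favoured merely because of its position in the feature list.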

The rest of the paper is organized as follows. Section 2 presents background on the topic. Intrusion detection is explained in Section 3, and the challenges that face intrusion detection are discussed in Section 3.2. Section 4 discusses previous work in this area. Section 5 explains the types of feature selection. The employed feature selection method and the approach are presented in Section 6. Our methodology is explained in Section 7, and its details are described in Section 8. Section 9 explains the feature extraction and the data set preparation. Section 10 explains the feature selection on the ISCX 2012 data set. The results of applying RFA on the ISCX 2012 data set are presented in Section 11. The conclusion and future work are presented in Section 12.

Section snippets

Background

In the last decade, feature selection has received a great deal of attention from machine learning researchers (Shanab et al., 2011). In many pattern recognition and machine learning applications, the number of features has grown from tens to hundreds and thousands. These may include many irrelevant features, which can degrade application performance. Therefore, researchers have been looking for techniques to handle the problem of reducing

Intrusion detection

An intrusion is any set of actions that attempts to violate one or more of the computer security goals: confidentiality, integrity, and availability. The key elements of intrusion detection are (Lee and Stolfo, 1998):

  • 1. Resources that need to be protected by the intrusion detection system, such as user accounts, file systems, system kernels, etc.

  • 2. Models: that describe the legitimate behaviour of the resources

  • 3. Techniques: that match the current system activities with the constructed models to recognize

Previous work

Many studies have applied feature selection to improve IDS performance, using different IDS data sets to test their models. In this paper, we use the ISCX 2012 intrusion detection data set. Vasudevan and Selvakumar (2015) applied the intraclass correlation coefficient and interclass correlation coefficient to obtain a class-specific subset of features. The interclass and intraclass correlation coefficients were used to measure the

Feature selection types

In general, feature selection methods are divided into three types: Filter methods, Wrapper methods, and Embedded methods (Saeys et al., 2007).

  • (a) Filter methods perform feature selection independently of the classifier and do not incorporate learning. They ignore any cooperation with the classifier, as illustrated in Fig. 1a. These methods determine feature importance by inspecting the intrinsic properties of the data. Commonly, filter methods calculate a feature
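As a minimal sketch of a filter method, the snippet below scores each feature on its own, without training any classifier, and ranks the features by that score. Using the absolute Pearson correlation with the class label as the score is an assumption chosen for illustration; the toy data and function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_rank(X, y):
    """Score every feature independently and return (ranking, scores)."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Toy data (made up): feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
X = [[1.0, 5.0, 0.3],
     [2.0, 5.0, 0.1],
     [3.0, 5.0, 0.4],
     [4.0, 5.0, 0.2]]
y = [0, 0, 1, 1]
order, scores = filter_rank(X, y)
print(order)
```

Because every feature is scored in isolation, a ranking of this kind is cheap to compute but cannot see interactions between features.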

The employed feature selection method and approach

In this section, we explain the feature selection method that we employed in detail. The algorithm of the method is explained in the next section.
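The details of RFA are not included in this excerpt, so the following is only a rough sketch of a generic recursive (forward) feature addition loop, under stated assumptions: features are visited in a given ranked order and a feature is kept only if it improves the score. The nearest-centroid scorer, the toy data and all names are hypothetical stand-ins; a real system would more likely use a cross-validated classifier such as an SVM.

```python
def nearest_centroid_accuracy(X, y, features):
    """Training-set accuracy of a nearest-centroid classifier on `features`."""
    if not features:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for x, t in zip(X, y):
        point = [x[j] for j in features]
        pred = min(centroids, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += (pred == t)
    return correct / len(X)

def recursive_feature_addition(X, y, ranked, score_fn):
    """Greedy forward addition (assumed variant): keep a feature only if it helps."""
    selected, best = [], 0.0
    for j in ranked:
        trial = selected + [j]
        score = score_fn(X, y, trial)
        if score > best:          # a strict improvement is required
            selected, best = trial, score
    return selected, best

# Toy data (made up): feature 0 is partly informative, feature 1 is constant,
# feature 2 separates the classes once it is combined with feature 0.
X = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [3.0, 1.0, 0.0],
     [2.0, 1.0, 10.0], [4.0, 1.0, 10.0], [5.0, 1.0, 10.0]]
y = [0, 0, 0, 1, 1, 1]
selected, best = recursive_feature_addition(
    X, y, ranked=[0, 1, 2], score_fn=nearest_centroid_accuracy)
print(selected, best)
```

The constant feature is skipped because adding it does not change the score, which is the behaviour that keeps uninformative features out of the selected subset.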

Methodology

The overall model of our approach is depicted in Fig. 3. The input data are the training records of the data set, including their targets. These records have been collected and organized as a standard benchmark data set (in this work, the ISCX 2012 data set). The training records are then passed to the feature selection module, which selects the best subset of features according to the employed feature selection method, RFA.

Next, the training phase works on the selected features by training a

Methodology description

In this section, we will explain the measurements that we used in our experiments as well as the detailed steps of our methodology.
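The standard IDS measurements mentioned in this paper (accuracy, detection rate, false alarm rate and F-measure) can be computed from a confusion matrix as sketched below. The exact formula of the Combined metric is not given in this excerpt, so `combined_score` is a hypothetical combination shown only to illustrate the idea of folding several measurements into one number; the counts are made up.

```python
def ids_metrics(tp, tn, fp, fn):
    """Standard measurements from confusion-matrix counts (attack = positive)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    detection_rate = tp / (tp + fn)          # also called recall
    false_alarm_rate = fp / (fp + tn)        # normal traffic flagged as attack
    precision = tp / (tp + fp)
    f_measure = 2 * precision * detection_rate / (precision + detection_rate)
    return {
        "accuracy": accuracy,
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "f_measure": f_measure,
    }

def combined_score(m):
    # ASSUMPTION: an illustrative combination, not confirmed by the text above.
    # It rewards accuracy and detection rate and penalises false alarms.
    return (m["accuracy"] + m["detection_rate"]) / 2 - m["false_alarm_rate"]

# Made-up counts: 100 attacks (90 detected) and 900 normal flows (20 false alarms).
m = ids_metrics(tp=90, tn=880, fp=20, fn=10)
score = combined_score(m)
print(m, score)
```

A single scalar of this kind makes it easier to rank systems that trade accuracy against false alarms, which is the stated motivation for the Combined metric.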

Feature extraction and data set preparation

In this section, we will explain how we reduced the size of the data set in terms of the number of features. We start with how we encoded the string features into 4k features. Next, we explain how we generated data sets of different sizes (in terms of the number of examples).

The ISCX data set consists of different types of features: numeric, categorical, datetime, and strings. Usually the packet header information is represented by a mixture of the above types, but the payload features are usually
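One way to turn a payload string into a fixed number of bigram features is sketched below. Hashing each character bigram into 4096 buckets is an assumption made for illustration; the excerpt does not say how the 4k features are obtained, and the function name and sample payload are hypothetical.

```python
import zlib

def bigram_features(payload, n_buckets=4096):
    """Count character bigrams of `payload` in a fixed-length vector.

    ASSUMPTION: bigrams are mapped to buckets with a CRC32 hash so that the
    vector length stays fixed regardless of which bigrams occur.
    """
    vec = [0] * n_buckets
    for i in range(len(payload) - 1):
        bigram = payload[i:i + 2]                      # two adjacent characters
        bucket = zlib.crc32(bigram.encode("utf-8")) % n_buckets
        vec[bucket] += 1
    return vec

vec = bigram_features("GET /index.html")   # sample payload, made up
print(len(vec), sum(vec))                   # vector length and bigram count
```

Each payload, whatever its length, becomes a numeric vector of the same size, which is what a feature selection method or classifier needs as input.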

Feature selection on ISCX data set

As mentioned before, this step involves applying a feature selection method to the ISCX data set. In the next sections, we will apply RFA to the ISCX data set and analyze the results of that method on the four generated data sets. For each data set, we repeated the experiment 30 times and used 3-fold cross-validation for testing.
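The protocol of 30 repetitions with 3-fold cross-validation can be sketched as follows. Stratifying the folds by class, the seed handling and the majority-class baseline used as the evaluator are assumptions for illustration; in practice the evaluator would train and test the actual classifier on the selected features.

```python
import random

def stratified_folds(y, k, rng):
    """Split indices into k folds, keeping class proportions roughly equal."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def repeated_cv(y, evaluate, repeats=30, k=3, seed=0):
    """Run k-fold cross-validation `repeats` times and collect every fold score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            scores.append(evaluate(train_idx, test_idx))
    return scores

y = [0] * 6 + [1] * 3   # made-up labels: 6 normal flows, 3 attacks

def majority_baseline(train_idx, test_idx):
    """Hypothetical evaluator: predict the most common training label."""
    train_labels = [y[i] for i in train_idx]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(y[i] == guess for i in test_idx) / len(test_idx)

scores = repeated_cv(y, majority_baseline)
print(len(scores), sum(scores) / len(scores))
```

Averaging over many reshuffled splits reduces the influence of any single lucky or unlucky partition, which matters when the data sets are small.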

Results of RFA application on ISCX data set

In order to observe the effect of including the payload features in improving the detection accuracy, we conducted a crucial experiment. We measured the SVM's classification accuracy and F-measure on all the ISCX data sets without the payload features and with payload features. We measured the performance metrics after converting the payload features to bigram features and applying RFA feature selection. The goal of this experiment is to show that these payload features include important and

Conclusions and future work

This paper presents a new feature selection-based network intrusion detection system. The proposed system was tested on the ISCX 2012 data set. Since new attacks now try to deceive NIDS by distributing the attack packets over a long period of time, the system is designed to deal with a small number of examples and a large number of features. We prepared the data set for intrusion detection by encoding the payload features (long strings) using a bigram technique. While

Tarfa Hamed has obtained his PhD recently from the School of Computer Science at the University of Guelph. His research interests are Intrusion detection, Machine learning, Pattern recognition, and Feature selection.

References (54)

  • Z. Zeng et al.

    A novel feature selection method considering feature interaction

    Pattern Recognit

    (2015)
  • M.H. Aghdam et al.

    Feature selection for intrusion detection system using ant colony optimization

    IJ Netw Secur

    (2016)
  • M.A. Ambusaidi et al.

    Building an intrusion detection system using a filter-based feature selection algorithm

    IEEE Trans Comput

    (2016)
  • E.B. Beigi et al.

    Towards effective feature selection in machine learning-based botnet detection approaches

    (2014)
  • S. Beniwal et al.

    Classification and feature selection techniques in data mining

    Int J Eng Res Technol

    (2012)
  • T. Bernecker et al.

    Quality of similarity rankings in time series

    Adv Spat Temporal Databases

    (2011)
  • V. Bolón-Canedo et al.

    A review of feature selection methods on synthetic data

    Knowl Inf Syst

    (2013)
  • B.E. Boser et al.

    A training algorithm for optimal margin classifiers

    (1992)
  • C.-C. Chang et al.

    LIBSVM: a library for support vector machines

    ACM Trans Intelligent Syst Technol

    (2011)
  • H. Debar

    An introduction to intrusion-detection systems

    (2002)
  • T. Dietterich

    Overfitting and undercomputing in machine learning

    ACM Comput Surv

    (1995)
  • M. Draminski et al.

    Monte Carlo feature selection for supervised classification

    Bioinformatics

    (2008)
  • H. Gharaee et al.

    A new feature selection IDS based on genetic algorithm and SVM

    (2016)
  • I. Guyon

    Practical feature selection: from correlation to causality

    NATO Sci Peace Secur

    (2008)
  • I. Guyon et al.

    An introduction to variable and feature selection

    J Mach Learn Res

    (2003)
  • I. Guyon et al.

    Gene selection for cancer classification using support vector machines

    Mach Learn

    (2002)
  • T. Hamed et al.

    An accurate, fast embedded feature selection for SVMs

    (2014)
Rozita Dara is an Assistant Professor at the School of Computer Science, University of Guelph, Canada, where she has established the Data Management and Privacy Governance Laboratory. Prior to her academic position, she served as a research scientist in industry and government. She received her PhD from the University of Waterloo in 2007.

Stefan C. Kremer has recently become a Professor in the School of Computer Science at the University of Guelph. His research interests include Machine Learning, Deep Learning and Dynamical Recurrent Networks, and he has applied his work to the domains of genomics, bioinformatics, proteomics and natural language processing.