Elsevier

Expert Systems with Applications

Volume 64, 1 December 2016, Pages 330-339
OCPAD: One class Naive Bayes classifier for payload based anomaly detection

https://doi.org/10.1016/j.eswa.2016.07.036

Highlights

  • OCPAD: Multinomial Bayesian one class classifier for anomalous payload detection.

  • A tree to store probability ranges of ngrams found in non malicious payloads.

  • OCPAD has high Detection Rate and low False Positives.

  • Theoretical and experimental comparison with other methods.

Abstract

Application specific attack detection requires packet payload analysis. Current payload analysis techniques suffer from failed detection because they use only the presence or absence of a packet's short sequences in a knowledge-base created from non-malicious packets. In this paper, we describe OCPAD, a content anomaly detection method that identifies network packets with suspicious payload content. The proposed method combines the benefits of one class classification and frequency information of short sequences.

We adapt a one class Multinomial Naive Bayes classifier as an anomaly detector for detecting HTTP attacks. OCPAD uses the likelihood of each short sequence’s occurrence in the payload of known non-malicious packets as a measure to derive the degree of maliciousness of a packet. In the training phase, OCPAD generates the likelihood range of each sequence’s occurrence from every packet. To store the likelihood ranges of these sequences, we propose a novel and efficient data structure called ProbabilityTree. In the testing phase, OCPAD treats a short sequence as anomalous if it is not found in the database or if its likelihood of occurrence in a packet is outside the range found in the training phase. Using the likelihood of anomalous short sequences, it generates a class label for a test packet. Our experiments with a large dataset of 1 million HTTP packets collected from an academic network revealed that OCPAD has a high Detection Rate (up to 100%) compared to previous methods and an acceptable rate of False Positives (less than 0.6%).

Introduction

Application specific and targeted attacks are on an ever increasing trend (Micro, 2016). Application specific attacks include buffer overflows, command injection attacks, scripting attacks, etc. These attacks may not be easily detected by an Intrusion Detection System (IDS) which inspects only header or flow level data, as there may not be any visible changes in patterns at the header or flow level (Alizadeh, Khoshrou, & Zúquete, 2015). Detecting these attacks requires application level data or content analysis, including application semantics. Application level data forms the payload of a packet, hence payload analysis is required to detect application specific attacks. Attack detection methods can be divided into misuse (Roesch, 1999) and anomaly detection (Kim, Cho, Kang, & Kang, 2011) methods. A misuse detection system has signatures of malicious payloads, whereas an anomaly detection system models the normal payload and identifies deviations to detect malicious cases. An anomaly detection engine designed for detecting application specific attacks is a kind of expert system. Like any other expert system, it has a knowledge-base, as it learns the behavioral aspects of normal application payload. When this system is put into use, it classifies every payload as either normal or malicious.

Existing misuse and anomaly based detection methods show varied performance in detecting these attacks. The majority of payload based anomaly detection methods work by creating a database of short sequences from the payload of known non malicious packets. This database represents the normal behavior profile of the application. To detect intrusions, short sequences of a test packet are matched against the sequences in the database. Based on the number of found or not found sequences, a score for the packet is derived which indicates its degree of deviation from the normal profile. If the generated score crosses a threshold α, the packet is declared anomalous (an intrusion). Several methods also use machine learning techniques to classify the payload using these short sequences as features (Ariu, Tronci, & Giacinto, 2011; Jamdagni, Tan, He, Nanda, & Liu, 2013; Perdisci, Ariu, Fogla, Giacinto, & Lee, 2009; Perdisci, Gu, & Lee, 2006). Further, it is common practice to use one class classification techniques or unsupervised learning methods for anomaly detection (Chandola, Banerjee, & Kumar, 2009; Gates & Taylor, 2007). This is motivated by the difficulty of finding a labeled dataset with normal and attack traffic in equal proportion (McHugh, 2000) to train a two class classification algorithm.
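The presence/absence scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited method: the function names, the sequence length n = 3, the use of the fraction of unseen sequences as the score, and the default value of the threshold alpha are all hypothetical choices made for the example.

```python
def ngrams(payload: bytes, n: int = 3):
    """Return the overlapping byte sequences of length n in a payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def build_profile(normal_payloads, n: int = 3):
    """Normal profile: the set of short sequences seen in non malicious payloads."""
    profile = set()
    for payload in normal_payloads:
        profile.update(ngrams(payload, n))
    return profile


def anomaly_score(payload: bytes, profile, n: int = 3) -> float:
    """Assumed score: fraction of the payload's sequences not found in the profile."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0  # payload shorter than n: nothing to compare
    unseen = sum(1 for g in grams if g not in profile)
    return unseen / len(grams)


def is_anomalous(payload: bytes, profile, alpha: float = 0.5, n: int = 3) -> bool:
    """Declare the packet anomalous when the score crosses the threshold alpha."""
    return anomaly_score(payload, profile, n) > alpha
```

Because only membership in the profile is checked, the number of times a sequence occurs in the test payload has no influence on the score.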

Advanced mimicry attacks proposed by Kolesnikov, Dagon, and Lee (2006) were able to evade many detection methods that extract a fixed number of features, such as PAYL (Wang & Stolfo, 2004). To defeat these mimicry attacks, Anagram (Wang, Parekh, & Stolfo, 2006) proposed randomizing the sequence length while deriving the score for a packet. Hubballi, Biswas, and Nandi (2010) showed that using frequency information of short sequences is useful in reducing false negatives. This is based on the observation that the accidental presence in a normal packet of a short sequence which is otherwise seen only in malicious packets can allow a malicious packet to evade detection if only the sequence’s presence or absence is taken into account while calculating the score. Further, over-occurrence of a particular short sequence in a payload may also signal an intrusion. A sample payload of a buffer overflow attack is shown in Fig. 1. It can be seen from this figure that the payload contains a large number of A’s, which are meant to overflow the buffer in a target process and hijack its execution. If a binary comparison of short sequences is done and by chance one short sequence containing only A’s is found in the normal profile, this packet will not be detected as anomalous, as almost all of its sequences will be declared as found in the profile.
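The evasion scenario above can be reproduced with a small experiment. The payloads below are invented for illustration (they are not the payload of Fig. 1), and the sequence length of 3 and the helper names are assumptions.

```python
from collections import Counter


def ngrams(payload: bytes, n: int = 3):
    """Return the overlapping byte sequences of length n in a payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def unseen_fraction(payload: bytes, profile, n: int = 3) -> float:
    """Binary comparison: fraction of sequences absent from the profile."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(g not in profile for g in grams) / len(grams)


def relative_frequency(payload: bytes, gram: bytes) -> float:
    """Share of the payload's sequences that are equal to `gram`."""
    grams = ngrams(payload, len(gram))
    if not grams:
        return 0.0
    return Counter(grams)[gram] / len(grams)


# Hypothetical normal request that happens to contain a short run of A's.
normal = b"GET /AAAA.html"
# Hypothetical overflow attempt: the same prefix followed by 200 A's.
attack = b"GET /" + b"A" * 200

profile = set(ngrams(normal))

binary_score = unseen_fraction(attack, profile)   # every sequence is "found"
freq_normal = relative_frequency(normal, b"AAA")  # small share in normal traffic
freq_attack = relative_frequency(attack, b"AAA")  # dominates the attack payload
```

Here the binary score of the attack payload is zero, since every one of its sequences also occurs in the normal request, while the relative frequency of the sequence AAA differs sharply between the two payloads. That difference is the signal a frequency based method can use.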

One class classification is useful for building a model for payload anomaly detection, and frequency information is useful in reducing false negatives. In this paper we propose a novel one class classification method to accurately detect application specific intrusions. Our method combines the advantages of both frequency based methods and one class classification techniques. We calculate the occurrence probability range of a particular short sequence from known non malicious packets. This probability range serves as an indicator of the permitted upper and lower bounds on the number of times a particular short sequence appears in a packet. A packet whose payload has a large number of short sequences deviating from their respective identified ranges will be detected as an anomaly. Specifically, in this paper we make the following contributions.

  • We describe a version of Multinomial Bayesian one class classification technique for accurately detecting anomalous payloads.

  • We propose an efficient data structure called ProbabilityTree to store the probability ranges of short sequences found in non malicious payloads in the profile creation phase.

  • We experiment with a large HTTP dataset and report the performance of our anomaly detection system in terms of its Detection Rate and False Positive Rate.

The rest of the paper is organized as follows. In Section 2, we describe a few of the works most closely related to ours. In Section 3, we elaborate our proposed one class classification scheme for detecting anomalous payloads. Experimental details are provided in Section 4, followed by the conclusion in Section 5.

Section snippets

Prior work

Several models use ngram based techniques (ngram is a synonym for short sequence) to detect anomalous payloads. A few of the most closely related works are described below.

Wang and Stolfo (2004) proposed PAYL, which uses 1-grams to extract 256 features with their respective frequencies from normal traffic. Each feature corresponds to one of the 256 ASCII values. The average of all the feature vectors is called the centroid and represents the normal profile of an application. In order to

One class classification model

In this section we describe our proposed model, OCPAD, for detecting anomalous payloads in network traffic. Our model is based on the occurrence probability of an ngram in a packet. The class of a packet (normal/anomaly) is computed from the probability of each ngram using a version of the Multinomial one class Naive Bayes classifier.
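The idea of per-ngram occurrence probability ranges can be illustrated with the following sketch. It is not the OCPAD algorithm itself: a plain dictionary stands in for the ProbabilityTree, the occurrence probability is assumed to be the ngram's count divided by the number of ngrams in the packet, and the final decision rule (the fraction of deviating ngrams compared against a threshold) is a simplifying assumption in place of the Multinomial one class Naive Bayes computation. All names and default values are hypothetical.

```python
from collections import Counter


def ngram_probabilities(payload: bytes, n: int = 3) -> dict:
    """Assumed occurrence probability: count of an ngram / ngrams in the packet."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def train(normal_payloads, n: int = 3) -> dict:
    """Record the (min, max) occurrence probability of each ngram over all
    normal packets. A dict is used here in place of the ProbabilityTree."""
    ranges = {}
    for payload in normal_payloads:
        for gram, p in ngram_probabilities(payload, n).items():
            low, high = ranges.get(gram, (p, p))
            ranges[gram] = (min(low, p), max(high, p))
    return ranges


def deviating_fraction(payload: bytes, ranges: dict, n: int = 3) -> float:
    """Fraction of distinct ngrams that are unseen or fall outside their range."""
    probs = ngram_probabilities(payload, n)
    if not probs:
        return 0.0
    deviating = 0
    for gram, p in probs.items():
        bounds = ranges.get(gram)
        if bounds is None or not (bounds[0] <= p <= bounds[1]):
            deviating += 1
    return deviating / len(probs)


def classify(payload: bytes, ranges: dict, threshold: float = 0.5, n: int = 3) -> str:
    """Assumed decision rule: label by the share of deviating ngrams."""
    return "anomaly" if deviating_fraction(payload, ranges, n) > threshold else "normal"
```

For example, after training on the two hypothetical payloads b"GET /AAAA.html" and b"GET /index.html", a request consisting of b"GET /" followed by 200 A's is labeled an anomaly even though all of its ngrams occur in the training data, because their occurrence probabilities fall outside the learned ranges.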

Experimental evaluation

In this section we describe the experiments done to evaluate the proposed one class classification model. For the experiments, we use a large dataset collected from our institute network. Two types of datasets are used in the experiments. The first is a normal dataset and the second is an abnormal (attack) dataset. Details pertaining to these two datasets are given below.

Normal dataset: This dataset was collected at the institute network of IIT Indore. There are around 600 hosts in this network.

Conclusion

State of the art content anomaly detectors suffer from failed detection as they use only the presence or absence of ngrams from the training dataset. In this paper we described OCPAD, a content anomaly detector for detecting anomalous packet payloads in network traffic. Unlike previous methods, OCPAD uses the occurrence probability range of each ngram in any payload and uses one class Naive Bayes classification to detect anomalies. Like any other anomaly detector, OCPAD has a training phase and testing

References (32)

  • T. Detristan et al.

    Polymorphic shellcode engine using spectrum analysis

    Phrack Issue

    (2003)
  • P. Fogla et al.

    Polymorphic blending attacks

    Usenix-ss’06: Proceedings of the 15th conference on usenix security symposium

    (2006)
  • C. Gates et al.

    Challenging the anomaly detection paradigm: A provocative discussion

    Nspw ’06: Proceedings of the 2006 workshop on new security paradigms

    (2007)
  • C.T. Gimenez et al.

    Combining expert knowledge with automatic feature extraction for reliable web attack detection

    Security and Communication Networks

    (2012)
  • N. Görnitz et al.

    Toward supervised anomaly detection

    Journal of Artificial Intelligence Research

    (2013)
  • N. Hubballi et al.

    Layered higher order n-grams for hardening payload based anomaly intrusion detection

    Fares ’10: Proceedings of the workshop on frontiers of availability, reliability and security

    (2010)