OCPAD: One class Naive Bayes classifier for payload based anomaly detection
Introduction
Application-specific and targeted attacks are steadily increasing (Micro, 2016). Application-specific attacks include buffer overflows, command injection attacks, scripting attacks, etc. These attacks may not be easily detected by an Intrusion Detection System (IDS) that inspects only header or flow-level data, as there may not be any visible change in patterns at the header or flow level (Alizadeh, Khoshrou, & Zúquete, 2015). Detecting these attacks requires analysis of application-level data or content, including application semantics. Application-level data forms the payload of a packet; hence payload analysis is required to detect application-specific attacks. Attack detection methods can be divided into two categories: misuse detection (Roesch, 1999) and anomaly detection (Kim, Cho, Kang, & Kang, 2011). A misuse detection system holds signatures of malicious payloads, whereas an anomaly detection system models the normal payload and identifies deviations from it to detect malicious cases. An anomaly detection engine designed for detecting application-specific attacks is a kind of expert system. Like any other expert system, it has a knowledge base, as it learns the behavioral aspects of normal application payload. When this system is put into use, it classifies every payload as either normal or malicious.
Existing misuse and anomaly based detection methods show varied performance in detecting these attacks. The majority of payload based anomaly detection methods work by creating a database of short sequences from the payloads of known non-malicious packets. This database represents the normal behavior profile of the application. To detect intrusions, the short sequences of a test packet are matched against the sequences in the database. Based on the number of sequences found or not found, a score indicating the packet's degree of deviation from the normal profile is derived. If this score crosses a threshold α, the packet is declared anomalous (an intrusion). Several methods also use machine learning techniques to classify the payload using these short sequences as features (Ariu, Tronci, & Giacinto, 2011; Jamdagni, Tan, He, Nanda, & Liu, 2013; Perdisci, Ariu, Fogla, Giacinto, & Lee, 2009; Perdisci, Gu, & Lee, 2006). Further, it is common practice to use one class classification techniques or unsupervised learning methods for anomaly detection (Chandola, Banerjee, & Kumar, 2009; Gates & Taylor, 2007). This is motivated by the difficulty of finding a labeled dataset containing normal and attack traffic in equal proportion (McHugh, 2000) to train a two class classification algorithm.
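The sequence-matching scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of any cited system: the function names, the sequence length `n = 3`, the choice of score (fraction of sequences not found in the profile), and the default threshold `alpha = 0.5` are all hypothetical.

```python
def ngrams(payload: bytes, n: int = 3):
    """Return every overlapping n-byte sequence of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def build_profile(normal_payloads, n: int = 3):
    """Collect the set of short sequences seen in known non-malicious payloads."""
    profile = set()
    for payload in normal_payloads:
        profile.update(ngrams(payload, n))
    return profile


def anomaly_score(payload: bytes, profile, n: int = 3) -> float:
    """Fraction of the payload's short sequences that are absent from the profile.

    Assumption: this ratio stands in for the 'degree of deviation' score.
    """
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    missing = sum(1 for g in grams if g not in profile)
    return missing / len(grams)


def is_anomalous(payload: bytes, profile, alpha: float = 0.5, n: int = 3) -> bool:
    """Declare the packet anomalous when its score crosses the threshold alpha."""
    return anomaly_score(payload, profile, n) > alpha
```

A payload made only of sequences seen during training scores 0.0, while a payload sharing no sequence with the profile scores 1.0; in practice the threshold would be tuned on held-out normal traffic.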
Advanced mimicry attacks proposed by Kolesnikov, Dagon, and Lee (2006) were able to evade many detection methods that extract a fixed number of features, such as PAYL (Wang & Stolfo, 2004). To defeat these mimicry attacks, Anagram (Wang, Parekh, & Stolfo, 2006) proposed randomizing the sequence length while deriving the score for a packet. Hubballi, Biswas, and Nandi (2010) showed that using the frequency information of short sequences is useful in reducing false negatives. This is based on the observation that a short sequence which is otherwise seen only in malicious packets may accidentally be present in a normal packet; a malicious packet can then evade detection if only the sequence's presence or absence is taken into account while calculating the score. Further, the over-occurrence of a particular short sequence in a payload may also signal an intrusion. A sample payload of a buffer overflow attack is shown in Fig. 1. It can be seen from this figure that the payload contains a large number of A's, which are meant to overflow a buffer in the target process and hijack its execution. If only a binary comparison of short sequences is done and, by chance, one short sequence containing only A's is found in the normal profile, this packet will not be detected as anomalous, because almost all of its sequences will be declared as found in the profile.
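The evasion described above can be reproduced with a toy example. The payloads below are invented for illustration (they are not taken from Fig. 1 or from the dataset), and the sequence length of 3 is an assumption.

```python
from collections import Counter


def ngrams(payload: bytes, n: int = 3):
    """Return every overlapping n-byte sequence of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


# Hypothetical normal request that happens to contain the sequence b"AAA" once.
normal = b"GET /img/AAA.png HTTP/1.1"
# Hypothetical overflow payload: a long run of A's.
attack = b"A" * 200

profile_set = set(ngrams(normal))         # presence/absence profile
profile_counts = Counter(ngrams(normal))  # frequency profile

# Binary comparison: every sequence of the attack is b"AAA", which is in the
# profile, so no sequence is reported as missing.
binary_misses = sum(1 for g in ngrams(attack) if g not in profile_set)

# Frequency comparison: the same sequence occurs far more often than in
# normal traffic, which exposes the attack.
attack_counts = Counter(ngrams(attack))

print(binary_misses)            # 0
print(profile_counts[b"AAA"])   # 1
print(attack_counts[b"AAA"])    # 198
```

With presence-only matching the attack looks perfectly normal, whereas the occurrence count of `b"AAA"` (198 against 1) makes the deviation obvious.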
While one class classification is useful for building a payload anomaly detection model and frequency information is useful in reducing false negatives, in this paper we propose a novel one class classification method to accurately detect application-specific intrusions. Our method combines the advantages of both frequency based methods and one class classification techniques. We calculate the occurrence probability range of each short sequence from known non-malicious packets. This range serves as an indicator of the permitted upper and lower bounds on the number of times a particular short sequence appears in a packet. A packet whose payload has a large number of short sequences deviating from their respective ranges is detected as an anomaly. Specifically, in this paper we make the following contributions.
- We describe a version of Multinomial Bayesian one class classification technique for accurately detecting anomalous payloads.
- We propose an efficient data structure called Probability Tree to store the probability ranges of short sequences found in non malicious payloads in the profile creation phase.
- We experiment with a large HTTP dataset and report the performance of our anomaly detection system in terms of its Detection Rate and False Positive Rate.
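To make the idea of per-sequence probability ranges concrete, the sketch below stores a lower and upper bound for each short sequence in a byte-level trie and counts how many sequences of a test payload fall outside their range. It is a hypothetical reading of the description above, not the paper's Probability Tree: the class layout, the sequence length of 3, the definition of occurrence probability as count divided by the number of sequences in the packet, and the decision to treat unseen sequences as deviating are all assumptions.

```python
from collections import Counter


def ngram_probabilities(payload: bytes, n: int = 3):
    """Map each ngram to its relative frequency within this one payload."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


class ProbabilityTree:
    """Hypothetical trie: one level per byte; the last level maps the final
    byte to a (min_p, max_p) pair. All stored ngrams must share one length."""

    def __init__(self):
        self.root = {}

    def update(self, gram: bytes, p: float) -> None:
        node = self.root
        for byte in gram[:-1]:
            node = node.setdefault(byte, {})
        lo, hi = node.get(gram[-1], (p, p))
        node[gram[-1]] = (min(lo, p), max(hi, p))

    def lookup(self, gram: bytes):
        node = self.root
        for byte in gram[:-1]:
            node = node.get(byte)
            if node is None:
                return None
        return node.get(gram[-1])


def train(normal_payloads, n: int = 3) -> ProbabilityTree:
    """Profile creation: widen each ngram's range over all normal payloads."""
    tree = ProbabilityTree()
    for payload in normal_payloads:
        for gram, p in ngram_probabilities(payload, n).items():
            tree.update(gram, p)
    return tree


def deviating_fraction(payload: bytes, tree: ProbabilityTree, n: int = 3) -> float:
    """Fraction of distinct ngrams that are unseen or outside their range."""
    probs = ngram_probabilities(payload, n)
    if not probs:
        return 0.0
    deviating = 0
    for gram, p in probs.items():
        bounds = tree.lookup(gram)
        if bounds is None or not (bounds[0] <= p <= bounds[1]):
            deviating += 1
    return deviating / len(probs)
```

A trie shares common prefixes between sequences and gives lookups whose cost depends only on the sequence length, which is one plausible reason to prefer a tree over a flat table; a packet would be flagged when `deviating_fraction` exceeds a tuned threshold.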
The rest of the paper is organized as follows. In Section 2, we describe a few other works closely related to ours. In Section 3, we elaborate on our proposed one class classification scheme for detecting anomalous payloads. Experimental details are provided in Section 4, followed by the conclusion in Section 5.
Prior work
Several models use ngram based techniques to detect anomalous payloads (an ngram is a synonym for a short sequence). A few of the most closely related works are described below.
Wang and Stolfo (2004) proposed PAYL, which uses 1-grams to extract 256 features with their respective frequencies from normal traffic. Each feature corresponds to one of the 256 possible byte (extended ASCII) values. The average of all the feature vectors is called the centroid and represents the normal profile of an application. In order to
One class classification model
In this section we describe our proposed model, OCPAD, for detecting anomalous payloads in network traffic. Our model is based on the occurrence probability of an ngram in a packet. The class of a packet (normal or anomalous) is determined from the probability of each ngram using a version of the Multinomial one class Naive Bayes classifier.
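As a rough illustration of how a multinomial Naive Bayes model can be used with a single class, the sketch below estimates ngram probabilities from normal payloads only and scores a test payload by its average log-likelihood. This is a generic textbook-style formulation, not OCPAD's exact classifier: the class name, Laplace smoothing over all 256^n possible ngrams, the length normalization, and the externally supplied threshold are all assumptions.

```python
import math
from collections import Counter


def ngrams(payload: bytes, n: int = 3):
    """Return every overlapping n-byte sequence of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


class OneClassMultinomialNB:
    """Hypothetical one class model trained on normal payloads only."""

    def __init__(self, n: int = 3, smoothing: float = 1.0):
        self.n = n
        self.smoothing = smoothing
        self.counts = Counter()
        self.total = 0

    def fit(self, normal_payloads):
        for payload in normal_payloads:
            self.counts.update(ngrams(payload, self.n))
        self.total = sum(self.counts.values())
        return self

    def log_prob(self, gram: bytes) -> float:
        """Smoothed log P(gram | normal); unseen ngrams get a small probability."""
        vocab = 256 ** self.n
        return math.log((self.counts[gram] + self.smoothing)
                        / (self.total + self.smoothing * vocab))

    def score(self, payload: bytes) -> float:
        """Average log-likelihood per ngram, treating ngrams as independent."""
        grams = ngrams(payload, self.n)
        if not grams:
            return 0.0
        return sum(self.log_prob(g) for g in grams) / len(grams)

    def is_anomalous(self, payload: bytes, threshold: float) -> bool:
        """Flag payloads whose likelihood under the normal class is too low."""
        return self.score(payload) < threshold
```

Averaging over the number of ngrams keeps long packets from being penalized merely for their length; the threshold would typically be chosen from the score distribution of held-out normal traffic.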
Experimental evaluation
In this section we describe the experiments carried out to evaluate the proposed one class classification model. For the experiments, we use a large dataset collected from our institute network. Two types of datasets are used: the first is a normal dataset and the second is an abnormal, or attack, dataset. Details pertaining to these two datasets are given below.
Normal dataset: This dataset was collected at the institute network of IIT Indore. There are around 600 hosts in this network.
Conclusion
State-of-the-art content anomaly detectors suffer from missed detections, as they use only the presence or absence of ngrams from the training dataset. In this paper we described OCPAD, a content anomaly detector for detecting anomalous packet payloads in network traffic. Unlike previous methods, OCPAD uses the occurrence probability range of each ngram in a payload and uses one class Naive Bayes classification to detect anomalies. Like any other anomaly detector, OCPAD has a training phase and testing
References
- et al. HMMPayl: An intrusion detection system based on hidden Markov models. Computers & Security (2011)
- et al. Intrusion detection using GSAD model for HTTP traffic on web services. IWCMC ’10: Proceedings of the 6th International Wireless Communications and Mobile Computing Conference (2010)
- et al. Advanced polymorphic worms: Evading IDS by blending in with normal traffic. USENIX ’06: Proceedings of the USENIX Security Symposium (2006)
- et al. HTTP attack detection using n-gram analysis. Computers & Security (2014)
- et al. McPAD: A multiple classifier system for accurate payload-based anomaly detection. Computer Networks (2009)
- et al. Using an ensemble of one-class SVM classifiers to harden payload-based anomaly detection systems. ICDM ’06: Proceedings of the Sixth International Conference on Data Mining (2006)
- et al. Application-specific traffic anomaly detection using universal background model. IWSPA ’15: Proceedings of the 2015 ACM International Workshop on Security and Privacy Analytics (2015)
- et al. Exploiting n-gram location for intrusion detection. ICTAI ’15: 27th International Conference on Tools with Artificial Intelligence (2015)
- et al. Network malware classification comparison using DPI and flow packet headers. Journal of Computer Virology and Hacking Techniques (2016)
- et al. Anomaly detection: A survey. ACM Computing Surveys (2009)
- Polymorphic shellcode engine using spectrum analysis. Phrack Issue
- Polymorphic blending attacks. USENIX-SS ’06: Proceedings of the 15th Conference on USENIX Security Symposium
- Challenging the anomaly detection paradigm: A provocative discussion. NSPW ’06: Proceedings of the 2006 Workshop on New Security Paradigms
- Combining expert knowledge with automatic feature extraction for reliable web attack detection. Security and Communication Networks
- Toward supervised anomaly detection. Journal of Artificial Intelligence Research
- Layered higher order n-grams for hardening payload based anomaly intrusion detection. FARES ’10: Proceedings of the Workshop on Frontiers of Availability, Reliability and Security