Detecting attack signatures in the real network traffic with ANNIDA

https://doi.org/10.1016/j.eswa.2007.03.011

Abstract

In this paper, an improved version of ANNIDA for detecting attack signatures in the payload of network packets is presented. The Hamming Net artificial neural network methodology was used with good results. A review of the application’s development is followed by a summary of the modifications made in the application in order to classify real data. Application improvements are reported, solving the problems of time delays in writing/reading data in the files and data collision effects when generating numeric keys used to model data for the neural network. Test results highlight the increased accuracy and efficiency of the new application when submitted to real data from HTTP network traffic containing actual traces of attacks and legitimate data. Finally, an evaluation of the application to detect signatures in real network traffic data is presented.

Introduction

The search for content strings associated with attack signatures in packets passing through a network is performed by signature-based intrusion detection systems (IDS), such as Snort, Bro, ACME, RealSecure, and Shadow, among others. These systems have a low rate of false positives and raise alerts on attack traces, allowing countermeasures to be deployed quickly to restore the environment to normal operation. They also perform satisfactorily, because floating-point operations are rarely executed in these systems (Silva, Santos, Silva, & Montes, 2005a).

The majority of signature-based IDS are implemented using rules and filters. In general, these systems are restricted to identifying only certain malicious content strings, and most of the time they are unable to detect new types of attacks. To solve this problem, neural network approaches found in the literature (Lippmann and Cunningham, 2000; Silva et al., 2003; Silva et al., 2004; Silva et al., 2005a; Silva et al., 2005b; Silva et al., 2006) and other artificial intelligence techniques, such as soft computing, machine learning, artificial immune systems and agent-based systems (Karim, 2005; Kruegel and Toth, 2003; Paula et al., 2004), have been adopted to classify attack patterns.

At INPE (National Institute for Space Research, Brazil), a research group has been working since 2003 on the development of an application that uses artificial neural networks (ANN) to detect attack signatures in computer networks. Initial studies (Silva et al., 2003; Silva et al., 2004) evaluated the use of the Hamming Net (Fausset, 1994) for fast classification of malicious content strings in network traffic. The Hamming Net was chosen for its robustness, which enables the identification of attack variations or traces of new attacks (Silva et al., 2005b), and for its capacity for rapid detection of malicious signatures (Silva et al., 2004), because it does not update weights continuously.

From the start, Snort signatures (Rules Download, 2006) have been used as search patterns (or attack patterns), because they are fairly reliable, with updated versions freely available on the Internet. Snort signatures are stored in files labeled with the network service name or with the attack type name. In a previous paper (Silva et al., 2005a), the following classes of signatures were used: ddos, dns, exploit, finger, ftp, netbios, icmp and oracle.

Some preliminary versions of this application were built to detect signatures containing only a single string of malicious content. However, since some signatures are composed of multiple content strings, called ‘associated contents’, the application was improved to handle them (Silva et al., 2005a). Input data were remodeled in the revised version, and the application code was modified to call the Hamming Net routine for all contents of a signature to be classified. In this version, the application was named ANNIDA (Artificial Neural Network for Intrusion Detection Application) (Silva et al., 2005a).

The methodology used for searching ‘associated content strings’ in network packet payload data is shown in Fig. 1. Both content strings “SITE” and “C|3A5C|” belong to one attack signature. The files s1, s2, ..., sn are then created, where n is the number of content strings in the largest signature in the Snort rule set used. In this example, signatures with two content strings are described in two files, s1 and s2. Content strings on the same line of both files are associated content strings that belong to the same attack signature.

The Hamming Net routine is executed to search for malicious content strings in the first file, s1. As soon as it finds one, the algorithm generates an alert that this content string has been found and indicates all positions in which this suspicious data appears in the s1 file. The next files (s2 and so on) are then read one by one, in an attempt to match the content strings in the marked positions to the input patterns. If a match is made, the string is marked and the search continues until the end of the associated content strings. Finally, the marked strings are shown on the screen.
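The search over associated content strings described above can be sketched as follows. This is only an illustration under stated assumptions: the files s1, s2, ... are modeled as in-memory lists aligned by row, an exact substring test stands in for the Hamming Net routine, and the function name `find_associated` and the sample strings are hypothetical.

```python
def find_associated(payload, columns):
    """Return the signatures whose associated content strings all occur in payload.

    columns[k][i] holds the (k+1)-th content string of signature i, mirroring
    line i of file s(k+1); None means signature i has no string in that file.
    An exact substring test is used here in place of the Hamming Net routine.
    """
    first = columns[0]
    # Step 1: mark the rows of s1 whose content string appears in the payload.
    marked = [i for i, content in enumerate(first)
              if content is not None and content in payload]

    matches = []
    for i in marked:
        # Step 2: read the same row in s2, s3, ... and try to match each string.
        rest = [col[i] for col in columns[1:] if col[i] is not None]
        if all(content in payload for content in rest):
            matches.append([first[i]] + rest)
    return matches


# Hypothetical two-string signatures: row 0 is ("SITE", "EXEC"), row 1 is ("PASS", "root").
s1 = ["SITE", "PASS"]
s2 = ["EXEC", "root"]

hits = find_associated("SITE EXEC /bin/sh", [s1, s2])
misses = find_associated("SITE CHMOD 777", [s1, s2])
```

In the first call every string of row 0 is present, so that signature is reported; in the second call only the s1 string matches, so nothing is reported.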

As shown in Fig. 1, the set of weights (W) in the neural network is calculated by using the values of signatures or exemplars (S), where W = S/2 (Fausset, 1994). For example, considering a 5-bit bipolar exemplar s1 = (−1 −1 1 −1 1), the value of the weights on the input neurons is w1 = (−0.5 −0.5 0.5 −0.5 0.5), since w1 = s1/2.
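The weight calculation can be checked with a short sketch. The relation W = S/2 is taken from the text; the bias b = n/2 and the resulting net input are the usual textbook formulation of the Hamming Net and are assumed here, since this section does not state them. Function names are hypothetical.

```python
def hamming_net_weights(exemplars):
    """Return (weights, bias) for bipolar exemplars, with W = S/2 and b = n/2."""
    n = len(exemplars[0])
    weights = [[bit / 2 for bit in s] for s in exemplars]
    bias = n / 2  # assumed: standard Hamming Net bias, not given in the text
    return weights, bias


def net_input(x, w, bias):
    """Net input of one unit: the number of positions where x agrees with its exemplar."""
    return bias + sum(xi * wi for xi, wi in zip(x, w))


s1 = (-1, -1, 1, -1, 1)
weights, bias = hamming_net_weights([s1])

exact = net_input(s1, weights[0], bias)                     # all 5 bits agree
one_off = net_input((-1, -1, 1, -1, -1), weights[0], bias)  # last bit flipped
```

With these definitions the net input equals n minus the Hamming distance between the input and the exemplar, which is why no weight updates are needed once the exemplars are stored.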

The major challenge at this stage of the project (Silva et al., 2005b) was to devise a strategy for storing and retrieving data for the neural network so that up to n associated content strings, where n is greater than zero and bounded by the total number of content strings in the largest signature in the set, could be processed correctly when searching for an attack signature. The strategy adopted was to store signature content strings on associated lines of text files, which allows the data to be read correctly for classification.

At this stage, satisfactory results were obtained, mainly in recognizing signatures composed of up to 2 associated content strings, with an average of 90% correct classifications for a maximum of 500 inputs presented to the ANN. For signatures with a larger number of content strings (3 and 6, for example), the precision dropped to 70% for a total of 277 exemplar units.

A classification using Hamming Net can represent a known attack or the trace of a new attack, depending on the degree of similarity set in the application (Silva et al., 2005a). Using a 100% degree of similarity, a perfect match between an input and an attack pattern must occur. Using a lower degree of similarity it is possible to identify a variation of an attack or a new threat.
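The effect of the degree of similarity can be illustrated with a minimal sketch. It assumes that strings are encoded bit by bit as bipolar vectors and that the degree of similarity is the fraction of agreeing bits; the text does not specify either choice, and the names `to_bipolar`, `similarity` and `classify` are hypothetical.

```python
def to_bipolar(data: bytes):
    """Encode bytes as a bipolar vector: bit 1 -> +1, bit 0 -> -1 (assumed encoding)."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append(1 if (byte >> shift) & 1 else -1)
    return bits


def similarity(candidate: bytes, pattern: bytes) -> float:
    """Fraction of agreeing bits, computed as a Hamming Net net input divided by n."""
    if len(candidate) != len(pattern):
        raise ValueError("candidate and pattern must have the same length")
    x, s = to_bipolar(candidate), to_bipolar(pattern)
    n = len(s)
    agreeing = n / 2 + sum(xi * si / 2 for xi, si in zip(x, s))
    return agreeing / n


def classify(candidate: bytes, patterns, threshold: float = 1.0):
    """Return (pattern, score) for the best pattern at or above threshold, else None."""
    best = None
    for pattern in patterns:
        if len(pattern) != len(candidate):
            continue  # only equal-length patterns are compared in this sketch
        score = similarity(candidate, pattern)
        if score >= threshold and (best is None or score > best[1]):
            best = (pattern, score)
    return best


patterns = [b"SITE"]
strict = classify(b"SITe", patterns, threshold=1.0)    # requires a perfect match
relaxed = classify(b"SITe", patterns, threshold=0.95)  # tolerates small variations
```

Here the input differs from the pattern in a single bit, so it is rejected at a 100% degree of similarity but reported as a variation of the pattern when the threshold is lowered.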

Application improvements

To improve the application’s performance and precision, several modifications were carried out, including: writing and reading data from MySQL databases, changes to data structures, porting the application code to a Linux platform, better modeling of data for the neural network, further refinements of the application code, the creation of filters, and the use of real network packet data for analysis.

Test results and data analysis

To evaluate the performance and precision of the application, tests were carried out in a controlled network environment at INPE.

Test results were obtained using a 64-bit Athlon workstation running at 2.2 GHz, with 1 GB RAM and a 160 GB hard disk. Preliminary ANNIDA versions (Silva et al., 2004; Silva et al., 2005a) were implemented in the C language and compiled with DevCPP in a Windows environment. The new code runs on a Linux Slackware platform, has been written in the C language and compiled

Conclusion

Improvements in ANNIDA were carried out this year and very satisfactory results were achieved. The main changes in the application were as follows:

  • porting the code to a Linux platform;

  • modification of the data storage mechanism from text files to a MySQL database (better performance, scalability, speed in the data storage and queries, data consistency), for testing purposes;

  • detection of attack signatures in real network packet data, instead of analyzing simulated data;

  • alterations in

References (15)

  • R.P. Lippmann et al. Improving intrusion detection performance using keyword selection and neural networks. Computer Networks (2000)
  • B. Caswell et al. Snort 2 – Sistema de Detecção de Intruso Open Source (2003)
  • Dragon Network Intrusion Detection. <http://www.enterasys.com/products/ids/>. Webpage accessed in...
  • L. Fausset. Fundamentals of Neural Networks: Architectures, algorithms, and applications (1994)
  • Implementation of N-Hash. <http://www.ussrback.com/crypto/source/hash/nhash.c>. Webpage accessed in...
  • A. Karim. Computational intelligence for network intrusion detection: Recent contributions. Lecture Notes In Artificial Intelligence (2005)
  • C. Kruegel et al. Using decision trees to improve signature-based intrusion detection. Lecture Notes in Computer Science (2003)
There are more references available in the full text version of this article.
