1 Introduction

In an increasingly digital and interconnected world, it is hard to safeguard private information with conventional security systems alone. Moreover, the number of attacks on computer networks has been increasing over the past few years. Methods for detecting suspicious network activity are evolving continuously to cope with invasive behaviours characterized by vagueness, heterogeneity, convolution, and dynamic tendency [1,2,3]. More recently, the accuracy of intrusion detection and prevention systems has improved significantly with the application of artificial intelligence based algorithms. In general, intrusion detection systems (IDS) are classified into signature, anomaly, host and network based systems. In particular, signature based detection, which draws on rule sets to identify attack patterns, is widely employed [4]. Normally, when an IDS detects an ongoing attack in the computer system, it raises an alarm so that the administrator can take action [5, 6]. The construction of an efficient IDS is crucial, as attacks on the network destroy a huge volume of resources and can paralyse the entire network.

In the domain of artificial intelligence, deep learning models are currently the most dominant approach. A deep neural network can automatically learn, from raw data, the latent representations required for intrusion detection. Furthermore, the generalization ability of the deep neural network helps to prevent the IDS from overfitting by capturing relationships beyond immediate neighbours in the input [7]. One widely adopted deep neural network architecture is the stacked convolution autoencoder, which trains the network in an unsupervised manner. Hence, in this work, the intrusion detection system is developed using a Deep Autoencoder, so that the adaptive and self-learning ability of the deep neural network improves the performance of the IDS. Moreover, the Deep Autoencoder is a straightforward and essential machine learning model for extracting more robust features from raw input. However, its performance is sensitive to the number of hidden neurons chosen at initialization. Among the swarm intelligence based optimization methods, the Fruitfly algorithm is widely adopted to fix the hyperparameters in many artificial intelligence applications [8]. One of the key features of the Fruitfly Optimization algorithm is that it is reported to avoid the local optimum problem and to converge quickly towards the global optimum. As a result, the Fruitfly algorithm is used to optimize the number of neurons in the hidden layers of the deep autoencoder.

In this paper, a Deep convolution Autoencoder with more hidden layers is used to extract more discriminative features from the input. Furthermore, the neurons of the hidden layers are optimized with the Fruitfly Optimization algorithm. The missing values in the dataset are imputed with the Fuzzy C-Means Rough Parameter (FCMRP) algorithm to improve the performance.

The paper is organized as follows. In Sect. 2, the crucial intrusion detection methods are overviewed. In Sect. 3, the missing value imputation method FCMRP is presented. Section 4 presents the proposed intrusion detection method using Deep Autoencoder with Fruitfly algorithm. The computational results are presented under Sect. 5. Finally, the conclusions and future research directions are given in Sect. 6.

2 Related works

More recently, Deep Learning models have played a significant role in handling more complex representations of data. In [9], a hybridized method was introduced for handling missing values in a traffic dataset. Here, Fuzzy C-Means (FCM) was employed to approximate the missing values. Further, an evolutionary genetic algorithm was applied to optimize parameters of the FCM algorithm such as the centroids and membership functions.

Tian et al. [10] developed an IDS using a Radial Basis Function Network. The neural network can identify different intrusion activities quickly and efficiently by identifying the distinctive characteristics of malicious traffic in the network. The computational results revealed that the developed intrusion detection system is viable and effective for classifying attacks.

Mohammad et al. [11] classified network attacks in a system using a conventional neural network. Here, a two-layer multilayer perceptron was developed with the backpropagation learning algorithm. A classification accuracy of 90.78% was achieved.

Xiangmei et al. [12] utilized a neural network with a Genetic Algorithm to construct an intrusion detection model. The computational results showed that the intrusion detection system built with the developed method provides better accuracy.

Chualong et al. [13] introduced a Rough Neural Network based IDS for two-class and multiclass attacks in the network. Experiments were conducted by varying the learning rate. The constructed classifier was compared with various benchmark classifiers such as J48, Random Forest and SVM to establish the efficacy of the developed method.

Alzubi et al. [14] developed a feature selection algorithm named Binary Grey Wolf Optimization to obtain fine features from the dataset. Then, SVM was used to categorize the attacks. Experiments were conducted on the KDD Cup dataset and 99.22% accuracy was achieved. With the Binary Grey Wolf Optimization method, fourteen vital features were selected for classifying the network attacks.

Qureshi et al. [15] introduced a deep neural network based IDS with a self-taught learning procedure. The features were extracted from the pre-trained network for classification. The experiments conducted on the KDD Cup dataset showed that the developed method improved the accuracy and the Receiver Operating Characteristic (ROC) curve.

In [16], a Genetic Algorithm was applied to choose the most suitable features from the intrusion dataset. The attacks were classified with a Support Vector Machine, which obtained a true-positive rate of 0.973.

Rekha et al. [17] reviewed conventional machine learning algorithms such as Decision Tree, Random Forest, the Apriori algorithm and Artificial Neural Networks for detecting intrusions in a network. A comparative analysis across various datasets, with accuracy as the metric, was presented. Drewek-Ossowicka et al. [18] compared various neural network architectures used for creating an Intrusion Detection System. Liu et al. [19] implemented a deep Convolutional Neural Network model to detect network attacks on an organization. Initially, the raw data was preprocessed and converted into two-dimensional data. Then, more discriminative features were extracted using a Convolutional Neural Network with a Softmax classifier on the KDD-CUP 99 and NSL-KDD standard network intrusion detection datasets.

Most of the intrusion detection systems discussed in the literature utilize hand-designed feature extractors for classification. Further, the hyperparameters are not optimized and the missing values are imputed randomly. This research work addresses these limitations.

3 Pre-processing

Missing value imputation is a major challenge in many machine learning applications. In this work, clustering based imputation is used to impute the missing values. Here, the dataset is first clustered to obtain initial clusters, and the missing values are then imputed based on the cluster information.

3.1 Rough K-means centroid based imputation method (RKMC)

In RKMC, the centroid values obtained from rough K-means clustering are used to impute the missing values that exist in a dataset [20]. The complete procedure is given in Algorithm 1 and Fig. 1. The centroid is computed by

$$m_{k} = w_{l} \mathop \sum \limits_{{x_{i} \in \underline{{C_{k} }} }} \frac{{x_{i} }}{{\left| {\underline{{C_{k} }} } \right|}} + w_{u} \mathop \sum \limits_{{x_{i} \in \overline{{C_{k} }} }} \frac{{x_{i} }}{{\left| {\overline{{C_{k} }} } \right|}}$$
(1)
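As an illustration of Eq. (1), the following minimal Python sketch computes a rough centroid as the weighted sum of the mean of the lower approximation and the mean of the upper approximation, and then uses it to fill a missing attribute. The weights w_l = 0.7 and w_u = 0.3, the function names and the toy data are assumptions made for this example; they are not taken from the experiments reported here.

```python
from typing import List, Optional, Sequence


def rough_centroid(lower: Sequence[Sequence[float]],
                   upper: Sequence[Sequence[float]],
                   w_l: float = 0.7,
                   w_u: float = 0.3) -> List[float]:
    """Eq. (1): w_l * mean(lower approximation) + w_u * mean(upper approximation)."""
    dim = len(lower[0])
    mean_lower = [sum(x[d] for x in lower) / len(lower) for d in range(dim)]
    mean_upper = [sum(x[d] for x in upper) / len(upper) for d in range(dim)]
    return [w_l * mean_lower[d] + w_u * mean_upper[d] for d in range(dim)]


def impute_with_centroid(record: Sequence[Optional[float]],
                         centroid: Sequence[float]) -> List[float]:
    """Replace each missing attribute (None) by the centroid value of that attribute."""
    return [centroid[d] if value is None else value
            for d, value in enumerate(record)]


# Toy cluster (assumed data): two objects that certainly belong to the cluster,
# plus one boundary object that appears only in the upper approximation.
lower = [[1.0, 2.0], [3.0, 4.0]]
upper = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
centroid = rough_centroid(lower, upper)

# An incomplete object whose second attribute is missing.
completed = impute_with_centroid([2.5, None], centroid)
```

Giving the lower approximation the larger weight lets objects that certainly belong to the cluster dominate the imputed value, while boundary objects still contribute.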
Algorithm 1

Fig. 1 Rough K-means centroid based imputation

3.2 Fuzzy C-means rough parameter based imputation method

The FCMRP algorithm consists of three steps. Initially, the FCM clustering algorithm is applied to the objects in order to group the data into clusters. Then, the centroid and the nearest approximation in the cluster are found for each incomplete object. Finally, the missing values in each incomplete object are imputed using the lower and upper approximation of the cluster [21]. The centroid of FCM is computed by

$$m_{k} = \frac{\sum_{i = 1}^{N} u_{ik}^{v} \, x_{i}}{\sum_{i = 1}^{N} u_{ik}^{v}}$$
(2)
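The FCM centroid in Eq. (2) is a membership-weighted mean of the objects. A minimal sketch is given below, assuming that `memberships` holds the membership degree of every object in the cluster under consideration and that the exponent v defaults to 2; the function name, the default value and the toy data are hypothetical.

```python
from typing import List, Sequence


def fcm_centroid(points: Sequence[Sequence[float]],
                 memberships: Sequence[float],
                 v: float = 2.0) -> List[float]:
    """Eq. (2): mean of the objects weighted by their membership raised to the power v."""
    weights = [u ** v for u in memberships]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)]


# Toy data (assumed): the third object has zero membership in this cluster,
# so it does not influence the centroid at all.
points = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
crisp_centroid = fcm_centroid(points, [1.0, 1.0, 0.0])

# With graded memberships the centroid is pulled towards high-membership objects.
fuzzy_centroid = fcm_centroid(points, [0.9, 0.5, 0.1])
```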

In FCMRP, rough clustering is applied to the clusters obtained with the FCM method. Here, the imputation is carried out in the lower and upper approximations separately. The incomplete data in each object are imputed by

$$\text{mcx}_{i} = \begin{cases} \dfrac{\sum_{\text{mx}_{i} \in \underline{C_{h}}} \underline{\text{cx}}_{hj}}{\left| \underline{C_{h}} \right|}, & \text{if } \text{mx}_{i} \text{ exists in the lower approximation} \\ \dfrac{\sum_{\text{mx}_{i} \in \overline{C_{h}}} \overline{\text{cx}}_{hj}}{\left| \overline{C_{h}} \right|}, & \text{if } \text{mx}_{i} \text{ exists in the upper approximation} \end{cases}$$
(3)
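One way to read Eq. (3) is that a missing attribute is replaced by the mean of that attribute over the lower approximation when the incomplete object lies in the lower approximation, and over the upper approximation otherwise. The sketch below follows this reading. It assumes that `lower` and `upper` contain only complete objects and that the membership of the incomplete object in the lower approximation is already known; all names and data are hypothetical.

```python
from typing import List, Optional, Sequence


def fcmrp_impute(record: Sequence[Optional[float]],
                 lower: Sequence[Sequence[float]],
                 upper: Sequence[Sequence[float]],
                 in_lower: bool) -> List[float]:
    """Fill each missing attribute (None) with the attribute mean of one approximation."""
    members = lower if in_lower else upper
    filled = []
    for j, value in enumerate(record):
        if value is None:
            # Sum of attribute j over the chosen approximation, divided by its size.
            filled.append(sum(m[j] for m in members) / len(members))
        else:
            filled.append(value)
    return filled


# Toy cluster (assumed data): the upper approximation adds one boundary object.
lower = [[1.0, 10.0], [3.0, 30.0]]
upper = [[1.0, 10.0], [3.0, 30.0], [8.0, 80.0]]

imputed_lower = fcmrp_impute([2.0, None], lower, upper, in_lower=True)
imputed_upper = fcmrp_impute([2.0, None], lower, upper, in_lower=False)
```

Handling the two approximations separately means that an object whose cluster membership is certain is imputed only from equally certain neighbours.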

Algorithm 2 shows the steps involved in FCMRP to handle missing values in a dataset. Figure 2 depicts the FCMRP with its clusters C1, C2, C3, RC11 and RC12.

Algorithm 2

Fig. 2 FCMRP based imputation method

4 Methodology

The proposed intrusion detection system, which combines a Deep Autoencoder with Fruitfly Optimization and a backpropagation neural network (BPN), is discussed in this section.

4.1 Deep autoencoder

An autoencoder is a special kind of multilayer perceptron in which the number of neurons in the input and output layers is the same. A deep autoencoder is constructed by stacking several autoencoders, as in Deep Belief Networks (DBNs) [22,23,24]. It is used to learn more complex representations from the raw input for further analysis. The architecture of an autoencoder includes two parts, an encoder and a decoder, which are trained at each successive layer. While constructing a deep autoencoder, each layer receives its input from the previous layer. In particular, the autoencoder is trained to convert the raw input into a latent, more abstract representation, and the output is reconstructed from that compressed representation [25,26,27,28,29]. The encoder receives the raw data as input \(I\left( x \right) \in R^{d}\) and maps it into a latent representation \(H\left( x \right) \in R^{d^{\prime }}\) with the function

$$H\left( x \right) = \sigma \left( {W*I\left( { x} \right) + b} \right)$$
(4)

Here, \(\sigma\) is the sigmoid activation function, and W and b are the weight and bias respectively. The decoder part reconstructs the input at the output layer with the function given in (5).

$$R\left( x \right) = \sigma \left( {W^{\prime } *H\left( { x} \right) + b^{\prime } } \right)$$
(5)

where \(R\left( x \right)\) is the predicted output of the input \(I\left( x \right)\) from the latent representation \(H\left( { x} \right)\). The weights of the autoencoders are optimized to reduce the reconstruction error of the network. The reconstruction error is computed as,

$$L\left( {I\left( { x} \right) ,R\left( { x} \right) } \right) = ||I\left( { x} \right) - R\left( { x} \right)||^{2}$$
(6)
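The encoder, decoder and squared reconstruction error of Eqs. (4) to (6) can be traced with a small forward pass. The sketch below is illustrative only: the layer sizes (d = 3 inputs, 2 hidden neurons) and all weight values are arbitrary assumptions, and no training step is shown.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(weights: Sequence[Sequence[float]],
                   bias: Sequence[float],
                   x: Sequence[float]) -> List[float]:
    """Compute sigma(W * x + b), the common form of Eqs. (4) and (5)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def squared_error(x: Sequence[float], r: Sequence[float]) -> float:
    """Eq. (6): squared Euclidean distance between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, r))


# Arbitrary illustrative parameters (assumed, untrained).
x = [0.0, 1.0, 1.0]
W = [[0.5, -0.5, 0.25], [0.1, 0.2, -0.3]]            # encoder weights, 2 x 3
b = [0.0, 0.1]
W_prime = [[0.3, -0.2], [0.4, 0.1], [-0.6, 0.5]]     # decoder weights, 3 x 2
b_prime = [0.0, 0.0, 0.0]

h = affine_sigmoid(W, b, x)              # latent representation H(x)
r = affine_sigmoid(W_prime, b_prime, h)  # reconstruction R(x)
loss = squared_error(x, r)
```

In a stacked arrangement, the latent vector `h` of one trained autoencoder would serve as the input `x` of the next one.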

Further, the cross-entropy measure is used to compute the reconstruction error for binary values using (7).

$$L\left( {I\left( x \right),R\left( x \right)} \right) = - \sum\limits_{k = 1}^{d} \left[ {I\left( x \right)_{k} \log R\left( x \right)_{k} + \left( {1 - I\left( x \right)_{k} } \right)\log \left( {1 - R\left( x \right)_{k} } \right)} \right]$$
(7)
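For binary inputs, the cross-entropy reconstruction error sums, over the d input components, the log-likelihood of each component under the reconstruction and negates the result. A minimal sketch follows; the clipping constant `eps`, which guards against taking the logarithm of zero, is an implementation assumption and is not part of the formula.

```python
import math
from typing import Sequence


def cross_entropy(x: Sequence[float], r: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Binary cross-entropy between an input x and its reconstruction r."""
    total = 0.0
    for x_k, r_k in zip(x, r):
        r_k = min(max(r_k, eps), 1.0 - eps)  # keep log() finite (assumption)
        total += x_k * math.log(r_k) + (1.0 - x_k) * math.log(1.0 - r_k)
    return -total


# A reconstruction closer to the binary input yields a smaller error.
good = cross_entropy([1.0, 0.0], [0.9, 0.1])
poor = cross_entropy([1.0, 0.0], [0.6, 0.4])
```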

4.2 Fruitfly optimization

The Fruitfly algorithm is a population based optimization method introduced by Pan in 2012 [30]. The algorithm is inspired by the foraging behavior of fruit flies. More specifically, fruit flies can smell a food source from a very long distance. Hence, the Fruitfly algorithm has recently been adopted to optimize the parameters in various real-time applications [31, 32]. In this work, the Fruitfly algorithm is adopted to optimize the number of neurons in the hidden layers of the deep autoencoder.
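The basic search loop of fruit fly optimization can be sketched as follows. This is a simplified illustration and not the configuration used in the experiments: each fly takes a random step around the swarm location, its smell concentration judgment value is taken as the reciprocal of its distance to the origin, and the swarm moves to the best fly when that fly improves on the best value found so far. The toy fitness function, the swarm size, the step range and the iteration count are all assumptions.

```python
import math
import random
from typing import Callable, Tuple


def fruit_fly_optimize(fitness: Callable[[float], float],
                       n_flies: int = 20,
                       n_iters: int = 100,
                       seed: int = 0) -> Tuple[float, float]:
    """Minimize `fitness` over the smell concentration judgment value s."""
    rng = random.Random(seed)
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)  # swarm location
    best_s, best_smell = 0.0, math.inf
    for _ in range(n_iters):
        candidates = []
        for _ in range(n_flies):
            # Smell phase: random direction and distance around the swarm.
            x = x_axis + rng.uniform(-1.0, 1.0)
            y = y_axis + rng.uniform(-1.0, 1.0)
            dist = max(math.hypot(x, y), 1e-12)  # avoid division by zero
            s = 1.0 / dist                       # smell concentration judgment value
            candidates.append((fitness(s), s, x, y))
        smell, s, x, y = min(candidates)
        # Vision phase: the swarm flies to the best location if it improves.
        if smell < best_smell:
            best_s, best_smell = s, smell
            x_axis, y_axis = x, y
    return best_s, best_smell


def toy_fitness(s: float) -> float:
    """Hypothetical objective with its minimum at s = 0.5."""
    return (s - 0.5) ** 2


best_s, best_smell = fruit_fly_optimize(toy_fitness)
```

To tune an autoencoder, the fitness function would instead train or evaluate the network for a candidate number of hidden neurons derived from `s`; how `s` is mapped to an integer neuron count is a design choice that is not specified here.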

4.3 Proposed intrusion detection technique

The Deep Autoencoder is trained with multiple hidden layers to extract the optimal features from raw data. Here, the Fruitfly algorithm is used to optimize the hidden neurons of the deep autoencoder during training. Furthermore, a backpropagation neural network is used to classify the attacks into Denial of Service (DoS), User-to-Root (U2R), Remote-to-Local (R2L), Probe, and Normal. The complete procedure of the proposed IDS is given in Fig. 3. The procedure for learning and stacking several layers of autoencoders is presented in Fig. 4. The classification of attacks in a network system is presented in Algorithm 3.

Algorithm 3

Fig. 3 Overview of the proposed IDS

Fig. 4 Feature extraction using deep autoencoder

4.4 BPN for classification

In the domain of artificial intelligence, neural networks are a promising avenue of research for learning the complex relationships between input and target output [33,34,35,36,37].

Backpropagation learning is widely used to minimize the error of the network. Figure 5 shows the architecture of the BPN for classification of the attacks in a network, where X1 to Xn indicate the features obtained by the Deep Autoencoder from raw data and Y1 to Y5 represent the class labels of the attacks.

Fig. 5 Architecture of BPN for attack classification

5 Experimental results and discussion

The proposed IDS using a Deep Autoencoder with Fruitfly Optimization has been implemented, and its results are discussed in this section. The features obtained from the Deep Autoencoder are used to classify the attacks on a network system.

5.1 Dataset

The experiments have been conducted on the NSL_KDD and UNSW-NB15 datasets. The NSL_KDD dataset contains 60,741 records with 41 predictor variables and 1 response variable [38]. The UNSW-NB15 dataset contains 175,341 records with 42 predictor variables and 1 response variable [39]. The service attribute of the UNSW-NB15 dataset contains missing values for 94,205 objects. The objects are classified as Denial of Service (DoS), User-to-Root (U2R), Remote-to-Local (R2L), Probe, and Normal in NSL_KDD, and as Attack and Normal in the UNSW-NB15 dataset.

5.2 Experimental results

The specifications of the BOSTON X86-GPU Rack Server used to implement the proposed model are presented in Table 1.

Table 1 Boston X86-GPU rack server specifications

5.3 Missing value imputation

For experimentation, five missing-value combinations (5%, 10%, 15%, 20% and 25%) have been used with the UNSW-NB15 dataset. Further, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are employed as quantitative measures of the imputation algorithms.
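Both measures compare the values that were removed from the dataset with the values that the imputation algorithm puts back. A minimal sketch using the standard definitions is given below; the toy values are assumed and are unrelated to the reported results.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], imputed: Sequence[float]) -> float:
    """Root Mean Square Error between actual and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def mae(actual: Sequence[float], imputed: Sequence[float]) -> float:
    """Mean Absolute Error between actual and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


# Assumed toy data: four removed values and the values imputed in their place.
actual = [0.0, 1.0, 2.0, 3.0]
imputed = [1.0, 1.0, 2.0, 5.0]

rmse_value = rmse(actual, imputed)
mae_value = mae(actual, imputed)
```

Because RMSE squares each deviation before averaging, a single large imputation error raises it more than it raises MAE, which is why the two measures are usually reported together.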

A comparison between the actual and imputed values obtained with the imputation algorithms on the UNSW-NB15 dataset is presented in Table 2. For example, the position (1340, 3) contains an actual value of 0. For experimentation, it was removed and treated as a missing value in the dataset. The values obtained with RKMC and FCMRP at (1340, 3) are 1 and 0 respectively. Additionally, the value imputed by FCMRP is closer to the original value in the dataset for all combinations of missing values.

Table 2 Quantitative performance on UNSW-NB15 dataset

The computational results of RKMC and FCMRP on UNSW-NB15 are given in Table 3. The Root Mean Square Errors of RKMC with missing combinations of 5%, 10%, 15%, 20% and 25% are 7.1038, 5.0140, 4.4368, 3.4156, and 3.4039 respectively. The Root Mean Square Errors of FCMRP with missing combinations of 5%, 10%, 15%, 20% and 25% are 4.5378, 3.0600, 4.1094, 2.7115, and 2.1898 respectively. Further, the quantitative results of cluster analysis with RKMC and FCMRP on UNSW-NB15 are presented in Tables 4 and 5 respectively. FCMRP yields the better results because it handles the imprecision in datasets using fuzzy and rough sets while preserving crucial information.

Table 3 RMSE and MAE on UNSW-NB15 dataset
Table 4 Performance of RKMC on UNSW-NB15 dataset
Table 5 Performance of FCMRP on UNSW-NB15 dataset

5.3.1 Classifying attacks

The classification results of the proposed intrusion detection system are compared with various classifiers, viz. Naïve Bayes, SVM, RBFN, conventional BPN, autoencoder with softmax and autoencoder with BPN. The parameters of the Fruitfly algorithm are reported in Table 6.

Table 6 Fruitfly parameters

5.3.2 NSL_KDD dataset

The computational results of the proposed intrusion detection system on the NSL_KDD dataset are discussed here. The confusion matrices for attack classification using the autoencoder with softmax classifier, the autoencoder with backpropagation neural network, and the Fruitfly-Autoencoder with backpropagation neural network are shown in Tables 7, 8 and 9 respectively.

Table 7 Confusion matrix of autoencoder with softmax classifier on NSL_KDD dataset
Table 8 Confusion matrix of autoencoder with BPN on NSL_KDD dataset
Table 9 Confusion matrix for fruitfly-autoencoder with BPN on NSL_KDD dataset

The quantitative results of the Fruitfly-Autoencoder with BPN for intrusion detection on NSL_KDD dataset have been compared with standard pattern recognition algorithms and are given in Table 10.

Table 10 Comparative analysis of proposed intrusion detection system with existing benchmark classifiers on NSL_KDD dataset

From Table 10 and Fig. 6, it is evident that the Fruitfly-Autoencoder with BPN outperforms the existing classifiers in attack classification. The precision, recall, and F-measure are reported as averages over all classes. The highest and lowest accuracies are 94.00% for the Fruitfly-Autoencoder with BPN and 71.14% for the Naïve Bayes classifier. The classifiers Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, Autoencoder with BPN, and Fruitfly-Autoencoder with BPN provide precision of 70.10%, 87.00%, 87.70%, 89.35%, 90.05%, 91.35%, and 92.90% respectively.
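The class-averaged measures can be derived from a confusion matrix as sketched below. The sketch assumes that rows hold the actual classes and columns hold the predicted classes, and that the F-measure is computed per class before averaging; the two-class matrix is toy data and is not taken from the confusion matrices reported for the classifiers.

```python
from typing import Dict, List, Sequence


def macro_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy, error rate and class-averaged precision, recall and F-measure."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions: List[float] = []
    recalls: List[float] = []
    f_measures: List[float] = []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum: predicted as class i
        actual = sum(cm[i])                          # row sum: truly class i
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)
    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Assumed toy confusion matrix for two classes (rows: actual, columns: predicted).
metrics = macro_metrics([[8, 2], [1, 9]])
```

The error rate is simply the complement of the accuracy, so the two always sum to 100% when expressed as percentages.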

Fig. 6 Relative quantitative measures for intrusion detection system on NSL_KDD dataset

5.3.3 UNSW-NB15 dataset

The computational results of the proposed intrusion detection system on the UNSW-NB15 dataset are discussed here. The confusion matrices for attack classification using the autoencoder with softmax classifier, the autoencoder with backpropagation neural network, and the Fruitfly-Autoencoder with backpropagation neural network are shown in Tables 11, 12 and 13 respectively. The quantitative results of the Fruitfly-Autoencoder with BPN for intrusion detection on the UNSW-NB15 dataset have been compared with standard pattern recognition algorithms and are given in Table 14.

Table 11 Confusion matrix of autoencoder with softmax classifier on UNSW-NB15 dataset
Table 12 Confusion matrix of autoencoder with BPN on UNSW-NB15 dataset
Table 13 Confusion matrix of fruitfly-autoencoder with BPN on UNSW-NB15 dataset
Table 14 Comparative analysis of proposed intrusion detection system with existing benchmark classifiers on UNSW-NB15 dataset

From Table 14 and Fig. 7, it is evident that the Fruitfly-Autoencoder with BPN outperforms the existing classifiers in attack classification. The precision, recall, and F-measure are reported as averages over all classes.

Fig. 7 Relative quantitative measures for intrusion detection system on UNSW-NB15 dataset

The highest and lowest accuracies are 94.00% for the Fruitfly-Autoencoder with BPN and 89.31% for the Naïve Bayes classifier. The classifiers Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, Autoencoder with BPN, and Fruitfly-Autoencoder with BPN provide precision of 86.20%, 88.20%, 92.00%, 90.50%, 90.35%, 92.50%, and 92.50% respectively.

The classifiers Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, Autoencoder with BPN, and Fruitfly-Autoencoder with BPN provide F-Measure of 87.30%, 88.85%, 90.19%, 90.30%, 90.74%, 92.65%, and 92.84% respectively. Similarly, the Error Rates of Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, Autoencoder with BPN, and Fruitfly-Autoencoder with BPN are 10.69%, 11.80%, 10.40%, 9.70%, 7.60%, 6.90%, and 6.00% respectively. In the proposed model, the features extracted by the deep Autoencoder with multiple hidden layers are more robust than those of hand-designed feature extractors. In summary, the proposed Fruitfly-Autoencoder with BPN exhibits the highest accuracy of 94.00% and the lowest error rate of 6.00% compared with Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, and Autoencoder with BPN.

6 Conclusion and future enhancement

Intrusion detection is vital for safeguarding the crucial information in the network system of an organization. More recently, deep learning approaches have been widely employed in many real-world problems to achieve better prediction. Hence, in this research work, a hybridized method using a Deep Autoencoder with Fruitfly Optimization is introduced for classifying attacks. Initially, missing values in the dataset are imputed with the Fuzzy C-Means Rough Parameter method. Then, more discriminative features are extracted from the raw dataset using a Deep Autoencoder with more hidden layers. Furthermore, the hidden neurons of the Deep Autoencoder are optimized with the meta-heuristic Fruitfly Optimization algorithm. Finally, BPN is used to predict the attacks. To validate the proposed IDS, several experiments have been conducted on the NSL_KDD and UNSW-NB15 datasets. The proposed Fruitfly-Autoencoder with BPN produced higher accuracy than benchmark classifiers such as Naïve Bayes, SVM, RBFN, BPN, Autoencoder with Softmax, and Autoencoder with BPN. The proposed model was executed on a P100 GPU, since GPUs process multiple computations in parallel. The results show that the proposed method performs better in terms of precision, recall, F-measure, accuracy and error rate. In future, other Deep Learning models will be integrated to classify attacks in the network system. Further, the proposed approach could be enhanced by incorporating recent optimization algorithms such as cooperative coevolutionary differential evolution, whale optimization, and Biogeography-based Optimization.