research-article

Intrusion Detection Using Big Data and Deep Learning Techniques

Authors:
Osama Faker

Cankaya University, Ankara, Turkey

Erdogan Dogdu

Cankaya University, Georgia State University (adjunct), Ankara, Turkey


ACM SE '19: Proceedings of the 2019 ACM Southeast Conference, April 2019, Pages 86–93, https://doi.org/10.1145/3299815.3314439

Published: 18 April 2019

ABSTRACT

In this paper, big data and deep learning techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic: a deep feed-forward neural network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we evaluate features with a homogeneity metric. The proposed method is evaluated on two recently published datasets, UNSW NB15 and CICIDS2017, and the machine learning models are assessed with 5-fold cross-validation. We implemented the method in the distributed computing environment Apache Spark: the deep learning technique is implemented with the Keras Deep Learning Library integrated with Spark, and the ensemble techniques are implemented with the Apache Spark Machine Learning Library. On the UNSW NB15 dataset, DNN achieves high accuracy: 99.16% for binary classification and 97.01% for multiclass classification. On the CICIDS2017 dataset, the GBT classifier achieves the best accuracy for binary classification at 99.99%, while DNN has the highest accuracy for multiclass classification at 99.56%.
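
The abstract does not spell out how the homogeneity metric is applied to features, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each feature has already been discretized, treats every distinct feature value as a cluster, and scores the feature by the standard homogeneity measure h = 1 - H(C|K)/H(C), where C is the class label and K the cluster. The function names (`entropy`, `homogeneity`, `rank_features`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(classes, clusters):
    """H(C | K): entropy of the class labels given the cluster assignment."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))
    cluster_sizes = Counter(clusters)
    return -sum(
        (count / n) * math.log(count / cluster_sizes[k])
        for (k, _), count in joint.items()
    )


def homogeneity(classes, clusters):
    """Homogeneity in [0, 1]; 1 means every cluster holds a single class."""
    h_c = entropy(classes)
    if h_c == 0.0:
        # Only one class present: every clustering is trivially homogeneous.
        return 1.0
    return 1.0 - conditional_entropy(classes, clusters) / h_c


def rank_features(rows, labels, feature_names):
    """Score each (already discretized) feature column, best first.

    Assumption: distinct feature values act as cluster identifiers.
    """
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores.append((name, homogeneity(labels, column)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy flows: "proto" separates the classes, "flag" does not.
    rows = [("tcp", "a"), ("tcp", "b"), ("udp", "a"), ("udp", "b")]
    labels = ["normal", "normal", "attack", "attack"]
    for name, score in rank_features(rows, labels, ["proto", "flag"]):
        print(f"{name}: {score:.3f}")
```

Continuous features would need binning or clustering first; otherwise nearly every value forms its own cluster and the score is trivially close to 1.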

Index Terms

Intrusion Detection Using Big Data and Deep Learning Techniques
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation

Published in
ACM SE '19: Proceedings of the 2019 ACM Southeast Conference
April 2019
295 pages
ISBN: 9781450362511
DOI: 10.1145/3299815
Conference Chair: Dan Lo, Kennesaw State University
Program Chair: Donghyun Kim, Kennesaw State University
Publications Chair: Eric Gamess, Jacksonville State University
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Intrusion detection system
artificial neural networks
big data
deep learning
ensemble techniques
feature selection
machine learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 178 of 377 submissions, 47%
Article Metrics
- Total Citations: 123
- Total Downloads: 1,561
- Downloads (Last 12 months): 169
- Downloads (Last 6 weeks): 18