DOI: 10.1145/3299815.3314439
research-article

Intrusion Detection Using Big Data and Deep Learning Techniques

Published: 18 April 2019

ABSTRACT

In this paper, Big Data and Deep Learning techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic: a Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we evaluate features with a homogeneity metric. The proposed method is evaluated on two recently published datasets, UNSW NB15 and CICIDS2017, and the machine learning models are assessed with 5-fold cross-validation. We implemented the method in the distributed computing environment Apache Spark: the deep learning model is built with the Keras Deep Learning Library integrated with Spark, while the ensemble techniques are implemented with the Apache Spark Machine Learning Library. On the UNSW NB15 dataset, the DNN achieves high accuracy for both binary and multiclass classification, at 99.16% and 97.01% respectively. On the CICIDS2017 dataset, the GBT classifier achieves the best binary classification accuracy at 99.99%, while the DNN has the highest multiclass classification accuracy at 99.56%.
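The homogeneity-based feature evaluation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the homogeneity score from the V-measure framework, h = 1 − H(C|K)/H(C), where C is the class label and each distinct value of a feature is treated as a cluster K, so a feature scores 1.0 when each of its values co-occurs with only one class. The function names `homogeneity` and `rank_features`, the `top_k` cut-off and the toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def homogeneity(feature_values, class_labels):
    """Assumed metric: h = 1 - H(C|K) / H(C), with each distinct feature value as a cluster K."""
    n = len(class_labels)
    h_c = entropy(class_labels)
    if h_c == 0.0:
        return 1.0  # a single class is trivially homogeneous
    groups = defaultdict(list)
    for value, label in zip(feature_values, class_labels):
        groups[value].append(label)
    # Conditional entropy H(C|K) = sum over k of P(K=k) * H(C | K=k)
    h_c_given_k = sum(len(g) / n * entropy(g) for g in groups.values())
    return 1.0 - h_c_given_k / h_c


def rank_features(rows, labels, top_k):
    """Hypothetical helper: score every column by homogeneity, return the top_k column indices."""
    n_features = len(rows[0])
    scores = [(homogeneity([r[j] for r in rows], labels), j) for j in range(n_features)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by column index
    return [j for _, j in scores[:top_k]]


# Toy flow records: (flag, protocol, packet_count); made up for illustration only
rows = [(0, "tcp", 5), (0, "udp", 5), (1, "tcp", 7), (1, "udp", 7)]
labels = ["normal", "normal", "attack", "attack"]
print(rank_features(rows, labels, 2))  # → [0, 2]
```

Continuous attributes such as byte counts or durations would need to be discretized first, since a column with a unique value in every row trivially scores 1.0; the binning strategy is left open here because the abstract does not specify one.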
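The 5-fold cross-validation protocol can be illustrated with a small, framework-free sketch. It assumes shuffled, unstratified folds and plain accuracy as the metric, and `fit_predict` is a hypothetical stand-in for any of the three classifiers (DNN, Random Forest or GBT), retrained from scratch on each fold. Inside Spark, `pyspark.ml.tuning.CrossValidator` with `numFolds=5` provides a comparable loop, although the abstract does not say whether the authors used it.

```python
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices once and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold; return (per-fold accuracies, mean).

    Assumes len(X) >= k so that no test fold is empty.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        preds = fit_predict(
            [X[j] for j in train_idx],
            [y[j] for j in train_idx],
            [X[j] for j in test_idx],
        )
        scores.append(accuracy([y[j] for j in test_idx], preds))
    return scores, sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Hypothetical baseline classifier: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy usage with made-up data: every record is "normal", so each fold scores 1.0
X = [[i] for i in range(10)]
y = ["normal"] * 10
scores, mean = cross_validate(X, y, majority_class)
print(scores, mean)  # → [1.0, 1.0, 1.0, 1.0, 1.0] 1.0
```

Stratified folds would keep rare attack classes represented in every fold, which matters for the multiclass labels of UNSW NB15 and CICIDS2017; whether the authors stratified their folds is not stated.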

        Published in

        ACM SE '19: Proceedings of the 2019 ACM Southeast Conference
        April 2019
        295 pages
        ISBN: 9781450362511
        DOI: 10.1145/3299815
        Conference Chair: Dan Lo
        Program Chair: Donghyun Kim
        Publications Chair: Eric Gamess

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 178 of 377 submissions, 47%
