2019 | OriginalPaper | Chapter

Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems

Authors: Intisar S. Al-Mandhari, L. Guan, E. A. Edirisinghe

Published in: Advances in Information and Communication Networks

Publisher: Springer International Publishing

Abstract

Research into the use of machine learning techniques for network intrusion detection, much of it carried out on the popular public KDD Cup 99 dataset, has become commonplace during the past decade. The recent popularity of cloud-based computing and the growing awareness of its associated risks are the main drivers of this research. The research presented here demonstrates that machine learning algorithms can be used effectively to enhance the performance of existing intrusion detection systems, despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation into the underlying causes of the poor performance of some well-known machine learning classifiers, especially when learning from minor classes (rare attacks). The main factor is that the KDD Cup 99 dataset, which is used in most existing research, is imbalanced because of the nature of the intrusion detection domain: some attacks are rare while others are very frequent, so there is a significant imbalance among the classes. Depending on the number of classes considered, the imbalanced-data problem can be treated as either a binary or a multi-class problem. Most researchers focus on binary classification, because multi-class classification is more complex. In this paper, we treat the problem as a multi-class classification task and investigate the use of different machine learning algorithms to overcome the misclassification problems commonly faced by researchers who have used the imbalanced KDD Cup 99 dataset. Recommendations are made as to which classifier is best suited to classifying imbalanced data.
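To make the imbalance argument concrete, the following is a minimal sketch, not the authors' experimental setup. It assumes the grouping of KDD Cup 99 attack names into the four categories commonly used in the literature (DoS, Probe, R2L, U2R), covers only a subset of attack names, and uses made-up class counts rather than real dataset figures. The `naive_detector` function is a hypothetical classifier that has only learned the two frequent classes; macro-averaged recall is one common imbalance-aware metric, not necessarily the one used in the paper.

```python
from collections import Counter

# Illustrative subset of KDD Cup 99 attack names grouped into the four
# attack categories commonly used in the literature (assumed grouping).
CATEGORY = {
    "normal": "normal",
    "smurf": "dos", "neptune": "dos", "back": "dos", "teardrop": "dos",
    "satan": "probe", "ipsweep": "probe", "portsweep": "probe", "nmap": "probe",
    "guess_passwd": "r2l", "warezclient": "r2l", "ftp_write": "r2l",
    "rootkit": "u2r", "buffer_overflow": "u2r", "loadmodule": "u2r",
}


def to_multiclass(raw_label):
    """Map a raw label such as 'smurf.' to one of five classes."""
    return CATEGORY[raw_label.rstrip(".")]


def to_binary(raw_label):
    """Collapse the same label to the binary normal/attack view."""
    return "normal" if to_multiclass(raw_label) == "normal" else "attack"


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


def naive_detector(category):
    """Hypothetical classifier that only learned the two frequent classes."""
    return "dos" if category == "dos" else "normal"


# Made-up counts that mimic a skewed distribution; not real KDD Cup 99 figures.
raw = (["smurf."] * 780 + ["normal."] * 200 + ["satan."] * 15
       + ["guess_passwd."] * 4 + ["rootkit."] * 1)

y_multi = [to_multiclass(r) for r in raw]
pred_multi = [naive_detector(c) for c in y_multi]

recall = per_class_recall(y_multi, pred_multi)
macro_recall = sum(recall.values()) / len(recall)

# Binary view: any non-normal prediction counts as "attack".
y_bin = [to_binary(r) for r in raw]
pred_bin = ["normal" if p == "normal" else "attack" for p in pred_multi]

print(Counter(y_multi))
print(f"multi-class accuracy: {accuracy(y_multi, pred_multi):.2f}")
print(f"per-class recall:     {recall}")
print(f"macro recall:         {macro_recall:.2f}")
print(f"binary accuracy:      {accuracy(y_bin, pred_bin):.2f}")
```

With these made-up counts the sketch reports 0.98 accuracy in both the multi-class and binary views, but a macro recall of only 0.40, because the probe, r2l and u2r samples are never detected. The binary view additionally hides which attack families are being missed, which is one reason to treat the problem as a multi-class task.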

Title: Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems
Authors: Intisar S. Al-Mandhari, L. Guan, E. A. Edirisinghe
Publisher: Springer International Publishing
Book: Advances in Information and Communication Networks
Print ISBN: 978-3-030-03404-7
Electronic ISBN: 978-3-030-03405-4
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-03405-4_10