Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 7/2023

05.01.2023 | Original Article

Traffic data extraction and labeling for machine learning based attack detection in IoT networks

verfasst von: Hayelom Gebrye, Yong Wang, Fagen Li

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 7/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The fast expansion of the Internet of Things (IoT) networks raises the possibility of further network threats. In today’s world, network traffic analysis has become an increasingly critical and useful tool for monitoring network traffic in general and analyzing attack patterns in particular. A few years ago, distributed denial-of-service attacks on IoT networks were considered the most pressing problem that needed to be addressed. The absence of high-quality datasets is one of the main obstacles to applying DDOS detection systems based on machine learning. Researchers have developed numerous methods to extract and analyze information from recorded files. From a literature review, it is clear that most of these tools share similar drawbacks. In this study, we proposed an intelligent raw network data extractor and labeler tool by incorporating the limitations of the tools that are available to transform PCAP to CSV. To generate and process a high-quality DDOS attack dataset suitable for machine learning models, we employed several data preprocessing operations on the selected network intrusion dataset. To confirm the validity and acceptability of the dataset, we tested different models. Among the models tested, the random forest was the most accurate in detecting the DDOS attack.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Roopak M, Tian GY, Chambers J (2019) Deep learning models for cyber security in iot networks. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0452–0457. IEEE Roopak M, Tian GY, Chambers J (2019) Deep learning models for cyber security in iot networks. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0452–0457. IEEE
2.
Zurück zum Zitat Iglesias F, Zseby T (2015) Analysis of network traffic features for anomaly detection. Mach Learn 101(1):59–84MathSciNetCrossRef Iglesias F, Zseby T (2015) Analysis of network traffic features for anomaly detection. Mach Learn 101(1):59–84MathSciNetCrossRef
3.
Zurück zum Zitat Frank Jr CV (2019) Mirai bot scanner summation prototype Frank Jr CV (2019) Mirai bot scanner summation prototype
4.
Zurück zum Zitat Orebaugh A, Ramirez G, Beale J (2006) Wireshark & Ethereal Network Protocol Analyzer Toolkit, Elsevier Orebaugh A, Ramirez G, Beale J (2006) Wireshark & Ethereal Network Protocol Analyzer Toolkit, Elsevier
5.
Zurück zum Zitat Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830MathSciNetMATH Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830MathSciNetMATH
6.
Zurück zum Zitat Wei T, Li X, Stojanovic V (2021) Input-to-state stability of impulsive reaction-diffusion neural networks with infinite distributed delays. Nonlinear Dyn 103(2):1733–1755CrossRef Wei T, Li X, Stojanovic V (2021) Input-to-state stability of impulsive reaction-diffusion neural networks with infinite distributed delays. Nonlinear Dyn 103(2):1733–1755CrossRef
7.
Zurück zum Zitat Tao H, Li J, Chen Y, Stojanovic V, Yang H (2020) Robust point-to-point iterative learning control with trial-varying initial conditions. IET Control Theory Appl 14(19):3344–3350MathSciNetCrossRef Tao H, Li J, Chen Y, Stojanovic V, Yang H (2020) Robust point-to-point iterative learning control with trial-varying initial conditions. IET Control Theory Appl 14(19):3344–3350MathSciNetCrossRef
8.
Zurück zum Zitat Xu Z, Li X, Stojanovic V (2021) Exponential stability of nonlinear state-dependent delayed impulsive systems with applications. Nonlinear Anal Hybrid Syst 42:101088MathSciNetCrossRefMATH Xu Z, Li X, Stojanovic V (2021) Exponential stability of nonlinear state-dependent delayed impulsive systems with applications. Nonlinear Anal Hybrid Syst 42:101088MathSciNetCrossRefMATH
12.
Zurück zum Zitat Davidoff S, Ham J (2012) Network forensics: tracking hackers through cyberspace vol. 2014. Prentice hall Upper Saddle River Davidoff S, Ham J (2012) Network forensics: tracking hackers through cyberspace vol. 2014. Prentice hall Upper Saddle River
13.
Zurück zum Zitat Deck S, Khiabani H (2015) Extracting files from network packet captures. SANS Institute-InfoSec Reading Room Deck S, Khiabani H (2015) Extracting files from network packet captures. SANS Institute-InfoSec Reading Room
14.
15.
Zurück zum Zitat Alothman B (2019) Raw network traffic data preprocessing and preparation for automatic analysis. In: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–5. IEEE Alothman B (2019) Raw network traffic data preprocessing and preparation for automatic analysis. In: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp. 1–5. IEEE
16.
Zurück zum Zitat Draper-Gil G, Lashkari AH, Mamun MSI, Ghorbani AA (2016) Characterization of encrypted and vpn traffic using time-related. In: Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), pp. 407–414 Draper-Gil G, Lashkari AH, Mamun MSI, Ghorbani AA (2016) Characterization of encrypted and vpn traffic using time-related. In: Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), pp. 407–414
17.
Zurück zum Zitat Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the kdd cup 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. Ieee Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the kdd cup 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. Ieee
18.
Zurück zum Zitat Kayacık HG, Zincir-Heywood AN, Heywood MI Selecting features for intrusion detection: a feature relevance analysis on kdd 99 benchmark Kayacık HG, Zincir-Heywood AN, Heywood MI Selecting features for intrusion detection: a feature relevance analysis on kdd 99 benchmark
19.
Zurück zum Zitat Flood R A data-driven toolset using containers to generate datasets for network intrusion detection Flood R A data-driven toolset using containers to generate datasets for network intrusion detection
20.
Zurück zum Zitat Mukkavilli SK, Shetty S, Hong L et al (2016) Generation of labelled datasets to quantify the impact of security threats to cloud data centers. J Inf Secur 7(03):172 Mukkavilli SK, Shetty S, Hong L et al (2016) Generation of labelled datasets to quantify the impact of security threats to cloud data centers. J Inf Secur 7(03):172
21.
Zurück zum Zitat Alzahrani S, Hong L (2018) Generation of ddos attack dataset for effective ids development and evaluation. J Inf Secur 9(4):225–241 Alzahrani S, Hong L (2018) Generation of ddos attack dataset for effective ids development and evaluation. J Inf Secur 9(4):225–241
22.
Zurück zum Zitat Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417CrossRefMATH Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417CrossRefMATH
23.
Zurück zum Zitat Izenman A (2013) Linear discriminant analysis in modern multivariate statistical techniques: 237–280. Springer, New York Izenman A (2013) Linear discriminant analysis in modern multivariate statistical techniques: 237–280. Springer, New York
24.
Zurück zum Zitat Legendre P, De Cáceres M (2013) Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecol Lett 16(8):951–963CrossRef Legendre P, De Cáceres M (2013) Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecol Lett 16(8):951–963CrossRef
25.
Zurück zum Zitat Wenskovitch J, Crandell I, Ramakrishnan N, House L, North C (2017) Towards a systematic combination of dimension reduction and clustering in visual analytics. IEEE Trans Visual Comput Graphics 24(1):131–141CrossRef Wenskovitch J, Crandell I, Ramakrishnan N, House L, North C (2017) Towards a systematic combination of dimension reduction and clustering in visual analytics. IEEE Trans Visual Comput Graphics 24(1):131–141CrossRef
26.
Zurück zum Zitat Fujiwara T, Kwon O-H, Ma K-L (2019) Supporting analysis of dimensionality reduction results with contrastive learning. IEEE Trans Visual Comput Graphics 26(1):45–55CrossRef Fujiwara T, Kwon O-H, Ma K-L (2019) Supporting analysis of dimensionality reduction results with contrastive learning. IEEE Trans Visual Comput Graphics 26(1):45–55CrossRef
29.
Zurück zum Zitat Holland T (2004) Understanding ips and ids: Using ips and ids together for defense in depth. SANS Institute Holland T (2004) Understanding ips and ids: Using ips and ids together for defense in depth. SANS Institute
30.
Zurück zum Zitat Heidemann J, Mirkovic J, Hardaker W, Kallitsis M (2021) Collecting, labeling, and using networking data: the intersection of ai and networking Heidemann J, Mirkovic J, Hardaker W, Kallitsis M (2021) Collecting, labeling, and using networking data: the intersection of ai and networking
31.
Zurück zum Zitat Fukuda K, Heidemann J, Qadeer A (2017) Detecting malicious activity with dns backscatter over time. IEEE/ACM Trans Netw 25(5):3203–3218CrossRef Fukuda K, Heidemann J, Qadeer A (2017) Detecting malicious activity with dns backscatter over time. IEEE/ACM Trans Netw 25(5):3203–3218CrossRef
32.
33.
Zurück zum Zitat Csubák D, Szücs K, Vörös P, Kiss A (2016) Big data testbed for network attack detection. Acta Polytechnica Hungarica 13(2):47–57 Csubák D, Szücs K, Vörös P, Kiss A (2016) Big data testbed for network attack detection. Acta Polytechnica Hungarica 13(2):47–57
34.
Zurück zum Zitat Cunningham RK, Lippmann RP, Fried DJ, Garfinkel SL, Graf I, Kendall KR, Webster SE, Wyschogrod D, Zissman MA (1999) Evaluating intrusion detection systems without attacking your friends: the 1998 Darpa intrusion detection evaluation. Technical report, Massachusetts Inst Of Tech Lexington Lincoln Lab Cunningham RK, Lippmann RP, Fried DJ, Garfinkel SL, Graf I, Kendall KR, Webster SE, Wyschogrod D, Zissman MA (1999) Evaluating intrusion detection systems without attacking your friends: the 1998 Darpa intrusion detection evaluation. Technical report, Massachusetts Inst Of Tech Lexington Lincoln Lab
35.
Zurück zum Zitat Haines JW, Rossey LM, Lippmann RP, Cunningham RK (2001) Extending the darpa off-line intrusion detection evaluations. In: Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX’01, vol. 1, pp. 35–45. IEEE Haines JW, Rossey LM, Lippmann RP, Cunningham RK (2001) Extending the darpa off-line intrusion detection evaluations. In: Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX’01, vol. 1, pp. 35–45. IEEE
36.
Zurück zum Zitat Lee W, Stolfo SJ (2000) A framework for constructing features and models for intrusion detection systems. ACM Transact Inform Syst Secur (TiSSEC) 3(4):227–261CrossRef Lee W, Stolfo SJ (2000) A framework for constructing features and models for intrusion detection systems. ACM Transact Inform Syst Secur (TiSSEC) 3(4):227–261CrossRef
37.
Zurück zum Zitat Sperotto A, Sadre R, Vliet Fv, Pras A (2009) A labeled data set for flow-based intrusion detection. In: International Workshop on IP Operations and Management, pp. 39–50. Springer Sperotto A, Sadre R, Vliet Fv, Pras A (2009) A labeled data set for flow-based intrusion detection. In: International Workshop on IP Operations and Management, pp. 39–50. Springer
38.
Zurück zum Zitat Sangster B, O’Connor T, Cook T, Fanelli R, Dean E, Morrell C, Conti GJ (2009) Toward instrumenting network warfare competitions to generate labeled datasets. In: CSET Sangster B, O’Connor T, Cook T, Fanelli R, Dean E, Morrell C, Conti GJ (2009) Toward instrumenting network warfare competitions to generate labeled datasets. In: CSET
39.
Zurück zum Zitat Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 31(3):357–374CrossRef Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 31(3):357–374CrossRef
40.
Zurück zum Zitat Moustafa N, Slay J (2015) Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE Moustafa N, Slay J (2015) Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE
41.
Zurück zum Zitat Kolias C, Kambourakis G, Stavrou A, Gritzalis S (2015) Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset. IEEE Communications Surveys & Tutorials 18(1), 184–208 Kolias C, Kambourakis G, Stavrou A, Gritzalis S (2015) Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset. IEEE Communications Surveys & Tutorials 18(1), 184–208
42.
Zurück zum Zitat Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 1:108–116 Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 1:108–116
43.
Zurück zum Zitat Lashkari AH, Draper-Gil G, Mamun MSI, Ghorbani AA, et al. (2017) Characterization of tor traffic using time based features. In: ICISSp, pp. 253–262 Lashkari AH, Draper-Gil G, Mamun MSI, Ghorbani AA, et al. (2017) Characterization of tor traffic using time based features. In: ICISSp, pp. 253–262
44.
Zurück zum Zitat Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (ddos) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST), pp. 1–8. IEEE Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (ddos) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST), pp. 1–8. IEEE
45.
Zurück zum Zitat Koroniotis N, Moustafa N, Sitnikova E, Turnbull B (2019) Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset. Futur Gener Comput Syst 100:779–796CrossRef Koroniotis N, Moustafa N, Sitnikova E, Turnbull B (2019) Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset. Futur Gener Comput Syst 100:779–796CrossRef
46.
Zurück zum Zitat Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-baiot–network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervasive Comput 17(3):12–22CrossRef Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-baiot–network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervasive Comput 17(3):12–22CrossRef
47.
Zurück zum Zitat Ullah I, Mahmoud QH (2020) A scheme for generating a dataset for anomalous activity detection in iot networks. In: Canadian Conference on Artificial Intelligence, pp. 508–520. Springer Ullah I, Mahmoud QH (2020) A scheme for generating a dataset for anomalous activity detection in iot networks. In: Canadian Conference on Artificial Intelligence, pp. 508–520. Springer
50.
Zurück zum Zitat Haykin S, Network N (2004) A comprehensive foundation. Neural Netw 2(2004):41 Haykin S, Network N (2004) A comprehensive foundation. Neural Netw 2(2004):41
52.
Zurück zum Zitat Kotsiantis SB, Zaharakis I, Pintelas P et al (2007) Supervised machine learning: a review of classification techniques. Emerging artificial intelligence applications in computer engineering 160(1):3–24 Kotsiantis SB, Zaharakis I, Pintelas P et al (2007) Supervised machine learning: a review of classification techniques. Emerging artificial intelligence applications in computer engineering 160(1):3–24
53.
Zurück zum Zitat Friedman J, Hastie T, Tibshirani R, et al. (2001) The elements of statistical learning vol. 1. Springer series in statistics New York Friedman J, Hastie T, Tibshirani R, et al. (2001) The elements of statistical learning vol. 1. Springer series in statistics New York
54.
Zurück zum Zitat Hussain J, Lalmuanawma S (2016) Feature analysis, evaluation and comparisons of classification algorithms based on noisy intrusion dataset. Proc Comput Sci 92:188–198CrossRef Hussain J, Lalmuanawma S (2016) Feature analysis, evaluation and comparisons of classification algorithms based on noisy intrusion dataset. Proc Comput Sci 92:188–198CrossRef
55.
Zurück zum Zitat Fenanir S, Semchedine F, Baadache A (2019) A machine learning-based lightweight intrusion detection system for the internet of things. Rev. d’Intelligence Artif. 33(3):203–211 Fenanir S, Semchedine F, Baadache A (2019) A machine learning-based lightweight intrusion detection system for the internet of things. Rev. d’Intelligence Artif. 33(3):203–211
56.
Zurück zum Zitat Abrar I, Ayub Z, Masoodi F, Bamhdi AM (2020) A machine learning approach for intrusion detection system on nsl-kdd dataset. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 919–924. IEEE Abrar I, Ayub Z, Masoodi F, Bamhdi AM (2020) A machine learning approach for intrusion detection system on nsl-kdd dataset. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 919–924. IEEE
57.
Zurück zum Zitat Ashraf S, Ahmed T (2020) Sagacious intrusion detection strategy in sensor network. In: 2020 International Conference on UK-China Emerging Technologies (UCET), pp. 1–4. IEEE Ashraf S, Ahmed T (2020) Sagacious intrusion detection strategy in sensor network. In: 2020 International Conference on UK-China Emerging Technologies (UCET), pp. 1–4. IEEE
58.
Zurück zum Zitat Belavagi MC, Muniyal B (2016) Performance evaluation of supervised machine learning algorithms for intrusion detection. Proc Comput Sci 89:117–123CrossRef Belavagi MC, Muniyal B (2016) Performance evaluation of supervised machine learning algorithms for intrusion detection. Proc Comput Sci 89:117–123CrossRef
59.
Zurück zum Zitat Ullah S, Ahmad J, Khan MA, Alkhammash EH, Hadjouni M, Ghadi YY, Saeed F, Pitropakis N (2022) A new intrusion detection system for the internet of things via deep convolutional neural network and feature engineering. Sensors 22(10):3607CrossRef Ullah S, Ahmad J, Khan MA, Alkhammash EH, Hadjouni M, Ghadi YY, Saeed F, Pitropakis N (2022) A new intrusion detection system for the internet of things via deep convolutional neural network and feature engineering. Sensors 22(10):3607CrossRef
Metadaten
Titel
Traffic data extraction and labeling for machine learning based attack detection in IoT networks
verfasst von
Hayelom Gebrye
Yong Wang
Fagen Li
Publikationsdatum
05.01.2023
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 7/2023
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01765-7

Weitere Artikel der Ausgabe 7/2023

International Journal of Machine Learning and Cybernetics 7/2023 Zur Ausgabe

Neuer Inhalt