Skip to main content

31.08.2018

A fast unsupervised preprocessing method for network monitoring

verfasst von: Martin Andreoni Lopez, Diogo M. F. Mattos, Otto Carlos M. B. Duarte, Guy Pujolle

Erschienen in: Annals of Telecommunications | Ausgabe 3-4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Identifying a network misuse takes days or even weeks, and network administrators usually neglect zero-day threats until a large number of malicious users exploit them. Besides, security applications, such as anomaly detection and attack mitigation systems, must apply real-time monitoring to reduce the impacts of security incidents. Thus, information processing time should be as small as possible to enable an effective defense against attacks. In this paper, we present a fast preprocessing method for network traffic classification based on feature correlation and feature normalization. Our proposed method couples a normalization and feature selection algorithms. We evaluate the proposed algorithms against three different datasets for eight different machine learning classification algorithms. Our proposed normalization algorithm reduces the classification error rate when compared with traditional methods. Our feature selection algorithm chooses an optimized subset of features improving accuracy by more than 11% within a 100-fold reduction in processing time when compared to traditional feature selection and feature reduction algorithms. The preprocessing method is performed in batch and streaming data, being able to detect concept-drift.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Features refer to the original set of attributes that describe the data. Variables refer to the input of the machine learning algorithms applied over the data. If no preprocessing method handles the original data, the set of variables and the set of features are the same.
 
2
Anonymized data can be asked by sending an email contact to the authors
 
Literatur
1.
Zurück zum Zitat Hu P, Li H, Fu H, Cansever D, Mohapatra P (2015) Dynamic defense strategy against advanced persistent threat with insiders. In: IEEE conference on computer communications (INFOCOM), vol 4, pp 747–755 Hu P, Li H, Fu H, Cansever D, Mohapatra P (2015) Dynamic defense strategy against advanced persistent threat with insiders. In: IEEE conference on computer communications (INFOCOM), vol 4, pp 747–755
4.
Zurück zum Zitat Paxson V (1999) Bro: a system for detecting network intruders in real-time. Comput Netw 31(23–24):2435–2463CrossRef Paxson V (1999) Bro: a system for detecting network intruders in real-time. Comput Netw 31(23–24):2435–2463CrossRef
5.
Zurück zum Zitat Roesch M (1999) Snort-lightweight intrusion detection for networks. In: Proceedings of the 13th USENIX Conference on System Administration. USENIX Association, pp 229–238 Roesch M (1999) Snort-lightweight intrusion detection for networks. In: Proceedings of the 13th USENIX Conference on System Administration. USENIX Association, pp 229–238
6.
Zurück zum Zitat Vallentin M, Sommer R, Lee J, Leres C, Paxson V, Tierney B (2007) The NIDS cluster: scalable, stateful network intrusion detection on commodity hardware. In: Recent advances in intrusion detection. Springer, Berlin, pp 107–126 Vallentin M, Sommer R, Lee J, Leres C, Paxson V, Tierney B (2007) The NIDS cluster: scalable, stateful network intrusion detection on commodity hardware. In: Recent advances in intrusion detection. Springer, Berlin, pp 107–126
7.
Zurück zum Zitat Bar A, Finamore A, Casas P, Golab l., Mellia M (2014) Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE, vol 10, pp 165–170 Bar A, Finamore A, Casas P, Golab l., Mellia M (2014) Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE, vol 10, pp 165–170
8.
Zurück zum Zitat Stonebraker M, Çetintemel U, Zdonik S (2005) The 8 requirements of real-time stream processing. ACM SIGMOD Rec 34(4):42–47CrossRef Stonebraker M, Çetintemel U, Zdonik S (2005) The 8 requirements of real-time stream processing. ACM SIGMOD Rec 34(4):42–47CrossRef
9.
Zurück zum Zitat Mayhew M, Atighetchi M, Adler A, Greenstadt R (2015) Use of machine learning in big data analytics for insider threat detection. In: IEEE Military Communications Conference. MILCOM, vol 10, pp 915–922 Mayhew M, Atighetchi M, Adler A, Greenstadt R (2015) Use of machine learning in big data analytics for insider threat detection. In: IEEE Military Communications Conference. MILCOM, vol 10, pp 915–922
10.
Zurück zum Zitat Mladenić D (2006) Feature selection for dimensionality reduction. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J (eds) Subspace, latent structure and feature selection (slsfs): statistical and optimization perspectives workshop, pp 84–102. Springer, Bohinj Mladenić D (2006) Feature selection for dimensionality reduction. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J (eds) Subspace, latent structure and feature selection (slsfs): statistical and optimization perspectives workshop, pp 84–102. Springer, Bohinj
11.
Zurück zum Zitat Bifet A, Morales GDF (2014) Big data stream learning with Samoa. In: 2014 IEEE International Conference on Data Mining Workshop, pp 1199–1202 Bifet A, Morales GDF (2014) Big data stream learning with Samoa. In: 2014 IEEE International Conference on Data Mining Workshop, pp 1199–1202
12.
Zurück zum Zitat Khamassi I, Sayed-Mouchaweh M, Hammami M, Ghédira K (2018) Discussion and review on evolving data streams and concept drift adapting. Evol Syst 9(1):1–23CrossRef Khamassi I, Sayed-Mouchaweh M, Hammami M, Ghédira K (2018) Discussion and review on evolving data streams and concept drift adapting. Evol Syst 9(1):1–23CrossRef
13.
Zurück zum Zitat Rahm E, Do HH (2000) Data cleaning: problems and current approaches. IEEE Bullet Tech Comm Data Eng 23(4):3–13 Rahm E, Do HH (2000) Data cleaning: problems and current approaches. IEEE Bullet Tech Comm Data Eng 23(4):3–13
14.
Zurück zum Zitat García S, Luengo J, Herrera F (2016) Data preprocessing in data mining. Springer, Berlin García S, Luengo J, Herrera F (2016) Data preprocessing in data mining. Springer, Berlin
15.
Zurück zum Zitat Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1/2):23– 69CrossRefMATH Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 53(1/2):23– 69CrossRefMATH
16.
Zurück zum Zitat Schölkopf B, Smola AJ, Müller K-R (1999) Kernel principal component analysis. In: Advances in kernel methods. MIT Press, Cambridge, pp 327–352 Schölkopf B, Smola AJ, Müller K-R (1999) Kernel principal component analysis. In: Advances in kernel methods. MIT Press, Cambridge, pp 327–352
18.
Zurück zum Zitat Zhang S, Zhang C, Yang Q (2003) Data preparation for data mining. Appl Artif Intell 17(5–6):375–381CrossRef Zhang S, Zhang C, Yang Q (2003) Data preparation for data mining. Appl Artif Intell 17(5–6):375–381CrossRef
19.
Zurück zum Zitat Tan S (2005) Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Syst Appl 28(4):667–671CrossRef Tan S (2005) Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Syst Appl 28(4):667–671CrossRef
20.
Zurück zum Zitat Ramérez-Gallego S, Krawczyk B, García S, Woźniak M, Herrera F (2017) A survey on data preprocessing for data stream mining: current status and future directions. Neurocomputing Ramérez-Gallego S, Krawczyk B, García S, Woźniak M, Herrera F (2017) A survey on data preprocessing for data stream mining: current status and future directions. Neurocomputing
21.
Zurück zum Zitat Van Der Maaten L, Postma E, den Herik J (2009) Dimensionality reduction: a comparative. J Mach Learn Res 10:66–71 Van Der Maaten L, Postma E, den Herik J (2009) Dimensionality reduction: a comparative. J Mach Learn Res 10:66–71
22.
Zurück zum Zitat Ang JC, Mirzal A, Haron H, Hamed HNA (2016) Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinform 13(5):971–989CrossRef Ang JC, Mirzal A, Haron H, Hamed HNA (2016) Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinform 13(5):971–989CrossRef
23.
Zurück zum Zitat Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28CrossRef Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28CrossRef
24.
Zurück zum Zitat Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1-3):389–422CrossRefMATH Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1-3):389–422CrossRefMATH
25.
Zurück zum Zitat Hall MA (1999) Correlation-based feature selection for machine learning. Ph.D. dissertation, The University of Waikato Hall MA (1999) Correlation-based feature selection for machine learning. Ph.D. dissertation, The University of Waikato
26.
Zurück zum Zitat Kumar A, Sung M, Xu JJ, Wang J (2004) Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: ACM SIGMETRICS performance evaluation review. ACM, vol 132, no. 1, pp 177-188 Kumar A, Sung M, Xu JJ, Wang J (2004) Data streaming algorithms for efficient and accurate estimation of flow size distribution. In: ACM SIGMETRICS performance evaluation review. ACM, vol 132, no. 1, pp 177-188
27.
Zurück zum Zitat Ben-Haim Y, Tom-tov E (2010) A streaming parallel decision tree algorithm. J Mach Learn Res 11:849–872MathSciNetMATH Ben-Haim Y, Tom-tov E (2010) A streaming parallel decision tree algorithm. J Mach Learn Res 11:849–872MathSciNetMATH
28.
Zurück zum Zitat Webb GI (2014) Contrary to popular belief incremental discretization can be sound, computationally efficient and extremely useful for streaming data. In: IEEE International Conference on Data Mining (ICDM). IEEE, pp 1031–1036 Webb GI (2014) Contrary to popular belief incremental discretization can be sound, computationally efficient and extremely useful for streaming data. In: IEEE International Conference on Data Mining (ICDM). IEEE, pp 1031–1036
29.
Zurück zum Zitat Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp 1–6 Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp 1–6
30.
Zurück zum Zitat Lobato A, Andreoni Lopez M, Sanz IJ, Cárdenas A, Duarte OCMB, Pujolle G (2018) An adaptive real-time architecture for zero-day threat detection. In: IEEE ICC 2018 Next Generation Networking and Internet Symposium (ICC’18 NGNI), Kansas City, USA Lobato A, Andreoni Lopez M, Sanz IJ, Cárdenas A, Duarte OCMB, Pujolle G (2018) An adaptive real-time architecture for zero-day threat detection. In: IEEE ICC 2018 Next Generation Networking and Internet Symposium (ICC’18 NGNI), Kansas City, USA
31.
Zurück zum Zitat Andreoni Lopez M, Silva RS, Alvarenga ID, Rebello GAF, Sanz IJ, Lobato AGP, Mattos DMF, Duarte OCMB, Pujolle G (2017) Collecting and characterizing a real broadband access network traffic dataset. In: IEEE/IFIP 1st Cyber Security in Networking Conference (CSNet), pp 1–8 Andreoni Lopez M, Silva RS, Alvarenga ID, Rebello GAF, Sanz IJ, Lobato AGP, Mattos DMF, Duarte OCMB, Pujolle G (2017) Collecting and characterizing a real broadband access network traffic dataset. In: IEEE/IFIP 1st Cyber Security in Networking Conference (CSNet), pp 1–8
32.
Zurück zum Zitat Hu H, Kantardzic M (2016) Smart preprocessing improves data stream mining. In: 49th Hawaii International Conference on System Sciences (HICSS). IEEE, pp 1749–1757 Hu H, Kantardzic M (2016) Smart preprocessing improves data stream mining. In: 49th Hawaii International Conference on System Sciences (HICSS). IEEE, pp 1749–1757
34.
Zurück zum Zitat Prasath VBS, Alfeilat HAA, Lasassmeh O, Hassanat ABA Distance and similarity measures effect on the performance of k-nearest neighbor classifier - a review, CoRR. [Online]. arXiv:1708.04321 Prasath VBS, Alfeilat HAA, Lasassmeh O, Hassanat ABA Distance and similarity measures effect on the performance of k-nearest neighbor classifier - a review, CoRR. [Online]. arXiv:1708.​04321
35.
Zurück zum Zitat Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning. ACM, pp 116 Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning. ACM, pp 116
36.
Zurück zum Zitat Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357CrossRefMATH Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357CrossRefMATH
37.
Zurück zum Zitat Perkins S, Theiler J (2003) Online feature selection using grafting. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp 592–599 Perkins S, Theiler J (2003) Online feature selection using grafting. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp 592–599
38.
Zurück zum Zitat Zhou J, Foster DP, Stine RA, Ungar LH (2006) Streamwise feature selection. J Mach Learn Res 7 (Sep):1861–1885MathSciNetMATH Zhou J, Foster DP, Stine RA, Ungar LH (2006) Streamwise feature selection. J Mach Learn Res 7 (Sep):1861–1885MathSciNetMATH
39.
Zurück zum Zitat Wu X, Yu K, Ding W, Wang H, Zhu X (2013) Online feature selection with streaming features. IEEE Trans Pattern Anal Mach Intell 35(5):1178–1192CrossRef Wu X, Yu K, Ding W, Wang H, Zhu X (2013) Online feature selection with streaming features. IEEE Trans Pattern Anal Mach Intell 35(5):1178–1192CrossRef
Metadaten
Titel
A fast unsupervised preprocessing method for network monitoring
verfasst von
Martin Andreoni Lopez
Diogo M. F. Mattos
Otto Carlos M. B. Duarte
Guy Pujolle
Publikationsdatum
31.08.2018
Verlag
Springer International Publishing
Erschienen in
Annals of Telecommunications / Ausgabe 3-4/2019
Print ISSN: 0003-4347
Elektronische ISSN: 1958-9395
DOI
https://doi.org/10.1007/s12243-018-0663-2

Neuer Inhalt