2017 | OriginalPaper | Chapter

ILAB: An Interactive Labelling Strategy for Intrusion Detection

Authors: Anaël Beaugnon, Pierre Chifflier, Francis Bach

Published in: Research in Attacks, Intrusions, and Defenses

Publisher: Springer International Publishing

Abstract

Acquiring a representative labelled dataset is a hurdle that has to be overcome to learn a supervised detection model. Labelling a dataset is particularly expensive in computer security as expert knowledge is required to perform the annotations. In this paper, we introduce ILAB, a novel interactive labelling strategy that helps experts label large datasets for intrusion detection with a reduced workload. First, we compare ILAB with two state-of-the-art labelling strategies on public labelled datasets and demonstrate it is both an effective and a scalable solution. Second, we show ILAB is workable with a real-world annotation project carried out on a large unlabelled NetFlow dataset originating from a production environment. We provide an open source implementation (https://github.com/ANSSI-FR/SecuML/) to allow security experts to label their own datasets and researchers to compare labelling strategies.

Title: ILAB: An Interactive Labelling Strategy for Intrusion Detection
Authors: Anaël Beaugnon, Pierre Chifflier, Francis Bach
Publisher: Springer International Publishing
Book: Research in Attacks, Intrusions, and Defenses
Print ISBN: 978-3-319-66331-9
Electronic ISBN: 978-3-319-66332-6
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-66332-6_6
