research-article
Open Access

On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection

Published: 2 January 2021

Abstract

Most research in the field of network intrusion detection relies heavily on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets, or at least datasets with similar characteristics, as those used in the related work. In this work, we present concepts and a toolkit, the Intrusion Detection Dataset Toolkit (ID2T), that alleviate the problem of reproducing datasets with desired characteristics and thereby enable accurate replication of scientific results. ID2T facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected attacks blend with the background traffic because ID2T mimics the background traffic's properties when creating them.
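To make the idea of "blending" concrete, the following is a minimal sketch of label-preserving attack injection. It is not ID2T's implementation: it assumes a simplified in-memory packet record instead of real PCAP parsing, and it mimics only two background properties (the dominant IP TTL and the median inter-arrival time). The names `Packet`, `most_common_ttl`, `median_gap`, and `inject_attack` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, replace
from statistics import median
from typing import List


@dataclass(frozen=True)
class Packet:
    ts: float              # capture timestamp in seconds
    src: str               # source IP address
    dst: str               # destination IP address
    ttl: int               # IP time-to-live
    size: int              # packet length in bytes
    label: str = "benign"  # ground-truth label carried with each packet


def most_common_ttl(packets: List[Packet]) -> int:
    """TTL value observed most often in the background traffic."""
    return Counter(p.ttl for p in packets).most_common(1)[0][0]


def median_gap(packets: List[Packet]) -> float:
    """Median inter-arrival time (seconds) of the background packets."""
    times = sorted(p.ts for p in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return median(gaps) if gaps else 0.0


def inject_attack(background: List[Packet], attack: List[Packet],
                  start: float, label: str = "attack") -> List[Packet]:
    """Merge attack packets into the background, copying background properties.

    Each attack packet is re-timed to follow the background's median packet
    spacing from `start`, receives the background's dominant TTL, and is
    labeled so the resulting dataset carries ground truth.
    """
    ttl = most_common_ttl(background)
    gap = median_gap(background)
    synthetic = [
        replace(p, ts=start + i * gap, ttl=ttl, label=label)
        for i, p in enumerate(attack)
    ]
    # sorted() is stable, so on equal timestamps background packets stay first.
    return sorted(background + synthetic, key=lambda p: p.ts)
```

For example, injecting two attack packets at `start=0.25` into a background whose packets are spaced 0.5 s apart and mostly carry TTL 64 yields attack packets at 0.25 s and 0.75 s with TTL 64, interleaved with the benign packets and labeled `"attack"`. A real toolkit would mimic many more properties (addresses, ports, window sizes, and so on); the point here is only that injected traffic is shaped by statistics taken from the background.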

This article makes three core contributions. First, we present a comprehensive survey of intrusion detection datasets, in which we propose a classification that groups the negative qualities found in these datasets. Second, we revise, improve, and expand the architecture of ID2T relative to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T also provides a set of tests, known as TIDED, that help identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T can be used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open-source software and is made available to the community so that its arsenal of attacks and capabilities can be expanded.
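The abstract does not list which defects TIDED checks for, so the sketch below does not reproduce it. It only illustrates the general idea of testing background traffic before injecting attacks into it, using three hypothetical checks chosen for illustration: timestamps that go backwards, exact duplicate records, and zero-length packets. The record layout `(timestamp, src, dst, size)` and the function name `check_background` are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical minimal packet record: (timestamp, src, dst, size_in_bytes).
Record = Tuple[float, str, str, int]


def check_background(trace: List[Record]) -> Dict[str, int]:
    """Count a few simple defects in a background trace (illustrative only)."""
    # Timestamps that go backwards may indicate reordering or a badly merged capture.
    backwards = sum(1 for prev, cur in zip(trace, trace[1:]) if cur[0] < prev[0])
    # Exact duplicates can appear when the same packet is captured twice;
    # count every copy beyond the first.
    duplicates = sum(n - 1 for n in Counter(trace).values() if n > 1)
    # Zero-length records may hint at truncated or malformed capture entries.
    empty = sum(1 for record in trace if record[3] == 0)
    return {
        "backwards_timestamps": backwards,
        "duplicates": duplicates,
        "empty_packets": empty,
    }
```

Non-zero counts do not prove the trace is unusable; they flag properties that an injected attack might inherit or that could make the injected traffic stand out, which is the kind of concern a dedicated test suite is meant to surface.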



      • Published in

        ACM Transactions on Privacy and Security  Volume 24, Issue 2
        May 2021
        242 pages
        ISSN:2471-2566
        EISSN:2471-2574
        DOI:10.1145/3446639

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 January 2021
        • Accepted: 1 September 2020
        • Revised: 1 March 2020
        • Received: 1 June 2018


        Qualifiers

        • research-article
        • Research
        • Refereed
