research-article

Open Access

On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection

Authors:
Carlos Garcia Cordero, Technische Universität Darmstadt, Hessen, Germany
Emmanouil Vasilomanolakis, Aalborg University, Copenhagen, Denmark
Aidmar Wainakh, Technische Universität Darmstadt, Hessen, Germany
Max Mühlhäuser, Technische Universität Darmstadt, Hessen, Germany
Simin Nadjm-Tehrani, Linköping University, Sweden

ACM Transactions on Privacy and Security, Volume 24, Issue 2, Article No. 8, pp. 1–39. https://doi.org/10.1145/3424155

Published: 02 January 2021


Abstract

Most research in the field of network intrusion detection relies heavily on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets, or at least datasets with characteristics similar to those used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics, enabling an accurate replication of scientific results. ID2T facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected attacks blend with the background traffic by mimicking its properties.
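The injection idea can be illustrated with a minimal, hypothetical sketch. It is not ID2T's implementation or API: packets are modelled as an assumed `Packet` dataclass, and "mimicking the background traffic" is reduced to two assumed properties, namely drawing the gaps between attack packets from the background's empirical inter-arrival times and reusing the most common background TTL. Every injected packet is labelled, which is what yields ground truth for the resulting dataset.

```python
from __future__ import annotations

import random
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record (not an ID2T data type)."""
    ts: float            # capture timestamp in seconds
    src: str             # source IP address
    dst: str             # destination IP address
    ttl: int             # IP time-to-live
    label: str = "benign"


def inter_arrival_times(packets: list[Packet]) -> list[float]:
    """Gaps between consecutive packets after sorting by timestamp."""
    ts = sorted(p.ts for p in packets)
    return [b - a for a, b in zip(ts, ts[1:])]


def inject(background: list[Packet], attack: list[Packet],
           start: float, seed: int = 0) -> list[Packet]:
    """Merge attack packets into background traffic and label them.

    Illustrative assumptions: attack gaps are resampled from the background's
    empirical inter-arrival times, and every attack packet reuses the most
    common background TTL so it does not stand out on that field.
    """
    if not background:
        raise ValueError("background trace must not be empty")
    rng = random.Random(seed)                          # seeded for reproducible datasets
    gaps = inter_arrival_times(background) or [0.001]  # fallback for a 1-packet trace
    common_ttl = Counter(p.ttl for p in background).most_common(1)[0][0]

    injected = []
    t = start
    for pkt in attack:
        injected.append(replace(pkt, ts=t, ttl=common_ttl, label="attack"))
        t += rng.choice(gaps)
    # Interleave by timestamp so the output is one ordered, labelled trace.
    return sorted(background + injected, key=lambda p: p.ts)


# Usage: four benign packets 0.5 s apart, three attack packets starting at 0.25 s.
bg = [Packet(ts=i * 0.5, src="10.0.0.1", dst="10.0.0.2", ttl=64) for i in range(4)]
atk = [Packet(ts=0.0, src="10.0.0.66", dst="10.0.0.2", ttl=128) for _ in range(3)]
trace = inject(bg, atk, start=0.25)
for p in trace:
    print(f"{p.ts:.2f} {p.src} ttl={p.ttl} {p.label}")
```

Resampling observed gaps instead of fitting a parametric distribution keeps the sketch short; matching only timing and TTL is a deliberate simplification.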

This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets, in which we propose a classification that groups the negative qualities found in these datasets. Second, we revise, improve, and expand the architecture of ID2T compared with previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T also provides a set of tests, known as TIDED, that help identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T can be used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open-source software and is made available to the community to expand its arsenal of attacks and capabilities.
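The kind of defect a background-traffic test suite is meant to surface can be illustrated with a hypothetical check. The function names, the choice of test, and the 1% threshold below are assumptions for illustration only and are not TIDED's actual tests. The sketch fails a trace in which too many packets carry source addresses that should not normally appear on the wire (unspecified, loopback, multicast, or reserved), using only Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one hypothetical dataset-quality test."""
    name: str
    passed: bool
    detail: str


def is_suspicious_source(addr: str) -> bool:
    """True for source addresses that should not appear in normal traffic.

    Private ranges (e.g. 192.168.0.0/16) are deliberately NOT flagged,
    since they are expected in LAN captures.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_unspecified or ip.is_loopback or ip.is_multicast or ip.is_reserved


def check_source_addresses(sources: list[str], max_ratio: float = 0.01) -> CheckResult:
    """Hypothetical test: fail if the share of suspicious sources exceeds max_ratio."""
    if not sources:
        return CheckResult("source-addresses", False, "trace contains no packets")
    bad = sum(is_suspicious_source(s) for s in sources)
    ratio = bad / len(sources)
    return CheckResult(
        "source-addresses",
        ratio <= max_ratio,
        f"{bad}/{len(sources)} packets ({ratio:.1%}) have suspicious sources",
    )


# Usage: 97 ordinary LAN sources plus three that should never be a packet's source.
srcs = ["192.168.1.10"] * 97 + ["0.0.0.0", "127.0.0.1", "224.0.0.5"]
result = check_source_addresses(srcs)
print(result.passed, result.detail)
```

A failing check does not prove the trace is unusable; it only marks background traffic whose oddities could make injected attacks easier or harder to detect than intended.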

Index Terms

1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Intrusion detection systems
  2. Network security

Published in
ACM Transactions on Privacy and Security, Volume 24, Issue 2
May 2021
242 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3446639
Editor: Ninghui Li, Purdue University, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 January 2021
- Accepted: 1 September 2020
- Revised: 1 March 2020
- Received: 1 June 2018
Author Tags
Intrusion detection systems
attack injection
datasets
synthetic dataset
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total citations: 35
- Total downloads: 5,487
- Downloads (last 12 months): 1,491
- Downloads (last 6 weeks): 198