18-01-2019 | Focus

Scalable detection of botnets based on DGA

Efficient feature discovery process in machine learning techniques

Authors: Mattia Zago, Manuel Gil Pérez, Gregorio Martínez Pérez

Published in: Soft Computing | Issue 8/2020

Abstract

Botnets are evolving, and their covert modus operandi, based on cloud technologies such as virtualisation and dynamic fast-flux addressing, has proved challenging for classic intrusion detection systems and even for so-called next-generation firewalls. Moreover, dynamic addressing has been spotted in the wild in combination with pseudo-random domain generation algorithms (DGAs), ultimately leading to an extremely accurate and effective disguise technique. Although these concealing methods have been exposed and analysed to a great extent in the past decade, the literature lacks some important conclusions and common-ground knowledge, especially when it comes to Machine Learning (ML) solutions. This research navigates the state of the art horizontally, aiming to polish the feature discovery process, which is the single most time-consuming part of any ML approach. Results show that only a minor fraction of the defined features are indeed practical and informative, especially when considering 0-day botnet identification. The contributions described in this article will ease the detection process, ultimately enabling improved and more scalable solutions for DGA-based botnet detection.

Including four features (NLP-L-x, NLP-R-NUM-x, NLP-R-VOW-x, NLP-R-CON-x) for each domain name level x: the FQDN, the Second Level Domain Name (2LD), or all the other sub-levels taken as a whole (OLD).
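As a rough illustration of the four per-level features named in this note, the Python sketch below computes them for one domain name. Several points are assumptions rather than the authors' definitions: the suffixes are read as length (L), ratio of digits (R-NUM), ratio of vowels (R-VOW) and ratio of consonants (R-CON); the TLD is naively taken to be the last label; dots are excluded from the counts; and the helper names `split_levels` and `nlp_features` are hypothetical.

```python
VOWELS = set("aeiou")


def split_levels(fqdn):
    """Split a domain name into the three levels named in the note.

    Assumption: the TLD is simply the last label (no public suffix list),
    and dots are dropped before counting characters.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level label and a TLD")
    return {
        "FQDN": "".join(labels),      # whole name, dots removed
        "2LD": labels[-2],            # label immediately left of the TLD
        "OLD": "".join(labels[:-2]),  # every remaining sub-level, as a whole
    }


def ratio(count, total):
    """Return count / total, or 0.0 for an empty level."""
    return count / total if total else 0.0


def nlp_features(fqdn):
    """Compute length and character-class ratios for each level."""
    features = {}
    for level, text in split_levels(fqdn).items():
        length = len(text)
        digits = sum(c.isdigit() for c in text)
        vowels = sum(c in VOWELS for c in text)
        consonants = sum(c.isalpha() and c not in VOWELS for c in text)
        features[f"NLP-L-{level}"] = length
        features[f"NLP-R-NUM-{level}"] = ratio(digits, length)
        features[f"NLP-R-VOW-{level}"] = ratio(vowels, length)
        features[f"NLP-R-CON-{level}"] = ratio(consonants, length)
    return features


if __name__ == "__main__":
    for name, value in nlp_features("mail.x1a2b3.com").items():
        print(name, value)
```

A production extractor would probably resolve the TLD against a public suffix list, since multi-label suffixes such as `co.uk` break the last-label assumption made here.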

According to ICANN specifications, the minimum length of a domain name, not counting the Top Level Domain (TLD), is three characters. The maximum, including symbols and extensions, is 255 characters, with at most 63 characters per level.
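The bounds quoted in this note can be turned into a simple sanity check, sketched below in Python. The three constants come from the note itself; everything else is an assumption made for illustration: the TLD is taken to be the last label, the length without the TLD is counted with the inner dots included, and the function name `length_violations` is hypothetical.

```python
MIN_LEN_WITHOUT_TLD = 3   # minimum length, TLD excluded (from the note)
MAX_TOTAL_LEN = 255       # maximum overall length (from the note)
MAX_LEVEL_LEN = 63        # maximum length of a single level (from the note)


def length_violations(fqdn):
    """Return a list of human-readable length problems (empty if none)."""
    name = fqdn.rstrip(".")
    labels = name.split(".")
    problems = []

    if len(name) > MAX_TOTAL_LEN:
        problems.append(f"total length {len(name)} exceeds {MAX_TOTAL_LEN}")

    for label in labels:
        if not label:
            problems.append("empty level")
        elif len(label) > MAX_LEVEL_LEN:
            problems.append(f"level of length {len(label)} exceeds {MAX_LEVEL_LEN}")

    # Assumption: the TLD is the last label; inner dots are counted.
    without_tld = ".".join(labels[:-1])
    if len(without_tld) < MIN_LEN_WITHOUT_TLD:
        problems.append(
            f"length without TLD is {len(without_tld)}, below {MIN_LEN_WITHOUT_TLD}"
        )
    return problems


if __name__ == "__main__":
    for candidate in ("example.com", "ab.com", "a" * 64 + ".com"):
        print(candidate[:20], length_violations(candidate))
```

The same bounds could also serve to scale length features into a fixed range, although whether the authors do so is not stated in this note.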

The IG is purely theoretical: it does not consider any particular classification algorithm.
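To make the point concrete, the sketch below assumes that IG stands for information gain in its usual entropy-based form, IG(Y; X) = H(Y) - H(Y | X), which is computed from the data alone and involves no classifier. The function names and the toy labels are hypothetical, and the feature is assumed to be already discretised.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


if __name__ == "__main__":
    classes = ["dga", "dga", "legit", "legit"]
    # A feature that separates the classes perfectly versus one that does not.
    print(information_gain(["high", "high", "low", "low"], classes))
    print(information_gain(["a", "b", "a", "b"], classes))
```

Continuous features such as length ratios would need to be binned before this calculation; the binning strategy is a separate choice that this sketch does not address.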

By experimentally demonstrating that users' data are not strictly required to recognise malware in the wild. See Sect. 3.3.

Title: Scalable detection of botnets based on DGA: Efficient feature discovery process in machine learning techniques
Authors: Mattia Zago, Manuel Gil Pérez, Gregorio Martínez Pérez
Publication date: 18-01-2019
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 8/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-018-03703-8
