Detection of algorithmically generated domain names used by botnets: a dual arms race

Authors:
Jan Spooren, KU Leuven, Heverlee, Belgium
Davy Preuveneers, KU Leuven, Heverlee, Belgium
Lieven Desmet, KU Leuven, Heverlee, Belgium
Peter Janssen, EURid VZW, Diegem, Belgium
Wouter Joosen, KU Leuven, Heverlee, Belgium

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, April 2019, Pages 1916–1923, https://doi.org/10.1145/3297280.3297467

Published: 08 April 2019

ABSTRACT

Malware typically uses Domain Generation Algorithms (DGAs) as a mechanism to contact its Command and Control server. In recent years, different machine-learning-based approaches to automatically detect generated domain names have been proposed. The first problem that we address is the difficulty of systematically comparing these DGA detection algorithms due to the lack of an independent benchmark. The second problem that we investigate is how difficult it is for an adversary to circumvent these classifiers when the machine learning models backing these DGA detectors are known. In this paper we compare two different approaches on the same set of DGAs: classical machine learning using manually engineered features and a 'deep learning' recurrent neural network. We show that the deep learning approach performs consistently better on all of the tested DGAs, with an average classification accuracy of 98.7% versus 93.8% for the manually engineered features. We also show that one of the dangers of manual feature engineering is that DGAs can adapt their strategy based on knowledge of the features used to detect them. To demonstrate this, we use knowledge of the feature set to design a new DGA that renders the random forest classifier powerless, with a classification accuracy of 59.9%. The deep learning classifier is also affected, albeit less, with its accuracy reduced to 85.5%.
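To make the notion of "manually engineered features" concrete, the following is a minimal sketch of how lexical features might be computed from a domain name before being passed to a classifier such as a random forest. The function name `lexical_features` and the four features shown (length, character entropy, vowel ratio, digit ratio) are assumptions chosen for illustration; the abstract does not list the feature set actually used in the paper.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features for a domain name.

    Hypothetical feature set for illustration only; it is not taken
    from the paper.
    """
    # Use only the leftmost label, e.g. "example" in "example.com".
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "vowel_ratio": 0.0, "digit_ratio": 0.0}

    # Shannon entropy (in bits) of the character distribution in the label.
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
    }


if __name__ == "__main__":
    for name in ("example.com", "xj4k2q9zpl7v.eu"):
        print(name, lexical_features(name))
```

A DGA designer who knows which features a detector relies on can tune generated names so that these values resemble those of benign domains, which is the kind of adaptation the abstract describes.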

Index Terms

Detection of algorithmically generated domain names used by botnets: a dual arms race
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Classification and regression trees
      2. Neural networks
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Published in
SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
April 2019
2682 pages
ISBN: 9781450359337
DOI: 10.1145/3297280
Conference Chairs:
Chih-Cheng Hung, Kennesaw State University, Marietta, Georgia
George A. Papadopoulos, University of Cyprus, Nicosia, Cyprus
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
domain generation algorithms
machine learning
malware detection
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%