ABSTRACT
Malware typically uses Domain Generation Algorithms (DGAs) to contact its Command and Control server. In recent years, several machine-learning-based approaches to automatically detecting generated domain names have been proposed. The first problem that we address is the difficulty of systematically comparing these DGA detection algorithms, given the lack of an independent benchmark. The second problem that we investigate is how difficult it is for an adversary to circumvent these classifiers when the machine learning models backing the DGA detectors are known. In this paper we compare two different approaches on the same set of DGAs: classical machine learning using manually engineered features, and a 'deep learning' recurrent neural network. We show that the deep learning approach performs consistently better on all of the tested DGAs, with an average classification accuracy of 98.7% versus 93.8% for the manually engineered features. We also show that one of the dangers of manual feature engineering is that DGAs can adapt their strategy based on knowledge of the features used to detect them. To demonstrate this, we use knowledge of the feature set to design a new DGA that renders the random forest classifier powerless, with a classification accuracy of 59.9%. The deep learning classifier is also affected, albeit less, with its accuracy reduced to 85.5%.
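To make "manually engineered features" concrete, the following is a minimal Python sketch of the kind of lexical features such a classifier might consume. The feature names and the choice of features (length, character entropy, vowel ratio, digit ratio) are illustrative assumptions; the abstract does not list the actual feature set used in the paper.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features for a domain name.

    Hypothetical feature set for illustration only; it is NOT the
    feature set evaluated in the paper.
    """
    # Use only the left-most label, e.g. "example" from "example.com".
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0,
                "vowel_ratio": 0.0, "digit_ratio": 0.0}

    # Shannon entropy (in bits) of the character distribution of the label.
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
    }


if __name__ == "__main__":
    for name in ("google.com", "xj4k9qz7w2bd.net"):
        print(name, lexical_features(name))
```

A sketch like this also hints at the adaptation risk described above: an adversary who knows that, say, entropy and vowel ratio are used can generate names whose values for those features resemble benign domains.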