ABSTRACT
Many malware families use domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, this paper focuses on detecting (and generating) domains on a per-domain basis, which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simple DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. We leverage the concept of generative adversarial networks to construct a deep-learning-based DGA that is designed to bypass a deep-learning-based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly difficult to detect. In turn, the detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis that adversarially generated domains can be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network. In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa top one million. The encoder and decoder are then reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.
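To make the notions of a simple fixed-length DGA and per-domain scoring concrete, the following is a minimal Python sketch. It is not the DeepDGA generator or detector, and it does not reproduce any real malware family's algorithm: the function names `toy_dga` and `char_entropy`, the hash-based construction, the seed format, and the use of character entropy as a score are all assumptions made purely for illustration.

```python
import hashlib
import math
from collections import Counter
from typing import List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> List[str]:
    """Hypothetical fixed-length DGA (illustrative only, not a real family).

    Each domain is derived deterministically from (seed, index), so anyone
    who shares the seed can regenerate the same list of domains.
    """
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (SHA-256 digest size)")
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
        # Map the first `length` digest bytes onto lowercase letters.
        name = "".join(ALPHABET[b % 26] for b in digest[:length])
        domains.append(name + tld)
    return domains


def char_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of the label before the first dot.

    This is a per-domain statistic: it uses only the domain string itself,
    with no DNS traffic or context. It stands in for a learned detector here.
    """
    label = domain.split(".")[0]
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    for d in toy_dga("2016-04-22", 3) + ["google.com"]:
        print(d, round(char_entropy(d), 3))
```

A single hand-picked statistic like this is a weak detector and is shown only to illustrate what scoring on a per-domain basis means; the approach described in the abstract replaces it with a learned, character-based model.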
Index Terms
- DeepDGA: Adversarially-Tuned Domain Generation and Detection