research-article

DeepDGA: Adversarially-Tuned Domain Generation and Detection

Authors:
Hyrum S. Anderson, Endgame, Inc., Arlington, VA, USA
Jonathan Woodbridge, Endgame, Inc., San Francisco, CA, USA
Bobby Filar, Endgame, Inc., Arlington, VA, USA

AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, October 2016, Pages 13–21, https://doi.org/10.1145/2996758.2996767

Published: 28 October 2016

ABSTRACT

Many malware families use domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, this paper focuses on detecting (and generating) domains on a per-domain basis, which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. We leverage the concept of generative adversarial networks to construct a deep-learning-based DGA that is designed to bypass a deep-learning-based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis that adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network. In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa top one million. The encoder and decoder are then reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.
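The alternating rounds described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' architecture: a perceptron-style scorer over character bigrams stands in for the deep-learning detector, and a mutate-and-select search over benign seeds stands in for the neural generator. All names here (`ToyDetector`, `ToyGenerator`, `adversarial_rounds`) and the five seed domains are hypothetical and chosen only for illustration.

```python
import random
import string
from collections import Counter


def bigrams(domain):
    """Character bigrams of a domain label, with ^ and $ as boundary markers."""
    padded = "^" + domain + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class ToyDetector:
    """Linear scorer over bigram counts (stand-in for the paper's detector).

    A score above zero means the domain looks generated.
    """

    def __init__(self):
        self.weights = Counter()

    def score(self, domain):
        return sum(self.weights[b] for b in bigrams(domain))

    def update(self, benign, generated, lr=1.0):
        # Perceptron-style step: raise weights for missed generated domains,
        # lower them for benign domains that were wrongly flagged.
        for d in generated:
            if self.score(d) <= 0:
                for b in bigrams(d):
                    self.weights[b] += lr
        for d in benign:
            if self.score(d) > 0:
                for b in bigrams(d):
                    self.weights[b] -= lr


class ToyGenerator:
    """Mutates benign seeds and keeps the candidate the detector scores lowest
    (stand-in for the paper's neural generator)."""

    def __init__(self, seeds, rng):
        self.seeds = list(seeds)
        self.rng = rng

    def mutate(self, domain):
        # Replace one character with a different lowercase letter.
        chars = list(domain)
        i = self.rng.randrange(len(chars))
        chars[i] = self.rng.choice(
            [c for c in string.ascii_lowercase if c != chars[i]]
        )
        return "".join(chars)

    def generate(self, detector, n, candidates_per_domain=20):
        out = []
        for _ in range(n):
            seed = self.rng.choice(self.seeds)
            pool = [self.mutate(seed) for _ in range(candidates_per_domain)]
            out.append(min(pool, key=detector.score))  # hardest to detect
        return out


def adversarial_rounds(benign, rounds=3, per_round=5, seed=0):
    """Alternate generator and detector steps; collect generated domains."""
    rng = random.Random(seed)
    detector = ToyDetector()
    generator = ToyGenerator(benign, rng)
    augmented = []
    for _ in range(rounds):
        batch = generator.generate(detector, per_round)  # generator step
        detector.update(benign, batch)                   # detector step
        augmented.extend(batch)
    return detector, augmented


benign = ["google", "facebook", "wikipedia", "amazon", "youtube"]
detector, augmented = adversarial_rounds(benign)
```

In this sketch, `augmented` plays the role of the adversarially generated domains that the abstract proposes adding to the training set of a separate classifier. The toy makes no claim about detection rates or convergence.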

Index Terms

DeepDGA: Adversarially-Tuned Domain Generation and Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Security and privacy
  1. Network security

Published in
AISec '16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
October 2016
144 pages
ISBN: 9781450345736
DOI: 10.1145/2996758
Program Chairs:
David Mandell Freeman, LinkedIn Corporation, USA
Aikaterini Mitrokotsa, Chalmers University of Technology, Sweden
Arunesh Sinha, University of Michigan, USA
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep learning
domain generation algorithms
generative adversarial networks
machine learning
Acceptance Rates
AISec '16 Paper Acceptance Rate: 12 of 38 submissions, 32%
Overall Acceptance Rate: 94 of 231 submissions, 41%