Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

Authors:
Babak Rahbarinia
Auburn University Montgomery, Montgomery, AL

Roberto Perdisci
University of Georgia, Georgia Institute of Technology, Athens, GA

Manos Antonakakis
Georgia Institute of Technology, Atlanta, GA

ACM Transactions on Privacy and Security, Volume 19, Issue 2, Article No. 4, pp. 1–31. https://doi.org/10.1145/2960409

Published: 17 August 2016

Abstract

In this article, we propose Segugio, a novel defense system that allows for efficiently tracking the occurrence of new malware-control domain names in very large ISP networks. Segugio passively monitors the DNS traffic to build a machine-domain bipartite graph representing who is querying what. After labeling nodes in this query behavior graph that are known to be either benign or malware-related, we propose a novel approach to accurately detect previously unknown malware-control domains.
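As a rough illustration of the query behavior graph described above, the following Python sketch builds a machine-domain bipartite graph from (machine, domain) query pairs and labels its nodes from lists of known domains. The record format, the function names, the sample data, and the rule that a machine is treated as malware-related once it queries a known malware-control domain are all assumptions made for illustration; they are not details taken from the article.

```python
from collections import defaultdict


def build_query_graph(queries):
    """Build a bipartite graph from (machine_id, domain) DNS query pairs.

    Returns two adjacency maps: machine -> domains queried, and
    domain -> machines that queried it.
    """
    machine_to_domains = defaultdict(set)
    domain_to_machines = defaultdict(set)
    for machine, domain in queries:
        # Domain names are case-insensitive; drop any trailing root dot.
        domain = domain.lower().rstrip(".")
        machine_to_domains[machine].add(domain)
        domain_to_machines[domain].add(machine)
    return machine_to_domains, domain_to_machines


def label_domains(domain_to_machines, known_malware, known_benign):
    """Label each domain node as 'malware', 'benign', or 'unknown'."""
    labels = {}
    for domain in domain_to_machines:
        if domain in known_malware:
            labels[domain] = "malware"
        elif domain in known_benign:
            labels[domain] = "benign"
        else:
            labels[domain] = "unknown"
    return labels


def label_machines(machine_to_domains, domain_labels):
    """Label machine nodes.

    Assumption (hypothetical rule): a machine is 'malware' if it queried
    at least one known malware-control domain, otherwise 'unknown'.
    """
    labels = {}
    for machine, domains in machine_to_domains.items():
        infected = any(domain_labels.get(d) == "malware" for d in domains)
        labels[machine] = "malware" if infected else "unknown"
    return labels


if __name__ == "__main__":
    # Made-up query log; the .example names are placeholders.
    sample_queries = [
        ("m1", "bad.example"),
        ("m1", "news.example"),
        ("m2", "news.example"),
        ("m2", "new-domain.example"),
        ("m1", "New-Domain.example."),
    ]
    m2d, d2m = build_query_graph(sample_queries)
    domain_labels = label_domains(d2m, {"bad.example"}, {"news.example"})
    machine_labels = label_machines(m2d, domain_labels)
    print(domain_labels)
    print(machine_labels)
```

The domains left as "unknown" are the ones a detection step would then have to classify, using the labeled machines and domains around them in the graph.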

We implemented a proof-of-concept version of Segugio and deployed it in large ISP networks that serve millions of users. Our experimental results show that Segugio can track the occurrence of new malware-control domains with up to 94% true positives (TPs) at less than 0.1% false positives (FPs). In addition, we provide the following results: (1) we show that Segugio can also detect control domains related to new, previously unseen malware families, with 85% TPs at 0.1% FPs; (2) Segugio’s detection models learned on traffic from a given ISP network can be deployed into a different ISP network and still achieve very high detection accuracy; (3) new malware-control domains can be detected days or even weeks before they appear in a large commercial domain-name blacklist; (4) Segugio can be used to detect previously unknown malware-infected machines in ISP networks; and (5) we show that Segugio clearly outperforms domain-reputation systems based on Belief Propagation.
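The TP and FP figures quoted above are rates measured at a chosen detection threshold. As a minimal sketch of how such rates are computed, assuming each domain receives a numeric score and a ground-truth label (the function name, scores, labels, and thresholds below are invented for illustration and are not the article's data):

```python
def tp_fp_rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at a threshold.

    scores: detection scores, higher means more likely malware-control.
    labels: True for a known malware-control domain, False for benign.
    A domain is flagged when its score is >= threshold.
    """
    tp = fp = positives = negatives = 0
    for score, is_malware in zip(scores, labels):
        flagged = score >= threshold
        if is_malware:
            positives += 1
            tp += int(flagged)
        else:
            negatives += 1
            fp += int(flagged)
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Made-up scores for three malware-control and three benign domains.
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
    labels = [True, True, True, False, False, False]
    for threshold in (0.5, 0.75):
        print(threshold, tp_fp_rates(scores, labels, threshold))
```

Raising the threshold lowers the FP rate at the cost of TPs, which is why a result such as "94% TPs at less than 0.1% FPs" names both numbers together.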

Published in
ACM Transactions on Privacy and Security Volume 19, Issue 2
September 2016
83 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/2988517
Editor:
David Basin
ETH Zurich, Switzerland
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 August 2016
- Revised: 1 June 2016
- Accepted: 1 June 2016
- Received: 1 September 2015
Author Tags
Behavioral analysis
graph mining
malware-control domains
Qualifiers
- research-article
- Research
- Refereed