research-article
Public Access

Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

Published: 17 August 2016

Abstract

In this article, we propose Segugio, a novel defense system that efficiently tracks the occurrence of new malware-control domain names in very large ISP networks. Segugio passively monitors DNS traffic to build a machine-domain bipartite graph that represents who is querying what. After labeling the nodes in this query-behavior graph that are known to be either benign or malware-related, we apply a novel approach to accurately detect previously unknown malware-control domains.
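To make the "who is querying what" graph concrete, here is a minimal Python sketch, assuming the DNS traffic has already been reduced to (machine, domain) query records. All function names (`build_query_graph`, `label_domains`, `label_machines`, `infected_fraction`) are hypothetical, as is the simple rule that marks a machine as malware-related when it queried any known malware-control domain. The `infected_fraction` value is one illustrative behavior signal for an unlabeled domain, not a statement of Segugio's actual feature set or detection model.

```python
from collections import defaultdict


def build_query_graph(dns_log):
    """Build the machine-domain bipartite graph from (machine, domain) query records.

    Returns two adjacency maps: machine -> set of queried domains,
    and domain -> set of machines that queried it.
    """
    machine_to_domains = defaultdict(set)
    domain_to_machines = defaultdict(set)
    for machine, domain in dns_log:
        machine_to_domains[machine].add(domain)
        domain_to_machines[domain].add(machine)
    return machine_to_domains, domain_to_machines


def label_domains(domains, malware_domains, benign_domains):
    """Label each domain node from ground truth (e.g., a blacklist and a whitelist)."""
    labels = {}
    for d in domains:
        if d in malware_domains:
            labels[d] = "malware"
        elif d in benign_domains:
            labels[d] = "benign"
        else:
            labels[d] = "unknown"
    return labels


def label_machines(machine_to_domains, malware_domains):
    """Hypothetical rule: a machine is 'malware' if it queried any known malware domain."""
    return {
        m: ("malware" if domains & malware_domains else "unknown")
        for m, domains in machine_to_domains.items()
    }


def infected_fraction(domain, domain_to_machines, machine_labels):
    """Illustrative feature: share of a domain's querying machines labeled 'malware'."""
    machines = domain_to_machines.get(domain, set())
    if not machines:
        return 0.0
    infected = sum(1 for m in machines if machine_labels.get(m) == "malware")
    return infected / len(machines)


if __name__ == "__main__":
    log = [
        ("10.0.0.1", "evil-c2.example"),
        ("10.0.0.1", "new-unknown.example"),
        ("10.0.0.2", "evil-c2.example"),
        ("10.0.0.2", "new-unknown.example"),
        ("10.0.0.3", "new-unknown.example"),
        ("10.0.0.3", "news.example"),
    ]
    m2d, d2m = build_query_graph(log)
    domain_labels = label_domains(d2m, {"evil-c2.example"}, {"news.example"})
    machine_labels = label_machines(m2d, {"evil-c2.example"})
    # new-unknown.example has no label, but 2 of its 3 querying machines are labeled 'malware'
    print(domain_labels["new-unknown.example"],
          infected_fraction("new-unknown.example", d2m, machine_labels))
```

Keeping adjacency sets in both directions means a domain's querying machines and a machine's queried domains are each a single lookup, which is what label propagation across the bipartite graph needs.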

We implemented a proof-of-concept version of Segugio and deployed it in large ISP networks that serve millions of users. Our experimental results show that Segugio can track the occurrence of new malware-control domains with up to 94% true positives (TPs) at less than 0.1% false positives (FPs). In addition, we show that (1) Segugio can also detect control domains related to new, previously unseen malware families, with 85% TPs at 0.1% FPs; (2) Segugio’s detection models trained on traffic from one ISP network can be deployed in a different ISP network and still achieve very high detection accuracy; (3) new malware-control domains can be detected days or even weeks before they appear in a large commercial domain-name blacklist; (4) Segugio can be used to detect previously unknown malware-infected machines in ISP networks; and (5) Segugio clearly outperforms domain-reputation systems based on Belief Propagation.
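Operating points such as "94% TPs at less than 0.1% FPs" fix a false-positive budget and read off the true-positive rate at the matching detection threshold; a 0.1% FP rate means at most 1 in every 1,000 benign domains is flagged. The following Python sketch shows one way to compute such a point, assuming the detector emits one real-valued score per domain (higher means more likely malware-control) and that ground-truth labels are available. The helper `tp_rate_at_fp_budget` is hypothetical and is not part of Segugio.

```python
def tp_rate_at_fp_budget(scores, labels, max_fp_rate):
    """Sweep thresholds from high to low and keep the last one within the FP budget.

    scores: one detector score per domain (higher = more likely malware-control)
    labels: 1 for malware-control domains, 0 for benign domains
    Returns (threshold, tp_rate, fp_rate); domains with score >= threshold are flagged.
    If no threshold satisfies the budget, returns (inf, 0.0, 0.0), i.e. flag nothing.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both malware and benign examples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = (float("inf"), 0.0, 0.0)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # A threshold admits every sample tied at this score, so consume them together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / neg > max_fp_rate:
            break  # lowering the threshold further can only add false positives
        best = (threshold, tp / pos, fp / neg)
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(tp_rate_at_fp_budget(scores, labels, max_fp_rate=0.001))
```

Sorting once and sweeping keeps the cost at O(n log n). If a scikit-learn dependency is acceptable, `sklearn.metrics.roc_curve` returns the same kind of (FP rate, TP rate, threshold) points.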

          • Published in

            ACM Transactions on Privacy and Security, Volume 19, Issue 2
            September 2016
            83 pages
            ISSN: 2471-2566
            EISSN: 2471-2574
            DOI: 10.1145/2988517

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 17 August 2016
            • Revised: 1 June 2016
            • Accepted: 1 June 2016
            • Received: 1 September 2015

            Qualifiers

            • research-article
            • Research
            • Refereed
