ABSTRACT
The increasing sophistication of malicious software calls for new defensive techniques that are harder to evade, and are capable of protecting users against novel threats. We present AESOP, a scalable algorithm that identifies malicious executable files by applying Aesop's moral that "a man is known by the company he keeps." We use a large dataset voluntarily contributed by the members of Norton Community Watch, consisting of partial lists of the files that exist on their machines, to identify close relationships between files that often appear together on machines. AESOP leverages locality-sensitive hashing to measure the strength of these inter-file relationships to construct a graph, on which it performs large scale inference by propagating information from the labeled files (as benign or malicious) to the preponderance of unlabeled files. AESOP attained early labeling of 99% of benign files and 79% of malicious files, over a week before they are labeled by the state-of-the-art techniques, with a 0.9961 true positive rate at flagging malware, at 0.0001 false positive rate.
Supplemental Material
- D. S. Anderson, C. Fleizach, S. Savage, and G. M. Voelker. Spamscatter: Characterizing internet scam hosting infrastructure. In Proceedings of the USENIX Security Symposium, 2007. Google ScholarDigital Library
- M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. Building a dynamic reputation system for dns. In Proceedings of the USENIX Security Symposium, 2010. Google ScholarDigital Library
- L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. Exposure: Finding malicious domains using passive dns analysis. In Proceedings of the Annual Network and Distributed System Security Symposium, 2011.Google Scholar
- Bleeping Computer. Cryptolocker ransomware information guide and faq. October 2013. http://www.bleepingcomputer.com/virus-removal/cryptolocker-ransomware-information.Google Scholar
- A. Z. Broder. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences, 1997. Google ScholarDigital Library
- M. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the ACM Symposium on the Theory of Computing, 2002. Google ScholarDigital Library
- D. H. Chau, C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos. Polonium: Tera-scale graph mining and inference for malware detection. In Proceedings of the SIAM International Conference on Data Mining, 2011.Google ScholarCross Ref
- O. Chum, J. Philbin, and A. Zisserman. Near duplicate image detection: min-hash and tf-idf weighting. In British Machine Vision Conference, 2008.Google ScholarCross Ref
- E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. In Proceedings of the IEEE International Conference on Data Engineering, 2000. Google ScholarDigital Library
- A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proceedings of the International conference on World Wide Web, 2007. Google ScholarDigital Library
- M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, 2004. Google ScholarDigital Library
- T. Dumitras and D. Shou. Toward a standard benchmark for computer security research: The worldwide intelligence network environment (wine). In Proceedings of the European Conference on Computer Systems BADGERS Workshop, 2011. Google ScholarDigital Library
- P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1):41--54, 2006. Google ScholarDigital Library
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Data Bases, 1999. Google ScholarDigital Library
- X. Hu, S. Bhatkar, K. Griffin, and K. G. Shin. Mutantx-s: Scalable malware clustering based on static features. In Proceedings of USENIX Annual Technical Conference, 2013. Google ScholarDigital Library
- P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on the Theory of Computing, 1998. Google ScholarDigital Library
- U. Kang, D. H. Chau, and C. Faloutsos. Mining large graphs: Algorithms, inference, and discoveries. In Proceedings of the IEEE International Conference on Data Engineering, 2011. Google ScholarDigital Library
- N. Karampatziakis, J. W. Stokes, A. Thomas, and M. Marinescu. Using file relationships in malware classification. In Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment, 2012. Google ScholarDigital Library
- H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495--1502, 2007. Google ScholarDigital Library
- M. McGlohon, S. Bay, M. G. Anderle, D. M. Steier, and C. Faloutsos. Snare: A link analytic system for graph labeling and risk detection. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, 2009. Google ScholarDigital Library
- E. E. Papalexakis, T. Dumitras, D. H. P. Chau, B. A. Prakash, and C. Faloutsos. Spatio-temporal mining of software adoption & penetration. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2013. Google ScholarDigital Library
- A. Rajaraman and J. D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2012. Google ScholarDigital Library
- Symantec. Internet security threat report. 18, 2013.Google Scholar
- Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu. Combining file content and file relations for cloud based malware detection. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, 2011. Google ScholarDigital Library
- J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations, pages 239--270. Morgan Kaufmann Publishers Inc., 2003. Google ScholarDigital Library
Index Terms
- Guilt by association: large scale malware detection by mining file-relation graphs
Recommendations
Real-Time Detection of Malware Downloads via Large-Scale URL->File->Machine Graph Mining
ASIA CCS '16: Proceedings of the 11th ACM on Asia Conference on Computer and Communications SecurityIn this paper we propose Mastino, a novel defense system to detect malware download events. A download event is a 3-tuple that identifies the action of downloading a file from a URL that was triggered by a client (machine). Mastino utilizes global ...
Malware detection using adaptive data compression
AISec '08: Proceedings of the 1st ACM workshop on Workshop on AISecA popular approach in current commercial anti-malware software detects malicious programs by searching in the code of programs for scan strings that are byte sequences indicative of malicious code. The scan strings, also known as the signatures of ...
Malware Detection Method Focusing on Anti-debugging Functions
CANDAR '14: Proceedings of the 2014 Second International Symposium on Computing and NetworkingMalware has received much attention in recent years. Antivirus software is widely used as a countermeasure against malware. However, some kinds of malware can evade detection by antivirus software, hence, a new detection method is required. In this ...
Comments