Abstract
With the construction of smart cities, the number of Internet of Things (IoT) devices is growing rapidly, and malware designed for IoT devices is growing explosively with it. This malware poses a serious threat to the security of IoT devices. Traditional malware classification methods rely mainly on feature engineering: to improve accuracy, they extract a large number of different types of features from malware files, which makes classification highly complex. To address these issues, this article proposes a malware classification method based on Word2Vec and a multilayer perceptron (MLP). First, for each malware sample, Word2Vec is used to compute word vectors for the bytes of the binary file and for the instructions in the assembly file. Second, these vectors are combined into a 256 × 256 × 2 matrix. Finally, a deep learning network based on the MLP is designed and trained, and the trained model is used to classify the test samples. The experimental results show that the method achieves an accuracy of 99.54%.
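The abstract does not say how the 256 × 256 × 2 matrix is laid out. The sketch below assumes one plausible reading: each of the 256 possible byte values gets a 256-dimensional vector (first channel), and up to 256 distinct instructions get a 256-dimensional vector each (second channel). The embedding size, the cap of 256 instructions, and all function names are illustrative assumptions, and `placeholder_embeddings` only stands in for a trained Word2Vec model; the MLP itself is not shown.

```python
import random

DIM = 256    # assumed embedding size (not stated in the abstract)
VOCAB = 256  # 256 byte values; assumed cap of 256 distinct instructions


def byte_tokens(data):
    """Treat every byte as a 'word', e.g. b'\\x4d\\x5a' -> ['4d', '5a']."""
    return [f"{b:02x}" for b in data]


def placeholder_embeddings(tokens, dim=DIM, seed=0):
    """Hypothetical stand-in for Word2Vec: one pseudo-random vector per distinct token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for tok in sorted(set(tokens))}


def channel(vocab, embeddings, dim=DIM):
    """Build one VOCAB x dim channel; row i holds the vector of vocab[i].

    Tokens without a vector, and unused rows, are filled with zeros.
    """
    zero = [0.0] * dim
    rows = [embeddings.get(tok, zero) for tok in vocab[:VOCAB]]
    rows += [zero] * (VOCAB - len(rows))
    return rows


def sample_tensor(byte_emb, instr_emb, instr_vocab):
    """Stack the byte and instruction channels into a 256 x 256 x 2 nested list."""
    byte_vocab = [f"{b:02x}" for b in range(256)]
    ch0 = channel(byte_vocab, byte_emb)
    ch1 = channel(instr_vocab, instr_emb)
    return [[[ch0[i][j], ch1[i][j]] for j in range(DIM)] for i in range(VOCAB)]


# Toy sample: a few raw bytes and a few disassembled instruction mnemonics.
data = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D])
instrs = ["push", "mov", "call", "mov"]

tensor = sample_tensor(
    placeholder_embeddings(byte_tokens(data)),
    placeholder_embeddings(instrs, seed=1),
    sorted(set(instrs)),
)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```

In a real pipeline the placeholder vectors would come from a Word2Vec model trained on the byte and instruction sequences, and the resulting tensor would be flattened or otherwise reshaped before being fed to the MLP.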
Index Terms
- Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security