research-article
Open Access

Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security

Published: 14 September 2021

Abstract

With the construction of smart cities, the number of Internet of Things (IoT) devices is growing rapidly, leading to explosive growth in malware designed for IoT devices. This malware poses a serious threat to the security of IoT devices. Traditional malware classification methods rely mainly on feature engineering. To improve accuracy, these methods extract a large number of different types of features from malware files, which makes classification highly complex. To address these issues, this article proposes a malware classification method based on Word2Vec and Multilayer Perception (MLP). First, for each malware sample, Word2Vec is used to calculate a word vector for each byte of the binary file and each instruction in the assembly file. Second, these vectors are combined into a 256x256x2-dimensional matrix. Finally, a deep learning network structure based on MLP is designed and trained, and the resulting model is used to classify the testing samples. The experimental results show that the method achieves a high accuracy of 99.54%.
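The abstract does not spell out how the word vectors are arranged into the 256x256x2 matrix, so the following is only a data-shaping sketch under stated assumptions: each channel is assumed to hold one 256-dimensional vector per vocabulary entry (256 byte values in one channel, up to 256 instruction tokens in the other, zero-padded). The function names, the opcode list, and `placeholder_vector` are hypothetical; in particular, `placeholder_vector` is a deterministic stand-in for a trained Word2Vec lookup, not Word2Vec itself.

```python
from __future__ import annotations

import random

VOCAB_SIZE = 256  # assumed: one row per vocabulary entry (e.g. byte values 0x00..0xFF)
EMBED_DIM = 256   # assumed vector length, chosen so each channel is 256x256


def bytes_to_tokens(data: bytes) -> list[str]:
    """Turn raw bytes into hex 'words' (e.g. '4D') that could be fed to Word2Vec."""
    return [f"{b:02X}" for b in data]


def placeholder_vector(token: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for a trained Word2Vec lookup (NOT a real embedding)."""
    rng = random.Random(token)  # seeded by the token, so the result is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def channel(vocab: list[str]) -> list[list[float]]:
    """Build one 256x256 channel: one row per vocabulary entry, zero-padded."""
    rows = [placeholder_vector(tok) for tok in vocab[:VOCAB_SIZE]]
    rows += [[0.0] * EMBED_DIM for _ in range(VOCAB_SIZE - len(rows))]
    return rows


def stack_channels(a: list[list[float]], b: list[list[float]]) -> list[list[list[float]]]:
    """Combine two 256x256 channels into a 256x256x2 nested list."""
    return [[[x, y] for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


byte_vocab = [f"{b:02X}" for b in range(256)]
asm_vocab = ["mov", "push", "call", "pop", "jmp"]  # hypothetical opcode vocabulary

sample = stack_channels(channel(byte_vocab), channel(asm_vocab))
shape = (len(sample), len(sample[0]), len(sample[0][0]))
```

Under these assumptions, `shape` describes one input sample for the MLP; a real pipeline would replace `placeholder_vector` with vectors learned from the byte and instruction sequences of each sample.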

          • Published in

            ACM Transactions on Internet Technology, Volume 22, Issue 1
            February 2022
            717 pages
            ISSN: 1533-5399
            EISSN: 1557-6051
            DOI: 10.1145/3483347
            • Editor: Ling Liu

            Copyright © 2021 Association for Computing Machinery.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 14 September 2021
            • Accepted: 1 November 2020
            • Revised: 1 October 2020
            • Received: 1 May 2020

            Qualifiers

            • research-article
            • Refereed
