research-article

Open Access

Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security

Authors:
Yanchen Qiao, Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen, China
Weizhe Zhang, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Xiaojiang Du, Temple University, Philadelphia, PA, USA
Mohsen Guizani, Qatar University, Doha, Qatar

ACM Transactions on Internet Technology, Volume 22, Issue 1, Article No. 10, pp. 1–22. https://doi.org/10.1145/3436751

Published: 14 September 2021


Abstract

With the construction of smart cities, the number of Internet of Things (IoT) devices is growing rapidly, and malware designed for IoT devices is growing explosively along with it. Such malware poses a serious threat to the security of IoT devices. Traditional malware classification methods rely mainly on feature engineering: to improve accuracy, they extract a large number of features of many different types from each malware file, which makes classification highly complex. To address these issues, this article proposes a malware classification method based on Word2Vec and a multilayer perceptron (MLP). First, for each malware sample, Word2Vec is used to compute word vectors for all bytes of the binary file and all instructions in the assembly file. Second, these vectors are combined into a 256 × 256 × 2 matrix. Finally, an MLP-based deep learning network is designed and trained on these matrices, and the trained model is used to classify the test samples. Experimental results show that the method achieves an accuracy of 99.54%.
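The abstract does not spell out how the per-token vectors become a 256 × 256 × 2 input. One plausible reading, sketched below under stated assumptions, is that each channel holds one 256-dimensional Word2Vec vector per token: the 256 possible byte values (written as two-digit hex tokens) for the binary channel, and a fixed 256-entry instruction vocabulary for the assembly channel. The vector size of 256, the instruction vocabulary, the zero row for tokens absent from a sample, and every function name (`bytes_to_tokens`, `build_channel`, `build_sample_tensor`, `flatten`) are hypothetical, not the authors' implementation. The vectors themselves would come from a Word2Vec model trained on the token sequences (for example `gensim.models.Word2Vec(sentences, vector_size=256)` in gensim 4.x); here they are passed in as plain dictionaries so the sketch stays dependency-free.

```python
from typing import Dict, List, Sequence

# Assumption: Word2Vec vector size of 256, so each channel is 256 x 256.
DIM = 256

# Byte channel vocabulary: the 256 possible byte values as hex tokens "00".."FF".
BYTE_VOCAB: List[str] = [f"{b:02X}" for b in range(256)]


def bytes_to_tokens(data: bytes) -> List[str]:
    """Turn a binary file into a token 'sentence' suitable for Word2Vec training."""
    return [f"{b:02X}" for b in data]


def build_channel(vocab: Sequence[str],
                  vectors: Dict[str, Sequence[float]],
                  dim: int = DIM) -> List[List[float]]:
    """Stack one word vector per vocabulary entry into a len(vocab) x dim matrix.

    Hypothetical choice: tokens with no vector (absent from the sample) get a zero row.
    """
    matrix: List[List[float]] = []
    for token in vocab:
        vec = vectors.get(token)
        if vec is None:
            matrix.append([0.0] * dim)
            continue
        if len(vec) != dim:
            raise ValueError(f"vector for {token!r} has length {len(vec)}, expected {dim}")
        matrix.append([float(x) for x in vec])
    return matrix


def build_sample_tensor(byte_vectors: Dict[str, Sequence[float]],
                        instr_vocab: Sequence[str],
                        instr_vectors: Dict[str, Sequence[float]],
                        dim: int = DIM) -> List[List[List[float]]]:
    """Combine the byte and instruction channels into a 256 x dim x 2 tensor."""
    if len(instr_vocab) != len(BYTE_VOCAB):
        raise ValueError("this sketch assumes a 256-entry instruction vocabulary")
    byte_ch = build_channel(BYTE_VOCAB, byte_vectors, dim)
    instr_ch = build_channel(instr_vocab, instr_vectors, dim)
    # tensor[i][j] holds [byte-channel value, instruction-channel value].
    return [[[byte_ch[i][j], instr_ch[i][j]] for j in range(dim)]
            for i in range(len(BYTE_VOCAB))]


def flatten(tensor: List[List[List[float]]]) -> List[float]:
    """Flatten the tensor row by row into a 1-D vector for an MLP input layer."""
    return [x for row in tensor for cell in row for x in cell]


if __name__ == "__main__":
    # Toy vectors standing in for a trained Word2Vec model.
    byte_vecs = {"00": [0.1] * DIM, "FF": [0.2] * DIM}
    instr_vocab = [f"op{i}" for i in range(256)]  # hypothetical mnemonics
    instr_vecs = {"op0": [0.3] * DIM}
    x = flatten(build_sample_tensor(byte_vecs, instr_vocab, instr_vecs))
    print(len(x))  # 131072
```

Flattening yields a 131,072-element vector (256 × 256 × 2), which is the input size an MLP would need if it consumed the matrix directly; whether the authors flatten the matrix this way, or which layer sizes follow, is not stated in the abstract.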

Published in
ACM Transactions on Internet Technology, Volume 22, Issue 1
February 2022
717 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3483347
Editor: Ling Liu, Georgia Institute of Technology, USA
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 September 2021
- Accepted: 1 November 2020
- Revised: 1 October 2020
- Received: 1 May 2020
Author Tags
Malware classification
Word2Vec
multilayer perception
IoT
Qualifiers
- research-article
- Refereed
Article Metrics
- Total citations: 7
- Total downloads: 1,424
- Downloads (last 12 months): 561
- Downloads (last 6 weeks): 82