Wireless Personal Communications

Published: 21-11-2017

Defect Prediction in Android Binary Executables Using Deep Neural Network

Authors: Feng Dong, Junfeng Wang, Qi Li, Guoai Xu, Shaodong Zhang

Published in: Wireless Personal Communications | Issue 3/2018

Abstract

Software defect prediction locates defective code to help developers improve the security of software. However, existing studies on software defect prediction are mostly limited to source code, and defect prediction for Android binary executables (apks) has not been explored in previous studies. In this paper, we present an exploratory study of defect prediction in Android apks. We first propose smali2vec, a new approach to generating features that capture the characteristics of the smali files (the decompiled files) in apks. Smali2vec extracts both token and semantic features of the defective files in apks; such comprehensive features are needed to build accurate prediction models. We then use a deep neural network (DNN), one of the most common deep learning architectures, to train and build the defect prediction model with the aim of accurate prediction. We apply our defect prediction model to more than 90,000 smali files from 50 Android apks, and the results show that our model achieves an AUC (the area under the receiver operating characteristic curve) of 85.98% and is capable of predicting defects in apks. Furthermore, the DNN is shown to perform better than the traditional shallow machine learning algorithms (e.g., support vector machine and naive Bayes) used in previous studies. The model has been used in our practical work and has helped locate many defective files in apks.
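
As a rough illustration of two ideas the abstract mentions, token-level features from smali files and the AUC metric, the following Python sketch may help. It is not the authors' smali2vec implementation: the sample smali text, the function names (`opcode_tokens`, `token_counts`, `auc`), and the rule of treating the first word of each instruction line as a token are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical smali fragment; real files would come from decompiling an apk.
SMALI_SAMPLE = """\
.method public static add(II)I
    .registers 3
    add-int v0, p0, p1
    return v0
.end method
"""


def opcode_tokens(smali_text):
    """Return the first word of each instruction line.

    Assumption for this sketch: lines starting with '.', ':' or '#'
    are directives, labels, or comments and carry no opcode.
    """
    tokens = []
    for line in smali_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in ".:#":
            continue
        tokens.append(stripped.split()[0])
    return tokens


def token_counts(smali_text):
    """Bag-of-tokens feature: how often each opcode occurs in the file."""
    return Counter(opcode_tokens(smali_text))


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defective file (label 1)
    receives a higher score than a randomly chosen clean file (label 0);
    ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(token_counts(SMALI_SAMPLE))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a pipeline of the kind described, per-file feature vectors would be fed to a classifier and its predicted scores compared against known defect labels; the pairwise AUC above is quadratic in the number of files and is meant only to make the definition concrete.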

Title: Defect Prediction in Android Binary Executables Using Deep Neural Network
Authors: Feng Dong, Junfeng Wang, Qi Li, Guoai Xu, Shaodong Zhang
Publication date: 21-11-2017
Publisher: Springer US
Published in: Wireless Personal Communications / Issue 3/2018
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-017-5069-3
