Published in: Wireless Personal Communications 3/2018

21-11-2017

Defect Prediction in Android Binary Executables Using Deep Neural Network

Authors: Feng Dong, Junfeng Wang, Qi Li, Guoai Xu, Shaodong Zhang

Abstract

Software defect prediction locates defective code to help developers improve software security. However, existing studies on software defect prediction are mostly limited to source code, and defect prediction for Android binary executables (apks) has not been explored in previous studies. In this paper, we present an exploratory study of defect prediction in Android apks. We first propose smali2vec, a new approach that generates features capturing the characteristics of the smali files (the decompiled files) in apks. Smali2vec extracts both token and semantic features of the defective files in apks; such comprehensive features are needed to build accurate prediction models. We then use a deep neural network (DNN), one of the most common deep learning architectures, to train and build the defect prediction model in order to achieve high accuracy. We apply our defect prediction model to more than 90,000 smali files from 50 Android apks, and the results show that the model achieves an AUC (area under the receiver operating characteristic curve) of 85.98% and is capable of predicting defects in apks. Furthermore, the DNN is shown to perform better than the traditional shallow machine learning algorithms (e.g., support vector machine and naive Bayes) used in previous studies. The model has been used in our practical work and has helped locate many defective files in apks.
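
To make the two ideas named in the abstract more concrete, here is a minimal sketch of token-level features taken from a smali file and of the AUC used to score a model. This is not smali2vec itself. The opcode vocabulary `VOCAB` and the helpers `token_features` and `auc` are hypothetical names chosen for illustration, and the semantic features and the DNN described in the abstract are not reproduced here.

```python
from collections import Counter

# Hypothetical, tiny opcode vocabulary (an assumption for this sketch);
# the real Dalvik instruction set is much larger.
VOCAB = ["invoke-virtual", "invoke-static", "const-string", "if-eqz", "return-void"]


def token_features(smali_text, vocab=VOCAB):
    """Count how often each vocabulary opcode starts an instruction line."""
    counts = Counter()
    for line in smali_text.splitlines():
        parts = line.split()
        # Skip blank lines, directives (.method), comments (#) and labels (:cond_0).
        if not parts or parts[0][0] in ".#:":
            continue
        counts[parts[0]] += 1
    return [counts[op] for op in vocab]


def auc(labels, scores):
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    A tie between a defective and a clean file counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch, a smali method body containing one `const-string`, one `invoke-virtual`, one `if-eqz` and one `return-void` maps to the vector `[1, 0, 1, 1, 1]`. Such vectors would then be fed to a classifier, whose predicted scores `auc` compares against the defect labels.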

Metadata
Title
Defect Prediction in Android Binary Executables Using Deep Neural Network
Authors
Feng Dong
Junfeng Wang
Qi Li
Guoai Xu
Shaodong Zhang
Publication date
21-11-2017
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2018
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-017-5069-3