Skip to main content
Top
Published in: Journal of Computer Virology and Hacking Techniques 3/2017

19-05-2016 | Original Paper

Graph embedding as a new approach for unknown malware detection

Authors: Hashem Hashemi, Amin Azmoodeh, Ali Hamzeh, Sattar Hashemi

Published in: Journal of Computer Virology and Hacking Techniques | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Malware is any type of computer program which is developed to harm computers, networks, and information. Noticeable growth of malware development has made computer and network security a significant and challenging area in recent years. There is an intensive competition between malwares and antiviruses. Malware authors make every effort to develop new harmful codes using various programming tricks and exploits which are unseen for detection techniques. On the other hand, antivirus developers upgrade their methods and algorithms to recognize unknown malware. Therefore, an accurate and rapid detection method is an irrefutable demand in computer security area. This paper proposes a new malware detection method based on the OpCodes within an executable file. Proposed method generates a graph of operational codes (OpCode) within an executable file and then embeds this graph into eigenspace using “Power Iteration” method. This will help us represent an executable file as a linear combination of eigenvectors proportionate to their eigenvalues, which is beneficial to train machine learning classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The main advantages of our proposed method are high detection rate despite utilizing simple classifiers like KNN, acceptable computational complexity even in large scale datasets against rival methods, and low false positive rate.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
Literature
1.
go back to reference Elhadi, A.A.E., Maarof, M.A., Osman, A.H.: Information Assurance and Security Research Group, Faculty of Computer Science and Infor.: Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9(3), 283–288 (2012)CrossRef Elhadi, A.A.E., Maarof, M.A., Osman, A.H.: Information Assurance and Security Research Group, Faculty of Computer Science and Infor.: Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9(3), 283–288 (2012)CrossRef
2.
go back to reference Bazrafshan, Z., Hashemi, H., Mehdi, S., Fard, H., Hamzeh, A., Fard, S.M.H., Hamzeh, A.: A survey on heuristic malware detection techniques. Inf. Knowl. Technol. (IKT) 2, 113–120 (2013) Bazrafshan, Z., Hashemi, H., Mehdi, S., Fard, H., Hamzeh, A., Fard, S.M.H., Hamzeh, A.: A survey on heuristic malware detection techniques. Inf. Knowl. Technol. (IKT) 2, 113–120 (2013)
3.
go back to reference Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.P.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. (Ny) 231, 64–82 (2013)MathSciNetCrossRef Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.P.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. (Ny) 231, 64–82 (2013)MathSciNetCrossRef
4.
go back to reference Vinod, P., Jaipur, R., Laxmi, V., Gaur, M.: Survey on malware detection methods. In: Proceedings of the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), pp. 74–79 (2009) Vinod, P., Jaipur, R., Laxmi, V., Gaur, M.: Survey on malware detection methods. In: Proceedings of the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), pp. 74–79 (2009)
5.
go back to reference Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. 1(2), 156–168 (2007) Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. 1(2), 156–168 (2007)
6.
go back to reference Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–342. IEEE (2014) Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–342. IEEE (2014)
7.
go back to reference Uppal, D., Sinha, R.: Exploring behavioral aspects of API calls for malware identification and categorization. In: Networks (CICN), vol. 2014 (2014) Uppal, D., Sinha, R.: Exploring behavioral aspects of API calls for malware identification and categorization. In: Networks (CICN), vol. 2014 (2014)
8.
go back to reference Sundarkumar, G., Ravi, V.: Malware detection via API calls, topic models and machine learning. In: (CASE), 2015 IEEE (2015) Sundarkumar, G., Ravi, V.: Malware detection via API calls, topic models and machine learning. In: (CASE), 2015 IEEE (2015)
9.
go back to reference Fan, C.I., Hsiao, H.W., Chou, C.H., Tseng, Y.F.: Malware detection systems based on API log data mining. In: 2015 IEEE 39th Annual Computer Software and Application Conference (COMPSAC), pp. 255–60. IEEE (2015) Fan, C.I., Hsiao, H.W., Chou, C.H., Tseng, Y.F.: Malware detection systems based on API log data mining. In: 2015 IEEE 39th Annual Computer Software and Application Conference (COMPSAC), pp. 255–60. IEEE (2015)
10.
go back to reference Alam, S., Traore, I., Sogukpinar, I.: Annotated control flow graph for metamorphic malware detection. Comput. J. (2014) Alam, S., Traore, I., Sogukpinar, I.: Annotated control flow graph for metamorphic malware detection. Comput. J. (2014)
11.
go back to reference Cesare S., Xiang Y., Zhou W.: Control flow-based malware variant detection. In: IEEE Transactions on Dependable and Secure Computing, pp. 307–317. IEEE (2014) Cesare S., Xiang Y., Zhou W.: Control flow-based malware variant detection. In: IEEE Transactions on Dependable and Secure Computing, pp. 307–317. IEEE (2014)
12.
go back to reference Wüchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proc. 9th ACM Symp. Information, Comput. Commun. Secur. - ASIA CCS ’14, pp. 271–282 (2014) Wüchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proc. 9th ACM Symp. Information, Comput. Commun. Secur. - ASIA CCS ’14, pp. 271–282 (2014)
13.
go back to reference Wüchner, T., Ochoa, M., Pretschner, A.: Robust and effective malware detection through quantitative data flow graph metrics. (2015). arXiv:1502.01609 Wüchner, T., Ochoa, M., Pretschner, A.: Robust and effective malware detection through quantitative data flow graph metrics. (2015). arXiv:​1502.​01609
14.
go back to reference Abou-assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004) Abou-assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
15.
go back to reference Canfora, G., Lorenzo, A.: Effectiveness of opcode ngrams for detection of multi family android malware. In: 10th International Conference on Availability, Reliability and Security (ARES), pp. 333–340. IEEE (2015) Canfora, G., Lorenzo, A.: Effectiveness of opcode ngrams for detection of multi family android malware. In: 10th International Conference on Availability, Reliability and Security (ARES), pp. 333–340. IEEE (2015)
16.
go back to reference Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G.: Opcode-sequence-based semi-supervised unknown malware detection. Comput. Intell. Secur. Inf. Syst., pp. 50–57 (2011) Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G.: Opcode-sequence-based semi-supervised unknown malware detection. Comput. Intell. Secur. Inf. Syst., pp. 50–57 (2011)
17.
go back to reference Santos, I., Brezo, F., Nieves, J., Penya, Y.Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: Opcode-sequence-based malware detection. Eng. Secur. Softw. Syst., pp. 35–43 (2010) Santos, I., Brezo, F., Nieves, J., Penya, Y.Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: Opcode-sequence-based malware detection. Eng. Secur. Softw. Syst., pp. 35–43 (2010)
18.
go back to reference Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Inf. Secur. 5(4), 220 (2011)CrossRef Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Inf. Secur. 5(4), 220 (2011)CrossRef
19.
go back to reference Santos, I., Laorden, C., Bringas, P.G.P.: Collective classification for unknown malware detection. SECRYPT, pp. 251–256 (2011) Santos, I., Laorden, C., Bringas, P.G.P.: Collective classification for unknown malware detection. SECRYPT, pp. 251–256 (2011)
20.
go back to reference Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T.: Graph-based malware detection using dynamic analysis. J. Comput. Virol. 7(4), 247–258 (2011)CrossRef Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T.: Graph-based malware detection using dynamic analysis. J. Comput. Virol. 7(4), 247–258 (2011)CrossRef
21.
go back to reference Jalote, P., Jalote, P., Jalote, P.: An integrated approach to software engineering. Springer, NewYork (2005)MATH Jalote, P., Jalote, P., Jalote, P.: An integrated approach to software engineering. Springer, NewYork (2005)MATH
23.
go back to reference Wilhelm, R., Engblom, J., Ermedahl, A.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., vol. V, pp. 1–47 (2008) Wilhelm, R., Engblom, J., Ermedahl, A.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., vol. V, pp. 1–47 (2008)
24.
25.
go back to reference Zhao, Z.: A virus detection scheme based on features of Control Flow Graph. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), pp. 943–947 (2011) Zhao, Z.: A virus detection scheme based on features of Control Flow Graph. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), pp. 943–947 (2011)
26.
go back to reference Mitchell, T.M.: Machine learning and data mining over the past. vol. 42, no. 11 (1999) Mitchell, T.M.: Machine learning and data mining over the past. vol. 42, no. 11 (1999)
27.
go back to reference Breiman, L.: Bagging predictors. Mach. Learn. 140, 123–140 (1996)MATH Breiman, L.: Bagging predictors. Mach. Learn. 140, 123–140 (1996)MATH
28.
go back to reference Breu, F., Guggenbichler, S., Wollmann, J.: Random forests. Vasa, pp. 1–35 (2008) Breu, F., Guggenbichler, S., Wollmann, J.: Random forests. Vasa, pp. 1–35 (2008)
29.
go back to reference Tesauro, G.J., Kephart, J.O., Sorkin, G.B.: Neural networks for computer virus recognition. IEEE Expert 11(4), 5–6 (1996) Tesauro, G.J., Kephart, J.O., Sorkin, G.B.: Neural networks for computer virus recognition. IEEE Expert 11(4), 5–6 (1996)
30.
go back to reference Arnold, W., Tesauro, G., Heights, Y.: Automatically generated Win32 heuristic virus detection. In: Proc. 2000 Int. virus Bull. Conf., no. September, pp. 51–60 (2000) Arnold, W., Tesauro, G., Heights, Y.: Automatically generated Win32 heuristic virus detection. In: Proc. 2000 Int. virus Bull. Conf., no. September, pp. 51–60 (2000)
31.
go back to reference Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004) Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
32.
go back to reference Ye, Y., Li, T., Huang, K., Jiang, Q., Chen, Y.: Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. J. Intell. Inf. 35(1), 1–20 (2010)CrossRef Ye, Y., Li, T., Huang, K., Jiang, Q., Chen, Y.: Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. J. Intell. Inf. 35(1), 1–20 (2010)CrossRef
33.
go back to reference Peng, H., Long, F., Ding C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005) Peng, H., Long, F., Ding C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
34.
go back to reference Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on OpCode patterns. Secur. Inf. 1(1), 1 (2012)CrossRef Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on OpCode patterns. Secur. Inf. 1(1), 1 (2012)CrossRef
35.
go back to reference Runwal, N., Low, R.M., Stamp, M.: Opcode graph similarity and metamorphic detection. J. Comput. Virol. 8(1–2), 37–52 (2012)CrossRef Runwal, N., Low, R.M., Stamp, M.: Opcode graph similarity and metamorphic detection. J. Comput. Virol. 8(1–2), 37–52 (2012)CrossRef
36.
go back to reference Shanmugam, G., Low, R.M., Stamp M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hack. Tech. 9(3), 159–170 (2013) Shanmugam, G., Low, R.M., Stamp M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hack. Tech. 9(3), 159–170 (2013)
37.
go back to reference Toderici, A.H., Stamp, M.: Chi-squared distance and metamorphic virus detection. J. Comput. Virol. Hacking Tech. 9(1), 1–14 (2012) Toderici, A.H., Stamp, M.: Chi-squared distance and metamorphic virus detection. J. Comput. Virol. Hacking Tech. 9(1), 1–14 (2012)
38.
go back to reference Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3(2), 65–74 (2007)CrossRef Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3(2), 65–74 (2007)CrossRef
39.
go back to reference Dhavare, A., Low, R.M., Stamp, M.: Efficient cryptanalysis of homophonic substitution ciphers. Cryptologia 37(3), 250–281 (2013)CrossRef Dhavare, A., Low, R.M., Stamp, M.: Efficient cryptanalysis of homophonic substitution ciphers. Cryptologia 37(3), 250–281 (2013)CrossRef
40.
go back to reference Jidigam, R.K., Austin, T.H., Stamp, M.: Singular value decomposition and metamorphic detection. J. Comput. Virol. Hack. Tech. 11(4), 203–216 (2015) Jidigam, R.K., Austin, T.H., Stamp, M.: Singular value decomposition and metamorphic detection. J. Comput. Virol. Hack. Tech. 11(4), 203–216 (2015)
41.
go back to reference Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)CrossRef Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)CrossRef
42.
go back to reference Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. Wiley (2001) Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. Wiley (2001)
43.
go back to reference Riesen, K., Bunke, H.: Graph classification and clustering based on vector space embedding. World Scientific, Singapore (2010) Riesen, K., Bunke, H.: Graph classification and clustering based on vector space embedding. World Scientific, Singapore (2010)
44.
go back to reference Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Brain Cogn. 52, 262 (2007) Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Brain Cogn. 52, 262 (2007)
45.
go back to reference Chung, F.R.K.: Spectral Graph Theory, vol. 30. AMS Bookstore (1999) Chung, F.R.K.: Spectral Graph Theory, vol. 30. AMS Bookstore (1999)
46.
go back to reference Hancock, E.R.: Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1120–1136 (2001)CrossRef Hancock, E.R.: Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1120–1136 (2001)CrossRef
47.
go back to reference Wilson, R.C., Hancock, E.R.: Levenshtein distance for graph spectral features. In: ICPR (2), no. C, pp. 489–492 (2004) Wilson, R.C., Hancock, E.R.: Levenshtein distance for graph spectral features. In: ICPR (2), no. C, pp. 489–492 (2004)
48.
go back to reference Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding. Pattern Recognit. 40(3), 1042–1056 (2007)CrossRefMATH Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding. Pattern Recognit. 40(3), 1042–1056 (2007)CrossRefMATH
49.
go back to reference Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. Pattern Anal. Mach. Intell. IEEE 10(5), 695–703 (1988)CrossRefMATH Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. Pattern Anal. Mach. Intell. IEEE 10(5), 695–703 (1988)CrossRefMATH
50.
go back to reference Luo, B., Wilson, R.C., Hancock, E.R., Wilson, R.C.: Spectral embedding of graphs. Pattern Recognit. 36(10), 2213–2230 (2003)CrossRefMATH Luo, B., Wilson, R.C., Hancock, E.R., Wilson, R.C.: Spectral embedding of graphs. Pattern Recognit. 36(10), 2213–2230 (2003)CrossRefMATH
51.
go back to reference Lin, F., Cohen, W.W.: Power iteration clustering. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 655–662 (2010) Lin, F., Cohen, W.W.: Power iteration clustering. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 655–662 (2010)
52.
go back to reference Harrington, P.: Machine Learning in Action, vol. 37, no. 3 (2012) Harrington, P.: Machine Learning in Action, vol. 37, no. 3 (2012)
53.
go back to reference Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013) Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013)
54.
go back to reference Kohavi, R.: A study of cross-validation and bootstrap for accuracy esti-mation and model selection. In: Proceedings of the 1995 International Joint Conference on Artificial Intelligence, vol. 14, no. 2, pp. 1137–1145 (1995) Kohavi, R.: A study of cross-validation and bootstrap for accuracy esti-mation and model selection. In: Proceedings of the 1995 International Joint Conference on Artificial Intelligence, vol. 14, no. 2, pp. 1137–1145 (1995)
Metadata
Title
Graph embedding as a new approach for unknown malware detection
Authors
Hashem Hashemi
Amin Azmoodeh
Ali Hamzeh
Sattar Hashemi
Publication date
19-05-2016
Publisher
Springer Paris
DOI
https://doi.org/10.1007/s11416-016-0278-y

Other articles of this Issue 3/2017

Journal of Computer Virology and Hacking Techniques 3/2017 Go to the issue

Premium Partner