Published in: International Journal of Information Security 3/2017

4 May 2016 | Regular Contribution

Entropy analysis to classify unknown packing algorithms for malware detection

Authors: Munkhbayar Bat-Erdene, Hyundo Park, Hongzhe Li, Heejo Lee, Mahn-Soo Choi


Abstract

The proportion of packed malware has been growing rapidly and now comprises more than 80 % of all existing malware. In this paper, we propose a method for classifying the packing algorithms of given unknown packed executables, regardless of whether they are malware or benign programs. First, we scale the entropy values of a given executable and convert the entropy values of a particular location of memory into symbolic representations. Our proposed method uses symbolic aggregate approximation (SAX), which is known to be effective for large data conversions. Second, we classify the distribution of symbols using supervised learning classification methods, i.e., naive Bayes and support vector machines, to detect packing algorithms. The results of our experiments involving a collection of 324 packed benign programs and 326 packed malware programs with 19 packing algorithms demonstrate that our method can identify the packing algorithms of given executables with an accuracy of 95.35 %, a recall of 95.83 %, and a precision of 94.13 %. We also propose four similarity measurements for detecting packing algorithms based on SAX representations of the entropy values and an incremental aggregate analysis. Among these four metrics, the fidelity similarity measurement demonstrates the best matching result, i.e., an accuracy ranging from 95.0 to 99.9 %, which is 2 to 13 % higher than that of the other three metrics. Our study confirms that packing algorithms can be identified through an entropy analysis based on a measure of the uncertainty of the running processes and without prior knowledge of the executables.
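The pipeline outlined in the abstract (entropy values, SAX symbols, symbol distributions, similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the four-letter alphabet, and the segment count are hypothetical choices made for illustration. The `fidelity` function assumes the common definition of fidelity between two discrete distributions (the sum of the square roots of the products of matching probabilities), which the abstract itself does not spell out.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, pstdev


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def sax(series, n_segments, alphabet="abcd"):
    """Symbolic aggregate approximation of a numeric series.

    Assumes len(series) >= n_segments so that no segment is empty.
    """
    mu, sigma = mean(series), pstdev(series)
    # A constant series has no shape; map it to zeros to avoid dividing by 0.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each near-equal segment.
    n = len(z)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


def symbol_distribution(word, alphabet="abcd"):
    """Relative frequency of each alphabet symbol in a SAX word."""
    counts = Counter(word)
    return [counts[s] / len(word) for s in alphabet]


def fidelity(p, q):
    """Assumed definition: sum of sqrt(p_i * q_i); 1.0 for identical distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

In this sketch, a series of entropy values (however it was collected) is passed to `sax`, and the resulting word is summarized by `symbol_distribution`. Two such distributions can then be compared with `fidelity`, or used as feature vectors for a classifier such as naive Bayes or a support vector machine.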


Metadata
Title
Entropy analysis to classify unknown packing algorithms for malware detection
Authors
Munkhbayar Bat-Erdene
Hyundo Park
Hongzhe Li
Heejo Lee
Mahn-Soo Choi
Publication date
4 May 2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Information Security / Issue 3/2017
Print ISSN: 1615-5262
Electronic ISSN: 1615-5270
DOI
https://doi.org/10.1007/s10207-016-0330-4
