Network malware classification comparison using DPI and flow packet headers

  • Original Paper
  • Journal of Computer Virology and Hacking Techniques

Abstract

To counter cyber-attacks and digital threats, security experts must generate, share, and exploit cyber-threat intelligence derived from malware. In this research, we address the problem of fingerprinting the maliciousness of traffic for the purposes of detection and classification. We fingerprint maliciousness using two approaches: Deep Packet Inspection (DPI) and IP packet header classification. To this end, we take malicious traffic generated by dynamic malware analysis as the ground truth for traffic maliciousness. Under this assumption, we show how the two approaches are used to detect maliciousness and attribute it to different threats. We study the strengths and weaknesses of DPI and IP packet header classification, and we evaluate each approach on its detection and attribution accuracy as well as its level of complexity. Both approaches show promising detection results; they are good candidates to be combined to build or corroborate detection systems, with respect to both run-time speed and classification precision.

References

  1. Abdelnour, A.F., Selesnick, I.W.: Nearly symmetric orthogonal wavelet bases. In: Proceedings of the IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP) (2001a)

  2. Aggarwal, C.C., Gates, S.C., Yu, P.S.: On the merits of building categorization systems by supervised clustering. In: KDD, KDD’99, pp. 352–356. ACM, New York, NY (1999)

  3. Alshammari, R.A., Zincir-Heywood, A.N.: Investigating two different approaches for encrypted traffic classification. In: Proceedings of the Sixth Annual Conference on Privacy, Security and Trust (PST’08), pp. 156–166. IEEE Computer Society, Washington, DC (2008)

  4. Alshammari, R.A., Zincir-Heywood, A.N.: Machine learning based encrypted traffic classification: Identifying SSH and Skype. In: Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA 2009), pp. 1–8. IEEE (2009)

  5. Alshammari, R.A.: Automatically generating robust signatures using a machine learning approach to unveil encrypted VOIP traffic without using port numbers, IP addresses and payload inspection. Ph.D. thesis, Dalhousie University, Halifax, Nova Scotia (2012)

  6. Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of Internet malware. Tech. rep., University of Michigan (2007). http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-530-07.pdf

  7. Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: NDSS, vol. 9 (2009)

  8. Binkley, J.R., Singh, S.: An algorithm for anomaly-based botnet detection. In: Proceedings of the 2nd conference on Steps to Reducing Unwanted Traffic on the Internet, no. 2 in SRUTI, pp. 1–7. USENIX Association, Berkeley, CA (2006)

  9. Bloedorn, E., Christiansen, A.D., Hill, W., Skorupka, C., Talbot, L.M., Tivel, J.: Data mining for network intrusion detection: How to get started. Tech. rep., The MITRE Corporation (2001). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.8556&rep=rep1&type=pdf

  10. Boggs, N., Hiremagalore, S., Stavrou, A., Stolfo, S.J.: Cross-domain collaborative anomaly detection: so far yet so close. In: Recent Advances in Intrusion Detection, pp. 142–160. Springer, Berlin (2011)

  11. Boukhtouta, A., Lakhdari, N.E., Mokhov, S.A., Debbabi, M.: Towards fingerprinting malicious traffic. In: Proceedings of ANT’13, vol. 19, pp. 548–555. Elsevier, Amsterdam (2013). doi:10.1016/j.procs.2013.06.073

  12. Bozorgi, M., Saul, L.K., Savage, S., Voelker, G.M.: Beyond heuristics: Learning to classify vulnerabilities and predict exploits. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’10, pp. 105–114. ACM, New York, NY (2010). doi:10.1145/1835804.1835821

  13. Chang, S., Daniels, T.E.: P2P botnet detection using behavior clustering & statistical tests. In: Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence. AISec, pp. 23–30. ACM, New York, NY (2009)

  14. CrySyS Lab: sKyWIper (a.k.a. Flame a.k.a. Flamer): A complex malware for targeted attacks. Tech. rep., Budapest University of Technology and Economics: Department of Telecommunications, Budapest, Hungary (2012). http://www.crysys.hu/skywiper/skywiper.pdf

  15. Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information theoretic feature clustering algorithm for text classification. J. Mach. Learn. Res. 3, 1265–1287 (2003)

  16. Dietrich, C.J., Rossow, C., Pohlmann, N.: CoCoSpot: clustering and recognizing botnet command and control channels using traffic analysis. Comput. Netw. 57(2), 475–486 (2013)

  17. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2012)

  18. Fan, W., Miller, M., Stolfo, S., Lee, W., Chan, P.: Using artificial anomalies to detect unknown and known network intrusions. In: Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), pp. 123–130 (2001). doi:10.1109/ICDM.2001.989509

  19. Frank, E.: J48. [online] (2012). http://weka.sourceforge.net/doc.dev/weka/classifiers/trees/J48.html

  20. Frank, E., Legg, S., Inglis, S.: Class SMO. [online] (2012). http://weka.sourceforge.net/doc/weka/classifiers/functions/SMO.html

  21. Freund, Y.: Boosting a weak learning algorithm by majority. Inf. Comput. 121(2), 256–285 (1995)

  22. Golovko, V., Bezobrazov, S., Kachurka, P., Vaitsekhovich, L.: Neural network and artificial immune systems for malware and network intrusion detection. In: Advances in Machine Learning II, pp. 485–513. Springer, Berlin (2010)

  23. Gu, G., Porras, P., Yegneswaran, V., Fong, M., Lee, W.: BotHunter: detecting malware infection through IDS-driven dialog correlation. In: Proceedings of 16th USENIX Security Symposium, SS, pp. 1–16. USENIX Association, Berkeley, CA (2007)

  24. Gu, G., Zhang, J., Lee, W.: BotSniffer: Detecting botnet command and control channels in network traffic. In: Proceedings of the Network and Distributed System Security Symposium, NDSS. The Internet Society (2008)

  25. Gu, G., Perdisci, R., Zhang, J., Lee, W.: BotMiner: clustering analysis of network traffic for protocol- and structure-independent botnet detection. In: Proceedings of the 17th Security Symposium, SS, pp. 139–154. USENIX Association, Berkeley, CA (2008)

  26. Han, B.: Towards a multi-tier runtime system for GIPSY. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal (2010)

  27. Han, J.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA (2005)

  28. Hearst, M.A., Dumais, S., Osuna, E., Platt, J., Schölkopf, B.: Support vector machines. IEEE Intelligent Systems and Their Applications 13(4), 18–28 (1998)

  29. Hu, X., Shin, K.G., Bhatkar, S., Griffin, K.: MutantX-S: Scalable malware clustering based on static features. In: USENIX Annual Technical Conference, pp. 187–198 (2013)

  30. Ji, Y.: Scalability evaluation of the GIPSY runtime system. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal (2011). http://spectrum.library.concordia.ca/7152/

  31. Karasaridis, A., Rexroad, B., Hoeflin, D.: Wide-scale botnet detection and characterization. In: Proceedings of the First Workshop on Hot Topics in Understanding Botnets, HotBots, pp. 1–7. USENIX Association, Berkeley, CA (2007)

  32. Karypis Lab: Data clustering software. [online] (2006–2014). http://glaros.dtc.umn.edu/gkhome/views/cluto

  33. Katz, G., Shabtai, A., Rokach, L., Ofek, N.: ConfDTree: Improving decision trees using confidence intervals. In: 12th IEEE International Conference on Data Mining (ICDM), pp. 339–348 (2012)

  34. Kheir, N., Blanc, G., Debar, H., Garcia-Alfaro, J., Yang, D.: Automated classification of C&C connections through malware URL clustering. In: ICT Systems Security and Privacy Protection, pp. 252–266. Springer, Berlin (2015)

  35. Kirat, D., Nataraj, L., Vigna, G., Manjunath, B.S.: SigMal: a static signal processing based malware triage. In: ACSAC’13. ACM, New York, NY (2013). doi:10.1145/2523649.2523682

  36. Kokare, M., Biswas, P.K., Chatterji, B.N.: Texture image retrieval using new rotated complex wavelet filters. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 35(6), 1168–1178 (2005)

  37. Kokare, M., Biswas, P.K., Chatterji, B.N.: Rotation-invariant texture image retrieval using rotated complex wavelet filters. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 36(6), 1273–1282 (2006)

  38. Kremenek, T., Engler, D.: Z-ranking: Using statistical analysis to counter the impact of static analysis approximations. In: SAS 2003 (2003)

  39. Kremenek, T., Ashcraft, K., Yang, J., Engler, D.: Correlation exploitation in error ranking. In: Foundations of Software Engineering (FSE) (2004)

  40. Kremenek, T., Twohey, P., Back, G., Ng, A., Engler, D.: From uncertainty to belief: inferring the specification within. In: Proceedings of the 7th Symposium on Operating System Design and Implementation (2006)

  41. Larsen, B., Aone, C.: Fast and effective text mining using linear-time document clustering. In: KDD, KDD’99, pp. 16–22. ACM, New York, NY (1999)

  42. Lee, W.: Applying data mining to intrusion detection: the quest for automation, efficiency, and credibility. ACM SIGKDD Explorations Newsletter 4(2), 35–42 (2001)

  43. Lee, W., Stolfo, S.J., Mok, K.W.: Adaptive intrusion detection: a data mining approach. Artificial Intelligence Review 14, 533–567 (2000). doi:10.1023/A:1006624031083

  44. Li, R., Xi, O.J., Pang, B., Shen, J., Ren, C.L.: Network application identification based on wavelet transform and k-means algorithm. In: Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS2009), vol. 1, pp. 38–41 (2009). doi:10.1109/ICICISYS.2009.5357939

  45. Li, W., Canini, M., Moore, A.W., Bolla, R.: Efficient application identification and the temporal and spatial stability of classification schema. Comput. Netw. 53, 790–809 (2009)

  46. Limthong, K., Kensuke, F., Watanapongse, P.: Wavelet-based unwanted traffic time series analysis. In: 2008 International Conference on Computer and Electrical Engineering, pp. 445–449. IEEE Computer Society, Washington, DC (2008). doi:10.1109/ICCEE.2008.106

  47. Livadas, C., Walsh, R., Lapsley, D.E., Strayer, W.T.: Using machine learning techniques to identify botnet traffic. In: LCN, pp. 967–974. IEEE Computer Society, Washington, DC (2006)

  48. Locasto, M.E., Parekh, J.J., Stolfo, S., Misra, V.: Collaborative distributed intrusion detection. Tech. Rep. CUCS-012-04 (2004). http://hdl.handle.net/10022/AC:P:29215

  49. Locasto, M.E., Parekh, J.J., Keromytis, A.D., Stolfo, S.J.: Towards collaborative security and P2P intrusion detection. In: Proceedings of the Information Assurance Workshop (IAW’05), from the Sixth Annual IEEE SMC, pp. 333–339. IEEE (2005)

  50. Manning, C.D., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA (2002)

  51. MathWorks: MATLAB. [online] (2000–2012). http://www.mathworks.com/products/matlab/

  52. MathWorks: MATLAB Coder. [online] (2012). http://www.mathworks.com/help/toolbox/coder/coder_product_page.html, last viewed June 2012

  53. MathWorks: MATLAB Coder: codegen—generate C/C++ code from MATLAB code. [online] (2012). http://www.mathworks.com/help/toolbox/coder/ref/codegen.html, last viewed June 2012

  54. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions, vol. 382. Wiley, New York (2007)

  55. Mokhov, S.A.: Study of best algorithm combinations for speech processing tasks in machine learning using median vs. mean clusters in MARF. In: Desai, B.C. (ed.) Proceedings of C3S2E’08, pp. 29–43. ACM, Montreal, Quebec (2008). doi:10.1145/1370256.1370262

  56. Mokhov, S.A.: MARFCAT—MARF-based Code Analysis Tool. Published electronically within the MARF project. http://sourceforge.net/projects/marf/files/Applications/MARFCAT/ (2010–2015). Last viewed February 2014

  57. Mokhov, S.A.: The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT. Tech. Rep. NIST SP 500–283, NIST (2011). Report: http://www.nist.gov/manuscript-publication-search.cfm?pub_id=909407, online e-print at http://arxiv.org/abs/1010.2511

  58. Mokhov, S.A.: Intensional cyberforensics. Ph.D. thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal (2013). arXiv:1312.0466

  59. Mokhov, S.A., Debbabi, M.: File type analysis using signal processing techniques and machine learning vs. file unix utility for forensic analysis. In: O. Göbel, S. Frings, D. Günther, J. Nedon, D. Schadt (eds.) Proceedings of the IT Incident Management and IT Forensics (IMF’08), LNI, vol. 140, pp. 73–85. GI (2008)

  60. Mokhov, S.A., Paquet, J., Debbabi, M.: Formally specifying operational semantics and language constructs of Forensic Lucid. In: O. Göbel, S. Frings, D. Günther, J. Nedon, D. Schadt (eds.) Proceedings of the IT Incident Management and IT Forensics (IMF’08), LNI, vol. 140, pp. 197–216. GI (2008). Online at http://subs.emis.de/LNI/Proceedings/Proceedings140/gi-proc-140-014.pdf

  61. Mokhov, S.A., Paquet, J., Debbabi, M.: Towards automatic deduction and event reconstruction using Forensic Lucid and probabilities to encode the IDS evidence. In: S. Jha, R. Sommer, C. Kreibich (eds.) Proceedings of Recent Advances in Intrusion Detection RAID’10, Lecture Notes in Computer Science (LNCS), vol. 6307, pp. 508–509. Springer, Berlin (2010). doi:10.1007/978-3-642-15512-3_36

  62. Mokhov, S.A., Paquet, J., Debbabi, M.: The use of NLP techniques in static code analysis to detect weaknesses and vulnerabilities. In: M. Sokolova, P. van Beek (eds.) Proceedings of Canadian Conference on AI’14, LNAI, vol. 8436, pp. 326–332. Springer, Berlin (2014). doi:10.1007/978-3-319-06483-3_33. Short paper

  63. Mokhov, S.A., Paquet, J., Debbabi, M.: MARFCAT: Fast code analysis for defects and vulnerabilities. In: Proceedings of SWAN’15, pp. 35–38. IEEE (2015) (to appear)

  64. Motorola: Efficient polyphase FIR resampler for numpy: Native C/C++ implementation of the function upfirdn(). [online] (2009). http://code.google.com/p/upfirdn/source/browse/upfirdn

  65. Murphy, K.P.: HMM toolbox. [online] (2002–2014). http://www.cs.ubc.ca/murphyk/Software/HMM/hmm_download.html

  66. Nari, S., Ghorbani, A.A.: Automated malware classification based on network behavior. In: Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), pp. 642–647. IEEE (2013)

  67. Noh, S.K., Oh, J.H., Lee, J.S., Noh, B.N., Jeong, H.C.: Detecting p2p botnets using a multi-phased flow model. In: International Conference on Digital Society, ICDS, pp. 247–253. IEEE Computer Society, Washington, DC (2009)

  68. Okada, Y., Ata, S., Nakamura, N., Nakahira, Y., Oka, I.: Comparisons of machine learning algorithms for application identification of encrypted traffic. In: Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 2, pp. 358–361 (2011)

  69. Okun, V., Delaitre, A., Black, P.E., NIST SAMATE: Static Analysis Tool Exposition (SATE) 2010. [online] (2010). http://samate.nist.gov/SATE2010Workshop.html

  70. Ouchani, S., Ait’Mohamed, O., Debbabi, M.: A non-convex classifier support for abstraction-refinement framework. In: 24th International Conference on Microelectronics (ICM), pp. 1–4 (2012)

  71. Paquet, J.: Distributed eductive execution of hybrid intensional programs. In: Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference (COMPSAC’09), pp. 218–224. IEEE Computer Society, Washington, DC (2009). doi:10.1109/COMPSAC.2009.137

  72. Paxson, V.: Bro: a system for detecting network intruders in real-time. Comput. Netw. 31(23–24), 2435–2463 (1999). http://www.icir.org/vern/papers/bro-CN99.pdf

  73. Peng, Y., Kou, G., Sabatka, A., Chen, Z., Khazanchi, D., Shi, Y.: Application of clustering methods to health insurance fraud detection. In: Proceedings of the 2006 International Conference on Service Systems and Service Management, vol. 1, pp. 116–120 (2006)

  74. Perdisci, R., Ariu, D., Fogla, P., Giacinto, G., Lee, W.: McPAD: a multiple classifier system for accurate payload-based anomaly detection. Comput. Netw. 53(6), 864–881 (2009)

  75. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA (1993)

  76. Rahimian, A., Ziarati, R., Preda, S., Debbabi, M.: On the reverse engineering of the Citadel botnet. In: Foundations and Practice of Security. Lecture Notes in Computer Science, pp. 408–425. Springer, Berlin (2014)

  77. Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 108–125. Springer, Berlin (2008)

  78. Rodríguez, L.J., Torres, I.: Comparative study of the Baum-Welch and Viterbi training algorithms applied to read and spontaneous speech recognition. In: Pattern Recognition and Image Analysis. Lecture Notes in Computer Science, vol. 2652, pp. 847–857. Springer, Berlin (2003)

  79. Rossow, C., Dietrich, C.J., Bos, H., Cavallaro, L., Van Steen, M., Freiling, F.C., Pohlmann, N.: Sandnet: network traffic analysis of malicious software. In: Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, pp. 78–88. ACM, New York (2011)

  80. Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Boston, MA (1989)

  81. Schreiber, R.: MATLAB. Scholarpedia 2(6), 2929 (2007). doi:10.4249/scholarpedia.2929. http://www.scholarpedia.org/article/MATLAB

  82. Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of IEEE Symposium on Security and Privacy, pp. 38–49. Oakland (2001)

  83. Selesnick, I., Cai, S., Li, K., Sendur, L., Abdelnour, A.F.: MATLAB implementation of wavelet transforms. Tech. rep., Electrical Engineering, Polytechnic University, Brooklyn, NY (2003). http://taco.poly.edu/WaveletSoftware/

  84. Simon, G.J., Xiong, H., Eilertson, E., Kumar, V.: Scan detection: a data mining approach. In: Proceedings of SDM 2006, pp. 118–129. SIAM, Philadelphia, PA (2006). http://www.siam.org/meetings/sdm06/proceedings/011simong.pdf

  85. Sly Technologies Inc: jNetPcap OpenSource. [online] (2012). http://www.jnetpcap.com/

  86. Song, D.: BitBlaze: Security via binary analysis. [online] (2010). http://bitblaze.cs.berkeley.edu

  87. Song, D.: WebBlaze: New techniques and tools for web security. [online] (2010). http://webblaze.cs.berkeley.edu

  88. Song, Y., Keromytis, A.D., Stolfo, S.: Spectrogram: a mixture-of-markov-chains model for anomaly detection in web traffic. In: Proceedings of the Network and Distributed System Security Symposium, pp. 121–135. Internet Society (2009)

  89. Sourcefire: Snort: open-source network intrusion prevention and detection system (IDS/IPS). [online] (1999–2015). http://www.snort.org/

  90. Stolfo, S.J., Lee, W., Chan, P.K., Fan, W., Eskin, E.: Data mining-based intrusion detectors: an overview of the Columbia IDS Project. ACM SIGMOD Record 30(4), 5–14 (2001)

  91. Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National Conference on Artificial Intelligence, AAAI’06, vol. 1, pp. 500–505. AAAI Press (2006)

  92. Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: BotFinder: finding bots in network traffic without deep packet inspection. In: Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, CoNEXT, pp. 349–360. ACM, New York, NY (2012)

  93. The Weka Project: Weka 3: data mining with open source machine learning software in Java. [online] (2006–2014). http://www.cs.waikato.ac.nz/ml/weka/

  94. Thorat, S.A., Khandelwal, A.K., Bruhadeshwar, B., Kishore, K.: Payload content based network anomaly detection. In: Proceedings of the First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2008), pp. 127–132. IEEE (2008)

  95. ThreatTrack Security: ThreatAnalyzer: dynamic sandboxing and malware analysis (formerly GFI SandBox). [online] (2013). http://www.threattracksecurity.com/enterprise-security/sandbox-software.aspx

  96. Trinius, P., Willems, C., Holz, T., Rieck, K.: A malware instruction set for behavior-based analysis (2011)

  97. Vassev, E.I.: General architecture for demand migration in the GIPSY demand-driven execution engine. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal (2005). http://spectrum.library.concordia.ca/8681/

  98. Wang, K., Stolfo, S.J.: Anomalous payload-based network intrusion detection. In: Recent Advances in Intrusion Detection, pp. 203–222. Springer, Berlin (2004)

  99. Whalen, S., Boggs, N., Stolfo, S.J.: Model aggregation for distributed content anomaly detection. In: Proceedings of the 2014 Workshop on Artificial Intelligence and Security, pp. 61–71. ACM, New York (2014)

  100. Wicherski, G.: pehash: a novel approach to fast malware clustering. In: 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET) (2009)

  101. Wireless and Secure Networks Research Lab: WISNET: downloads. [online] (2009–2014). http://wisnet.seecs.nust.edu.pk/downloads.php

  102. Wu, M.D., Wolthusen, S.D.: Network forensics of partial SSL/TLS encrypted traffic classification using clustering algorithms. In: O. Göbel, S. Frings, D. Günther, J. Nedon, D. Schadt (eds.) Proceedings of the IT Incident Management and IT Forensics (IMF’08), LNI, vol. 140, pp. 157–172 (2008)

  103. Yen, T.F., Reiter, M.K.: Traffic aggregation for malware detection. In: Proceedings of the 5th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, pp. 207–227. Springer, Berlin (2008)

  104. Zanero, S.: Analyzing TCP traffic patterns using self organizing maps. In: Image Analysis and Processing (ICIAP 2005), pp. 83–90. Springer, Berlin (2005)

  105. Zanero, S., Savaresi, S.M.: Unsupervised learning techniques for an intrusion detection system. In: Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 412–419. ACM, New York (2004)

  106. Zanero, S., Serazzi, G.: Unsupervised learning algorithms for intrusion detection. In: Network Operations and Management Symposium (NOMS 2008), pp. 1043–1048. IEEE (2008)

  107. Zetter, K.: Meet ‘Flame’, The Massive Spy Malware Infiltrating Iranian Computers. WIRED (2012). http://www.wired.com/threatlevel/2012/05/flame/

  108. Zhang, D., Liu, D., Csallner, C., Kung, D., Lei, Y.: A distributed framework for demand-driven software vulnerability detection. J. Syst. Softw. 87, 60–73 (2014). doi:10.1016/j.jss.2013.08.033

  109. Zhao, Y., Karypis, G.: Criterion functions for document clustering: experiments and analysis. Tech. rep., University of Minnesota (2002)

  110. Zhao, Y., Karypis, G., Fayyad, U.: Hierarchical clustering algorithms for document datasets. Data Min. Knowl. Discov. 10(2), 141–168 (2005)

  111. Zhong, S., Ghosh, J.: Generative model-based document clustering: a comparative study. Knowl. Inf. Syst. 8(3), 374–384 (2005)

Author information

Corresponding author

Correspondence to Amine Boukhtouta.

Appendices

MARFPCAT Algorithms and results

Hereafter, we list some MARFPCAT results obtained with Algorithm 1 and Algorithm 2. The results are based on whole-packet examination (i.e., headers and payload); they illustrate the precision of each algorithm combination as well as the attribution for the most precisely classified malware types. The methodology behind them is described in Section 5, and the results are discussed in Section 6.2. The algorithms’ options, in addition to those described in [55], are:

  • -dynaclass – derive class labels automatically from the reports (no classes are predefined at the beginning),

  • -binary – treat data as pure binary non-formatted data,

  • -nopreprep – skip extra pre-pre-processing,

  • -sdwt – use separating discrete wavelet transform, and

  • -flucid – generate Forensic Lucid expressions for subsequent forensic investigations and reasoning in an external system [58].
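
As a rough illustration of what the -binary and -sdwt options imply, the sketch below treats a packet's raw bytes as a one-dimensional signal and applies one level of a discrete wavelet transform to obtain a feature vector. This is a minimal sketch under stated assumptions, not MARFPCAT's implementation: the Haar wavelet is swapped in for simplicity (the text above does not specify the wavelet behind -sdwt), and the function names `bytes_to_signal` and `haar_dwt`, the byte-to-sample scaling, and the sample packet bytes are all hypothetical.

```python
import math


def bytes_to_signal(data: bytes) -> list[float]:
    """Map each byte to a sample in [-1.0, 1.0); no protocol parsing is done.

    Assumption: this scaling is illustrative, not the one MARFPCAT uses.
    """
    return [(b - 128) / 128.0 for b in data]


def haar_dwt(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficients. Odd-length input is
    padded with a single zero; the caller's list is not modified.
    """
    if len(signal) % 2:
        signal = signal + [0.0]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical packet bytes, made up for illustration (not from the paper's data).
packet = bytes([0x45, 0x00, 0x00, 0x3C, 0x1C, 0x46, 0x40, 0x00])

approx, detail = haar_dwt(bytes_to_signal(packet))

# Concatenate low-pass and high-pass coefficients into one feature vector.
features = approx + detail
```

Because this transform is orthonormal, the signal's energy is preserved across the approximation and detail coefficients, so feature vectors of equal length can be compared directly with a distance or similarity measure.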

Cite this article

Boukhtouta, A., Mokhov, S.A., Lakhdari, NE. et al. Network malware classification comparison using DPI and flow packet headers. J Comput Virol Hack Tech 12, 69–100 (2016). https://doi.org/10.1007/s11416-015-0247-x

  • DOI: https://doi.org/10.1007/s11416-015-0247-x
