A Method for TLS Malicious Traffic Identification Based on Machine Learning

Abstract:

As more malicious traffic is encrypted with the TLS protocol, efficiently identifying malicious TLS traffic has become an increasingly important task in network security management, needed to ensure communication security and privacy. Most traditional methods for identifying malicious TLS-encrypted traffic rely only on features common to ordinary traffic, which increases coupling among features and lowers identification accuracy. In addition, most previous work on malicious traffic identification extracts features directly from the data flow without recording the extraction process, which makes later tracing difficult. This paper therefore implements an efficient feature extraction method with structural correlation for malicious TLS-encrypted traffic. The feature extraction process is logged module by module, and an index links the related records, so that the context can be analysed and subsequent feature analysis and problem tracing are easier. Finally, Random Forest is used to identify malicious TLS traffic efficiently, with an accuracy of up to 99.38%.
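
The idea of logging feature extraction per module and linking the records through an index can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `FlowRecord` fields, the two extraction modules, the logger names, and the use of a flow identifier as the index are all assumptions made for illustration, and the classifier stage is omitted.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FlowRecord:
    """Hypothetical per-flow record; the field names are illustrative only."""
    flow_id: int
    packet_sizes: List[int]
    cipher_suites: List[int]


class FlowIndexHandler(logging.Handler):
    """Keeps log entries in memory, grouped by the flow index they refer to."""

    def __init__(self) -> None:
        super().__init__()
        self.by_flow: Dict[int, List[Tuple[str, str]]] = {}

    def emit(self, record: logging.LogRecord) -> None:
        # `flow_id` arrives through the `extra` argument of the logging call.
        entry = (record.name, record.getMessage())
        self.by_flow.setdefault(record.flow_id, []).append(entry)


def size_features(flow: FlowRecord) -> Dict[str, float]:
    """Assumed module 1: simple packet-size statistics."""
    feats = {
        "pkt_count": len(flow.packet_sizes),
        "mean_size": sum(flow.packet_sizes) / len(flow.packet_sizes),
    }
    logging.getLogger("extract.size").info(
        "computed %s", sorted(feats), extra={"flow_id": flow.flow_id}
    )
    return feats


def tls_features(flow: FlowRecord) -> Dict[str, float]:
    """Assumed module 2: a feature taken from the TLS handshake."""
    feats = {"n_cipher_suites": len(flow.cipher_suites)}
    logging.getLogger("extract.tls").info(
        "computed %s", sorted(feats), extra={"flow_id": flow.flow_id}
    )
    return feats


def extract(flow: FlowRecord) -> Dict[str, float]:
    """Run every module in order and merge the resulting features."""
    feats: Dict[str, float] = {}
    for module in (size_features, tls_features):
        feats.update(module(flow))
    return feats


# Attach the indexed handler to the parent logger of both modules.
handler = FlowIndexHandler()
parent = logging.getLogger("extract")
parent.setLevel(logging.INFO)
parent.addHandler(handler)
parent.propagate = False  # keep the sketch from also writing to the root logger

flow = FlowRecord(flow_id=7, packet_sizes=[100, 300],
                  cipher_suites=[0x1301, 0x1302, 0xC02F])
features = extract(flow)
trace = handler.by_flow[flow.flow_id]  # every log entry linked to flow 7
```

In this sketch, `trace` lists which module produced which features for a given flow, so a suspicious value can be followed back to the module that computed it. The `features` dictionary would then be passed to a classifier such as Random Forest, which is not shown here.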

Pages: 291-301

Online since: April 2021
