2021 | Book

Information and Communications Security

23rd International Conference, ICICS 2021, Chongqing, China, November 19-21, 2021, Proceedings, Part I

About this book

This two-volume set, LNCS 12918 and 12919, constitutes the refereed proceedings of the 23rd International Conference on Information and Communications Security, ICICS 2021, held in Chongqing, China, in November 2021. The 49 revised full papers presented in the book were carefully selected from 182 submissions. The papers in Part I are organized in the following thematic blocks: blockchain and federated learning; malware analysis and detection; IoT security; software security; Internet security; data-driven cybersecurity.

Table of Contents

Frontmatter

Blockchain and Federated Learning

Frontmatter
The Golden Snitch: A Byzantine Fault Tolerant Protocol with Activity
Abstract
The increasing popularity of blockchain-based cryptocurrencies has revitalized the search for efficient Byzantine fault-tolerant (BFT) protocols. Many existing BFT protocols achieve good performance in fault-free cases but suffer severe performance degradation when faults occur. This is also a problem with DiemBFT. To mitigate performance attacks in DiemBFT, we present an improved BFT protocol with optimal liveness called the Golden Snitch. The core idea is to introduce unbiased randomness in leader selection and improve the voting mechanism to protect honest leaders from being dragged down by the previous leader. The performance of the Golden Snitch is evaluated through experiments, which show that it outperforms DiemBFT in the presence of faults.
Huimei Liao, Haixia Xu, Peili Li
Rectifying Administrated ERC20 Tokens
Abstract
ERC20 token is the most popular type of Ethereum smart contract. The daily transaction volume of these tokens exceeds 100 billion dollars, which agitates the popular notions of “decentralized banking” and “tokenized economy”. Yet, it is a common misconception to assume that the decentralization of blockchain entails the decentralization of smart contracts deployed on this blockchain. In practice, the developers of smart contracts implement administrating patterns, such as censoring certain users, creating or destroying balances on demand, destroying smart contracts, or injecting arbitrary code. These routines, which are designed to tightly control the operation of these smart contracts, turn an ERC20 token into an administrated token—the type of Ethereum smart contract that we scrutinize in this research.
We discover that many smart contracts are administrated, which means that their owners solely possess an omnipotent power over these contracts. Moreover, the owners of these tokens carry lesser social and legal responsibilities compared to the traditional centralized actors that those tokens intend to disrupt. This entails two major problems: a) the owners of the tokens have the ability to quickly steal all the funds and disappear from the market; and b) if the private key of the owner’s account is stolen, all the assets might immediately turn into the property of the attacker. Therefore, the administrated ERC20 tokens are not only dissimilar to the traditional centralized asset management tools, such as banks, but they are also more vulnerable to adversarial actions by their owners or attackers. We develop a pattern recognition framework based on 9 syntactic features characterizing administrated ERC20 tokens, which we use to analyze existing smart contracts deployed on Ethereum Mainnet. Our analysis of 84,062 unique Ethereum smart contracts reveals that nearly 58% of them are administrated ERC20 tokens, which accounts for almost 90% of all ERC20 tokens deployed on Ethereum.
To protect users from the frivolousness of unregulated token owners without depriving the ability of these owners to properly manage their tokens, we introduce SafelyAdministrated—a library that enforces a responsible ownership and management of ERC20 tokens. The library introduces three mechanisms: deferred maintenance, board of trustees and safe pause. We implement and test SafelyAdministrated in the form of Solidity abstract contract, which is ready to be used by the next generation of safely administrated ERC20 tokens.
Nikolay Ivanov, Hanqing Guo, Qiben Yan
Moat: Model Agnostic Defense against Targeted Poisoning Attacks in Federated Learning
Abstract
Federated learning has migrated data-driven learning to a model-centric approach. As the server does not have access to the data, the health of the data poses a concern. Malicious participants inject malevolent gradient updates to corrupt the model. They do not impose overall ill behavior. Instead, they target a few classes or patterns to misbehave. Label Flipping and Backdoor attacks belong to targeted poisoning attacks, which perform adversarial manipulation for targeted misclassification. The state-of-the-art defenses based on statistical similarity or autoencoder credit scores degrade as the number of attackers grows or when backdoor noise is injected ingeniously. This paper proposes a universal model-agnostic defense technique (Moat) to mitigate different poisoning attacks in Federated Learning. It uses interpretation techniques to measure the marginal contribution of individual features. The aggregation of interpreted values for important features against a baseline input detects the presence of an adversary. The proposed solution scales in terms of attackers and is also robust against adversarial noise in either homogeneous or heterogeneous distribution. The most appealing aspect of Moat is that it achieves model convergence even in the presence of 90% attackers. We ran experiments for different combinations of settings, models, and datasets to verify our claim. The proposed technique is compared with the existing state-of-the-art algorithms and is shown to outperform them.
Arpan Manna, Harsh Kasyap, Somanath Tripathy

Malware Analysis and Detection

Frontmatter
Certified Malware in South Korea: A Localized Study of Breaches of Trust in Code-Signing PKI Ecosystem
Abstract
Code-signing PKI ecosystems are vulnerable to abusers. Kim et al. reported such abuse cases, e.g., malware authors misused the stolen private keys of reputable code-signing certificates to sign their malicious programs. This certified malware exploits the chain of trust established in the ecosystem and helps an adversary readily bypass security mechanisms such as anti-virus engines. Prior work analyzed a large corpus of certificates collected from the wild to characterize the security problems. However, this practice was typically performed from a global perspective and often left behind the issues that could happen at a local level. Our work revisits the investigations conducted by previous studies from a local perspective. In particular, we focus on code-signing certificates issued to South Korean companies. South Korea employs the code-signing PKI ecosystem with its own regional adaptations; thus, it is a perfect candidate for a comparison. To begin with, we build a data collection pipeline and collect 455 certificates that were issued to South Korean companies and are potentially misused. We analyze those certificates along three dimensions: (i) abusers, (ii) issuers, and (iii) the life-cycle of the certificate. We first identify that the strong regulation of a government can affect the market share of CAs. We also observe several problems in certificate revocation: (i) certificates issued by local companies that have closed their code-signing business still exist, (ii) only 6.8% of the abused certificates are revoked, and (iii) eight certificates are not revoked properly. All of those could extend the validity of certified malware in the wild. Moreover, we show that the number of abuse cases is high in South Korea, even though it has a small population. Our study implies that Korean security practitioners need to pay immediate attention to code-signing PKI abuse cases to safeguard the entire ecosystem.
Bumjun Kwon, Sanghyun Hong, Yuseok Jeon, Doowon Kim
GAN-Based Adversarial Patch for Malware C2 Traffic to Bypass DL Detector
Abstract
Constantly evolving malware brings great challenges to network security defense. Fortunately, deep learning (DL)-based systems have achieved good performance in the malware command and control (C2) traffic detection field due to their excellent representation capabilities. However, DL models have been shown to be vulnerable to evasion attacks, that is, DL models can easily be misled by adding subtle perturbations to the original samples. In this paper, we propose a GAN-based evasion method, which can help malware C2 traffic bypass the DL detector. Our main contributions are: (1) directly generating adversarial traffic that can implement malicious functions by inserting additional adversarial patches in the original flow; (2) adaptively imitating the victim’s normal traffic by training the GAN in the victim environment, and introducing transfer learning to reduce the additional victim resource usage caused by GAN training. Results show that the adversarial patch generated by the GAN can prevent malware C2 traffic from being detected with a 51.4% success rate. The higher time efficiency and smaller malware impact make our method more suitable for real attacks.
Junnan Wang, Qixu Liu, Chaoge Liu, Jie Yin
Analyzing the Security of OTP 2FA in the Face of Malicious Terminals
Abstract
One Time Password (OTP) is the most prevalent 2FA method among users and service providers worldwide. It is imperative to assess this 2FA scheme’s security from multiple perspectives, considering its ubiquitous presence in the user’s day-to-day activities. In this work, we assess the security of seven commercially deployed OTP-2FA schemes against malware in the terminal attack model without compromising any 2FA device or authentication services. To implement this attack scenario, we develop a combination of attack modules that capture the password and OTP in different ways during the user’s login attempt. At the same time, the attack originates a fresh concurrent hidden session, from within the terminal or remotely, to gain access to the user account without compromising the service, the network, or any external device. We evaluate the implemented attack against seven popular public services, which mostly use two variants of OTP-2FA, and observe that almost all of them are vulnerable to this attack. The threat model is practical, as the attack components can be installed in the user’s terminal without any root/administrator privilege. Moreover, the attack modules require few resources to run. The whole procedure runs in the background, which makes the attack stealthy; it showed low detectability when tested against prominent anti-malware programs, indicating a real-world threat. Our findings after the analysis of the OTP-2FA schemes indicate that an adversary who can install malware on the user’s terminal can defeat almost all popular and widely used OTP-2FA schemes, which are vital security components of online accounts and secure financial transactions. The result also points out that the OTP-2FA scheme does not add extra security on top of the password in the presence of a malicious program in the terminal.
Ahmed Tanvir Mahdad, Mohammed Jubur, Nitesh Saxena

IoT Security

Frontmatter
Disappeared Face: A Physical Adversarial Attack Method on Black-Box Face Detection Models
Abstract
Face detection is a classical problem in the field of computer vision. It has significant application value in face recognition and face recognition related applications such as face-scan payment, identity authentication, and other areas. The emergence of adversarial algorithms on face detection poses a substantial threat to the security of face recognition. The current adversarial attacks on face detection have the limitations of the need to fully understand the attacked face detection model’s structure and parameters. Therefore, these methods’ transferability, which can measure the attack’s effectiveness across many other models, is not high. Moreover, due to the consideration of commercial confidentiality, commercial face detection models deployed in real-world applications cannot be accessed, so we cannot directly launch white-box adversarial attacks against these models. Aiming at solving the above problems, we propose a Black-Box Physical Attack Method on face detection. Through ensemble learning, we can extract the public weakness of the face detection models. The attack against the public weakness has high transferability across models and makes escaping black-box face detection models possible. Our method realizes the successful escape of both the white-box and black-box face detection models in both the PC terminal and the mobile terminal, including the camera module, mobile payment module, selfie beauty module, and official face detection models.
Chuan Zhou, Huiyun Jing, Xin He, Liming Wang, Kai Chen, Duohe Ma
HIAWare: Speculate Handwriting on Mobile Devices with Built-In Sensors
Abstract
A variety of sensors are built into intelligent mobile devices. However, these sensors can be used as side channels for inferring information. Researchers have shown that some touchscreen information, such as PIN and unlock pattern, can be speculated by background applications with motion sensors. Those attacks mainly focus on the restricted-area input interface (e.g., virtual keyboard). To date, the privacy risk in the unrestricted-area input interface does not receive sufficient attention.
In this paper, we investigate such privacy risk and design an unrestricted-area information speculation framework, called Handwritten Information Awareness (HIAWare). HIAWare exploits the sensors’ signals that are affected by handwriting actions to speculate the handwritten characters. To alleviate the impact of different handwriting habits, we utilize the generality patterns of characters. Furthermore, to mitigate the impact of holding posture in handwriting, we propose a user-independent posture-aware approach. As a result, HIAWare can attack any victim without obtaining the victim’s information in advance. The experiments show that the speculation accuracy of HIAWare is close to 90.0%, demonstrating the viability of HIAWare.
Jing Chen, Peidong Jiang, Kun He, Cheng Zeng, Ruiying Du
Studies of Keyboard Patterns in Passwords: Recognition, Characteristics and Strength Evolution
Abstract
Keyboard patterns are widely used in password construction, as they can be easily memorized with the aid of positions on the keyboard. Consequently, keyboard-pattern-based passwords have been the target of many dictionary attack models. However, most existing research relies only on recognition methods that define keyboard pattern structures empirically or even manually. As a result, only well-known keyboard patterns such as qwerty are recognized, and many potential structures are not specified. Besides, there are limited studies focusing on the characteristics of keyboard patterns.
In this paper, we deal with the problem of recognizing and analyzing keyboard patterns in a systematic approach. Firstly, we put forward a general recognition method that can pick out keyboard patterns from passwords automatically. Next, a comprehensive study of keyboard pattern characteristics is presented, which reveals a number of surprising facts about the preference for passwords based on keyboard patterns, such as: (1) More than half of the pattern-based passwords are composed entirely of keyboard patterns; (2) The frequency distribution of the keyboard patterns satisfies the PDF-Zipf model; (3) Users prefer keyboard patterns consisting of horizontally continuous keys or of characters whose physical locations are on the upper left of the keyboard. We further evaluate the security of keyboard-pattern-based passwords by employing the PCFG-based cracking technique. The experimental results indicate that keyboard patterns can reduce the security of passwords.
Kunyu Yang, Xuexian Hu, Qihui Zhang, Jianghong Wei, Wenfen Liu
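To make the notion of a "horizontal continuous" keyboard pattern concrete, the following is a minimal sketch that finds runs of horizontally adjacent keys in a password. It is an illustration only, not the recognition method of the paper: the US QWERTY row layout, the function name `horizontal_runs`, and the `min_len` threshold are all assumptions made for this example.

```python
# Rows of a US QWERTY layout (an assumption; other layouts need other rows).
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each key to its (row, column) position on the keyboard.
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def horizontal_runs(password, min_len=4):
    """Return lowercase substrings whose consecutive keys are neighbors on one row."""
    pw = password.lower()
    runs, start = [], 0
    for i in range(1, len(pw) + 1):
        adjacent = (
            i < len(pw)
            and pw[i] in POS
            and pw[i - 1] in POS
            and POS[pw[i]][0] == POS[pw[i - 1]][0]              # same row
            and abs(POS[pw[i]][1] - POS[pw[i - 1]][1]) == 1      # neighboring column
        )
        if not adjacent:
            # The current run ends just before index i; keep it if long enough.
            if i - start >= min_len:
                runs.append(pw[start:i])
            start = i
    return runs
```

For example, `horizontal_runs("qwerty123")` picks out `qwerty`, while lowering `min_len` to 3 also reports `123`. Vertical, diagonal, or zigzag patterns are deliberately out of scope for this sketch.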
CNN-Based Continuous Authentication on Smartphones with Auto Augmentation Search
Abstract
In this paper, we present CAuSe, a CNN-based Continuous Authentication on smartphones using Auto Augmentation Search, where the CNN is specially designed for deep feature extraction and the auto augmentation search is exploited for CNN training data augmentation. Specifically, CAuSe consists of three stages: the offline stage, the registration stage, and the authentication stage. In the offline stage, we utilize auto augmentation search on the collected data to find an optimal strategy for CNN training data augmentation. Then, we specially design a CNN to learn and extract deep features from the augmented data and, in the registration stage, train the LOF classifier after 95 features are selected by PCA. With the trained CNN and LOF classifier, CAuSe identifies the current user as a legitimate user or an impostor in the authentication stage. Based on our dataset, we evaluate the effectiveness of the optimal strategy and the performance of CAuSe. The experimental results demonstrate that, compared with other strategies and classifiers, the strategy of Time-Warping(0.6)+Time-Warping(0.6) reaches the highest accuracy of 93.19% with data size 400, and CAuSe achieves the best authentication accuracy of 96.93%.
Shaojiang Deng, Jiaxing Luo, Yantao Li
Generating Adversarial Point Clouds on Multi-modal Fusion Based 3D Object Detection Model
Abstract
In autonomous vehicles (AVs), a critical stage of the perception system is to leverage multi-modal fusion (MMF) detectors, which fuse data from LiDAR (Light Detection and Ranging) and camera sensors to perform 3D object detection. While single-modal (LiDAR-based and camera-based) models are found to be vulnerable to adversarial attacks, there are limited studies on the adversarial robustness of MMF models. Recent work has proposed a general spoofing attack on LiDAR-based perception, based on the defect of ignored occlusion patterns in point clouds. In this paper, we are inspired to attack the LiDAR channel alone to fool the MMF model into detecting a fake near-front object with a high confidence score. We perform the first study to analyze the robustness of a popular MMF model against the above attack and discover that the attack is invalid due to the correction by the camera. We propose a black-box attack method to generate adversarial point clouds with few points and prove that the defect still exists in the MMF architecture. We evaluate the attack effectiveness of different combinations of points and distances and generate universal adversarial examples at the best distance of 4m, which achieve attack success rates of more than 95% and average confidence scores over 0.9 on the KITTI validation set when the number of points exceeds 30. Furthermore, we verify the generality of our attack and the transferability of the generated universal adversarial point clouds across models.
Huiying Wang, Huixin Shen, Boyang Zhang, Yu Wen, Dan Meng
Source Identification from In-Vehicle CAN-FD Signaling: What Can We Expect?
Abstract
Controller Area Network (CAN) is significantly deployed in various industrial applications (including current in-vehicle network) due to its high performance and reliability. Controller area network with flexible data rate (CAN-FD) is supposed to be the next generation of in-vehicle network to dispose of CAN limitations of data payload size and bandwidth. The paper explores for the first time Electronic Control Unit (ECU) identification on in-vehicle CAN-FD network from bus signaling and the contributions are four-fold.
  • Technically, we discuss the factors that might affect ECU recognition (e.g., CAN-FD controller, CAN-FD transceiver, and voltage regulator) and look into the signal ringing and its intensity where dominant states along with rising edges (from recessive to dominant states) suffice to fingerprint the ECUs. We can thereby design ECU identification scheme on in-vehicle CAN-FD network.
  • For a given network topology (in terms of the stub length and the number of ECUs), we execute CAN-FD and CAN separately and one can expect considerable performance for the two kinds of protocols by using any signal characteristics (rising edges, dominant states, falling edges, and recessive states). In particular, the recognition rates by dominant states and rising edges of signals outperform significantly those by any other combinations of signal characteristics.
  • In response to the possible transition mechanism from CAN to CAN-FD, we also allow a hybrid topology of CAN and CAN-FD, namely, the network contains ECUs sending purely CAN frames, ECUs sending purely CAN-FD frames, and ECUs sending both CAN and CAN-FD frames, and our suggestion of dominant states and rising edges shows robustness in source identification as expected. This provides convincing evidence of the universal applicability of our approach to forthcoming real vehicles built on CAN-FD networks.
  • The proposed approach can be easily extended to intrusion detection against attacks not only initiated by external devices but also internal devices.
We hope our results could be used as a step forward and a guidance on securing the commercialization and batch production of in-vehicle CAN-FD network in the near future.
Yucheng Liu, Xiangxue Li
EmuIoTNet: An Emulated IoT Network for Dynamic Analysis
Abstract
Dynamic analysis of IoT firmware is an effective method to discover security flaws and vulnerabilities. However, limited by emulation methods concentrating on a single IoT device, it is challenging to find security issues hidden in communication channels. This paper presents EmuIoTNet, a tool capable of automatically building an emulated IoT network for dynamic analysis. First, EmuIoTNet prepares an emulated hardware environment to emulate a number of devices for firmware. Then, it employs network virtualization tools to setup two types of networks, IntraNet and InterNet, which connect emulated devices, companion applications, and cloud endpoints to support many communication protocols. Meanwhile, it reconfigures the IP address of emulated devices at will to support simultaneous operations of multiple users. The experimental results show that EmuIoTNet can automatically build various emulated networks and facilitate security analysis in communication channels.
Qin Si, Lei Cui, Lun Li, Zhenquan Ding, Yongji Liu, Zhiyu Hao

Software Security

Frontmatter
ACGVD: Vulnerability Detection Based on Comprehensive Graph via Graph Neural Network with Attention
Abstract
Vulnerability is one of the main causes of network intrusion. An effective way to mitigate security threats is to find and repair vulnerabilities as soon as possible. Traditional vulnerability detection methods are limited by expert knowledge. Existing deep learning-based methods neglect the connection between semantic graphs and cannot effectively deal with the structure information. Graph neural networks bring new insight into vulnerability detection. However, benign nodes account for a large proportion of the graph, so the vulnerability information can be disturbed by them. To address the limitations of existing vulnerability detection approaches, in this paper, we propose ACGVD, a vulnerability detection method that constructs a graph network with attention. We first combine multiple semantic graphs to form a more comprehensive graph. We then adopt a graph neural network instead of a sequence-based model to automatically analyze the comprehensive graph. To solve the problem that the vulnerability information could be covered up, we add a double-level attention mechanism to the graph model. We also add a novel classification layer to extract the high-level features of the code. To make the experiment more realistic, the model is trained on the latest published real-world dataset. The experimental results demonstrate that, compared with state-of-the-art methods, our model ACGVD achieves 5.01%, 13.89%, and 8.27% improvement in accuracy, recall and F1-score, respectively.
Min Li, Chunfang Li, Shuailou Li, Yanna Wu, Boyang Zhang, Yu Wen
TranFuzz: An Ensemble Black-Box Attack Framework Based on Domain Adaptation and Fuzzing
Abstract
A lot of research effort has been devoted to investigating how to attack black-box neural networks. However, less attention has been paid to the setting in which both the data and the neural network are black boxes. This paper fully considers the relationship between the challenges related to data black-box and model black-box and proposes an effective and efficient non-target attack framework, namely TranFuzz. On the one hand, TranFuzz introduces a domain adaptation-based method, which can reduce data difference between the local (or source) and target domains by leveraging sub-domain feature mapping. On the other hand, TranFuzz proposes a fuzzing-based method to generate imperceptible adversarial examples of high transferability. Experimental results indicate that the proposed method can achieve an attack success rate of more than 68% in a real-world CVS attack. Moreover, TranFuzz can also reinforce both the robustness (up to 3.3%) and precision (up to 5%) of the original neural network by taking advantage of adversarial re-training.
Hao Li, Shanqing Guo, Peng Tang, Chengyu Hu, Zhenxiang Chen
Software Obfuscation with Non-Linear Mixed Boolean-Arithmetic Expressions
Abstract
A Mixed Boolean-Arithmetic (MBA) expression mixes bitwise operations (e.g., AND, OR, and NOT) and arithmetic operations (e.g., ADD and IMUL). It enables a semantic-preserving program transformation that converts a simple expression to a difficult-to-understand but equivalent form. MBA expressions have been widely adopted as a highly effective and low-cost obfuscation scheme. However, state-of-the-art deobfuscation research poses substantial challenges to the MBA obfuscation technique. Attack methods such as bit-blasting, pattern matching, program synthesis, deep learning, and mathematical transformation can successfully simplify specific categories of MBA expressions. Existing MBA obfuscation must be enhanced to overcome these emerging challenges.
In this paper, we first review existing MBA obfuscation methods and reveal that existing MBA obfuscation is based on “linear MBA”, a simple subset of MBA transformation. This leaves the more complex “non-linear MBA” in its infancy. Therefore, we propose a new obfuscation method to unleash the power of non-linear MBA. Non-linear MBA expressions are generated from the combination or transformation of linear MBA rules based on a solid theoretical underpinning. Compared to existing MBA obfuscation, our method can generate significantly more complex MBA expressions. To demonstrate the practicability of the non-linear MBA obfuscation scheme, we apply non-linear MBA obfuscation to the Tiny Encryption Algorithm (TEA). We have implemented the method as a prototype tool, named MBA-Obfuscator, to produce a large-scale dataset. We run all existing MBA simplification tools on the dataset, and at most 147 out of 1,000 non-linear MBA expressions can be successfully simplified. Our evaluation shows MBA-Obfuscator is a practical obfuscation scheme with a solid theoretical cornerstone.
Binbin Liu, Weijie Feng, Qilong Zheng, Jing Li, Dongpeng Xu
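As a small illustration of what a linear MBA rewrite looks like, the sketch below checks two well-known identities, `x + y == (x ^ y) + 2 * (x & y)` and `x ^ y == (x | y) - (x & y)`, under 32-bit wraparound arithmetic. These identities are textbook examples and are not taken from MBA-Obfuscator; the function names and the 32-bit `MASK` are assumptions chosen for this example.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit wraparound arithmetic (assumed word size)


def add_plain(x, y):
    """Ordinary 32-bit addition."""
    return (x + y) & MASK


def add_mba(x, y):
    """Equivalent linear MBA form: x + y == (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def xor_mba(x, y):
    """Equivalent linear MBA form: x ^ y == (x | y) - (x & y)."""
    return ((x | y) - (x & y)) & MASK


def check_equivalence(trials=10_000, seed=0):
    """Randomly test that the obfuscated forms agree with the plain ones."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        if add_mba(x, y) != add_plain(x, y) or xor_mba(x, y) != (x ^ y):
            return False
    return True
```

Random testing as in `check_equivalence` only builds confidence; both identities hold for all inputs because XOR is the carry-less sum and AND marks the carry positions.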
VIRSA: Vectorized In-Register RSA Computation with Memory Disclosure Resistance
Abstract
Memory disclosure attacks give adversaries access to sensitive data in memory, posing a serious threat to the security of cryptographic systems. For example, the plain private key in RAM is exposed to the attacker during RSA operation. In this paper, we propose a register-based RSA system with high efficiency, called VIRSA, so that CRT-enabled 2048-bit RSA is entirely carried out in CPU registers. The private key and the intermediate results of the calculation are all stored in registers and never appear in memory, which effectively prevents memory disclosure attacks. The input RSA parameters are encrypted by an AES key. The AES key is stored in the privileged debug registers. For performance, we use the AVX-512F instruction set to accelerate the RSA calculation. We adopt vector instructions to implement 1024-bit Montgomery multiplication and make use of redundant representation to solve the carry propagation problem. Experiments on an Intel Xeon Silver 4208 CPU show that VIRSA achieves a performance factor of 0.8 compared to the OpenSSL RSA implementation, which outperforms existing approaches such as PRIME. Furthermore, we make use of the windowing method to improve the RSA performance. The precomputed table is encrypted by the AES key to ensure security. The performance of VIRSA using the fixed windowing method slightly exceeds OpenSSL, achieving a performance factor of 1.02.
Yu Fu, Wei Wang, Lingjia Meng, Qiongxiao Wang, Yuan Zhao, Jingqiang Lin
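The Montgomery multiplication mentioned in the VIRSA abstract can be sketched with plain Python integers. This is a generic textbook formulation, not the vectorized AVX-512F, redundant-representation implementation of the paper; the function names are hypothetical, `bits` must be at least the bit length of the odd modulus `n`, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """Precompute R = 2**bits and n' = -n^{-1} mod R for an odd modulus n < R."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits             # t + m*n is divisible by R
    return u - n if u >= n else u


def mont_mul(a, b, n, bits):
    """Compute a * b mod n via the Montgomery domain (illustrative, not optimized)."""
    r, n_prime = montgomery_setup(n, bits)
    a_bar = (a * r) % n                                # enter Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, bits, n_prime)   # equals a*b*R mod n
    return redc(prod_bar, n, bits, n_prime)            # leave Montgomery form
```

The benefit in a real implementation is that `redc` replaces division by `n` with shifts and masks by a power of two; in a modular exponentiation the conversions into and out of Montgomery form are done once rather than per multiplication as in this sketch.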
Informer: Protecting Intel SGX from Cross-Core Side Channel Threats
Abstract
As one of the major threats facing Intel SGX, side-channel attacks have been widely researched and disclosed as actual vulnerabilities in recent years, and they can severely harm the integrity and confidentiality of programs protected by SGX. Most existing defense schemes are built on the assumption that the adversary launches attacks from the same core as the victim, which, however, has been proved insufficient by newly emerged cross-core side-channel attacks (e.g., CrossTalk). We present Informer, a defensive approach for SGX against side-channel attacks launched from any location, whether the adversary resides in the same physical CPU core as the victim or not. Informer achieves this goal by creating dummy threads that temporarily monopolize all CPU cores when security-critical code is being executed, which breaks the essential concurrent execution condition of side-channel attacks. A key challenge is to ensure all those threads are scheduled exclusively to occupy all CPU cores even within an untrusted OS. Informer can defend against side-channel attacks from any core, and only incurs 22% performance overhead in OpenSSL. An additional mechanism is designed to reduce the impact on the operating system, as well as an optional extension to reduce the performance overhead brought to other programs.
Fan Lang, Wei Wang, Lingjia Meng, Qiongxiao Wang, Jingqiang Lin, Li Song

Internet Security

Frontmatter
Towards Open World Traffic Classification
Abstract
Due to the dynamic evolution of network traffic, open world traffic classification has become a vital problem. Traditional traffic classification methods have achieved success to a certain extent but fail at unknown traffic detection due to the assumption of a closed world. Existing techniques for unknown traffic detection suffer from unsatisfactory accuracy and robustness because their design does not follow the hierarchical structure of network flows. Meanwhile, the diverse flow patterns in the same attacks and the similar flow patterns from different attacks lead to the existence of hard examples, which degrade the classification performance. As a solution, we present a Siamese Hierarchical Encoder Network (SHE-Net) for traffic classification in an open world setting. We introduce a hierarchical encoder mechanism, which deeply mines the potential sequential and spatial characteristics of traffic, and adopt the siamese structure with a newly designed complementary loss function, which focuses on mining hard paired examples and quickens the convergence. Both key designs jointly learn the intra-class compactness and inter-class separateness in the feature space to set aside more space for unknown traffic. Our comprehensive experiments on real-world datasets covering intrusion detection and malware detection indicate that SHE-Net achieves excellent performance and outperforms the state-of-the-art methods.
Zhu Liu, Lijun Cai, Lixin Zhao, Aimin Yu, Dan Meng
Comprehensive Degree Based Key Node Recognition Method in Complex Networks
Abstract
To address the insufficient resolution and accuracy of key node recognition methods in complex networks, a Comprehensive Degree Based Key Node Recognition Method (CDKNR) is proposed. First, the K-shell method is adopted to layer the network and obtain the K-shell (Ks) value of each node; the Ks value measures the influence of the global structure of the network. Second, the concept of Comprehensive Degree (CD) is proposed together with a dynamically adjustable influence coefficient \(\mu_i\); the Comprehensive Degree of each node is obtained by measuring the influence of the local structure of the network through the number of neighboring nodes, the number of sub-neighboring nodes, and the influence coefficient \(\mu_i\). Finally, the importance of nodes is distinguished according to the Comprehensive Degree. Compared with several classical methods and a risk assessment method, the experimental results show that the proposed method can effectively identify the key nodes and has high accuracy and resolution in different complex networks. In addition, CDKNR can provide a basis for risk assessment of network nodes, protection of important nodes, and priority ranking of nodes for risk disposal.
Lixia Xie, Honghong Sun, Hongyu Yang, Liang Zhang
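
The K-shell layering step named in the CDKNR abstract above can be illustrated with a minimal sketch. The abstract does not give the Comprehensive Degree formula, so `local_score` below is only a hypothetical stand-in (neighbor count plus a weight `mu` times the sub-neighbor count) and should not be read as the authors' definition. The function names and the toy graph are assumptions made for illustration.

```python
def k_shell(adj):
    """Return the K-shell (Ks) value of every node.

    adj maps each node to the set of its neighbors (undirected graph).
    Nodes are peeled off in order of increasing remaining degree.
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The current shell index is the smallest remaining degree.
        k = max(k, min(degree[v] for v in remaining))
        queue = [v for v in remaining if degree[v] <= k]
        while queue:
            v = queue.pop()
            if v not in remaining:
                continue  # already peeled via a duplicate queue entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        queue.append(u)
    return shell


def local_score(adj, v, mu):
    """Hypothetical local-structure score (NOT the paper's formula):
    number of neighbors plus mu times the number of sub-neighbors."""
    first = set(adj[v])
    second = set().union(*(adj[u] for u in first)) - first - {v}
    return len(first) + mu * len(second)


# Toy graph (assumed): a triangle a-b-c with a pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = k_shell(toy)  # d lies in shell 1, the triangle in shell 2
```

In this toy graph the pendant node is peeled first, so it receives the lowest Ks value, while the triangle nodes share the highest one; a local score is then one way to separate nodes that fall into the same shell.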
Improving Convolutional Neural Network-Based Webshell Detection Through Reinforcement Learning
Abstract
Webshell detection is highly important for network security protection. Conventional methods are based on keyword matching, which relies heavily on the experience of domain experts when facing emerging malicious webshells of various kinds. Recently, machine learning, especially supervised learning, has been introduced for webshell detection and has proved to be a great success. As one of the state-of-the-art approaches, the neural network (NN) is designed to take a large number of features as input and enable deep learning. Thus, how to properly combine the advantages of automatic feature selection with those of expert knowledge has become a key issue. Considering that the special features that indicate unexpected webshell behaviors for a target business system are usually simple but effective, we propose a novel approach for improving webshell detection based on a convolutional neural network (CNN) through reinforcement learning. We use asynchronous advantage actor-critic (A3C) reinforcement learning for automatic feature selection, aiming to maximize the expected accuracy of the CNN classifier on a validation dataset by sequentially interacting with the feature space. Moreover, considering the sparseness of feature values, we build the CNN classifier with two convolutional layers and a global pooling layer. Extensive experiments and analysis have been conducted to demonstrate the effectiveness of our proposed method.
Yalun Wu, Minglu Song, Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Bowei Jia, Haixiang Huang, Qiong Li, Jiqiang Liu
Exploring the Security Issues of Trusted CA Certificate Management
Abstract
Public Key Infrastructure (PKI) is widely used in security protocols, and the root certification authority (CA) serves as the trust anchor of PKI. However, as research shows, not all root CAs are trustworthy, and malicious CAs might issue fraudulent certificates, which can enable Man-in-the-Middle attacks and eavesdropping attacks. Besides, the massive number of CAs and CA certificates makes it hard for users to manage CA certificates by themselves. Though PKI applications generally provide an implementation of trusted CA certificate management (called a CA manager in this paper) to store, manage, and verify CA certificates, security incidents still occur, and a single malicious CA certificate can compromise overall security. This work explores the security issues of CA managers for three popular operating systems and eight applications installed on them. We make a systematic analysis of the CA managers, covering aspects such as the modification of the certificate trust list, the source of trust, and the security checks on CA certificates, and propose the functionalities that a CA manager should have. Our work shows that all CA managers we analyzed have security issues, e.g., silent addition of CA certificates and inefficient validation of CA certificates, which result in insecure CA certificates being falsely trusted. We also make some suggestions on security enhancements for CA managers.
Yanduo Fu, Qiongxiao Wang, Jingqiang Lin, Aozhuo Sun, Linli Lu
Effective Anomaly Detection Model Training with only Unlabeled Data by Weakly Supervised Learning Techniques
Abstract
Intrusion detection systems (IDS) play an important role in security monitoring to identify anomalous or suspicious activities. Traditional IDS can be signature-based (or rule-based) or anomaly-based (or analytics-based). With the objective of detecting zero-day attacks, analytics-based IDS have attracted great interest from the cybersecurity community. Furthermore, machine learning (ML) techniques have been extensively explored for advancing analytics-based IDS. Many ML techniques have been studied to improve the efficiency of intrusion detection and some have shown good performance. However, traditional supervised learning algorithms need strong supervision information, i.e., fully correctly labeled (FCL) data, to train an accurate model. Meanwhile, with the rapid development of network and communication technologies, the volume of network traffic and system logs has increased drastically in recent years, especially with the introduction of Next Generation Broadband Network (NGBN) and 5G networks. This has put huge pressure on analytics-based IDS because, for ML to train predictive models, security-relevant data need to be labeled manually, which creates practical barriers to achieving effective IDS. In order to avoid being overly dependent on strong supervision information, weakly supervised learning techniques, which utilize incomplete, inexact, or possibly inaccurate labels, have been studied by cybersecurity researchers because such weak supervision information is easier and cheaper to obtain than FCL data. This research aims to explore the feasibility of weakly supervised learning techniques in IDS tasks so as to reduce the reliance on a massive amount of strong supervision information, which will only continue to grow tremendously in the big data society. We also investigated the detection stability of the proposed scheme when inaccurate weak supervision information is provided.
In this article, we propose an IDS model training scheme that is based on a weakly supervised learning algorithm, which requires only unlabeled data. Experiments have been performed on three publicly available IDS evaluation datasets. The results showed that the proposed scheme performs well and is even better than some supervised learning-based IDS (SL-IDS) models. Experimental results also indicated that the weakly supervised learning based IDS model is robust and can be applied in real world situations. Besides, we examined detection performance of the proposed method when it faces class-imbalanced training data and the experiment results show that it performs better than the compared methods.
Wenzhuo Yang, Kwok-Yan Lam

Data-Driven Cybersecurity

Frontmatter
CySecAlert: An Alert Generation System for Cyber Security Events Using Open Source Intelligence Data
Abstract
Receiving relevant information on possible cyber threats, attacks, and data breaches in a timely manner is crucial for early response. The social media platform Twitter hosts an active cyber security community. Their activities are often monitored manually by security experts, such as Computer Emergency Response Teams (CERTs). We thus propose a Twitter-based alert generation system that issues alerts to a system operator as soon as new relevant cyber security related topics emerge. Thereby, our system allows us to monitor user accounts with significantly less workload. Our system applies a supervised classifier, based on active learning, that detects tweets containing relevant information. The results indicate that uncertainty sampling can reduce the amount of manual relevance classification effort and enhance the classifier performance substantially compared to random sampling. Our approach reduces the number of accounts and tweets that are needed for the classifier training, thus making the tool easily and rapidly adaptable to the specific context while also supporting data minimization for Open Source Intelligence (OSINT). Relevant tweets are clustered by a greedy stream clustering algorithm in order to identify significant events. The proposed system is able to work near real-time within the required 15-min time frame and detects up to 93.8% of relevant events with a false alert rate of 14.81%.
Thea Riebe, Tristan Wirth, Markus Bayer, Philipp Kühn, Marc-André Kaufhold, Volker Knauthe, Stefan Guthe, Christian Reuter
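
The uncertainty sampling strategy mentioned in the CySecAlert abstract above can be sketched as follows. The abstract does not describe the classifier or the exact uncertainty measure, so this sketch assumes a binary classifier that outputs the probability of a tweet being relevant and treats the predictions closest to 0.5 as the most uncertain. The function name `select_most_uncertain` and the sample probabilities are hypothetical.

```python
def select_most_uncertain(probabilities, n):
    """Return the indices of the n items whose predicted probability of
    being relevant is closest to 0.5, i.e. where the classifier is least sure.

    probabilities: list of floats in [0, 1], one per unlabeled item.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:n]


# Assumed classifier outputs for five unlabeled tweets.
scores = [0.95, 0.52, 0.10, 0.45, 0.80]
to_label = select_most_uncertain(scores, 2)  # indices handed to the annotator
```

In an active learning loop, the selected items would be labeled manually, added to the training set, and the classifier retrained before the next selection round; random sampling would instead pick indices without looking at the scores.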
CyberRel: Joint Entity and Relation Extraction for Cybersecurity Concepts
Abstract
Cyber threats are becoming increasingly sophisticated, while new attack techniques are emerging, causing serious harm to businesses and even countries. Therefore, how to analyze attack incidents and trace the attack groups behind them becomes extremely important. Threat intelligence provides a new technical solution for attack traceability by constructing a Cybersecurity Knowledge Graph (CKG). The CKG cannot be constructed without a large number of entity-relation triples, and existing entity and relation extraction for cybersecurity concepts uses the traditional pipeline model, which suffers from error propagation and ignores the connection between the two subtasks. To solve this problem, we propose CyberRel, a joint entity and relation extraction model for cybersecurity concepts. We model the joint extraction problem as a multiple sequence labeling problem, generating separate label sequences for different relations that contain information about the involved entities and about the subject and object of each relation. CyberRel uses the pre-trained model BERT to generate word vectors, then uses a BiGRU neural network and the attention mechanism to extract features, and finally decodes them with a BiGRU combined with a CRF. Experimental results on Open Source Intelligence (OSINT) data show that the F1 value of CyberRel is 80.98%, which is better than the previous pipeline model.
Yongyan Guo, Zhengyu Liu, Cheng Huang, Jiayong Liu, Wangyuan Jing, Ziwang Wang, Yanghao Wang
Microblog User Location Inference Based on POI and Query Likelihood Model
Abstract
Location inference of microblog users is of great significance for disaster monitoring, public opinion tracing and tracking, and extensive location-based services. However, due to the noisy content of microblog text and the ambiguity of geographic location, it is quite difficult to infer user location based only on user-generated text. This paper proposes a microblog user location inference algorithm based on POI and a query likelihood model, named PaQL. First, the POI (Point of Interest) model of each region is constructed based on the electronic map. Then, from the word segmentation results of the user's blog texts, the POIs with stronger location orientation are extracted as user features. Next, the inverse region frequency of POIs is calculated and used, together with the query likelihood model, to compute the correlation between users and the candidate regions. Finally, the candidate region with the highest correlation is taken as the user's inferred location. The location inference experiment is conducted on the provincial-level data set (3,862k blogs of 154k users) and the city-level data set (3,086k blogs of 103k users) of the Sina Weibo platform. The results show that, compared with three existing typical algorithms that are based only on user text (GP-FLIW, GP-LIWTF and WC-EFS), the precision of provincial-level inference is improved by 7.80%, 4.99% and 1.41%, respectively, and the city-level inference precision is improved by 10.67%, 8.38% and 3.72%, respectively. Moreover, the proposed algorithm also outperforms the existing methods in terms of recall and \(F_1\).
Yimin Liu, Xiangyang Luo, Han Li
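
The scoring pipeline outlined in the PaQL abstract above (per-region POI model, inverse region frequency, query likelihood, highest-scoring region) can be illustrated with a rough sketch. The abstract does not state the formulas, so everything below is an assumption: the inverse region frequency is taken as `log(1 + N / rf)`, the region model is a unigram model with additive smoothing, and the user-region score is the IRF-weighted log-likelihood of the user's POIs. All function names, the smoothing constant `alpha`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def inverse_region_frequency(region_pois):
    """Assumed IRF: POIs that occur in fewer regions get a larger weight."""
    n_regions = len(region_pois)
    region_count = Counter()
    for pois in region_pois.values():
        region_count.update(set(pois))
    return {p: math.log(1.0 + n_regions / c) for p, c in region_count.items()}


def score_region(user_pois, pois, irf, alpha=1.0):
    """IRF-weighted log-likelihood of the user's POIs under one region's
    smoothed unigram POI model (an assumed form of query likelihood)."""
    counts = Counter(pois)
    total = len(pois)
    vocab = len(irf)
    score = 0.0
    for poi in user_pois:
        if poi not in irf:
            continue  # POI unknown to every region model carries no signal
        p = (counts[poi] + alpha) / (total + alpha * vocab)
        score += irf[poi] * math.log(p)
    return score


def infer_region(user_pois, region_pois):
    """Return the candidate region with the highest score."""
    irf = inverse_region_frequency(region_pois)
    return max(
        region_pois,
        key=lambda r: score_region(user_pois, region_pois[r], irf),
    )


# Toy data (assumed): POI mentions per region and POIs extracted from one user.
regions = {
    "region_a": ["landmark_1", "landmark_2", "metro"],
    "region_b": ["landmark_3", "metro", "metro"],
}
guess = infer_region(["landmark_3", "metro"], regions)
```

The IRF weight plays the same role as inverse document frequency in text retrieval: a POI such as `metro` that appears in every region contributes less to the decision than a POI tied to a single region.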
Backmatter
Metadata
Title
Information and Communications Security
Edited by
Dr. Debin Gao
Qi Li
Xiaohong Guan
Prof. Xiaofeng Liao
Copyright year
2021
Electronic ISBN
978-3-030-86890-1
Print ISBN
978-3-030-86889-5
DOI
https://doi.org/10.1007/978-3-030-86890-1
