
2022 | Book

Algorithms and Architectures for Parallel Processing

21st International Conference, ICA3PP 2021, Virtual Event, December 3–5, 2021, Proceedings, Part II

About this book

The three-volume set LNCS 13155, 13156, and 13157 constitutes the refereed proceedings of the 21st International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2021, which was held online during December 3-5, 2021.

The 145 full papers included in these proceedings were carefully reviewed and selected from 403 submissions. They cover the many dimensions of parallel algorithms and architectures, including fundamental theoretical approaches, practical experimental projects, and commercial components and systems.

The papers were organized in topical sections as follows:

Part I, LNCS 13155: Deep learning models and applications; software systems and efficient algorithms; edge computing and edge intelligence; service dependability and security algorithms; data science;

Part II, LNCS 13156: Software systems and efficient algorithms; parallel and distributed algorithms and applications; data science; edge computing and edge intelligence; blockchain systems; deep learning models and applications; IoT;

Part III, LNCS 13157: Blockchain systems; data science; distributed and network-based computing; edge computing and edge intelligence; service dependability and security algorithms; software systems and efficient algorithms.

Table of Contents

Frontmatter

Software Systems and Efficient Algorithms

Frontmatter
The Design and Realization of a Novel Intelligent Drying Rack System Based on STM32

With the development and progress of society, people have higher and higher requirements for the intelligence of drying rack systems. Therefore, improving the intelligence level of the drying rack system is of essential practical significance in people's daily life. Traditional drying rack systems are mainly divided into four types: outdoor, floor-standing, hand-operated, and electric. However, they all have the disadvantage of consuming human resources, time, and space. In this paper, we design a novel intelligent drying rack system based on STM32, consisting of many sensors, a horizontal rotation mechanism, and a lifting mechanism. The system first uses raindrop sensors, photoresistors, and wind speed sensors to collect external environmental information. Then, it monitors changes in the surrounding environment in real time. When the environment changes, the system drives the motor to collect or dry the clothes. All intelligent control functions can be realized through the voice module, the button module, and the infrared remote control. In addition, we also develop an app that displays the system's control interface and can realize all intelligent control functions. Through debugging and system testing, the simulation results demonstrate the accuracy and efficacy of the designed drying rack system.

Shiwen Zhang, Wang Hu, Wei Liang, Lei Liao
Efficient Estimation of Time-Dependent Shortest Paths Based on Shortcuts

The shortest path search in the road network is of great importance in various Intelligent Transportation Systems. However, the commonly used shortest path search algorithms, such as Dijkstra and A*, are time-consuming due to their complexity, which leads to poor performance on large-scale road networks. Thus, new optimization techniques are required to solve the path search problem on large-scale road networks. In this paper, the temporal feature of the road network is considered for the shortest path search problem, which is closer to real-world road networks, and an algorithm called Time-Dependent A* With Shortcuts (TDAWS) is proposed to estimate time-dependent shortest paths. Concretely, the road network is pre-processed offline and partitioned into several regions based on clustering, which captures the spatial pattern of the road network. Then we construct shortcuts that contain shortest-path information to reduce search time, and propose two mechanisms, called Hop On Directionally (HOD) and Hop-Off Early (HOE), to avoid unnecessary detours. We conducted an extensive experimental study on a road network with real-world taxi trajectory data and compared our approach with existing techniques. The results demonstrate that the time cost of our method is more stable and up to 17 times lower than that of the precise shortest path search algorithm, with an acceptable extra ratio (about 20%) on the path length.

Linbo Liao, Shipeng Yang, Yongxuan Lai, Wenhua Zeng, Fan Yang, Min Jiang
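
As background to the abstract above: plain A* becomes time-dependent once each edge cost is a function of departure time. Below is a minimal sketch of that idea only; the `graph` and `heuristic` interfaces are hypothetical, and the paper's shortcut construction and HOD/HOE mechanisms are not shown.

```python
import heapq

def time_dependent_a_star(graph, source, target, t0, heuristic):
    """Earliest-arrival A* where each edge's travel time depends on the
    departure time. graph: {u: [(v, travel_time_fn), ...]}, with
    travel_time_fn(t) -> duration; heuristic(v) should lower-bound the
    remaining travel time from v to target."""
    best = {source: t0}
    frontier = [(t0 + heuristic(source), source, t0)]
    while frontier:
        _, u, t = heapq.heappop(frontier)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)  # departure-time-dependent cost
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(frontier, (arrival + heuristic(v), v, arrival))
    return None
```
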
Multi-level PWB and PWC for Reducing TLB Miss Overheads on GPUs

Nowadays, the GPU is becoming popular across a broad range of domains. To provide virtual memory support for most applications, GPUs introduce an address translation process. However, many applications show irregular memory access patterns, i.e., accesses are poorly structured and often data dependent, which worsens performance, especially with virtual-to-physical address translations. The GPU memory management unit (MMU) adopts caching units, e.g., the page walk buffer (PWB) and page walk cache (PWC), and scheduling strategies to accelerate address translations after TLB misses. However, limited by the linear table structure of the traditional PWB and PWC, they hold much redundant information, which further limits the performance of irregular applications. Although a nonlinear structure can eliminate the redundancy, it requires sequential look-ups on the PWB and PWC, which brings a greater performance loss. In this paper, we propose a multi-level PWB and PWC structure, which features a multi-level organization that eliminates the redundancy of the traditional structure and a co-design of PWB and PWC that enables parallel look-up. Besides, we design four corresponding address translation processes to ensure the efficiency of the new structure. We evaluate our design with real-world benchmarks on the GPGPU-Sim simulator. Results show that our design achieves a 42.6% IPC improvement with 35.1% less space overhead.

Yang Lin, Dunbo Zhang, Chaoyang Jia, Qiong Wang, Li Shen
Hybrid GA-SVR: An Effective Way to Predict Short-Term Traffic Flow

Establishing an accurate short-term traffic flow prediction model is an important part of an intelligent transportation system (ITS). However, due to the nonlinear and stochastic dynamics of traffic flow, building an effective predictive model remains a challenge. Support vector regression (SVR), a model that is widely used to solve non-linear regression problems, has good predictive performance for time series data such as traffic flow, but its hyperparameters strongly affect that performance. This paper presents a prediction model that uses a genetic algorithm (GA) to determine the combination of hyperparameters for the SVR model, called the hybrid GA-SVR model. Experiments on real-world traffic flow data show that the hybrid GA-SVR model has better predictive performance than several state-of-the-art prediction algorithms.

Guanru Tan, Shiqiang Zheng, Boyu Huang, Zhihan Cui, Haowen Dou, Xi Yang, Teng Zhou
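
The hybrid described above can be reproduced in outline with scikit-learn: a small GA searches the SVR hyperparameters (C, gamma, epsilon) using cross-validated error as fitness. This is a minimal sketch with assumed population sizes and mutation rates, not the authors' configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(params, X, y):
    C, gamma, eps = params
    model = SVR(C=C, gamma=gamma, epsilon=eps)
    # negative MSE via 3-fold cross-validation; higher is better
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_squared_error").mean()

def ga_svr(X, y, pop_size=20, generations=30):
    # log-uniform initial population over (C, gamma, epsilon)
    pop = 10 ** rng.uniform([-1, -4, -3], [3, 0, 0], size=(pop_size, 3))
    for _ in range(generations):
        scores = np.array([fitness(p, X, y) for p in pop])
        parents = pop[np.argsort(scores)][-pop_size // 2:]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(3) < 0.5, a, b)      # uniform crossover
            child *= 10 ** rng.normal(0, 0.1, size=3)        # multiplicative mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(p, X, y) for p in pop])]
    return SVR(C=best[0], gamma=best[1], epsilon=best[2]).fit(X, y)

# usage: model = ga_svr(X_train, y_train)
```
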

Parallel and Distributed Algorithms and Applications

Frontmatter
MobiTrack: Mobile Crowdsensing-Based Object Tracking with Min-Region and Max-Utility

Exploiting the mobile cameras embedded in widely used smartphones for object tracking offers a new way to reduce the deployment cost of stationary cameras and shorten the tracking latency, but brings challenges in efficient task assignment and cooperation among workers, as required by a Mobile Crowdsensing (MCS) system. Most existing efforts in the literature focus on object tracking with MCS where the workers capture photos of the moving object at pre-calculated sites. However, the contradiction between tracking coverage and system cost in these MCS-based tracking solutions is sharpened when tracking scenarios and worker numbers vary. In this paper, we investigate the tracking region to conduct task assignment among the top-k most probable sensing locations, which can achieve maximal tracking utility. Specifically, we construct an N-gram prediction model to determine the k tracking locations and formulate the task assignment problem, solved by the Kuhn-Munkres algorithm, laying a theoretical foundation. The soundness of the prediction model is verified statistically, and the effectiveness of the task assignment is evaluated via large-scale real-world data simulations.

Wenqiang Li, Jun Tao, Zuyan Wang, YiFan Xu, Xiaolei Tang, YiChao Dong
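
The assignment step the abstract attributes to the Kuhn-Munkres algorithm is available off the shelf: SciPy's Hungarian-method solver maximizes total utility when the utility matrix is negated. The utility values below are hypothetical, for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical utility matrix: utility[w][k] = expected tracking utility of
# assigning worker w to the k-th predicted sensing location.
utility = np.array([[0.9, 0.4, 0.1],
                    [0.6, 0.8, 0.3],
                    [0.2, 0.5, 0.7]])

# linear_sum_assignment minimizes cost, so negate to maximize total utility
workers, locations = linear_sum_assignment(-utility)
print(list(zip(workers, locations)), utility[workers, locations].sum())
```
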
Faulty Processor Identification for a Multiprocessor System Under the PMC Model Using a Binary Grey Wolf Optimizer

With the increasing demand for computing power, multiprocessor systems composed of numerous processors are deployed in various fields. The binary grey wolf optimizer cannot directly tackle system-level fault diagnosis, in spite of its considerable success in feature selection, the set-union knapsack problem, and the uncapacitated facility location problem. In this study, we propose a binary grey wolf optimizer for system-level fault diagnosis (BGWOFD). BGWOFD employs Boolean algebra to mimic the social hierarchy of grey wolves according to a rank-based dominance weight. To balance convergence and mutation, a new competitive mechanism is adopted. Furthermore, the mutation strategy designed for the PMC model can effectively improve diagnostic efficiency and population diversity. Experimental results demonstrate the advantage of the proposed algorithm in diagnostic accuracy and diagnostic time.

Fulai Pan, Weixia Gui
Fast On-Road Object Detector on ROS-Based Mobile Robot

The application environment of mobile robots is gradually expanding from indoor to outdoor. Vision-based detection, which acquires traffic information through a camera, is a state-of-the-art auxiliary technology. In this paper, the robotic middleware Robot Operating System (ROS) is applied to object detection and control applications on an embedded processor, and we present an effective on-road object detector suitable for embedded GPUs, built by improving the performance of the Single Shot MultiBox Detector (SSD). Our approach is to construct the detection network with depth-wise separable convolutions to save computing resources, and to present multi-category clustering that adjusts the generated default boxes to optimize accuracy. Experiments on the KITTI dataset show that the proposed network runs 2.1 times faster than the original SSD network on an embedded GPU while maintaining 71% mean average precision. Finally, a mobile robot is designed based on the detector and controller to demonstrate on-road assisted driving intuitively.

Gang Wang, Qiudi Song, Tao Li, Min Li
A Lightweight Asynchronous I/O System for Non-volatile Memory

Non-volatile memory, also called persistent memory (PM), features byte addressing, non-volatility, and performance similar to traditional DRAM, but still shows obvious latency in several common scenarios that adopt synchronous (sync) I/O, such as an application transferring large PM data or accessing remote PM data in a NUMA architecture. These problems motivate the asynchronous (async) I/O of a PM file system. In this paper, we first investigate the efficiency of the combination of PM and IO_uring, a novel and highly efficient async I/O system proposed recently. We find IO_uring on PM still incurs a series of performance issues: (1) a pseudo-async I/O path; (2) inefficient memory allocation for I/O data; and (3) unnecessary CPU overhead from user polling. We then introduce LWAIO, a lightweight async I/O system, to relieve these issues. It mainly contains three techniques: (1) kernel-level threading; (2) a dynamic memory pool; and (3) kernel pushing. We implement LWAIO in NOVA, a well-known PM file system, and conduct extensive experiments to verify its advantages on a real PM platform. The experimental results show that LWAIO brings up to a 13% IOPS benefit for random write I/O operations, as well as a 45% IOPS improvement for random reads.

Jiebin Luo, Weijie Zhang, Dingding Li, Haoyu Luo, Deze Zeng
The Case for Disjoint Job Mapping on High-Radix Networked Parallel Computers

Onboard optics and co-packaged optics (CPO) will enable the building of ultra high-radix switching ASICs. Ultra high-radix interconnection networks, which have a low diameter, make the intra-job network topology have only a marginal impact on the performance of job mapping, i.e., the placement of message passing interface (MPI) ranks onto compute nodes. In this context, we investigate the impact of job mapping algorithms on job scheduling performance, where the algorithms have different trade-offs between resource utilization and the constraint of intra-job network topology. Our simulation results show that a simple disjoint job mapping policy (e.g., a topology-oblivious job mapping algorithm) surprisingly outperforms a complicated joint one (e.g., a topology-aware job mapping algorithm), achieving substantially better job scheduling performance at the cost of a larger network diameter, especially when dealing with an exceedingly large workload on high-radix networked parallel computers.

Yao Hu, Michihiro Koibuchi
FastCache: A Client-Side Cache with Variable-Position Merging Schema in Network Storage System

Cache plays an important role in providing high-throughput and low-latency network storage services for I/O-intensive applications. One major challenge is that storage performance degrades significantly under micro-write workloads even with a backend cache. A straightforward approach is to use a client-side cache to merge micro-writes into sequential writes. However, we notice that direct merging within a block causes a severe fragmentation problem; specifically, a simple cache update policy pollutes the cache, which leads to I/O performance degradation. In this paper, we introduce FastCache, a two-level cache based on a hash table and a linked list to store data slices, with a variable-position merging schema to convert random micro-writes into sequential writes. To avoid cache pollution, we design a new cache update policy based on a measurable threshold to control flushing and Poisson-distribution sampling to find the most suitable entries to evict. We implement FastCache in FastCFS and conduct extensive evaluations under the FIO benchmark and real workloads. We show that FastCache outperforms LRU and HCCache in terms of IOPS and access latency. The experimental results demonstrate that IOPS can be improved by up to 10×, and the access latency with FastCache decreases by 50%-90%.

Lin Qian, BaoLiu Ye, XiaoLiang Wang, Zhihao Qu, Weiguo Duan, Ming Zhao
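
A toy illustration of the merging idea described above: buffer micro-write slices per block and flush a block as one sequential write once a dirty-ratio threshold is crossed. This is a simplified stand-in, not FastCache's two-level hash-table/linked-list design or its Poisson-sampled eviction.

```python
from collections import defaultdict

class MergingWriteCache:
    """Toy client-side cache: buffer micro-writes per block and flush a
    block's slices as one sequential write once enough of the block is
    dirty (threshold policy). Overlapping slices are not handled."""

    def __init__(self, block_size=4096, flush_threshold=0.5, backend=None):
        self.block_size = block_size
        self.flush_threshold = flush_threshold
        self.backend = backend or (lambda block, slices: None)  # storage writer
        self.slices = defaultdict(dict)  # block_id -> {offset: bytes}

    def write(self, offset, data):
        block, off = divmod(offset, self.block_size)
        self.slices[block][off] = data
        dirty = sum(len(d) for d in self.slices[block].values())
        if dirty / self.block_size >= self.flush_threshold:
            self.flush(block)

    def flush(self, block):
        # merge buffered slices into one ordered, sequential backend write
        merged = sorted(self.slices.pop(block).items())
        self.backend(block, merged)
```
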
An Efficient Parallelization Model for Sparse Non-negative Matrix Factorization Using cuSPARSE Library on Multi-GPU Platform

Positive or Non-negative Matrix Factorization (NMF) is an effective technique that has been widely used for Big Data representation. It aims to find two non-negative matrices W and H whose product provides an optimal approximation to the original input data matrix A, such that A ≈ W*H. Although NMF plays an important role in several applications, such as machine learning, data analysis, and biomedical applications, due to the sparsity caused by missing information in many high-dimensional scenes (e.g., social networks, recommender systems, and DNA gene expressions), the NMF method cannot mine a more accurate representation from the explicit information alone. Sparse Non-negative Matrix Factorization (SNMF) can incorporate the intrinsic geometry of the data, combined with implicit information, and can thus realize a more compact representation of the sparse data. In this paper, we study SNMF using the Multiplicative Update Algorithm (MUA), which computes the factorization by applying updates to both matrices W and H. Accordingly, we propose two models to implement a parallel version of SNMF on GPUs using the NVIDIA CUDA framework. To optimize SNMF, we use the optimized cuSPARSE library to compute the algebraic operations in MUA, where the sparse matrices A, W, and H are stored in Compressed Sparse Row (CSR) format. Finally, our contribution is validated through a series of experiments on two input sets, i.e., a set of randomly generated matrices and a set of benchmark matrices from real applications with different sizes and densities. We show that our algorithms allow performance improvements compared to baseline implementations. The speedup on a multi-GPU platform can exceed 11×, and the Ratio can exceed 91%.

Hatem Moumni, Olfa Hamdi-Larbi
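
The MUA updates referred to above are, in their standard form, H ← H ⊙ (WᵀA)/(WᵀWH) and W ← W ⊙ (AHᵀ)/(WHHᵀ). Here is a CPU sketch with SciPy's CSR matrices standing in for the cuSPARSE kernels; sizes and iteration counts are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def snmf_mua(A, rank, iters=200, eps=1e-9):
    """Multiplicative-update NMF on a CSR matrix A (m x n), A ~ W @ H."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, rank)), rng.random((rank, n))
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); sparse A keeps W^T A cheap
        H *= np.asarray(A.T @ W).T / (W.T @ W @ H + eps)
        # W <- W * (A H^T) / (W H H^T)
        W *= np.asarray(A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

A = sp.random(500, 400, density=0.02, format="csr", random_state=1)
W, H = snmf_mua(A, rank=10)
print(np.linalg.norm(A.toarray() - W @ H))  # approximation error
```
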
HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms

As the heterogeneity of high-performance computing platforms and the scale of data-parallel applications increase significantly, data partitioning becomes a key issue. Recent works generally use computation performance models to optimize the data partition algorithm. However, these methods do not take the communication overhead into account, making them unsuitable for applications with a high communication ratio or an unbalanced communication topology. In this paper, a new heterogeneity-aware data partition algorithm, HaDPA, is proposed. Firstly, the computation and communication overheads are predicted by suitable computation and communication performance models given a partition topology. Then, a search tree is constructed, and a hierarchical depth-first search with branch and bound is designed to obtain the optimal solution; together with the construction of the optimization model, this makes up the whole HaDPA process. Finally, to verify the performance of the algorithm, matrix multiplication and axial compressor rotor applications are tested on the TianHe-2A supercomputer. Experimental results show that HaDPA can effectively reduce the execution time of data-parallel applications. What's more, the impact factors of the performance improvement are analyzed and explained. A regression model proves that the communication-to-computation ratio matters more to data partitioning on heterogeneous HPC platforms. Besides, compared with HPOPTA, the improvement ratio of HaDPA increases with a higher communication ratio and a lower heterogeneity of the hardware platform.

Jingbo Li, Li Han, Yuqi Qu, Xingjun Zhang
A NUMA-Aware Parallel Truss Decomposition Algorithm for Large Scale Graphs

The truss decomposition algorithm decomposes a graph into a hierarchical subgraph structure. A k-truss (k ≥ 2) is a subgraph in which each edge is in at least k−2 triangles. The existing algorithm first computes the number of triangles for each edge, and then iteratively increases k to peel off the edges that are not in the (k+1)-truss. Due to the scale of the data and the intensity of the computations, truss decomposition on a billion-edge graph may take many hours on a commodity server. In addition, more and more servers today adopt the NUMA architecture, which also affects the scalability of the algorithm. Therefore, we propose a NUMA-aware shared-memory parallel algorithm to accelerate truss decomposition on NUMA systems by (1) computing different levels of k-truss on different NUMA nodes, (2) dividing the range of k heuristically to ensure load balance, and (3) optimizing the data structure and triangle counting method to reduce remote memory accesses, data contention, and data skew. Our experiments show that on real-world datasets our OpenMP implementation can accelerate truss decomposition effectively on NUMA systems.

Zhebin Mou, Nong Xiao, Zhiguang Chen
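
The peeling procedure the abstract builds on can be written sequentially in a few lines; the paper's contribution is making this NUMA-aware and parallel, which the sketch below (assuming a simple undirected edge list without duplicates) does not attempt.

```python
def truss_decomposition(edges):
    """Sequential peeling baseline for truss decomposition: for increasing k,
    repeatedly remove edges that sit in fewer than k-2 triangles; an edge
    removed at level k has trussness k-1."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # initial support = number of triangles containing each edge
    support = {tuple(sorted(e)): len(adj[e[0]] & adj[e[1]]) for e in edges}
    trussness, k = {}, 2
    while support:
        k += 1
        while True:
            weak = [e for e, s in support.items() if s < k - 2]
            if not weak:
                break
            for u, v in weak:
                trussness[(u, v)] = k - 1
                del support[(u, v)]
                adj[u].discard(v); adj[v].discard(u)
                for w in adj[u] & adj[v]:  # surviving triangles lose one edge
                    for f in (tuple(sorted((u, w))), tuple(sorted((v, w)))):
                        if f in support:
                            support[f] -= 1
    return trussness

print(truss_decomposition([(1, 2), (2, 3), (1, 3), (3, 4)]))
```
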
Large-Scale Parallel Alignment Algorithm for SMRT Reads

Single Molecule Real-Time (SMRT) sequencing is one of the popular topics in third-generation sequencing technology. Compared with next-generation sequencing, SMRT can detect single molecules and has much longer read lengths, which also leads to a huge increase in the amount of data. As the performance of a single CPU has reached its bottleneck, single-node computing is far from meeting SMRT sequencing requirements. An alternative solution is parallel computing, which runs the alignment algorithm on multiple computing nodes and thus greatly decreases the running time. The Regional Hashing-based Alignment Tool (rHAT) is a novel approach developed especially for SMRT sequencing, with better sensitivity and improved correctness compared with existing sequence alignment tools. However, the original rHAT can only run on a single node, which dramatically limits its performance. In this article, we developed PrHAT, a parallel version of rHAT. We test PrHAT on the simulated and real datasets that the original rHAT used. Our results show that PrHAT reduces the computing wall-time from nearly an hour to several minutes. In the process of increasing the number of nodes from 2 to 16 when aligning large-scale datasets, PrHAT achieves speedups of 1.94-14.87x. The parallel efficiency decreases only from 97% to 93%; moreover, its weak scaling remains almost unchanged. Based on PrHAT, we developed OpenPrHAT, which has performance similar to PrHAT but can run on other computing devices in the platform, such as GPUs. We expect that the implementation of PrHAT will promote the development of SMRT in third-generation sequencing technology.

Zeyu Xia, Yingbo Cui, Ang Zhang, Peng Zhang, Sifan Long, Tao Tang, Lin Peng, Chun Huang, Canqun Yang, Xiangke Liao
Square Fractional Repetition Codes for Distributed Storage Systems

Fractional repetition (FR) codes have been proposed for distributed storage systems to achieve low-complexity repair of failed nodes, i.e., each contacted helper node in the repair process simply transfers a portion of stored data to the replacement node without arithmetic operations. In this paper, we propose square fractional repetition (SFR) codes, which have the key feature that a failed storage node can be repaired by two helper nodes, thus achieving the smallest non-trivial repair degree. Moreover, we show that determining the supported file size of SFR codes is equivalent to solving an integer partition problem, and an algorithm is then presented.

Bing Zhu, Shigeng Zhang, Weiping Wang
An Anti-forensic Method Based on RS Coding and Distributed Storage

Anti-forensics (AF) technology has become a new field of cybercrime. The problems of existing forensic technologies should be considered from the criminal's perspective, so as to improve existing AF technologies. There are two types of AF methods, namely data hiding and data destruction, and most AF tools are primarily based on data hiding. If part of the data is intercepted by investigators during the AF process, the remaining data may be destroyed by the criminal, so that investigators obtain no information about the data. To address this issue, this paper proposes an AF scheme with multi-device storage based on Reed-Solomon codes, combining data hiding and data destruction. The data is divided into multiple out-of-order data blocks and parity blocks, and these blocks are stored separately on different devices. This method can reduce the storage cost and protect the privacy of the data; even if part of the data is destroyed, it allows the AF user to recover the data. Security analysis shows that this AF method can prevent malicious, erroneous, or invalid files from being acquired and ensures data security even when data is stolen. Theoretical analysis indicates that file recovery with this method is difficult for investigators but easy for the AF user. Experimental results demonstrate that the proposed method is effective and practically efficient.

Xuhang Jiang, Yujue Wang, Yong Ding, Hai Liang, Huiyong Wang, Zhenyu Li
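
A much-simplified sketch of the block layout described above, with a single XOR parity block standing in for Reed-Solomon coding (real RS tolerates multiple lost blocks); the shuffling models the out-of-order, multi-device storage.

```python
import random
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def scatter(data: bytes, k: int, seed=42):
    """Split data into k padded blocks plus one XOR parity block, then
    shuffle the (index, block) pairs as if scattering across devices."""
    size = -(-len(data) // k)  # ceil division
    blocks = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
    parity = reduce(xor_bytes, blocks)
    shares = list(enumerate(blocks + [parity]))  # index k is the parity block
    random.Random(seed).shuffle(shares)          # out-of-order storage
    return shares

def recover(shares, k, length):
    """Rebuild the original data from any k of the k+1 shares."""
    present = dict(shares)
    if not set(range(k)) <= present.keys():      # one data block is lost
        missing = (set(range(k + 1)) - present.keys()).pop()
        present[missing] = reduce(xor_bytes, present.values())
    return b"".join(present[i] for i in range(k))[:length]

shares = scatter(b"confidential evidence", k=4)
assert recover(shares[:-1], k=4, length=21) == b"confidential evidence"
```
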

Data Science

Frontmatter
Predicting Consumers’ Coupon-usage in E-commerce with Capsule Network

In e-commerce, merchants usually increase their profit by issuing coupons to potential customers. If merchants do not develop a suitable coupon strategy, or issue coupons randomly, the coupons may not take effect and the budget is wasted. Therefore, it is very important for merchants to issue coupons to customers who are more likely to purchase, which makes it necessary to predict consumers' coupon usage. However, existing methods such as questionnaires cannot collect enough data, and traditional deep learning cannot capture the complex features of coupon usage prediction. To this end, this paper proposes a novel model for predicting customers' coupon usage behavior with a capsule network. It classifies coupon features into multiple groups of capsules and designs two capsule network structures for predicting coupon usage behavior. Meanwhile, we intensively compare the proposed model with the multi-layer perceptron (MLP), convolutional neural network (CNN), and recurrent neural network (RNN). The experimental results show that the proposed model has significantly better prediction accuracy (e.g., AUC).

Wenjun Jiang, Zhenqiong Tan, Jiawei He, Jifeng Zhang, Tian Wang, Shuhong Chen
A High-Availability K-modes Clustering Method Based on Differential Privacy

In categorical data mining, the K-modes algorithm is a classic algorithm that has been widely used. However, the data analyzed by the K-modes algorithm usually contains sensitive user information; if these data are leaked, the privacy of users is seriously threatened. In response to this problem, the existing method that combines differential privacy with the K-modes algorithm can effectively prevent privacy leakage. Nevertheless, differential privacy adds noise to the data while protecting data privacy, which reduces the availability of the clustering results. In this paper, we propose a high-availability K-modes clustering mechanism based on differential privacy (HAKC). In this mechanism, on the basis of using differential privacy to protect data privacy, we select the initial centroids of the clustering by calculation and improve the calculation of the distance between data points and centroids in the iterative process.

Shaobo Zhang, Liujie Yuan, Yuxing Li, Wenli Chen, Yifei Ding
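
One concrete way to realize a differentially private step inside a K-modes iteration is the Laplace mechanism applied to per-category counts before taking the mode. The sketch below shows that idea only; it is an assumption for illustration, not the HAKC mechanism, which additionally chooses initial centroids by calculation and redefines the point-centroid distance.

```python
import numpy as np

def noisy_mode(column, categories, epsilon, rng=None):
    """Choose a cluster's mode for one categorical attribute under the
    Laplace mechanism: perturb each category count (sensitivity 1, since
    one record shifts one count by one) and take the argmax."""
    rng = rng or np.random.default_rng(0)
    counts = np.array([np.sum(column == c) for c in categories], dtype=float)
    counts += rng.laplace(scale=1.0 / epsilon, size=len(counts))
    return categories[int(np.argmax(counts))]

# attribute values of the points currently assigned to one cluster
cluster_col = np.array(["red", "red", "blue", "red", "green"])
print(noisy_mode(cluster_col, ["red", "blue", "green"], epsilon=1.0))
```
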
A Strategy-based Optimization Algorithm to Design Codes for DNA Data Storage System

The exponential increase of big data volumes demands large-capacity and high-density storage. Deoxyribonucleic acid (DNA) has recently emerged as a new research trend for data storage in various studies due to its high capacity and durability, in which primers and address sequences play a vital role. However, it is a critical biocomputing task to design DNA strands without errors. In the DNA synthesis and sequencing process, each nucleotide is repeated, which is prone to errors during the hybridization reactions. This decreases the lower bounds of DNA coding sets, which harms data storage stability. This study proposes a metaheuristic algorithm to improve the lower bounds of DNA data storage. The proposed algorithm is inspired by the moth-flame optimizer (MFO), which has exploration and exploitation capability in one dimension, and is enhanced by an opposition-based learning (OBL) strategy with a three-dimensional search space for the optimal solution; hereafter, it is called the MFOL algorithm. This algorithm is programmed to construct DNA storage codes by reducing the error rates of DNA coding sets under GC-content, Hamming distance, and no-runlength constraints. In the experiments, 13 benchmark functions and the Wilcoxon rank-sum test are implemented, and performance is compared with the original MFO and three other algorithms. The DNA codewords generated by MFOL are compared with the state-of-the-art Altruistic and KMVO algorithms. The proposed algorithm improves DNA coding rates by 30% with shorter sequences, reducing errors during DNA synthesis and sequencing.

Abdur Rasool, Qiang Qu, Qingshan Jiang, Yang Wang
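
The three constraints named above are easy to state programmatically; an optimizer such as the proposed MFOL would screen candidate codewords with checks like these. The GC-content bounds and the minimum distance are illustrative assumptions.

```python
def gc_content_ok(seq, low=0.4, high=0.6):
    """GC-content constraint: fraction of G/C bases within given bounds."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    return low <= gc <= high

def no_runlength(seq):
    """No-runlength constraint: reject repeated adjacent nucleotides."""
    return all(a != b for a, b in zip(seq, seq[1:]))

def min_hamming(codewords, d):
    """Every pair of codewords must differ in at least d positions."""
    return all(sum(x != y for x, y in zip(a, b)) >= d
               for i, a in enumerate(codewords) for b in codewords[i+1:])

# a candidate set produced by the optimizer would be screened like:
candidates = ["ACGT", "TGCA", "CATG"]
print(all(map(gc_content_ok, candidates)),
      all(map(no_runlength, candidates)),
      min_hamming(candidates, d=2))
```
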
Multi-Relational Hierarchical Attention for Top-k Recommendation

As one of the critical application directions in the Recommendation Systems domain, the top-k recommendation model ranks all candidate items through non-explicit feedback (e.g., implicit interaction behaviors such as clicking, collecting, or viewing) from users. In this ranking, the rank reflects the users' satisfaction with the recommended items or the relevance of the target item. Although previous methods all improve the performance of the final recommended ranking, they suffer from several limitations. To overcome these limitations, we propose a Multi-Relational Hierarchical Attention model within a Graph Neural Network (GNN)-attention-Deep Neural Network (DNN) architecture for top-k recommendation, named MRHA for brevity. In our proposed method, we combine the GNN's ability to learn local item representations of graph-structured data and the attention-DNN architecture's ability to learn the user's preferences. For processing the multi-relational data that occurs in real application scenarios, we propose a novel hierarchical attention mechanism based on the GNN-attention-DNN architecture. Comparative experiments conducted on two representative real-world datasets show the effectiveness of the proposed method.

Shiwen Yang, Jinghua Zhu, Heran Xi

Edge Computing and Edge Intelligence

Frontmatter
EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters

Edge computing has emerged as a promising line of research for processing large-scale data and providing low-latency services. Unfortunately, deploying deep neural networks (DNNs) on resource-limited edge devices presents unacceptable latency, hindering artificial intelligence from empowering edge devices. Prior solutions attempted to address this issue by offloading the workload to the remote cloud. However, the cloud-assisted approach ignores the fact that devices in the edge environment tend to exist as clusters. In this paper, we propose EdgeSP, a scalable multi-device parallel DNN inference framework that maximizes the resource utilization of heterogeneous edge device clusters. We design a multiple fused-layer block parallelization strategy to reduce inter-device communication during parallel inference. Further, we add early-exit branches to DNNs, empowering the device to trade off latency and accuracy for a variety of sophisticated tasks. Experimental results show that EdgeSP achieves an inference latency acceleration of 2.3x-3.7x for DNN inference tasks of various scales and outperforms the existing naive parallel inference method. Additionally, EdgeSP can provide high-accuracy inference services under various latency requirements.

Zhipeng Gao, Shan Sun, Yinghan Zhang, Zijia Mo, Chen Zhao
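
The early-exit idea mentioned above attaches an auxiliary classifier partway through the backbone and stops inference when that branch is confident enough. Below is a minimal PyTorch sketch; the layer sizes and confidence threshold are assumptions, not EdgeSP's architecture, and the fused-layer-block parallelization is not shown.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone with an auxiliary classifier after the first stage;
    at inference time the whole batch exits early when the branch's
    softmax confidence clears the threshold (batch-level for simplicity)."""

    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.exit1 = nn.Linear(16 * 8 * 8, num_classes)   # early-exit branch
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.exit2 = nn.Linear(32 * 4 * 4, num_classes)   # final classifier
        self.threshold = threshold

    def forward(self, x):
        h = self.stage1(x)
        logits = self.exit1(h.flatten(1))
        conf = logits.softmax(dim=1).max(dim=1).values
        if not self.training and bool((conf >= self.threshold).all()):
            return logits                                  # exit early
        return self.exit2(self.stage2(h).flatten(1))

model = EarlyExitNet().eval()
print(model(torch.randn(2, 3, 32, 32)).shape)
```
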
An Efficient Computation Offloading Strategy in Wireless Powered Mobile-Edge Computing Networks

The emergence of mobile edge computing (MEC) has improved the data processing capabilities of devices with limited computing resources. However, tasks with strict latency and energy-consumption requirements still face huge challenges. In this paper, for time-varying wireless channel conditions, we propose an effective method to perform offloading calculations for the computing tasks of wireless devices, that is, to decide whether tasks are executed locally or offloaded to the edge server under the premise of satisfying delay and energy-consumption constraints. Based on this, we adopt a parallel calculation model combining Deep Reinforcement Learning and Optimal Stopping Theory (DRLOST), which is composed of two parts: offloading decision generation and deep reinforcement learning. The model uses parallel deep neural networks (DNNs) to generate offloading decisions, and stores the generated offloading decisions in memory according to the optimal-stopping-theory model parameters to further train the model. The simulation results show that the proposed algorithm can minimize the delay time and can respond quickly to tasks even in a fast-fading environment.

Xiaobao Zhou, Jianqiang Hu, Mingfeng Liang, Yang Liu
WiRD: Real-Time and Cross Domain Detection System on Edge Device

WiFi-based perception systems can in theory realize recognition of various gestures, but they have not achieved large-scale application in practice. Later, some works solved the cross-domain identification problem of WiFi systems and promoted the possibility of practical WiFi perception. However, the existing cross-domain recognition work requires a large amount of computation to extract motion features and recognize them through a complex network, which means it cannot be deployed directly on edge devices. In addition, due to hardware limitations of edge devices (for example, a network card with a single antenna), the amount of data we obtain is far less than that of a general network card; if the original data is not calibrated, the error information carried by the data has a huge impact on the recognition result. Therefore, to solve the above problems, we propose WiRD, a system that can accurately calibrate the amplitude and phase in the single-antenna case and can be deployed on edge devices to achieve real-time detection. Experimental results show that WiRD is comparable to existing methods for in-domain gesture and body recognition and reaches 87% accuracy for cross-domain gesture recognition, while the overall system processing time is reduced by 9x and the model inference time is reduced by 50x.

Qing Yang, Tianzhang Xing, Zhiping Jiang, Junfeng Wang, Jingyi He
Deep Learning with Enhanced Convergence and Its Application in MEC Task Offloading

As an emerging computing paradigm, mobile edge computing (MEC) has become a top topic in various research fields. Nevertheless, task offloading, a key issue in the MEC environment, is still an immense challenge because it is often NP-hard. Currently, many researchers adopt deep learning frameworks to solve the task offloading problem of MEC. Unfortunately, most of these works directly use various deep learning frameworks and give insufficient consideration to how to improve the convergence performance of deep learning when solving the MEC task offloading problem. To cope with this issue, we propose two methods to enhance the convergence of deep learning in this paper, named the uniform design method (UDM) and the Hadamard matrix method (HMM), respectively. UDM and HMM can enhance the ability to exploit the space near a specific offloading decision, which benefits the convergence performance of deep learning algorithms. An improved deep learning algorithm is built by integrating UDM or HMM. The validity of our proposed algorithm is verified through extensive simulation experiments. The results show that our proposed algorithm can achieve better convergence performance than the benchmark algorithm under different learning rates and memory sizes.

Zheng Wan, Xiaogang Dong, Changshou Deng
Dynamic Offloading and Frequency Allocation for Internet of Vehicles with Energy Harvesting

Emerging vehicle services need more stable and efficient communication environments. Furthermore, fast-developing in-vehicle applications increase power consumption and bring new challenges to the endurance of electric vehicles (EVs). In this paper, taking a vehicular network with energy harvesting as the background, we propose a joint online algorithm based on vehicle mobility to minimize the energy consumption of electric vehicles. Specifically, we determine the relationship between MEC computing power allocation and vehicle information (position and driving speed), and minimize the vehicles' energy consumption while ensuring the completion rate of offloaded task calculations. This problem is NP-hard, and we use Lyapunov optimization to transform the original problem into a deterministic optimization problem, which has a coupling between the local calculation amount and the MEC calculation frequency allocation decision. Toward this end, we apply the Lagrangian duality method to decouple the problem, and propose a Joint Local Computing and CPU-cycle Frequency Allocation (JLCCFA) algorithm to obtain an approximately optimal solution of the original problem. The simulation results show that JLCCFA can effectively reduce the energy consumption of vehicle users and maintain a small task queue backlog.

Teng Ma, Xin Chen, Yan Liang, Ying Chen
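
For readers unfamiliar with the Lyapunov step used above: the standard drift-plus-penalty method upper-bounds the sum of the queue drift and a weighted penalty, and minimizing the bound's right-hand side in each slot yields the deterministic per-slot problem. In generic notation (ours, not taken from the paper):

```latex
% L(\Theta(t)) = \tfrac12 \sum_i Q_i(t)^2 is the Lyapunov function over task
% queues Q_i; P(t) is the energy penalty; V > 0 trades energy against backlog;
% a_i(t), b_i(t) are arrivals and service; B is a finite constant.
\[
  \underbrace{\mathbb{E}\!\left[L(\Theta(t{+}1)) - L(\Theta(t)) \mid \Theta(t)\right]}_{\text{drift}}
  + V\,\mathbb{E}\!\left[P(t) \mid \Theta(t)\right]
  \;\le\; B + V\,\mathbb{E}\!\left[P(t) \mid \Theta(t)\right]
  + \sum_i Q_i(t)\,\mathbb{E}\!\left[a_i(t) - b_i(t) \mid \Theta(t)\right]
\]
```
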
SPACE: Sparsity Propagation Based DCNN Training Accelerator on Edge

On-edge learning enables edge devices to continually adapt to the new data of AI applications. However, much more computing capacity and memory space are needed to achieve batch backward-propagation-oriented training, while the system power budget is limited by the edge circumstance. As the operands are propagated during training, useless zero values are inevitably propagated as well, which causes unnecessary waste of memory accesses and computations. This paper conducts a thorough analysis of the origin of sparsity in all three phases of training based on sparsity propagation and gives three insights about the absolute sparsity and the nonabsolute sparsity for efficient deployment of the training process. An efficient training accelerator named SPACE is proposed, which can not only reduce the memory footprint but also eliminate a massive number of computations by exploiting the nonabsolute and absolute sparsity. SPACE improves performance and energy efficiency by factors of 3.2x and 2.8x, respectively, compared with a dense training architecture.

Miao Wang, Zhen Chen, Chuxi Li, Zhao Yang, Lei Li, Meng Zhang, Shengbing Zhang
Worker Recruitment Based on Edge-Cloud Collaboration in Mobile Crowdsensing System

In recent years, with the rapid development of the mobile Internet and smart sensor technology, the mobile crowdsensing (MCS) computing model has attracted wide attention in academic, industrial, and business circles. MCS utilizes the sensing and computing capabilities of smart devices carried by workers, cooperating through the mobile Internet to fulfill complex tasks. Worker recruitment is a core and common research problem in MCS; it is a combinatorial optimization problem that considers tasks, workers, and other factors to satisfy various optimization objectives and constraints. The existing methods are not suitable for large-scale and real-time sensing tasks. Thus, this paper proposes a multi-layer worker recruitment framework based on edge-cloud collaboration. At the cloud computing layer, the whole sensing area is partitioned into small grids according to task positions. At the edge computing layer, real-time data processing and aggregation are performed, and then a mathematical model is constructed to make worker recruitment decisions by considering a variety of factors from the perspective of workers. Experimental results on real data prove that, compared with existing methods, our method achieves good performance in terms of spatial coverage and running time under task cost and time constraints.

Jinghua Zhu, Yuanjing Li, Anqi Lu, Heran Xi
Energy Efficient Deployment and Task Offloading for UAV-Assisted Mobile Edge Computing

With the popularization of mobile wireless networks and Internet of Things (IoT) technologies, energy-hungry and delay-intensive applications continue to surge. Due to their limited computing power and battery capacity, mobile terminals rarely satisfy the increasing demands of application services. Mobile Edge Computing (MEC) deploys communication and computing resources near the network edge, close to the user side, which effectively reduces devices' energy consumption and enhances system performance. However, the application of MEC needs infrastructure on which edge services can be deployed, and it is limited by the geographical environment. UAV-assisted MEC has better flexibility and communication Line-of-Sight (LoS), which expands the service scope while improving the versatility of MEC. Meanwhile, the dynamic task arrival rate, channel conditions, and environmental factors pose challenges for task offloading and resource allocation strategies. In this paper, we jointly optimize UAV deployment, frequency scaling, and task scheduling to minimize the energy consumption of devices while ensuring system stability in the long term. Due to the dynamics and randomness of the task arrival rate and the wireless channel, the original problem is defined as a stochastic optimization problem. The Drone Placement and Online Task oFFloading (DPOTFF) algorithm is designed to decouple the original problem into several sub-problems and solve them within a limited time complexity. It is also proved theoretically that DPOTFF can obtain close-to-optimal energy consumption while ensuring system stability. The effectiveness and reliability of the algorithm are also verified by simulation and comparative experiments.

Yangguang Lu, Xin Chen, Fengjun Zhao, Ying Chen

Blockchain Systems

Frontmatter
Research on Authentication and Key Agreement Protocol of Smart Medical Systems Based on Blockchain Technology

The wireless body area network in smart medical systems uses wearable devices to remotely monitor the patient's physiological information and transmits the information to the medical center through an open channel. To prevent security issues such as privacy leakage and malicious attacks in the wireless body area network, anonymous authentication and key negotiation are required between the sensors and the server. The protocol must not only satisfy confidentiality and security but also provide anonymity and untraceability for the sensor nodes. In response to this problem, this paper proposes an authentication and key agreement protocol for sensors and servers based on blockchain technology in the wireless body area network, with session keys established for subsequent access and data transmission. The safety of our scheme was evaluated through an informal security analysis. In addition, our scheme was also simulated using ProVerif. The experimental results show that the scheme is safe.

Xiaohe Wu, Jianbo Xu, W. Liang, W. Jian
CRchain: An Efficient Certificate Revocation Scheme Based on Blockchain

One of the essential parts of a public key infrastructure is the ability to efficiently check certificate status and quickly distribute certificate revocation information. The existing certificate revocation schemes generally suffer from a time-consuming certificate status verification process as well as a long interval between revocation information updates. To address the aforementioned issues, this paper proposes a blockchain-based certificate revocation mechanism, namely CRchain, which can efficiently check certificate status and revoke certificates. Specifically, to achieve efficient certificate status queries, the revokedCertCF and validCertCF cuckoo filters are constructed to store the revoked and valid certificates, respectively. Then, a co-controlled key published on CRchain is presented to shorten the certificate revocation process and abridge the authority of certificate authorities. Finally, we implement and evaluate CRchain on Hyperledger Fabric with smart contracts. The theoretical analysis and experimental results show that CRchain achieves better latency and exchanged data size than existing reference methods during certificate status checking.

Xiaoxue Ge, Liming Wang, Wei An, Xiaojun Zhou, Benyu Li
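
The two-filter lookup described above reduces to a few lines once a filter with insert/lookup/delete is available. In this sketch a Python set stands in for each cuckoo filter; a real cuckoo filter answers membership probabilistically from stored fingerprints, so a hit may require on-chain confirmation.

```python
class ApproxFilter:
    """Stand-in for a cuckoo filter: insert/lookup/delete. A real cuckoo
    filter stores fingerprints in two candidate buckets and has a small
    false-positive rate; a set gives the same interface, error-free."""
    def __init__(self):
        self._items = set()
    def insert(self, item):
        self._items.add(item)
    def delete(self, item):
        self._items.discard(item)
    def __contains__(self, item):
        return item in self._items

revokedCertCF, validCertCF = ApproxFilter(), ApproxFilter()

def certificate_status(serial):
    """Two-filter lookup: revoked beats valid; a miss in both filters
    means the certificate is unknown to the chain."""
    if serial in revokedCertCF:
        return "revoked"   # with a real filter, confirm on-chain
    if serial in validCertCF:
        return "valid"
    return "unknown"

validCertCF.insert("cert-001")
revokedCertCF.insert("cert-002")
print(certificate_status("cert-001"), certificate_status("cert-002"))
```
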
Anonymous Authentication Scheme Based on Trust and Blockchain in VANETs

The vehicular ad-hoc network (VANET) has been applied in intelligent transportation systems due to its tremendous potential to improve vehicle and road safety and traffic efficiency, and to promote convenience as well as comfort for both drivers and passengers. However, the dynamic wireless network environment and the high mobility of vehicles bring huge challenges to the security of VANETs. A trust-based certificateless anonymous authentication scheme for VANETs is proposed in this paper. In this scheme, bilinear pairing operations and an elliptic curve cryptographic algorithm are utilized to achieve anonymous authentication. Blockchain technology is introduced to store the trust values of vehicles, realizing identity tracking of malicious vehicles. Practicability and reliability are improved through the combination of trust, traditional authentication, and blockchain, which also realizes privacy preservation. Simulation results prove that the proposed scheme outperforms baseline schemes in computational cost and communication cost.

Li Zhang, Jianbo Xu
BIPP: Blockchain-Based Identity Privacy Protection Scheme in Internet of Vehicles for Remote Anonymous Communication

Aiming at the difficult problem of anonymous communication between vehicles over long distances in the Internet of Vehicles, we propose an identity privacy protection scheme based on blockchain (BIPP). In this scheme, the response value of the PUF function and continuous timestamps are used as the basis of encryption and are associated with the pseudonym of the user's ID information to prevent adversaries from conducting forgery attacks. The pseudonym mechanism and the confusion mechanism between identity and message protect the privacy of vehicle users' identities. The information on the blockchain must pass reasonable judgment algorithms and meet the conditions for the ending of the event. Blockchain technology can ensure that the information received by the vehicles in the target area is credible. The scheme is proved to be secure and feasible through analysis, implementation, and evaluation.

Hongyu Wu, Xiaoning Feng, Guobin Kan, Xiaoshu Jiang

Deep Learning Models and Applications

Frontmatter
Self-adapted Frame Selection Module: Refine the Input Strategy for Video Saliency Detection

Video saliency detection is intended to interpret the human visual system by modeling and predicting while observing a dynamic scene. It is currently widely used in a variety of devices, including surveillance cameras and Internet-of-Things sensors. Traditionally, each video contains a large amount of redundancy in consecutive frames, while common practices concentrate on extending the range of input frames to resist the uncertainty of the input images. To overcome this problem, we propose a Self-Adapted Frame Selection (SAFS) module that removes redundant information and selects frames that are highly informative. Furthermore, the module has high robustness and wide applicability when dealing with complex video contents, such as fast-moving scenes and images from different scenes. Since predicting the saliency map across multiple scenes is challenging, we establish a set of benchmark videos for the scene-change scenario. Specifically, our method combined with TASED-NET achieves significant improvements on the DHF1K dataset as well as on the scene-change dataset.

Shangrui Wu, Yang Wang, Tian Wang, Weijia Jia, Ruitao Xie
Evolving Deep Parallel Neural Networks for Multi-Task Learning

Multi-Task Learning (MTL) can perform multiple tasks simultaneously with a single model and can achieve competitive performance on each individual task. In recent years, models based on Deep Neural Networks (DNNs) have demonstrated their advantages in the field of MTL. Yet, most such models are manually designed with expertise through trial-and-error experiments, which is prohibitively inefficient. In view of this, we design a method based on an evolutionary algorithm, named EVO-MTL, to automate the design of parallel DNN architectures for effectively addressing MTL problems. Specifically, our main idea is to evolve the connections between the parallel task-specific backbone networks and then leverage the useful information contained in the tasks by fusing the task-specific features. To verify the effectiveness of the proposed algorithm, experiments are designed to compare with recent MTL methods, including both manually and automatically designed ones. The experimental results demonstrate that the proposed algorithm can outperform carefully hand-designed methods. In addition, the proposed algorithm attains promising competitive performance in balancing multi-task conflicts compared with the DNN architecture searched by the state-of-the-art automated MTL method.

Jie Wu, Yanan Sun
An Embedding Carrier-Free Steganography Method Based on Wasserstein GAN

Images have been widely studied as effective carriers for information steganography; however, low steganographic capacity is a technical problem that has not been solved for non-embedded steganography methods. In this paper, we propose a carrier-free steganography method based on Wasserstein GAN. We segment the target information, input it into the trained Wasserstein GAN, and then generate a visually realistic image. The core design is that the output results are converted into images in the trained network according to the mapping relationship between preset coding information and random noise. The experimental results indicate that the proposed method can effectively improve steganographic capacity. In addition, the results also verify that the proposed method does not depend on a complex neural network structure. On this basis, we further prove that by changing the length of the noise and the mapping relationship between coding information and noise, the number of generated images can be reduced, and the steganographic capacity and efficiency of the algorithm can be improved.

Xi Yu, Jianming Cui, Ming Liu
Design of Face Detection Algorithm Accelerator Based on Vitis

With the development of artificial intelligence, machine learning on FPGAs (Field Programmable Gate Arrays) is becoming more and more important. Compared with CPUs and GPUs, FPGAs have the advantages of reconfigurability, low power consumption, and high parallel-computing performance. However, the complexity of the FPGA development process is arguably the biggest obstacle to the wide use of FPGAs in the field of artificial intelligence. In this paper, an FPGA accelerator is designed with a concise computation framework, a development method that shortens the development time. The accelerator built in this paper achieves a detection speed 9 times that of a CPU, with a detection power consumption of about 0.1 times that of a GPU.

Jie Wang, Ao Gao, Jingxin Li
FSAFA-stacking2: An Effective Ensemble Learning Model for Intrusion Detection with Firefly Algorithm Based Feature Selection

This paper presents stacking2, a two-layer ensemble learning model based on the Stacking framework, to deal with the lack of generalization ability and the low detection rate of single-model intrusion detection systems. stacking2 uses SAMME, GBDT, and RF to generate the primary learners in the first layer and constructs the meta learner using the logistic regression algorithm in the second layer. The meta learner learns from the class probability outputs produced by the primary learners. To address "the curse of dimensionality" in intrusion detection datasets, this paper proposes a feature selection approach based on the firefly algorithm (FSAFA), which is used to select optimal feature subsets. Based on the selected optimal feature subsets, the training and test sets are reconstructed and then applied to stacking2. As a result, an FSAFA-based stacking2 intrusion detection model is proposed. The UNSW-NB15 and NSL-KDD datasets are chosen to verify the effectiveness of the proposed model. The experimental results show that the stacking2 intrusion detection model has better generalization ability than intrusion detection models based on individual learners. Compared with other typical algorithms, the FSAFA-based stacking2 intrusion detection model performs well in detection rate.

Guo Chen, Junyao Zheng, Shijun Yang, Jieying Zhou, Weigang Wu
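
The two-layer structure described above maps directly onto scikit-learn's stacking API. A minimal sketch with default hyperparameters assumed; the FSAFA feature selection step is not shown.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

# First layer: SAMME (multi-class AdaBoost), GBDT, and RF; second layer: a
# logistic-regression meta learner fed with class-probability outputs.
stacking2 = StackingClassifier(
    estimators=[
        ("samme", AdaBoostClassifier(algorithm="SAMME")),
        ("gbdt", GradientBoostingClassifier()),
        ("rf", RandomForestClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # meta learner sees probabilities, not labels
)
# stacking2.fit(X_train, y_train) after feature selection has been applied
```
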
Attention-Based Cross-Domain Gesture Recognition Using WiFi Channel State Information

Gesture recognition is an important step toward ubiquitous WiFi-based human-computer interaction. However, most current WiFi-based gesture recognition systems rely on domain-specific training. To address this issue, we propose an attention-based cross-domain gesture recognition system using WiFi channel state information. To overcome the shortcoming of handcrafted feature extraction in state-of-the-art cross-domain models, our model uses the attention mechanism to automatically extract domain-independent gesture features from the spatial and temporal dimensions. We implement the model and extensively evaluate its performance using the Widar3 dataset, which involves 16 users and 6 gestures across 5 orientations and 5 positions in 3 different environments. The evaluation results show that the average in-domain gesture recognition accuracy achieved by the model is 99.67%, and the average cross-domain gesture recognition accuracies are 96.57%, 97.86%, and 94.2% across rooms, positions, and orientations, respectively. Its cross-domain gesture recognition accuracy significantly outperforms state-of-the-art methods.

Hao Hong, Baoqi Huang, Yu Gu, Bing Jia
Font Transfer Based on Parallel Auto-encoder for Glyph Perturbation via Strokes Moving

Glyph perturbation is a growing subject in information embedding. It can be generated by moving strokes of Chinese characters to convey secret messages. However, its generation is limited by the large number and diverse fonts of Chinese characters. Several attempts have been made to generate Chinese characters via deep-learning-based font transfer, but so far no studies have investigated font transfer for glyph perturbation. We propose a font transfer method for glyph perturbation of Chinese characters, named Glyph-Font, which focuses on the positions of strokes while transferring fonts. More specifically, we first build an image dataset for glyph perturbation of Chinese characters by perturbing strokes. Secondly, a generator based on a parallel auto-encoder simultaneously generates four glyph perturbations for each character in the target fonts. In addition, a discriminator is designed to optimize the network by calculating the difference between real and generated images of Chinese characters. Finally, a perturbation loss and a patch-pixel loss are defined to amend incorrectly generated pixels and distinguish position changes of strokes. Experimental results demonstrate that the proposed Glyph-Font can generate glyph perturbations of Chinese characters automatically in various fonts.

Chen Wang, Yani Zhu, Zhangyi Shen, Dong Wang, Guohua Wu, Ye Yao
A Novel GNN Model for Fraud Detection in Online Trading Activities

Previous graph neural network-based fraud detection techniques were usually realized by clustering the neighbors with different relationships. However, graph-based datasets face the issues of imbalanced features, classes, and relationships, which directly decreases detection performance. This work therefore proposes a novel real-time GNN model to address this issue. Firstly, the features are measured to find the entities that have the highest similarity to the fraudster, and these entities are sampled to identify fraudsters in training, since fraudsters are far fewer than normal nodes in the dataset. We then combine an under-sampling algorithm and a long-distance sampling algorithm to find the nodes that are similar to the neighbors. Finally, a reinforcement learning (RL)-based reward and punishment mechanism is proposed for sampling the weights between the relationships, which is effective against the issue of imbalanced relationships in graph-based datasets. Experiments show that the proposed technique is superior to the comparative models on a real-world fraud dataset.

Jing Long, Fei Fang, Haibo Luo

IoT

Frontmatter
Non-interactive Zero Knowledge Proof Based Access Control in Information-Centric Internet of Things

With the development of communication technology represented by 5G, the core business model of the Internet of Things (IoT) has undergone great changes. The traditional host-centric network can no longer meet the needs of the IoT for throughput, privacy protection, and interrupt tolerance. IC-IoT, the combination of ICN (Information-Centric Networking) and IoT, was put forward to provide scalable content distribution by using caching routers, multi-party communication, and the decoupling of senders and receivers. However, this paradigm still faces two major problems. First, the access control relationship established between publishers and subscribers requires additional maintenance of complex data structures and authentication processes. Second, unencrypted named data objects (NDOs) lead to potential privacy risks. To address these challenges, this paper proposes an algorithm called ZK-CP-ABE as an encryption means for distributed content distribution. Based on CP-ABE, it introduces a non-interactive zero-knowledge proof protocol into CP-ABE's secret-key existence proof to ensure user privacy and reduce invalid bandwidth consumption. On this basis, a system called DPS-IoT is proposed, which uses a Hyperledger Fabric based blockchain to store access policies and ZKP evidence so that they cannot be tampered with. In addition, we use smart contracts to implement ZK-CP-ABE based access control, so as to improve the robustness and throughput of the system. Finally, comparison with existing related works shows that the proposed method and system have greater advantages in transmission bandwidth utilization and better system throughput.

Han Liu, Dezhi Han
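
The toy Schnorr-style non-interactive proof below (made non-interactive via Fiat-Shamir hashing) illustrates the general idea of proving possession of a secret key without revealing it. It is a didactic stand-in for the paper's ZK-CP-ABE secret-key existence proof, and the group parameters are far too small for any real security.

    import hashlib
    import secrets

    p, g = 467, 2          # toy group parameters (insecure; demo only)
    q = p - 1              # exponent modulus

    def challenge(y, t):
        return int(hashlib.sha256(f"{g}:{y}:{t}".encode()).hexdigest(), 16) % q

    def prove(x):
        # Prover shows knowledge of x with y = g^x mod p, non-interactively.
        y = pow(g, x, p)
        r = secrets.randbelow(q)
        t = pow(g, r, p)
        c = challenge(y, t)
        return y, t, (r + c * x) % q

    def verify(y, t, s):
        # Accept iff g^s == t * y^c (mod p), recomputing the Fiat-Shamir challenge.
        return pow(g, s, p) == (t * pow(y, challenge(y, t), p)) % p

    x = secrets.randbelow(q)
    y, t, s = prove(x)
    print(verify(y, t, s))   # True: key existence shown without revealing x
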
Simultaneous Charger Placement and Power Scheduling for On-Demand Provisioning of RF Wireless Charging Service

To provision on-demand radio frequency (RF) charging service in a designated place, we study the problem of simultaneous charger placement and power scheduling. Based on users' historical spatial and temporal distribution, we formulate the problem as how to place a given number of chargers and how to adjust their power levels in each time interval so that the revenue of the charging service is maximized under a total power constraint. The formulation is a mixed integer linear program (MILP), and we propose a branch-and-bound (B&B) algorithm to solve it. Extensive simulations in both small-scale and large-scale networks, as well as simulations on a real data set, validate the effectiveness of the proposed algorithm. The results show that it outperforms a greedy algorithm that handles charger placement and power scheduling separately in most simulation scenarios, and reaches the optimum in small-scale instances.

Huatong Jiang, Yanjun Li, Meihui Gao
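
A minimal sketch of a placement-and-power MILP of this flavor, written with the PuLP modeling library, is given below. The site set, revenue coefficients, and budgets are invented for illustration; the paper's formulation covers time intervals and user distributions and is solved with a custom branch-and-bound rather than a generic solver.

    from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

    sites = range(6)                       # candidate charger locations
    levels = [0, 1, 2]                     # discrete power levels
    revenue = {(i, l): (i % 3 + 1) * l for i in sites for l in levels}  # toy data
    power = {l: 5 * l for l in levels}     # power drawn at each level

    prob = LpProblem("charger_placement", LpMaximize)
    place = LpVariable.dicts("place", sites, cat=LpBinary)
    lvl = LpVariable.dicts("lvl", [(i, l) for i in sites for l in levels], cat=LpBinary)

    prob += lpSum(revenue[i, l] * lvl[i, l] for i in sites for l in levels)
    prob += lpSum(place[i] for i in sites) <= 3                               # charger budget
    prob += lpSum(power[l] * lvl[i, l] for i in sites for l in levels) <= 20  # power budget
    for i in sites:
        prob += lpSum(lvl[i, l] for l in levels) == place[i]   # pick one level if placed

    prob.solve()   # CBC branch-and-bound under the hood
    print([i for i in sites if place[i].value() == 1])
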
A Cross-domain Authentication Scheme Based on Zero-Knowledge Proof

This paper proposes an anonymous cross-domain authentication scheme based on zero-knowledge proof to combat the privacy leakage that arises when users in heterogeneous domains access network services from different trust domains. First, we use a zero-knowledge proof algorithm to make the scheme independent of any trusted third party and to realise secure data exchange between the device and the agent server (AS). The AS verifies the identity of the device through a proof from which no private user information can be reconstructed, effectively protecting the privacy of the device. Second, the device submits a proof generated from private device information and public parameters; the proof is independent of the trust domain's authentication mechanism and can therefore be used for mutual authentication between heterogeneous domains. Finally, we use the decentralisation and tamper resistance of blockchain technology to ensure the consistency of inter-domain message storage and realise cross-domain authentication. Theoretical analysis shows that the scheme meets the security requirements of confidentiality, integrity and availability. Experimental results show that, compared with existing schemes, our scheme is feasible and effective.

Ruizhong Du, Xiaoya Li, Yan Liu
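
The sketch below shows the shape of the message flow: a device derives a proof from its private information plus public parameters, and the AS checks it against a ledger entry. For brevity the "proof" is a plain hash commitment and the blockchain is a dict; neither has the zero-knowledge or tamper-proofing properties of the real scheme, so treat this purely as a flow diagram in code.

    import hashlib

    ledger = {}   # device_id -> registered commitment (stand-in for on-chain state)

    def register(device_id, device_secret, public_params):
        ledger[device_id] = hashlib.sha256(
            f"{device_secret}:{public_params}".encode()).hexdigest()

    def make_proof(device_secret, public_params, nonce):
        # Derived only from private device info and public parameters, so it is
        # independent of the home domain's authentication mechanism.
        base = hashlib.sha256(f"{device_secret}:{public_params}".encode()).hexdigest()
        return hashlib.sha256(f"{base}:{nonce}".encode()).hexdigest()

    def as_verify(device_id, proof, nonce):
        base = ledger.get(device_id)
        if base is None:
            return False
        return proof == hashlib.sha256(f"{base}:{nonce}".encode()).hexdigest()

    register("dev-42", "device-secret", "params-v1")
    print(as_verify("dev-42", make_proof("device-secret", "params-v1", "n1"), "n1"))
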
NBUFlow: A Dataflow Based Universal Task Orchestration and Offloading Platform for Low-Cost Development of IoT Systems with Cloud-Edge-Device Collaborative Computing

With the development of intelligent hardware technology, the heterogeneity of IoT devices at the edge and end layers and the diversity of application scenarios bring unprecedented challenges to the growth of IoT systems, mainly the large amount of code that must be written for task construction in a cloud-edge-device collaborative environment, the inflexible configuration of offloading policies, and the high overhead of task scheduling and offloading. A dataflow-based, low-cost task orchestration and offloading platform, named NBUFlow, is proposed to solve these problems. It defines tasks on the visual dashboard of Node-RED to enable convenient, low-cost development of IoT systems, and supports multiple offloading strategies and multi-task parallel processing based on dataflow migration. The platform's performance in terms of task deployment completion time is verified through experiments. Results show that the completion time of task offloading is reduced by a factor of three compared with container-migration-based offloading.

Lei Wang, Haiming Chen, Wei Qin
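
As a toy illustration of an offloading policy in this setting, the function below picks the tier (device, edge, or cloud) with the smallest estimated completion time, trading compute speed against data-transfer cost. All speed and bandwidth numbers are invented; NBUFlow's actual strategies operate on Node-RED dataflows rather than scalar cost models.

    def estimated_time(task_gops, data_mb, tier):
        speed = {"device": 1.0, "edge": 8.0, "cloud": 50.0}[tier]       # GOPS (assumed)
        bandwidth = {"device": None, "edge": 20.0, "cloud": 5.0}[tier]  # MB/s (assumed)
        transfer = 0.0 if bandwidth is None else data_mb / bandwidth
        return task_gops / speed + transfer

    def choose_tier(task_gops, data_mb):
        return min(("device", "edge", "cloud"),
                   key=lambda tier: estimated_time(task_gops, data_mb, tier))

    print(choose_tier(task_gops=120.0, data_mb=4.0))   # 'cloud' wins for heavy compute
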
IoT-GAN: Anomaly Detection for Time Series in IoT Based on Generative Adversarial Networks

To monitor the behaviors of IoT devices, large amounts of time series data are collected by embedded sensors. To take timely action on the underlying issues of IoT devices, it is critical to detect anomalies in these time series. However, anomaly detection for time series in IoT is particularly challenging because of their complex temporal dependence and dynamics. In this paper, we propose an unsupervised anomaly detection method for time series based on generative adversarial networks (GANs), which learns the normal patterns of time series data and then uses reconstruction errors to recognize anomalies. To the best of our knowledge, we are the first to incorporate gated recurrent units into a bi-directional GAN architecture to capture the complex temporal dependence and dynamics of time series in IoT. We also introduce a cycle-consistency loss as a deterministic control to further improve detection performance and stabilize GAN training. With the trained model, any newly arrived observation can immediately be classified as anomalous or not, without an additional inference procedure. Extensive empirical studies on three real-world datasets demonstrate that the proposed IoT-GAN is effective and efficient in detecting anomalies of time series in IoT.

Xiaofei Chen, Shuo Zhang, Qiao Jiang, Jiayuan Chen, Hejiao Huang, Chonglin Gu
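
The scoring step can be sketched compactly: reconstruct each window with the trained model and flag windows whose reconstruction error exceeds a threshold. Below, the GRU-based bidirectional GAN is stubbed with a low-pass reconstruction that can reproduce smooth "normal" patterns but not spikes; the stub, window length, and threshold are all illustrative.

    import numpy as np

    def reconstruct(window):
        # Stub for G(E(x)): keep only the lowest-frequency components, mimicking a
        # model that reproduces learned normal patterns but not anomalies.
        spec = np.fft.rfft(window)
        spec[4:] = 0
        return np.fft.irfft(spec, n=len(window))

    def anomaly_score(window):
        return float(np.mean((window - reconstruct(window)) ** 2))

    def is_anomalous(window, threshold=0.05):
        # Each new window is scored in a single pass; no extra inference step.
        return anomaly_score(window) > threshold

    normal = np.sin(np.linspace(0.0, 2 * np.pi, 64))
    spiky = normal.copy()
    spiky[30:34] += 3.0                                 # injected anomaly
    print(is_anomalous(normal), is_anomalous(spiky))    # False True
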
Freshness and Power Balancing Scheduling for Cooperative Vehicle-Infrastructure System

Roadside units (RSUs) cache sensed environment information in their buffers and send it to passing vehicles, which greatly improves vehicle intelligence and provides rich road information to drivers. The age of information (AoI) describes the freshness of the data received by vehicles, defined as the time elapsed since the generation of the latest data a vehicle has received. If the RSU transmits fused sensor data (e.g., from cameras and LiDAR) to vehicles as soon as they enter its coverage, while the distance between vehicle and RSU is still large, the AoI of the vehicles is minimized; however, this incurs high energy consumption because of the long transmission distance. Instead, the RSU can reduce its transmission power according to the current queue length of its buffer and the speed of the passing vehicle, tending to transmit when vehicles approach the RSU. The average energy consumption of the RSU can then be minimized while the AoI of the vehicles does not exceed a threshold. This paper proposes a freshness and power balancing scheduling strategy (FPBS) for cooperative vehicle-infrastructure systems. Simulation results show that the proposed strategy effectively reduces average energy consumption under a constraint on the vehicles' average AoI.

Qian Qiu, Liang Dai, Guiping Wang
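
For reference, the AoI bookkeeping used above can be made concrete in a few lines: at any time t, the AoI is t minus the generation time of the freshest update delivered so far, so it grows linearly between deliveries and drops at each one. The delivery times below are illustrative.

    def aoi_trace(deliveries, horizon, dt=1.0):
        # deliveries: (delivery_time, generation_time) pairs sorted by delivery time
        trace, latest_gen, i, t = [], None, 0, 0.0
        while t <= horizon:
            while i < len(deliveries) and deliveries[i][0] <= t:
                latest_gen = deliveries[i][1]
                i += 1
            trace.append(None if latest_gen is None else round(t - latest_gen, 3))
            t += dt
        return trace

    # AoI drops to 1.0 at t=2 (a packet generated at t=1 arrives) and to 0.5
    # at t=6, growing linearly in between.
    print(aoi_trace([(2.0, 1.0), (6.0, 5.5)], horizon=8.0))
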
A Low Energy Consumption and Low Delay MAC Protocol Based on Receiver Initiation and Capture Effect in 5G IoT

The development of the fifth-generation (5G) network makes it possible to deploy enormous numbers of sensors in Internet of Things (IoT) networks to transmit data with low delay. However, the traditional receiver-initiated MAC protocols used in IoT suffer from high idle-listening energy consumption and high transmission delay. We propose a low-energy, low-latency MAC protocol (LL-MAC) based on receiver initiation and the capture effect. The protocol realizes fast matching between senders and receivers when nodes have data to send. An improved greedy algorithm allocates power to the nodes, and a collision response mechanism enables efficient data transmission.

Hua-Mei Qi, Jia-Qi Chen, Zheng-Yi Yuan, Lin-Lin Fan
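
The toy greedy allocation below conveys the flavor of exploiting the capture effect: senders are ordered by channel gain, and each gets the smallest power level that keeps its received signal a capture-threshold factor above the accumulated interference. The power levels, threshold, and channel model are illustrative assumptions, not the paper's improved greedy algorithm.

    def greedy_power(gains, levels=(1, 2, 4, 8), capture=2.0):
        alloc = []
        for g in sorted(gains, reverse=True):          # strongest link first
            interference = sum(p * h for p, h in alloc)
            for p in levels:
                if interference == 0 or p * g >= capture * interference:
                    alloc.append((p, g))
                    break
            else:
                alloc.append((levels[-1], g))          # saturate if infeasible
        return [p for p, _ in alloc]

    print(greedy_power([0.9, 0.5, 0.2]))   # e.g. [1, 4, 8]
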
Building Portable ECG Classification Model with Cross-Dimension Knowledge Distillation

Portable electrocardiogram (ECG) devices are common tools for diagnosing and analyzing cardiovascular diseases. However, they are limited in computation and storage resources, making model compression necessary. Because of the dimension mismatch between 12-lead ECG data and single-lead portable devices, conventional compression techniques cannot be applied to ECG classification models directly. To solve this problem, a novel adaptive knowledge-distillation-based model compression method is proposed. First, two kinds of teacher models are trained, on single-lead and 12-lead ECG data, respectively. Then, a feature extension module is built, which expands single-lead ECG data into 12-lead data through generative adversarial networks (GANs). Finally, model distillation is performed over all teacher models. In this way, the proposed approach enables deeper interaction between single-lead and 12-lead data. Experimental results show that it outperforms existing diagnostic methods on our collected dataset, raising the F1 score from 49.54% to 79.3%, which demonstrates the effectiveness of our approach.

Renjie Tang, Junbo Qian, Jiahui Jin, Junzhou Luo
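
A compact sketch of the distillation step, with the student matching the softened outputs of both teachers, is shown below. The temperature, equal teacher weights, and random logits are illustrative; the paper additionally inserts a GAN-based feature extension module to lift single-lead inputs to 12 leads before distilling.

    import numpy as np

    def softmax(z, T=1.0):
        e = np.exp((z - z.max()) / T)
        return e / e.sum()

    def kd_loss(student_logits, teacher_logits, T=3.0):
        # Soft-target cross-entropy, scaled by T^2 as in standard distillation.
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        return float(-(p_teacher * np.log(p_student + 1e-12)).sum() * T * T)

    rng = np.random.default_rng(1)
    student = rng.normal(size=5)
    teacher_1lead = rng.normal(size=5)      # teacher trained on single-lead data
    teacher_12lead = rng.normal(size=5)     # teacher trained on 12-lead data
    total = 0.5 * kd_loss(student, teacher_1lead) + 0.5 * kd_loss(student, teacher_12lead)
    print(round(total, 3))
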
Backmatter
Metadata
Title
Algorithms and Architectures for Parallel Processing
Editors
Yongxuan Lai
Tian Wang
Min Jiang
Guangquan Xu
Wei Liang
Aniello Castiglione
Copyright Year
2022
Electronic ISBN
978-3-030-95388-1
Print ISBN
978-3-030-95387-4
DOI
https://doi.org/10.1007/978-3-030-95388-1
