2022 | Book

Parallel and Distributed Computing, Applications and Technologies

22nd International Conference, PDCAT 2021, Guangzhou, China, December 17–19, 2021, Proceedings

Editors: Hong Shen, Yingpeng Sang, Yong Zhang, Nong Xiao, Hamid R. Arabnia, Geoffrey Fox, Ajay Gupta, Manu Malek

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the proceedings of the 22nd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2021, which took place in Guangzhou, China, during December 17-19, 2021.

The 24 full papers and 34 short papers included in this volume were carefully reviewed and selected from 97 submissions. The papers are categorized into the following topical sub-headings: networking and architectures, software systems and technologies, algorithms and applications, and security and privacy.

Table of Contents

Frontmatter

Networking and Architectures

Frontmatter
Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). Out-of-core approaches help run large-scale stencil codes that process data with sizes larger than the limited capacity of GPU memory. Nevertheless, the performance of out-of-core approaches is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer; however, published results on the use of on-the-fly compression are insufficient. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression, introducing a novel data compression scheme that resolves the data dependency between contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps data transfer with computation. Experimental results show that the proposed method achieved a speedup of 1.2× compared with a method that involves no compression. Moreover, although the precision loss caused by compression increased with the number of time steps, it remained trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.

Jingcheng Shen, Yifan Wu, Masao Okita, Fumihiko Ino
Routing with Ant Colony Optimization in Wireless Mesh Networks

Multi-radio multi-channel wireless mesh networks (MRMC WMNs) are well suited as wireless backbone networks for ubiquitous Internet access. Satisfying multiple traffic requests from multiple source-destination pairs with different data transmission requirements is quite a challenge, as flows of multiple pairs may conflict heavily owing to the nature of wireless media. To make full use of the limited resources, we design a routing algorithm based on ant colony optimization, in which pheromone trails guide the search for primary paths. Various simulations show the efficiency of the algorithm.

Jiadong Peng, Zhanmao Cao, Qisong Huang
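
A concrete illustration of pheromone-guided path finding: the minimal sketch below runs ant colony optimization on a toy weighted graph. The topology, evaporation rate and constants are illustrative assumptions; a real MRMC WMN router would additionally model radios, channels and interference.

```python
import random

# Toy topology: edge weights stand in for link costs.
graph = {
    "s": {"a": 1.0, "b": 2.0},
    "a": {"d": 2.0},
    "b": {"d": 1.0},
    "d": {},
}
pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}

def pick_next(node, visited):
    # Choose the next hop with probability proportional to pheromone / cost.
    choices = [(v, pheromone[(node, v)] / graph[node][v])
               for v in graph[node] if v not in visited]
    if not choices:
        return None
    total = sum(w for _, w in choices)
    r = random.uniform(0, total)
    for v, w in choices:
        r -= w
        if r <= 0:
            return v
    return choices[-1][0]

def ant_walk(src, dst):
    path, node = [src], src
    while node != dst:
        node = pick_next(node, set(path))
        if node is None:
            return None
        path.append(node)
    return path

best = None
for _ in range(200):                      # release one ant per iteration
    path = ant_walk("s", "d")
    if path is None:
        continue
    cost = sum(graph[u][v] for u, v in zip(path, path[1:]))
    for u, v in zip(path, path[1:]):      # deposit pheromone, more for cheap paths
        pheromone[(u, v)] += 1.0 / cost
    for e in pheromone:                   # evaporation
        pheromone[e] *= 0.95
    if best is None or cost < best[0]:
        best = (cost, path)
print(best)                               # expect cost 3.0 via s-a-d or s-b-d
```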
A Light-Weight Scheme for Detecting Component Structure of Network Traffic

The rapid development of network services not only expands the scale of Internet traffic but also diversifies its types. In this work, we design a lightweight scheme to meet the management requirements of large-scale and business-sensitive scenarios. The proposed scheme regards mixed traffic as a whole and directly analyzes its component structure. It converts structural and attribute features into a traffic profile through encoding, embedding and mapping, and the traffic profile is then used to infer the component structure with a CNN. The scheme needs no flow-by-flow classification; it is not limited to the “quantity” balance of traffic, but also considers the types of traffic on each link. Experiments on a real-world dataset show that the proposed scheme can infer the component structure of mixed traffic quickly and accurately.

Zihui Wu, Yi Xie, Ziyang Wu
Evaluating the Performance and Conformance of a SYCL Implementation for SX-Aurora TSUBASA

SX-Aurora TSUBASA (SX-AT) is a vector supercomputer equipped with Vector Engines (VEs). Besides this new system architecture, SX-AT offers several execution modes to achieve high performance for real-world applications, which often consist of vector-friendly and vector-unfriendly parts. Vector Engine Offloading (VEO) is a programming framework that offloads only the vector-friendly parts to VEs, and neoSYCL has been developed on top of VEO to allow programmers to use the standard SYCL interface for offload programming on SX-AT. However, it is unclear how well neoSYCL, being based on VEO, can conform to the SYCL standard, which is primarily based on OpenCL. Therefore, this paper discusses the conformance of neoSYCL to the SYCL standard, as well as its performance. Our thorough evaluation with SYCL-Bench kernels demonstrates that neoSYCL conforms to the SYCL standard except for OpenCL-related features. In addition, the runtime overhead of using the SYCL interface on top of VEO is negligible in most cases, allowing neoSYCL codes to achieve performance comparable with VEO codes.

Jiahao Li, Mulya Agung, Hiroyuki Takizawa
Bayesian Optimization-Based Task Scheduling Algorithm on Heterogeneous System

In heterogeneous computing systems, efficient task scheduling is essential for utilizing resources and reducing computing time. The problem has been shown to be NP-complete in the general case. Existing solutions are mainly heuristic-based, which are easily trapped in local optima, or reinforcement learning-based, which incur an expensive computational cost for training neural networks. To overcome these shortcomings, we propose a Bayesian optimization-based task scheduling algorithm that automatically searches for the best heuristic strategy in the problem space. Our algorithm builds a Bayesian optimization model over heuristic strategies and scheduling performance, and updates the model by interacting with the environment to find globally optimal solutions. To enhance the confidence of our experiments, we measure the average (weighted) makespans and running time of our algorithm. The experimental results show that our approach improves scheduling performance compared to the baselines.

Tan Cai, Hong Shen
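
A minimal sketch of a Bayesian-optimization loop over a one-dimensional heuristic strategy space: a small numpy Gaussian-process surrogate plus a lower-confidence-bound rule picks the next strategy parameter to evaluate. The `makespan` black box and all constants are hypothetical stand-ins for a real scheduler-plus-environment evaluation, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def makespan(w):
    # Hypothetical black box: noisy makespan of the schedule produced by a
    # heuristic parameterized by w in [0, 1].
    return (w - 0.63) ** 2 + 0.01 * rng.standard_normal()

def rbf(a, b, ell=0.15):
    # Squared-exponential kernel.
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / ell ** 2)

X = [0.1, 0.5, 0.9]                     # initial design points
Y = [makespan(w) for w in X]
grid = np.linspace(0, 1, 201)

for _ in range(15):
    Xa, Ya = np.array(X), np.array(Y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    Ks = rbf(grid, Xa)
    mu = Ks @ np.linalg.solve(K, Ya)    # GP posterior mean on the grid
    var = 1.0 - np.einsum("ij,ij->i", Ks @ np.linalg.inv(K), Ks)
    lcb = mu - 2.0 * np.sqrt(np.maximum(var, 0.0))
    w = grid[np.argmin(lcb)]            # most promising strategy to try next
    X.append(float(w))
    Y.append(makespan(w))

print("best strategy parameter:", X[int(np.argmin(Y))])
```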
Optimizing Uplink Bandwidth Utilization for Crowdsourced Livecast

Driven by the prevalence of video generation devices and the development of network infrastructures, there has been explosive growth of Crowdsourced Video Livecast (CVL) services in the past few years. Significant efforts have been made to provide high-quality CVL services with limited bandwidth availability. However, most existing works have focused on optimizing downlink bandwidth for video distribution rather than uplink bandwidth for video uploading. For example, uploaders (i.e., broadcasters) on Twitch can set their upload rates arbitrarily, which may lead to significant waste of upload bandwidth as the number of uploaders increases. In this paper, we propose an effective low-complexity algorithm called Bubal to optimize upload bandwidth allocation among massive numbers of uploaders. Our objective is to optimize the utility of video uploading from the perspective of CVL platform operators by considering both viewers' Quality-of-Experience (QoE) and upload bandwidth cost. To guarantee the effectiveness and fairness of bandwidth allocation, we adopt the optimization framework of the Nash Bargaining Solution (NBS), which jointly determines the optimal bandwidth budget, upload bitrate and datacenter selection for each uploader. Finally, we conduct extensive trace-driven simulations to evaluate our algorithm, and the results show that it achieves much higher utility than alternative strategies under various conditions.

Xianzhi Zhang, Guoqiao Ye, Miao Hu, Di Wu
A Batched Jacobi SVD Algorithm on GPUs and Its Application to Quantum Lattice Systems

Batched linear algebra problems are becoming increasingly important in engineering and scientific applications. As the performance of graphics processing units (GPUs) improves rapidly, GPUs are very attractive for solving this class of problems. This paper presents a parallel blocked Jacobi SVD algorithm for many small matrices on GPUs. The parallelism of the Jacobi algorithm is fully exploited, and the blocking structure allows the algorithm to map properly onto the GPU memory hierarchy. Reduction operations, which are used for computing inner products and have low thread utilization, are replaced by performing the Jacobi rotation on the Gram matrix in parallel. We identify kernels that share data and fuse them to improve memory locality by placing the shared data, originally passed via off-chip global memory, into on-chip shared memory. Numerical results on an NVIDIA Tesla V100 GPU show that our batched SVD routine outperforms state-of-the-art approaches by between 2.0× and 4.1× on the examples tested. As one application of our routine, the numerical simulation of quantum lattice systems is tested and achieves speedups of up to 54.1× over the CPU implementation running on a 48-core Xeon CPU.

Rongfeng Huang, Tianyu Yu, Shifang Liu, Xinyin Zhang, Yonghua Zhao
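
For reference, the classic one-sided Jacobi SVD that such batched GPU routines parallelize can be sketched in a few lines of numpy. This sequential version shows the plane rotations and Gram-matrix quantities involved; it is background only, not the paper's blocked, batched GPU kernels.

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, sweeps=30):
    # One-sided Jacobi: orthogonalize the columns of U with plane rotations;
    # the singular values are the final column norms.
    U = A.astype(float).copy()
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                a = U[:, p] @ U[:, p]          # 2x2 Gram matrix entries
                b = U[:, q] @ U[:, q]
                c = U[:, p] @ U[:, q]
                off = max(off, abs(c))
                if abs(c) < tol:
                    continue
                zeta = (b - a) / (2 * c)
                t = (1.0 if zeta == 0 else np.sign(zeta)) / (abs(zeta) + np.hypot(1.0, zeta))
                cs = 1.0 / np.hypot(1.0, t)
                sn = cs * t
                G = np.array([[cs, sn], [-sn, cs]])
                U[:, [p, q]] = U[:, [p, q]] @ G   # zero the (p, q) Gram entry
                V[:, [p, q]] = V[:, [p, q]] @ G
        if off < tol:
            break
    s = np.linalg.norm(U, axis=0)
    return U / s, s, V

A = np.random.default_rng(1).standard_normal((6, 4))
U, s, V = jacobi_svd(A)
print(np.allclose(U * s @ V.T, A))   # True: A is reconstructed
```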
A Molecular Dynamics Based Multi-scale Platelet Aggregation Model and Its High-Throughput Simulation

In this paper, we develop a multi-scale model to simulate the aggregation of platelets in a low-shear-coefficient flow. In this multi-scale model, the Morse potential is used to describe the interaction between the αIIbβ3 receptor and fibrinogen, dissipative particle dynamics (DPD) is used to simulate fluids on the macro scale, and coarse-grained molecular dynamics (CGMD) is used to simulate the fine-scale biochemical reactions of the receptors. Moreover, with the assistance of high-throughput simulations on a heterogeneous cluster, we calibrate the parameters of the Morse potential, which are critical for properly simulating platelet aggregation. With this model, we simulate the long-term behaviour of thrombus formation by many platelets. Our simulation results are consistent with in-vitro experiments on contact areas and detaching forces, while reducing the computational cost significantly.

Zhipeng Xu, Qingsong Zou
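
The Morse pair potential mentioned above has a simple closed form, V(r) = De (1 − e^(−a(r−re)))² − De, which is easy to state in code together with the corresponding force. The parameter values below are illustrative assumptions, not the calibrated receptor-fibrinogen parameters obtained from the high-throughput simulations.

```python
import numpy as np

def morse(r, De=1.0, a=2.0, re=1.2):
    # Morse potential V(r) = De * (1 - exp(-a (r - re)))^2 - De
    # and its force F(r) = -dV/dr = -2 De a e (1 - e), with e = exp(-a (r - re)).
    e = np.exp(-a * (r - re))
    V = De * (1 - e) ** 2 - De
    F = -2 * De * a * e * (1 - e)
    return V, F

r = np.linspace(0.8, 3.0, 5)     # pair separations (arbitrary units)
V, F = morse(r)
print(V.round(3), F.round(3))    # attractive well around r = re
```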
Approximation and Polynomial Algorithms for Multi-depot Capacitated Arc Routing Problems

We study the multi-depot capacitated arc routing problem (MCARP), which generalizes the classical arc routing problem to the more realistic situation with multiple depots. We propose approximation and polynomial algorithms for different variants of the MCARP. First, we present the first constant-factor approximation algorithms for the MCARP and its nonfixed-destination variant. Second, for a restricted case of the MCARP with infinite vehicle capacity, called the multi-depot rural postman problem, we devise a (2 − 1/(2k+1))-approximation algorithm, with k denoting the number of depots. Lastly, we show that the equal-demand MCARP defined on a line graph is polynomially solvable, and we develop a 2-approximation algorithm for the multi-depot capacitated vehicle routing problem on a line.

Wei Yu, Yujie Liao
Zero-Shot Face Swapping with De-identification Adversarial Learning

In this paper, we propose a Zero-shot Face Swapping Network (ZFSNet) to swap novel identities for which no training data is available, which is very practical. In contrast to many existing methods that consist of several stages, the proposed model can generate images containing an unseen identity in a single forward pass without fine-tuning. To achieve this, based on a basic encoder-decoder framework, we propose an additional de-identification (De-ID) module after the encoder, which removes the source identity information remaining in the encoding stream and improves the model's generalization capability. Then we introduce an attention component (ASSM) to adaptively blend the encoded source feature and the target identity feature; it amplifies proper local details and helps the decoder attend to the relevant identity feature. Extensive experiments on synthesized and real images demonstrate that the proposed modules are effective for zero-shot face swapping. In addition, we evaluate our framework on zero-shot facial expression translation to show its versatility and flexibility.

Huifang Li, Yidong Li, Jiaming Liu, Zhibin Hong, Tianshu Hu, Yan Ren
An User-Driven Active Way to Push ACL in Software-Defined Networking

Compared with traditional networks, Software-Defined Networking (SDN) provides a more convenient paradigm for building Access Control List (ACL) applications. There have been a few studies on ACL applications in SDN so far, but most existing work adopts a reactive way to enforce ACL, so new ACL updates cannot take effect immediately. In this paper, we propose CLACK, an approach for user-driven centralized ACL in SDN. We implement CLACK on both the Floodlight and ONOS controllers. The experimental results show that CLACK performs better than the existing Floodlight firewall application.

Haisheng Yu, Dong Liu, Wenyong Wang, Keqiu Li, Sai Zou, Zhaobin Liu, Yan Liu
Photonic Computing and Communication for Neural Network Accelerators

Conventional electronic Artificial Neural Network (ANN) accelerators focus on architecture design and numerical computation optimization to improve training speed. Optical technology, with its low energy consumption and high transmission speed, is expected to play an important role in the next generation of computing architectures. To provide a better understanding of optical technology used in ANN acceleration, we present a comprehensive review of optical implementations of ANN accelerators in this paper. We propose a classification of existing solutions into optical computing acceleration and optical communication acceleration, according to optical effects and optical architectures. Moreover, we discuss the challenges facing these photonic neural network acceleration approaches to highlight the most promising future research opportunities in this field.

Chengpeng Xia, Yawen Chen, Haibo Zhang, Hao Zhang, Jigang Wu
Performance Comparison of Multi-layer Perceptron Training on Electrical and Optical Network-on-Chips

Multi-layer Perceptron (MLP) is a class of Artificial Neural Networks widely used in regression, classification, and prediction. To accelerate the training of MLP, more cores can be used for parallel computing on many-core systems. With the increasing number of cores, the interconnection of cores plays a pivotal role in accelerating MLP training. Chip-scale interconnection can use either electrical or optical signals for data transmission among cores; the former is known as Electrical Network-on-Chip (ENoC) and the latter as Optical Network-on-Chip (ONoC). Due to the differences between optical and electrical characteristics, the performance and energy consumption of MLP training on ONoC and ENoC can be very different, so comparing the two for MLP training is worthy of study. In this paper, we first compare the differences between ONoC and ENoC based on a parallel MLP training method. Then, we formulate their performance models by analyzing communication and computation time, and formulate their energy models from static and dynamic energy consumption. Finally, we conduct extensive simulations to compare the performance and energy consumption of ONoC and ENoC. Results show that, compared with ENoC, the MLP training time of ONoC is reduced by 70.12% on average and the energy consumption of ONoC is reduced by 48.36% under batch size 32. However, with a small number of cores in MLP training, ENoC consumes less energy than ONoC.

Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang
The Design and Implementation of Reconfigurable Quaternary Logic Processor

We propose a multi-valued processor called the reconfigurable quaternary logic processor (RQLP), in which two binary bits express one quaternary (i.e., 4-valued) bit. The RQLP can be built with massive numbers of processor bits. Each processor bit has a unified structure consisting of four column operators gathered by an electric potential combiner, and each column operator is composed of a signal selector, working enabler, reconfiguration register, reconfiguration circuit, output enabler, and output generator. The unified structure of each processor bit can be reconfigured into one of 4^16 types of two-input quaternary logic operators. Compared with modern binary 64-bit processors, the proposed many-bit RQLP can perform many more types of logic operations in hardware, and the massive processor bits on a single RQLP can be divided for parallel processing. We design a general structure for the RQLP and provide a prototype circuit for its processor bit. We implement the RQLP on an FPGA and verify it with different quaternary logic operations. Our results demonstrate the effectiveness of the RQLP in terms of correctness and reconfigurability.

Hongjian Wang, Youdong Wu, Shan Ouyang, Xunlei Chen, Yunfu Shen, Yi Jin
A 3D Dubins Curve Constructing Method Based on Particle Swarm Optimization

The navigation error of an aircraft increases during a task, so the aircraft has to correct it under structural constraints to avoid path deviation. Path planning with navigation-error correction under a turning-radius constraint is a challenge for traditional path planning methods. In this paper, we propose a 3D Dubins curve constructing method that can draw a smooth path in 3D space for the aircraft. We then extend the Dynamic Programming for Navigation Error Correction method with 3D Dubins curves to obtain a feasible path under the turning-radius constraint, and improve the particle swarm optimization method to compute a near-optimal Dubins curve. Finally, our algorithm returns a feasible smooth path of approximately optimal length for the path planning problem with navigation correction under the turning-radius constraint.

Cheng Ji, Chu Wang, Mingyan Song, Fengmin Wang
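
A minimal particle swarm optimization loop of the kind the paper improves on might look as follows. The objective is a hypothetical placeholder standing in for the true 3D Dubins path length under the turning-radius constraint; all swarm constants are conventional defaults, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

def path_length(x):
    # Hypothetical objective: in the real problem this would construct the
    # 3D Dubins curve parameterized by x and measure its length.
    return np.sum((x - np.array([1.0, -0.5, 2.0])) ** 2)

n, dim = 30, 3
w, c1, c2 = 0.7, 1.5, 1.5                 # inertia, cognitive, social weights
pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_f = np.array([path_length(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)].copy()

for _ in range(100):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([path_length(p) for p in pos])
    improved = f < pbest_f                 # update personal bests
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)].copy()

print(gbest.round(3))                      # converges near (1, -0.5, 2)
```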

Software Systems and Technologies

Frontmatter
Towards Conflict-Aware Workload Co-execution on SX-Aurora TSUBASA

NEC SX-Aurora TSUBASA is the latest vector supercomputer, consisting of host processors called Vector Hosts (VHs) and vector processors called Vector Engines (VEs). The final goal of this work is to use both VHs and VEs simultaneously to increase resource utilization and improve system throughput by co-executing more workloads. However, performance interference among VH and VE workloads can occur because they share some computing resources and may compete for the same resource at the same time, causing so-called resource conflicts. As a first step towards efficient workload co-execution, this paper experimentally investigates the performance interference between a VH and a VE when each of the two processors executes a different workload. Our evaluation results clearly demonstrate that workload characteristics such as system call frequency can serve as good indicators for predicting whether a workload will affect the performance of a co-executing workload. We believe this will help identify pairs of workloads that cause frequent resource conflicts, and thus reduce the risk of performance interference between co-executing workloads on an SX-AT system.

Riku Nunokawa, Yoichi Shimomura, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa
A Learning-Based Scheduler for High Volume Processing in Data Warehouse Using Graph Neural Networks

The process of extracting, transforming, and loading (ETL) high volumes of data plays an essential role in data integration strategies in modern data warehouse systems. Almost all distributed ETL systems currently in use, in both industrial and academic contexts, employ a simple heuristic-based scheduling policy. Such a policy tries to process a stream of jobs in a best-effort fashion, but it can under-utilize computing resources in most practical scenarios, and such inefficient resource allocation can in turn increase the total completion time of data processing jobs. In this paper, we develop an efficient reinforcement learning technique that uses a Graph Neural Network (GNN) model to combine all submitted task graphs into a single graph, simplifying the representation of the states within the environment and enabling efficient parallel processing of the submitted jobs. Besides, to positively augment the embedding features in each leaf node, we pass messages from leaf to root so that the nodes can collaboratively represent actions within the environment. The performance results show up to 15% improvement in job completion time compared to a state-of-the-art machine learning scheduler, and up to 20% compared to a tuned heuristic-based scheduler.

Vivek Bengre, M. Reza HoseinyFarahabady, Mohammad Pivezhandi, Albert Y. Zomaya, Ali Jannesari
Adaptive Updates for Erasure-Coded Storage Systems Based on Data Delta and Logging

With the explosive growth of data in modern storage systems, erasure coding is widely used to ensure data reliability because of its low storage cost and high reliability. However, a small write leads to a partial update of the erasure-coded stripe, and such updates incur high I/O latency. This paper proposes an adaptive update approach, named DETOG, which efficiently speeds up partial updates for erasure-coded storage systems. DETOG employs machine learning approaches to classify files into non-write-only and write-only files. For non-write-only files, DETOG uses the data deltas, i.e., the differences between the latest and original data values, rather than the parity deltas, to reconstruct lost data; the system therefore needs to read the old data only for the first update instead of for every update. For write-only files, DETOG directly appends the new data to the logs of the data nodes and the parity nodes, so the system need not read the old data at all. We implement DETOG on a newly designed prototype storage system for performance evaluation. Extensive experimental results on real-world traces show that DETOG can efficiently improve I/O throughput.

Bing Wei, Jigang Wu, Xiaosong Su, Qiang Huang, Yujun Liu
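
The delta arithmetic that such update schemes build on can be shown with the simplest XOR-coded (RAID-5-style) case: XORing the data delta into the parity is equivalent to recomputing the parity from scratch. This toy sketch is generic background for the approach, not DETOG itself, which generalizes the idea with data deltas and logging.

```python
# Two data blocks and one XOR parity block.
old_data = bytes([1, 2, 3, 4])
new_data = bytes([1, 9, 3, 4])           # a small update to one block
other    = bytes([7, 7, 7, 7])           # the untouched data block
parity   = bytes(a ^ b for a, b in zip(old_data, other))

# Delta-based update: read old data once, XOR the delta into the parity.
delta      = bytes(a ^ b for a, b in zip(old_data, new_data))
new_parity = bytes(p ^ d for p, d in zip(parity, delta))

# Equivalent to recomputing the parity from both data blocks.
assert new_parity == bytes(a ^ b for a, b in zip(new_data, other))
print(new_parity.hex())
```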
Matching Program Implementations and Heterogeneous Computing Systems

High performance computing (HPC) systems have become highly parallel aggregations of heterogeneous system elements. Different kinds of processors, memory regions, interconnects and software resources constitute the modern HPC computing platform. This makes software development and efficient program execution a challenging task. Previously, we have developed a platform description framework for describing multiple aspects of computing platforms. It enables tools and users to better cope with the complexities of heterogeneous platforms in a programming model and system independent way. In this paper we present how our platform model can be used to describe program implementation variants that utilize different parallel programming models. We show that by matching platform models of program implementations to descriptions of a concrete heterogeneous system, we can increase overall resource utilization. In addition, we show that our model's support for control relationships brings significant performance gains when finding platform patterns within a commonly used heterogeneous compute cluster configuration.

Martin Sandrieser, Siegfried Benkner
FastDCF: A Partial Index Based Distributed and Scalable Near-Miss Code Clone Detection Approach for Very Large Code Repositories

Although a number of techniques have been proposed over the years to detect clones for improving software maintenance, reusability or security, there is still a lack of language-agnostic approaches with flexible code granularity for near-miss clone detection in big code at scale. Detecting near-miss clones in big code is challenging, since it requires more computing and memory resources as the scale of the source code increases. In this paper, we present FastDCF, a fast and scalable distributed clone finder, which is based on a partial index and optimized with a multithreading strategy. Furthermore, it overcomes single-node CPU and memory limitations with scalable distributed parallelization using MapReduce and HDFS, which further improves efficiency. It detects not only Type-1 and Type-2 clones but also the most computationally expensive Type-3 clones in large repositories. Meanwhile, it works at both function and file granularity and supports many different programming languages. Experimental results show that FastDCF detects clones in 250 million lines of code within 24 min, more efficiently than existing clone detection techniques, with recall and precision comparable to state-of-the-art approaches. On BigCloneBench, a recent and widely used benchmark, FastDCF achieves both high recall and high precision, competitive with other existing tools.

Liming Yang, Yi Ren, Jianbo Guan, Bao Li, Jun Ma, Peng Han, Yusong Tan
Towards Optimal Fast Matrix Multiplication on CPU-GPU Platforms

Increasing computing power has become available through the use of GPUs, bringing new opportunities to accelerate fast matrix multiplication. Although researchers have proposed several optimization schemes for the Strassen algorithm on the GPU, they have not fully utilized the computing resources of the CPU. In this paper, we propose a CPU-GPU heterogeneous implementation of the Winograd algorithm based on task graph scheduling. It uses a work-stealing scheduler to achieve a balanced load. We also propose two recursive task graph extension strategies: homogeneous and heterogeneous extension. We invoke different execution strategies at different recursion levels and design a predictor based on a random forest regression model to make this decision. Finally, experimental evaluations are performed on a CPU-GPU heterogeneous platform. They show that the improved Winograd algorithm achieves average speedups of 1.6x, 1.5x and 1.4x against cuBLAS, Winograd on CPU, and Winograd on GPU, respectively, for matrices with dimension greater than 5000.

Senhao Shao, Yizhuo Wang, Weixing Ji, Jianhua Gao
Temperature Matrix-Based Data Placement Using Improved Hungarian Algorithm in Edge Computing Environments

The scale of data shows an explosive growth trend, and cloud storage is widely used, but it faces problems such as network latency and power costs. Edge computing brings data close to the edge of the network, making it a good supplement to cloud computing. The spatiotemporal characteristics of data, however, have been largely ignored in studies of data placement and storage optimization. To address this, we propose a temperature matrix-based data placement method using an improved Hungarian algorithm (TEMPLIH). The temperature matrix reflects the influence of data characteristics on data placement. A replica selection algorithm based on the temperature matrix (RSA-TM) can meet latency requirements, and an improved Hungarian algorithm (IHA-RM) is proposed on the basis of replica selection, which balances the multiple goals of latency, cost, and load. Compared with commonly used data placement strategies, experiments show that TEMPLIH can effectively reduce the cost of data placement while meeting user access latency requirements and maintaining a reasonable load balance between edge servers.

Yuying Zhao, Pengwei Wang, Hengdi Huang, Zhaohui Zhang
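
As background, the assignment step that a Hungarian-style algorithm solves can be sketched with SciPy's standard implementation: replicas are matched to edge servers so that the total cost is minimized. The cost matrix and weights below are illustrative assumptions, not the paper's temperature-matrix construction or its improved IHA-RM variant.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative per-replica, per-server latency costs (rows: replicas,
# columns: edge servers) plus a per-server load penalty.
latency = np.array([[3, 8, 6, 4],
                    [7, 2, 5, 9],
                    [4, 6, 2, 8],
                    [5, 3, 7, 2]], dtype=float)
load = np.array([0.2, 0.8, 0.5, 0.3])   # load factor of each server
cost = latency + 5.0 * load             # broadcast adds load[c] to column c

rows, cols = linear_sum_assignment(cost)  # Hungarian-method solver
for r, c in zip(rows, cols):
    print(f"replica {r} -> server {c} (cost {cost[r, c]:.1f})")
print("total cost:", cost[rows, cols].sum())
```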
Realtime Physics Simulation of Large Virtual Space with Docker Containers

In this paper, we propose a method for distributed processing of realtime physics simulations for 3D video games with a large virtual space. The basic idea is to divide a given virtual space into multiple subspaces and to simulate each subspace with a physics simulator running in a virtualization container, under the assumption that subspaces are sufficiently independent that each simulation is not affected by the others. In the prototype system we implemented, the configuration of objects in the subspace allocated to a client is exchanged among hosts every few frames through WebSocket. Experiments with the prototype system confirm that we can achieve a sufficiently high processing speed and frame rate by bounding the number of objects in each subspace, even if the entire virtual space contains a huge number of virtual objects, exceeding 10,000.

Seiji Saito, Satoshi Fujita
A Deep Reinforcement Learning-Based Approach to the Scheduling of Multiple Workflows on Non-dedicated Edge Servers

Prompted by the remarkable progress in mobile communication technologies, more and more users are starting to execute their workflow applications on the mobile edge computing environment. Scheduling multiple parallel workflows on a non-dedicated edge server is a great challenge because of different users’ requirements. In this paper, we propose an approach based on Deep Reinforcement Learning (DRL) to schedule multiple workflows on an edge server with multiple heterogeneous CPUs to minimise the violation rate of service level agreement of workflows. The effectiveness of our proposed approach is evaluated by simulation experiments based on a set of real-world scientific workflows. The results show that our approach performs better than the current state-of-the-art approaches applied to similar problems.

Yongqiang Gao, Ke Feng
A MVCC Approach to Parallelizing Interoperability of Consortium Blockchain

Driven in part by the rapid growth of consortium blockchain applications, blockchain interoperability has become essential for exchanging transactional data among decentralized applications. To ensure the data integrity of transactions, state-of-the-art studies of blockchain interoperability apply data locks, which severely decrease system efficiency. To boost interoperability performance, this paper proposes a novel approach based on multi-version concurrency control to parallelize interoperable transactions, aiming at high transaction processing throughput while ensuring data integrity. Experimental evaluation with the Smallbank benchmark shows that the proposed method achieves up to a 4x performance increase (in processed transactions per second, TPS) compared with existing methods, and moreover decreases the average latency by 58%.

Weiyi Lin, Qiang Qu, Li Ning, Jianping Fan, Qingshan Jiang
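
The core multi-version concurrency control idea, that readers see a consistent snapshot while writers append new versions instead of locking, can be sketched in a few lines. This is generic MVCC background under simplifying assumptions (a single global logical clock, no garbage collection), not the paper's cross-chain protocol.

```python
import itertools

_clock = itertools.count(1)   # global logical timestamp source

class MVStore:
    """A minimal multi-version key-value store."""

    def __init__(self):
        self.versions = {}    # key -> list of (timestamp, value)

    def write(self, key, value):
        ts = next(_clock)     # writers append, never overwrite
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Return the newest version no later than the reader's snapshot.
        candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                      if ts <= snapshot_ts]
        return max(candidates)[1] if candidates else None

s = MVStore()
s.write("balance", 100)
snap = next(_clock)              # a reader takes a snapshot here
s.write("balance", 40)           # a concurrent writer commits later
print(s.read("balance", snap))   # -> 100: the reader is isolated
```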
An Effective and Reliable Cross-Blockchain Data Migration Approach

As blockchain is widely applied, various decentralized applications inevitably encounter data migration problems, for reasons such as multilevel blockchain scenarios, the exhaustion of blockchain disk space, and the swap of rapidly evolving blockchain engines. In order for applications to proceed smoothly, it is necessary to migrate the original blockchain data to a new blockchain instance, i.e., cross-blockchain data migration. However, ensuring the reliability of data provenance and data consistency, while balancing migration efficiency and historical state granularity, poses unique challenges. This paper proposes an effective and reliable cross-blockchain data migration approach to cope with these challenges. To ensure reliability, a collective mechanism of controlling, executing and storing procedures is proposed to coordinate migration transactions between blockchains. Furthermore, we propose two migration schemes to suit different decentralized application scenarios. Extensive experiments demonstrate the effectiveness of the proposed approach.

Mengqiu Zhang, Qiang Qu, Li Ning, Jianping Fan, Ruijie Yang
Algorithm for the Facility Location Problem with Origin and Destination

The Uncapacitated Facility Location Problem with Origin and Destination (FLPWOD) is an extension of the Uncapacitated Facility Location Problem (UFLP), where each unit of demand has its own origin and destination, and must be shipped from its origin via a location at which a transit station is built, to its destination. As in the UFLP, facilities can be opened at any of the predefined locations with given fixed costs. In classical location models, the clients have to be assigned to the open facilities, and the assignment cost is the distance between a client and an open facility. In the FLPWOD, the demands with origins and destinations have to be assigned to the open facilities, and the assignment cost is the length of a tour from the origin to the destination through an open facility. An LP-rounding approximation algorithm is developed with the first constant approximation ratio of 4.

Fengmin Wang, Chu Wang, Na Li, Wenxing Kang
Reinforcement Learning-Based Auto-scaling Algorithm for Elastic Cloud Workflow Service

Deploying a workflow engine as a service in a container cloud environment can improve its service quality and reliability, but auto-scaling of elastic cloud workflow services has attracted little research attention. Current auto-scaling algorithms designed for common microservices pay little attention to the long start-up time and high start-up cost of workflow services, which can easily cause problems such as untimely or excessive scaling. Given this, based on reinforcement learning and semi-Markov decision process (SMDP) modeling, an auto-scaling algorithm for an elastic cloud workflow engine is proposed, which enables the cloud workflow service to scale in time, allocating resources appropriately and ensuring service availability. Simulation experiments show that the algorithm automatically scales instances in advance and adapts to changes in traffic through the reinforcement learning SMDP strategy, reducing the violation rate of Service Level Agreements (SLAs) and improving the availability of the cloud workflow service.

Jian-bin Lu, Yang Yu, Mao-lin Pan
Optimal Energy Efficiency Strategy of mm Wave Cooperative Communication Small Cell Based on SWITP

Aiming at the optimization problem in the simultaneous wireless information and power transfer (SWITP) stage, an optimal energy efficiency strategy for millimeter-wave cooperative communication small cells based on SWITP is proposed to maximize the link energy efficiency, in which the receivers of user equipment devices work in power-splitting mode. Under constraints on the minimum link transmission rate and the minimum harvested energy, the strategy maximizes the link energy efficiency of the system by jointly optimizing the transmit power control and the power splitting factor. Since the original problem is a non-convex, NP-hard fractional programming problem, the strategy transforms it into a tractable convex optimization problem by the Dinkelbach method, and then the Lagrange dual method is used to solve it. Finally, a cross-iteration algorithm is designed to obtain the optimal solution. Simulation results show that the proposed strategy is more effective than the traditional power control method and the maximum transmit power method.

Taoshen Li, Mingyu Lu
Low Latency Execution Guarantee Under Uncertainty in Serverless Platforms

Serverless computing recently emerged as a new run-time paradigm that disentangles the client from the burden of provisioning physical computing resources, leaving this difficulty on the service provider's side. However, an unsolved problem in such an environment is how to cope with the challenges of executing several co-running applications while fulfilling the Quality of Service (QoS) level requested by all application owners. In practice, developing an efficient mechanism to reach the requested performance level (such as p-99 latency and throughput) is limited by the controller's awareness of the dynamics of the underlying platforms (resource availability, performance interference among consolidated workloads, etc.). In this paper, we develop an adaptive feedback controller for coping with the buffer instability of serverless platforms when several collocated applications run in a shared environment. The goal is to support low-latency execution by managing the arrival event rate of each application when shared resource contention causes significant throughput degradation among workloads with different priorities. The key component of the proposed architecture is continuous management of server-side internal buffers for each application, providing a low-latency feedback control mechanism based on the requested QoS level of each application (e.g., buffer information) and the throughput of the worker nodes. The empirical results confirm response stability for high-priority workloads under dynamic conditions caused by low-priority applications. We evaluate the proposed solution with respect to response time and QoS violation rate for high-priority applications on a serverless platform with four worker nodes set up in our in-house virtualized cluster, comparing it against the default resource management policy in Apache OpenWhisk, which is extensively used in commercial serverless platforms. The results show that our approach incurs very low overhead (less than 0.7%) while improving the p-99 latency of high-priority applications by 64% on average in the presence of dynamic high-traffic conditions.

M. Reza HoseinyFarahabady, Javid Taheri, Albert Y. Zomaya, Zahir Tari
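
A toy proportional feedback loop conveys the buffer-based admission control idea: the admitted low-priority arrival rate is throttled to keep a shared queue near a target occupancy. All rates, gains and the traffic pattern are hypothetical; the paper's controller operates on real OpenWhisk buffers and per-application QoS levels.

```python
# Proportional admission control on a shared queue (a rough sketch).
target, kp = 50.0, 0.05       # target queue occupancy and controller gain
queue, rate = 0.0, 30.0       # queue length; admitted low-priority events/tick
service = 40.0                # events the worker nodes drain per tick

for t in range(10):
    high = 20.0 if t < 5 else 45.0    # high-priority traffic bursts at t = 5
    queue = max(0.0, queue + high + rate - service)
    # Back off the low-priority rate when the queue exceeds the target,
    # and admit more when the queue is below it.
    rate = max(0.0, rate - kp * (queue - target))
    print(f"t={t} queue={queue:6.1f} low-priority rate={rate:5.1f}")
```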
High Resolution Patient-Specific Blood Flow Simulation in a Full-Size Aneurysmal Aorta Based on a Parallel Two-Level Method

An accurate and efficient blood flow simulation in patient-specific arteries is instructive for the diagnosis and treatment of various vascular diseases. It is, however, computationally challenging because of the complicated geometry of the artery and the turbulence in the blood flow. In this work, we introduce a parallel, scalable two-level additive Schwarz method for fast solution of the Navier-Stokes equations in a patient-specific full-size aorta with aneurysms. Distributions of hemodynamic quantities, such as the pressure, velocity, and wall shear stress, are presented and analyzed. The algorithm is studied with a focus on its robustness against different values of model parameters and on its parallel scalability. The results show that the proposed method robustly solves large and complicated simulation problems with over 25 million unstructured elements using over 5000 processors on a supercomputer.

Jie Zhou, Jing Li, Shanlin Qin, Rongliang Chen
Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework

Data locality is a key factor influencing the performance of Spark systems. As executors are the execution containers of tasks, the nodes on which executors are started directly affect the locality level achieved by the tasks. This paper improves data locality through executor allocation in the reduce stage of the Spark framework. Firstly, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, an approximation algorithm is proposed and its approximation factor is proved to be 2. Finally, we evaluate the performance of our algorithm in a practical Spark cluster using several representative benchmarks: sort, pageRank and LDA. Experimental results show that the proposed algorithm noticeably improves data locality and application/job performance.

Zhongming Fu, Mengsi He, Zhuo Tang, Yang Zhang
TEFRED: A Temperature and Energy Cognizant Fault-Tolerant Real-Time Scheduler Based on Deadline Partitioning for Heterogeneous Platforms

Energy consumption and peak temperatures on MPSoCs are growing exponentially as transistor density increases, rendering systems unstable. Thus, in modern real-time systems, fault-tolerance is an essential design feature. This paper proposes TEFRED, a heuristic scheduling strategy that addresses the problem of controlling energy and peak temperature levels simultaneously on systems with two types of cores while remaining resistant to transient faults. Our experimental results demonstrate that TEFRED can save considerable energy and lower core peak temperatures compared to a state-of-the-art energy-efficient fault-tolerant scheduler.

Yanshul Sharma, Zinea Das, Sanjay Moulik

Algorithms and Applications

Frontmatter
Social Recommendation via Graph Attentive Aggregation

Recommender systems play an important role in helping users discover items of interest from large resource collections in various online services. Although deep graph neural network-based collaborative filtering methods have achieved promising performance in recommender systems, they still have some weaknesses. Firstly, existing graph neural network methods only take user-item interactions into account, neglecting the direct user-user interactions that can be obtained from social networks. Secondly, they treat the observed data uniformly, without considering fine-grained differences in the importance or relevance of the user-item interactions. In this paper, we propose a novel graph neural network, social graph attentive aggregation (SGA), which is suitable for parallel training to boost efficiency, a common bottleneck for neural network-based machine learning models. The model obtains user-user collaborative information from social networks and utilizes a self-attention mechanism to model the differing importance of user-item interactions. We conduct experiments on two real-world datasets, and the results demonstrate that our method is effective and can be trained in parallel efficiently.

Yuanwei Liufu, Hong Shen
MACSQ: Massively Accelerated DeepQ Learning on GPUs Using On-the-fly State Construction

The current trend of using artificial neural networks to solve computationally intensive problems is omnipresent. In this scope, DeepQ learning is a common choice for agent-based problems. DeepQ combines the concept of Q-Learning with (deep) neural networks to learn different Q-values/matrices based on environmental conditions. Unfortunately, DeepQ learning requires hundreds of thousands of iterations/Q-samples that must be generated and learned for large-scale problems. Gathering data sets for such challenging tasks is extremely time-consuming and requires large data-storage containers. Consequently, a common solution is the automatic generation of input samples for agent-based DeepQ networks. However, the usual workflow creates the samples separately from the training process, either in (a set of) pre-processing steps or interleaved with training, which requires the input Q-samples to be materialized before being fed into the training step of the attached neural network. In this paper, we propose a new GPU-focused method for on-the-fly generation of training samples tightly coupled with the training process itself. This allows us to skip the materialization of all samples (e.g., dumping them to disk), as they are (re)constructed when needed. Our method significantly outperforms usual workflows that generate the input samples on the CPU, in terms of both runtime performance and memory/storage consumption.

Marcel Köster, Julian Groß, Antonio Krüger
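
The on-the-fly construction idea, producing (state, action, reward, next-state) samples the moment the trainer asks for them instead of materializing a dataset, can be sketched with a CPU-side Python generator. The environment dynamics below are invented for illustration; the paper performs this construction inside GPU kernels fused with training, which this toy does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_stream(batch_size=64):
    # Construct Q-learning training tuples on demand; nothing is ever
    # written to disk or kept beyond the current batch.
    while True:
        s = rng.standard_normal((batch_size, 4))            # toy states
        a = rng.integers(0, 2, batch_size)                  # toy actions
        s2 = s + 0.1 * rng.standard_normal((batch_size, 4)) # toy transitions
        r = -np.linalg.norm(s2, axis=1)                     # toy rewards
        yield s, a, r, s2

stream = sample_stream()
for _ in range(3):                  # a trainer would consume these batches
    s, a, r, s2 = next(stream)      # samples exist only transiently
    print(s.shape, round(float(r.mean()), 3))
```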
Model-Based Multi-agent Policy Optimization with Dynamic Dependence Modeling

This paper explores the combination of model-based methods and multi-agent reinforcement learning (MARL) for more efficient coordination among multiple agents. A decentralized model-based MARL method, Policy Optimization with Dynamic Dependence Modeling (POD2M), is proposed to dynamically determine the importance of other agents' information during the model building process. In POD2M, the agents adapt their mutual dependence while building their own dynamics models, in order to trade off between an individual-learning process and a coordinated-learning process. Once the dynamics models have been built, the policies are trained based on one-step model predictive rollouts. Empirical experiments on both cooperative and competitive scenarios indicate that our method achieves higher sample efficiency than the compared model-free MARL algorithms and outperforms the centralized method in large domains.

Biyang Hu, Chao Yu, Zifan Wu
Multi-index Federated Aggregation Algorithm Based on Trusted Verification

Motivated by the modern phenomenon of distributed data collected by edge devices at scale, federated learning can use large amounts of training data from diverse users for better representation and generalization. To improve flexibility and scalability, we propose a new federated optimization algorithm, named multi-index federated aggregation based on trusted verification (TVFedmul). TVFedmul builds on the FedAvg algorithm and overcomes the problems caused by the original aggregation algorithm, which takes only the single index of data quantity as the reference factor for measuring the aggregation weight of each client. The improved aggregation algorithm is based on multi-index measurement, which reflects the comprehensive ability of clients more fully and allows an overall judgment. Further, we introduce a hyperparameter α, which can be changed to determine the importance of the indexes. Finally, extensive experiments verify the efficiency and effectiveness of the proposed algorithm.

Zhenshan Bao, Wei Bai, Wenbo Zhang
Few-Shot Generative Learning by Modeling Stereoscopic Priors

Few-shot image generation, which aims to generate images from only a few images for a new category, has attracted some research interest in recent years. However, existing few-shot generation methods only focus on 2D images, ignoring 3D information. In this work, we propose a few-shot generative network which leverages 3D priors to improve the diversity and quality of generated images. Inspired by classic graphics rendering pipelines, we unravel the image generation process into three factors: shape, viewpoint and texture. This disentangled representation enables us to make the most of both 3D and 2D information in few-shot generation. To be specific, by changing the viewpoint and extracting textures from different real images, we can generate various new images even in data-scarce settings. Extensive experiments show the effectiveness of our method.

Yuehui Wang, Qing Wang, Dongyu Zhang
Distributed Fair k-Center Clustering Problems with Outliers

Big data clustering is a fundamental problem with a vast number of applications. Due to the increasing size of data, interest in clustering problems in distributed computation models has increased. On the other hand, because important decision making is being automated with the help of algorithms, fairness in algorithms has become an especially important research topic. In this work, we design new distributed algorithms for the fair k-center problem with outliers. Our main contributions are: (1) in the fair k-center with outliers setting, we give a 4-approximation algorithm; (2) in the distributed fair k-center with outliers setting, we give an 18-approximation algorithm.

Fan Yuan, Luhong Diao, Donglei Du, Lei Liu
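
For orientation, the classic farthest-point greedy (Gonzalez) 2-approximation for unconstrained k-center, the base primitive that fair and outlier-aware variants build on, reads as follows; it is background only, not the paper's distributed algorithms.

```python
import numpy as np

def greedy_k_center(points, k):
    # Gonzalez's greedy: repeatedly pick the point farthest from the
    # current centers; a classic 2-approximation for k-center.
    centers = [0]                                # start from point 0
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                  # farthest point so far
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(d.max())               # centers and covering radius

pts = np.random.default_rng(2).standard_normal((200, 2))
centers, radius = greedy_k_center(pts, 5)
print(centers, round(radius, 3))
```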
Multi-zone Residential HVAC Control with Satisfying Occupants’ Thermal Comfort Requirements and Saving Energy via Reinforcement Learning

Residential HVAC control must balance thermal comfort and energy consumption. Due to the complexity of dynamic building thermal models, weather conditions and human activities, traditional methods such as rule-based control (RBC) and model predictive control (MPC) struggle to learn a strategy that saves energy while satisfying occupants' thermal comfort requirements. To solve this problem, we propose a method combining a thermal comfort prediction model and reinforcement learning to optimize residential multi-zone HVAC control. In this paper, we first design a hybrid model of Support Vector Regression and a Deep Neural Network (SVR-DNN) to predict the thermal comfort value, which is taken as part of the state and reward in reinforcement learning. Then we apply reinforcement learning algorithms (Q-learning, Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG)) to generate optimal HVAC control strategies that maintain stable thermal comfort and minimize energy consumption. The experimental results show that our SVR-DNN model improves thermal comfort prediction performance by 20.5% compared with a deep neural network (DNN); compared with rule-based control, DDPG, DQN and Q-learning based on SVR-DNN reduce energy consumption by 11.89%, 8.41% and 6.51%, and reduce thermal comfort violations by 91.8%, 43.2% and 25.4%, respectively.

Zhengkai Ding, Qiming Fu, Jianping Chen, Hongjie Wu, You Lu, Fuyuan Hu
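
A tabular Q-learning skeleton illustrates the reinforcement learning component on a toy discretized thermostat task. The dynamics and reward below are invented for illustration and stand in for the SVR-DNN comfort model and a real building simulation; DQN and DDPG replace the table with neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# States: discretized temperatures; actions: {0: off, 1: low, 2: high}.
n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    # Toy dynamics: heating shifts the state up, plus random drift.
    s2 = int(np.clip(s + (a - 1) + rng.integers(-1, 2), 0, n_states - 1))
    reward = -abs(s2 - 5) - 0.2 * a      # comfort term minus energy term
    return s2, reward

s = 0
for _ in range(20000):
    # Epsilon-greedy action selection.
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    # Standard Q-learning temporal-difference update.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.argmax(Q, axis=1))   # learned action per temperature bin
```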
Approximating BP Maximization with Distorted-Based Strategy

We study the problem of maximizing the sum of a suBmodular and a suPermodular (BP) function, denoted as max_{S ⊆ V, |S| ≤ k} G(S) + L(S), where G(·) is non-negative, monotonic and submodular, and L(·) is monotonic and supermodular. In this paper, we consider k-cardinality constrained BP maximization in a streaming setting. Denote by κ the supermodular curvature of L. Utilizing a distorted threshold-based technique, we present a first (1−κ)/(2−κ)-approximation semi-streaming algorithm, and then implement it by lazily guessing the optimum threshold, yielding a one-pass ((1−κ)/(2−κ) − O(ε))-approximation with O(ε⁻¹ log((2−κ)k/(1−κ)²)) memory complexity. We further study BP maximization with fairness constraints and develop a distorted greedy-based algorithm, which achieves a (1−κ)/(2−κ)-approximation for the extended fair BP maximization.

Ruiqi Yang, Suixiang Gao, Lu Han, Gaidi Li, Zhongrui Zhao
Streaming Algorithms for Maximization of a Non-submodular Function with a Cardinality Constraint on the Integer Lattice

We consider the maximization of a monotone non-submodular function under a cardinality constraint on the integer lattice. As our main contribution, we propose two streaming algorithms with provable performance guarantees and reasonable query time complexity.

Jingjing Tan, Yue Sun, Yicheng Xu, Juan Zou
Adaptable Focal Loss for Imbalanced Text Classification

In this paper, we study the problem of imbalanced text classification based on pre-trained language models and propose the Adaptable Focal Loss (AFL) method to solve it. Firstly, we use the word embeddings from the pre-trained models to construct a sentence-level prior as the sum of the word embeddings in the sentence. Then, we extend Focal Loss, which is widely used in object detection, by replacing its task-specific parameters with the scaled softmax of the distance between the fine-tuned embeddings and the prior embeddings from the pre-trained models. By removing the task-specific parameters in Focal Loss, not only can the weights for an arbitrary imbalanced proportion distribution be adjusted automatically according to the task, but sentences that are difficult to classify can also be given higher weight. Experimental results show that our method combines easily with common classifier models and significantly improves their performance.

Lu Cao, Xinyue Liu, Hong Shen
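
For reference, the standard binary focal loss that AFL extends, FL(p_t) = −(1 − p_t)^γ log(p_t), is a one-liner; AFL's replacement of the fixed focusing weights with a scaled softmax over embedding distances is not reproduced here.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    # Standard binary focal loss: p_t is the predicted probability of the
    # true class; the (1 - p_t)^gamma factor down-weights easy examples.
    p_t = np.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * np.log(np.clip(p_t, 1e-12, 1.0))

p = np.array([0.9, 0.6, 0.1])      # predicted probability of class 1
y = np.array([1, 1, 0])            # ground-truth labels
print(focal_loss(p, y).round(4))   # confident correct examples get tiny loss
```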
Roman Amphitheater Classification Using Convolutional Neural Network and Data Augmentation

In this paper, we propose a neural model to classify Roman amphitheater images into six groups corresponding to their locations: Rome, Eljem, Nîmes, Arles, Verona and Pula. The proposed neural structure makes essential use of convolutional neural networks for feature extraction and image classification. To avoid overfitting, data augmentation techniques are deployed to expand the data set size by a factor of 16. The method is applied to the augmented training set, and the results show substantial performance, with a high accuracy of 98.33%, compared to state-of-the-art methods.

Haïfa Nakouri
Data-Hungry Issue in Personalized Product Search

Product search has been receiving significant attention with the development of e-commerce. Existing works recognize the importance of personalization and focus on personalized product search. While these works confirm that personalization can improve the performance of product search, they all ignore the few-shot learning problems caused by personalization; under the few-shot setting, personalized methods may suffer from a data-hungry issue. In this paper, we explore the data-hungry issue in personalized product search. We find that the issue exists under the few-shot setting caused by personalization and degrades performance when the input query carries diverse intents. Furthermore, we illustrate that with such a data-hungry issue, the returned search results tend to be close to the products the user purchases most often, or the products most users purchase in the market for the same query. Further experiments confirm these conclusions.

Bin Wu, Yuehong Wu, Shangsong Liang
Jointly Super Resolution and Degradation Learning on Unpaired Real-World Images

Recently, CNN-based super-resolution methods have achieved amazing success. However, their effectiveness on real-world images is unclear: most of them use bicubic downsampling by default to obtain degraded low-resolution images, while the degradation process of real-world images is unknown. In our work, we argue that image degradation and super-resolution are tightly coupled. To close this loop, we propose a framework that jointly learns the degradation and super-resolution of real-world images. To stabilize learning and optimize performance, we combine several image content losses. Our framework can not only achieve real-world super-resolution but also generate paired datasets with unknown degradation for other super-resolution methods. Experiments on the NTIRE2020 real-world SR dataset show the effectiveness of our model.

Xuankun Chen, Junhong Chen, Dongyu Zhang
Enhanced Discriminant Local Direction Pattern Learning for Robust Palmprint Identification

Direction-based methods have been widely used in palmprint recognition. However, most existing methods based on palmprint direction patterns require rich prior knowledge and usually ignore the relationships among different samples. Furthermore, making the extracted features more discriminative remains a challenge for improving recognition performance. To solve these problems, we propose to learn an enhanced discriminative direction pattern. We first extract complete and stable local direction patterns, where a salient convolution average feature (EDL) is extracted from the palmprint image. Afterwards, a linear regression learning model is introduced to enhance the discriminability of the EDL, so that the representation of the direction pattern can be improved. Experimental results on four real-world palmprint databases show that the proposed method outperforms other state-of-the-art related methods.

Siyuan Ma, Qintai Hu, Shuping Zhao, Lin Jiang, Wenyan Wu
Latent Multi-view Subspace Clustering Based on Schatten-P Norm

In this paper, we study rank minimization to find more accurate low-dimensional representations for multi-view subspace learning. The Schatten-p norm is utilized as the rank relaxation function to enhance the ability to recover low-rank matrices, and a multi-view subspace clustering algorithm that maximizes the original feature information is proposed under the assumption that each view is derived from a latent representation. With the Schatten-p norm, the proposed algorithm improves the quality and robustness of the latent representations. The effectiveness of our method is validated through experiments on several benchmark datasets.
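For reference, the Schatten-p norm of a matrix is the l_p norm of its singular values; for p < 1 it is a tighter surrogate for rank than the nuclear norm (p = 1). A minimal computation sketch:

import numpy as np

def schatten_p_norm(X, p=0.5):
    s = np.linalg.svd(X, compute_uv=False)  # singular values of X
    return (s ** p).sum() ** (1.0 / p)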

Yuqin Lu, Yilan Fu, Jiangzhong Cao, Shangsong Liang, Wing-kuen Ling

Security and Privacy

Frontmatter
MOFIT: An Efficient Access Control Scheme with Attribute Merging and Outsourcing Capability for Fog-Enhanced IoT

The advancement of technology in the modern era has accelerated the growth of the Internet of Things (IoT), resulting in an exponential increase in the number of connected devices and the data they generate. This has encouraged the development of the fog computing paradigm, which facilitates data analysis and processing at the edge. Alongside fog, the cloud co-exists to provide services such as massive storage and processing resources. However, data storage and computation at multiple levels raise data security risks. Ciphertext-policy attribute-based encryption (CP-ABE) is a well-known cryptographic technique for providing fine-grained access control. Unfortunately, existing CP-ABE schemes do not support attribute merging. Therefore, we propose a CP-ABE scheme named MOFIT that solves this long-standing problem. Additionally, expensive encryption and decryption operations are outsourced to fog nodes, which reduces the computation overhead of resource-constrained IoT devices. Further, the task of attribute merging is also outsourced to a third party. The size of the secret key held by the data user is constant and remains unaltered during updates. Security and performance analysis shows that MOFIT is secure and suitable for IoT applications.

Richa Sarma, Ferdous Ahmed Barbhuiya
RepBFL: Reputation Based Blockchain-Enabled Federated Learning Framework for Data Sharing in Internet of Vehicles

The Internet of Vehicles (IoV) enables the integration of smart vehicles with the Internet and collaborative analysis of data shared among vehicles. Machine learning technologies show significant advantages and efficiency for data analysis in IoV. However, user data can be sensitive in nature, and the reliability and efficiency of sharing these data are hard to guarantee. Moreover, due to the intermittent and unreliable communications of distributed vehicles, traditional machine learning algorithms are not suitable for heterogeneous IoV networks. In this paper, we propose RepBFL, a novel reputation mechanism framework that integrates IoV with blockchain and federated learning. In this framework, blockchain is used to protect the data shared between vehicles, and Road Side Units (RSUs) select high-reputation vehicular nodes to share their data for federated learning. To enhance the security and reliability of the data sharing process, we develop a reputation calculation mechanism to evaluate the reliability of all vehicles in the IoV. The proposed framework is feasible for large heterogeneous vehicular networks and performs collaborative data analysis across distributed vehicles. Experimental results show that the proposed approach improves data sharing efficiency, and that the reputation mechanism deals with malicious behaviors effectively.
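A hedged sketch of the selection step: an RSU ranks candidate vehicles by reputation and aggregates only the top-k local updates. Plain federated averaging is shown; RepBFL's blockchain bookkeeping and reputation updates are omitted, and the function names are illustrative.

import numpy as np

def select_and_aggregate(updates, reputations, k=10):
    # updates: list of per-vehicle model updates (list of layer arrays)
    order = np.argsort(reputations)[::-1][:k]   # top-k by reputation
    chosen = [updates[i] for i in order]
    # FedAvg over the selected vehicles, layer by layer
    return [np.mean(layer, axis=0) for layer in zip(*chosen)]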

Haoyu Chen, Naiyue Chen, He Liu, Honglei Zhang, Jiabo Xu, Huaping Chen, Yidong Li
Multimodal Fusion Representation Learning Based on Differential Privacy

Multimodal data about a given target often play a complementary role in information integration, but the diversity of modalities makes model training difficult. Furthermore, previous differential privacy works operate only on a single modality. To tackle this problem, we use deep representation learning to map data from different modalities into the same subspace. Our fusion method uses low-rank decomposition based on Canonical Polyadic (CP) decomposition to implicitly capture a high-dimensional tensor rich in mutual fusion information between modalities, while explicitly producing a low-dimensional representation. A perturbation satisfying differential privacy is then applied in this low-dimensional subspace. Experimental results show that the method meets data utility requirements while retaining a suitable privacy guarantee.
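A minimal sketch of CP-style low-rank fusion followed by Gaussian perturbation of the fused vector; dimensions, factor shapes and the noise scale below are illustrative, not the paper's calibrated values.

import numpy as np

rank, d_out = 4, 32
rng = np.random.default_rng(0)

def fuse(xs, factors, sigma=1.0):
    # xs: one feature vector per modality;
    # factors[m] has shape (rank, len(xs[m]) + 1, d_out).
    prod = np.ones((rank, d_out))
    for x, W in zip(xs, factors):
        x1 = np.append(x, 1.0)                 # append bias term
        prod *= np.einsum('i,rio->ro', x1, W)  # per-rank projection
    z = prod.sum(axis=0)                       # low-dim fused vector
    return z + rng.normal(scale=sigma, size=z.shape)  # DP perturbation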

Chaoxin Cai, Yingpeng Sang, Jinghao Huang, Maliang Zhang, Weizheng Li
Efficient List Decoding Applied to $$\mathrm{ECC}^2$$

$$\mathrm{ECC}^2$$ is a public-key encryption system based on elliptic codes. It can resist known attacks that exploit the special structures of algebraic geometry codes. However, the computational overhead of $$\mathrm{ECC}^2$$ decryption is unsatisfactory, because the list decoding algorithm occupies a major part of it. Therefore, we propose a module basis reduction interpolation for list decoding of elliptic codes to speed up $$\mathrm{ECC}^2$$ decryption. The proposed algorithm is based on the theory of Gröbner bases of modules. An implementation combined with $$\mathrm{ECC}^2$$ shows that the proposed algorithm outperforms the list decoding algorithms previously used in $$\mathrm{ECC}^2$$.

Peidong Guan, Yunqi Wan, Zhuoran Zhang, Fangguo Zhang
Federated Data Integration for Heterogeneous Partitions Based on Differential Privacy

Federated learning has recently become a research hotspot in distributed learning; its purpose is to jointly train machine learning models while protecting privacy. However, federated learning has several problems: each machine learning algorithm must be modified to complete the training; data partitioning is restricted to either horizontal or vertical, which is not flexible enough; and training requires many rounds of communication, so its efficiency is low. To address these problems, we propose a generic federated integration method for multiple data sources. The method can integrate data in arbitrary partitions, protect privacy based on differential privacy, and reduce communication cost based on singular value decomposition. After the data are modelled by this method, they can be transferred to the center for federated learning. Our method comprises four algorithms, and we give a theoretical proof that the method satisfies differential privacy. Finally, experiments demonstrate the performance of the method in prediction accuracy and data compression.
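A sketch of the compression-plus-perturbation idea under stated assumptions: each party summarizes its local block with a truncated SVD and perturbs the transmitted factors with Gaussian noise. The rank k and noise scale sigma are illustrative, and the paper's four algorithms are not reproduced here.

import numpy as np

def compress_and_perturb(X, k=5, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]      # k-dimensional summary of each row
    Z += rng.normal(scale=sigma, size=Z.shape)  # Gaussian perturbation
    return Z, Vt[:k]          # compact factors sent to the center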

Jinghao Huang, Yingpeng Sang, Chaoxin Cai, Weizheng Li, Maliang Zhang
Patient-Chain: Patient-centered Healthcare System a Blockchain-based Technology in Dealing with Emergencies

Medical healthcare currently plays a vital role in society. For each patient, personal health records are critical and sensitive assets, so managing them effectively has become an active research problem. Much research on managing and operating personal health records has been introduced; however, dealing with patients' data in emergency cases remains an open issue. When emergencies happen in reality, it is challenging for patients to use a traditional access system to grant medical staff, i.e., nurses or doctors, consent to access their data. Besides, there is no secure record management of patients' data, which can reveal highly confidential personal information, such as what happened, when, and who has accessed such information. Thus, this paper proposes a control and management system for emergency access to protect patients' data, called the Patient-Chain platform: a patient-centered healthcare system built on blockchain technology for dealing with emergencies. Patient-Chain is built on the permissioned blockchain Hyperledger Fabric and defines rules and regulations through smart contracts and time durations to handle emergencies; patients can also restrict the time for which their data are accessible in such urgent cases. Several algorithms describing how the system works are provided to help readers understand the proposed management system.
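A hypothetical sketch of the access-window rule such a smart contract might enforce: an emergency grant carries an expiry time set by the patient, and every access attempt is appended to an audit log. Field names are illustrative; the actual Patient-Chain contracts run as Hyperledger Fabric chaincode.

import time

audit_log = []

def may_access(grant, staff_id, now=None):
    now = time.time() if now is None else now
    allowed = grant["staff_id"] == staff_id and now < grant["expires_at"]
    # every attempt is recorded, successful or not
    audit_log.append({"staff": staff_id, "time": now, "allowed": allowed})
    return allowed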

Hai Trieu Le, Lam Nguyen Tran Thanh, Hong Khanh Vo, Hoang Huong Luong, Khoi Nguyen Huynh Tuan, Tuan Dao Anh, The Anh Nguyen, Khang Hy Nguyen Vuong, Ha Xuan Son
A Differential Privacy Image Publishing Method Based on Wavelet Transform

Images are an important information-bearing medium with many important attributes. If image data are released directly, personal privacy can be compromised. This paper addresses how to use differential privacy to protect the privacy of image data while keeping the data highly usable. We propose a WIP method based on the wavelet transform. Firstly, the wavelet transform is used to compress the image; then, noise is added to the main features after transformation to obtain a published image satisfying differential privacy. This solves the problem of low usability for large images and the inability of the Fourier transform to handle abrupt signals. Experimental results show that, compared with similar frequency-domain methods, the denoised image obtained by the proposed WIP method is more distinguishable, its information entropy is closer to that of the original image, and its accuracy is 10% higher. The WIP method thus offers higher usability and robustness than other frequency-domain methods for image differential privacy protection.
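A minimal sketch of this pipeline, assuming a 2-D Haar transform via PyWavelets: Laplace noise is added only to the approximation (main-feature) coefficients before inverting the transform. The budget epsilon and sensitivity below are illustrative, not the paper's calibration.

import numpy as np
import pywt

def wip_publish(img, epsilon=1.0, sensitivity=1.0):
    cA, (cH, cV, cD) = pywt.dwt2(img.astype(float), 'haar')
    # Laplace mechanism on the approximation coefficients only
    cA += np.random.laplace(scale=sensitivity / epsilon, size=cA.shape)
    return pywt.idwt2((cA, (cH, cV, cD)), 'haar')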

Guifen Zhang, Hangui Wei, Lina Ge, Xia Qin
Traffic Matrix Prediction Based on Differential Privacy and LSTM

The traffic matrix (TM) is an important type of information in network management, needed for network load balancing and routing configuration. For technical and cost reasons, it is difficult to measure the matrix directly, but prediction can be used instead of direct measurement. The Long Short-Term Memory (LSTM) model in deep learning is well suited to time series forecasting. However, due to the characteristics of deep learning, the data samples used to train the network leave traces in the final model, which allows attackers to recover training samples through membership inference attacks, causing privacy leakage. This paper focuses on combining a differential privacy mechanism with the LSTM model: during the gradient descent stage of training, carefully controlled noise is added to protect the final model. In our experiments, we verify the feasibility of the proposed method on the Abilene dataset.
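A simplified, batch-level sketch of such a noisy gradient step in PyTorch (full DP-SGD clips per-example gradients; the clip norm C and noise multiplier sigma here are illustrative):

import torch

def dp_gradient_step(model, loss, optimizer, C=1.0, sigma=0.5):
    optimizer.zero_grad()
    loss.backward()
    # bound gradient sensitivity, then perturb before the update
    torch.nn.utils.clip_grad_norm_(model.parameters(), C)
    for p in model.parameters():
        if p.grad is not None:
            p.grad += torch.randn_like(p.grad) * sigma * C
    optimizer.step()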

Weizheng Li, Yingpeng Sang, Maliang Zhang, Jinghao Huang, Chaoxin Cai
A Blockchain-Based Continuous Query Differential Privacy Algorithm

In the interactive differential privacy framework, a data set must be used to answer multiple queries. As the privacy budget is gradually consumed, the risk of privacy disclosure increases, so it is essential to record and track the consumption of the privacy budget, which must not exceed the given limit. To this end, this paper first optimizes the Gaussian mechanism to reduce query response time; it then designs a Continuous Query Differential Privacy Mechanism (CQDPM) to save privacy budget and improve data availability, uses the blockchain to record the privacy budget so that its usage can be queried, and finally proposes a data integrity verification algorithm based on the blockchain. Experiments show that the proposed mechanism reduces query response time, effectively saves privacy budget under the same budget limit, and achieves higher data availability.
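A minimal sketch of the budget-tracking idea: each answered query appends an immutable record, and queries are refused once the total budget is exhausted. The paper stores these records on a blockchain; a plain list stands in for the ledger here.

ledger = []

def answer_query(cost, total_budget=1.0):
    spent = sum(r["cost"] for r in ledger)
    if spent + cost > total_budget:
        raise RuntimeError("privacy budget exhausted")
    ledger.append({"cost": cost})  # would be a blockchain transaction
    return True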

Heng Ouyang, Hongqin Lyu, Shigong Long, Hai Liu, Hongfa Ding
Formalization and Verification of Group Communication CoAP Using CSP

With the rapid expansion of the Internet of Things (IoT), the Constrained Application Protocol (CoAP) was developed to enable devices with small memory, constrained computing power and limited communication ability to talk to other nodes in the network. Meanwhile, group communication is very useful for managing and controlling a set of homogeneous devices in many IoT scenarios, so many scholars have worked on extending CoAP to support group communication. Furthermore, because CoAP is widely applied in transportation, health care, industry and many other areas, the security and consistency of its data are of great importance. In this paper, we adopt Communicating Sequential Processes (CSP) to model group communication CoAP, and we use the model checker Process Analysis Toolkit (PAT) to verify six properties of our model: deadlock freedom, divergence freedom, data reachability, data leakage, client faking and entity manager faking. The verification results show that the original architecture carries a risk of data leakage, so we enhance it by adding a message authentication code to the process. The new verification results show that this eliminates the possibility of data leakage.

Sini Chen, Ran Li, Huibiao Zhu
Backmatter
Metadata
Title: Parallel and Distributed Computing, Applications and Technologies
Editors: Hong Shen, Dr. Yingpeng Sang, Yong Zhang, Nong Xiao, Dr. Hamid R. Arabnia, Geoffrey Fox, Prof. Ajay Gupta, Manu Malek
Copyright Year: 2022
Electronic ISBN: 978-3-030-96772-7
Print ISBN: 978-3-030-96771-0
DOI: https://doi.org/10.1007/978-3-030-96772-7
