
2020 | Book

Parallel Architectures, Algorithms and Programming

10th International Symposium, PAAP 2019, Guangzhou, China, December 12–14, 2019, Revised Selected Papers


About this book

This book constitutes the refereed proceedings of the 10th International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2019, held in Guangzhou, China, in December 2019.

The 39 revised full papers and 8 revised short papers presented were carefully reviewed and selected from 121 submissions. The papers deal with research results and development activities in all aspects of parallel architectures, algorithms and programming techniques.




On a Coexisting Scheme for Multiple Flows in Multi-radio Multi-channel Wireless Mesh Networks

Multi-radio multi-channel (MRMC) wireless mesh networks (WMNs) hold the promise of becoming next-generation networks that provide ubiquitous computing, Internet access, and support for large numbers of data flows. Many applications in WMNs can be modeled as a multi-flow coexistence problem. Assembling links distributed across orthogonal channels to support multiple flows is essentially a combinatorial optimization problem involving channel assignment, path finding, and link scheduling. To make full use of network resources, links in different channel layers should be concatenated into data transfer paths for multiple flows with awareness of nodes' free interfaces and available channels. Based on an analysis of traffic behaviors, this paper designs a coexistence algorithm to maximize the number of flows. Simulations are conducted on combinatorial cases with various traffic requests between multiple pairs, and the results show the efficacy of the proposed algorithm over a random network topology. The scheme can be used to develop routing and scheduling solutions for multi-flow network tasks through prior computation.

Zhanmao Cao, Qisong Huang, Chase Q. Wu, Wenkang Kong, Aiqin Hou
Non-linear K-Barrier Coverage in Mobile Sensor Network

Intrusion detection is an important application of wireless sensor networks. A belt region is said to be k-barrier covered if any intruder crossing the width of the region is detected by at least k sensors. After an initial random deployment, k-barrier coverage can be achieved by relocating mobile sensors to construct k barriers appropriately. Because mobile sensors have limited energy, it is important to reduce the amount of sensor movement. Existing algorithms focus on forming k linear barriers energy-efficiently; however, moving sensors onto linear barriers entails substantial redundant movement. In this paper, we study how to form k non-linear barriers energy-efficiently, which results in smaller sensor movement than linear barriers. We define a notion of horizontal virtual force that considers both the Euclidean distance and the horizontal angle between sensors, and propose an energy-efficient algorithm that forms k non-linear barriers from the initial deployment of mobile sensors. The algorithm first divides the region into several subregions; it then constructs k sub-barriers from the left boundary of each subregion to its right boundary by repeatedly choosing the mobile sensor chain with the largest horizontal virtual force and flattening it; finally, it connects the sub-barriers of neighboring subregions to form k barriers across the whole region. Simulation results show that the algorithm markedly reduces the movement of mobile sensors compared to a linear k-barrier coverage algorithm and is applicable to large-scale sensor networks.

Zijing Ma, Shuangjuan Li, Longkun Guo, Guohua Wang
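The paper does not give the horizontal virtual force in closed form here; a minimal sketch, assuming the force grows as the Euclidean distance shrinks and as the segment connecting two sensors becomes more horizontal, might look like this (the exact formula is an assumption):

```python
import math

def horizontal_virtual_force(sensor, neighbor):
    """Hypothetical horizontal virtual force between two sensors.

    Assumed behavior: larger when the sensors are close and when the
    connecting segment is nearly horizontal (the paper's exact formula
    is not reproduced here).
    """
    dx = neighbor[0] - sensor[0]
    dy = neighbor[1] - sensor[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return float("inf")
    horiz = abs(dx) / d          # cosine of the angle to the horizontal axis
    return horiz / d             # large when close and nearly horizontal

# A close, nearly horizontal pair exerts a larger force than a far, vertical pair.
f_flat = horizontal_virtual_force((0, 0), (1, 0.1))
f_vert = horizontal_virtual_force((0, 0), (0, 3))
```

Under this assumption, chains of closely spaced, nearly horizontal sensors are preferred when growing a sub-barrier toward the subregion boundary.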
Interrupt Responsive Spinlock Mechanism Based on MCS for Multi-core RTOS

The kernel spinlock has a non-negligible influence on the real-time performance of a multi-core RTOS. To protect mutually exclusive kernel data accessed by CPU cores in both task context and interrupt context, an RTOS kernel that uses an existing FIFO spinlock algorithm as its kernel spinlock must disable interrupts before acquiring the lock, which increases interrupt response latency under fierce contention for the spinlock. In this work, the Interrupt Responsive Spinlock (IRS) mechanism, based on the MCS algorithm, allows CPU cores to respond to interrupts while spin-waiting and disables interrupts only while holding the spinlock, so the system can respond to interrupts in time without compromising kernel critical sections. Besides, MCS-IRS maintains compatibility with MCS semantics and is therefore transparent to the caller. Experiments show that MCS-IRS eliminates the impact of fierce spinlock contention on worst-case interrupt response latency and scales better across cores than MCS in worst-case interrupt disable time, improving the real-time performance of a multi-core RTOS.

Jingqiu Zheng, Jiali Bian, Jian Kuang
A Novel Speedup Evaluation for Multicore Architecture Based Topology of On-Chip Memory

Speedup measures the parallel potential of a program and is a standard evaluation metric for chip multiprocessors (CMPs). Researchers have contributed many methods for evaluating multicore speedup; however, the specific architecture has not yet been taken into consideration. This article presents a novel method, ETOM (Evaluation on Topology of On-chip Memory), to evaluate multicore speedup. The method obtains the speedup based on the specific architecture and its memory organization. To verify the validity of this approach, existing speedup laws and our method are compared in this work. The method makes it easier to obtain the speedup of a multicore architecture, especially in the initial phase of designing a new one. Using ETOM, a novel multicore architecture, TriBA, is obtained whose performance is better than that of 2DMesh. In the experiments, many detailed characteristics of 2DMesh were found, such as different speedups for cores in different locations. In TriBA, by contrast, every core has the same speedup under ETOM, and its speedup exceeds that of 2DMesh. The method is therefore well suited to evaluating the performance of specific architectures, and the new TriBA architecture has a more reasonable topology of on-chip memory than 2DMesh.

XiaoJun Wang, Feng Shi, Hong Zhang
Improving the Performance of Collective Communication for the On-Chip Network

Efficiently executing massively parallel applications has become an important goal in developing modern high-performance multicore computers. In these parallel programs, collective communication among cores consumes a large portion of inter-core communication. To prevent collective communication from becoming the performance bottleneck of the on-chip network, this paper proposes a new on-chip network, called Hierarchy Self Similar Cubic (HSSC), to reduce the latency of collective communication on multicore systems. Corresponding transmission mechanisms and a packet scheduling mechanism are proposed to analyze and group the packets and to determine a suitable transmission mechanism for each packet group on-the-fly. The experiments compare the performance of several on-chip networks, and the advantages of the proposed transmission and packet scheduling mechanisms are discussed.

Slo-Li Chu, Wen-Chih Ho, Yi-Jie Jiang
A Survey of Multicast Communication in Optical Network-on-Chip (ONoC)

Optical Network-on-Chip (ONoC) is an emerging chip-level optical interconnection technology for realising high-performance and power-efficient inter-core communication in many-core processors. For on-chip networks, multicast is an important communication pattern: it is not only widely used in parallel computing applications on Chip Multi-Processors (CMPs) but also commonly adopted in emerging areas such as neuromorphic computing. However, the optimisation of multicast communication in an ONoC is not well studied, and there is significant room to improve its performance. In this paper, we present a comprehensive survey of multicast communication in ONoCs, covering the development of both architecture design and networking design as well as our recent research outcomes. Moreover, we identify design challenges and future research directions for optimising multicast in an ONoC, which can provide guidance and insight for future researchers and chip developers.

Wen Yang, Yawen Chen, Zhiyi Huang, Haibo Zhang, Huaxi Gu, Cui Yu
Virtual Network Embedding Based on Core and Coritivity of Graph

Network virtualization is an effective approach to the high cost of rebuilding underlying hardware facilities: it enables multiple virtual networks to share one substrate network. To achieve higher performance at lower cost, effectively embedding virtual networks into the substrate network has become a key issue. To this end, we design and implement a heuristic virtual network embedding algorithm based on the core and coritivity of a graph. The algorithm finds the core nodes of the virtual network and embeds them into the substrate network first. Its performance is evaluated against three other algorithms in experiments; the results show that the proposed algorithm is more efficient and noticeably improves the processing time of virtual network requests.

Jie Yang, Chenggui Zhao
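The core and coritivity mentioned above have a standard graph-theoretic definition: the coritivity h(G) is the maximum of w(G - S) - |S| over vertex subsets S, where w counts connected components, and a core is a maximizing S. A brute-force sketch for small graphs (the paper's heuristic embedding is not reproduced here):

```python
from itertools import combinations

def components(nodes, edges):
    """Number of connected components of the graph induced on `nodes`."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, count = set(), 0
    for v in nodes:
        if v not in seen:
            count += 1
            stack = [v]
            while stack:
                x = stack.pop()
                if x not in seen:
                    seen.add(x)
                    stack.extend(adj[x] - seen)
    return count

def core_and_coritivity(nodes, edges):
    """Brute force: h(G) = max over nonempty proper S of components(G - S) - |S|."""
    best, core = float("-inf"), None
    for k in range(1, len(nodes)):
        for S in combinations(nodes, k):
            rest = set(nodes) - set(S)
            val = components(rest, edges) - k
            if val > best:
                best, core = val, set(S)
    return core, best

# Star graph: removing the center shatters the graph, so the center is the core.
star_nodes = [0, 1, 2, 3, 4]
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
```

Brute force is exponential; it only illustrates the definition the heuristic approximates for the core nodes that get embedded first.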
Non-time-Sharing Full-Duplex SWIPT Relay System with Energy Access Point

Utilizing idle wireless devices with surplus energy as an additional energy access point (EAP) to supplement the relay's energy, we propose a non-time-sharing full-duplex amplify-and-forward (AF) relay system with energy access points based on radio-frequency (RF) signals in wireless networks. The system uses the AF protocol for cooperative information transmission and simultaneous wireless information and power transfer (SWIPT) technology to transmit information and energy synchronously in the energy-constrained relay system. The relay eliminates the self-interference signal through self-energy recycling in the loop channel, and adopts a power-splitting scheme for information decoding and energy harvesting from RF signals. Moreover, owing to the non-time-sharing transmission characteristic, information transmission, energy harvesting, and cooperative transmission are completed synchronously within one time block. Taking maximum system throughput as the optimization target, we jointly optimize the relay transmit power, the relay transmit beamforming vector, and the power-splitting ratio, and transform the original multivariate non-convex problem into a semi-definite programming problem using quadratic optimization, variable reduction, and the Lagrange method. Simulation experiments show that, for a fixed total harvested energy, the system's achievable rate can be effectively improved by increasing the energy harvested from the EAP, and that the relay's self-energy recycling improves the system's throughput gain. The experimental results also verify that the proposed system, which applies the non-time-sharing transmission protocol and SWIPT technology, achieves more significant performance gains than HD-SWIPT and FD-no-SWIPT relay systems.

Zhou Yening, Li Taoshen, Wang Zhe, Ye Jin
Recent Developments in Content Delivery Network: A Survey

With the development of computer networks, the amount of network information continues to grow. To facilitate the transfer of this increasing information, a content delivery network (CDN) adds an intermediate layer on top of the existing network. Technically, the caching strategy is the most important mechanism of a CDN and heavily impacts its performance. On the other hand, considering the cost of operating a CDN, strategies have been proposed to reduce CDN cost in terms of, e.g., energy consumption. This paper briefly reviews recent developments in CDNs with respect to caching strategy and operating cost, and discusses some potential directions for CDN development.

Jiaming Zhao, Pingjia Liang, Weiming Liufu, Zhengping Fan

High Performance Systems

Weighted Mean Deviation Similarity Index for Objective Omnidirectional Video Quality Assessment

Objective quality assessment for omnidirectional videos is essential for the viewing experience and for the optimization processes of Virtual Reality (VR) technologies. Currently, the most widely used objective video quality assessment (VQA) metrics for omnidirectional videos are based on PSNR. Because PSNR usually shows weak consistency with human perception, we adapt the mean deviation similarity index (MDSI) to omnidirectional VQA and present a weighted mean deviation similarity index (WS-MDSI) for objective omnidirectional VQA. Because an omnidirectional video usually has a large frame resolution and a large number of frames, we also investigate the influence of temporal sampling on VQA. The proposed WS-MDSI index is evaluated on an omnidirectional VQA database. Experimental results show that, compared with state-of-the-art PSNR-based methods, WS-MDSI improves performance by 23.5% and 23.25% on the SROCC and PLCC indicators, respectively. We also found that a 1/4 reduction in temporal resolution does not significantly affect the performance of objective VQA for omnidirectional video.

Mengzhen Zhong, Junhao Chen, Yuzhen Niu
Tire X-ray Image Defects Detection Based on Adaptive Thresholding Method

Thin line defects are a common and critical kind of tire defect, which may cause serious accidents. This paper proposes a novel adaptive-threshold model for thin line defect detection in tire X-ray images. First, a new adaptive binarization algorithm using column-based threshold selection is proposed. The proposed algorithm outperforms previous algorithms on hard examples, without introducing extra spots or distortions to the original defect area. Next, a tire X-ray image segmentation algorithm is developed, which divides the image into sectors with different texture features. Finally, an adaptive criterion algorithm is introduced for thin line defect detection, which can deal with images shot from different angles. The proposed model is evaluated on a tire X-ray data set composed of tire images with various thin line defect types. Experimental results demonstrate that the proposed model obtains significant improvement in both recall rate and precision rate compared with conventional models.

Yuxiang Zhang, Naijie Gu, Xiaoci Zhang, Chuanwen Lin
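The column-based thresholding step can be sketched with a simple per-column statistic; the rule below (mean minus k standard deviations, with an assumed k) is a stand-in for the paper's more elaborate threshold selection:

```python
from statistics import mean, stdev

def column_thresholds(image, k=0.5):
    """Hypothetical per-column threshold t_c = mean_c - k * std_c.

    `image` is a list of rows of grayscale intensities; the paper's actual
    column statistic is not specified here.
    """
    cols = list(zip(*image))
    return [mean(c) - k * (stdev(c) if len(c) > 1 else 0) for c in cols]

def binarize(image, k=0.5):
    """Mark pixels darker than their column threshold as defect candidates (1)."""
    ts = column_thresholds(image, k)
    return [[1 if px < ts[j] else 0 for j, px in enumerate(row)] for row in image]

# A dark streak confined to column 1 stands out against that column's statistics.
img = [[200, 90, 200],
       [200, 95, 200],
       [200, 200, 200],
       [200, 200, 200]]
```

Because the threshold is computed per column, a uniform bright column never produces false positives, while a locally dark run in one column is flagged.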
Halftone Image Reconstruction Based on SLIC Superpixel Algorithm

This paper proposes a halftone image reconstruction method based on the SLIC (Simple Linear Iterative Clustering) superpixel algorithm and the affinity propagation algorithm. First, the halftone image is segmented by the SLIC superpixel algorithm. Second, the affinity propagation algorithm is used to cluster the regions produced by the superpixel segmentation. After the background is removed, the image is vectorized. A smooth background image is obtained with linear and nonlinear smoothing filters. Finally, the vectorized boundary and the smooth background are combined to produce the reconstructed image. Boundary information is effectively retained during reconstruction, and the proposed method effectively removes halftone and screen patterns.

Xinhong Zhang, Boyan Zhang, Fan Zhang
Study on the Method of Extracting Diabetes History from Unstructured Chinese Electronic Medical Record

In this paper, based on real electronic medical record data from a hospital, a customized rule-based learning and information extraction method is designed, and Chinese information extraction is realized in three steps: sampling and labeling, template summarization, and rule-based extraction. The medical history sections of 600 randomly selected electronic medical records (including current medical history, past history, personal history, family history, etc.) were annotated for the information to be extracted (taking diabetes history as an example) on a labeling platform developed in this study. From the annotation results, extraction templates are summarized; regular-expression extraction rules are derived directly from the templates, and these rules are then used to extract the actual information. Both manual and automatic verification are used to confirm the effectiveness of the method. Using natural language processing and rule-based information extraction, an algorithm for extracting customized information from unstructured Chinese electronic medical record text is designed and implemented. For the extraction of diabetes history in the hospital, field verification in a single department achieved good results.

Chengzhi Niu, Xiaofan Zhao
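The rule-based extraction step can be sketched with regular expressions; the two patterns below are hypothetical stand-ins for the templates the authors summarized from annotated records, not their actual rules:

```python
import re

# Hypothetical extraction rules in the spirit of the paper.
DIABETES_PATTERNS = [
    re.compile(r"(糖尿病史?\s*(\d+)\s*年)"),   # "diabetes history for N years"
    re.compile(r"(否认糖尿病)"),               # "denies diabetes"
]

def extract_diabetes_history(text):
    """Return every diabetes-history mention matched by the rule set."""
    hits = []
    for pat in DIABETES_PATTERNS:
        hits.extend(m.group(1) for m in pat.finditer(text))
    return hits

# Illustrative (fabricated) record snippet, not real patient data.
record = "患者既往有高血压病史，糖尿病史10年，否认冠心病。"
```

In the paper's pipeline, such rules are not hand-written but derived from templates summarized over the 600 annotated records.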
Deep Residual Optimization for Stereoscopic Image Color Correction

The color correction algorithm is designed to eliminate color discrepancies between image pairs. Compared with conventional color correction, color correction for 3D stereoscopic images not only needs to achieve color consistency between the resulting image and the reference image but is also expected to ensure structural consistency between the resulting image and the target image. To address this problem, we propose a stereoscopic image color correction algorithm based on deep residual optimization. First, we obtain an initial result image by fusing a global color correction image and a dense matching image of the stereo image pair. Then, a residual image optimization scheme is used to correct the structural deformation and color inconsistency of the initial result caused by mismatching and fusion. By combining the target image with the optimized residual image, the structure and clarity of the target image can be retained to the maximum extent. In addition, we use a perceptual loss and a per-pixel loss to reduce structural deformation and local color inconsistency while training the optimization network. Experimental results show the effectiveness of our method.

Yuanyuan Fan, Pengyu Liu, Yuzhen Niu
Old Man Fall Detection Based on Surveillance Video Object Tracking

Image recognition technology based on deep learning has made great progress, enabling object detection to be applied in many fields. The number of elderly people in China has risen year by year, signaling the arrival of an aging society, and preventing falls among the elderly is a broad consensus. Using object detection algorithms to detect falls of the elderly is a research hotspot in the field of object detection. Based on an analysis of object detection and object tracking algorithms, the Deep-SORT and YOLOv3 algorithms are combined to achieve real-time fall detection in surveillance video. The experimental results show that the combination of YOLOv3 and Deep-SORT can detect falls of the elderly.

Zhao Qiu, Xiaoquan Liang, Qiaoqiao Chen, Xiangsheng Huang, Yiping Wang
Electric Bicycle Violation Automatic Detection in Unconstrained Scenarios

Object detection technology is developing rapidly and has broad application prospects, yet there is little research on detecting electric bicycle violations, so applying object detection to this task is of great practical significance. The main violations of electric bicycle riders are not wearing a safety helmet, riding without a license plate, overloading, and so on. We train YOLOv3 on datasets of electric bicycle riders, safety helmets, and license plates to detect violations, and use the chineseocr toolkit to recognize license plate numbers. The experiments show that the presented method achieves high detection accuracy for riders, safety helmets, and license plates, while the recognition accuracy for license plate numbers is somewhat lower.

Zhao Qiu, Qiaoqiao Chen, Xiangsheng Huang, Xiaoquan Liang
Building a Lightweight Container-Based Experimental Platform for HPC Education

HPC (High Performance Computing) is of great significance due to its excellent performance in computing acceleration. However, unlike other techniques in computer science, learning HPC requires advanced computing resources, such as large-scale clusters, which directly increases the cost of study for students. To help students learn HPC programming easily, we design and develop a lightweight container-based experimental platform that provides easily accessible and customizable HPC practice environments. The platform integrates multiple practical functional modules for students, teachers, and administrators, respectively. Users can conveniently access high-performance computing resources via a web portal, use highly customizable base environments, and enjoy a graphical, hands-on, interactive HPC learning experience.

Zelong Wang, Di Wu, Zhenxiao Luo, Yunfei Du
Automatic Generation and Assessment of Student Assignments for Parallel Programming Learning

The course of parallel programming is becoming more and more important in the education of students majoring in computer science. However, parallel programming is not easy to learn well due to its demanding theoretical and practical requirements. In this paper, we design and implement an automatic assignment generation and assessment system to help students learn parallel programming. Assignments are generated according to user behavior and thus guide students to learn parallel programming step by step. Besides, the system automatically generates an overall assessment of student assignments by using fuzzy string matching, which provides an approximate reference score for objective questions. Subjective questions can be assessed directly by comparing the answer to the reference answer. The system also provides a friendly user interface for students to complete online assignments and lets teachers manage their question database. In our teaching practice, students learn parallel programming more effectively with the help of such an assignment generation and assessment system.

Zhenxiao Luo, Zelong Wang, Di Wu, Xiaojun Hei, Yunfei Du
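The fuzzy-matching score for objective questions can be sketched with the standard library's sequence matcher; the actual matcher and scoring scale used by the system are not specified, so both are assumptions here:

```python
from difflib import SequenceMatcher

def approximate_score(answer, reference, full_marks=10):
    """Approximate reference score via fuzzy string matching.

    Uses difflib's similarity ratio as a stand-in for the system's
    matcher; case differences are ignored.
    """
    ratio = SequenceMatcher(None, answer.lower(), reference.lower()).ratio()
    return round(full_marks * ratio, 1)
```

An exact (case-insensitive) match earns full marks, while partially correct answers earn a proportional approximate score that a teacher can then adjust.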
HSM: A Hybrid and Scalable Metadata Management Method in Distributed File Systems

In the big data era, metadata performance is critical in modern distributed file systems. Traditionally, metadata management strategies such as subtree partitioning focus on preserving namespace locality, while others, such as hash-based mapping, aim to offer good load balance. Nevertheless, none of these methods achieves both desirable properties simultaneously. To close this gap, we propose a novel metadata management scheme, HSM², which combines subtree partitioning and hash-based mapping. We implemented HSM² in CephFS, a widely deployed distributed file system, and conducted a comprehensive set of metadata-intensive experiments. Experimental results show that HSM² achieves better namespace locality and load balance simultaneously. Compared with CephFS, HSM² reduces completion time by 70% and achieves a 3.9× overall throughput speedup for a file-scanning workload.

Yiduo Wang, Youxu Chen, Xinyang Shao, Jinzhong Chen, Liu Yuan, Yinlong Xu
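The hybrid idea (subtree partitioning for locality plus hash-based mapping for balance) can be sketched as a placement function; this is an assumed illustration, not the actual CephFS or HSM² code:

```python
import hashlib

def place_metadata(path, subtree_map, n_servers):
    """Hypothetical hybrid metadata placement.

    Paths under a pinned subtree keep namespace locality on that subtree's
    metadata server; everything else is hash-mapped for load balance.
    """
    for root, server in subtree_map.items():
        if path == root or path.startswith(root.rstrip("/") + "/"):
            return server
    digest = hashlib.md5(path.encode()).hexdigest()
    return int(digest, 16) % n_servers
```

A lookup under a pinned directory always lands on one server (locality), while unrelated paths spread deterministically across all servers (balance).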


Heuristic Load Scheduling Algorithm for Stateful Cloud BPM Engine

With the increasing popularity of cloud computing, traditional Business Process Management Systems (BPMSs) have begun to move to cloud-deployed architectures. Since a traditional BPMS is often implemented as a stateful single-instance system, cloud BPMS providers encounter a stateful-service load scheduling problem when refactoring it into a microservice architecture deployed on cloud servers. To help realize this transformation and improve the load capacity of a cloud BPMS with limited computing resources, we propose a heuristic load scheduling algorithm for stateful services. The algorithm uses the busyness metrics of individual BPMS engine instance microservices in the cloud BPMS architecture. Because such resource scheduling is commonly formulated as an online bin packing problem, we improve on the Best-Fit algorithm: we propose a Best-Fit Decreasing algorithm, based on engine busyness measurement and load prediction, to schedule computing resources for business process instances. Compared with common load scheduling algorithms, our algorithm increases the load level of the cloud BPMS by at least 15%.

Haotian Zhang, Yang Yu, Maolin Pan
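The Best-Fit Decreasing rule mentioned above is the standard bin packing heuristic; a minimal sketch, treating engine instances as bins and process-instance busyness as item size (the paper's load-prediction component is omitted):

```python
def best_fit_decreasing(loads, capacity):
    """Best-Fit Decreasing: place each load (largest first) into the feasible
    bin with the least remaining capacity; open a new bin if none fits."""
    bins = []        # remaining capacity of each open bin
    assignment = []  # (load, bin_index) pairs
    for load in sorted(loads, reverse=True):
        best = None
        for i, free in enumerate(bins):
            if load <= free and (best is None or free < bins[best]):
                best = i
        if best is None:
            bins.append(capacity - load)
            assignment.append((load, len(bins) - 1))
        else:
            bins[best] -= load
            assignment.append((load, best))
    return assignment, len(bins)
```

Sorting items in decreasing order before the best-fit pass is what lets the heuristic pack tightly; here the item loads sum to 27 against bins of capacity 10, and three bins suffice.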
An Improved Heuristic-Dynamic Programming Algorithm for Rectangular Cutting Problem

In this paper, we discuss a two-dimensional cutting problem with defects: given a rectangular area containing defects, the objective is to cut from it rectangles of given shapes and orientations that do not overlap the defects, maximizing the profit associated with the rectangles generated by the cutting. We present an improved Heuristic-Dynamic Program (IHDP) to solve this problem and prove a theorem about its complexity. The algorithm reduces the size of the discretization sets: it solves one-dimensional knapsack problems over the widths and heights of the small rectangular blocks, constructs two discretization sets from the obtained solutions together with the right and upper boundaries of the defects, and performs trial cut lines at each element of the discretization sets. The algorithm was evaluated on 14 internationally recognized benchmark instances. The experimental results show that it obtains the optimal solution on these instances, with computing efficiency nearly ten times that of the latest algorithm in the literature.

Aihua Yin, Chong Chen, Dongping Hu, Jianghai Huang, Fan Yang
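The one-dimensional knapsack used to build the discretization sets computes the classic set of candidate cut positions (normal patterns): every coordinate reachable as a nonnegative integer combination of piece widths. A sketch of that subroutine (the defect boundaries, which the paper also adds to the sets, are omitted):

```python
def discretization_points(piece_widths, W):
    """All x in [0, W] expressible as a nonnegative integer combination of
    the piece widths; only these positions need trial cut lines."""
    reachable = [False] * (W + 1)
    reachable[0] = True
    for x in range(1, W + 1):
        reachable[x] = any(x >= w and reachable[x - w] for w in piece_widths)
    return [x for x in range(W + 1) if reachable[x]]
```

For pieces of widths 3 and 5 in a strip of width 10, only 7 of the 11 positions survive, which is exactly how the discretization sets shrink the search space.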
Constrained Optimization via Quantum Genetic Algorithm for Task Scheduling Problem

Task scheduling is one of the most important issues on heterogeneous multiprocessor systems. In this paper, the problem is defined as performance-constrained energy optimization, a constrained optimization problem (COP) commonly encountered in practice. Task scheduling as a COP is NP-hard and is usually handled by heuristic or meta-heuristic methods. The classic quantum genetic algorithm is an excellent meta-heuristic, but it is hardly ever used to handle COPs because the quantum rotation gate can only deal with single-objective problems; moreover, it is difficult to model task scheduling problems so that they can be handled by a quantum genetic algorithm. To handle COPs in task scheduling on heterogeneous multiprocessor systems, we propose a new quantum genetic algorithm. In our algorithm, a chromosome consists of a task-sequence part and a mapping part. The task-sequence part is generated by a list scheduling algorithm, which improves the parallelism of the tasks. The mapping part indicates which processor each task will run on; it is encoded as quantum bits and takes part in the evolution guided by the quantum genetic algorithm. Besides, we adopt an adaptive penalty method, a constraint-handling technique, to transform the COP into a single-objective problem. Simulation results show the superiority of our method compared with state-of-the-art algorithms.

Zihan Yan, Hong Shen, Huiming Huang, Zexi Deng
A Method of Business Process Bottleneck Detection

To improve the efficiency of business processes and ensure the timeliness of cases, an approach for bottleneck detection is proposed that gives a detailed definition of bottlenecks in business processes. Starting from the overall performance of the system, the approach reduces process congestion by detecting and relieving bottlenecks based on event log analysis. By extracting information such as task arrival rates and maximum service rates from the event log, it analyzes the historical trend of each task's congestion rate, and finally combines task completion time with historical congestion to detect bottlenecks. Experiments show that this event-log-based method identifies bottlenecks in the business process well, and that relieving the detected bottlenecks effectively improves the case completion rate and the average completion time of the process.

Jiexuan Chen, Yang Yu, Maolin Pan
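The congestion-rate idea can be sketched from an event log as the ratio of estimated arrival rate to service rate per task; the log format and the threshold of 1.0 below are assumptions, not the paper's exact definitions:

```python
from collections import defaultdict

def congestion_rates(log):
    """Estimate per-task congestion as arrival rate / service rate.

    `log` holds (task, arrival_time, start_time, end_time) tuples;
    arrival rate comes from inter-arrival gaps, service rate from durations.
    """
    arrivals, services = defaultdict(list), defaultdict(list)
    for task, arrive, start, end in log:
        arrivals[task].append(arrive)
        services[task].append(end - start)
    rates = {}
    for task in arrivals:
        ts = sorted(arrivals[task])
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if not gaps or sum(gaps) == 0:
            continue
        arrival_rate = len(gaps) / sum(gaps)
        service_rate = len(services[task]) / sum(services[task])
        rates[task] = arrival_rate / service_rate
    return rates

def bottlenecks(log, threshold=1.0):
    """Tasks whose arrivals outpace their service capacity."""
    return [t for t, r in congestion_rates(log).items() if r > threshold]
```

A ratio above 1 means work arrives faster than the task can serve it, so its queue, and the case completion time, grows over time.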
Optimized Layout of the Soil Moisture Sensor in Tea Plantations Based on Improved Dijkstra Algorithm

Based on the clustering centers of the data, this paper optimizes the data transmission path and proposes an improved Dijkstra algorithm, which is applied to the path optimization of soil moisture sensors in tea plantations. First, soil moisture data in the tea plantation are collected under full coverage of the sensor network. Then, the AP clustering algorithm is used to cluster the collected data and obtain the cluster centers. Next, a weighted combination of the dissimilarity of the soil moisture data and the distance between sensor nodes defines the edge weights, from which the adjacency matrix for the Dijkstra algorithm is computed. Finally, with a cluster center as the starting point and the sink of the wireless sensor network as the end point, the Dijkstra algorithm searches for the path. To verify its superiority, the proposed algorithm is compared with the ant colony optimization algorithm. Our method obtains a data dissimilarity along the path of 25.0652, a total path cost of 0.3613, a deviation of 0.1872 from the plantation's average soil moisture, and requires 6 sensors; the ant colony algorithm obtains a data dissimilarity of 20.4538, a total path cost of 0.5483, a deviation of 0.7321, and requires 9 sensors. The test results show that the path obtained by our method has the largest data dissimilarity and the shortest length, so the collected data are representative and accurately reflect the distribution of soil moisture in the tea plantation. At the same time, the number of sensors is reduced from 25 to 6, lowering the cost of the system.

Manman Zhang, Wu Zhang, Xun Hong, Yifan Song, Yuan Rao, Yujia Gao, Yunyun Sun
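The edge weighting and the shortest-path search can be sketched as follows; the alpha mixing of distance and dissimilarity is a hypothetical stand-in for the paper's weighted combination, while the search itself is textbook Dijkstra:

```python
import heapq
import math

def combined_weight(p, q, dissimilarity, alpha=0.5):
    """Hypothetical edge weight: small for short hops between sensors whose
    readings are dissimilar, so the path favors informative nodes."""
    return alpha * math.dist(p, q) + (1 - alpha) / (1 + dissimilarity)

def dijkstra(adj, src):
    """Textbook Dijkstra over an adjacency list {u: [(v, weight), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Tiny sensor graph: going 0 -> 1 -> 2 beats the direct 0 -> 2 edge.
adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0)]}
```

In the paper's setting, the source is a cluster center, the destination is the sink, and every edge weight comes from a function like `combined_weight`.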
Efficient Algorithms for Constrained Clustering with Side Information

Clustering, as an unsupervised machine learning method, has broad applications in data science and natural language processing. In this paper, we use background knowledge, or side information, about the data as constraints to improve clustering accuracy. Following the representation method of [15], we first format the side information as a must-link set and a cannot-link set. We then propose a constrained k-means algorithm for clustering the data. The key idea for clustering must-link data sets is to treat each set as a single data point with large volume, that is, to assign a set of must-linked data as a whole to the center closest to its mass center. In contrast, the key for clustering a cannot-link data set is to transform the assignment of the involved data points into the computation of a minimum-weight perfect matching. Finally, we carried out numerical simulations to evaluate our constrained k-means algorithms on UCI datasets. The experimental results demonstrate that our method outperforms the previous constrained k-means as well as the classical k-means in both clustering accuracy and runtime.

Zhendong Hao, Longkun Guo, Pei Yao, Peihuang Huang, Huihong Peng
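The must-link rule described above (assign each must-link set as a whole to the center nearest its mass center) can be sketched as a single assignment step; the cannot-link matching step and the center-update iterations are omitted:

```python
import math

def assign_with_must_links(points, must_link_sets, centers):
    """One assignment step of a constrained k-means in the paper's spirit:
    each must-link set goes as a whole to the center closest to its mass
    center; remaining points are assigned individually."""
    def nearest(p):
        return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

    labels, grouped = {}, set()
    for group in must_link_sets:
        xs = [points[i] for i in group]
        mass = tuple(sum(coord) / len(xs) for coord in zip(*xs))
        c = nearest(mass)
        for i in group:
            labels[i] = c
            grouped.add(i)
    for i, p in enumerate(points):
        if i not in grouped:
            labels[i] = nearest(p)
    return labels
```

Treating the set as one weighted point guarantees the must-link constraint is satisfied by construction rather than checked after the fact.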
Multi-objective Scheduling of Logistics UAVs Based on Simulated Annealing

This study focuses on the problem of logistics Unmanned Aerial Vehicle (UAV) distribution in urban environments and presents an automatic delivery system to support the delivery of packages. The system can effectively integrate existing facilities and be easily deployed. It gives rise to a scheduling problem with multiple UAVs and multiple flights, in which we optimize the two objectives of customer satisfaction and total completion time. The scheduling problem is formulated as a Mixed Integer Linear Program (MILP), and we propose a multiple-objective decision-making method. A special encoding method suitable for small-scale problems is presented, and a Simulated Annealing (SA) algorithm framework is used to generate an approximately optimal solution. In experiments, we calibrate the important parameters and analyze the robustness of the algorithm. The experimental results show that the proposed algorithm is well suited to this problem.
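The SA framework mentioned above follows the standard skeleton sketched below. The encoding, cost function (e.g. a weighted sum of completion time and dissatisfaction), and neighbor move are problem-specific and are left as assumptions here.

```python
import math, random

def simulated_annealing(init, cost, neighbor, t0=1.0, cooling=0.95,
                        steps=500, seed=0):
    # Generic SA: accept any improving move; accept a worsening move
    # with probability exp(-delta / temperature), then cool down.
    rng = random.Random(seed)
    cur, cur_cost = init, cost(init)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        c = cost(cand)
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / max(t, 1e-9)):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        t *= cooling
    return best, best_cost
```

For the UAV problem, `init` would be an encoded flight schedule and `neighbor` a small perturbation of it (e.g. swapping two deliveries).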

Yixuan Li, Xiaoxiang Yuan, Jie Zhu, Haiping Huang, Min Wu
IFME: Influence Function Based Model Explanation for Black Box Decision Systems

Due to their high precision and the huge demand for predictions, decision systems based on machine learning models have been widely adopted in all walks of life. They are usually constructed as black boxes based on sophisticated, opaque learning models. The lack of a human-understandable explanation of the inner logic of these models, and of the reasons behind the predictions from such systems, causes a serious trust issue. Interpretable Machine Learning methods can be used to alleviate this problem by providing explanations for the models or predictions. In this work, we focus on the model explanation problem, which studies how to explain a black-box prediction model globally through human-understandable explanations. We propose the Influence Function based Model Explanation (IFME) method to provide an interpretable model explanation based on key training points selected through the influence function. First, our method introduces a novel local prediction interpreter, which also utilizes the key training points for local prediction. Then it finds the training points that are globally key to the learning model via the influence function. Finally, we provide an influence function based, model-agnostic explanation of the model in use. We also show the efficiency of our method through both theoretical analysis and simulated experiments.

Benbo Zha, Hong Shen
GPU-Accelerated Parallel Aligning Long Reads with High Error Rate Using Enhanced Sparse Suffix Array

Read alignment (sequence alignment) is one of the most basic and time-consuming problems in Bioinformatics. In this paper, a CPU-GPU parallel long-read alignment method is studied to solve this problem. A lightweight data structure using an enhanced sparse suffix array is used to store the index of the reference genome in order to fit the limited memory capacity of the GPU architecture. The two-dimensional search space between the reference genome and the long reads is divided into several search sub-spaces, and the massive long-read alignment is further divided into multiple long-read alignments of smaller size. A CPU-GPU parallel algorithm for aligning long reads with high error rates is implemented by improving the seed selection scheme. The experimental results show that the parallel algorithm can remarkably accelerate long-read alignment while maintaining the overall alignment accuracy and recall rate.

Hao Wei, Cheng Zhong, Danyang Chen, Mengxiao Yin, Jinxiong Zhang
An Improved Algorithm for Building Suffix Array in External Memory

A suffix array (SA) is a data structure that has been widely used in many string processing applications, and it can be built on the external memory model using the induced sorting (IS) method when the size of the input and output exceeds the capacity of internal memory. In this paper, we make a first attempt to improve the performance of DSA-IS using new substring sorting and naming methods in the reduction phase. The experimental results indicate that our program for the adapted algorithm DSA-IS+ runs as fast as eSAIS and consumes only half as much disk space as the latter on various real-world datasets.
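For reference, the structure both of these papers build can be defined with the naive in-memory construction below; induced-sorting algorithms such as DSA-IS compute the same array in linear work without materializing the suffixes, which is what makes external-memory construction feasible.

```python
def suffix_array(s):
    # Naive reference construction: sort the suffix start positions
    # lexicographically by suffix text. O(n^2 log n) worst case and
    # in-memory only -- fine as a correctness oracle for small inputs.
    return sorted(range(len(s)), key=lambda i: s[i:])
```

For "banana", the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", giving start positions [5, 3, 1, 0, 4, 2].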

Yi Wu, Bin Lao, Xinghui Ma, Ge Nong
In-Place Suffix Sorting on a Multicore Computer with Better Design

A suffix index built on the suffix array of data is fundamental for true full-text searches over the data. The key problem in building the suffix index is how to compute the suffix array by sorting the suffixes of the data in a time- and space-efficient way. According to the results reported to date, pSACAK achieves the best suffix sorting performance on the parallel computing model of a multicore machine with internal memory. However, its naming procedure is time-consuming and constitutes a performance bottleneck. This article presents an optimized algorithm called pSACAK+ with a new naming procedure to attack this problem. Our experiments show that pSACAK+ runs nearly 20% faster than pSACAK with similar space consumption, which sets a new performance benchmark for suffix sorting on a multicore machine with internal memory.

Jing Yi Xie, Bin Lao, Ge Nong

Security and Privacy

Any Privacy Risk if Nobody’s Personal Information Being Collected?

To alleviate the growing concern about privacy breaches in online services, some systems do not ask users for any demographic information (DI), such as gender or age. Keeping user DI private in such systems seems guaranteed. However, as a trend report, some other systems may publish the statistical preferences of users with different DI, e.g. 80% of the buyers of one product being young females. Intuitively, such statistical preferences should not raise any privacy risk in the former type of system, since specific personal behaviour or DI cannot be reconstructed from the statistical data. In this paper, however, we reveal that this is not the case. We propose an unsupervised transfer learning scheme to learn multidimensional DI vectors for individual users and topics from external statistical preferences. To facilitate unsupervised learning, we apply it to a rating recommendation task and concatenate the DI vector with the implicit preference vector. To compare the privacy risk in scenarios with and without DI available, we either conceal or retain the DI in the datasets of real rating systems. Experiments show that our unsupervised learning scheme based on external statistical preferences in the DI-unavailable scenario performs almost as well as the corresponding supervised learning scheme in the DI-available scenario on the same system. This verifies the existence of privacy risks in systems that do not collect personal information, because the statistical preferences of similar systems are readily at hand.

Tingting Feng, Xiaoyu Li, Yuchun Guo, Ti Wang, Shan Yang, Zhongyan Du
Blockchain-Based Access Control Schemes for Medical Image Analysis System

Medical image analysis systems with machine learning have played an important role in computer-aided diagnosis and treatment of diseases. However, the individual privacy of user data is vulnerable, since the training data may be exposed to unauthorized users. Therefore, this paper designs an access control scheme to prevent illegal users from accessing medical data while achieving high accuracy in lesion classification. Specifically, in the novel lightweight consortium blockchain-based access control scheme, a chosen consortium node is utilized as the key generation center instead of the trusted third party of conventional schemes. Two public retinal datasets are utilized for the classification of diabetic retinopathy (DR). Security analysis shows that the proposed scheme can protect the user data from leakage and malicious tampering. Experimental results demonstrate that the data cleaning process efficiently increases the accuracy of the classification of early DR lesions by removing low-quality images, with accuracy of up to 90.2%.

Tonglai Liu, Jigang Wu, Xinpeng Zhang, Zhihao Peng, Jingyi Li
Research on DDoS Abnormal Traffic Detection Under SDN Network

The separation of the control layer from the data layer through SDN (software defined networking) enables network administrators to plan the network programmatically without changing network devices, realizing flexible configuration of network devices and fast forwarding of data flows. However, due to its construction, SDN is vulnerable to Distributed Denial of Service (DDoS) attacks, so it is important to detect DDoS attacks in SDN networks. This paper presents a DDoS detection scheme based on the SVM (support vector machine) machine learning method in an SDN environment. By extracting flow table features in the SDN network, the data is examined and a data model of DDoS traffic is trained, finally realizing the identification of DDoS abnormal traffic.

Zhaohui Ma, Jialiang Huang
An Efficient Group Signature Based Digital Currency System

Digital currency regulation is a hot topic. Traditional privacy-enhanced digital currency systems, like CryptoNote, seek to protect the privacy of senders and receivers. This paper presents a digital currency system based on the group signature scheme of Boneh et al. The system can protect users’ privacy while enabling regulation. The system uses the one-time address technology of CryptoNote to achieve unlinkability. It uses the group signature and an “OR” proof of the equality of two discrete logarithms to achieve untraceability. The group manager of a group signature can open a problematic transaction, recover the real identity of the sender, and revoke the sender’s private key if needed, which makes the digital currency regulatable.

Haibo Tian, Peiran Luo, Yinxue Su
Detecting Anomalies in Cluster System Using Hybrid Deep Learning Model

Anomaly detection is of great importance for data centers. It can help operators discover system failures and perform root cause analysis. In recent years, deep learning has achieved great results in many fields; therefore, attention has turned to applying deep learning to automatic anomaly detection. The Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are two classical structures in deep learning that can effectively detect anomalies from system logs, but the existing CNN-based and LSTM-based models have their shortcomings. In this paper, we propose a novel hybrid model for detecting anomalies from system logs, which mainly consists of a CNN and an LSTM. Treating logs as natural language sequences, the hybrid model first embeds the logs and extracts features from the embedding vectors with a convolution layer. An LSTM is then used to automatically learn temporal log patterns from normal execution. The proposed model compares the log patterns that occur in practice with the log patterns it has learned before, and detects anomalies when the practical log patterns deviate from the model trained on log data under normal execution. To evaluate the performance of our model, we conduct experiments over large log data. The experimental results show that the hybrid model combines the advantages of CNN and LSTM models: it is an unsupervised model, reduces the requirements on the training data set, and achieves great results in anomaly detection.

Chuming Xiao, Jiaming Huang, Weigang Wu
Customs-Based Blockchain Solution for Exportation Protection

As part of the international trade supply chain, Dubai Customs acts as the gatekeeper, protecting society and the economy. During the exportation process, one of the primary responsibilities of the Customs Authority is to authenticate the documents submitted with the exportation declaration application, to ensure the legality of the trade. Due to the lack of a direct communication channel among the exportation supply chain participants, authenticating the exportation application documents is a time-consuming and challenging task; in most cases, the authentication process relies on human judgment. This increases the chance of non-detection of fraudulent exportation documents. In this regard, redesigning the exportation supply chain to automate the authentication process has become an essential requirement. This paper addresses this requirement by proposing a blockchain-based system, which establishes a direct, tamper-proof information exchange mechanism among the participants. We target the car exportation supply chain; however, with slight modification, the proposed system can easily be applied to any other exportation supply chain as well. To validate the performance of the proposed system, we implement a Proof-of-Concept (POC) using IBM Hyperledger Fabric.

Hussam Juma, Khaled Shaalan, Ibrahim Kamel
Customs-Based Distributed Risk Assessment Method

Customs administration oversees the important processes of facilitating trade and protecting local societies and economies: the former by minimising shipment processing times and the latter by ensuring the lawfulness of trade. One of the main processes that affects both the facilitation and the protection of trade is the shipment risk assessment process. This assessment is performed by analysing available information about shipments to determine whether or not they require physical inspection. When a shipment is identified as suspicious, a physical inspection that can take hours is performed to identify and (dis)confirm the risk factors. In this process, changing trading behaviour that increases the volume of expected shipments can be a source of pressure. This work proposes a secondary, distributed risk assessment method that provides customs administration with online risk assessment capabilities. The proposed method complements the risk assessments performed at the customs administration by providing feedback from an early stage of risk analysis. The results show that the proposed method can provide classification that is 83% accurate on average.

Hussam Juma, Khaled Shaalan, Ibrahim Kamel
Pufferfish Privacy Mechanism Based on Multi-dimensional Markov Chain Model for Correlated Categorical Data Sequences

Differential privacy is a rigorous standard for protecting data privacy and has been extensively used in data publishing and data mining. However, because of its vulnerable assumption that tuples in the database are independent, it cannot guarantee privacy if the data are correlated. Kifer et al. proposed the Pufferfish privacy framework to protect the privacy of correlated data, but so far under this framework there are only practical mechanisms for protecting correlations among the attributes of a single individual sequence. In this paper, we extend this framework to the case of multiple correlated sequences, in which we protect correlations among individual records as well as correlations among attributes. Application scenarios include different people’s time-series data, where the objective is to protect each individual’s privacy while publishing useful information. We first define privacy in our application based on the Pufferfish privacy framework; when the data are correlated, the privacy level can be assessed through the framework. Then we present a multi-dimensional Markov chain model, which can accurately describe the structure of multi-dimensional data correlations. We also propose a mechanism to implement the privacy framework, and finally conduct experiments to demonstrate that our mechanism achieves both high utility and privacy.
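The building block of such a correlation model can be sketched as the maximum-likelihood estimate of a first-order Markov chain from one categorical sequence; the multi-dimensional model in the paper generalizes this across several correlated sequences, and the function below is an illustrative simplification.

```python
def estimate_transition_matrix(seq, states):
    # Count observed state-to-state transitions in a categorical
    # sequence, then normalize each row into probabilities.
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        counts[idx[a]][idx[b]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row for states never observed.
        probs.append([c / total if total else 1.0 / n for c in row])
    return probs
```

Given such a transition matrix, a Pufferfish-style mechanism can quantify how much one person's record influences the published statistics through the modeled correlations.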

Zhicheng Xi, Yingpeng Sang, Hanrui Zhong, Yongchun Zhang
Lattice Based Multi-replica Remote Data Integrity Checking for Data Storage on Cloud

With the development and popularity of cloud computing, it is of crucial importance to guarantee cloud security and privacy. Remote data integrity checking (RDIC) makes a cloud server capable of proving to users that the data they store in the cloud is intact. To ensure the availability and reliability of critical data, users may generate multiple replicas of their data and deploy those replicas on the cloud. However, how to check the integrity of all replicas of the data saved in the cloud remains a problem. In previous works, some PDP schemes were proposed to solve the problem of auditing the integrity of multi-replica data on cloud servers. In this paper, we propose a novel lattice based certificateless RDIC scheme for public auditing of outsourced data with multiple replicas. The scheme eliminates the certificate management issues and burden of PKI (Public Key Infrastructure) by using the user’s own identity to support the whole verification. Finally, our analysis demonstrates that our scheme is efficient and secure.

Yongchun Zhang, Yingpeng Sang, Zhicheng Xi, Hanrui Zhong
Secure Multi-Party Computation on Blockchain: An Overview

Secure multi-party computation (SMPC) is a hot topic in the field of cryptography. It focuses on completing computation tasks without revealing users’ inputs and outputs in decentralized scenarios. Although much research has been conducted on SMPC protocols, it is hard to obtain fairness when most participants in SMPC are dishonest. Recently, blockchain, the foundation of cryptocurrency, has attracted the attention of many scholars. Owing to blockchain’s ability to provide security and incentives, researchers have started to make use of blockchain to provide fairness in SMPC protocols and to increase efficiency. In this paper, we present a brief survey on how to use blockchain technology to perform SMPC protocols. We start by introducing the concept of secure computation and its security requirements. Then we explain how blockchain can be utilized to provide fairness, and present the basic model. We summarize state-of-the-art blockchain based SMPC applications and conclude the paper.

Hanrui Zhong, Yingpeng Sang, Yongchun Zhang, Zhicheng Xi
A Survey of Privacy-Preserving Techniques on Trajectory Data

How to protect a user’s trajectory privacy while ensuring the user’s access to high-quality services is at the core of trajectory privacy protection technology. With the rapid development of mobile devices and Location Based Services (LBS), the number of locations and trajectories of moving objects collected by service providers is continuously increasing. On one hand, the collected trajectories contain rich spatial-temporal information, and their analysis and mining can support a variety of innovative applications. Since trajectories enable intrusive inferences which may expose private information, such as individual habits, behavioral patterns and social relationships, directly publishing trajectories may leave individual privacy vulnerable to various threats. On the other hand, the existing techniques are unable to prevent trajectory privacy leakage, so the complete real-time trajectories of individuals may be exposed when they request LBS, even if their location privacy is protected by common data protection mechanisms. Therefore, specific techniques for trajectory privacy preservation have been proposed in accordance with different application requirements. In the trajectory data publishing scenario, privacy preserving techniques must preserve data utility; in the LBS scenario, they must guarantee a high quality of service. In this survey, we give an overview of the key challenges and main techniques of trajectory privacy protection for each of these requirements.

Songyuan Li, Hong Shen, Yingpeng Sang

Big Data Processing and Deep Learning

Minimizing Off-Chip Memory Access for Deep Convolutional Neural Network Training

When training convolutional neural networks, a large number of operations and memory accesses are needed, which easily leads to the “memory wall” bottleneck and decreases computational performance and efficiency. Batch Normalization (BN) can effectively speed up the convergence of deep network training, but it has complex data dependences and causes a more serious “memory wall” bottleneck. Aiming at the “memory wall” problem that occurs when training convolutional neural networks with the BN algorithm, a training method with a split BN layer and multi-layer fused calculation is proposed to reduce the memory accesses in model training. Firstly, by reordering the “CONV+BN+RELU” (CBR) block, we trade computation for memory access, using extra computation to reduce the data accessed during training. Secondly, according to the memory access characteristics of the BN layer, the BN layer is divided into two sub-layers, which are respectively fused with the adjacent layers, and the CBR block is recombined into “BN_B+RELU+CONV+BN_A” (BRCB), which further reduces reads and writes to main memory during training, alleviates the “memory wall” bottleneck, and improves accelerator computational efficiency. The experimental results show that when using an NVIDIA TESLA V100 GPU to train the ResNet-50, Inception V3 and DenseNet models, compared with the original training method, the amount of data accessed using the BRCB multi-layer fusion optimization method is reduced by 33%, 22% and 31% respectively, and the actual computing efficiency of the V100 is improved by 19%, 18% and 21% respectively.

Jijun Wang, Hongliang Li
Resultant Gradient Flow Method for Multiple Objective Programming Based on Efficient Computing

The process of blending gas transmission involves multiple kinds of influencing factors related to achieving the maximal overall profit for a refinery gas company; it is, therefore, a multiple objective optimization problem. To maximize overall profit, we propose a multiple objective resultant gradient descent method (RGDM) to solve fractional, nonlinear, multiple objective programming problems. Resultant gradient descent requires a proper direction over the multiple objective functions; in this paper, the proper direction is computed by gradient flow to approach the global maximum. Gradient flow is one of the forms of geometric flow, which is widely used in linear programming, least-squares approximation, optimization, and differential equations; this is the first time it has been used in mathematical programming of this kind. The resultant gradient flow is calculated in linear programming, and its extremum directly affects the extremum of our multiple objective functions. These steps indirectly simplify the non-linear objective function by separating the single objective functions, so that we can use the gradient flow method to solve the multi-objective non-linear programming problem; they also embody the stability and efficiency of the proposed gradient flow. With the case study of a refinery gas company, this paper builds an overall profit model. We then apply the proposed resultant method to solve this multi-objective problem with a planned strategy for supplying natural gas to residential areas and scheduling the initial coordination. Moreover, its solutions are displayed and compared with a genetic algorithm-based solver in our experimental study.

Bao Feng, Peixin He, Yunyao Li, Junfeng Wu, Peng Li, Haichang Yao, Yimu Ji, Chao Min, Jiekui Zhang, Youtao Li, Peizhuang Wang, Yong Shi, Jing He, Hui Zheng, Yang Wang
Barrage Video Recommendation Method Based on Convolutional Recursive Neural Network

Aiming at the problems of the traditional video recommendation process, such as low time efficiency and low accuracy, this paper proposes a barrage-based convolutional recursive neural network video recommendation method. Firstly, according to the number of barrage comments in a video, the method selects the video fragments preferred by users and adopts the k-means clustering method to extract key frames. Secondly, we construct a convolutional recursive neural network model (RCNN) to classify similar video fragments. Finally, recommendation is achieved through the similarity between video fragments. The experimental results show that the proposed method improves the accuracy of recommendation by 0.22 compared with CTR, 0.18 compared with CDL, and 0.31 compared with ConvMF, thus improving the accuracy of recommendation to a certain extent.

Siyuan Ma, Ping He
A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors

Convolutional neural networks (CNNs) are playing an important role in many fields. Nowadays, many applications are able to run the CNN inference process with pre-trained models on mobile devices. The improving performance of embedded processors such as ARM-based CPUs makes it possible to meet the requirements of real-time processing. In this paper, a pipelining strategy is proposed to accelerate convolutional networks on ARM processors. We implement a 3×3 convolution with Neon instructions, which are single instruction, multiple data (SIMD) instructions supported by ARM processors. In order to reduce stalls in the pipeline, the issue order of instructions is rearranged according to the out-of-order execution and dual-issue mechanisms of ARM processors. A tiling method is exploited to increase data reuse: the input feature map is divided into multiple 6×6 tiles, and the computations within each tile are highly optimized using our proposed pipelining strategy. The speedup of the proposed method is 2.88 compared with gcc-compiled code on the RK3288. The effect of our optimization is measured with a performance profiling tool: cycles and cache misses are decreased significantly. The multi-thread version implemented with OpenMP achieves a speedup of 6.8 compared with the single-thread gcc-compiled version.
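The tiled 3×3 convolution can be sketched in plain Python as below; this is only a functional reference for the tiling idea (stride 1, no padding, single channel, illustrative 6×6 tile size), whereas the paper's version replaces the inner loops with hand-scheduled Neon SIMD instructions.

```python
def conv3x3_tiled(inp, kernel, tile=6):
    # Direct 3x3 convolution computed tile by tile over the output,
    # so each tile's input window stays hot in cache (data reuse).
    h, w = len(inp), len(inp[0])
    oh, ow = h - 2, w - 2
    out = [[0.0] * ow for _ in range(oh)]
    for ty in range(0, oh, tile):
        for tx in range(0, ow, tile):
            for y in range(ty, min(ty + tile, oh)):
                for x in range(tx, min(tx + tile, ow)):
                    acc = 0.0
                    for ky in range(3):
                        for kx in range(3):
                            acc += inp[y + ky][x + kx] * kernel[ky][kx]
                    out[y][x] = acc
    return out
```

On ARM, each 6×6 output tile maps to a fixed register-resident schedule, which is what the pipelining strategy then reorders to avoid stalls.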

Xin Zhou, Rongchun Li, Peng Zhang, Yuntao Liu, Yong Dou
Prediction Model of Suspect Number Based on Deep Learning

With the development of public security informatization, crime prediction has become an important tool for public security organs to carry out precise attacks and effective governance. In this paper, we propose an algorithm to predict the number of suspects through feature modeling of historical data. We use Deep Neural Networks (DNNs) and machine learning algorithms to extract features of different dimensions of the case data, and use Convolutional Neural Networks (CNNs) to extract text features from the case descriptions. These two types of features are combined and fed into a fully connected layer and a softmax layer. Compared with the DNN model that uses only numeric data, the DNN-CNN model combined with text data improves the precision rate by 20%; the addition of text data significantly improves the precision and recall of the prediction. To the best of our knowledge, this is the first time numerical and textual case information have been combined in crime prediction.

Chuyue Zhang, Manchun Cai, Xiaofan Zhao, Luzhe Cao, Dawei Wang
A Sequence-to-Sequence Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

The prevalent approaches to the task of Chinese word segmentation mostly rely on the Bi-LSTM neural network. However, methods based on the Bi-LSTM have an inherent drawback: vanishing gradients, which make them inefficient at capturing the information of faraway characters in a long sentence for the word segmentation task. In this work, we propose a novel sequence-to-sequence transformer model for Chinese word segmentation, premised on a type of convolutional neural network named the temporal convolutional network. The model uses the temporal convolutional network to construct an encoder, uses a fully-connected neural network to build a decoder, and applies the Viterbi algorithm in an inference layer to infer the final word segmentation result. Meanwhile, the model captures the information of faraway characters in a long sentence by adding layers to the encoder. To achieve a superior segmentation result, the model binds a Conditional Random Fields model to train its parameters. Experiments on Chinese corpora show that the model's Chinese word segmentation performance is better than that of the Bi-LSTM model, and that the model has a better ability to process long sentences than the Bi-LSTM.
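The Viterbi inference layer over per-character tag scores can be sketched as follows; the scores stand in for the decoder's emissions and the CRF's pairwise transition potentials (e.g. over B/M/E/S segmentation tags), and are illustrative log-potentials rather than the paper's trained parameters.

```python
def viterbi(emissions, transitions):
    # emissions: per-position score for each of k tags;
    # transitions[i][j]: score of moving from tag i to tag j.
    # Returns the highest-scoring tag sequence by dynamic programming.
    n, k = len(emissions), len(emissions[0])
    score = list(emissions[0])
    back = []
    for t in range(1, n):
        prev, ptr, nxt = score, [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: prev[i] + transitions[i][j])
            ptr.append(best_i)
            nxt.append(prev[best_i] + transitions[best_i][j] + emissions[t][j])
        score = nxt
        back.append(ptr)
    # Trace back pointers from the best final tag.
    best = max(range(k), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Decoding the tag path and cutting the sentence at every E or S tag yields the final segmentation.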

Wei Jiang, Yuan Wang, Yan Tang
Parallel Architectures, Algorithms and Programming
Edited by
Hong Shen
Dr. Yingpeng Sang
Springer Singapore
Electronic ISBN
Print ISBN