main-content

## Über dieses Buch

This book constitutes the proceedings of the 21st International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2020, which took place in Shenzhen, China, during December 28-30, 2020.

The 34 full papers included in this volume were carefully reviewed and selected from 109 submissions. They deal with parallel and distributed computing of networking and architectures, software systems and technologies, algorithms and applications, and security and privacy.

## Inhaltsverzeichnis

### Blood Leukocyte Object Detection According to Model Parameter-Transfer and Deformable Convolution

Currently, leukocyte detection has the problem of scarcity of labeled samples, so a focal dataset must be expanded by merging multiple datasets. At the same time, given the difference in the dyeing methods, dyeing time, and collection techniques, some datasets have the problem of different homology distributions. Moreover, the effect of direct training after dataset merging is not satisfactory. The morphology of the leukocyte types is also variable and stain contamination occurs, thereby leading to the misjudgment of using traditional convolutional networks. Therefore, in this paper, the model parameter-transfer method is used to alleviate the problem of less leukocyte labeled data in the training model and deformable convolution is introduced into the main network of target detection to improve the accuracy of the object detection model. First, numerous leukocyte datasets are used to train the blood leukocyte binary classification detection network, and the model parameters of the blood leukocyte binary classification detection network are transferred to the blood leukocyte multi classification detection network through the transfer of model parameters. This method can make better use of datasets of the same origin and different distributions so as to solve the problem of scarcity in blood leukocyte data sets. Finally, the multi classification detection network is trained quickly and the accurate blood leukocyte detection results are obtained through fine tuning. The experimental results show that compare our method with the traditional Faster RCNN object detection algorithm, $${mAP}_{0.5}$$ mAP 0.5 is 0.056 higher, $${mAP}_{0.7}$$ mAP 0.7 is 0.119 higher, with higher recall by 4%, and better accuracy by 5%. Thus, the method proposed in this paper can achieve highly accurate leukocyte detection.

Kaizhi Chen, Wencheng Wei, Shangping Zhong, Longkun Guo

In Software Effort Estimation (SEE) practice, the data drought problem has been plaguing researchers and practitioners. Leveraging heterogeneous SEE data collected by other companies is a feasible solution to relieve the data drought problem. However, how to make full use of the heterogeneous effort data to conduct SEE, which is called as Heterogeneous Software Effort Estimation (HSEE), has not been well studied. In this paper, we propose a HSEE model, called Dynamic Heterogeneous Software Effort Estimation (i.e., DHSEE), which leverages the adversarial auto-encoder and convolutional neural network techniques. Meanwhile, we have investigated the scenario of conducting HSEE with dynamically increasing effort data. Experiments on ten public datasets indicate that our approach can significantly outperform the state-of-the-art HSEE method and other competing methods on both static and dynamic SEE scenarios.

Fumin Qi, Xiao-Yuan Jing, Xiaoke Zhu, Xiaodong Jia, Li Cheng, Yichuan Dong, Ziseng Fang, Fei Ma, Shengzhong Feng

### A Novel Distributed Reinforcement Learning Method for Classical Chinese Poetry Generation

Poetry generation has been a classic natural language generation task recently. But so far the methods for this topic mainly imitate and reproduce the poems on the training data set, which indicates that they either have not much connotation or overfit too much like plagiarism of the existing poems. To solve this problem, unlike previous work, instead of tuning the trade-off between connotation and innovation, we propose a distributed reinforcement learning framework, which consists of two stages of training, to generate creative and meaningful poetry. At the first stage we train a model in parallel on a large poetry corpus at word level to master how poets write poems. At the second stage we train the model with a distributed architecture to learn how connotation is developed in human literary art works at sentence level and force the model to imitate itself when it composes some ‘good poems’ to further improve performance. Experiments on generating classical Chinese poetry demonstrate that the proposed model is able to achieve better performance and the high efficiency of training compared to the state-of-the-art.

Liangliang Ma, Hong Shen, Shangsong Liang

### Memory Access Optimization of High-Order CFD Stencil Computations on GPU

Stencils computations are a class of computations commonly found in scientific and engineering applications. They have relatively lower arithmetic intensity. Therefore, their performance is greatly affected by memory access. This paper studies the issue of memory access optimization for the key stencil computations of a high-order CFD program on the NVidia GPU. Two methods are used to optimize the performance. First, we use registers to cache the data used by the stencil computations in the kernel. We use the CUDA warp shuffle functions to exchange data between neighboring grid points, and adjust the thread computation granularity to increase the data reuse. Second, we use the shared memory to buffer the grid data used by the stencil computations in the kernel, and utilize loop tiling to reduce redundant accesses to the global memory. Performance evaluation is done on an NVidia Tesla K80 GPU. The results show that compared to the original implementation that only uses the global memory, the optimized implementation that utilizes the registers achieves a maximum speedup of 2.59 and 2.79 relatively for 15M and 60M grids, and the optimized implementation that utilizes the shared memory achieves a maximum speedup of 3.51 and 3.36 relatively for 15M and 60M grids.

Shengxiang Wang, Zhuoqian Li, Yonggang Che

### The Dataflow Runtime Environment of DFC

In this paper, we introduce the DFC dataflow language and its runtime environment. DFC runtime library is in charge of constructing the DAG of the dataflow graph, firing the DFC tasks and the synchronizations between the tasks of successive passes. Basing on an elaborately implemented thread pool and queued Active Data, DFC runtime shows an ideal performance comparing with DSPatch. The experiment of a simple dataflow graph shows that DFC has better performance for the cases that the parallelism beneath the core number, while DSPatch shows a better scalability for the cases of the parallelism exceed the core number. As DFC is still a prototype language, it lacks the ability to construct the DAG dynamically, which leads to low efficiency when coding for a well-structured dataflow graph.

Jing Zhang, Jinrong Li, Zheng Du, Jiwu Shu, Qiuming Luo

### Adaptive Tensor-Train Decomposition for Neural Network Compression

It could be of great difficulty and cost to directly apply complex deep neural network to mobile devices with limited computing and endurance abilities. This paper aims to solve such problem through improving the compactness of the model and efficiency of computing. On the basis of MobileNet, a mainstream lightweight neural network, we proposed an Adaptive Tensor-Train Decomposition (ATTD) algorithm to solve the cumbersome problem of finding optimal decomposition rank. For its non-obviousness in the forward acceleration of GPU side, our strategy of choosing to use lower decomposition dimensions and moderate decomposition rank, and the using of dynamic programming, have effectively reduced the number of parameters and amount of computation. And then, we have also set up a real-time target network for mobile devices. With the support of sufficient amount of experiment results, the method proposed in this paper can greatly reduce the number of parameters and amount of computation, improving the model’s speed in deducing on mobile devices.

Yanwei Zheng, Yang Zhou, Zengrui Zhao, Dongxiao Yu

### Development of a UAV Path Planning Approach for Multi-building Inspection with Minimal Cost

This paper presents a UAV path planning approach for multi-building inspection, which is a new application for UAV path planning. It generates helix paths for single building inspection first and defines the possible points for collecting inspection data with reasonable time slots. After inspecting one building, the UAV flies to another building with a trajectory based on a cost matrix and a visited vector defined in this algorithm. The planning of the entire inspection path is evaluated considering several factors, such as distance, time, and altitude. The proposed algorithm is applied to historical giant communal homes, Fujian Tulou, consisting of five buildings.

Shiwei Lin, Xiaoying Kong, Jack Wang, Ang Liu, Gengfa Fang, Yunlong Han

### Construction of Completely Independent Spanning Tree Based on Vertex Degree

Interconnection networks have been extensively studied in the field of parallel computer systems. In the interconnection network, completely independent spanning tree (CISTs) plays an important role in the reliable transmission, parallel transmission, and safe distribution of information. Two spanning trees $$T_1$$ T 1 and $$T_2$$ T 2 of graph G are completely independent if, for any two distinct vertices u and v of G, the two paths from u to v on $$T_1$$ T 1 and $$T_2$$ T 2 are internally disjoint. The spanning trees $$T_1, T_2, \ldots , T_k$$ T 1 , T 2 , … , T k of G are completely independent spanning trees if they are pairwise completely independent. In 2015, Hasunuma proof that G has $$\lfloor \frac{n(G)}{k}\rfloor$$ ⌊ n ( G ) k ⌋ CISTs if $$\delta (G) \ge n(G) - k$$ δ ( G ) ≥ n ( G ) - k , $$3 \le k \le \frac{n(G)}{2}$$ 3 ≤ k ≤ n ( G ) 2 and $$n(G) \ge 7$$ n ( G ) ≥ 7 . In this paper, we prove that G has $$\lfloor \frac{5n(G)}{12} \rfloor$$ ⌊ 5 n ( G ) 12 ⌋ CISTs if $$\delta (G) \ge n(G)-2$$ δ ( G ) ≥ n ( G ) - 2 and $$n(G) \ge 12$$ n ( G ) ≥ 12 , and G has $$(t+1)$$ ( t + 1 ) -CISTs if $$\delta (G) \ge n(G)-3$$ δ ( G ) ≥ n ( G ) - 3 and $$n(G) = 3t - 2 \ge 23$$ n ( G ) = 3 t - 2 ≥ 23 .

Ningning Liu, Yujie Zhang, Weibei Fan

### Distributed Algorithm for Truss Maintenance in Dynamic Graphs

Cohesive subgraphs are applied in various fields. Mining cohesive components such as k-truss have attracted a lot of effort to improve time efficiency in large-scale graphs. The k-truss is a subgraph where each edge is contained in at least $$k-2$$ k - 2 triangles and the problem of truss decomposition is computing the k-trusses of a graph for all k. However, most graphs in real scenarios are usually changing over time. The previous studies take the static graphs as input, and the truss maintenance in dynamic graphs receives little attention. This paper focuses on distributed algorithms for truss maintenance. We present a distributed model underlying the real distributed processing model Pregel. Based on the model, we propose truss decomposition and truss maintenance algorithms. To confirm the effectiveness and efficiency of the proposed algorithms, we conduct extensive experiments over both real-world and synthetic graphs.

Qi Luo, Dongxiao Yu, Hao Sheng, Jiguo Yu, Xiuzhen Cheng

### Short-Term Load Forecasting Based on CNN-BiLSTM with Bayesian Optimization and Attention Mechanism

Short-term power load forecasting is quite vital in maintaining the balance between power production and power consumption of the power grid. Prediction accuracy not only affects the power grid construction, but also influences the economic development of the power grid. This paper proposes a short-term load forecasting based on Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNN-BiLSTM) with Bayesian Optimization (BO) and Attention Mechanism (AM). The BiLSTM is good at time series forecasting, and the Attention Mechanism can help the model to focus on the important part of the BiLSTM output. In order to make the forecasting performance of the model as good as possible, the Bayesian Optimization is used to tune the hyperparameters of the model. The input of the model is history load, time slot, and meteorological factors. In order to eliminate the seasonal influence, the data set is divided into four subsets with respect to four seasons. The performance of the proposed model is compared with other forecasting models by MAE, RMSE, MAPE, and $$R^2$$ R 2 score. The experiment results show that the proposed model fits the actual values best and has the best forecasting performance among the contrast models.

Kai Miao, Qiang Hua, Huifeng Shi

### On the Non-ergodic Convergence Rate of the Directed Nonsmooth Composite Optimization

This paper considers the distributed “nonsmooth+nonsmooth” composite optimization problems for which n agents collaboratively minimize the sum of their local objective functions over the directed networks. In particular, we focus on the scenarios where the sought solutions are desired to possess some structural properties, e.g., sparsity. However, to ensure the convergence, most existing methods produce an ergodic solution via the averaging schemes as the output, which causes the desired structural properties of the output to be destroyed. To address this issue, we develop a new decentralized stochastic proximal gradient method, termed DSPG, in which the nonergodic (last) iteration acts as the output. We also show that the DSPG method achieves the nonergodic convergence rate $$O(\log (T)/\sqrt{T})$$ O ( log ( T ) / T ) for generally convex objective functions and $$O(\log (T)/T)$$ O ( log ( T ) / T ) for strongly convex objective functions. When the structure-enhancing regularization is absent and the simple and suffix averaging schemes are used, the convergence rates of DSPG reach $$O(1/\sqrt{T})$$ O ( 1 / T ) for generally convex objective functions and O(1/T) for strongly convex objective functions, showing improvement relative to the rates $$O(\log (T)/\sqrt{T})$$ O ( log ( T ) / T ) and $$O(\log (T)/T)$$ O ( log ( T ) / T ) provided by the existing methods. Simulation examples further illustrate the effectiveness of the proposed method.

Yichuan Dong, Zhuoxu Cui, Yong Zhang, Shengzhong Feng

### 6D Pose Estimation Based on the Adaptive Weight of RGB-D Feature

In the task of 6D pose estimation by RGB-D image, the crucial problem is how to make the most of two types of features respectively from RGB and depth input. As far as we know, prior approaches treat those two sources equally, which may overlook that the different combinations of those two properties could have varying degrees of impact. Therefore, we propose a Feature Selecting Mechanism (FSM) in this paper to find the most suitable ratio of feature dimension from RGB image and point cloud (converted from depth image) to predict the 6D pose more effectively. We first conduct artificial selection in our Feature Selecting Mechanism (FSM) to prove the potential for the weight of the RGB-D feature. Afterward, the neural network is deployed in our FSM to adaptively pick out features from RGB-D input. Through our experiments on the LINEMOD dataset, YCB-Video dataset, and our multi-pose synthetic image dataset, we show that there is an up to 2% improvement in the accuracy by utilizing our FSM, compared to the state-of-the-art method.

Gengshen Zhang, Li Ning, Liangbing Feng

### Blockchain-Based Secure Outsourcing of Fully Homomorphic Encryption Using Hidden Ideal Lattice

The efficiency of homomorphic encryption has always affected its practicality. With the dawn of internet of things, the demand for computation and encryption on lightweight devices is increasing. Complex cryptographic computing is an important burden for lightweight devices, but outsourcing provides great convenience for them. In this paper, based on blockchain, we propose a secure outsourcing scheme for Fully Homomorphic Encryption using Hidden Ideal Lattice (FHEHIL), in which the time-consuming operations (including modular exponentiation and polynomial multiplication) are outsourced. For polynomial multiplication, we propose a secure outsourcing algorithm that reduces the local computation cost to $$O\left(n\right)$$ O n . Previous work based on Fast Fourier Transform can only achieve $$O\left(nlog(n)\right)$$ O n l o g ( n ) for the local cost. Through security analysis, our scheme achieves the goals of privacy protection against passive attackers and cheating detection against active attackers. Experiments also demonstrate our scheme is more efficient in comparison with the non-outsourcing FHEHIL.

Mingyang Song, Yingpeng Sang, Yuying Zeng, Shunchao Luo

### Multiple Projections Learning for Dimensional Reduction

Locality Preserving Projection (LPP) is a dimensional reduction method that has been widely used in various fields. While traditional LPP only uses a single projection matrix to reduce the dimension and preserve the locality structure of data, it may cause the single matrix may not handle these two tasks well at the same time. Therefore, in this paper, we proposed relaxed sparse locality presenting projection (RSLPP) which introduces two different projection matrices to better accomplish the two tasks. The addition of another projection matrix can help the original projection matrix has more freedom to select the appropriate feature for preserving the local structure of data. The experimental results on two data sets prove the effectiveness of the method.

Lin Jiang, Xiaozhao Fang, Na Han

### Preventing DDoS Attacks on Bitcoin Memory Pool by the Dynamic Fee Threshold Mechanism

Blockchain is a well-known distributed technology which can be regarded as a decentralized database and combines the peer-to-peer architecture, cryptography and consensus mechanism. As the first and most famous application of blockchain and cryptocurrencies, Bitcoin also suffers from different kinds of attacks. Several studies have been done to analyze and solve these attacks. In this paper, we research the Distributed Denial-of-Service (DDoS) on Bitcoin’s memory pool (Mempool). We analyze the feasibility of this kind of attack based on the current mining and relay progress of Bitcoin, and propose the possible effects caused by this attack from some novel aspects of Bitcoin, including the confirmation time, market price and legitimate user. We further present the Dynamic Fee Threshold Mechanism to counter this attack. Our method uses several characteristics to distinguish the normal transactions in Bitcoin from the malicious spam transactions. We also make an analysis and experiments on the method, to demonstrate its effectiveness and accuracy of detecting the malicious transactions.

Shunchao Luo, Yingpeng Sang, Mingyang Song, Yuying Zeng

### The Compiler of DFC: A Source Code Converter that Transform the Dataflow Code to the Multi-threaded C Code

The working principle of DFC compiler is introduced in this article. DFC is a grammatical extension of standard C language, with special DF function which describe the dependence of computing DAG. DFC compiler, dfcc, is used to convert the DFC source codes to standard C codes with the assistance of multi-threaded library. The lexical rules and grammatical rules are used to setup the AST of DFC codes, and then it is converted to the AST of standard C without DF function nodes. The derived AST is printed into a text file, a normal C file, and finally is processed by GCC to obtain the executable file. The experiment gives some demonstration of that converting details, and the memory footprint is studied to show an ideal scalability.

Zheng Du, Jing Zhang, Jinrong Li, Haixin Du, Jiwu Shu, Qiuming Luo

### Online Learning-Based Co-task Dispatching with Function Configuration in Edge Computing

Edge computing is a promising cloud computing paradigm that reduces computing latency by deploying edge servers near data sources and users, which is of great importance to implement delay-sensitive applications like AR, Cloud Gaming and Auto Driving. Due to the limited resources of edge servers, task dispatching and function configuration are the key to fully utilize edge servers. Moreover, a typical task request in edge computing (called a co-task) is consisted of a set of subtasks, where the task completion time is determined by the latest completed subtask. In this work, we propose a scheme named OnDisco, which combines reinforcement learning and heuristic methods to minimize the average completion time of co-tasks. Compared with heuristic algorithm, deep reinforcement learning can learn the inherent characteristics of the environment without any prior knowledge, and OnDisco is therefore well adapted to varying environments. Simulations on Alibaba traces shows that OnDisco reduces the average task completion time by $$58\%$$ 58 % and $$76\%$$ 76 % compared with the heuristic and random algorithm, respectively. Moreover, OnDisco outperforms the baselines consistently in various data environments and parameter settings.

Wanli Cao, Haisheng Tan, Zhenhua Han, Shuokang Han, Mingxia Li, Xiang-Yang Li

### System-Level FPGA Routing for Logic Verification with Time-Division Multiplexing

Multi-FPGA prototype design is widely used to verify modern VLSI circuits, but the limited number of connections between FPGAs in a multi-FPGA system may cause routing failure. Therefore, using time-division multiplexing (TDM) technology, multiple signals are transmitted through the same routing channel to improve utilization. However, the performance of this type of system depends on the routing quality within the FPGAs due to the signal delay between FPGA pairs. In this paper, we propose a system-level routing method based on TDM to minimize the maximum TDM ratio that satisfies the strict ratio constraint. Firstly, we weight the edges and use two methods to build approximate minimum Steiner trees (MST) to route the nets. Then we propose a ratio assignment method based on edge-demand which satisfy the TDM ratio constraint. We tested our method with the benchmarks provided by 2019 CAD Contest at ICCAD and compared it with the top two. The experimental results shows that our method not only solves all problems but also achieves a good TDM ratio.

Long Sun, Longkun Guo, Peihuang Huang

### Protein Interresidue Contact Prediction Based on Deep Learning and Massive Features from Multi-sequence Alignment

Predicting the corresponding 3D structure from the protein’s sequence is one of the most challenging tasks in computational biology, and a confident interresdiue contact map serves as the main driver towards ab initio protein structure prediction. Benefiting from the ever-increasing sequence databases, residue contact prediction has been revolutionized recently by the introduction of direct coupling analysis and deep learning techniques. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive. Here, we introduce a novel contact prediction method based on fully convolutional neural networks and extensively extracted evolutionary features from multi-sequence alignment. The results show that our deep learning model based on a highly optimized feature extraction mechanism is very effective in interresidue contact prediction.

Huiling Zhang, Hao Wu, Hing-Fung Ting, Yanjie Wei

### See Fine Color from the Rough Black-and-White

Image super-resolution and colorization are two important research fields in computer vision. In previous studies, they have been considered separately as two unrelated tasks. However, for the task of restoring gray video to high-definition color video, when the network learns to abstract features from low-resolution images and maps them to high-resolution images, the abstract understanding of images by the network is also useful for colorization task. Treating them as two unrelated tasks have to construct two different models, which needs more time and resources. In this paper, we propose a framework to combine the tasks of image super-resolution and colorization together. We design a new network model to directly map low-resolution gray images into high-resolution color images. Moreover, this model can obtain motion information of objects in the video by predicting surrounding frames with the current frame. Thus, video super-resolution and colorization can be realized. To support studying super-resolution and colorization together, we build a video dataset containing three scenes. As far as we know, this is the first dataset for such kinds of tasks combination.

Jingjing Wu, Li Ning, Chan Zhou

### Data Aggregation Aware Routing for Distributed Training

For distributed training, the communication overhead for parameter synchronization is heavy in the network. Data aggregation can efficiently alleviate network overheads. However, existing works on data aggregation are based on the streaming message data, which can not well adapt to the discrete communication for parameter synchronization. This paper formulates a data aggregation aware routing problem, with the objective of minimizing training finishing time for global model under the constraint of cache capacity. The problem is formulated as a mixed-integer non-linear programming problem, and it is proved to be NP-Hard. Then we propose a data aggregation aware routing algorithm to solve the formulated problem, by transmitting the data to the closest aggregation node in greedy to reduce the network overhead. Simulation results show that, the proposed algorithm can reduce average training finishing time by $$74\%$$ 74 % , and it can reduce the network overhead by $$33\%$$ 33 % on average, compared with the shortest path algorithm.

Zhaohong Chen, Xin Long, Yalan Wu, Long Chen, Jigang Wu, Shuangyin Liu

### A New Integer Programming Model for the File Transfer Scheduling Problem

Scheduling the transfer of files in distributed computer systems is an increasing concern. The purpose of this paper is to propose a new time-index integer programming model for the file transfer scheduling problem with integer file length and port restrictions, which is to design a schedule to transfer series of files with different file length while minimizing overall completion time. By different formulations of variables and constraints, our new model contains fewer constraints and variables. And we also propose an enhanced technique to reduce the number of constraints further by utilizing the elementary lower bound and upper bound. Experimental results show that compared to existing best model, our model with the enhanced technique can solve the same instance with only 14.3% of the time. Moreover, our model can solve larger instances (up to 100 vertices and 600 files) to optimality that can not be solved by existing models.

Jingwen Yang, Jiaxin Li, Xin Han

### Approximation Algorithms for the General Cluster Routing Problem

Graph routing problem (GRP) and its generalizations have been extensively studied because of their broad applications in the real world. In this paper, we study a variant of GRP called the general cluster routing problem (GCRP). Let $$G=\left( V,\,E\right)$$ G = V , E be a complete undirected graph with edge weight $$c\left( e\right)$$ c e satisfying the triangle inequality. The vertex set is partitioned into a family of clusters $$C_{1},...,C_{m}$$ C 1 , . . . , C m . We are given a required vertex subset $$\overline{V} \subseteq V$$ V ¯ ⊆ V and a required edge subset $$\overline{E} \subseteq E$$ E ¯ ⊆ E . The aim is to compute a minimum cost tour that visits each vertex in $$\overline{V}$$ V ¯ exactly once and traverses each edge in $$\overline{E}$$ E ¯ at least once, while the vertices and edges of each cluster are visited consecutively. When the starting and ending vertices of each cluster are specified, we devise an approximation algorithm via incorporating Christofides’ algorithm with minimum weight bipartite matching, achieving an approximation ratio that equals the best approximation ratio of path TSP. Then for the case there are no specified starting and ending vertices for each cluster, we propose a more complicated algorithm with a compromised ratio of 2.67 by employing directed spanning tree against the designated auxiliary graph. Both approximation ratios improve the state-of-art approximation ratio that are respectively 2.4 and 3.25.

Longkun Guo, Bin Xing, Peihuang Huang, Xiaoyan Zhang

### Maximizing Group Coverage in Social Networks

Groups play a crucial role in decision-making of social networks, since individual decision-making is often influenced by groups. This brings the Group Influence Maximization (GIM) problem which aims to maximize the expected number of activated groups by finding k seed nodes. The GIM problem has been proved NP-hard while computing the objective function is $$\#P$$ # P -hard under Independent Cascade (IC) model. We propose an algorithm called Maximizing Group Coverage (MGC) which greedily selects the best node based on evaluating the contribution of nodes to the groups, ensuring the success of approximating the maximization of the number of activated groups. Finally, we compare the MGC algorithm with the baseline algorithm called Maximum Coverage (MC) through experiments, demonstrating that MGC outperforms MC under IC model regarding the average number of activated groups.

Yuting Zhong, Longkun Guo, Peihuang Huang

### LightLayers: Parameter Efficient Dense and Convolutional Layers for Image Classification

Deep Neural Networks (DNNs) have become the de-facto standard in computer vision, as well as in many other pattern recognition tasks. A key drawback of DNNs is that the training phase can be very computationally expensive. Organizations or individuals that cannot afford purchasing state-of-the-art hardware or tapping into cloud hosted infrastructures may face a long waiting time before the training completes or might not be able to train a model at all. Investigating novel ways to reduce the training time could be a potential solution to alleviate this drawback, and thus enabling more rapid development of new algorithms and models. In this paper, we propose LightLayers, a method for reducing the number of trainable parameters in DNNs. The proposed LightLayers consists of LightDense and LightConv2D layers that are as efficient as regular Conv2D and Dense layers but uses less parameters. We resort to Matrix Factorization to reduce the complexity of the DNN models resulting in lightweight DNN models that require less computational power, without much loss in the accuracy. We have tested LightLayers on MNIST, Fashion MNIST, CIFAR 10, and CIFAR 100 datasets. Promising results are obtained for MNIST, Fashion MNIST, and CIFAR-10 datasets whereas CIFAR 100 shows acceptable performance by using fewer parameters.

Debesh Jha, Anis Yazidi, Michael A. Riegler, Dag Johansen, Håvard D. Johansen, Pål Halvorsen

### The Hybrid Navigation Method in Face of Dynamic Obstacles

With the development of society and lifestyle changes. Robo-tic navigation based on traditional navigation and positioning techniques can hardly cope with navigation in complex dynamic scenarios. Traditional navigation and positioning algorithms, such as Dijkstra and A*, cannot perform path planning in dynamic scenarios. In this paper, we present an improved reinforcement learning-based algorithm for local path planning that allows it to still perform well when there are more dynamic obstacles. The algorithm has the advantage of low cost to transform into others situation and fast training speed over the navigation algorithm based entirely on reinforcement learning.

Kaidong Zhao, Li Ning

### A Relaxed Balanced Lock-Free Binary Search Tree

This paper presents a new relaxed balanced concurrent binary search tree using a single word compare and swap primitive, in which all operations are lock-free. Our design separates balancing actions from update operations and includes a lock-free balancing mechanism in addition to the insert, search, and relaxed delete operations. Search in our design is not affected by ongoing concurrent update operations or by the movement of nodes by tree restructuring operations. Our experiments show that our algorithm performs better than other state-of-the-art concurrent BSTs.

Manish Singh, Lindsay Groves, Alex Potanin

### A Dynamic Parameter Tuning Method for High Performance SpMM

Sparse matrix-matrix multiplication (SpMM) is a basic kernel that is used by many algorithms. Several researches focus on various optimizations for SpMM parallel execution. However, a division of a task for parallelization is not well considered yet. Generally, a matrix is equally divided into blocks for processes even though the sparsities of input matrices are different. The parameter that divides a task into multiple processes for parallelization is fixed. As a result, load imbalance among the processes occurs. To balance the loads among the processes, this paper proposes a dynamic parameter tuning method by analyzing the sparsities of input matrices. The experimental results show that the proposed method improves the performance of SpMM for examined matrices by up to 39.5% and 12.3% on average.

Bin Qi, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi

### Data Caching Based Transfer Optimization in Large Scale Networks

Xinxin Han, Guichen Gao, Yang Wang, Hing-Fung Ting, Yong Zhang

### Second-Order Convolutional Neural Network Based on Cholesky Compression Strategy

In the past few years, Convolution Neural Network (CNN) has been successfully applied to many computer vision tasks. Most of these networks can only extract first-order information from input images. The second-order statistical information refers to the second-order correlation obtained by calculating the covariance matrix, the fisher information matrix, or the vector outer product operation on the local feature group according to the channels. It has been shown that using second-order information on facial expression datasets can better capture the distortion of facial area features, while at the same time generate more parameters which may cause much more computational cost. In this article we propose a new CNN structure including layers which can (i) incorporate first-order information into the covariance matrix; (ii) use eigenvalue vectors to measure the importance of feature channels; (iii) reduce the bilinear dimensionality of the parameter matrix; and (iv) perform Cholesky decomposition on the positive definite matrix to complete the compression of the second-order information matrix. Due to the incorporation of both first-order and second-order information and the Cholesky compression strategy, our proposed method reduces the number of parameters by half of the SPDNet model, and simultaneously achieves better results in facial expression classification tasks than the corresponding first-order model and the reference second-order model.

Yan Li, Jing Zhang, Qiang Hua

### Submodular Maximization with Bounded Marginal Values

We study the problem of maximizing non-monotone submodular functions subject to a p-independence system constraint. Although the submodularity ratio has been well-studied in maximizing set functions under monotonic scenario, the defined parameter may bring hardness of approximation for the maximization of set functions in the non-monotonic case. In this work, utilizing a lower bound for the marginal values, we investigate the Repeated Greedy introduced by (Feldman et al. 2017) and obtain a parameterized performance guarantee for the above constrained submodular maximization problem.

Ruiqi Yang, Suixiang Gao, Changjun Wang, Dongmei Zhang

### A Streaming Model for Monotone Lattice Submodular Maximization with a Cardinality Constraint

In this paper, we consider a streaming model of maximizing monotone lattice submodular function with a cardinality constraint on the integer lattice. As (lattice) submodularity does not imply the diminishing return property on the integer lattice, we introduce the Sieve-Streaming algorithm combining with a modified binary search subroutine to solve the problem. We also show it is with an approximation ratio $$1/2-\epsilon$$ 1 / 2 - ϵ , a memory complexity $$O( \epsilon ^{-1} k\log k)$$ O ( ϵ - 1 k log k ) , and a query complexity $$O( \epsilon ^{-2}\log ^2 k )$$ O ( ϵ - 2 log 2 k ) per element.

Zhenning Zhang, Longkun Guo, Linyang Wang, Juan Zou

### The Prize-Collecting k-Steiner Tree Problem

In this paper, we study the prize-collecting k-Steiner tree problem (PC k-ST), which is an interesting generalization of both the k-Steiner tree problem (k-ST) and the prize-collecting Steiner tree problem (PCST). In the PC k-ST, we are given an undirected connected graph $$G =(V, E)$$ G = ( V , E ) , a subset $$R \subseteq V$$ R ⊆ V called terminals, a root vertex $$r \in V$$ r ∈ V and an integer k. Every edge has a non-negative edge cost and every vertex has a non-negative penalty cost. We wish to find an r-rooted tree F that spans at least k vertices in R so as to minimize the total edge costs of F as well as the penalty costs of the vertices not in F. As our main contribution, we propose two approximation algorithms for the PC k-ST with ratios of 5.9672 and 5. The first algorithm is based on an observation of the solutions for the k-ST and the PCST, and the second one is based on the technique of primal-dual.

Lu Han, Changjun Wang, Dachuan Xu, Dongmei Zhang

### Optimal Algorithm of Isolated Toughness for Interval Graphs

Factor and fractional factor are widely used in many fields related to computer science. The isolated toughness of an incomplete graph G is defined as $$i\tau (G)=\min \{\frac{|S|}{i(G-S)}:S\in C(G), i(G-S)>1\}$$ i τ ( G ) = min { | S | i ( G - S ) : S ∈ C ( G ) , i ( G - S ) > 1 } . Otherwise, we set $$i\tau (G)=\infty$$ i τ ( G ) = ∞ if G is complete. This parameter has a close relationship with the existence of factors and fractional factors of graphs. In this paper, we pay our attention to computational complexity of isolated toughness, and present an optimal polynomial time algorithm to compute the isolated toughness for interval graphs, a subclass of co-comparability graphs.

Fengwei Li, Qingfang Ye, Hajo Broersma, Xiaoyan Zhang

### Graceful Performance Degradation in Apache Storm

The concept of stream data processing is becoming challenging in most business sectors where try to improve their operational efficiency by deriving valuable information from unstructured, yet, contentiously generated high volume raw data in an expected time spans. A modern streamlined data processing platform is required to execute analytical pipelines over a continues flow of data-items that might arrive in a high rate. In most cases, the platform is also expected to dynamically adapt to dynamic characteristics of the incoming traffic rates and the ever-changing condition of underlying computational resources while fulfill the tight latency constraints imposed by the end-users. Apache Storm has emerged as an important open source technology for performing stream processing with very tight latency constraints over a cluster of computing nodes. To increase the overall resource utilization, however, the service provider might be tempted to use a consolidation strategy to pack as many applications as possible in a (cloud-centric) cluster with limited number of working nodes. However, collocated applications can negatively compete with each other, for obtaining the resource capacity in a shared platform that, in turn, the result may lead to a severe performance degradation among all running applications.The main objective of this work is to develop an elastic solution in a modern stream processing ecosystem, for addressing the shared resource contention problem among collocated applications. We propose a mechanism, based on design principles of Model Predictive Control theory, for coping with the extreme conditions in which the collocated analytical applications have different quality of service (QoS) levels while the shared-resource interference is considered as a key performance limiting parameter. Experimental results confirm that the proposed controller can successfully enhance the $$p$$ p -99 latency of high priority applications by 67%, compared to the default round robin resource allocation strategy in Storm, during the high traffic load, while maintaining the requested quality of service levels.