2024 | Book

Neural Information Processing

30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part II

Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 14447 to 14452 constitutes the refereed proceedings of the 30th International Conference on Neural Information Processing, ICONIP 2023, held in Changsha, China, in November 2023.
The 652 papers presented in the proceedings set were carefully reviewed and selected from 1274 submissions. They focus on theory and algorithms; cognitive neurosciences; human centred computing; and applications in neuroscience, neural networks, deep learning, and related fields.

Table of Contents


Theory and Algorithms

Distributed Nash Equilibrium Seeking of Noncooperative Games with Communication Constraints and Matrix Weights

Distributed Nash equilibrium seeking is investigated in this paper for a class of multi-agent systems under intermittent communication and matrix-weighted communication graphs. Different from most of the existing works on distributed Nash equilibrium seeking of noncooperative games where the players (agents) can communicate continuously over time, the players considered in this paper are assumed to exchange information only with their neighbors during some disconnected time intervals while the underlying communication graph is matrix-weighted. A distributed Nash equilibrium seeking algorithm integrating gradient strategy and leader-following consensus protocol is proposed for the noncooperative games with intermittent communication and matrix-weighted communication graphs. The effect of the average intermittent communication rate on the convergence of the distributed Nash equilibrium seeking algorithm is analyzed, and a lower bound of the average intermittent communication rate that ensures the convergence of the algorithm is given. The convergence of the algorithm is established by means of Lyapunov stability theory. Simulations are presented to verify the proposed distributed Nash equilibrium seeking algorithm.

Shuoshuo Zhang, Jianxiang Ren, Xiao Fang, Tingwen Huang
Accelerated Genetic Algorithm with Population Control for Energy-Aware Virtual Machine Placement in Data Centers

Energy efficiency is crucial for the operation and management of cloud data centers, which are the foundation of cloud computing. Virtual machine (VM) placement plays a vital role in improving energy efficiency in data centers. The genetic algorithm (GA) has been extensively studied for solving the VM placement problem due to its ability to provide high-quality solutions. However, GA’s high computational demands limit further improvement in energy efficiency, where a fast and lightweight solution is required. This paper presents an adaptive population control scheme that enhances gene diversity through population control, adaptive mutation rate, and accelerated termination. Experimental results show that our scheme achieves a 17% faster acceleration and 49% fewer generations compared to the standard GA for energy-efficient VM placement in large-scale data centers.

Zhe Ding, Yu-Chu Tian, Maolin Tang, You-Gan Wang, Zu-Guo Yu, Jiong Jin, Weizhe Zhang
A Framework of Large-Scale Peer-to-Peer Learning System

Federated learning (FL) is a distributed machine learning paradigm in which numerous clients train a model dispatched by a central server while retaining the training data locally. Nonetheless, the failure of the central server can disrupt the training framework. Peer-to-peer approaches enhance the robustness of the system as all clients directly interact with other clients without a server. However, a downside of these peer-to-peer approaches is their low efficiency. Communication among a large number of clients is significantly costly, and the synchronous learning framework becomes unworkable in the presence of stragglers. In this paper, we propose a semi-asynchronous peer-to-peer learning system (P2PLSys) suitable for large-scale clients. This system features a server that manages all clients but does not participate in model aggregation. The server distributes a partial client list to selected clients that have completed local training for local model aggregation. Subsequently, clients adjust their own models based on staleness and communicate through a secure multi-party computation protocol for secure aggregation. Through our experiments, we demonstrate the effectiveness of P2PLSys for image classification problems, achieving a similar performance level to classical FL algorithms and centralized training.

Yongkang Luo, Peiyi Han, Wenjian Luo, Shaocong Xue, Kesheng Chen, Linqi Song
Optimizing 3D UAV Path Planning: A Multi-strategy Enhanced Beluga Whale Optimizer

The goal of 3D UAV path planning problem is to assist the UAV in planning a flight path with the lowest total overhead cost. In this paper, we present a novel approach to address the problem by incorporating flight distance, threat cost, flight altitude and path smoothness constraints into a comprehensive cost function. The current popular metaheuristic algorithm is utilized to solve for the closest globally optimal UAV flight path. To overcome the challenges of local optima and slow convergence associated with the conventional Beluga Whale Optimizer (BWO), this paper proposes a modified beluga whale optimizer (OGGBWO) based on random opposition-based learning strategy, adaptive Gauss variational operator and elitist group genetic strategy. Extensive experiments conducted on the CEC2022 test set and four distinct terrain scenarios of varying complexity demonstrate that the OGGBWO algorithm outperforms classical and state-of-the-art metaheuristics. It achieves superior optimization performance across all 12 CEC2022 test functions and exhibits exceptional convergence in generating flight paths with the lowest total cost function in diverse terrain scenarios.

Chen Ye, Wentao Wang, Shaoping Zhang, Peng Shao
Interactive Attention-Based Graph Transformer for Multi-intersection Traffic Signal Control

With the exponential growth in motor vehicle numbers, urban traffic congestion has become a pressing issue. Traffic signal control plays a pivotal role in alleviating the problem. In modeling multi-intersection scenarios, most studies focus on communication with regional intersections and rarely consider cross-regional ones. To address the above limitation, we construct an interactive attention-based graph transformer network for traffic signal control (GTLight). Specifically, the model considers correlations between cross-regional intersections using an interactive attention mechanism. In addition, the model designs a phase-timing optimization algorithm to solve the problem of overestimation of Q-value in signal timing strategies. We validate the effectiveness of GTLight on different traffic datasets. Compared to the recent graph-based reinforcement learning method, the average travel time is improved by 28.16%, 26.56%, 25.79%, 26.46%, and 19.59% on the respective datasets.

Yining Lv, Nianwen Ning, Hengji Li, Li Wang, Yanyu Zhang, Yi Zhou
PatchFinger: A Model Fingerprinting Scheme Based on Adversarial Patch

As deep neural networks (DNNs) gain great popularity and importance, protecting their intellectual property is always the topic. Previous model watermarking schemes based on backdoors require explicit embedding of the backdoor, which changes the structure and parameters. Model fingerprinting based on adversarial examples does not require any modification of the model, but is limited by the characteristics of the original task and not versatile enough. We find that adversarial patch can be regarded as an inherent backdoor and can achieve the output of specific categories injected. Inspired by this, we propose PatchFinger, a model fingerprinting scheme based on adversarial patch which is applied to the original samples as a model fingerprinting through a specific fusion method. As a model fingerprinting scheme, PatchFinger does not sacrifice the accuracy of the source model, and the characteristics of the adversarial patch make it more flexible and highly robust. Experimental results show that PatchFinger achieves an ARUC value of 0.936 in a series of tests on the Tiny-ImageNet dataset, which exceeds the baseline by 19%. When considering average query accuracy, PatchFinger gets 97.04% outperforming the method tested.

Bo Zeng, Kunhao Lai, Jianpeng Ke, Fangchao Yu, Lina Wang
Attribution of Adversarial Attacks via Multi-task Learning

Deep neural networks (DNNs) can be easily fooled by adversarial examples during inference phase when attackers add imperceptible perturbations to original examples. Many works focus on adversarial detection and adversarial training to defend against adversarial attacks. However, few works explore the tool-chains behind adversarial examples, which is called Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures, i.e., attack algorithm, victim model and hyperparameter. Existing works transfer AAP into a single-label classification task and ignore the relationship among above three signatures. Actually, there exists owner-member relationship between attack algorithm and hyperparameter, which means hyperparameter recognition relies on the result of attack algorithm classification. Besides, the value of hyperparameter is continuous, hence hyperparameter recognition should be regarded as a regression task. As a result, AAP should be considered as a multi-task learning problem rather than a single-label classification problem or a single-task learning problem. To deal with above problems, we propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) to recognize above three signatures simultaneously. It takes the relationship between attack algorithm and the corresponding hyperparameter into account and uses the uncertainty weighted loss to adjust the weights of three recognition tasks. The experimental results on MNIST and ImageNet show the feasibility and scalability of the proposed framework.

Zhongyi Guo, Keji Han, Yao Ge, Yun Li, Wei Ji
A Reinforcement Learning Method for Generating Class Integration Test Orders Considering Dynamic Couplings

In recent years, with the rapid development of artificial intelligence, reinforcement learning has made significant progress in various fields. However, there are still some challenges when applying reinforcement learning to solve problems in software engineering. The generation of class integration test orders is a key challenge in object-oriented program integration testing. Previous research mainly focused on static couplings and neglected dynamic couplings, leading to inaccurate cost measurement of class integration test orders. In this paper, we propose a reinforcement learning method to generate class integration test orders considering dynamic couplings. Firstly, the concept of dynamic couplings generated by polymorphism is introduced, and a strategy for measuring the stubbing complexity of simulating dynamic dependencies is proposed. Then, we combine this new stubbing complexity with a reinforcement learning method to generate class integration test orders and achieve the optimal result with minimal overall stubbing complexity. Comprehensive experiments show that our proposed approach outperforms other methods in measuring the cost of generating class integration test orders.

Yanru Ding, Yanmei Zhang, Guan Yuan, Yingjie Li, Shujuan Jiang, Wei Dai
A Novel Machine Learning Model Using CNN-LSTM Parallel Networks for Predicting Ship Fuel Consumption

With continuous increasing of carbon emission, prediction of ship fuel consumption is gaining significance in reduction of energy consumption and emissions for ships. This paper proposes a novel model of parallel network by combining convolutional neural network and long short-term memory (CNN-LSTM). The proposed model integrates three advantages. The CNN part of proposed model can extract spatial features, the LSTM part of proposed model can capture temporal relationships, and the parallel structure of proposed model can obtain feature fusion from both of CNN and LSTM based on multi-source data. Experimental outcomes reveal that CNN-LSTM parallel networks can obtain best results of MAE and RMSE, which outperformed single LSTM, single CNN and other neural networks with decreasing of 48.06%, 64.06% and 48.56% in MAE, and 35.71%, 58.25% and 37.85% in RMSE.

Xinyu Li, Yi Zuo, Tieshan Li, C. L. Philip Chen
Two-Stage Attention Model to Solve Large-Scale Traveling Salesman Problems

The Traveling Salesman Problem (TSP) widely exists in real-world scenarios. Various methods, such as exact methods, heuristic methods, and deep learning-based methods, can solve TSPs efficiently. However, as the size of the problems increases, these methods become increasingly time-consuming due to the high complexity of large-scale TSPs. This paper proposes a two-stage attention model (TSAM) that incorporates the divide-and-conquer strategy and attention model to solve large-scale TSPs efficiently. Experimental results demonstrate that TSAM can rapidly produce promising solutions for TSP instances ranging from 500 to 10,000 nodes.

Qi He, Feng Wang, Jingge Song
Learning Primitive-Aware Discriminative Representations for Few-Shot Learning

Few-shot Learning (FSL) aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples. Recently, some works about FSL have yielded promising classification performance, where the image-level feature is used to calculate the similarity among samples for classification. However, the image-level feature ignores abundant fine-grained and structural information of objects that could be transferable and consistent between seen and unseen classes. How can humans easily identify novel classes with several samples? Some studies from cognitive science argue that humans recognize novel categories based on primitives. Although base and novel categories are non-overlapping, they share some primitives in common. Inspired by above research, we propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations based on metric-based FSL model. Concretely, we first add Self-supervision Jigsaw task (SSJ) for feature extractor parallelly, guiding the model encoding visual pattern corresponding to object parts into feature channels. Moreover, to mine discriminative representations, an Adaptive Channel Grouping (ACG) method is applied to cluster and weight spatially and semantically related visual patterns to generate a set of visual primitives. To further enhance the discriminability and transferability of primitives, we propose a visual primitive Correlation Reasoning Network (CRN) based on Graph Convolutional network to learn abundant structural information and internal correlation among primitives. Finally, a primitive-level metric is conducted for classification in a meta-task based on episodic training strategy. Extensive experiments show that our method achieves state-of-the-art results on miniImageNet and Caltech-UCSD Birds.

Jianpeng Yang, Yuhang Niu, Xuemei Xie, Guangming Shi
Time-Series Forecasting Through Contrastive Learning with a Two-Dimensional Self-attention Mechanism

Contrastive learning methods have impressive capabilities in time-series representation; however, challenges in capturing contextual consistency and extracting features that meet the requirements of representation learning remain. To address these problems, this study proposed a time-series prediction contrastive learning model based on a two-dimensional self-attention mechanism. The main innovations of this model were as follows: First, long short-term memory (LSTM) adaptive pruning was used to form two subsequences with overlapping parts to provide robust context representation for each timestamp. Second, the model extracted sequence data features in both global and local dimensions. In the channel dimension, the model encoded sequence data using a combination of a self-attention mechanism and dilated convolution to extract key features for capturing long-term trends and periodic changes in data. In the spatial dimension, the model adopted a sliding-window self-attention mechanism to encode sequence data, thereby improving its perceptual ability for local features. Finally, the model introduced a self-correlation attention mechanism that converted the similarity calculation from the real domain to the frequency domain through a Fourier transform, better capturing the periodicity and trends in the data. The experimental results showed that the proposed model outperformed existing models in multiple time-series prediction tasks, demonstrating its effectiveness and feasibility in time-series prediction tasks.

Linling Jiang, Fan Zhang, Mingli Zhang, Caiming Zhang
Task Scheduling with Multi-strategy Improved Sparrow Search Algorithm in Cloud Datacenters

How to efficiently schedule tasks is the focus of cloud computing. Combining the task scheduling characteristics of the cloud computing environment, a multi-strategy improved sparrow search algorithm (MISSA) that takes into account task completion time, task completion cost and load balancing is proposed. First, the initialization of the population using piecewise linear chaotic map (PWLCM) enhances the degree of individual dispersion. After that, the global search phase in the marine predator algorithm (MPA) is incorporated to increase the scope of the search space. The introduction of dynamic adjustment factors in the joiner part strengthens the search ability of the algorithm in the early stage and the convergence ability in the late stage. Finally, the greedy strategy is used to update the joiner’s position so that the information of the optimal solution and the worst solution can be used to guide the next generation of position updates. Using CloudSim for simulation, the experimental results show that the proposed algorithm has a shorter task completion time and a more balanced system load. Compared with the ant colony optimization (ACO), MPA, and sparrow search algorithm (SSA), the MISSA improves the integrated fitness function values by 20%, 22%, and 17%, respectively, confirming the feasibility of the proposed algorithm.

Yao Liu, Wenlong Ni, Yang Bi, Lingyue Lai, Xinyu Zhou, Hua Chen
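
The PWLCM-based population initialization mentioned in the MISSA abstract above can be illustrated with a minimal sketch. It uses the standard textbook form of the piecewise linear chaotic map; the control parameter `p`, the seeding scheme, and the function names `pwlcm` and `init_population` are assumptions made for illustration and are not taken from the paper.

```python
import random


def pwlcm(x: float, p: float = 0.4) -> float:
    """One step of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def init_population(n, dim, lower, upper, p=0.4, seed=0):
    """Build n individuals of length dim by iterating the chaotic map.

    Each chaotic value in [0, 1] is scaled to the search range [lower, upper].
    The starting point is drawn from a seeded generator (an illustrative choice).
    """
    rng = random.Random(seed)
    x = rng.uniform(0.01, 0.99)
    population = []
    for _ in range(n):
        individual = []
        for _ in range(dim):
            x = pwlcm(x, p)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


if __name__ == "__main__":
    for row in init_population(n=3, dim=4, lower=-10.0, upper=10.0):
        print([round(v, 3) for v in row])
```

Compared with drawing each coordinate independently from a uniform distribution, a chaotic sequence is deterministic given its starting point, which is one reason such maps are used to spread initial individuals over the search space.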
Advanced State-Aware Traffic Light Optimization Control with Deep Q-Network

The former traffic light control (TLC) system cannot effectively regulate the traffic conditions dynamically in real time due to urban growth. The Dueling Double Deep Recurrent Q-Network with Attention Mechanism (3DRQN-AM) method for TLC is proposed in this study. The proposed method is based on Deep Q-Network and employs target network, double learning method and dueling network to boost its learning efficiency. In order to integrate the past state of the vehicle’s motion trajectory with the current state of the vehicle for the best decision-making, the Long-Short Term Memory (LSTM) is introduced. Meanwhile, an Attention Mechanism is introduced to help the neural network automatically focus on crucial state components and improve its capacity to represent state. In the experiments, the proposed method is compared with the Dueling Double Deep Q-Network with Attention Mechanism (3DQN-AM), Dueling Double Deep Recurrent Q-Network (3DRQN), Dueling Double Deep Q-Network (3DQN), and Fixed-Time-3DRQN-AM (FT-3DRQN-AM) signal management methods. Relative to these methods, the proposed method lowers the average waiting time under typical traffic flow by about 46.2%, 53.3%, 85.1%, and 30.0% respectively, and the average queue length by about 41.9%, 44.6%, 76.0%, and 21.7% respectively. Under peak traffic conditions, the average waiting time is decreased by around 20.8%, 32.1%, 36.7%, and 38.7% respectively, while the average queue length is decreased by roughly 2.8%, 2.8%, 21.3%, and 44.9%.

Wenlong Ni, Zehong Li, Peng Wang, Chuanzhaung Li
Impulsive Accelerated Reinforcement Learning for H∞ Control

This paper revisits reinforcement learning for H∞ control of affine nonlinear systems with partially unknown dynamics. By incorporating an impulsive momentum-based control into the conventional critic neural network, an impulsive accelerated reinforcement learning algorithm with a restart mechanism is proposed to improve the convergence speed and transient performance compared to traditional gradient descent-based techniques or continuously accelerated gradient methods. Moreover, by utilizing the quasi-periodic Lyapunov function method, a sufficient condition for input-to-state stability with respect to approximation errors of the closed-loop system is established. A numerical example with comparisons is provided to illustrate the theoretical results.

Yan Wu, Shixian Luo, Yan Jiang
MRRC: Multi-agent Reinforcement Learning with Rectification Capability in Cooperative Tasks

Motivated by the centralised training with decentralised execution (CTDE) paradigm, multi-agent reinforcement learning (MARL) algorithms have made significant strides in addressing cooperative tasks. However, the challenges of sparse environmental rewards and limited scalability have impeded further advancements in MARL. In response, MRRC, a novel actor-critic-based approach, is proposed. MRRC tackles the sparse reward problem by equipping each agent with both an individual policy and a cooperative policy, harnessing the benefits of the individual policy’s rapid convergence and the cooperative policy’s global optimality. To enhance scalability, MRRC employs a monotonic mix network to rectify the state-action value function Q for each agent, yielding the joint value function Q_tot to facilitate global updates of the entire critic network. Additionally, the Gumbel-Softmax technique is introduced to rectify discrete actions, enabling MRRC to handle discrete tasks effectively. By comparing MRRC with advanced baseline algorithms in the “Predator-Prey” and challenging “SMAC” environments, as well as conducting ablation experiments, the superior performance of MRRC is demonstrated in this study. The experimental results reveal the efficacy of MRRC in reward-sparse environments and its ability to scale well with increasing numbers of agents.

Sheng Yu, Wei Zhu, Shuhong Liu, Zhengwen Gong, Haoran Chen
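
The Gumbel-Softmax technique named in the MRRC abstract above is a general way to draw a relaxed (soft) sample from a categorical distribution over discrete actions. The following is a minimal sketch of the generic technique only, written in pure Python without automatic differentiation; the function name, the temperature value, and the clamping constant are illustrative assumptions, and it does not reproduce how MRRC integrates the technique.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Return a relaxed one-hot sample over the actions described by logits.

    Gumbel(0, 1) noise is added to each logit, the result is divided by the
    temperature tau, and a softmax is applied. Smaller tau gives samples
    closer to one-hot vectors.
    """
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # clamp to avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
    print([round(p, 3) for p in sample])
```

In a deep learning framework the same computation is differentiable with respect to the logits, which is what allows gradients to flow through discrete action choices.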
Latent Causal Dynamics Model for Model-Based Reinforcement Learning

Learning an accurate dynamics model is the key task for model-based reinforcement learning (MBRL). Most existing MBRL methods learn the dynamics model over states. But in most cases, the relationships among states are complex because the states are affected by the interaction of various factors in the environment. Recently some works are proposed to learn the dynamics model on latent representations space. But the learned model is dense and may contain spurious associations between latent representations. To deal with these problems, we introduce a latent causal dynamics model over latent representations and provide a learning method for MBRL. Specifically, we first learn the latent representations from the observed state space. Second, we learn a latent causal dynamics model among latent representations by a causal discovery method. Finally, the latent causal dynamics model is used to aid policy learning. The above steps are iterative to update the unified loss function until convergence. Experimental results on four tasks show that the performance of our proposed method benefits from the causality and the learned latent representations.

Zhifeng Hao, Haipeng Zhu, Wei Chen, Ruichu Cai
Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease

Network pruning prior to training makes generalization more challenging than ever, while recent studies mainly focus on the trainability of the pruned networks in isolation. This paper explores a new perspective on loss implicit decrease of the data to be trained caused by one-batch training during each round, whose first-order approximation we term gradient coupled flow. We thus present a criterion sensitive to gradient coupled flow (GCS), which is hypothesized to capture those weights most sensitive to performance boosting at initialization. Interestingly, our explorations show there exists a linear correlation between generalization and implicit loss decrease based measurements on previous works as well as GCS, which ideally describes causes of accuracy fluctuation in a fine-grained manner. Our code is made public at: .

Jiaying Wu, Xiatao Kang, Jingying Xiao, Jiayi Yao
Motif-SocialRec: A Multi-channel Interactive Semantic Extraction Model for Social Recommendation

To capture complex interaction semantics beyond pairwise relationships for social recommendation, a novel recommendation model, namely Motif-SocialRec, is proposed under the perspective of motif. It efficiently describes interaction pattern from multi-channel with different motifs. In the model, we depict a series of local structures by motif, which can describe the high-level interactive semantics in the fused network from three views. By employing hypergraph convolution network, representations that preserve potential semantic patterns can be learned. Additionally, we enhance the learned representations by establishing self-supervised learning tasks on different scales to further explore the inherent characteristics of the network. Finally, a joint optimization model is constructed by integrating the primary and auxiliary tasks to produce recommendation predictions. Results of extensive experiments on four real-world datasets show that Motif-SocialRec significantly outperforms baselines in terms of different evaluation metrics.

Hangyuan Du, Yuan Liu, Wenjian Wang, Liang Bai
Dual Channel Graph Neural Network Enhanced by External Affective Knowledge for Aspect Level Sentiment Analysis

Aspect-level sentiment analysis is a prominent technology in natural language processing (NLP) that analyzes the sentiment polarity of target words in a text. Despite its long history of development, current methods still have some shortcomings. Mainly, they lack the integration of external affective knowledge, which is crucial for allocating attention to aspect-related words in syntactic and semantic information processing. Additionally, the synergy between syntactic and semantic information is often neglected, with most approaches focusing on only one dimension. To address these issues, we propose a knowledge-enhanced dual-channel graph neural network. Our model incorporates external affective knowledge into both the semantic and syntactic channels in different ways, then utilizes a dynamic attention mechanism to fuse information from these channels. We conducted experiments on Semeval2014, 2015, and 2016 datasets, and the results showed significant improvements compared to existing methods. Our approach bridges the gaps in current techniques and enhances performance in aspect-level sentiment analysis.

Hu Jin, Qifei Zhang, Xiubo Liang, Yulin Zhou, Wenjuan Li
New Predefined-Time Stability Theorem and Applications to the Fuzzy Stochastic Memristive Neural Networks with Impulsive Effects

The paper mainly investigates the issue of achieving predefined-time synchronization for fuzzy memristive neural networks with both impulsive effects and stochastic disturbances. Firstly, because the existing predefined-time stability theorems can hardly be applied to systems with impulsive effects, a new predefined-time stability theorem is proposed to solve the stability problem of systems with impulsive effects. The theorem is flexible and can guide impulsive stochastic fuzzy memristive neural network models to achieve predefined-time synchronization. Secondly, because the sign function can easily cause the chattering phenomenon, resulting in undesirable effects such as decreased synchronization performance, a novel and effective feedback controller without the sign function is designed to eliminate this chattering phenomenon. In addition, the paper overcomes the comprehensive influence of fuzzy logic, memristive state dependence and stochastic disturbance, and gives effective conditions to ensure that two stochastic systems can achieve predefined-time synchronization. Finally, the effectiveness of the proposed theoretical results is demonstrated in detail through a numerical simulation.

Hui Zhao, Lei Zhou, Qingjie Wang, Sijie Niu, Xizhan Gao, Xiju Zong
FE-YOLOv5: Improved YOLOv5 Network for Multi-scale Drone-Captured Scene Detection

Due to the different angles and heights of UAV shooting, the shooting environment is complex, and the shooting targets are mostly small, so the target detection task in the drone-captured scene is still challenging. In this study, we present a highly precise technique for identifying objects in scenes captured by drones, which we refer to as FE-YOLOv5. First, to optimize cross-scale feature fusion and maximize the utilization of shallow feature information, we propose a novel feature pyramid model called MSF-BiFPN as our primary approach. Furthermore, to improve the fusion of features at different scales and boost their representational power, our innovative approach proposes an adaptive attention module. Moreover, we propose a novel feature enhancement module that effectively strengthens high-level features before feature fusion. This module effectively minimizes feature loss during the fusion process, ultimately resulting in enhanced detection accuracy. Finally, the utilization of the normalized Wasserstein distance serves as a novel metric for enhancing the model’s sensitivity and accuracy in detecting small targets. The experimental results of FE-YOLOv5 on the VisDrone data set show that mAP 0.5 has increased by 7.8%, and mAP 0.5:0.95 has increased by 5.7%. At the same time, the training results of the model at 960 × 960 image resolution are better than the current YOLO series models, among which mAP 0.5 can reach 56.3%. Based on the experiments conducted, it has been demonstrated that the FE-YOLOv5 model effectively enhances the accuracy of object detection in UAV capture scenes.

Chen Zhao, Zhe Yan, Zhiyan Dong, Dingkang Yang, Lihua Zhang
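
The normalized Wasserstein distance mentioned in the FE-YOLOv5 abstract above is commonly computed by modelling each bounding box as a 2D Gaussian and normalizing the resulting distance with an exponential. The sketch below follows that common formulation as an assumption; the abstract does not state the exact form used in the paper, and the box format `(cx, cy, w, h)`, the function name `nwd`, and the constant `c` (a dataset-dependent scale, set here to an arbitrary value) are illustrative.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w / 2, h / 2). The squared 2-Wasserstein distance between
    two such Gaussians reduces to a squared Euclidean distance between the
    vectors (cx, cy, w / 2, h / 2). The result lies in (0, 1], and equals 1
    for identical boxes.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


if __name__ == "__main__":
    small_a = (10.0, 10.0, 4.0, 4.0)
    small_b = (15.0, 10.0, 4.0, 4.0)  # no overlap with small_a, so IoU would be 0
    print(round(nwd(small_a, small_b), 4))
```

Unlike IoU, this similarity stays positive and varies smoothly for small boxes that do not overlap, which is the usual motivation for using it with small targets.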
An Improved NSGA-II for UAV Path Planning

Palm oil is an edible vegetable oil that can be used in a wide range of products across different industries ranging from food and beverages, personal care and cosmetics, animal feed, industrial products, to biofuel. The palm oil industry contributes slightly less than 4% of Malaysia’s overall GDP, and the country is the world’s second-largest producer and exporter of palm oil. In Malaysia, it has been estimated that there are around 500,000 plantation workers in palm oil industries. In addition to the difficulty of getting a sufficient and steady supply of such usually low-skilled workers, there are also issues related to the limits of the human body in performing tough physical work. As a result, UAVs may be utilized to support some of the processes in the palm oil businesses. However, the power of the batteries used in these UAVs is finite before they need to be recharged. Hence, the flight path for the UAV should be optimally computed for it to be able to cover the area it is assigned. In this paper, an improved Non-Dominated Sorting Genetic Algorithm II (NSGA-II) was developed to compute the optimal flight path of UAVs, which also includes the turning angle and elevation. Enhancements to the algorithm are made by improving the selection, crossover, and mutation operations of the genetic algorithm, which helps to improve the convergence and diversity of the algorithm besides avoiding getting trapped in local optimal solutions. In the majority of the tests, the improved NSGA-II was able to generate paths that are better than those identified by the human expert. Moreover, the proposed improved NSGA-II algorithm was able to compute good paths in less than the threshold of 10 min.

Wei Hang Tan, Weng Kin Lai, Pak Hen Chen, Lee Choo Tay, Sheng Siang Lee
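
The NSGA-II abstract above relies on non-dominated sorting to rank candidate flight paths. The following is a minimal sketch of that ranking step only, using a simple front-peeling loop rather than the bookkeeping of the standard fast variant. The two objectives (path length, total turning angle) and the sample values are hypothetical, and the paper's improved selection, crossover, and mutation operators are not shown.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_sort(points):
    """Return a list of Pareto fronts, each a sorted list of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical candidate paths: (path length, total turning angle in degrees).
paths = [(10, 90), (12, 60), (11, 95), (15, 50), (13, 70)]
fronts = non_dominated_sort(paths)
print(fronts)  # → [[0, 1, 3], [2, 4]]
```

Paths in the first front trade length against turning angle without any one being strictly better, which is why a multi-objective method keeps all of them instead of a single winner.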
Reimagining China-US Relations Prediction: A Multi-modal, Knowledge-Driven Approach with KDSCINet

Statistical models and data-driven models have achieved remarkable results in international relation forecasting. However, most of these models share several drawbacks: (i) they rely on large amounts of expert knowledge, which limits their objectivity, applicability, usability, interpretability, and sustainability, and (ii) they can only use structured unimodal data or cannot make full use of multimodal data. To address these two problems, we propose a Knowledge-Driven neural network architecture that conducts Sample Convolution and Interaction, named KDSCINet, for China-US relation forecasting. First, we filter events pertaining to China-US relations from the GDELT database. Then, we extract text descriptions and images from news articles and use the fine-tuned pre-trained model MKGformer to obtain embeddings. Finally, we connect the textual and image embeddings of each event with its structured event value in the GDELT database through a multi-head attention mechanism to generate time series data, which is then fed into KDSCINet for China-US relation forecasting. Our approach enhances prediction accuracy by establishing a knowledge-driven temporal forecasting model that combines structured data, textual data, and image data. Experiments demonstrate that KDSCINet can (i) outperform state-of-the-art methods on the time series forecasting problem in the area of international relation forecasting and (ii) improve forecasting performance through the use of multimodal knowledge.

Rui Zhou, Jialin Hao, Ying Zou, Yushi Zhu, Chi Zhang, Fusheng Jin
A Graph Convolution Neural Network for User-Group Aided Personalized Session-Based Recommendation

Session-based recommendation systems aim to predict the next user interaction based on the items with which the user interacts in the current session. Currently, graph neural network-based models are widely used and have proven more effective than others. However, these session-based models mainly focus on the user-item and item-item relations in historical sessions while ignoring information shared by similar users. To address this issue, a new graph-based representation, the User-item Group Graph, which considers not only user-item and item-item but also user-user relations, is developed to take advantage of natural sequential relations shared by similar users. A new personalized session-based recommendation model is developed based on this representation. It first generates groups according to user-related historical item sequences and then uses a user group preference recognition module to capture and balance group-item preferences and user-item preferences. Comparison experiments show that the proposed model outperforms other state-of-the-art models when similar users are effectively grouped. This indicates that grouping similar users can help find deep preferences shared by users from the same group and is instructive in finding the most appropriate next item for the current user.

Hui Wang, Hexiang Bai, Jun Huo, Minhu Yang
Disentangling Node Metric Factors for Temporal Link Prediction

Temporal Link Prediction (TLP), one of the most closely watched tasks in graph mining, requires predicting the future link probability based on historical interactions. On the one hand, traditional methods based on node metrics, such as Common Neighbor, achieve satisfactory performance in the TLP task. On the other hand, node metrics overly focus on the global impact of nodes while neglecting the personalization of different node pairs, which can sometimes mislead link prediction results. However, mainstream TLP methods follow the standard paradigm of learning node embeddings, entangling favorable and harmful node metric factors in the representation and reducing the model’s robustness. In this paper, we propose a plug-and-play module called Node Metric Disentanglement (NMD), which can be applied to most TLP methods to boost their performance. It explicitly accounts for node metrics and disentangles them from the embedding representations generated by TLP methods. We adopt the attention mechanism to select information conducive to the TLP task and integrate it into the node embedding. Experiments on various state-of-the-art methods and dynamic graphs verify the effectiveness and universality of our NMD plugin.

Tianli Zhang, Tongya Zheng, Yuanyu Wan, Ying Li, Wenqi Huang
Action Prediction for Cooperative Exploration in Multi-agent Reinforcement Learning

Multi-agent reinforcement learning methods have shown significant progress; however, they continue to exhibit exploration problems in complex and challenging environments. Although current research has introduced several exploration-enhanced methods for multi-agent reinforcement learning to address this issue, these methods still suffer from inefficient exploration and low performance in challenging tasks that necessitate complex cooperation among agents. This paper proposes the prediction-action Qmix (PQmix) method, an action prediction-based multi-agent intrinsic reward construction approach. The PQmix method uses the joint local observation of the agents and the next joint local observation after executing actions to predict the real joint action of the agents. The method takes the action prediction error as the intrinsic reward, which measures the novelty of the joint state and encourages agents to actively explore the action and state spaces in the environment. We compare PQmix with strong baselines on the MARL benchmark to validate it. The experimental results demonstrate that PQmix outperforms the state-of-the-art algorithms on the StarCraft Multi-Agent Challenge (SMAC). Finally, the stability of the method is verified by experiments.

Yanqiang Zhang, Dawei Feng, Bo Ding
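
The PQmix abstract above describes using an action prediction error as the intrinsic reward. The sketch below shows one plausible way to turn such an error into a reward, assuming the predictor outputs a probability distribution over actions for each agent and the error is a cross-entropy. The abstract does not specify the error measure, so this choice, the `scale` parameter, and the function name are all assumptions, and the prediction network itself is not shown.

```python
import math

def action_prediction_reward(predicted_probs, joint_action, scale=1.0):
    """Intrinsic reward from action prediction error (assumed cross-entropy).

    predicted_probs: one probability list per agent, produced by a
        hypothetical model that sees the current and next joint observation.
    joint_action: the action index each agent actually took.
    """
    eps = 1e-12  # guards against log(0)
    error = 0.0
    for probs, action in zip(predicted_probs, joint_action):
        error += -math.log(max(probs[action], eps))
    return scale * error

# Familiar transition: the predictor is confident about what the agents did.
familiar = action_prediction_reward([[0.9, 0.1], [0.8, 0.2]], [0, 0])
# Novel transition: the predictor cannot tell which actions were taken.
novel = action_prediction_reward([[0.5, 0.5], [0.5, 0.5]], [0, 0])
print(familiar, novel)
```

Under this assumption, transitions the predictor already explains well earn little bonus, while poorly predicted ones earn more, which is the exploration incentive the abstract describes.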
SLAM: A Lightweight Spatial Location Attention Module for Object Detection

Current object detection models suffer from a large number of parameters, inaccurate localization of target bounding boxes, and ineffective detection. To address these shortcomings, this paper proposes a lightweight spatial location attention module (SLAM). By learning the spatial location information in the input feature map, SLAM adaptively adjusts the attention weights of the location information in the feature map while greatly improving the feature representation capability of the network. First, the SLAM module obtains the spatial distribution of the input feature map in the horizontal, vertical, and channel directions through average pooling and maximum pooling operations. It then generates the corresponding location attention weights by applying convolution and activation functions, and finally produces the weighted feature map by aggregating the features along each of the three directions. Extensive experiments show that the SLAM module improves the detection performance of the model on the MS COCO dataset and the PASCAL VOC 2012 dataset with almost no additional computational overhead.

Changda Liu, Yunfeng Xu, Jiakui Zhong
A Novel Interaction Convolutional Network Based on Dependency Trees for Aspect-Level Sentiment Analysis

Aspect-based sentiment analysis aims to identify the sentiment polarity of a given aspect word in a sentence. Due to the complexity of sentences in texts, models based on graph neural networks still have difficulty accurately capturing the relationship between aspect words and viewpoint words in sentences, which limits classification accuracy. To solve this problem, the paper proposes a novel Aspect-level Sentiment Analysis model based on an Interactive convolutional network with Dependency Trees, named ASAI-DT for short. In particular, the ASAI-DT model first extracts the aspect word representation from the sentence representation trained by the Bi-GRU model. Meanwhile, the self-attention scores of the sentence and aspect representations are calculated separately by the self-attention mechanism, in order to reduce the attention paid to irrelevant information. Afterward, the proposed model constructs the sub-tree of the dependency tree for the word, and the attention weight scores of the aspect representations are integrated into the sub-tree. The resulting comprehensive information about aspect words is then processed by the graph convolutional network to maximize the retention of valid information and minimize the interference of noise. Finally, the effective information is preserved more completely in the integrated information through the interactive network. In a large number of experiments on various data sets, the proposed ASAI-DT model demonstrates both effectiveness and accuracy in aspect sentiment analysis, outperforming many aspect-based sentiment analysis models.

Lei Mao, Jianxia Chen, Shi Dong, Liang Xiao, Haoying Si, Shu Li, Xinyun Wu
Efficient Collaboration via Interaction Information in Multi-agent System

Cooperative multi-agent reinforcement learning (CMARL) has shown promise in solving real-world scenarios. The interaction information between agents contains rich global information, which is easily neglected after perceiving other agents’ behavior. To tackle this problem, we propose Collaboration Interaction Information Modelling via Hypergraph (CIIMH), which first perceives the behavior of other agents by mutual information optimization and constructs the dynamic interaction information via hypergraph. Perceived behavioral features of other agents are further aggregated in the hypergraph convolutional network to obtain interaction information. We compare our method with three existing baselines on StarCraft II micromanagement tasks (SMAC), Level-based Foraging (LBF), and Hallway. Empirical results show that our method outperforms baseline methods on all maps.

Meilong Shi, Quan Liu, Zhigang Huang
A Deep Graph Matching-Based Method for Trajectory Association in Vessel Traffic Surveillance

Vessel traffic surveillance in inland waterways extensively relies on the Automatic Identification System (AIS) and video cameras. While video data only captures the visual appearance of vessels, AIS data serves as a valuable source of vessel identity and motion information, such as position, speed, and heading. To gain a comprehensive understanding of the behavior and motion of known-identity vessels, it is necessary to fuse the AIS-based and video-based trajectories. An important step in this fusion is to obtain the correspondence between moving targets by trajectory association. Thus, we focus solely on trajectory association in this work and propose a trajectory association method based on deep graph matching. We formulate trajectory association as a graph matching problem and introduce an attention-based flexible context aggregation mechanism to exploit the semantic features of trajectories. Compared to traditional methods that rely on manually designed features, our approach captures complex patterns and correlations within trajectories through end-to-end training. The introduced dustbin mechanism can effectively handle outliers during matching. Experimental results on synthetic and real-world datasets demonstrate the exceptional performance of our method in terms of trajectory association accuracy and robustness.

Yuchen Lu, Xiangkai Zhang, Xu Yang, Pin Lv, Liguo Sun, Ryan Wen Liu, Yisheng Lv
Few-Shot Anomaly Detection in Text with Deviation Learning

Most current methods for detecting anomalies in text concentrate on constructing models that rely solely on unlabeled data. These models operate on the presumption that no labeled anomalous examples are available, which prevents them from utilizing prior knowledge of anomalies that are typically present in small numbers in many real-world applications. Furthermore, these models prioritize learning feature embeddings rather than optimizing anomaly scores directly, which could lead to suboptimal anomaly scoring and inefficient use of data during the learning process. In this paper, we introduce FATE, a deep few-shot learning-based framework that leverages limited anomaly examples and learns anomaly scores explicitly in an end-to-end manner using deviation learning. In this approach, the anomaly scores of normal examples are adjusted to closely resemble reference scores obtained from a prior distribution. Conversely, anomaly samples are forced to have anomaly scores that deviate considerably from the reference score, in the upper tail of the prior. Additionally, our model is optimized to learn the distinct behavior of anomalies by utilizing a multi-head self-attention layer and multiple instance learning approaches. Comprehensive experiments on several benchmark datasets demonstrate that our proposed approach achieves new state-of-the-art performance (Our code is available at ).

Anindya Sundar Das, Aravind Ajay, Sriparna Saha, Monowar Bhuyan
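
The FATE abstract above describes deviation learning: normal scores are pulled toward reference scores from a prior, and anomaly scores are pushed into the prior's upper tail. The sketch below shows the per-example loss in the form commonly used for deviation networks, assuming a standard normal prior and a margin of 5. The abstract gives neither value, so both are assumptions, and the text encoder, attention layer, and multiple instance learning parts are not shown.

```python
import random
import statistics

def deviation_loss(score, label, ref_scores, margin=5.0):
    """Deviation loss for one example.

    score: anomaly score produced by the model (the model is not shown).
    label: 0 for a normal example, 1 for a labeled anomaly.
    ref_scores: reference scores sampled from the prior.
    margin: how many standard deviations above the reference mean an
        anomaly score should sit (5.0 is an assumed value).
    """
    mu = statistics.mean(ref_scores)
    sigma = statistics.stdev(ref_scores)
    dev = (score - mu) / sigma
    if label == 0:
        return abs(dev)            # normal: stay close to the reference mean
    return max(0.0, margin - dev)  # anomaly: sit at least `margin` sigmas above

# Reference scores drawn from an assumed standard normal prior.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(deviation_loss(0.1, 0, reference))  # small: a normal example near the mean
print(deviation_loss(0.1, 1, reference))  # large: an anomaly scored like a normal
```

Because the loss is defined directly on the scalar score, a handful of labeled anomalies is enough to shape the scoring function, which matches the few-shot setting the abstract targets.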
MOC: Multi-modal Sentiment Analysis via Optimal Transport and Contrastive Interactions

Multi-modal sentiment analysis (MSA) aims to utilize information from various modalities to improve the classification of emotions. Most existing studies employ attention mechanisms for modality fusion, overlooking the heterogeneity of different modalities. To address this issue, we propose an approach that leverages optimal transport for modality alignment and fusion, specifically focusing on distributional alignment. However, solely relying on the optimal transport module may result in a deficiency of intra-modal and inter-sample interactions. To tackle this deficiency, we introduce a double-modal contrastive learning module. Specifically, we propose a model MOC (Multi-modal sentiment analysis via Optimal transport and Contrastive interactions), which integrates optimal transport and contrastive learning. Through empirical comparisons on three established multi-modal sentiment analysis datasets, we demonstrate that our approach achieves state-of-the-art performance. Additionally, we conduct extended ablation studies to validate the effectiveness of each proposed module.

Yi Li, Qingmeng Zhu, Hao He, Ziyin Gu, Changwen Zheng
Two-Phase Semantic Retrieval for Explainable Multi-Hop Question Answering

Explainable Multi-Hop Question Answering (MHQA) requires an ability to reason explicitly across facts to arrive at the answer. The majority of multi-hop reasoning methods either rely on semantic similarity to obtain the next hops or perform entity-centric inference. However, approaches that ignore the rationales required for problems can easily lead to blind reasoning. In this paper, we propose a two-Phase text Retrieval method with an entity Mask mechanism (PRM), which focuses on the rationale from global semantics along with entity consideration. Specifically, it consists of two components: 1) The rationale-aware retriever is pre-trained via a dual encoder framework with an entity mask mechanism. The learned representations of hypotheses and facts are used to obtain the top K candidate core facts by sentence-level dense retrieval. 2) The entity-aware validator determines the reachability of hypotheses and core facts with an entity-granularity sparse matrix. Our experiments on three public datasets in the scientific domain (i.e., OpenbookQA, Worldtree, and ARC-Challenge) demonstrate that the proposed model outperforms existing methods.

Qin Wang, Jianzhou Feng, Ganlin Xu, Lei Huang
Efficient Spiking Neural Architecture Search with Mixed Neuron Models and Variable Thresholds

Spiking Neural Networks (SNNs) are emerging as energy-efficient alternatives to artificial neural networks (ANNs) due to their event-driven computation and effective processing of temporal information. While Neural Architecture Search (NAS) has been extensively used to optimize neural network structures, its application to SNNs remains limited. Existing studies often overlook the temporal differences in information propagation between ANNs and SNNs. Instead, they focus on shared structures such as convolutional, recurrent, or pooling modules. This work introduces a novel neural architecture search framework, MixedSNN, explicitly designed for SNNs. Inspired by the human brain, MixedSNN incorporates a novel search space called SSP, which explores the impact of utilizing Mixed spiking neurons and Variable thresholds on SNN performance. Additionally, we propose a training-free evaluation strategy called Period-Based Spike Evaluation (PBSE), which leverages spike activation patterns to incorporate temporal features in SNNs. The performance of SNN architectures obtained through MixedSNN is evaluated on three datasets: CIFAR-10, CIFAR-100, and CIFAR-10-DVS. Results demonstrate that MixedSNN can achieve state-of-the-art performance with significantly fewer timesteps.

Zaipeng Xie, Ziang Liu, Peng Chen, Jianan Zhang
Towards Scalable Feature Selection: An Evolutionary Multitask Algorithm Assisted by Transfer Learning Based Co-surrogate

When faced with large-instance datasets, existing feature selection methods based on evolutionary algorithms still suffer from high computational cost. To address this issue, this paper proposes a scalable evolutionary algorithm for feature selection on large-instance datasets, namely, the transfer learning based co-surrogate assisted evolutionary multitask algorithm (cosEMT). First, we tackle feature selection on large-instance datasets via an evolutionary multitasking framework. The co-surrogate models are constructed to measure the similarity between each auxiliary task and the main task, and knowledge transfer between tasks is realized through instance-based transfer learning. Based on the numerical relationship between the relative and absolute number of transferable instances, we propose a novel dynamic resource allocation strategy to make more efficient use of limited computational resources and accelerate evolutionary convergence. Meanwhile, an adaptive surrogate model update mechanism is proposed to balance the exploration and exploitation of the base optimizer embedded in the cosEMT framework. Finally, the proposed algorithm is compared with several state-of-the-art feature selection algorithms on twelve large-instance datasets. The experimental results show that the cosEMT framework significantly accelerates convergence and obtains high-quality solutions. These results verify that cosEMT is a highly competitive method for feature selection on large-instance datasets.

Liangjiang Lin, Zefeng Chen, Yuren Zhou
CAS-NN: A Robust Cascade Neural Network Without Compromising Clean Accuracy

Adversarial training has emerged as a prominent approach for training robust classifiers. However, recent research indicates that adversarial training inevitably results in a decline in a classifier’s accuracy on clean (natural) data. Robustness is at odds with clean accuracy due to the inherent tension between the objectives of adversarial robustness and standard generalization. Training a single classifier that combines high adversarial robustness and high clean accuracy appears to be an insurmountable challenge. This paper proposes a straightforward strategy to bridge the gap between robustness and clean accuracy. Inspired by the idea underlying dynamic neural networks, i.e., adaptive inference, we propose a robust cascade framework that integrates a standard classifier and a robust classifier. The cascade neural network dynamically classifies clean and adversarial samples using distinct classifiers based on the confidence score of each input sample. As deep neural networks suffer from serious overconfidence on adversarial samples, we propose an effective confidence calibration algorithm for the standard classifier, enabling accurate confidence scores for adversarial samples. The experiments demonstrate that the proposed cascade neural network increases the clean accuracies by 10.1%, 14.67%, and 9.11% compared to the advanced adversarial training (HAT) on CIFAR10, CIFAR100, and Tiny ImageNet, respectively, while keeping similar robust accuracies.

Zhuohuang Chen, Zhimin He, Yan Zhou, Patrick P. K. Chan, Fei Zhang, Haozhen Situ
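
The CAS-NN abstract above routes each input to a standard or a robust classifier based on a confidence score. The sketch below illustrates that routing idea only, under the assumption that high confidence from the (calibrated) standard classifier means the input is treated as clean. The threshold value, the function names, and the toy models are hypothetical, and the paper's confidence calibration algorithm is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cascade_predict(x, standard_model, robust_model, threshold=0.9):
    """Return (predicted class, which classifier answered).

    Assumed routing rule: trust the standard classifier when its top
    softmax probability reaches `threshold`, otherwise fall back to the
    robust classifier.
    """
    probs = softmax(standard_model(x))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "standard"
    robust_probs = softmax(robust_model(x))
    return robust_probs.index(max(robust_probs)), "robust"

# Toy stand-ins for the two networks: each maps an input to a list of logits.
def toy_standard(x):
    return [5.0, 0.0, 0.0] if x == "clean" else [0.4, 0.3, 0.3]

def toy_robust(x):
    return [0.0, 2.0, 0.0]

print(cascade_predict("clean", toy_standard, toy_robust))      # → (0, 'standard')
print(cascade_predict("perturbed", toy_standard, toy_robust))  # → (1, 'robust')
```

The routing only helps if the confidence score is trustworthy on adversarial inputs, which is why the abstract pairs the cascade with a calibration step for the standard classifier.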
Multi-scale Information Fusion Combined with Residual Attention for Text Detection

Driven by deep learning and neural networks, text detection technology has made further developments. Due to the complexity and diversity of scene text, detecting text of arbitrary shapes has become a challenging task. Previous segmentation-based text detection methods can hardly solve the problem of missed detections in complex scene text detection. In this paper, we propose a text detection model that combines residual attention with a multi-scale information fusion structure to effectively capture text information in natural scenes and avoid text omission. Specifically, the multi-scale information fusion structure extracts text features from different levels to achieve better text localisation and facilitate the fusion of text information. At the same time, residual attention is combined with features from high-resolution images to enhance the contextual information of the text and avoid text omission. Finally, text instances are obtained by a binarisation method. Experiments conducted on three public benchmark datasets show that the model achieves state-of-the-art performance, making it well suited to text detection in complex scenes.

Wenxiu Zhao, Changlei Dongye
Encrypted-SNN: A Privacy-Preserving Method for Converting Artificial Neural Networks to Spiking Neural Networks

The transformation from Artificial Neural Networks (ANNs) to Spiking Neural Networks (SNNs) presents a formidable challenge, particularly in terms of preserving privacy to safeguard sensitive data during the conversion process. In response to these privacy concerns, a novel Encrypted-SNN approach is proposed for the ANN-SNN conversion. By incorporating noise into the gradients of both ANNs and SNNs, privacy protection can be enhanced without compromising network performance. The proposed method is tested using popular datasets including CIFAR10, MNIST, and Fashion MNIST, achieving accuracies of 88.1%, 99.3%, and 93.0%, respectively. The influence of three distinct privacy budgets (ϵ = 0.5, 1.0, and 1.6) on the accuracy of the model is also discussed. Experimental results demonstrate that the Encrypted-SNN approach effectively optimizes the balance between privacy and performance. This has practical implications for data privacy protection and contributes to the enhancement of security and privacy within SNNs.

Xiwen Luo, Qiang Fu, Sheng Qin, Kaiyang Wang
PoShapley-BCFL: A Fair and Robust Decentralized Federated Learning Based on Blockchain and the Proof of Shapley-Value

Recently, blockchain-based Federated learning (BCFL) has emerged as a promising technology for promoting data sharing in the Internet of Things (IoT) without relying on a central authority, while ensuring data privacy, security, and traceability. However, it remains challenging to design a decentralized and appropriate incentive scheme that provides a fair and efficient contribution evaluation for participants while defending against low-quality data attacks. Although Shapley-Value (SV) methods have been widely adopted in FL due to their ability to quantify individuals’ contributions, they rely on a central server for calculation and incur high computational costs, making them impractical for decentralized and large-scale BCFL scenarios. In this paper, we design and evaluate PoShapley-BCFL, a new blockchain-based FL approach that accommodates both contribution evaluation and defense against inferior data attacks. Specifically, we propose PoShapley, a Shapley-value-enabled blockchain consensus protocol tailored to support a fair and efficient contribution assessment in PoShapley-BCFL. It mimics the Proof-of-Work mechanism and allows all participants to compute contributions in parallel based on an improved lightweight SV approach. Building on the PoShapley protocol, we further design a fair-robust aggregation rule to improve the robustness of PoShapley-BCFL when facing inferior data attacks. Extensive experimental results validate the accuracy and efficiency of PoShapley in terms of distance and time cost, and also demonstrate the robustness of our designed PoShapley-BCFL.

Ziwen Cheng, Yi Liu, Chao Wu, Yongqi Pan, Liushun Zhao, Cheng Zhu
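
The PoShapley-BCFL abstract above builds on the Shapley value as a measure of each participant's contribution. For reference, the sketch below computes the exact Shapley value by averaging marginal contributions over all join orders. This is the textbook definition, whose cost grows factorially with the number of participants, and not the paper's lightweight approximation. The participant names and the utility table (standing in for model accuracy per coalition) are hypothetical.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible order in which players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}

# Hypothetical utility: accuracy of a model trained on each coalition's data.
utility = {
    frozenset(): 0.0,
    frozenset("A"): 0.5, frozenset("B"): 0.5, frozenset("C"): 0.1,
    frozenset("AB"): 0.8, frozenset("AC"): 0.55, frozenset("BC"): 0.55,
    frozenset("ABC"): 0.82,
}

sv = shapley_values(["A", "B", "C"], lambda s: utility[s])
print(sv)
```

The values always sum to the utility of the full coalition, and a participant with low-quality data (here `C`) receives a small share, which is the property an incentive scheme or a robust aggregation rule can exploit.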
Small-World Echo State Networks for Nonlinear Time-Series Prediction

Echo state network (ESN) is a reservoir computing approach for efficiently training recurrent neural networks. However, it sometimes suffers from poor performance and robustness due to the non-trainable reservoir. This paper proposes a novel computational framework for ESNs to improve prediction performance and robustness. A small-world network is applied as the reservoir topology, a biologically plausible unsupervised learning method named dual-threshold Bienenstock-Cooper-Munro learning rule is applied to adjust reservoir weights adaptively, and a recursive least-squares-based composite learning algorithm is introduced to update readout weights. The proposed method is compared with several kinds of ESNs on the Mackey-Glass system, a benchmark problem of nonlinear time-series prediction. Simulation results have shown that the proposed method not only achieves the best prediction performance but also exhibits remarkable robustness against noise.

Shu Mo, Kai Hu, Weibing Li, Yongping Pan
Preserving Potential Neighbors for Low-Degree Nodes via Reweighting in Link Prediction

Link prediction is an important task for graph data. Methods based on graph neural networks achieve high accuracy by simultaneously modeling the node attributes and structure of the observed graph. However, these methods often perform worse on low-degree nodes. Through theoretical analysis, we find that current link prediction methods focus more on negative samples for low-degree nodes, which makes it hard to find potential neighbors for these nodes during inference. To improve the performance on low-degree nodes, we first design a node-wise score to quantify how seriously the training is biased toward negative samples. Based on the score, we develop a reweighting method called harmonic weighting (HAW) to help the model preserve potential neighbors for low-degree nodes. Experimental results show that the model combined with HAW can achieve better performance on most datasets. A detailed analysis of the performance on nodes with different degrees shows that HAW can preserve more potential neighbors for low-degree nodes without reducing the performance on other nodes.

Ziwei Li, Yucan Zhou, Haihui Fan, Xiaoyan Gu, Bo Li, Dan Meng
6D Object Pose Estimation with Attention Aware Bi-gated Fusion

Accurate object pose estimation is a prerequisite for successful robotic grasping tasks. Currently, keypoint-based pose estimation methods using RGB-D data have shown promising results in simple environments. However, how to fuse the complementary features from RGB-D data is still a challenging task. To this end, this paper proposes a two-branch network with an attention aware bi-gated fusion (A2BF) module for keypoint-based 6D object pose estimation, named A2BNet. The A2BF module consists of two key components, a bidirectional gated fusion module and an attention mechanism module, which effectively extract information from both RGB and point cloud data, prioritizing crucial details while disregarding irrelevant information. Several A2BF modules can be embedded in the network to generate complementary texture and geometric information. Extensive experiments are conducted on the public LineMOD and Occlusion LineMOD datasets. Experimental results demonstrate that the proposed method reaches average accuracies of 99.8% and 67.6% on the two datasets, respectively, outperforming the state-of-the-art methods.

Laichao Wang, Weiding Lu, Yuan Tian, Yong Guan, Zhenzhou Shao, Zhiping Shi