2024 | Book

Neural Information Processing

30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part I

Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 14447 through 14452 constitutes the refereed proceedings of the 30th International Conference on Neural Information Processing, ICONIP 2023, held in Changsha, China, in November 2023.
The 652 papers presented in the proceedings set were carefully reviewed and selected from 1274 submissions. They focus on theory and algorithms; cognitive neurosciences; human-centred computing; and applications in neuroscience, neural networks, deep learning, and related fields.

Table of Contents


Theory and Algorithms

Single Feedback Based Kernel Generalized Maximum Correntropy Adaptive Filtering Algorithm

This paper presents a novel single feedback based kernel generalized maximum correntropy (SF-KGMC) algorithm by introducing a single delay into the framework of kernel adaptive filtering. In SF-KGMC, the historical information implicit in the single delayed output can enhance the convergence rate. Compared to the second-order statistics criterion, the generalized maximum correntropy (GMC) criterion shows better robustness against outliers. Therefore, SF-KGMC can efficiently reduce the influence of impulsive noise and avoid significant performance degradation. In addition, a theoretical convergence analysis of SF-KGMC is conducted. Simulation results on chaotic time-series prediction and real-world data applications validate that SF-KGMC achieves better filtering accuracy and a faster convergence rate.

Jiaming Liu, Ji Zhao, Qiang Li, Lingli Tang, Hongbin Zhang
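
The robustness claim in the abstract above can be illustrated with a small sketch. It assumes the commonly used generalized Gaussian density as the correntropy kernel; the function names, the parameter values `alpha` and `beta`, and the sample errors are illustrative assumptions, not taken from the paper, and the sketch does not implement the kernel adaptive filter itself.

```python
import math

def generalized_gaussian_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density at error e (alpha: shape, beta: scale)."""
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(e / beta) ** alpha)

def gmc_cost(errors, alpha=2.0, beta=1.0):
    """Sample mean of kernel values; a GMC-style criterion maximizes this."""
    return sum(generalized_gaussian_kernel(e, alpha, beta) for e in errors) / len(errors)

def mse_cost(errors):
    """Second-order criterion for comparison; minimized rather than maximized."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical error samples: four small errors, then the same with one impulsive outlier.
clean = [0.1, -0.2, 0.05, 0.15]
with_outlier = clean + [50.0]

mse_ratio = mse_cost(with_outlier) / mse_cost(clean)
gmc_ratio = gmc_cost(with_outlier) / gmc_cost(clean)
```

In this toy setting the outlier inflates the mean squared error by several orders of magnitude, while the correntropy-style cost changes only by the factor that comes from averaging over one more sample, because the kernel value of a very large error is essentially zero.
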
Application of Deep Learning Methods in the Diagnosis of Coronary Heart Disease Based on Electronic Health Record

With the development of Internet technology, the volume of electronic health record data has surged. Meanwhile, artificial intelligence simulates the human ability to solve problems and make decisions by performing complex, fast calculations on large amounts of data. Based on the electronic health records of hypertension patients in Peking University Shenzhen Hospital, we propose the use of deep learning methods to achieve intelligent coronary heart disease diagnosis for hypertensive patients. We first conducted statistical data analysis and effective feature selection experiments. Then, we established an intelligent diagnosis model for coronary heart disease based on the Transformer and contrastive learning. The model integrates multiple types of health record data, such as patients’ personal information, symptoms, concurrent diseases and test data. The results show that our model achieved the best classification performance and accuracy compared with CNN, RNN and LSTM, with an AUC of 0.9349. In the future, this model can be extended to the diagnosis of general chronic diseases.

Hanyang Meng, Xingjun Wang
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-agent General-Sum Games

In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents’ learning process, which requires overly strong assumptions. In this paper, we observe that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. To overcome this, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distribution over an agent’s return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents’ types and dynamically alter corresponding strategies during execution. Extensive experiments show that ARSP agents can achieve stable coordination during training and adapt to non-cooperative opponents during execution, outperforming a set of baselines by a large margin.

Ziyi Liu, Yongchun Fang
Traffic Data Recovery and Outlier Detection Based on Non-negative Matrix Factorization and Truncated-Quadratic Loss Function

Intelligent Transportation Systems (ITS) play a critical role in managing traffic flow and ensuring safe transportation. However, the presence of missing and corrupted traffic data may undermine the accuracy and reliability of the system. The problem of recovering traffic data can often be transformed into a low-rank matrix factorization problem by exploiting the intrinsic low-rank characteristics of the traffic matrix. While many existing methods demonstrate excellent recovery performance under the assumption of noiseless data or Gaussian noise, they often exhibit suboptimal performance in the presence of outliers. In this paper, we propose a novel method for recovering traffic data using non-negative matrix factorization with a truncated-quadratic loss function. Although the objective function in our model is non-convex and non-smooth, we convert it to a convex formulation using half-quadratic theory. Then, a solver based on block coordinate descent is developed. Our experiments on real-world traffic datasets demonstrate superior performance compared to state-of-the-art methods.

Linfang Yu, Hao Wang, Yuxin He, Yang Wen
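
To make the idea of a truncated-quadratic loss concrete, here is a minimal sketch. It assumes the common form min(r^2, tau^2) for a residual r and threshold tau; the exact scaling used in the paper, the function names, and the sample residuals are assumptions for illustration only, and no matrix factorization is performed here.

```python
def truncated_quadratic(residual, tau):
    """Quadratic penalty for small residuals, capped at tau**2 for large ones."""
    return min(residual * residual, tau * tau)

def flag_outliers(residuals, tau):
    """Mark entries whose residual exceeds the truncation threshold."""
    return [abs(r) > tau for r in residuals]

# Hypothetical residuals between observed and reconstructed traffic values.
residuals = [0.5, -1.0, 10.0, 0.2]
tau = 2.0

losses = [truncated_quadratic(r, tau) for r in residuals]
flags = flag_outliers(residuals, tau)
```

Because the penalty stops growing once a residual passes the threshold, a single corrupted entry cannot dominate the objective, and the entries that hit the cap are natural outlier candidates.
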
ADEQ: Adaptive Diversity Enhancement for Zero-Shot Quantization

Zero-shot quantization (ZSQ) is an effective way to compress neural networks, especially when real training sets are inaccessible because of privacy and security issues. Most existing synthetic-data-driven zero-shot quantization methods introduce diversity enhancement to simulate the distribution of real samples. However, the adaptivity between the enhancement degree and the network is neglected, i.e., whether the enhancement degree benefits different network layers and different classes, and whether it reaches the best match between the inter-class distance and intra-class diversity. Due to the absence of a metric for class-wise and layer-wise diversity, a maladaptive enhancement degree runs the risk of mode collapse or inter-class inseparability. To address this issue, we propose a novel zero-shot quantization method, ADEQ. For layer-wise and class-wise adaptivity, the enhancement degree of different layers is adaptively initialized with a diversity coefficient. For inter-class adaptivity, an incremental diversity enhancement strategy is proposed to achieve the trade-off between inter-class distance and intra-class diversity. Extensive experiments on CIFAR-100 and ImageNet show that ADEQ achieves strong performance at low bit-width quantization. For example, when ResNet-18 is quantized to 3 bits, we improve top-1 accuracy by 17.78% on ImageNet compared to the advanced ARC. Code at .

Xinrui Chen, Renao Yan, Junru Cheng, Yizhi Wang, Yuqiu Fu, Yi Chen, Tian Guan, Yonghong He
ASTPSI: Allocating Spare Time and Planning Speed Interval for Intelligent Train Control of Sparse Reward

When deep reinforcement learning (DRL) is used to solve train operation control in urban railways, agents encounter complex and dynamic environments with sparse rewards. Therefore, it is crucial to alleviate the negative impact of sparse rewards on finding the optimal trajectory. This paper introduces a novel algorithm called Allocating Spare Time and Planning Speed Intervals (ASTPSI), which can dramatically reduce the blind exploration of intelligent train agents under sparse rewards when using DRL and significantly improve their learning efficiency and operation quality. ASTPSI can generate real-time train trajectories that meet the requirements by combining different DRL algorithms. To evaluate the algorithm’s performance, we verified the convergence rate of ASTPSI-DRL in optimizing train trajectories under sparse rewards on a real track. ASTPSI-DRL has better performance and stability than genetic algorithms and the original DRL algorithms in terms of train energy consumption, punctuality, and stopping accuracy.

Haotong Zhang, Gang Xian
Amortized Variational Inference via Nosé-Hoover Thermostat Hamiltonian Monte Carlo

Sampling latents from the posterior distribution efficiently and accurately is a fundamental problem for posterior inference. Markov chain Monte Carlo (MCMC) is a useful tool for this, but it is computationally expensive because it needs many transition steps to converge to the stationary distribution for each datapoint. Amortized variational inference within the framework of MCMC is thus proposed, where the learned parameters of the model are shared by all observations. The Langevin autoencoder is a newly proposed method that amortizes inference in parameter space. This paper generalizes the Langevin autoencoder by utilizing the stochastic gradient Nosé-Hoover Thermostat Hamiltonian Monte Carlo to conduct amortized updating of the parameters of the inference distribution. The proposed method improves variational inference accuracy for the latent by subtly dealing with the noise introduced by the stochastic gradient without estimating that noise explicitly. Experiments benchmarking our method against baseline generative methods highlight the effectiveness of our proposed method.

Zhan Yuan, Chao Xu, Zhiwen Lin, Zhenjie Zhang
AM-RRT*: An Automatic Robot Motion Planning Algorithm Based on RRT

Motion planning is a very important part of robot technology, where the quality of planning directly affects the energy consumption and safety of robots. To address the shortcomings of traditional RRT methods, such as long, unsmooth paths and a lack of coupling with the robot control system, an automatic robot motion planning method based on the Rapidly-exploring Random Tree, called AM-RRT* (automatic motion planning based on RRT*), is proposed. First, the RRT algorithm was improved by adding attractive potential fields at the target points of the environment, making it more directional during the sampling process. Then, a path optimization method based on a dynamic model and a cubic B-spline curve was designed to couple the planned path with the robot controller. Finally, an RRT speed planning algorithm was added to the planned path to avoid dynamic obstacles in real time. To verify the feasibility of AM-RRT*, a detailed comparison was made between AM-RRT* and the traditional RRT series algorithms. The results showed that AM-RRT* overcame the shortcomings of RRT and made it more suitable for robot motion planning in a dynamic environment. AM-RRT* can provide a new idea for robots to replace human labor in complex environments such as underwater, nuclear power, and mines.

Peng Chi, Zhenmin Wang, Haipeng Liao, Ting Li, Jiyu Tian, Xiangmiao Wu, Qin Zhang
MS3DAAM: Multi-scale 3-D Analytic Attention Module for Convolutional Neural Networks

In this paper, we propose a compact and effective module, called multi-scale 3-D analytic attention module (MS3DAAM), to address this challenge. We significantly reduce model complexity by developing a decoupling-and-coupling strategy. First, we factorize the regular attention along the channel, height and width directions and then efficiently encode the information via 1-D convolutions, which greatly reduces computation. Second, we multiply the weighted embedding results of the three direction vectors to regain a better 3-D attention map, which allocates an independent weight to each neuron, thus developing a unified measurement method for attention. Furthermore, a multi-scale method is introduced to further strengthen our module’s localization capability by capturing both the inter-channel relationships and long-range spatial interactions from different receptive fields. Finally, we develop a structural re-parameterization technique for multi-scale 1-D convolutions to boost the inference speed. Extensive experiments in classification and object detection verify the superiority of our proposed method over other state-of-the-art counterparts. This factorizing-and-combining mechanism with the beauty of brevity can be further extended to simplify similar network structures.

Yincong Wang, Shoubiao Tan, Chunyu Peng
Nonlinear Multiple-Delay Feedback Based Kernel Least Mean Square Algorithm

In this paper, a novel algorithm called nonlinear multiple-delay feedback kernel least mean square (NMDF-KLMS) is proposed by introducing a nonlinear multiple-delay into the framework of multikernel adaptive filtering. The proposed algorithm incorporates the nonlinear multiple-delay to enhance the filtering performance in comparison with the kernel adaptive filtering algorithm using linear feedback. Furthermore, a theoretical mean-square convergence analysis of NMDF-KLMS is conducted. Simulation results on chaotic time-series prediction and real-world data applications show that NMDF-KLMS achieves a faster convergence rate and superior filtering accuracy.

Ji Zhao, Jiaming Liu, Qiang Li, Lingli Tang, Hongbin Zhang
AGGDN: A Continuous Stochastic Predictive Model for Monitoring Sporadic Time Series on Graphs

Monitoring data of real-world networked systems can be sparse and irregular due to node failures or packet loss, which makes it a challenge to model the continuous dynamics of system states. Representing a network as a graph, we propose a deep learning model, Adversarial Graph-Gated Differential Network (AGGDN). To accurately capture the spatial-temporal interactions and extract hidden features from data, AGGDN introduces a novel module, dynDC-ODE, which empowers an Ordinary Differential Equation (ODE) with learning-based Diffusion Convolution (DC) to effectively infer relations among nodes and parameterize continuous-time system dynamics over the graph. It further incorporates a Stochastic Differential Equation (SDE) module and applies it over the graph to efficiently capture the underlying uncertainty of the networked systems. Unlike any single differential equation model, the ODE part also works as a control signal to modulate the SDE propagation. With the recurrent running of the two modules, AGGDN can serve as an accurate online predictive model that is effective for either monitoring or analyzing real-world networked objects. In addition, we introduce a soft masking scheme to capture the effects of partial observations caused by the random missing of data from nodes. As training a model with an SDE component can be challenging, Wasserstein adversarial training is exploited to fit the complicated distribution. Extensive results demonstrate that AGGDN significantly outperforms existing methods for online prediction.

Yucheng Xing, Jacqueline Wu, Yingru Liu, Xuewen Yang, Xin Wang
Attribution Guided Layerwise Knowledge Amalgamation from Graph Neural Networks

Knowledge Amalgamation (KA), which aims to transfer knowledge from multiple well-trained teacher networks to a multi-talented and compact student, is gaining attention due to its crucial role in resource-constrained scenarios. Previous literature on KA, although exhibiting promising results, is primarily geared toward Convolutional Neural Networks (CNNs). However, when transferred to Graph Neural Networks (GNNs) with non-grid data, KA techniques face new challenges that can be difficult to overcome. Moreover, the layerwise aggregation of GNNs produces significant noise as they progress from a shallow to a deep level, which can impede KA students’ deep-level semantic comprehension. This work aims to overcome this limitation and proposes a novel strategy termed LAyerwIse Knowledge Amalgamation (LaiKA). It involves Hierarchical Feature Alignment between the teachers and the student, which enables the student to directly master the feature aggregation rules from teacher GNNs. Meanwhile, we propose a Selective Attribution Transfer (SAT) module that identifies task-relevant topological substructures to assist the capacity-limited student in mitigating noise and enhancing performance. Extensive experiments conducted on six datasets demonstrate that our proposed method equips a single student GNN to handle tasks from multiple teachers effectively and achieve comparable or superior results to those of the teachers without human annotations.

Yunzhi Hao, Yu Wang, Shunyu Liu, Tongya Zheng, Xingen Wang, Xinyu Wang, Mingli Song, Wenqi Huang, Chun Chen
Distributed Neurodynamic Approach for Optimal Allocation with Separable Resource Losses

To solve the optimal allocation problem with separable resource losses, this paper proposes a neurodynamic approach based on a multi-agent system. By using the KKT conditions, the nonlinear coupling equality constraint in the original problem is equivalently transformed into a convex coupling inequality constraint. Then, with the help of finite-time tracking technology and a fixed-time projection method, a neurodynamic approach is designed and its convergence is rigorously proved. Finally, simulation results verify the effectiveness of the proposed neurodynamic approach.

Linhua Luan, Yining Liu, Sitian Qin, Jiqiang Feng
Multimodal Isotropic Neural Architecture with Patch Embedding

Patch embedding has been a significant advancement in Transformer-based models, particularly the Vision Transformer (ViT), as it enables handling larger image sizes and mitigating the quadratic runtime of self-attention layers in Transformers. Moreover, it allows for capturing global dependencies and relationships between patches, enhancing effective image understanding and analysis. However, it is important to acknowledge that Convolutional Neural Networks (CNNs) continue to excel in scenarios with limited data availability. Their efficiency in terms of memory usage and latency makes them particularly suitable for deployment on edge devices. Expanding upon this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that applies patch embedding to both time series and image data for classification purposes. By employing isotropic models, Minape addresses the challenges posed by varying data sizes and complexities. It groups samples based on modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs of these pathways are merged and fed to a temporal classifier. Experimental results demonstrate that Minape significantly outperforms existing approaches in terms of accuracy while requiring fewer than 1M parameters and occupying less than 12 MB in size. This performance was observed on multimodal benchmark datasets and the authors’ newly collected multi-dimensional multimodal dataset, Mudestreda, obtained from real industrial processing devices (link to code and dataset: ).

Hubert Truchan, Evgenii Naumov, Rezaul Abedin, Gregory Palmer, Zahra Ahmadi
Determination of Local and Global Decision Weights Based on Fuzzy Modeling

An essential challenge in multi-criteria decision analysis (MCDA) is the determination of criteria weights. These weights map the decision maker’s preferences for decision problems in determining the importance of criteria. However, these values are not necessarily constant in the whole domain. Although many approaches are related to their determination, some MCDA models can have local weights that are difficult to map in global spaces. This paper focuses on an approach in which we determine global and local weights from the Characteristic Objects METhod (COMET) by using linear regression. Moreover, obtained linear models are compared with COMET and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) models to answer how similar they are. Then, the relationships between the obtained global and local weights are analyzed based on a simple case study. The results demonstrate the high sensitivity of the COMET method and the applicability of the proposed approach for determining global and local weights. The most useful contribution is the proposed approach to identify local weights that can be used for deeper decision analysis.

Bartłomiej Kizielewicz, Jakub Więckowski, Bartosz Paradowski, Andrii Shekhovtsov, Wojciech Sałabun
Binary Mother Tree Optimization Algorithm for 0/1 Knapsack Problem

The knapsack problem is a well-known NP-hard problem in which the total profit of the items placed in the knapsack is maximized under a weight capacity constraint. In this paper, a novel Binary Mother Tree Optimization Algorithm (BMTO) and Knapsack Problem Framework (KPF) are proposed to find an efficient solution for the 0/1 knapsack problem in a short time. The proposed BMTO method is built on the original MTO and a binary module to solve an optimization problem in a discrete space. The binary module converts a set of real numbers, equal in size to the dimension of the knapsack problem, to a binary number using a threshold and the sigmoid function. The KPF makes the implementation of a metaheuristic algorithm to solve the knapsack problem much simpler. To assess the performance of the proposed solutions, extensive experiments are conducted, and several statistical analyses are performed on the solutions obtained for two sets of knapsack instances (small and large scale). The results demonstrate that BMTO can produce an efficient solution for knapsack instances of different sizes in a short time, and it outperforms two other algorithms, Binary Particle Swarm Optimization (BPSO) and Binary Bacterial Foraging (BBF), in terms of best solution and time. In addition, the results of BPSO and BBF show the effectiveness of KPF compared to the results in the literature.

Wael Korani
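
The binary module described in the abstract above (real-valued vector, sigmoid, threshold) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the fixed threshold of 0.5, the zero-value penalty for overweight selections, and all function names and instance data are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binarize(position, threshold=0.5):
    """Map each real coordinate to 1 if its sigmoid exceeds the threshold, else 0."""
    return [1 if sigmoid(x) > threshold else 0 for x in position]

def knapsack_value(bits, profits, weights, capacity):
    """Total profit of the selected items, or 0 if the capacity is exceeded
    (a simple penalty chosen for this sketch)."""
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0
    return sum(p for b, p in zip(bits, profits) if b)

# Hypothetical 4-item instance and a candidate real-valued solution.
profits = [10, 5, 7, 3]
weights = [4, 3, 2, 5]
position = [2.0, -1.0, 0.3, -3.0]

bits = binarize(position)
value = knapsack_value(bits, profits, weights, capacity=6)
```

A continuous metaheuristic can then keep searching in real space while every candidate is scored through `binarize` and `knapsack_value`.
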
Distributed State Estimation for Multi-agent Systems Under Consensus Control

Distributed state estimation and consensus control for linear time-invariant multi-agent systems under a strongly connected directed graph are addressed in this paper. The distributed output tracking algorithm and the local state estimator are designed for each agent to estimate the output and state of the entire multi-agent system, despite having access only to local output measurements that are insufficient to directly reconstruct the entire state. The consensus control protocol is further designed based on each agent’s own estimate of the entire state. Neither distributed state estimation nor consensus control protocol design requires state information from neighboring agents, eliminating the transmission of state estimates during the whole process. The theoretical analysis demonstrates that distributed output tracking and state estimation are realized and that all agents achieve consensus. Finally, numerical simulations are carried out to show the effectiveness of the proposed algorithm.

Yan Li, Jiazhu Huang, Yuezu Lv, Jialing Zhou
Integrated Design of Fully Distributed Adaptive State Estimation and Consensus Control for Multi-agent Systems

In this paper, the problem of fully distributed adaptive state estimation and consensus control for linear multi-agent systems is investigated. By designing fully distributed adaptive output tracking observers, the entire output information is made available to each agent, and local state estimators based on the estimated output are constructed to estimate the overall state of the multi-agent system. The consensus control protocol based on state estimation is designed to ensure that the agents achieve consensus. The proposed control input of each agent relies on its own estimate of the entire state. Theoretical analysis proves the effectiveness of the algorithm, and practical applications are illustrated by simulation.

Jiazhu Huang, Yan Li, Yuezu Lv
Computer Simulations of Applying Zhang Inequation Equivalency and Solver of Neurodynamics to Redundant Manipulators at Acceleration Level

An equation can be transformed into an equivalent equation at a different level, which is termed equation equivalence or even generalized to be equation equivalency. In recent years, Zhang equivalency, more specifically, Zhang equation equivalency, i.e., a new equation equivalency originated from Zhang neurodynamics, has been proposed and investigated. Referring to Zhang equivalency and doing a careful investigation, we similarly find that an inequation can also be transformed into an equivalent inequation at a different level. The novel inequation equivalency named Zhang inequation equivalency (ZIE) is investigated in this paper. Then, ZIE is applied to acceleration-level redundant manipulator motion control. The configuration adjustment and cyclic motion generation of two types of redundant manipulators are investigated and simulated. Comparative experimental results verify the validity of the proposed ZIE. In fact, ZIE can also be applied in different actual projects according to practical requirements.

Ji Lu, Min Yang, Ning Tan, Haifeng Hu, Yunong Zhang
High-Order Control Barrier Function Based Robust Collision Avoidance Formation Tracking of Constrained Multi-agent Systems

In this work, we propose a high-order control barrier function (HOCBF) based safe formation tracking controller for second-order multi-agent systems subject to input uncertainties and both velocity and input constraints (VICs). First, a nominal velocity and input constrained formation tracking controller is proposed, which uses sliding mode control theory to eliminate the effects of the uncertain dynamics. Then, the HOCBF-based collision avoidance conditions are derived for the followers, where collisions both among the agents and between the agents and the obstacles are considered. Finally, the collision avoidance formation tracking controller for the constrained uncertain second-order multi-agent systems is constructed by formulating a local quadratic programming (QP) problem for each follower. It is shown that under proper initial conditions, there always exist feasible control inputs such that collision avoidance can be guaranteed under both VICs of the agents. Simulation examples illustrate the effectiveness of the proposed control strategy.

Dan Liu, Junjie Fu
Decision Support System Based on MLP: Formula One (F1) Grand Prix Study Case

Neural networks are widely used due to the adaptability of models to many problems and their high efficiency. These solutions are also gaining popularity in the design of Decision Support Systems. This leads to increased use of such techniques to support the decision-maker in practical problems. In this paper, we propose an Artificial Neural Network Decision Support System (ANN-DSS) based on a Multilayer Perceptron. The model structure was determined by searching for the optimal hyperparameters with the Tree-structured Parzen Estimator. Based on the qualification results, the proposed system was directed to evaluate the Formula 1 drivers’ best lap time during the race. The obtained rankings were compared with reference rankings using the WS rank similarity. Model performance proves to be highly consistent in ranking predictions, which makes it a reliable tool for the given problem.

Jakub Więckowski, Bartosz Paradowski, Bartłomiej Kizielewicz, Andrii Shekhovtsov, Wojciech Sałabun
Theoretical Analysis of Gradient-Zhang Neural Network for Time-Varying Equations and Improved Method for Linear Equations

Solving time-varying equations is fundamental in science and engineering. This paper aims to find a fast-converging and high-precision method for solving time-varying equations. We combine two classes of feedback neural networks, i.e., gradient neural network (GNN) and Zhang neural network (ZNN), to construct a continuous gradient-Zhang neural network (GZNN) model. Our research shows that GZNN has the advantages of the high convergence precision of ZNN and the fast convergence speed of GNN in certain cases, i.e., when all the eigenvalues of the Jacobian matrix of the time-varying equations multiplied by its transpose are larger than 1. Furthermore, we provide detailed mathematical proofs and theoretical analysis to establish the stability and convergence of the GZNN model. Additionally, we discretize the GZNN model by utilizing time discretization formulas (i.e., Euler and Taylor-Zhang discretization formulas) to construct corresponding discrete GZNN algorithms for solving discrete time-varying problems. Different discretization formulas can construct discrete algorithms with varying precision. As the number of time sampling instants increases, the precision of discrete algorithms can be further improved. Furthermore, we improve the matrix inverse operation in the GZNN model and develop inverse-free GZNN algorithms to solve linear problems, effectively reducing their time complexity. Finally, numerical experiments are conducted to validate the feasibility of the GZNN model and the corresponding discrete algorithms in solving time-varying equations, as well as the efficiency of the inverse-free method in solving linear equations.

Changyuan Wang, Yunong Zhang
EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

Real-time video analytics on edge devices for changing scenes remains a difficult task. As edge devices are usually resource-constrained, edge deep neural networks (DNNs) have fewer weights and shallower architectures than general DNNs. As a result, they only perform well in limited scenarios and are sensitive to data drift. In this paper, we introduce EdgeMA, a practical and efficient video analytics system designed to adapt models to shifts in real-world video streams over time, addressing the data drift problem. EdgeMA extracts the gray level co-occurrence matrix based statistical texture feature and uses the Random Forest classifier to detect the domain shift. Moreover, we have incorporated a method of model adaptation based on importance weighting, specifically designed to update models to cope with the label distribution shift. Through rigorous evaluation of EdgeMA on a real-world dataset, our results illustrate that EdgeMA significantly improves inference accuracy.

Liang Wang, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Kaiyu Hu, Guilin Jiang, Jing Xiao
Mastering Complex Coordination Through Attention-Based Dynamic Graph

The coordination between agents in multi-agent systems has become a popular topic in many fields. To capture the underlying relationships between agents, graph structures have been combined with existing methods, which improves the results. However, in large-scale tasks with numerous agents, an overly complex graph leads to an increase in computational cost and a decline in performance. Here we present DAGMIX, a novel graph-based value factorization method. Instead of a complete graph, DAGMIX generates a dynamic graph at each time step during training, on which it realizes a more interpretable and effective combining process through the attention mechanism. Experiments show that DAGMIX significantly outperforms previous SOTA methods in large-scale scenarios, as well as achieving promising results on other tasks.

Guangchong Zhou, Zhiwei Xu, Zeren Zhang, Guoliang Fan
SORA: Improving Multi-agent Cooperation with a Soft Role Assignment Mechanism

Role-based multi-agent reinforcement learning (MARL) holds the promise of achieving scalable multi-agent cooperation by decomposing complex tasks through the concept of roles and has enjoyed great success in various tasks. However, conventional role-based MARL methods typically assign a single role to each agent, limiting the agent’s behavior in certain scenarios. In real life, an individual usually performs multiple responsibilities in a given task. To meet such situations, we propose a novel soft role assignment (SORA) process that enables an agent to play multiple roles simultaneously. Concretely, SORA first generates a role distribution via the attention mechanism to interpret the agent’s identity as a combination of different roles. To ensure consistent behavior with an agent’s assigned role, we also introduce role-specific Q networks for decision-making. By virtue of these advances, our proposed method makes a prominent improvement over the prior state-of-the-art approaches on StarCraft multi-agent challenges and Google Research Football.

Guangchong Zhou, Zhiwei Xu, Zeren Zhang, Guoliang Fan
Outer Synchronization for Multi-derivative Coupled Complex Networks with and without External Disturbance

This paper investigates the outer synchronization of multi-derivative coupled complex networks (MDCCNs), and further studies the outer $$H_{\infty}$$ synchronization between two MDCCNs with external disturbance. For the outer synchronization, a synchronization criterion is proposed by using an adaptive control strategy, which is proved based on a Lyapunov functional and Barbalat’s lemma. For the outer $$H_{\infty}$$ synchronization, an adaptive state controller and a parameter updating scheme are devised for MDCCNs with external disturbance. Finally, the validity of the presented criteria is demonstrated by two simulation examples.

Han-Yu Wu, Qingshan Liu
A Distributed Projection-Based Algorithm with Local Estimators for Optimal Formation of Multi-robot System

In general, the optimal formation problem can be modeled as a standard constrained optimization problem according to the shape theory. By adding local supplementary estimators, it can be further modeled as a distributed constrained optimization problem. Then a distributed projection-based algorithm is designed for solving this problem. The aim of the algorithm is to drive a group of robots to move to the desired geometric pattern by minimizing the total travel distance of robots from the initial positions. It is worth noticing that, as long as the graph of the communication network among the robots is undirected and connected, the global convergence of the algorithm can be guaranteed. Moreover, all of the robots finally form an ideal formation in the limited space. Finally, simulation results are provided to verify the effectiveness of the proposed distributed algorithm.

Yuanyuan Yue, Qingshan Liu, Ziming Zhang
A Stochastic Gradient-Based Projection Algorithm for Distributed Constrained Optimization

This paper investigates a category of constrained convex optimization problems, where the collective objective function is represented as the sum of all local objective functions subject to local bounds and equality constraints. This kind of problem is important and arises from a variety of applications, such as power control, sensor networks and source localization. To solve this problem more reliably and effectively, we propose a novel distributed stochastic gradient-based projection algorithm in the presence of noisy gradients, where the gradients are corrupted by arbitrary but uniformly bounded noise samples through local gradient observation. The proposed algorithm allows the adoption of a constant step-size, which gives it a faster convergence rate than existing distributed algorithms with diminishing step-sizes. The effectiveness of the proposed algorithm is verified by simulation experiments.
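
To make the ingredients named above concrete, here is a minimal Python sketch of a projected stochastic gradient iteration with a constant step size and uniformly bounded gradient noise. It is a centralized simplification, not the distributed algorithm of the paper: it handles only box (local bound) constraints, omits the equality constraints and the communication between nodes, and uses hypothetical quadratic local objectives.

```python
import random


def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper], applied coordinate-wise."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def noisy_gradient(x, centers, noise_bound, rng):
    """Gradient of sum_i 0.5 * ||x - c_i||^2 plus bounded uniform noise."""
    grad = [sum(xj - c[j] for c in centers) for j, xj in enumerate(x)]
    return [g + rng.uniform(-noise_bound, noise_bound) for g in grad]


def projected_sgd(centers, lower, upper, step=0.1, iters=200,
                  noise_bound=0.05, seed=0):
    """Run projected stochastic gradient descent with a constant step size."""
    rng = random.Random(seed)
    x = list(lower)  # start from a feasible point
    for _ in range(iters):
        g = noisy_gradient(x, centers, noise_bound, rng)
        x = project_box([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x


# Three hypothetical local objectives; the unconstrained minimizer is their mean (1, 4).
centers = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]
solution = projected_sgd(centers, lower=[0.0, 0.0], upper=[3.0, 3.0])
```

With a constant step size the iterate settles into a small neighborhood of the constrained minimizer (1, 3), whose radius is governed by the noise bound; the second coordinate is held at the upper bound by the projection.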

Keke Zhang, Shanfu Gao, Yingjue Chen, Zuqing Zheng, Qingguo Lü
FalconNet: Factorization for the Light-Weight ConvNets

Designing light-weight CNN models with few parameters and FLOPs is a prominent research concern. However, three significant issues persist in current light-weight CNNs: i) the lack of architectural consistency leads to redundancy and hinders capacity comparison, and obscures the causal link between architectural choices and performance gains; ii) the use of a single-branch depth-wise convolution compromises the model's representational capacity; iii) the depth-wise convolutions account for a large proportion of parameters and FLOPs, yet there is no efficient method to make them light-weight. To address these issues, we factorize the four vital components of light-weight CNNs from coarse to fine and redesign them: i) we design a light-weight overall architecture termed LightNet, which obtains better performance by simply implementing the basic blocks of other light-weight CNNs; ii) we abstract a Meta Light Block, which consists of a spatial operator and a channel operator and uniformly describes current basic blocks; iii) we propose RepSO, which constructs multiple spatial operator branches to enhance the representational ability; iv) we introduce the concept of receptive range, guided by which we propose RefCO to sparsely factorize the channel operator. Based on the above four vital components, we propose a novel light-weight CNN model termed FalconNet. Experimental results validate that FalconNet can achieve higher accuracy with fewer parameters and FLOPs than existing light-weight CNNs.

Zhicheng Cai, Qiu Shen
An Interactive Evolutionary Algorithm for Ceramic Formula Design

The ceramic industry is a representative traditional industry in Guangdong Province, where its degree of informatization is low, and the design of ceramic formulas mainly depends on human experience. Intelligently generating ceramic formulas raises two main challenges: the evaluation of a ceramic formula by actual firing is expensive, and the accumulated historical data are limited. To address these challenges, this paper models the ceramic formula design process as an expensive constrained multi-objective optimization problem. Based on the mathematical model, we propose an interactive hybrid metaheuristic evolutionary algorithm, cEDA, to optimize the production cost while satisfying the category constraints, chemical component constraints and material constraints. It consists of three key components (nondominated sorting, materials selection and proportion allocation) that search for qualified ceramic formulas. To incorporate domain expertise, a classification-based interactive optimization method is introduced in cEDA. After two rounds of interaction, the acceptance rate of the formulas generated by the algorithm increased from 18% to 87.5%, which demonstrates the effectiveness of the proposed algorithm.

Wen-Xiang Song, Wei-Neng Chen, Ya-Hui Jia
Using Less but Important Information for Feature Distillation

The purpose of feature distillation is to use the teacher network to supervise the student network so that the student network can mimic the intermediate layer representation of the teacher network. The most intuitive way of performing feature distillation is to use the Mean-Square Error (MSE) to minimize the distance between feature representations at the same level of both networks. However, one problem in feature distillation is that the dimension of the intermediate layer feature maps of the student network may be different from that of the teacher network. Previous work mostly designed a projector to transform feature maps to the same dimension. In this paper, we propose a simple and straightforward feature distillation method that resolves the feature dimension inconsistency between the teacher and the student networks without an additional projector. We consider the redundancy of the data and show that it is not necessary to use all the information when performing feature distillation. In detail, we propose a cut-off operation for channel alignment and use singular value decomposition (SVD) for knowledge alignment, so that only important information is transferred to the student network, which solves the dimension inconsistency problem. Extensive experiments on several different models show that our method can improve the performance of student networks.

Xiang Wen, Yanming Chen, Li Liu, Choonghyun Lee, Yi Zhao, Yong Gong
Efficient Mobile Robot Navigation Based on Federated Learning and Three-Way Decisions

In the context of Industry 5.0, the significance of MRN (Mobile Robot Navigation) cannot be overstated, as it is crucial for facilitating the collaboration between machines and humans. To augment MRN capabilities, emerging technologies such as federated learning (FL) are being utilized. FL enables the consolidation of knowledge from numerous robots located in diverse areas, enabling them to collectively learn and enhance their navigation skills. By integrating FL into MRN systems, Industry 5.0 can effectively utilize collaborative intelligence for efficient and high-quality production processes. When considering information representation in MRN, the adoption of picture fuzzy sets (PFSs), which expand upon the concept of intuitionistic fuzzy sets, offers significant advantages in effectively handling information inconsistencies in practical situations. Specifically, by leveraging the benefits of multi-granularity (MG) probabilistic rough sets (PRSs) and three-way decisions (3WD) within the FL framework, an efficient MRN approach based on FL and 3WD is thoroughly investigated. Initially, the adjustable MG picture fuzzy (PF) PRS model is developed by incorporating MG PRSs into the PF framework. Subsequently, the PF maximum deviation method is utilized to calculate various weights. In order to determine the optimal granularity of MG PF membership degrees, the CODAS (COmbinative Distance-based ASsessment) method is employed, known for its flexibility in handling both quantitative and qualitative attributes while effectively managing incomplete or inconsistent data with transparency and efficiency. After determining the optimal granularity, the MRN method grounded in FL and 3WD is established. Finally, a realistic case study utilizing MRN data from the Kaggle database is performed to validate the feasibility of our method.

Chao Zhang, Haonan Hou, Arun Kumar Sangaiah, Deyu Li, Feng Cao, Baoli Wang
GCM-FL: A Novel Granular Computing Model in Federated Learning for Fault Diagnosis

In the realm of industrial production, maintaining continuous monitoring and implementing precise diagnostics of mine ventilators (MVs) holds a critical role in minimizing faults and accidents. Hence, it becomes imperative to devise an efficient and precise fault diagnosis (FD) technique for MVs. This paper endeavors to address the FD challenge of MVs by integrating federated learning (FL) with granular computing models. FL offers a partial solution to the issues of security and privacy associated with data sharing, which ensures data security while establishing an FD system for MVs. This system harnesses the adjustable multi-granularity (MG) triangular fuzzy (TF) probabilistic rough set (PRS) model to enhance the model’s interpretability. In this study, the TF concept is introduced into the structure of three-way decisions (TWD) to tackle uncertainty and multi-performance attributes. We introduce the notion of an MG TF information system (IS) and propose an adjustable MG TF PRS model. The ELECTRE (Elimination Et Choice Translating Reality) method is employed to determine the optimal threshold. Furthermore, to validate the efficiency of the proposed model, we establish a TF multi-attribute group decision-making (MAGDM) approach using MV data within the MG and TWD frameworks. Finally, we verify the method’s applicability through comparative analysis experiments. The experimental outcomes demonstrate the method’s effectiveness and practicality in diagnosing faults for MVs.

Xueqing Fan, Chao Zhang, Arun Kumar Sangaiah, Yuting Cheng, Anna Wang, Liyin Wang
Adaptive Load Frequency Control and Optimization Based on TD3 Algorithm and Linear Active Disturbance Rejection Control

This paper presents a Linear Active Disturbance Rejection Control (LADRC) approach optimized by the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to tackle the problem of frequency deviation resulting from load disturbance and Renewable Energy Sources (RESs) in interconnected power systems. The LADRC approach employs a Linear Extended State Observer (LESO) to estimate the disturbance information in each area and utilizes a Proportional-Derivative (PD) controller to eliminate the disturbance. Simultaneously, the TD3 algorithm is trained to acquire the adaptive controller parameters. In order to improve the convergence of the TD3 algorithm, a Lyapunov-reward shaping function is adopted. Finally, the proposed method is applied to a two-area interconnected power system, comprising thermal, hydro, and gas power plants in each area, as well as RESs such as a noise-based wind turbine and photovoltaic (PV) system. The simulation results indicate that the proposed method is a highly effective approach for load frequency control.

Yuemin Zheng, Jin Tao, Qinglin Sun, Hao Sun, Mingwei Sun, Zengqiang Chen
Theory-Guided Convolutional Neural Network with an Enhanced Water Flow Optimizer

Theory-guided neural networks have recently been used to solve partial differential equations. This method has received widespread attention due to its low data requirements and adherence to physical laws during the training process. However, the choice of the penalty coefficient used to include physical laws as a penalty term in the loss function strongly affects the performance of the model. In this paper, we propose a comprehensive theory-guided framework using a bilevel programming model that can adaptively adjust the hyperparameters of the loss function to further enhance the performance of the model. An enhanced water flow optimizer (EWFO) algorithm is applied to optimize upper-level variables in the framework. In this algorithm, an opposition-based learning technique is used in the initialization phase to boost the initial group quality; a nonlinear convergence factor is added to the laminar flow operator to increase the diversity of the group and expand the search range. The experiments show the competitive performance of the method in solving stochastic partial differential equations.

Xiaofeng Xue, Xiaoling Gong, Jacek Mańdziuk, Jun Yao, El-Sayed M. El-Alfy, Jian Wang
An End-to-End Dense Connected Heterogeneous Graph Convolutional Neural Network

Graph convolutional networks (GCNs) are powerful models for graph-structured data learning tasks. However, most existing GCNs face two major challenges when dealing with heterogeneous graphs: (1) predefined meta-paths are required to capture the semantic relations between nodes of different types, which may not exploit all the useful information in the graph; (2) performance degradation and semantic confusion may occur as the network depth grows, which limits their ability to capture long-range dependencies. To meet these challenges, we propose Dense-HGCN, an end-to-end densely connected heterogeneous convolutional neural network to learn node representations. Dense-HGCN computes the attention weights between different nodes and incorporates the information of previous layers into each layer’s aggregation process via a specific fuse function. Moreover, Dense-HGCN leverages multi-scale information for node classification or other downstream tasks. Experimental results on real-world datasets demonstrate the superior performance of Dense-HGCN in enhancing the representational power compared with several state-of-the-art methods.

Ranhui Yan, Jia Cai
Actor-Critic with Variable Time Discretization via Sustained Actions

Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.

Jakub Łyskawa, Paweł Wawrzyński
Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis

Tensor decompositions play a crucial role in numerous applications related to multi-way data analysis. By employing a Bayesian framework with sparsity-inducing priors, Bayesian Tensor Ring (BTR) factorization offers probabilistic estimates and an effective approach for automatically adapting the tensor ring rank during the learning process. However, the previous BTR method [10] employs an Automatic Relevance Determination (ARD) prior, which can lead to sub-optimal solutions. Besides, it solely focuses on continuous data, whereas many applications involve discrete data. More importantly, it relies on the Coordinate-Ascent Variational Inference (CAVI) algorithm, which is inadequate for handling large tensors with extensive observations. These limitations greatly restrict its scale and scope of application, making it suitable only for small-scale problems, such as image/video completion. To address these issues, we propose a novel BTR model that incorporates a nonparametric Multiplicative Gamma Process (MGP) prior, known for its superior accuracy in identifying latent structures. To handle discrete data, we introduce the Pólya-Gamma augmentation for closed-form updates. Furthermore, we develop an efficient Gibbs sampler for consistent posterior simulation, which reduces the computational complexity of the previous VI algorithm by two orders, and an online EM algorithm that is scalable to extremely large tensors. To showcase the advantages of our model, we conduct extensive experiments on both simulation data and real-world applications.

Zerui Tao, Toshihisa Tanaka, Qibin Zhao
Predefined-Time Event-Triggered Consensus for Nonlinear Multi-Agent Systems with Uncertain Parameter

In this paper, a novel predefined-time event-triggered control method is proposed, which achieves the consistency of multi-agent systems with uncertain parameters. Firstly, a new predefined-time stability theorem is given, and the correctness and feasibility of this stability theorem are analyzed; its flexible preset time makes it more practical than existing stability theorems. Compared with existing stability theorems, this theorem simplifies the conditions to be satisfied by the Lyapunov function and is easier to implement in practical applications. Secondly, an event-triggered control strategy is designed to reduce control costs. Then, a new sufficient criterion is given to achieve the consistency of multi-agent systems with uncertain parameters based on the predefined-time stability theorem and the event-triggered controller. In addition, the state consensus between nonlinear agents is achieved within a predefined time, and the measurement error of each agent converges to zero within the predefined time. Finally, the validity and feasibility of the given theoretical results are verified by a simulation example.

Yafei Lu, Hui Zhao, Aidi Liu, Mingwen Zheng, Sijie Niu, Xizhan Gao, Xiju Zong
Cascaded Fuzzy PID Control for Quadrotor UAVs Based on RBF Neural Networks

Since quadrotor UAVs often need to fly in complex and changing environments, their control systems suffer from slow response, weak anti-disturbance capability, and poor self-adaptability. Thus, it is crucially important to carefully design a quadrotor UAV control system that can maintain high-precision control and high immunity to disturbance in complex environments. In this paper, an improved nonlinear cascaded fuzzy PID control approach for quadrotor UAVs based on an RBF neural network is proposed. Based on the analysis and establishment of the UAV flight control model, this paper designs a control approach with an outer-loop fuzzy adaptive PID control and an inner-loop RBF neural network. The simulation results show that introducing RBF neural networks into the nonlinear fuzzy adaptive PID control yields higher control precision and stronger anti-disturbance performance under the influence of different environmental variables.

Zicheng Huang, Huiwei Wang, Xin Wang
Generalizing Graph Network Models for the Traveling Salesman Problem with Lin-Kernighan-Helsgaun Heuristics

Existing graph convolutional network (GCN) models for the traveling salesman problem (TSP) cannot generalize well to TSP instances with a larger number of cities than the training samples, and the NP-hard nature of the TSP renders it impractical to use large-scale instances for training. This paper proposes a novel approach that generalizes a GCN model pre-trained on a fixed small TSP size to large-scale instances with the help of Lin-Kernighan-Helsgaun (LKH) heuristics. This is realized by first devising a Sierpinski partition scheme to partition a large TSP into sub-problems that can be efficiently solved by the pre-trained GCN, and then developing an attention-based merging mechanism to integrate the sub-solutions into a whole solution to the original TSP instance. Specifically, we train a GCN model by supervised learning to produce edge prediction heat maps of small-scale TSP instances, then apply it to the sub-problems of a large TSP instance generated by partition strategies. Controlled by an attention mechanism, all the heat maps of the sub-problems are merged into a complete one to construct the edge candidate set for LKH. Experiments show that this new approach significantly enhances the generalization ability of the pre-trained GCN model without using labeled large-scale TSP instances in the training process and also outperforms LKH within the same time limit.

Mingfei Li, Shikui Tu, Lei Xu
Communication-Efficient Distributed Minimax Optimization via Markov Compression

Recently, the minimax problem has attracted a lot of attention due to its wide applications in modern machine learning fields such as GANs. With the exponential growth of data volumes and increasing problem sizes, the design of distributed algorithms to train high-performance models has become imperative. However, distributed algorithms often suffer from communication bottlenecks. To address this challenge, in this paper, we propose a communication-efficient distributed compressed stochastic gradient descent ascent algorithm, abbreviated as DCSGDA, in a parameter-server setting. To reduce the communication cost, each client in DCSGDA transmits the compressed gradients of the primal and dual variables to the server at each iteration. In particular, we leverage a Markov compression mechanism that allows both unbiased and biased compressors to mitigate the negative effect of compression errors on convergence. Namely, we show theoretically that the DCSGDA algorithm can still achieve linear convergence in the presence of compression errors, provided that the local objective function is strongly-convex-strongly-concave. Finally, numerical experiments demonstrate the desirable communication efficiency and efficacy of the proposed DCSGDA.

Linfeng Yang, Zhen Zhang, Keqin Che, Shaofu Yang, Suyang Wang
Multi-level Augmentation Boosts Hybrid CNN-Transformer Model for Semi-supervised Cardiac MRI Segmentation

Over the past few years, many supervised deep learning algorithms based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have achieved remarkable progress in the field of clinical-assisted diagnosis. However, the practical application of these algorithms, e.g. ViT, which requires a large amount of data in the training process, is greatly limited due to the high cost of medical image annotation. To address this issue, this paper proposes an effective semi-supervised medical image segmentation framework, which combines two models with different structures, i.e. CNN and Transformer, and integrates their abilities to extract local and global information through a mutual supervision strategy. Based on this heterogeneous dual-network model, we employ multi-level image augmentation to expand the dataset, alleviating the model’s demand for data. Additionally, we introduce an uncertainty minimization constraint to further improve the model’s robustness, and incorporate an equivariance regularization module to encourage the model to capture semantic information of different categories in the images. In public benchmark tests, we demonstrate that the proposed method outperforms recently developed semi-supervised medical image segmentation methods in terms of metrics such as the Dice coefficient and the 95% Hausdorff Distance. The code will be released at .

Ruohan Lin, Wangjing Qi, Tao Wang
Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

Hierarchical reinforcement learning composes subpolicies in different hierarchies to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further. Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.
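
The idea of rewarding subpolicies for having distant action distributions can be sketched in a few lines of Python. The example below assumes discrete actions laid out on a line with unit spacing, so that the Wasserstein-1 distance reduces to the sum of absolute differences between cumulative distributions; the averaging over pairs, the `weight` parameter, and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions on the points 0, 1, 2, ..."""
    cdf_gap = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi      # running difference of the two CDFs
        total += abs(cdf_gap)   # mass that must cross to the next point
    return total


def diversity_bonus(action_distributions):
    """Mean pairwise Wasserstein distance; needs at least two subpolicies."""
    pairs = list(combinations(action_distributions, 2))
    return sum(wasserstein_1d(p, q) for p, q in pairs) / len(pairs)


def regularized_loss(task_loss, action_distributions, weight=0.1):
    """Subtract the bonus so that minimizing the loss maximizes diversity."""
    return task_loss - weight * diversity_bonus(action_distributions)


# Three subpolicies, each deterministic on a different action.
subpolicies = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
bonus = diversity_bonus(subpolicies)
```

Unlike a divergence that only checks whether supports overlap, the Wasserstein distance here grows with how far apart the preferred actions are, which is why the first two subpolicies are twice as distant as either is from the third.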

Haorui Li, Jiaqi Liang, Linjing Li, Daniel Zeng
An Adaptive Detector for Few Shot Object Detection

Few-shot object detection has made progress in recent years. However, most research assumes that base and new classes come from the same domain. In real-world applications, they often come from different domains, resulting in poor adaptability of existing methods. To address this problem, we designed an adaptive few-shot object detection framework. Based on the Meta R-CNN framework, we added an image domain classifier after the backbone’s last layer to reduce domain discrepancy. To avoid class feature confusion caused by image feature distribution alignment, we also added a feature filter module (CAFFM) to filter out features irrelevant to specific classes. We tested our method on three base/new splits and found significant performance improvements compared to the base model Meta R-CNN. In base/new split2, mAP50 increased by $$\pm 8\%$$, and in the remaining two splits, mAP50 improved by $$\pm 3\%$$. Our method outperforms state-of-the-art methods in most cases for the three different base/new splits, validating the efficacy and generality of our approach.

Jiming Yan, Hongbo Wang, Xinchen Liu