
2024 | Book

Neural Information Processing

30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part III

Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 14447-14452 constitutes the refereed proceedings of the 30th International Conference on Neural Information Processing, ICONIP 2023, held in Changsha, China, in November 2023.
The 652 papers presented in the proceedings set were carefully reviewed and selected from 1274 submissions. They focus on theory and algorithms; cognitive neurosciences; human centred computing; and applications in neuroscience, neural networks, deep learning, and related fields.

Table of Contents

Frontmatter

Theory and Algorithms

Frontmatter
Efficient Lightweight Network with Transformer-Based Distillation for Micro-crack Detection of Solar Cells

Micro-cracks on solar cells often reduce power generation efficiency, so this paper proposes a lightweight network for the micro-crack detection task on cell images. First, a Feature Selection framework is proposed, which efficiently and adaptively decides the number of layers of the feature extraction network and clips the unnecessary feature generation process. In addition, based on the design of the Transformer layer, Transformer Distillation is proposed. In Transformer Distillation, the designed Transformer Refine module mines distillation information along two dimensions: features and relations. Using a combination of Feature Selection and Transformer Distillation, lightweight networks based on ResNet and ViT achieve much better results than the original networks, with classification accuracy rates of 88.58% and 89.35%, respectively.

Xiangying Xie, Xinyue Liu, QiXiang Chen, Biao Leng
MTLAN: Multi-Task Learning and Auxiliary Network for Enhanced Sentence Embedding

The objective of cross-lingual sentence embedding learning is to map sentences into a shared representation space, where semantically similar sentence representations are closer together, while distinct sentence representations exhibit clear differentiation. This paper proposes a novel sentence embedding model called MTLAN, which incorporates multi-task learning and auxiliary networks. The model utilizes the LaBSE model for extracting sentence features and undergoes joint training on tasks related to sentence semantic representation and distance measurement. Furthermore, an auxiliary network is employed to enhance the contextual expression of words within sentences. To address the issue of limited resources for low-resource languages, we construct a pseudo-corpus dataset using a multilingual dictionary for unsupervised learning. We conduct experiments on multiple publicly available datasets, including STS and SICK, to evaluate both monolingual sentence similarity and cross-lingual semantic similarity. The empirical results demonstrate the significant superiority of our proposed model over state-of-the-art methods.

Gang Liu, Tongli Wang, Wenli Yang, Zhizheng Yan, Kai Zhan
Correlated Online k-Nearest Neighbors Regressor Chain for Online Multi-output Regression

Online multi-output regression is a crucial task in machine learning with applications in various domains such as environmental monitoring, energy efficiency prediction, and water quality prediction. This paper introduces CONNRC, a novel algorithm designed to address online multi-output regression challenges and provide accurate real-time predictions. CONNRC builds upon the k-nearest neighbor algorithm in an online manner and incorporates a regressor chain structure to effectively capture and utilize correlations among structured multi-outputs. The main contribution of this work lies in the potential of CONNRC to enhance the accuracy and efficiency of real-time predictions across diverse application domains. Through a comprehensive experimental evaluation on six real-world datasets, CONNRC is compared against five existing online regression algorithms. The results highlight that CONNRC consistently outperforms the other algorithms in terms of average Mean Absolute Error, demonstrating its superior accuracy in multi-output regression tasks. However, the time performance of CONNRC requires further improvement, indicating an area for future research and optimization.
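
The chain idea in the abstract can be illustrated with an offline sketch: each regressor in a chain receives the original inputs plus the predictions for earlier outputs, which is how correlations among outputs are exploited. The snippet below is a batch k-NN regressor chain built from scikit-learn components, not the authors' online CONNRC algorithm; the data and chain order are made up for illustration.

# Minimal offline sketch of a k-NN regressor chain (illustrative data, not CONNRC).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))             # 200 samples, 8 input features
Y = np.stack([X[:, 0] + X[:, 1],          # output 1
              X[:, 0] - 0.5 * X[:, 2],    # output 2 (correlated with output 1)
              X[:, 3] ** 2], axis=1)      # output 3

# Each regressor in the chain sees X plus the predictions of earlier outputs.
chain = RegressorChain(KNeighborsRegressor(n_neighbors=5), order=[0, 1, 2])
chain.fit(X[:150], Y[:150])
print(chain.predict(X[150:]).shape)       # (50, 3)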

Zipeng Wu, Chu Kiong Loo, Kitsuchart Pasupa
Evolutionary Computation for Berth Allocation Problems: A Survey

The berth allocation problem (BAP) is to assign berthing spaces to incoming vessels while considering various constraints and objectives, and it is an important optimization problem in port logistics. Evolutionary computation (EC) algorithms are a class of meta-heuristic optimization algorithms that mimic the process of natural evolution and swarm intelligence behaviors to generate and evolve potential solutions to optimization problems. Due to the advantages of strong global search capability and robustness, EC algorithms have gained significant attention in many research fields. In recent years, many studies have successfully applied EC algorithms to solving BAPs and achieved encouraging performance. This paper aims to survey the existing literature on EC algorithms for solving BAPs. First, this survey introduces two common models of BAPs, namely the continuous BAP and the discrete BAP. Second, this paper introduces three typical EC algorithms (genetic algorithm, particle swarm optimization, and ant colony optimization) and analyzes the existing studies that use these EC algorithms to solve BAPs. Finally, this paper analyzes future research directions for EC algorithms in solving BAPs.

Xin-Xin Xu, Yi Jiang, Lei Zhang, Xun Liu, Xiang-Qian Ding, Zhi-Hui Zhan

Cognitive Neurosciences

Frontmatter
Privacy-Preserving Travel Time Prediction for Internet of Vehicles: A Crowdsensing and Federated Learning Approach

Travel time prediction (TTP) is an important task to support various applications for the Internet of Vehicles (IoVs). Although TTP has been widely investigated in the existing literature, most studies assume that the traffic data for estimating travel time are comprehensive and publicly available for free. However, accurate TTP needs real-time vehicular data so that the prediction can adapt to traffic changes. Moreover, since real-time data contain vehicles' privacy, TTP requires protection during data processing. In this paper, we propose a novel privacy-preserving TTP mechanism for IoVs, $$\mathbb{P}\mathbb{T}$$Prediction, based on crowdsensing and federated learning. In crowdsensing, a data curator continually collects traffic data from vehicles for TTP. To protect the vehicles' privacy, we make use of federated learning so that vehicles can help the data curator train the prediction model without revealing their information. We also design a spatial prefix encoding method to protect vehicles' location information, along with a ciphertext-policy attribute-based encryption (CP-ABE) mechanism to protect the prediction model of the curator. We evaluate $$\mathbb{P}\mathbb{T}$$Prediction in terms of MAE, MSE, and RMSE on two real-world traffic datasets. The experimental results illustrate that the proposed $$\mathbb{P}\mathbb{T}$$Prediction achieves higher prediction accuracy and stronger privacy protection compared with existing methods.
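
The federated training step described above can be illustrated with a minimal federated-averaging sketch: the data curator combines model updates from vehicles, weighted by their local sample counts, without seeing raw trajectories. The helper name fedavg and the toy parameters are illustrative; the paper's encrypted aggregation, location encoding, and CP-ABE components are not reproduced.

# Minimal sketch of federated averaging (generic, not the paper's protected pipeline).
import numpy as np

def fedavg(client_weights, client_sizes):
    """client_weights: list of dicts {param_name: np.ndarray}; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    agg = {name: np.zeros_like(w) for name, w in client_weights[0].items()}
    for weights, n in zip(client_weights, client_sizes):
        for name, w in weights.items():
            agg[name] += (n / total) * w          # weight each vehicle by its data volume
    return agg

# Hypothetical example: two vehicles with a single-layer model.
clients = [{"w": np.array([1.0, 2.0]), "b": np.array([0.5])},
           {"w": np.array([3.0, 0.0]), "b": np.array([1.5])}]
print(fedavg(clients, client_sizes=[100, 300]))   # weighted toward the larger client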

Hongyu Huang, Cui Sun, Xinyu Lei, Nankun Mu, Chunqiang Hu, Chao Chen, Huaqing Li, Yantao Li
A Fine-Grained Domain Adaptation Method for Cross-Session Vigilance Estimation in SSVEP-Based BCI

A brain-computer interface (BCI), a direct communication system between the human brain and the external environment, can provide assistance for people with disabilities. Vigilance is an important cognitive state and has a close influence on the performance of users in BCI systems. In this study, a four-target BCI system for cursor control was built based on steady-state visual evoked potential (SSVEP), and twelve subjects were recruited and carried out two long-term BCI experimental sessions, each consisting of two SSVEP-based cursor-control tasks. During each session, electroencephalogram (EEG) signals were recorded. Based on the labeled EEG data of the source domain (previous session) and a small amount of unlabeled EEG data of the target domain (new session), we developed a fine-grained domain adaptation network (FGDAN) for cross-session vigilance estimation in BCI tasks. In the FGDAN model, a graph convolutional network (GCN) was built to extract deep features of EEG. A fine-grained feature alignment was proposed to highlight the importance of different channels, determined by an attention-weight mechanism, and to align the feature distributions between the source and target domains at the channel level. The experimental results demonstrate that the proposed FGDAN achieves better performance than the compared methods and indicate the feasibility and effectiveness of our method for cross-session vigilance estimation of BCI users.

Kangning Wang, Shuang Qiu, Wei Wei, Ying Gao, Huiguang He, Minpeng Xu, Dong Ming
RMPE: Reducing Residual Membrane Potential Error for Enabling High-Accuracy and Ultra-low-latency Spiking Neural Networks

Spiking neural networks (SNNs) have attracted great attention due to their distinctive properties of low power consumption and high computing efficiency on neuromorphic hardware. An effective way to obtain deep SNNs with competitive accuracy on large-scale datasets is ANN-SNN conversion. However, it requires a long time window to obtain an optimal mapping between the firing rates of SNNs and the activations of ANNs due to conversion error. Compared with the source ANN, the converted SNN usually suffers a huge loss of accuracy at ultra-low latency. In this paper, we first analyze the residual membrane potential error caused by the asynchronous transmission property of spikes at ultra-low latency, and we deduce an explicit expression relating the residual membrane potential error (RMPE) to the SNN parameters. Then we propose a layer-by-layer calibration algorithm for these SNN parameters to eliminate the RMPE. Finally, a two-stage ANN-SNN conversion scheme is proposed to eliminate the quantization error, the truncation error, and the RMPE separately. We evaluate our method on the CIFAR datasets and ImageNet, and the experimental results show that the proposed ANN-SNN conversion method yields a significant reduction in accuracy loss at ultra-low latency. When T is $$\le 64$$, our method requires about half the latency of other methods of similar accuracy on ImageNet. The code is available at https://github.com/JominWink/SNN_Conversion_Phase.
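
The residual membrane potential error can be seen in a minimal integrate-and-fire simulation: with reset-by-subtraction, the firing rate over T steps approximates the ANN activation, and the charge left in the membrane after T steps is exactly the residual term that shrinks as T grows. This sketch shows the generic conversion setting, not the authors' calibration algorithm.

# Minimal sketch: rate coding of an integrate-and-fire neuron and its residual error.
def if_neuron_rate(constant_input, threshold=1.0, T=8):
    v, spikes = 0.0, 0
    for _ in range(T):
        v += constant_input              # integrate the (constant) input current
        if v >= threshold:
            v -= threshold               # reset by subtraction
            spikes += 1
    rate = spikes * threshold / T        # SNN estimate of the ANN activation
    residual_error = v / T               # leftover membrane charge -> conversion error
    return rate, residual_error

for x in (0.3, 0.55, 0.9):
    print(x, if_neuron_rate(x, T=8), if_neuron_rate(x, T=64))
# With small T the residual term is visible; with larger T it shrinks toward zero.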

Yunhua Chen, Zhimin Xiong, Ren Feng, Pinghua Chen, Jinsheng Xiao
An Improved Target Searching and Imaging Method for CSAR

Circular Synthetic Aperture Radar (CSAR) has attracted much attention in the field of high-resolution SAR imaging. In order to shorten the computation time and improve the imaging effect, in this paper we propose a fast CSAR imaging strategy that searches for the target and automatically selects the area of interest for imaging. The first step is to find the target and select the imaging center and the imaging area of interest based on the target search algorithm; the second step is to divide the full-aperture data into sub-apertures according to the angle; the third step is to approximate the sub-apertures as linear arrays and image them separately; and the last step is to perform sub-image fusion to obtain the final CSAR image. This method can greatly reduce the imaging time and obtain well-focused CSAR images. The proposed algorithm is verified by both simulation and processing of real data collected with our mmWave imager prototype utilizing commercially available 77-GHz MIMO radar sensors. The experimental results verify the performance and superiority of our algorithm.

Yuxiao Deng, Chuandong Li, Yawei Shi, Huiwei Wang, Huaqing Li
Block-Matching Multi-pedestrian Tracking

Target association is an extremely important problem in the field of multi-object tracking, especially for pedestrian scenes with high appearance similarity and dense distribution. The traditional approach of combining IoU and ReID techniques with the Hungarian algorithm only partially addresses these challenges. To improve the model's matching ability, this paper proposes a block-matching model that extracts local features using a Block Matching Module (BMM) based on the Transformer model. The BMM divides features into blocks and mines effective features of the target to perform target similarity evaluation. Additionally, a Euclidean Distance Module (EDM) based on a Euclidean distance association matching strategy is introduced to further enhance the model's association ability. By integrating the BMM and EDM into the same multi-object tracking model, this paper establishes a novel model called BWTrack that achieves excellent performance on MOT16, MOT17, and MOT20 while running at 7 FPS on a single GPU.

Chao Zhang
RPF3D: Range-Pillar Feature Deep Fusion 3D Detector for Autonomous Driving

In this paper, we present RPF3D, an innovative single-stage framework that explores the complementary nature of point clouds and range images for 3D object detection. Our method addresses the sampling region imbalance issue inherent in fixed-dilation-rate convolutional layers, allowing for a more accurate representation of the input data. To enhance the model’s adaptability, we introduce several attention layers that accommodate a wide range of dilation rates necessary for processing range image scenes. To tackle the challenges of feature fusion and alignment, we propose the AttentiveFusion module and the Range Image Guided Deep Fusion (RIGDF) backbone architecture in the Range-Pillar Feature Fusion section, which effectively addresses the one-pillar-to-multiple-pixels feature alignment problem caused by the point cloud encoding strategy. These innovative components work together to provide a more robust and accurate fusion of features for improved 3D object detection. We validate the effectiveness of our RPF3D framework through extensive experiments on the KITTI and Waymo Open Datasets. The results demonstrate the superior performance of our approach compared to existing methods, particularly in the Car class detection where a significant enhancement is achieved on both datasets. This showcases the practical applicability and potential impact of our proposed framework in real-world scenarios and emphasizes its relevance in the domain of 3D object detection.

Yihan Wang, Qiao Yan
Traffic Signal Control Optimization Based on Deep Reinforcement Learning with Attention Mechanisms

Deep reinforcement learning (DRL) combined with traffic control systems plays a vital role in adaptive traffic signal control. However, previous studies have frequently disregarded the significance of vehicles near intersections, which typically involve higher decision-making requirements and safety considerations. To overcome this challenge, this paper presents a novel DRL-based method for traffic signal control, which incorporates an attention mechanism into the Dueling Double Deep Q Network (D3QN) framework. This approach emphasizes the priority of vehicles near intersections by assigning them higher weights and more attention. Moreover, the state design incorporates signal light statuses to facilitate a more comprehensive understanding of the current traffic environment. Furthermore, the model's performance is enhanced through the utilization of Double DQN and Dueling DQN techniques. The experimental findings demonstrate the superior efficacy of the proposed method in critical metrics such as vehicle waiting time, queue length, and the number of halted vehicles when compared to D3QN, traditional DQN, and fixed-timing strategies.
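
For readers unfamiliar with the dueling architecture named above, the sketch below shows a generic dueling Q-network head in PyTorch: the state value V(s) and the advantages A(s, a) are estimated by separate branches and recombined. The state dimension and number of signal phases are placeholders, and the attention-weighted intersection encoding from the paper is not included.

# Minimal PyTorch sketch of a dueling Q-network head (generic D3QN building block).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state):
        h = self.backbone(state)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

q_net = DuelingQNet(state_dim=32, n_actions=4)          # e.g. 4 signal phases
print(q_net(torch.randn(2, 32)).shape)                  # torch.Size([2, 4])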

Wenlong Ni, Peng Wang, Zehong Li, Chuanzhuang Li
CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

Humans understand the external world through a variety of perceptual processes such as sight, sound, touch and smell. Simulating such biological multi-sensory fusion decisions with a computational model is important for both computer science and neuroscience research. Spiking Neural Networks (SNNs) mimic the neural dynamics of the brain and are expected to reveal the biological multimodal perception mechanism. However, existing work on multimodal SNNs is still limited; most of it focuses only on audiovisual fusion and lacks a systematic comparison of the performance and robustness of the models. In this paper, we propose a novel fusion module called Cross-modality Current Integration (CMCI) for multimodal SNNs and systematically compare it with other fusion methods on visual, auditory and olfactory fusion recognition tasks. Besides, a regularization technique called Modality-wise Dropout (ModDrop) is introduced to further improve the robustness of multimodal SNNs under missing modalities. Experimental results show that our method exhibits superiority in both modality-complete and modality-missing conditions without any additional networks or parameters.

Runhao Jiang, Jianing Han, Yingying Xue, Ping Wang, Huajin Tang
A Weakly Supervised Deep Learning Model for Alzheimer’s Disease Prognosis Using MRI and Incomplete Labels

Predicting cognitive scores using magnetic resonance imaging (MRI) can aid in the early recognition of Alzheimer’s disease (AD) and provide insights into future disease progression. Existing methods typically ignore the temporal consistency of cognitive scores and discard the subjects with incomplete cognitive scores. In this paper, we propose a Weakly supervised Alzheimer’s Disease Prognosis (WADP) model that incorporates an image embedding network and a label embedding network to predict cognitive scores using baseline MRI and incomplete cognitive scores. The image embedding network is an attention consistency regularized network to project MRI into the image embedding space and output the cognitive scores at multiple time-points. The attention consistency regularization captures the correlations among time-points by encouraging the attention maps at different time-points to be similar. The label embedding network employs a denoising autoencoder to embed cognitive scores into the label embedding space and impute missing cognitive scores. This enables the utilization of subjects with incomplete cognitive scores in the training process. Moreover, a relation alignment module is incorporated to make the relationships between samples in the image embedding space consistent with those in the label embedding space. The experimental results on two ADNI datasets show that WADP outperforms the state-of-the-art methods.
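
The label embedding network's denoising-autoencoder idea can be sketched generically: cognitive scores with randomly masked entries are encoded and reconstructed, and the reconstruction at masked positions serves as the imputation. The dimensions, masking rate, and architecture below are illustrative assumptions rather than the paper's settings.

# Minimal PyTorch sketch of a denoising autoencoder for imputing missing scores.
import torch
import torch.nn as nn

n_timepoints, hidden = 4, 16                      # e.g. cognitive scores at 4 visits
dae = nn.Sequential(
    nn.Linear(n_timepoints, hidden), nn.ReLU(),   # encoder -> label embedding
    nn.Linear(hidden, n_timepoints),              # decoder -> reconstructed / imputed scores
)

scores = torch.rand(32, n_timepoints) * 30        # hypothetical cognitive scores
mask = (torch.rand_like(scores) > 0.3).float()    # 1 = observed, 0 = missing
corrupted = scores * mask                         # "denoising" input with masked entries

recon = dae(corrupted)
# Reconstruction loss is computed only on observed entries; the decoder's output
# at masked positions serves as the imputation for training with incomplete labels.
loss = ((recon - scores) ** 2 * mask).sum() / mask.sum()
loss.backward()
print(float(loss))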

Zhi Chen, Yongguo Liu, Yun Zhang, Jiajing Zhu, Qiaoqin Li
Two-Stream Spectral-Temporal Denoising Network for End-to-End Robust EEG-Based Emotion Recognition

Emotion recognition based on electroencephalography (EEG) is attracting more and more interest in affective computing. Previous studies have predominantly relied on manually extracted features from EEG signals. The utilization of raw EEG signals remains largely unexplored; they contain more temporal information but present a significant challenge due to their abundance of redundant data and susceptibility to contamination from other physiological signals, such as electrooculography (EOG) and electromyography (EMG). To cope with the high dimensionality and noise interference in end-to-end EEG-based emotion recognition tasks, we introduce a Two-Stream Spectral-Temporal Denoising Network (TS-STDN) which takes into account both the spectral and temporal aspects of EEG signals. Moreover, two U-net modules are adopted to reconstruct clean EEG signals in both spectral and temporal domains while extracting discriminative features from noisy data for classifying emotions. Extensive experiments are conducted on two public datasets, SEED and SEED-IV, with the original EEG signals and noisy EEG signals contaminated by EMG signals. Compared to the baselines, our TS-STDN model exhibits a notable improvement in accuracy, with increases of 6% and 8% on the clean data and 11% and 10% on the noisy data, which shows the robustness of the model.

Xuan-Hao Liu, Wei-Bang Jiang, Wei-Long Zheng, Bao-Liang Lu
Brain-Inspired Binaural Sound Source Localization Method Based on Liquid State Machine

Binaural Sound Source Localization (BSSL) is a remarkable topic in robot design and human hearing aids. A great number of algorithms have flourished owing to the leap in machine learning. However, prior approaches lack the ability to trade off parameter size against accuracy, which is a primary obstacle to their further implementation on resource-constrained devices. Spiking Neural Network (SNN)-based models have also emerged due to their inherent computing superiority in sparse event processing. The Liquid State Machine (LSM) is a classic Spiking Recurrent Neural Network (SRNN) with a natural potential for processing spatiotemporal information, and it has proved advantageous on numerous tasks since it was proposed. Yet, to the best of our knowledge, this is the first BSSL model based on LSM, and we name it BSSL-LSM. BSSL-LSM is lightweight with only 1.04M parameters, a considerable reduction compared to CNN (10.1M) and D-BPNN (2.23M) models, while maintaining comparable or even superior accuracy. Compared to SNN-IID, there is a 10% accuracy improvement for $$10^\circ$$ interval localization. To achieve better performance, we introduce Bayesian Optimization (BO) for hyperparameter searching and a novel soft label technique for better differentiating adjacent angles, both of which can be easily mirrored in related works. Project page: https://github.com/BSSL-LSM.

Yuan Li, Jingyue Zhao, Xun Xiao, Renzhi Chen, Lei Wang
A Causality-Based Interpretable Cognitive Diagnosis Model

Cognitive diagnosis models (CDMs) aim to assess students' cognitive processes during learning, enabling personalized support based on their needs. Nevertheless, deep learning-based CDMs are inherently opaque, posing challenges in providing psychological insights into the reasoning behind predicted outcomes. We address this by creating three interpretable parameters: skill mastery, exercise difficulty, and exercise discrimination. Inspired by Bayesian networks and neural networks, we use feature engineering to extract the interpretable parameters and tree-enhanced naive Bayes classifiers for prediction. Our method balances interpretability and accuracy. Experimentally, we compare our approach to traditional and advanced models on four datasets and conduct ablation studies on each feature to examine its contribution to student performance prediction. Thus, the causality-based interpretable cognitive diagnosis model (CBICDM) has great potential for providing adaptive and personalized instruction with causal reasoning in real-world educational systems.

Jinwei Zhou, Zhengyang Wu, Changzhe Yuan, Lizhang Zeng
RoBrain: Towards Robust Brain-to-Image Reconstruction via Cross-Domain Contrastive Learning

With the development of neuroimaging technology and deep learning methods, neural decoding with functional Magnetic Resonance Imaging (fMRI) of human brain has attracted more and more attention. Neural reconstruction task, which intends to reconstruct stimulus images from fMRI, is one of the most challenging tasks in neural decoding. Due to the instability of neural signals, trials of fMRI collected under the same stimulus prove to be very different, which leads to the poor robustness and generalization ability of the existing models. In this work, we propose a robust brain-to-image model based on cross-domain contrastive learning. With deep neural network (DNN) features as paradigms, our model can extract features of stimulus stably and generate reconstructed images via DCGAN. Experiments on the benchmark Deep Image Reconstruction dataset show that our method can enhance the robustness of reconstruction significantly.
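
A typical cross-domain contrastive objective of the kind the abstract refers to is the InfoNCE loss: fMRI-derived features and DNN features of the same stimulus are pulled together while other pairs in the batch are pushed apart. The sketch below shows this generic loss, not the authors' full training pipeline.

# Minimal sketch of an InfoNCE-style cross-domain contrastive loss.
import torch
import torch.nn.functional as F

def info_nce(fmri_feats, dnn_feats, temperature=0.1):
    """fmri_feats, dnn_feats: (batch, dim); row i of each comes from the same stimulus."""
    z1 = F.normalize(fmri_feats, dim=1)
    z2 = F.normalize(dnn_feats, dim=1)
    logits = z1 @ z2.t() / temperature          # cosine similarities across the two domains
    targets = torch.arange(z1.size(0))          # positive pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

print(float(info_nce(torch.randn(8, 128), torch.randn(8, 128))))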

Che Liu, Changde Du, Huiguang He
High-Dimensional Multi-objective PSO Based on Radial Projection

When solving multi-objective problems, traditional methods face increased complexity and convergence difficulties because of the increasing number of objectives. This paper proposes a high-dimensional multi-objective particle swarm algorithm that utilizes radial projection to reduce the dimensionality of high-dimensional particles. Firstly, the solution vector space coordinates undergo normalization. Subsequently, the high-dimensional solution space is projected onto 2-dimensional radial space, aiming to reduce computational complexity. Following this, grid partitioning is employed to enhance the efficiency and effectiveness of optimization algorithms. Lastly, the iterative solution is achieved by utilizing the particle swarm optimization algorithm. In the process of iteratively updating particle solutions, the offspring reuse-based parents selection strategy and the maximum fitness-based elimination selection strategy are used to strengthen the diversity of the population, thereby enhancing the search ability of the particles. The computational expense is significantly diminished by projecting the solution onto 2-dimensional radial space that exhibits comparable characteristics to the high-dimensional solution, while simultaneously maintaining the distribution and crowding conditions of the complete point set. In addition, the offspring reuse-based parents selection strategy is used to update the external archive set, further avoiding premature convergence to local optimal solution. The experimental results verify the effectiveness of the method in this paper. Compared with four state-of-the-art algorithms, the algorithm proposed in this paper has high search efficiency and fast convergence in solving high-dimensional multi-objective optimization problems, and can also obtain higher quality solutions.
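
One common way to realize the "normalize, then project to a 2-dimensional radial space" step is a RadViz-style mapping, in which each normalized objective vector becomes a weighted average of anchor points on the unit circle; the paper's exact projection may differ, so the sketch below is only illustrative.

# Minimal sketch of a RadViz-style radial projection of many objectives to 2D.
import numpy as np

def radial_project(F):
    """F: (n_solutions, M) objective matrix -> (n_solutions, 2) radial coordinates."""
    F = np.asarray(F, dtype=float)
    Fn = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0) + 1e-12)  # normalize objectives
    M = F.shape[1]
    angles = 2 * np.pi * np.arange(M) / M
    anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)         # (M, 2) anchor points
    weights = Fn / (Fn.sum(axis=1, keepdims=True) + 1e-12)
    return weights @ anchors                                             # weighted anchor average

objs = np.random.rand(100, 10)            # 100 candidate solutions, 10 objectives
print(radial_project(objs).shape)         # (100, 2) -> cheap grid/crowding operations in 2D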

Dekun Tan, Ruchun Zhou, Xuhui Liu, Meimei Lu, Xuefeng Fu, Zhenzhen Li
Link Prediction Based on the Sub-graphs Learning with Fused Features

As one of the important research methods in the area of knowledge graph completion, link prediction aims to capture the structural information or the attribute information of nodes in the network to predict the link probability between nodes. In particular, graph neural networks based on sub-graphs provide a popular approach to learning representations for link prediction tasks. However, they cannot address the resource consumption in large graphs, nor do they combine global structural features, since they often simply concatenate attribute features and embeddings for prediction. Therefore, this paper proposes a novel link prediction model based on Sub-graphs Learning with Fused Features, named SLFF for short. In particular, the proposed model utilizes random walks to extract the sub-graphs to reduce the overhead of the process. Moreover, it utilizes Node2Vec to process the entire graph and obtain the global structural characteristics of the nodes. Afterward, the SLFF model utilizes the existing embedding to reconstruct the embedding according to the neighborhood defined by the graph structure and node attribute space. Finally, the SLFF model can combine the attribute characteristics of a node with its structural characteristics. Extensive experiments on public datasets demonstrate that the proposed SLFF performs better than state-of-the-art approaches.

Haoran Chen, Jianxia Chen, Dipai Liu, Shuxi Zhang, Shuhan Hu, Yu Cheng, Xinyun Wu
Naturalistic Emotion Recognition Using EEG and Eye Movements

Emotion recognition in affective brain-computer interfaces (aBCI) has emerged as a prominent research area. However, existing experimental paradigms for collecting emotional data often rely on stimuli-based elicitation, which may not accurately reflect emotions experienced in everyday life. Moreover, these paradigms are limited in terms of stimulus types and lack investigation into decoding naturalistic emotional states. To address these limitations, we propose a novel experimental paradigm that enables the recording of physiological signals in a more natural way. In our approach, emotions are allowed to arise spontaneously, unrestricted by specific experimental activities. Participants have the autonomy to determine the start and end of each recording session and provide the corresponding emotion labels. Over a period of three months, we recruited six subjects and collected data through multiple recording sessions per subject. We utilized electroencephalogram (EEG) and eye movement signals in both subject-dependent and cross-subject settings. In the subject-dependent unimodal condition, our attentive simple graph convolutional network (ASGC) achieved the highest accuracy of 76.32% for emotion recognition based on EEG data. For the cross-subject unimodal condition, our domain adversarial neural network (DANN) outperformed other models, achieving an average accuracy of 71.90% based on EEG data. These experimental results demonstrate the feasibility of recognizing emotions in naturalistic settings. The proposed experimental paradigm holds significant potential for advancing emotion recognition in various practical applications. By allowing emotions to unfold naturally, our approach enables the future emergence of more robust and applicable emotion recognition models in the field of aBCI.

Jian-Ming Zhang, Jiawen Liu, Ziyi Li, Tian-Fang Ma, Yiting Wang, Wei-Long Zheng, Bao-Liang Lu
Task Scheduling with Improved Particle Swarm Optimization in Cloud Data Center

This paper proposes an improved particle swarm optimization algorithm with simulated annealing (IPSO-SA) for the task scheduling problem in cloud data centers. First, the algorithm uses Tent chaotic mapping to make the initial population more evenly distributed. Second, a non-convex function is constructed to adaptively and decreasingly change the inertia weights, adjusting the optimization-seeking ability of the particles in different iteration periods. Finally, the Metropolis criterion in SA is used to generate perturbed particles, combined with a modified particle update equation, to avoid premature convergence. Comparative experimental results show that the IPSO-SA algorithm improves convergence accuracy by 13.8% over the standard PSO algorithm. The respective improvements over the other two modified PSO variants are 15.2% and 9.1%.
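
Two of the ingredients named above, Tent chaotic initialization and a decreasing non-linear inertia weight, can be sketched as follows; the particular inertia schedule shown is an assumption, and the SA-based perturbation of IPSO-SA is not reproduced.

# Minimal sketch: Tent-map chaotic initialization and a decreasing inertia weight.
import numpy as np

def tent_map_init(n_particles, dim, lower, upper, n_iter=10):
    x = np.random.rand(n_particles, dim)
    for _ in range(n_iter):                              # iterate the Tent map to mix values
        x = np.where(x < 0.5, 2 * x, 2 * (1 - x))
    return lower + x * (upper - lower)                   # scale into the search range

def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    # One possible non-linear decreasing schedule (an assumption, not the paper's formula).
    return w_min + (w_max - w_min) * (1 - t / t_max) ** 2

swarm = tent_map_init(30, 10, lower=-5.0, upper=5.0)
print(swarm.shape, inertia_weight(0, 100), inertia_weight(100, 100))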

Yang Bi, Wenlong Ni, Yao Liu, Lingyue Lai, Xinyu Zhou
Traffic Signal Optimization at T-Shaped Intersections Based on Deep Q Networks

In this paper, traffic signal control strategies for T-shaped intersections in urban road networks are proposed using deep Q network (DQN) algorithms. Different DQN networks and dynamic time aggregation were used for decision-making. The effectiveness of the various strategies under different traffic conditions is evaluated using the Simulation of Urban Mobility (SUMO) software. The simulation results show that the strategy combining the Dueling DQN method and dynamic time aggregation significantly improves vehicle throughput. Compared with DQN and fixed-time methods, this strategy can reduce the average travel time by up to 43% in low-traffic periods and up to 15% in high-traffic periods. This paper demonstrates the significant advantages of applying Dueling DQN in traffic signal control strategies for urban road networks.

Wenlong Ni, Chuanzhuang Li, Peng Wang, Zehong Li
A Multi-task Framework for Solving Multimodal Multiobjective Optimization Problems

In multimodal multiobjective optimization problems, there may be more than one Pareto optimal solution corresponding to the same objective vector. The key is to find solutions that are both converged and well distributed. Even though existing evolutionary multimodal multiobjective algorithms take both the distance in the decision space and that in the objective space into consideration, most of them still focus on the convergence property. This may miss some regions of the decision space that are difficult to search during the process of converging to the Pareto front. In order to resolve this problem and maintain diversity throughout the whole process, we propose a differential evolutionary algorithm in a multi-task framework (MT-MMEA). This framework uses an $$\varepsilon$$-based auxiliary task concerned only with diversity in the decision space and provides well-distributed individuals to the main task via a knowledge transfer method. The main task evolves using a non-dominated sorting strategy and outputs the final population as the result. MT-MMEA is comprehensively tested on two MMOP benchmarks and compared with six state-of-the-art algorithms. The results show that our algorithm has superior performance in solving these problems.

Xinyi Wu, Fei Ming, Wenyin Gong
Domain Generalized Object Detection with Triple Graph Reasoning Network

Recent advances in Domain Adaptive Object Detection (DAOD) have vastly restrained the performance degradation caused by distribution shift. However, DAOD relies on the strong assumption that the target domain is accessible during the learning procedure, which is hard to satisfy in real-world applications. Domain Generalized Object Detection (DGOD) aims to generalize a detector trained on source domains directly to an unknown target domain without accessing the target data. It is thus a much more challenging problem, and very few contributions have been reported. Extracting domain-invariant information is the key problem of domain generalization. Considering that the topological structure of objects does not change with the domain, we present a general DGOD framework, the Triple Graph Reasoning Network (TGRN), to uncover and model the structure of objects. The proposed TGRN models the topological relations of foregrounds by building refined sparse graphs at both the pixel level and the semantic level. Meanwhile, a bipartite graph is created to capture the structural consistency of instances across domains, implicitly enabling distribution alignment. Experiments on our newly constructed datasets verify the effectiveness of the proposed TGRN. Codes and datasets are available at https://github.com/zjrao/tgrn.

Zhijie Rao, Luyao Tang, Yue Huang, Xinghao Ding
RPUC: Semi-supervised 3D Biomedical Image Segmentation Through Rectified Pyramid Unsupervised Consistency

Deep learning models have demonstrated remarkable performance in various biomedical image segmentation tasks. However, their reliance on a large amount of labeled data for training poses challenges as acquiring well-annotated data is expensive and time-consuming. To address this issue, semi-supervised learning (SSL) has emerged as a potential solution to leverage abundant unlabeled data. In this paper, we propose a simple yet effective consistency regularization scheme called Rectified Pyramid Unsupervised Consistency (RPUC) for semi-supervised 3D biomedical image segmentation. Our RPUC adopts a pyramid-like structure by incorporating three segmentation networks. To fully exploit the available unlabeled data, we introduce a novel pyramid unsupervised consistency (PUC) loss, which enforces consistency among the outputs of the three segmentation models and facilitates the transfer of cyclic knowledge. Additionally, we perturb the inputs of the three networks with varying ratios of Gaussian noise to enhance the consistency of unlabeled data outputs. Furthermore, three pseudo labels are generated from the outputs of the three segmentation networks, providing additional supervision during training. Experimental results demonstrate that our proposed RPUC achieves state-of-the-art performance in semi-supervised segmentation on two publicly available 3D biomedical image datasets.
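
The pyramid unsupervised consistency idea can be sketched with toy networks: the same unlabeled volume is perturbed with different amounts of Gaussian noise, passed through three segmenters, and their softmax outputs are pulled toward one another in a cycle. Architectures, noise ratios, and the pairwise MSE form below are placeholders rather than the paper's exact loss.

# Minimal sketch of a cyclic consistency loss among three segmentation networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

nets = [nn.Conv3d(1, 2, kernel_size=3, padding=1) for _ in range(3)]  # toy "segmenters"
volume = torch.randn(1, 1, 16, 32, 32)                                 # unlabeled 3D patch
noise_ratios = [0.0, 0.1, 0.2]                                         # per-network perturbation

probs = [F.softmax(net(volume + r * torch.randn_like(volume)), dim=1)
         for net, r in zip(nets, noise_ratios)]

# Cyclic pairwise consistency: net0 vs net1, net1 vs net2, net2 vs net0.
puc_loss = sum(F.mse_loss(probs[i], probs[(i + 1) % 3]) for i in range(3))
puc_loss.backward()
print(float(puc_loss))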

Xiaogen Zhou, Zhiqiang Li, Tong Tong
Cancellable Iris Recognition Scheme Based on Inversion Fusion and Local Ranking

Iris recognition has gained significant attention and application in real-life and financial scenarios in recent years due to its importance as a biometric data source. While many proposed solutions boast high recognition accuracy, one major concern remains the effective protection of users' iris data and the prevention of privacy breaches. To address this issue, we propose an improved cancellable biometrics scheme based on an inversion fusion and local ranking strategy (IFCB), specifically targeting the vulnerability of the local ranking-based cancellable biometrics scheme (LRCB) to the ranking-inversion attack when recognition accuracy is high. The proposed method disrupts the original iris data by applying a random substitution string and rearranging blocks within each iris string. Every rearranged block is either inverted or kept unchanged. This combination of inverted and unchanged blocks, referred to as inversion fusion, is then sorted to obtain rank values that are stored for subsequent matching. It is important to note that the inversion fusion step may lead to a loss of accuracy, which can be compensated by amplifying the iris data. By utilizing a set of different random substitution strings, the rearranged iris strings are employed in both the inversion fusion and local ranking steps. A long iris template is generated and stored as the final protected iris template, forming the basis of the proposed IFCB method. Theoretical and experimental analyses demonstrate that the IFCB scheme effectively withstands rank-inversion attacks and achieves a favorable balance of accuracy, irreversibility, unlinkability, and revocability.

Dongdong Zhao, Wentao Cheng, Jing Zhou, Hongmin Wang, Huanhuan Li
EWMIGCN: Emotional Weighting Based Multimodal Interaction Graph Convolutional Networks for Personalized Prediction

To address the challenges of information overload and cold start in personalized prediction systems, researchers have proposed graph neural network-based recommendation methods. However, existing studies have largely overlooked the shared or similar characteristics among different modal features. Moreover, there is a mismatch between the focuses of multimodal feature extraction (MFE) and user preference modeling (UPM). To tackle these issues, this paper establishes an interaction graph by extracting multimodal information and addresses the mismatch between MFE and UPM by constructing an emotion-weighted bisymmetric linear graph convolutional network (EW-BGCN). Specifically, this paper introduces a novel model called EWMIGCN, which extracts multimodal information using parallel CNNs to build an interaction graph, propagates the information on the EW-BGCN, and predicts user preferences from the representations of users and items through inner product calculations. Notably, this paper incorporates sentiment information from user comments to finely weight the neighborhood aggregation in the EW-BGCN, enhancing the overall quality of items. Experimental results demonstrate that the proposed model achieves superior performance compared to other baseline models on three datasets, as measured by Hits Ratio and Normalized Discounted Cumulative Gain.

Qing Liu, Qian Gao, Jun Fan
Neighborhood Learning for Artificial Bee Colony Algorithm: A Mini-survey

The artificial bee colony (ABC) algorithm is a representative paradigm of swarm intelligence optimization (SIO) algorithms, which has received much attention in the field of global optimization for its good performance yet simple structure. However, ABC still has the drawback of strong exploration but weak exploitation, resulting in slow convergence speed and low convergence accuracy. To address this drawback, the neighborhood learning mechanism has emerged in recent years as an effective method and has become a hot research topic in the ABC community. However, there have been no surveys on it, not even a short one. Considering the appeal of the neighborhood learning mechanism, we are motivated to provide a mini-survey highlighting some key aspects of it, including 1) how to construct a neighborhood topology, 2) how to select the learning exemplar, and 3) what the advantages and disadvantages are. In this mini-survey, related neighborhood-based ABC variants are reviewed to reveal these key aspects. Furthermore, some interesting future research directions are also given to encourage deeper related work.
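
For context, the update step that neighborhood-learning variants modify is the canonical ABC candidate generation v_ij = x_ij + phi * (x_ij - x_kj); neighborhood-based variants restrict the exemplar k to a topological neighborhood of i instead of the whole swarm. The sketch below uses a ring topology as one example of such a neighborhood.

# Minimal sketch of ABC candidate generation with a ring-topology neighborhood.
import numpy as np

def abc_candidate(pop, i, neighborhood_size=2):
    n, dim = pop.shape
    # Ring-topology neighborhood of food source i (one common choice of topology).
    neighbors = [(i + off) % n
                 for off in range(-neighborhood_size, neighborhood_size + 1) if off != 0]
    k = np.random.choice(neighbors)               # exemplar drawn from the neighborhood
    j = np.random.randint(dim)                    # perturb one randomly chosen dimension
    phi = np.random.uniform(-1.0, 1.0)
    v = pop[i].copy()
    v[j] = pop[i, j] + phi * (pop[i, j] - pop[k, j])   # v_ij = x_ij + phi * (x_ij - x_kj)
    return v

pop = np.random.uniform(-5, 5, size=(20, 10))
print(abc_candidate(pop, i=3))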

Xinyu Zhou, Guisen Tan, Yanlin Wu, Shuixiu Wu

Human Centred Computing

Frontmatter
Channel Attention Separable Convolution Network for Skin Lesion Segmentation

Skin cancer is a frequently occurring cancer in the human population, and it is very important to be able to diagnose malignant tumors early. Lesion segmentation is crucial for monitoring the morphological changes of skin lesions and for extracting features to localize and identify diseases, assisting doctors in early diagnosis. Manual segmentation of dermoscopic images is error-prone and time-consuming, so there is a pressing demand for precise and automated segmentation algorithms. Inspired by advanced mechanisms such as U-Net, DenseNet, Separable Convolution, Channel Attention, and Atrous Spatial Pyramid Pooling (ASPP), we propose a novel network called the Channel Attention Separable Convolution Network (CASCN) for skin lesion segmentation. The proposed CASCN is evaluated on the PH2 dataset with limited images. Without excessive pre-/post-processing of images, CASCN achieves state-of-the-art performance on the PH2 dataset with a Dice similarity coefficient of 0.9461 and an accuracy of 0.9645.
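
Two of the building blocks named in the abstract, separable convolution and channel attention, can be combined in a short PyTorch sketch: a depthwise-separable convolution followed by squeeze-and-excitation style channel re-weighting. This illustrates the blocks generically rather than CASCN's exact layout.

# Minimal PyTorch sketch: depthwise-separable convolution + channel attention.
import torch
import torch.nn as nn

class SepConvChannelAttention(nn.Module):
    def __init__(self, in_ch, out_ch, reduction=4):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.attn = nn.Sequential(                       # channel attention (SE-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.pointwise(self.depthwise(x))
        return x * self.attn(x)                          # re-weight channels

block = SepConvChannelAttention(16, 32)
print(block(torch.randn(1, 16, 64, 64)).shape)           # torch.Size([1, 32, 64, 64])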

Changlu Guo, Jiangyan Dai, Márton Szemenyei, Yugen Yi
A DNN-Based Learning Framework for Continuous Movements Segmentation

This study presents a novel experimental paradigm for collecting Electromyography (EMG) data from continuous movement sequences and a Deep Neural Network (DNN) learning framework for segmenting movements from these signals. Unlike prior research focusing on individual movements, this approach characterizes human motion as continuous sequences. The DNN framework comprises a segmentation module for time point level labeling of EMG data and a transfer module predicting movement transition time points. These outputs are integrated based on defined rules. Experimental results reveal an impressive capacity to accurately segment movements, evidenced by segmentation metrics (accuracy: 88.3%; Dice coefficient: 82.9%; mIoU: 72.7%). This innovative approach to time point level analysis of continuous movement sequences via EMG signals offers promising implications for future studies of human motor functions and the advancement of human-machine interaction systems.

Tian-yu Xiang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Zeng-Guang Hou
Neural-Symbolic Recommendation with Graph-Enhanced Information

The recommendation task is not only a problem of inductive statistics from data but also a cognitive task that requires reasoning ability. The most advanced graph neural networks have been widely used in recommendation systems because they can capture implicit structured information from graph-structured data. However, like most neural network algorithms, they only learn matching patterns from a perception perspective. Some researchers use user behavior for logic reasoning to achieve recommendation prediction from the perspective of cognitive reasoning, but this kind of reasoning is local and ignores implicit information on a global scale. In this work, we combine the advantages of graph neural networks and propositional logic operations to construct a neural-symbolic recommendation model with both global implicit reasoning ability and local explicit logic reasoning ability. We first build an item-item graph based on the principle of adjacent interaction and use graph neural networks to capture implicit information in the global data. Then we transform user behavior into propositional logic expressions to achieve recommendations from the perspective of cognitive reasoning. Extensive experiments on five public datasets show that our proposed model outperforms several state-of-the-art methods. The source code is available at https://github.com/hanzo2020/GNNLR.

Bang Chen, Wei Peng, Maonian Wu, Bo Zheng, Shaojun Zhu
Contrastive Hierarchical Gating Networks for Rating Prediction

Review-based recommendation suffers from textual noise and the absence of supervised signals. To address these challenges, we propose a novel hierarchical gated sentiment-aware model for rating prediction in this paper. To automatically suppress the influence of noisy reviews, we propose a hierarchical gating network that selects informative textual signals at different levels of granularity. Specifically, a local gating module is proposed to select reviews with personalized, end-to-end differentiable thresholds. The aim is to gate reviews in a relatively “hard” way to minimize the information flow from noisy reviews while facilitating model training. A global gating module is employed to evaluate the overall usefulness of the review signals by estimating the uncertainties encoded in the historical reviews. In addition, a discriminative learning module is proposed to supervise the learning of the hierarchical gating network. The essential intuition is to exploit the sentiment consistency between the target reviews and the target ratings to develop self-supervision signals, so that the hierarchical gating network can select relevant reviews related to the target ratings for better prediction. Finally, extensive experiments on public datasets and comparison studies with state-of-the-art baselines demonstrate the effectiveness of the proposed model, and additional investigations provide deep insight into the rationale underlying its superiority.

Jingwei Ma, Jiahui Wen, Chenglong Huang, Mingyang Zhong, Lu Wang, Guangda Zhang
Interactive Selection Recommendation Based on the Multi-head Attention Graph Neural Network

Click-through rate prediction for users is a critical task in recommendation systems. As a powerful machine learning method, graph neural networks have recently been favored by scholars for this task. However, most graph neural network-based click-through rate prediction models ignore the effectiveness of feature interaction and generally model all feature combinations, even if some are meaningless. Therefore, this paper proposes a Multi-head attention Graph Neural Network with Interactive Selection, named MGNN_IS for short, to capture complex feature interactions via graph structures. In particular, three sub-graphs are constructed to capture the internal information of users, the internal information of items, and the interaction information between users and items, namely the user internal graph, the item internal graph, and the user-item interaction graph. Moreover, the proposed model designs a multi-head attention propagation module for aggregation with an interactive selection strategy. This module can select from the constructed graphs and increase diversity with multiple heads to achieve high-order interaction across multiple layers. Finally, the proposed model fuses the features and makes predictions. Experiments on three public datasets demonstrate that the proposed model outperforms other advanced models.

Shuxi Zhang, Jianxia Chen, Meihan Yao, Xinyun Wu, Yvfan Ge, Shu Li
CM-TCN: Channel-Aware Multi-scale Temporal Convolutional Networks for Speech Emotion Recognition

Speech emotion recognition (SER) plays a crucial role in understanding user intent and improving human-computer interaction (HCI). Currently, the most widely used and effective methods are based on deep learning. In existing research, temporal information has become more and more important in SER. Although some advanced deep learning methods, such as convolutional neural networks (CNNs) and attention modules, can achieve good results, they often ignore the temporal information in speech, which can lead to insufficient representations and low classification accuracy. In order to make full use of temporal features, we propose channel-aware multi-scale temporal convolutional networks (CM-TCN). First, channel-aware temporal convolutional networks (CATCN) are used as the basic structure to extract multi-scale temporal features combined with channel information. Then, global feature attention (GFA) captures the global information at different time scales and enhances the important information. Finally, we use an adaptive fusion module (AFM) to establish the overall dependency of different network layers and fuse features. We conduct extensive experiments on six datasets, and the experimental results demonstrate the superior performance of CM-TCN.
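
The temporal convolutional backbone that CM-TCN builds on can be sketched as a stack of dilated 1D convolutions with residual connections, where increasing dilations cover multiple time scales of frame-level speech features; the channel-aware attention and fusion modules of the paper are not reproduced here.

# Minimal PyTorch sketch of a dilated temporal convolutional (TCN-style) block.
import torch
import torch.nn as nn

class DilatedTemporalBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                  # x: (batch, channels, time)
        return self.act(self.conv(x)) + x  # residual connection preserves earlier scales

# Stack blocks with dilations 1, 2, 4, 8 to cover multiple time scales.
tcn = nn.Sequential(*[DilatedTemporalBlock(64, d) for d in (1, 2, 4, 8)])
feats = torch.randn(2, 64, 300)            # e.g. 300 frames of 64-dim acoustic features
print(tcn(feats).shape)                    # torch.Size([2, 64, 300])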

Tianqi Wu, Liejun Wang, Jiang Zhang
FLDNet: A Foreground-Aware Network for Polyp Segmentation Leveraging Long-Distance Dependencies

Given the close association between colorectal cancer and polyps, the diagnosis and identification of colorectal polyps play a critical role in the detection and surgical intervention of colorectal cancer. In this context, the automatic detection and segmentation of polyps from various colonoscopy images has emerged as a significant problem that has attracted broad attention. Current polyp segmentation techniques face several challenges: firstly, polyps vary in size, texture, color, and pattern; secondly, the boundaries between polyps and mucosa are usually blurred. Moreover, existing studies have focused on learning the local features of polyps while ignoring the long-range dependencies of the features, as well as the local and global contextual information of the combined features. To address these challenges, we propose FLDNet (Foreground-Long-Distance Network), a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation. Specifically, the proposed model consists of three main modules: a pyramid-based Transformer encoder, a local context module, and a foreground-aware module. Multi-level features with long-distance dependency information are first captured by the pyramid-based Transformer encoder. On the high-level features, the local context module obtains the local characteristics related to the polyps by constructing different local context information. The coarse map obtained by decoding the reconstructed highest-level features guides the feature fusion process in the foreground-aware module to achieve foreground enhancement of the polyps. Our proposed method, FLDNet, was evaluated using seven metrics on common datasets and demonstrated superiority over state-of-the-art methods on widely used evaluation measures.

Xuefeng Wei, Xuan Zhou
Domain-Invariant Task Optimization for Cross-domain Recommendation

The challenge of cold start has long been a persistent issue in recommender systems. However, Cross-domain Recommendation (CDR) provides a promising solution by utilizing the abundant information available in the auxiliary source domain to facilitate cold-start recommendations for the target domain. Many existing popular CDR methods only use overlapping user data but ignore non-overlapping user data when training the model to establish a mapping function, which reduces the model’s generalization ability. Furthermore, these CDR methods often directly learn the target embedding during training, because the target embedding itself may be unreasonable, resulting in an unreasonable transformed embedding, exacerbating the difficulty of model generalization. To address these issues, we propose a novel framework named Domain-Invariant Task Optimization for Cross-domain Recommendation (DITOCDR). To effectively utilize non-overlapping user information, we employ source and target domain autoencoders to learn overlapping and non-overlapping user embeddings and extract domain-invariant factors. Additionally, we use a task-optimized strategy for target embedding learning to optimize the embedding and implicitly transform the source domain user embedding to the target feature space. We evaluate our proposed DITOCDR on three real-world datasets collected by Amazon, and the experimental results demonstrate its excellent performance and effectiveness.

Dou Liu, Qingbo Hao, Yingyuan Xiao, Wenguang Zheng, Jinsong Wang
Ensemble of Randomized Neural Network and Boosted Trees for Eye-Tracking-Based Driver Situation Awareness Recognition and Interpretation

Ensuring traffic safety is crucial in the pursuit of sustainable transportation. Across diverse traffic systems, maintaining good situation awareness (SA) is important for promoting and upholding traffic safety. This work focuses on the regression problem of using eye-tracking features to perform SA recognition in the context of conditionally automated driving. For this type of tabular dataset, recent advances have shown that both neural networks (NNs) and gradient-boosted decision trees (GBDTs) are potential solutions for achieving better performance. To avoid the complex analysis needed to select a suitable model for the task, this work proposes to combine NNs and tree-based models to achieve generally better performance on the SA assessment task. Considering the necessity of real-time measurement for practical applications, the ensemble deep random vector functional link (edRVFL) and the light gradient boosting machine (LightGBM) were used as the representative models of NNs and GBDTs in this investigation, respectively. Furthermore, this work exploited Shapley additive explanations (SHAP) to interpret the contributions of the input features, upon which we further developed two ensemble modes. Experimental results demonstrated that the proposed model outperformed the baseline models, highlighting its effectiveness. In addition, the interpretation results can provide practitioners with references regarding the eye-tracking features that are most relevant to SA recognition.
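
The NN-plus-GBDT combination can be sketched with a shallow RVFL-style model (random nonlinear features plus a direct link, solved by ridge regression; the paper uses the deeper edRVFL ensemble) averaged with a LightGBM regressor, and SHAP applied to the tree model to attribute predictions to eye-tracking features. All data and feature dimensions below are hypothetical.

# Minimal sketch: RVFL-style model + LightGBM ensemble with SHAP attribution.
import numpy as np
from sklearn.linear_model import Ridge
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                                  # hypothetical eye-tracking features
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=300)        # hypothetical SA score

# RVFL-style model: concatenate inputs with random nonlinear features, fit ridge regression.
W, b = rng.normal(size=(12, 64)), rng.normal(size=64)
H = np.tanh(X @ W + b)
rvfl = Ridge(alpha=1.0).fit(np.hstack([X, H]), y)

gbdt = lgb.LGBMRegressor(n_estimators=200).fit(X, y)

# Simple averaging ensemble of the two model families.
pred = 0.5 * rvfl.predict(np.hstack([X, np.tanh(X @ W + b)])) + 0.5 * gbdt.predict(X)
shap_values = shap.TreeExplainer(gbdt).shap_values(X)           # per-feature contributions
print(pred.shape, np.abs(shap_values).mean(axis=0))             # mean |SHAP| ranks feature relevance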

Ruilin Li, Minghui Hu, Jian Cui, Lipo Wang, Olga Sourina
Temporal Modeling Approach for Video Action Recognition Based on Vision-language Models

The usage of large-scale vision-language pre-training models plays an important role in reducing computational consumption and improving the accuracy of the video action recognition task. However, pre-training models trained on image data may ignore temporal information, which is significant for video tasks. In this paper, we introduce a temporal modeling approach for the action recognition task based on large-scale pre-training models. We make the model capture the temporal information contained in frames by modeling the short-time local temporal information and the long-time global temporal information in videos separately. We introduce a multi-scale difference approach to obtain the differences between adjacent frames, and employ a cross-frame attention approach to capture semantic differences and details of temporal changes. In addition, we use residual attention blocks to implement the temporal Transformer and assign an individual importance score to each frame by computing the similarity of the frame to the clustering center, in order to obtain the overall temporal information of the video. Our model achieves 82.3% accuracy on the Kinetics400 dataset with just eight frames. Furthermore, zero-shot results on the HMDB51 and UCF101 datasets demonstrate the strong transferability of our model.

Yue Huang, Xiaodong Gu
A Deep Learning Framework with Pruning RoI Proposal for Dental Caries Detection in Panoramic X-ray Images

Dental caries is a prevalent noncommunicable disease that affects over half of the global population. It can significantly diminish individuals' quality of life by impairing their eating and socializing abilities. Consistent dental check-ups and professional oral healthcare are crucial in preventing dental caries and other oral diseases. Deep learning-based object detection provides an efficient approach to assist dentists in identifying and treating dental caries. In this paper, we present a deep learning framework with a lightweight pruning region of interest (P-RoI) proposal specifically designed for detecting dental caries in panoramic dental radiographic images. Moreover, this framework can be enhanced with an auxiliary head for label assignment during the training process. Using the Cascade Mask R-CNN model with a ResNet-101 backbone as the baseline, our modified framework with the P-RoI proposal and auxiliary head achieves a notable 3.85-point increase in Average Precision (AP) for the dental caries class on our dental dataset.

Xizhe Wang, Jing Guo, Peng Zhang, Qilei Chen, Zhang Zhang, Yu Cao, Xinwen Fu, Benyuan Liu
User Stance Aware Network for Rumor Detection Using Semantic Relation Inference and Temporal Graph Convolution

The massive propagation of rumors has impaired the credibility of online social networks, yet effective rumor detection remains difficult. Recent studies leverage stance inference to explore the semantic evidence in comments to improve detection performance. However, existing models only consider stance-relevant semantic features and ignore stance distribution and evolution, thus leaving room for improvement. Moreover, we argue that stance inference without considering the context in threads may lead to incorrect semantic features being accumulated and carried through to rumor detection. In this paper, we propose a user stance aware attention network (USAT), which learns temporal features of semantic content, individual stance, and collective stance for rumor detection. Specifically, a high-order graph convolutional operator is designed to aggregate the preceding posts of each post, ensuring a complete semantic context for stance inference. Two temporal graph convolutional networks work in parallel to model the evolution of stance distribution and semantic content, respectively, and share stance-based attention to de-noise content aggregation. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines.
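As a loose illustration of aggregating each post's preceding posts, the sketch below applies a multi-hop convolution over a lower-triangular, row-normalized adjacency; the embedding size, hop count, and normalization are assumptions, and the stance-based attention and temporal GCNs are not modeled.

```python
# Minimal sketch (illustrative, not the USAT model): aggregate each post's preceding
# posts in a thread with a simple high-order (multi-hop) graph convolution.
import torch

def preceding_post_aggregation(x: torch.Tensor, order: int = 2) -> torch.Tensor:
    """x: (N, D) post embeddings ordered by time; each post aggregates earlier posts."""
    n = x.size(0)
    adj = torch.tril(torch.ones(n, n))              # post i sees posts 0..i (incl. itself)
    adj = adj / adj.sum(dim=1, keepdim=True)        # row-normalize
    out = x
    for _ in range(order):                          # k-hop aggregation: A^k x
        out = adj @ out
    return out

posts = torch.randn(5, 16)                          # 5 posts in one thread, 16-dim embeddings
print(preceding_post_aggregation(posts).shape)      # torch.Size([5, 16])
```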

Danke Wu, Zhenhua Tan, Taotao Jiang
IEEG-CT: A CNN and Transformer Based Method for Intracranial EEG Signal Classification

Intracranial electroencephalography (iEEG) is of great importance for the preoperative evaluation of drug-resistant epilepsy. Automatic classification of iEEG signals can speed up the process of epilepsy diagnosis. Existing deep learning-based approaches for iEEG signal classification usually rely on convolutional neural networks (CNNs) and long short-term memory networks. However, these approaches have limitations in terms of classification accuracy. In this study, we propose a CNN and Transformer based method, named IEEG-CT, for iEEG signal classification. Firstly, IEEG-CT utilizes a deep one-dimensional CNN to extract critical local features from the raw iEEG signals. Secondly, IEEG-CT incorporates a Transformer encoder, which employs a multi-head attention mechanism to capture long-range global information among the extracted features. In particular, we leverage causal convolutional multi-head attention instead of the standard Transformer block to efficiently capture the temporal dependencies within the input features. Finally, the global features obtained by the Transformer encoder are used for classification. We assess the performance of IEEG-CT on two publicly available multicenter iEEG datasets. According to the experimental results, IEEG-CT surpasses state-of-the-art techniques in terms of several evaluation metrics, i.e., accuracy, AUROC, and AUPRC.
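A minimal sketch of the CNN-then-Transformer pattern is shown below; it uses standard multi-head self-attention rather than the paper's causal convolutional attention, and the layer sizes, segment length, and pooling are assumptions.

```python
# Minimal sketch (assumed layer sizes; standard self-attention rather than the paper's
# causal-convolution attention): a 1D CNN front end followed by a Transformer encoder
# for single-channel iEEG segment classification.
import torch
import torch.nn as nn

class CnnTransformerClassifier(nn.Module):
    def __init__(self, n_classes=2, d_model=64):
        super().__init__()
        self.cnn = nn.Sequential(                      # local feature extraction
            nn.Conv1d(1, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, d_model, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # global context
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                              # x: (B, 1, T) raw iEEG segment
        z = self.cnn(x).transpose(1, 2)                # (B, T', d_model)
        z = self.encoder(z).mean(dim=1)                # pool over time
        return self.head(z)

model = CnnTransformerClassifier()
print(model(torch.randn(4, 1, 1024)).shape)            # torch.Size([4, 2])
```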

Mengxin Yu, Yuang Zhang, Haihui Liu, Xiaona Wu, Mingsen Du, Xiaojie Liu
Multi-task Learning Network for Automatic Pancreatic Tumor Segmentation and Classification with Inter-Network Channel Feature Fusion

Pancreatic cancer is a malignant tumor with a high mortality rate. Therefore, accurately identifying pancreatic cancer is of great significance for early diagnosis and treatment. Currently, several methods have been developed using network structures based on multi-task learning to address tumor recognition. One common approach is to use the encoding part of a segmentation network as shared features for both the segmentation and classification tasks. However, because segmentation focuses on detailed features while classification requires more global features, the shared features may not provide a sufficiently discriminative representation for the classification task. To address the above challenges, we propose a novel multi-task learning network that leverages the correlation between the segmentation and classification networks to enhance the performance of both tasks. Specifically, the classification task takes the tumor region images extracted from the segmentation network’s output as input, effectively capturing the shape and internal texture features of the tumor. Additionally, a feature fusion module is added between the networks to facilitate information exchange and fusion. We evaluated our model on 82 clinical CT image samples. Experimental results demonstrate that our proposed multi-task network achieves excellent performance, with a Dice similarity coefficient (DSC) of 88.42% and a classification accuracy of 85.71%.
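The sketch below illustrates one plausible way to feed a classification branch with the tumor region extracted from a segmentation output; the crop-and-resize procedure, slice shapes, and output size are assumptions rather than the paper's exact pipeline, and the feature fusion module is not modeled.

```python
# Minimal sketch (assumed shapes, not the paper's network): crop the tumor region
# predicted by a segmentation mask and resize it for a classification branch, so the
# classifier sees tumor shape and internal texture rather than the whole slice.
import torch
import torch.nn.functional as F

def crop_tumor_region(image: torch.Tensor, mask: torch.Tensor, out_size: int = 64):
    """image: (H, W) CT slice; mask: (H, W) binary segmentation; returns (1, out, out)."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if ys.numel() == 0:                                   # no tumor predicted
        return torch.zeros(1, out_size, out_size)
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    crop = crop.unsqueeze(0).unsqueeze(0)                 # (1, 1, h, w) for interpolate
    crop = F.interpolate(crop, size=(out_size, out_size), mode="bilinear", align_corners=False)
    return crop.squeeze(0)                                # (1, out, out) classifier input

image = torch.rand(256, 256)
mask = torch.zeros(256, 256); mask[100:140, 90:130] = 1   # toy tumor mask
print(crop_tumor_region(image, mask).shape)               # torch.Size([1, 64, 64])
```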

Kaiwen Chen, Chunyu Zhang, Chengjian Qiu, Yuqing Song, Anthony Miller, Lu Liu, Imran Ul Haq, Zhe Liu
Fast and Efficient Brain Extraction with Recursive MLP Based 3D UNet

Extracting the brain from non-brain tissues is an essential step in neuroimage analyses such as brain volume estimation. Transformer- and 3D UNet-based methods achieve strong performance using attention and 3D convolutions, but they normally have complex architectures and are therefore computationally slow. Consequently, they can hardly be deployed in resource-constrained environments. To achieve rapid segmentation, the recent UNeXt reduces the number of convolution filters and introduces Multilayer Perceptron (MLP) blocks that exploit simpler, linear MLP operations. To further boost performance, it shifts the feature channels in the MLP block so as to focus on learning local dependencies. However, it performs segmentation on 2D medical images rather than 3D volumes. In this paper, we propose a recursive MLP based 3D UNet to efficiently extract the brain from 3D head volumes. Our network combines 3D convolution blocks and MLP blocks to capture both long-range information and local dependencies. Meanwhile, we also leverage the simplicity of MLPs to enhance computational efficiency. Unlike UNeXt, which extracts a single locality, we apply several shifts to capture multiple localities representing different local dependencies and then introduce a recursive design to aggregate them. To save computational cost, the shifts do not introduce any parameters, and the parameters are shared across recursions. Extensive experiments on two public datasets demonstrate the superiority of our approach over other state-of-the-art methods with respect to both accuracy and CPU inference time.
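The sketch below illustrates the shift-then-MLP idea with several shift offsets aggregated by a shared MLP; the shift axis, offsets, and simple averaging are assumptions, and the recursive aggregation and UNet integration are not reproduced.

```python
# Minimal sketch (assumed sizes; not the paper's recursive block): shift feature
# channels along one spatial axis by several offsets, apply a shared MLP to each
# shifted view, and average the views to aggregate multiple localities.
import torch
import torch.nn as nn

class MultiShiftMLP(nn.Module):
    def __init__(self, channels: int, shifts=(1, 2, 3)):
        super().__init__()
        self.shifts = shifts
        self.mlp = nn.Sequential(nn.Linear(channels, channels), nn.GELU(),
                                 nn.Linear(channels, channels))   # shared across shifts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C, D, H, W) 3D feature map."""
        views = []
        for s in self.shifts:
            shifted = torch.roll(x, shifts=s, dims=2)             # shift along depth axis
            tokens = shifted.flatten(2).transpose(1, 2)           # (B, D*H*W, C)
            views.append(self.mlp(tokens))
        out = torch.stack(views).mean(dim=0)                      # aggregate localities
        return out.transpose(1, 2).reshape(x.shape)

block = MultiShiftMLP(channels=16)
print(block(torch.randn(1, 16, 8, 8, 8)).shape)                   # torch.Size([1, 16, 8, 8, 8])
```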

Guoqing Shangguan, Hao Xiong, Dong Liu, Hualei Shen
A Hip-Knee Joint Coordination Evaluation System in Hemiplegic Individuals Based on Cyclogram Analysis

Inter-joint coordination analysis can provide deep insights for assessing patients’ walking ability. This paper develops a hip-knee joint coordination assessment system. Firstly, we introduce a hip-knee joint cyclogram generation model that takes walking speed into account. This model serves as a reference template for identifying abnormal hip-knee joint patterns when walking at different speeds. Secondly, we develop a portable motion capture platform based on stereo vision technology. It uses near-infrared cameras and markers to accurately capture kinematic data of the human lower limb. Thirdly, we design a hip-knee joint coordination assessment metric, DTW-ED (Dynamic Time Warping - Euclidean Distance), which scores the subject’s hip-knee joint coordination. Experimental results indicate that the hip-knee joint cyclogram generation model has an error range of [0.78°, 1.08°]. We conducted walking experiments with five hemiplegic subjects and five healthy subjects. The evaluation system successfully scored the hip-knee joint coordination of patients, allowing us to differentiate between healthy individuals and hemiplegic patients. This assessment system can also be used to distinguish between the affected and unaffected sides of hemiplegic subjects. In conclusion, the hip-knee joint coordination assessment system developed in this paper has significant potential for clinical disease diagnosis.
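The sketch below shows a standard dynamic time warping alignment with a Euclidean local cost between two hip-knee cyclograms; it illustrates the general DTW-ED idea, not necessarily the paper's exact scoring.

```python
# Minimal sketch (standard DTW with a Euclidean local cost, not necessarily the
# paper's exact DTW-ED score): align a subject's hip-knee cyclogram to a reference
# template and return the accumulated alignment cost as a dissimilarity measure.
import numpy as np

def dtw_euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """a: (n, 2), b: (m, 2) sequences of (hip angle, knee angle) samples."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])        # Euclidean local distance
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

t = np.linspace(0, 2 * np.pi, 100)
reference = np.stack([20 * np.sin(t), 60 * np.abs(np.sin(t / 2))], axis=1)   # toy cyclogram
subject = reference + np.random.default_rng(0).normal(scale=2.0, size=reference.shape)
print("DTW-ED dissimilarity:", round(dtw_euclidean(subject, reference), 2))
```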

Ningcun Xu, Chen Wang, Liang Peng, Jingyao Chen, Zhi Cheng, Zeng-Guang Hou, Pu Zhang, Zejia He
Evaluation of Football Players’ Performance Based on Multi-Criteria Decision Analysis Approach and Sensitivity Analysis

The use of information systems and recommendation models in football has become a popular way to improve performance. With their help, it is possible to make more informed and effective decisions regarding team management, the selection of training parameters, or the building of player line-ups. To this end, in this paper, we propose a decision model based on the Multi-Criteria Decision Analysis (MCDA) approach to assess defensive football players with respect to their overall and defensive skills. The model was examined with selected objective weighting techniques and MCDA methods to comprehensively analyze footballers’ potential performance scores. A sensitivity analysis is performed to indicate which aspects of the game players should focus on throughout the season to increase the evaluation score of their performance and thus become a more attractive choice for club managers. The results from the sensitivity analysis show that improving performance on particular criteria can significantly increase players’ evaluation scores.
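The abstract does not name the specific weighting techniques or MCDA methods used; the sketch below uses entropy weighting and TOPSIS purely as common illustrative examples, with hypothetical criteria, to show how such a performance score can be computed.

```python
# Minimal sketch (entropy weighting + TOPSIS chosen only as illustrative examples;
# the paper does not specify its weighting techniques or MCDA methods): score
# players on benefit-type criteria from a decision matrix.
import numpy as np

def entropy_weights(X: np.ndarray) -> np.ndarray:
    P = X / X.sum(axis=0)                                   # column-normalized proportions
    E = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(len(X))
    w = 1 - E                                               # higher divergence -> higher weight
    return w / w.sum()

def topsis(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    R = X / np.linalg.norm(X, axis=0)                       # vector-normalize each criterion
    V = R * w
    ideal, anti = V.max(axis=0), V.min(axis=0)              # all criteria treated as benefit
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)                          # closeness to the ideal player

# Rows: players; columns: e.g. tackles, interceptions, clearances, pass accuracy (hypothetical).
X = np.array([[62, 48, 110, 0.83],
              [55, 61,  95, 0.88],
              [70, 40, 130, 0.79]], dtype=float)
w = entropy_weights(X)
print("weights:", np.round(w, 3), "scores:", np.round(topsis(X, w), 3))
```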

Jakub Wiȩckowski, Wojciech Sałabun
Backmatter
Metadata
Title
Neural Information Processing
Editors
Biao Luo
Long Cheng
Zheng-Guang Wu
Hongyi Li
Chaojie Li
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9980-67-3
Print ISBN
978-981-9980-66-6
DOI
https://doi.org/10.1007/978-981-99-8067-3
