
Advanced Intelligent Computing Technology and Applications

21st International Conference, ICIC 2025, Ningbo, China, July 26–29, 2025, Proceedings, Part XXII

  • 2025
  • Book

About this Book

The 20-volume set LNCS 15842-15861, together with the 4-volume set LNAI 15862-15865 and the 4-volume set LNBI 15866-15869, constitutes the refereed proceedings of the 21st International Conference on Intelligent Computing, ICIC 2025, held in Ningbo, China, during July 26-29, 2025.

The 1206 papers presented in these proceedings books were carefully reviewed and selected from 4032 submissions. They deal with emerging and challenging topics in artificial intelligence, machine learning, pattern recognition, bioinformatics, and computational biology.

Table of Contents

Frontmatter

Neural Networks

Frontmatter
Event Data Classification Using TPE-Based Deep Spiking Neural Networks

Spiking neural networks (SNNs) represent a class of neural networks that emulate the operation of biological neurons within the nervous system, simulating the transmission of electrical signals between neurons to facilitate information processing and learning. Deep SNNs possess the capability to extract intricate features from data, making them well-suited for classification tasks. Nonetheless, the complexity of deep SNNs and the multitude of hyperparameters pose challenges in effectively determining optimal parameters, often resulting in performance degradation. The Tree-structured Parzen Estimator (TPE) optimization algorithm is frequently employed to address global optimization problems for black-box models. This study leverages the Bayesian optimization algorithm to fine-tune the learning rate of SNNs and the membrane time constant of Leaky Integrate-and-Fire neurons within each layer. The optimized deep SNN models are subsequently utilized for event data classification. The experimental findings demonstrate a remarkable enhancement in the performance of deep SNNs following the application of the TPE optimization algorithm. Specifically, the network achieved an accuracy of 97.95% on the SED dataset, 76.03% on the CIFAR10-DVS dataset, and 96.87% on the DVS Gesture 128 dataset. This research underscores the potential of conventional optimization algorithms in optimizing tasks for SNNs, offering a novel approach to boosting the efficacy of spiking neural networks.

Junxiu Liu, Huazhi Liu, Qiang Fu, Yuling Luo, Sheng Qin, Xiwen Luo, Yeqing Xiong
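As a rough illustration of the tuning setup described above, the sketch below runs a TPE search over an SNN's learning rate and LIF membrane time constant with Optuna's TPE sampler. The `train_and_evaluate_snn` function is a hypothetical stand-in for the paper's deep SNN training loop (a per-layer time constant would simply add one `suggest_float` per layer).

```python
# Minimal sketch: TPE search over an SNN's learning rate and LIF membrane
# time constant using Optuna. `train_and_evaluate_snn` is a hypothetical
# stand-in for the paper's deep SNN training loop on event data.
import optuna


def train_and_evaluate_snn(learning_rate: float, tau: float) -> float:
    """Placeholder: train the deep SNN and return validation accuracy."""
    # Toy surrogate so the sketch runs end to end; replace with real training.
    return 1.0 / (1.0 + abs(learning_rate - 1e-3) * 100 + abs(tau - 2.0))


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    tau = trial.suggest_float("membrane_tau", 1.1, 10.0)  # LIF time constant
    return train_and_evaluate_snn(lr, tau)


study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
```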
TIINet: A Three-Stage Interactive Integration Network for RGB-D Salient Object Detection

Current research on salient object detection predominantly focuses on optimizing intermediate features to enhance the utilization of multimodal information. However, most models overlook the differences between features at various levels during the encoding and decoding stages, leading to insufficient feature utilization and consequently limiting model performance. To address this issue, we propose the Three-stage Interactive Integration Network (TIINet). In the encoding phase, we introduce a three-stage feature optimization module to process RGB and depth features at different stages, enhancing the overall representation. In the decoding stage, we propose a three-stage feature aggregation module, which effectively fuses multimodal features by integrating features at different levels, thereby improving model performance. Extensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) methods, achieving significant performance improvements across five benchmark datasets: STERE, NJU2K, NLPR, SIP, and SSD.

Qiuqian Long, Xintao Zhuo, Zhenyu Zhang, Qingzhen Xu
Dynamic Semantic Graph Learning with Progressive Alignment for Image-Text Matching

Image-text matching, a core task in multimodal learning that aligns visual and textual semantics, faces two critical challenges: (1) existing graph-based methods often struggle to balance over-connection and semantic loss due to rigid thresholding strategies, and (2) single-level interaction mechanisms fail to capture hierarchical cross-modal dependencies effectively. As a response to the identified problems, our research proposes an advanced framework integrating Dynamic Semantic Graph Enhancement (DSGE) with Progressive Semantic Alignment (PSA). The DSGE module adaptively adjusts graph connectivity based on the statistical properties of similarity distributions, overcoming the limitations of manually defined thresholds that typically result in either over-connection or the omission of critical relationships. The PSA module establishes coarse-grained correspondences through bidirectional cross-modal attention and progressively refines alignment precision using a context-aware hierarchical strategy. Comprehensive evaluations on Flickr30K and MS-COCO, particularly in complex semantic scenarios, confirm that our framework achieves significant performance gains over existing methods.

Yudi Wang, Yifan Lu, Bailing Zhang, Kexuan Zhou
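To illustrate the kind of statistics-driven connectivity the abstract describes, here is a minimal sketch that thresholds a cosine-similarity matrix at mean + k·std instead of a fixed manual value. The specific rule and parameter k are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: build graph connectivity from a similarity matrix with an adaptive,
# statistics-driven threshold (mean + k * std) instead of a fixed manual value.
import numpy as np


def adaptive_adjacency(features: np.ndarray, k: float = 0.5) -> np.ndarray:
    # Cosine similarity between region/word features, shape (n, n).
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    off_diag = sim[~np.eye(len(sim), dtype=bool)]
    threshold = off_diag.mean() + k * off_diag.std()  # adapts to each sample
    adj = (sim >= threshold).astype(np.float32)
    np.fill_diagonal(adj, 1.0)  # keep self-connections
    return adj


nodes = np.random.default_rng(0).normal(size=(6, 16))
print(adaptive_adjacency(nodes))
```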
MGTDGraph: Multi-granularity Graph Attention Networks for Multivariate Long-Term Time Series Forecasting

Conventional time series forecasting approaches often face challenges in modeling intricate multivariate correlations and fail to jointly capture local temporal dynamics as well as long-range dependencies. To overcome these issues, we propose a novel forecasting framework that leverages multi-granularity feature extraction based on graph neural network (GNN). Our approach integrates information at the node, edge, and subgraph levels to construct a comprehensive representation that encompasses both fine-grained and global structures in multivariate time series data. A Graph Attention Network (GAT) is employed to adaptively assign importance weights between nodes and their neighbors, enabling the model to effectively capture complex spatial–temporal interactions. Extensive experiments conducted on four benchmark datasets across multiple prediction horizons demonstrate that our method consistently outperforms existing baselines in predictive accuracy. Beyond accuracy improvements, the model’s ability to represent structural intricacies of time series data enhances its applicability to a wide range of forecasting scenarios across diverse domains.

Shumin Tan, Yuexian Zou
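A minimal single-head graph attention layer over a dense adjacency matrix shows how a GAT assigns importance weights between a node and its neighbors; the paper's multi-granularity (node/edge/subgraph) framework is not reproduced here, and the layer below is a generic sketch.

```python
# Sketch: a single-head graph attention layer over a dense adjacency matrix,
# illustrating how GAT learns importance weights between a node and its
# neighbors. Dimensions and the toy graph below are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        h = self.W(x)                                   # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = F.leaky_relu(self.a(pairs).squeeze(-1), negative_slope=0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)           # attention over neighbors
        return alpha @ h                                # aggregated node features


layer = GraphAttentionLayer(in_dim=8, out_dim=4)
x = torch.randn(5, 8)
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
print(layer(x, adj).shape)  # torch.Size([5, 4])
```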
Topology-Aware Discriminative Graph Convolutional Network for Skeleton-Based Action Recognition

In recent years, graph convolutional networks (GCNs) based on skeletal data have been widely applied in action recognition. However, challenges persist in effectively constructing graph topologies and aggregating skeletal information. Conventional methods usually employ fixed, static skeletal topologies, which struggle to adapt to subtle variations in complex actions. Adaptive topology methods, although capable of dynamically adjusting the graph structure, often do so at the expense of the effectiveness of the original skeletal information. To address these issues, this paper proposes a Topology-aware Discriminative Graph Convolutional Network (TD-GCN) designed to enhance the modeling flexibility for complex actions and improve adaptive topology construction. Additionally, the paper introduces a Spatiotemporal Feature Discrimination Head (STFD-Head) that leverages contrastive learning to dynamically identify and rectify ambiguous samples in the feature space, further boosting the model's discriminative capability in challenging scenarios. Experimental results demonstrate that the proposed method performs excellently on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Notably, on the large-scale and complex NTU RGB+D 120 dataset, the method achieves accuracies of 90.7% on the X-Sub benchmark and 91.9% on the X-Set benchmark.

Lei Shi, Yilei Mei, Caixia Meng, Yucheng Shi, Yufei Gao, Lin Wei
Online Delay Learning Algorithm for Feedforward Spiking Neural Networks Based on Spike Train Kernels

Synaptic delay plasticity has been used to design supervised learning algorithms for spiking neural networks (SNNs). Considering both synaptic connection strength plasticity and synaptic delay plasticity in biological nervous systems, we propose an online synaptic weight-delay supervised learning algorithm based on spike train kernels for feedforward SNNs in this paper. The proposed algorithm uses the kernel function representation of spike trains to construct a real-time error function at the spike train level, and derives online learning rules for synaptic weights and delays by combining the gradient descent rule. The proposed algorithm enables online learning of spike trains, and can simultaneously adjust synaptic weights and delays between neurons during supervised learning. The learning performance of the proposed algorithm is verified by spike train learning tasks and nonlinear pattern recognition tasks on UCI datasets. The spike train learning results show that the introduction of dynamic learnable synaptic delays can effectively improve the learning performance of feedforward SNNs. The results of UCI dataset classification show that the proposed algorithm has certain advantages in solving complex spatio-temporal pattern recognition problems compared to some common supervised learning algorithms for SNNs.

Xiangwen Wang, Xun Li, Li Zou, Xianghong Lin
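For intuition about the kernel representation the error function is built on, here is a small sketch of a Gaussian kernel inner product between two spike trains and the induced squared distance. The kernel width and error definition are illustrative assumptions rather than the paper's exact choices.

```python
# Sketch: a Gaussian kernel inner product between two spike trains and the
# squared distance it induces, the kind of spike-train-level error such
# algorithms minimize. Kernel width and spike times are illustrative.
import numpy as np


def spike_train_kernel(s1: np.ndarray, s2: np.ndarray, sigma: float = 2.0) -> float:
    """Sum of Gaussian kernels over all pairs of spike times (in ms)."""
    if len(s1) == 0 or len(s2) == 0:
        return 0.0
    diff = s1[:, None] - s2[None, :]
    return float(np.exp(-diff ** 2 / (2.0 * sigma ** 2)).sum())


def spike_train_error(actual: np.ndarray, desired: np.ndarray) -> float:
    """Squared distance between spike trains in the kernel-induced space."""
    return (spike_train_kernel(actual, actual)
            - 2.0 * spike_train_kernel(actual, desired)
            + spike_train_kernel(desired, desired))


actual = np.array([12.0, 30.5, 55.0])
desired = np.array([10.0, 32.0, 57.0, 80.0])
print(spike_train_error(actual, desired))
```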
Energy-Constrained UAV Network Topology Recovery Based on Graph Convolutional Networks

This paper addresses the issue of network disconnection in UAV networks caused by the departure of multiple drones due to energy depletion or damage during flight. We propose an energy-constrained topology recovery algorithm based on Graph Convolutional Networks (GCN). A network topology connectivity model is developed, incorporating an energy constraint module, with the optimization objectives of rapid self-healing and minimized energy consumption. A novel weighted loss function is introduced to balance the trade-off between minimizing energy consumption and movement distance. In the event of a network disconnection, the remaining energy and link connectivity information of the operational UAVs are encoded into a Laplacian matrix, which is then fed into the GCN model as the input. The model is trained using gradient descent, resulting in a UAV flight strategy for topology recovery. Simulation results demonstrate that, compared to the currently known methods, the proposed approach reduces self-healing time by 41.2%, overall network communication time by 13.3%, and improves the self-healing rate by 27%. Furthermore, the proposed method achieves effective energy control while maintaining high network stability and persistence.

Huijiao Wang, Zhuoyang Cai, Chuanyu Liao, Biao Li
UBDet: An Unsupervised Breast Tumor Detection Framework with Boundary-Aware Enhancement

The early diagnosis of breast cancer is of crucial importance for patient treatment. However, existing breast tumor datasets suffer from limited labeled sample sizes and general object detection algorithms fail to explicitly model boundary information for specific scenarios such as breast tumor detection. Therefore, this paper presents an innovative unsupervised learning approach, and proposes two key components: 1) an Explicit Boundary Coarse Modeling (EBCM) module and 2) a Boundary-Aware Layer (BALayer) with frequency-domain separation, which collaboratively enhance the extraction of discriminative breast tumor features. The core idea lies in leveraging the powerful capability of unsupervised learning to automatically extract the tumors’ salient features, while enhancing semantic extraction through BALayer, thereby providing rich discriminative features for downstream medical tasks. Experimental results demonstrate that our method achieves significant improvement by 15.2% on mAP in unsupervised breast tumor detection and 9.0% mAP in semi-supervised scenarios. It can more accurately identify tumor regions and significantly enhance the precision and recall of breast tumor detection. Moreover, it can also serve as a pre-trained model with discriminative features for breast tumor identification on small-sample breast tumor datasets.

Xingxin Guo, Zhihui Lai, Heng Kong, Xiaoling Luo
Bidirectional Interactive Prompt Fusion and Noise Filtering for Multimodal Aspect-Based Sentiment Analysis

Multimodal Aspect-Based Sentiment Analysis (MABSA) is a fine-grained task that aims to analyze users' sentiment tendencies towards target aspects from different and rich modal contents. In conjunction with this topic, many methods have been proposed to link modalities to form interactive judgments of emotional tendencies. However, the currently proposed methods have some disadvantages: (1) Since some image modalities are unrelated to text modalities, information irrelevant to the aspect will be introduced; (2) Different modalities are difficult to complement each other during the fusion process, resulting in poor fusion performance, and the fusion process may also introduce additional noise. To resolve these points, we propose a novel MABSA network model that combines a simple noise filtering approach with an innovative fast learning method for effective classification. Specifically, for the visual modality, we introduce an image content filtering (ICF) layer to filter out information that is not relevant to the aspect and text modalities. In addition, in the fusion stage, we propose a bidirectional interactive prompt fusion (BIPF) layer, which integrates prompt learning into the query and fusion stages, realizes the fusion of different modalities and contextual semantic information, and makes modal fusion more comprehensive. The outcomes of our experiments indicate that the model we developed attains leading-edge performance levels when tested on two MABSA datasets.

Fuxian Zhu, Xiaoli Xu
AMTerrain: Research on Arbitrary-Modal Terrain Segmentation Based on Text Guidance

Current terrain segmentation research is limited to fixed modalities and closed label sets. This paper proposes the AMTerrain method, which combines arbitrary modality processing with open vocabulary learning capabilities. We use a Unified Coding structure to encode arbitrary modality data and introduce two groups of learnable tokens, Score Embeds and Modal Embeds, to achieve unified representation of arbitrary modality data. Meanwhile, we use Customized vision-prompts generated by DeepSeek-R1 to enhance the model’s open vocabulary learning ability. The number of parameters of this method is comparable to that of the method with the fewest parameters, and its performance has been improved by 1.02%, 1.86%, and 2.58% respectively under three experimental settings: all-modality, modality-agnostic, and open-vocabulary. Experiments show that our model can fully exploit the value of each modality, has strong adaptability to arbitrary modality inputs, and possesses certain open vocabulary learning capabilities.

Yuqian Wang, Xuefu Xiang, Yongcun Wu, Peng Quan, Yunzhi Luo, Lu Zhang
Correlation Adaptive Dynamic Graph Convolutional Networks for Traffic Flow Prediction

Accurate traffic flow prediction can help traffic management authorities address transportation problems, but traditional traffic flow prediction methods cannot handle nonlinear data and parameter selection well. To this end, this paper proposes the Correlation Adaptive Dynamic Graph Convolutional Networks (CADGCN) for traffic flow prediction. The CADGCN consists of an attention mechanism, a correlation graph convolutional network (CorrGCN), and a deep reinforcement learning (DRL) module. Specifically, the attention mechanism enhances the ability of the CorrGCN network to effectively capture spatiotemporal correlations, thereby improving the timeliness and accuracy of the predictions. Moreover, the CorrGCN network discovers local and global dependencies to comprehensively understand traffic flow relationships between locations. It allows the method to extract spatial and temporal correlations by aggregating node features through multiple graph convolutional layers that progressively propagate the information. Finally, the DRL module is employed to adaptively adjust the adjacency matrix according to different traffic data. The experimental results indicate that the accuracy of CADGCN is superior to that of HA, ARIMA, LSTM, GRU, CNN, DCRNN, GMAN, T-GCN, STGCN, ASTGCN, STSGCN, AGCRN, STGODE, ASTTN, and DDGformer.

Yan Chen, Dawen Xia, Yang Hu, Wenyong Zhang, Fuchu Zhang
EMDC-YOLO: A Residual Multi-scale Attention and Cross-Scale Fusion-Based Method for Pedestrian Detection in Crowded Scenes

Crowded scenes present significant challenges for pedestrian detection due to multi-scale pedestrian instances, complex backgrounds, and varying occlusion patterns, ranging from minimal to severe occlusion. To overcome these difficulties, this paper introduces EMDC-YOLO, a residual multi-scale attention and cross-scale fusion-based method for pedestrian detection in crowded scenes. Firstly, an iRB_EMA model integrates channel and contextual information to minimize background noise and improve pixel-level focus on occluded pedestrians, thereby improving the model’s occlusion handling capability. Secondly, the DCPAN network is proposed, optimizing multi-scale feature fusion and accurately detecting severely occluded pedestrians. Finally, the ASFF Head network is proposed to generate features with the same resolution and channel number, thereby enhancing the recognition performance of pedestrian targets at different scales. Compared to the baseline, EMDC-YOLO achieves a precision that is 2.17% higher and recall improved by 8.4%. mAP@50 shows a 7.2% enhancement, while mAP@50:95 increases by 7.27%, demonstrating its improved performance and superiority.

Xinxin Zhou, Chunying Xie, Yucai Li, Chunzhen Li
A Novel Lightweight YOLO Method for Satellite Remote Sensing via Matrix Decomposition

The You Only Look Once (YOLO) algorithm series is acclaimed for its efficient and real-time object detection capabilities, which exhibit outstanding performance in various data processing tasks, especially in satellite remote sensing. However, in practice, it is a formidable challenge to directly deploy the YOLO model with massive parameters on resource-constrained edge devices. The principal challenge lies in the fact that the training and inference phases of the YOLO model, due to its extensive parameterization, demand considerable computational power and substantial hardware storage. To overcome this difficulty, we propose a new lightweight YOLO model via random matrix factorization, named CUR-YOLO, capable of significantly reducing the computational complexity and the number of parameters in the YOLO model. The key innovation of CUR-YOLO involves breaking down each large weight matrix of the YOLO model into three smaller sub-matrices, effectively replacing the computation and storage demands of the original matrix with those of the more efficient sub-matrices. Furthermore, we present a novel rank-constrained back-propagation algorithm, which enables direct parallel training of these small-scale sub-matrices on edge devices. In contrast to existing lightweight methods, the proposed CUR-YOLO effectively reduces the original YOLO's computational complexity and parameter count, under the same recognition accuracy conditions. These findings highlight the enormous potential of deploying our CUR-YOLO on resource-constrained edge devices to enable real-time analytics and decision-making for satellite remote sensing tasks.

Hongfu Liu, Hongyu Fu, Bin Li, Cheng Sun, Shenghong Li, Chenglin Zhao
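The sketch below shows a randomized CUR-style factorization of one weight matrix into three smaller factors C, U, R whose product approximates the original. The uniform column/row sampling, fixed rank, and toy low-rank matrix are illustrative assumptions, not the paper's rank-constrained training scheme.

```python
# Sketch: CUR-style factorization of a weight matrix W into three smaller
# factors so that C @ U @ R approximates W, reducing parameters and compute.
import numpy as np

rng = np.random.default_rng(0)
# A stand-in low-rank weight matrix; real YOLO weight matrices are learned.
W = rng.normal(size=(256, 20)) @ rng.normal(size=(20, 128))
k = 32                                            # sampled columns/rows

col_idx = rng.choice(W.shape[1], size=k, replace=False)
row_idx = rng.choice(W.shape[0], size=k, replace=False)

C = W[:, col_idx]                                 # (256, k) sampled columns
R = W[row_idx, :]                                 # (k, 128) sampled rows
U = np.linalg.pinv(C) @ W @ np.linalg.pinv(R)     # (k, k) linking matrix

rel_err = np.linalg.norm(W - C @ U @ R) / np.linalg.norm(W)
print(f"relative error: {rel_err:.2e}, "
      f"params: {C.size + U.size + R.size} vs {W.size}")
```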
TSMDM-Net: A Speech Emotion Recognition Model Based on Multi-scale Time Series Dynamic Modeling

Speech Emotion Recognition (SER), as a core technology of emotional intelligence, aims to analyze emotional perception in human-computer interaction through acoustic features. In response to the limitations of traditional Transformer models in global attention noise sensitivity and insufficient multi-scale temporal modeling, this paper proposes a Multi-Scale Time Series Dynamic Modeling Network (TSMDM-Net). The network innovatively integrates dynamic dilated causal convolution and deformable attention mechanisms through the collaborative architecture of the Temporal Convolutional Encoder (TCE) and Local-Global Interaction Module (LGIM). TCE extracts multi-resolution temporal features using exponentially dilated convolution kernels, while LGIM adaptively adjusts local attention windows through a dynamic routing strategy and achieves complementary modeling of local acoustic events and global prosodic patterns through a cross-layer weight sharing mechanism. Experiments on the IEMOCAP and MELD datasets demonstrate that TSMDM-Net effectively captures multi-scale emotional features and significantly outperforms mainstream Transformer models and other advanced methods in terms of overall performance. Ablation experiments further validate the core contribution of the TCE and LGIM modules to the performance improvement.

Wei Wei, Yibing Wang, Bingkun Zhang, Xiaodong Duan
Research on Contrastive Learning-Based Knowledge Distillation for Deep Graph Neural Networks

Graph Neural Networks have achieved remarkable success across various domains but suffer from over-smoothing and high computational cost in deep architectures. Knowledge distillation offers a solution by transferring knowledge from a deep teacher to a lightweight student model, yet most existing approaches depend on labeled data, limiting their applicability in real-world unlabeled scenarios. To address this, we propose DisGCL, a contrastive learning-based distillation framework for deep GNNs without labels. DisGCL enables the student to inherit knowledge from the teacher while extracting discriminative features via self-supervision. Extensive experiments show that DisGCL outperforms state-of-the-art distillation methods in unlabeled settings, achieving improved generalization with lower complexity and resource demand.

Yizhuo Wang, Xiaohu Luo, Hongli Ding, Zhao Ma, Jing Zhu
Two-View Fusion Graph Neural Networks for Graph Classification

Graph Neural Networks (GNNs) have made notable advancements in graph-based tasks, such as node classification and link prediction. However, current GNN models still face several limitations. First, they often struggle to fully capture the complex, deep-level relationships between nodes. Second, traditional GNNs rely on static or predefined rules for selecting neighboring nodes, limiting their adaptability to changes in dynamic data distributions. Lastly, the effective integration of information from multiple perspectives remains a challenge when working with multi-view data. To address these issues, this paper introduces an innovative GNN framework, TFGraph. TFGraph enhances the learning capabilities of graph-structured data by integrating attention mechanisms, context-aware aggregation strategies, and channel aggregation methods. Specifically, we propose a reinforcement learning-based module using GATv2 to dynamically adjust the importance threshold of adjacent edges, enabling the accurate identification of critical neighboring nodes. Additionally, we introduce a context-aware aggregation model that combines the PageRank algorithm with a local perception strategy, applying weighted processing to nodes with identical labels. To further improve the integration of multisource information, we design a gated feature fusion mechanism that intelligently combines data from different sub-networks based on their gating weights. Ultimately, TFGraph improves task efficiency by integrating channel convolution mechanisms with Graph Isomorphism Networks (GIN). Experimental results demonstrate that TFGraph outperforms traditional GNN methods, showing significant performance improvements across multiple benchmark datasets.

Zhouhua Shi, Shiwen Sun, Guang Yang, Yan Liu
MambaForDIF: Distance-Importance Features and Long-Range Dependencies for Enhancing Aspect-Based Sentiment Analysis

In Aspect-based Sentiment Analysis (ABSA), utilizing graph neural networks (GNNs) to exploit syntactic structures derived from dependency parsing has proven effective in enhancing ABSA. However, most existing studies mainly focus on modeling dependency relationships using graph topology or attention coefficients, which may limit the effective utilization of syntactic information. Additionally, the complexity of attention mechanisms restricts neural networks’ capability to measure long-range dependencies, thus limiting model effectiveness to short-range relationships. To overcome these limitations, a novel approach, termed MambaDIF, is proposed to enhance long-range dependency modeling by improving distance-based dependency importance calculations and MambaFormer module. Specifically, the proposed MambaDIF introduces a refined distance importance function to better model syntactic dependencies. Additionally, it incorporates the MambaFormer module, which effectively encodes inputs containing both dependency and semantic information. In the MambaFormer module, Multi-Head Attention (MHA) and the Mamba blocks work in tandem to simultaneously capture short-term and long-term dependencies. Additionally, Graph Convolutional Networks (GCNs) and semantic GCNs are incorporated to leverage triple learning and orthogonal projection techniques, effectively extracting multi-level information. Extensive experiments were carried out on three benchmark datasets. The experimental results demonstrate that the MambaDIF model consistently outperforms baselines.

Yili Wang, Kaiqi Wang, Chengsheng Yuan
FedCWE: Federated Cluster-Based Weight Sampling and Ensemble Learning for Non-IID Data

Federated Learning (FL) enables collaborative training across distributed clients while preserving data privacy. Non-independent and identically distributed (non-IID) data poses a significant challenge, causing inconsistent local updates and degrading global model performance. We propose FedCWE, a federated cluster-based weight sampling and ensemble learning algorithm. FedCWE clusters clients based on data heterogeneity and volume, applies weighted sampling, and uses ensemble learning to enhance the global model. Experiments on natural and medical image datasets show FedCWE improves accuracy by 2%–5% over state-of-the-art FL methods and up to 10% in extreme non-IID scenarios, while reducing communication costs.

Xing Wu, Yan Wang, Quan Qian, Bin Huang, Jun Song
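As a rough, simplified sketch of the overall idea (cluster clients by heterogeneity, sample within clusters by data volume, then weight-average updates), the snippet below uses label histograms as the clustering feature and FedAvg-style aggregation. The clustering feature, sampling rule, and aggregation are all illustrative assumptions, not the FedCWE algorithm itself.

```python
# Rough sketch: cluster clients by label-distribution statistics, sample one
# client per cluster with probability proportional to data volume, then
# weight-average their (toy) parameter vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_clients, n_classes = 20, 10

# Per-client label histograms (heterogeneity) and sample counts (volume).
label_hist = rng.dirichlet(alpha=np.ones(n_classes) * 0.3, size=n_clients)
n_samples = rng.integers(100, 2000, size=n_clients)

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(label_hist)

selected = []
for c in range(4):
    members = np.flatnonzero(clusters == c)
    probs = n_samples[members] / n_samples[members].sum()
    selected.append(rng.choice(members, p=probs))

# FedAvg-style aggregation of the selected clients' parameter vectors.
client_params = rng.normal(size=(n_clients, 5))
weights = n_samples[selected] / n_samples[selected].sum()
global_params = (weights[:, None] * client_params[selected]).sum(axis=0)
print("selected clients:", selected)
print("global params:", global_params)
```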
ORE: An Offline Redundancy Elimination System for GNN Acceleration

Graph Neural Networks (GNNs), widely applied in social networks and knowledge graphs, face significant computational bottlenecks. Redundancy elimination has shown promise in optimizing GNNs, but existing methods often sacrifice effectiveness for lower algorithmic time cost, limiting their acceleration capabilities. To address this, we propose ORE (Offline Redundancy Elimination), a system with two components: A multi-level iterative strategy to expand the redundant data pool; A redundancy elimination algorithm formulated as a maximum weight clique problem, solved using an advanced solver to maximize elimination performance. Experiments on public datasets show that ORE achieves up to 10.4× end-to-end speedup for GCN—4.6× higher than previous SOTA methods—and improves redundancy elimination by 3.7×.

Ziqi Wang, Yongquan Fu, Huayou Su
Time Efficiency: Legendre Polynomials in Kolmogorov-Arnold Network

B-spline functions necessitate piecewise interval calculations under the De Boor algorithm, while Legendre polynomials enable global domain operations without segmentation. This characteristic renders Legendre polynomials computationally more efficient. Motivated by this efficiency, we introduce Legendre-KAN, a reformulated version of Kolmogorov-Arnold Networks (KAN) that substitutes B-spline basis functions with Legendre polynomials. Our approach reparameterizes the network's weights using Legendre polynomials, which not only substantially reduces training time but also preserves competitive approximation accuracy, albeit with marginally higher root-mean-square error (RMSE) in certain tasks. Experimental results confirm that Legendre-KAN provides a computationally efficient alternative to traditional KAN architectures and surpasses MLP modules in terms of training accuracy. This work offers a promising avenue for addressing the inherent time-cost challenges in KAN.

Wei Chen, JiaHui Sun, QingFeng Xia
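To show why the Legendre basis avoids piecewise De Boor-style evaluation, here is a minimal sketch of a KAN-style learnable 1-D edge function built as a weighted sum of Legendre polynomials evaluated globally on [-1, 1]. The degree and the random coefficients stand in for the learned KAN weights.

```python
# Sketch: a KAN-style 1-D edge function phi(x) = sum_k c_k * P_k(x) using a
# Legendre polynomial basis evaluated over the whole domain at once.
import numpy as np
from numpy.polynomial import legendre

degree = 6
rng = np.random.default_rng(0)
coeffs = rng.normal(scale=0.3, size=degree + 1)   # learnable weights in a real KAN

x = np.linspace(-1.0, 1.0, 5)
basis = legendre.legvander(x, degree)             # shape (len(x), degree + 1)
edge_output = basis @ coeffs

print(basis.shape)        # (5, 7)
print(edge_output)
```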
SCAUnet: Symmetric Cross-Attention U-net Model for Semantic Segmentation

U-Net leverages skip connections to link the encoder and decoder, effectively combining low-level spatial features with high-level semantic features. However, this architecture is constrained by its limited ability to incorporate global semantic information from the entire image. To overcome this constraint, we propose the Symmetric Cross-Attention (SCA) network. The SCA network incorporates a novel decoder branch engineered for semantic feature extraction. These features are subsequently integrated with the spatial characteristics originating from the primary decoder. At each level, spatial features serve as the keys, whereas semantic features function as the queries. Spatial cross-attention is utilized to refine low-level spatial information, while channel cross-attention is applied to enrich high-level semantic representations. The results from both attention directions are fused, and the decoder is restructured to more effectively capture long-range dependencies. The SCA module establishes mutual correlations between semantic and spatial responses, thereby enhancing the representation of specific semantic features. We integrate the SCA module into four models: U-Net, V-Net, R2-Unet, and ResUnet. Experiments conducted on two benchmark medical image segmentation datasets, Kvasir-Seg and CVC-ClinicDB, demonstrate that the SCA module significantly improves segmentation performance.

Chunbo Yang, Hailin Liu
Self-attention Multiscale Mixed Propagation Network Based on Contrastive Augmentation

Graph Neural Networks (GNNs) have demonstrated remarkable success in various fields, however, they face challenges in deep network architectures, such as over-smoothing, sensitivity to topological perturbations, and limitations on heterogeneous graphs. This paper proposes a Self-Attention Propagation Network based on Contrastive Augmentation (SAMPCA) to address these issues. SAMPCA uses a novel multi-dimensional graph perturbation graph data augmentation method and introduces graph regularization to optimize the graph structure. It also incorporates a self-attention multiscale mixed mechanism for adaptive propagation, mitigating over-smoothing and enriching neighborhood information diversity. Furthermore, SAMPCA extends edge weights to negative values to better adapt to complex heterogeneous graph topologies. Experiments demonstrate that SAMPCA effectively alleviates over-smoothing and outperforms SOTA models in semi-supervised node classification tasks across multiple datasets. On homogeneous graphs like Cora, SAMPCA achieved an improvement of 2.23% over GPRGNN. On heterogeneous graphs, it demonstrated remarkable improvement on the Texas dataset, outperforming GPRGNN by 1.7%. These results showcase its augmented generalization and robustness.

Qianli Ma, Xiao Zhang, Junqi Liu, Zongyang Li
Sentiment Perception from Tokens: A Multitask Learning Framework with Entropy-Driven Fusion

Multimodal sentiment analysis (MSA) is crucial for applications like human-computer interaction. While pre-trained models have achieved remarkable performance across general domains, fine-tuning them solely on label-supervised sentiment tasks often leads to an over-reliance on global representations, thereby overlooking fine-grained sentiment indicators. However, it is precisely these fine-grained features that drive changes in sentiment. To address this issue, we propose a Sentiment-Aware Multitask Learning framework that enables the model to simultaneously understand both coarse-grained and fine-grained sentiment information. In addition to label-supervised learning, we aim to identify fine-grained tokens within the data that influence sentiment, which are then masked to prompt the model to reconstruct the missing components. Besides, to better integrate multimodal features, some late fusion methods introduce learnable modules, such as a linear layer. However, linear-layer late fusion methods necessitate dataset-specific retraining, while non-parametric late fusion methods like voting or averaging inadequately integrate multimodal features. Inspired by information theory, we propose an Entropy-Driven Fusion method that preserves the plug-and-play nature of non-parametric techniques while enabling more effective multimodal fusion. Experiments demonstrate that our approach achieves state-of-the-art results on the MSA task.

Jingshan Yan, Ao Li, Longwei Xu, Pengwei Wang
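A small sketch of one plausible entropy-driven late fusion: each modality's class probabilities are weighted by a confidence derived from its prediction entropy (lower entropy, higher weight). The softmax-of-negative-entropy weighting is an illustrative assumption, not necessarily the paper's exact rule.

```python
# Sketch: fuse per-modality class probabilities with weights derived from
# prediction entropy (lower entropy -> more confident -> higher weight).
import numpy as np


def entropy(p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return -(p * np.log(p + eps)).sum(axis=-1)


def entropy_driven_fusion(modality_probs):
    probs = np.stack(modality_probs)              # (M, n_classes)
    ent = entropy(probs)                          # (M,)
    weights = np.exp(-ent) / np.exp(-ent).sum()   # confident modalities dominate
    return (weights[:, None] * probs).sum(axis=0)


text_p = np.array([0.80, 0.15, 0.05])     # confident text prediction
audio_p = np.array([0.40, 0.35, 0.25])    # uncertain audio prediction
print(entropy_driven_fusion([text_p, audio_p]))
```

Because the weights are computed from the predictions themselves, this fusion stays parameter-free and plug-and-play, which is the property the abstract emphasizes over linear-layer late fusion.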
GCLCP: Graph Contrastive Learning with Convolutional Perturbation for Recommendation

Recommender systems increasingly leverage graph-based representations to model complex user-item relationships. To alleviate interaction sparsity, graph contrastive learning (GCL) incorporates self-supervised signals to improve recommendation accuracy. However, existing GCL-based methods suffer from two key limitations: (i) graph perturbation-based contrastive views may distort structural information and degrade embedding quality; (ii) optimization via random negative sampling reduces training efficiency and recommendation quality. To address these challenges, we introduce a novel framework named Graph Contrastive Learning with Convolutional Perturbation (GCLCP) for recommendation. GCLCP introduces perturbations to the neighborhood aggregation in graph convolution, generating contrastive views while preserving the graph structure. Furthermore, inspired by the concepts of alignment and uniformity in representation learning, we incorporate these objectives into the GCL framework as loss functions, thereby improving both efficiency and accuracy without relying on negative sampling. Experiments conducted on three benchmark datasets demonstrate the superior performance of GCLCP compared to the representative baselines. Specifically, on the iFashion dataset, it achieves a 6.04% improvement in recommendation accuracy, measured by NDCG@20.

Hao Pan, Lei Chen, Yangxun Ou
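The alignment and uniformity objectives the abstract refers to are the standard formulations from Wang & Isola (2020); a minimal PyTorch sketch on L2-normalized user/item embeddings is shown below. Batch construction, embedding sizes, and the temperature are illustrative assumptions.

```python
# Sketch: alignment (pull interacting user/item pairs together) and uniformity
# (spread embeddings on the hypersphere) losses, used instead of negative sampling.
import torch
import torch.nn.functional as F


def alignment_loss(user_emb: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    u, i = F.normalize(user_emb, dim=-1), F.normalize(item_emb, dim=-1)
    return (u - i).norm(p=2, dim=1).pow(2).mean()


def uniformity_loss(emb: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    e = F.normalize(emb, dim=-1)
    return torch.pdist(e, p=2).pow(2).mul(-t).exp().mean().log()


users = torch.randn(32, 64)                  # embeddings of interacting users
items = torch.randn(32, 64)                  # embeddings of the paired items
loss = alignment_loss(users, items) + 0.5 * (uniformity_loss(users)
                                             + uniformity_loss(items))
print(loss.item())
```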
Agro-LLaVA-Next: A Large Multimodal Model for Plant Disease Recognition

This paper presents Agro-LLaVA-Next, a specialized framework aimed at overcoming the limitations of general-purpose large multimodal models (LMMs) in plant disease recognition. We propose a parameter-efficient fine-tuning strategy that freezes the language module while selectively optimizing the visual encoder and multimodal projector. This approach ensures effective adaptation to agricultural vision tasks without compromising the model's inherent linguistic capabilities. To address the scarcity of data in agricultural visual question answering, we construct the AgroTest dataset, which includes 73,125 training samples across 22 crop categories containing 117 disease or health types, through strategic data synthesis. Specifically, 2% of the total 74,548 images were randomly sampled with equal stratification to form the test set (1,423 samples), while the remaining 98% were used to generate training data. Single-turn question-answer (QA) pairs for the training set were synthesized using predefined templates, whereas QA pairs for the test set were generated as multiple-choice questions using the Claude 3.5 Sonnet model. Experimental validation on the test set demonstrates a significant 30.2% improvement in disease recognition accuracy compared to baseline LMMs, with only a moderate 10.6% performance degradation in general dialogue tasks. This establishes an effective trade-off for agricultural applications. The proposed framework underscores the potential of targeted LMM adaptation in addressing domain-specific visual recognition challenges.

Guowei Xu, Weiting Zhao, Yuhui Bie, Mingliang Ge, Zekun Cui, Yaojun Wang
LTL-GCL: A More Efficient Layer-to-Layer Graph Contrastive Learning Method for Recommender System

Recent advancements in recommendation algorithms have increasingly adopted graph contrastive learning (GCL) to optimize collaborative filtering (CF) frameworks. Extensive research has shown its effectiveness in mitigating data sparsity challenges. The current common method is to construct a comparison view using random perturbation, and then align the corresponding nodes. We argue that this approach has some issues: constructing additional views can drop real interactions or introduce false interactions into the comparison view. In addition, current research focuses on aligning nodes of the same type (between users or between items), neglecting the interaction modeling between users and items. Our work proposes LTL-GCL, a GCL-enhanced recommendation algorithm designed to overcome these limitations. The original embeddings are fed into a two-layer GCN, with the resulting output serving as a synthetic contrastive view. The generated view does not lose any real interaction information and does not require GCN execution on the pseudo view. In addition, we add a node alignment loss between users and items during the training phase. To evaluate our approach, we performed comprehensive testing across three real-world datasets, assessing both computational performance and recommendation quality.

Haoyang Li, Jiaying Chen, Wanlong Jiang, Zhongrui Zhu
IMVGCN: Interactive Multi-view Learning Graph Convolutional Networks for Traffic Flow Forecasting

Capturing complex spatiotemporal correlations in traffic data for forecasting remains a significant challenge. Most existing approaches ignore interactive learning between spatiotemporal features and extract them serially or in parallel. However, serial extraction leads to information coverage, while parallel extraction results in excessive parameters and challenges in model fitting. To address these issues, we propose the Interactive Multi-view Graph Convolutional Network. This model effectively learns spatiotemporal features interactively, capturing global dynamic spatial patterns in traffic data. Additionally, a fast parallel learning module is constructed within the multi-view learning module, enabling efficient local feature extraction with minimal parameters. A serial learning module further expands the receptive field. Extensive experiments on four real-world traffic flow datasets verify that the proposed model surpasses benchmark models in forecasting accuracy.

Yingyu Li, Huahu Xu
An Inverse Cavity Scattering Inversion Method Based on Adaptive Neural Fuzzy Inference System

In this study, we address the inverse scattering problem in the field of acoustics and explore the reconstruction method of cavity shape under acoustically soft boundary conditions. By incorporating an adaptive neuro-fuzzy inference system (ANFIS), an inversion technique based on a single point source and a finite number of measurement points is developed for reconstructing the geometry of acoustically soft cavities. First, the measured near-field data are reduced in dimension using principal component analysis (PCA) to lower their complexity. Then, this paper proposes an adaptive fuzzy inference system, which uses the gradient descent method and the extended Kalman filter (EKF) algorithm to update the premise and consequent parameters of the model, so as to minimize the mean square error of the retrieved shape parameters. We demonstrate the effectiveness of the method with several numerical examples, verifying that choosing a suitable measurement profile and aperture range leads to better inversion results. Finally, we compare the effectiveness of ANFIS with that of a conventional feed-forward neural network (FNN) for cavity inversion, and verify that ANFIS is robust to noise in scattered field measurements.

Teng Li, Liu Yang, Aoyu Zhu, Jinhong Li
Entity Backdoor Attacks Against Fine-Tuned Models

Fine-tuning is a training paradigm that allows large models to achieve strong performance on downstream tasks with a small number of samples and minimal training time. However, this study reveals that the models fine-tuned from pre-trained models are vulnerable to a new threat called an entity backdoor attack. Entity backdoor attacks are a new type of backdoor attack that can use arbitrary instances in a given entity to trigger a backdoor attack. More importantly, the arbitrary instances (i.e., poisoned example) in the entity are visually similar to the clean example. For example, an entity backdoor attack can use the husky dog (which belongs to the dog entity) to trigger the stop sign class in the traffic recognition task, but the poisoned example in the training dataset is visually like a stop sign. The advantages of entity backdoor attacks over traditional backdoor attacks are twofold. First, entity backdoor attacks are triggered more stealthily because they do not require a specially defined trigger pattern superimposed on a normal image to trigger the backdoor attack. The instance (e.g., a husky dog) itself is a trigger, using the instance can directly trigger the backdoor attack. Second, the poisoned examples in the training datasets of entity backdoor attacks are more stealthy because we use very small perturbations to generate the poisoned examples, making them hard to distinguish from the clean examples. Experiments on multiple datasets show that systems using fine-tuned models are vulnerable to the threat of entity backdoor attacks.

Ting Yang, Jinxue Zhao, Xinyu Lei, Hongyu Huang, Nankun Mu, Xu Zhou, Mahabubur Rahman Miraj
Knowledge Graph Denoising with Dual Contrast for Recommendation

Recent research has increasingly utilized knowledge graphs (KG) to enhance recommendation system performance. However, traditional knowledge-aware recommendation models often encounter noisy data challenges in user-item interaction graphs and KG. Furthermore, these systems tend to prioritize frequently interacted popular items, neglecting less popular yet potentially relevant content, which compromises recommendation personalization and diversity. To address these issues, we propose the Knowledge Graph Denoising Dual-Contrastive Recommendation Model (KG-DCRec). Our approach performs denoising and reconstruction on low-attention edges in the KG while filtering high-attention edges. It effectively applies Singular Value Decomposition (SVD)-based denoising to the user-item interaction graph to extract users’ latent interests. In addition, we introduce a dual-contrastive learning mechanism that contrasts the denoised user-item interaction graph with the original interaction graph and the denoised knowledge graph. This design enables the model to adapt to users’ evolving preferences while enhancing its discriminative capability and generalization from multi-perspective representations. Extensive experiments on four datasets demonstrate that KG-DCRec outperforms state-of-the-art models.

Jingyan Zhou, Zhilong Shan, Zhengyang Wu, Xiaoyong Hu, Su Mu
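The SVD-based denoising step mentioned above can be pictured as a truncated-SVD reconstruction of the user-item interaction matrix, keeping only its dominant factors as a low-rank, denoised view. The rank and the toy binary matrix below are illustrative; the full dual-contrastive pipeline is not shown.

```python
# Sketch: truncated-SVD reconstruction of a user-item interaction matrix as a
# low-rank, denoised view of user preferences.
import numpy as np

rng = np.random.default_rng(0)
interactions = (rng.random((8, 12)) < 0.25).astype(float)   # binary user-item matrix

U, s, Vt = np.linalg.svd(interactions, full_matrices=False)
rank = 3                                                    # keep dominant factors
denoised = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

print("original interactions:", int(interactions.sum()))
print("denoised scores (user 0):", np.round(denoised[0], 2))
```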
DDformer: Deepfake Detection with Multimodal Fusion Transformer

Early deepfakes primarily focused on visual face swapping, but the advancement of multimodal deepfake technology now allows for realistic face and audio replacements. Although some researchers have made advances in using multimodal learning for deepfake detection, they still encounter two major challenges: heterogeneity and complementary data fusion. We propose a novel approach called DDformer, and introduce two fusion methods: Multimodal Fusion Transformer (MFT) and Shared Weight Attention Fusion (SWAF). MFT utilizes the powerful global modeling capability of the transformer, which enhances the fusion of multimodal features. SWAF, in turn, incorporates channel attention with shared weights to further complement and enhance multimodal features. Finally, we design a novel classifier specifically tailored for detecting different types of deepfakes. This classifier effectively utilizes the fused multimodal features to accurately classify and identify various types of deepfake videos. DDformer achieved a multi-class classification accuracy of 97.59% on the challenging FakeAVCeleb dataset and demonstrated its generalization ability through generalization experiments. DDformer provides a promising solution for addressing the challenges of heterogeneity and complementary data fusion in multimodal deepfake detection.

Jiazhan Gao, Deqi Huang, Jinlai Zhang, Eksan Firkat, Chao Liu, Jihong Zhu
Improved Transfer Learning Based on Increased Model Capacity and Weight Re-initialization for ResNet

The Residual Network (ResNet) architecture is among the most renowned in the field of deep neural networks. It has been widely employed for tasks such as image classification, object detection, semantic segmentation, and other related applications. In practical settings, when models are deployed on novel tasks or custom datasets, transfer learning is frequently utilized by leveraging a pretrained model and fine-tuning its final layer for the new task. In this paper, we present a novel transfer learning framework for ResNet that combines enhanced model capacity with weight re-initialization. The proposed framework specifically explores weight re-initialization strategies for ResNet variants with increased capacity. Experimental results on the CUB-200-2011 and Food-101 datasets demonstrate that the proposed approach outperforms baseline methods.

Fengqian Pang, Yunjian He, Yingying Kang, Yuming Xia, Qian Li, Ruiyi Xu, Zhiqiang Xing
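A minimal torchvision sketch of the re-initialization side of such a framework: load a pretrained ResNet-18, re-initialize the last residual stage and the classification head, and optionally freeze the earliest layers. Which layers to re-initialize, the 200-class head, and the freezing choice are assumptions for illustration; the capacity-increase component of the paper is not shown. Assumes a recent torchvision release that provides the `weights` argument.

```python
# Sketch: transfer learning on a pretrained ResNet-18 with the final residual
# stage and classifier re-initialized for the new task.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Re-initialize the final residual stage so it is relearned on the target data.
for m in model.layer4.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

# Fresh classification head for the new label set (e.g., 200 bird classes).
model.fc = nn.Linear(model.fc.in_features, 200)

# Optionally freeze the earliest layers and fine-tune the rest.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False
```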
BEVboost: Research on 3D Object Detection Method for Roadside Based on Multi-feature Fusion

In the current research and development of autonomous driving technology, most existing autonomous driving systems rely on perception methods based on self-vehicle sensors. This approach neglects a highly promising avenue for perception enhancement: leveraging intelligent roadside cameras to overcome visual limitations and achieve more comprehensive and in-depth perceptual capabilities. Current vision-centric detection methods exhibit low adaptability when applied to roadside camera detection in diverse intersection scenarios. Roadside cameras are primarily used for roadside 3D object detection. Early methods employed single sensors but suffered from limited accuracy and poor performance in complex scenarios. Subsequent advancements introduced multi-sensor fusion, though synergistic advantages were not fully exploited. Recent deep learning-based multi-sensor fusion methods have significantly improved detection accuracy and stability. Current research focuses on optimizing algorithm performance, expanding application scenarios, and leveraging communication technologies for collaborative processing. This paper proposes Bird's-Eye-View Boost (BEVboost), a roadside 3D object detection method that addresses the low adaptability of existing methods across diverse intersection scenarios. Through experiments and comparative evaluations on a roadside 3D object detection benchmark platform, BEVboost outperforms vision-centric counterparts in performance. This achievement provides novel insights and technical support for advancing perception in autonomous driving.

Wenze Liu, Xingang Wang
ARG-Net: Gaze Estimation Based on Adversarial Learning and Learnable Networks

Gaze estimation tasks have widespread applications in fields such as human-computer interaction, virtual reality, and driver monitoring. However, they still face several challenges and difficulties in practical scenarios. Issues such as individual differences, occlusion, lighting variations, dynamic environments, and difficulties in data annotation continue to pose challenges for this task in real-world applications. Therefore, this paper proposes an innovative gaze estimation model, ARG-Net, for cross-domain gaze estimation tasks. This model first designs a facial feature adversarial reconstruction module to remove unnecessary background noise and retain the key information related to gaze direction. Then, after processing the features, the image is passed through a multi-head self-attention mechanism to further enhance the ability to capture complex contextual information, effectively extracting long-range dependencies from facial images. This helps the model accurately predict gaze direction even in complex environments. The network's non-linear representation ability is enhanced through learnable activation functions, improving its ability to perceive subtle eye movements and handle complex gaze features. Experimental results show that ARG-Net performs excellently in cross-domain gaze estimation tasks on the Gaze360 and MPiiGaze datasets, with gaze estimation errors of 9.84° and 6.53°, respectively, lower than those of recent cross-domain gaze estimation methods on these two datasets. Compared with other existing models, ARG-Net achieves better accuracy and robustness, performing well under occlusion, lighting changes, and complex scenarios. It has broad practical application potential and can provide more accurate gaze detection solutions for intelligent interaction, security monitoring, and other fields.

Ziqi Feng, Yi OuYang, Xiaogang Xu
GNN Advanced Heuristics Algorithm for Solving Multi-depot Vehicle Problem

The Multi-Depot Vehicle Routing Problem (MDVRP) is a critical challenge in logistics optimization. Most existing heuristic-based algorithms trade off speed and solution quality. It is of great academic significance and application value to study a fast and high-quality algorithm to solve the problem. This work proposes GAMDVRP, the first framework integrating graph neural networks (GNN) with ant colony optimization (ACO) algorithm to address MDVRP. Our key innovation lies in using heuristic information learned from GNN to guide ACO algorithm to solve MDVRP. Experiments demonstrate 16.48%, 23.83%, and 23.97% improvement in solution quality over classical ACO variants across three problem scales (50, 100 and 200 nodes) in synthetic data, achieving 95.5% optimality on real-world benchmarks with greater speedup versus genetic algorithms.

You Wu, An Liu
MSDBNet: A Multi-scale and Dual-Branch Network for Cross-Domain Person Re-identification

Cross-domain person Re-identification (re-ID) has witnessed rapid development, driven by the breakthrough advancements of unsupervised techniques in visual tasks. Current unsupervised domain adaptation (UDA) methods, which generally follow a two-step strategy involving data generation and feature extraction, are widely adopted in cross-domain person re-ID. However, the accuracy of UDA networks remains limited, primarily due to the weak feature alignment capability, which fails to generate high-quality pseudo target domain data, and the networks' tendency to over-emphasize global features while neglecting crucial local characteristics during training. To address these issues, we propose a UDA-based Multi-Scale and Dual-Branch Network (MSDBNet), which integrates a Style-Injected Generative Adversarial Network (SIGAN) and a Dual-Branch Alignment and Cross-Attention Fusion Network (DBCF-Net). Specifically, SIGAN alleviates domain distribution discrepancies through a multi-scale domain-aware generation strategy. DBCF-Net mitigates critical information oversight through dual-branch global-local feature alignment, graph sampling optimization, and cross-attention fusion learning. Extensive experiments on two public datasets and a self-built dataset demonstrate the superiority of MSDBNet, achieving improvements of up to 0.7% mAP and 1.2% Rank-1 accuracy compared with existing leading networks.

Gaobo Zhang, Wenhan Long, Xinlong Wen, Weijing Da, Rongbo Zhu
Global and Local Feature Enhancement for Short Video Fake News Detection

With the increasing prevalence of short videos as a medium for news dissemination, their openness and ease of editing have rendered them a high-risk vehicle for the spread of fake news. However, existing detection methods struggle to effectively capture deep forgery features, particularly when faced with complex cross-modal interactions and diverse forgery patterns. To address this challenge, we propose GLFE-SVFD, a fake news detection framework that integrates global and local feature enhancement. Leveraging an attention mechanism, GLFE-SVFD dynamically measures modality complementarity and disparity, adaptively adjusting modality weights to amplify critical information while suppressing noise interference. Specifically, global feature enhancement captures cross-modal correlations, while local feature enhancement further focuses on fine-grained modality discrepancies. Experimental results demonstrate that GLFE-SVFD surpasses existing methods in short video fake news detection, significantly improving detection performance.

Haoran Wang, Yan Yang, Yingli Zhong
SpikingRM: Efficient Scheduling Algorithm Based on Spiking Neural Network and Deep Reinforcement Learning

Cloud computing has become a key infrastructure for global computing and digital transformation, but efficient and balanced resource scheduling remains a major challenge. Deep Reinforcement Learning (DRL) has shown promise in this field, yet struggles with high-dimensional data and sample efficiency. Spiking Neural Networks (SNN), inspired by biological neurons, offer low energy consumption and sparse computation, but practical applications in scheduling are still limited due to issues like feature loss and sample dependence. This paper proposes SpikingRM, a hybrid model that integrates SNN with DRL to leverage their respective strengths. The system encodes task and resource features into spike frequency and intensity, enabling efficient scheduling across heterogeneous resources such as CPU, memory, and GPU. Experiments based on the Google Cluster Data 2019 show that SpikingRM outperforms DeepRM by 10%–30% in scheduling speed, load balancing, and energy efficiency, and achieves 20%–35% gains over traditional algorithms like SJF and FCFS in key performance metrics.

Xiubo Liang, Shuwei Liu, Hongzhi Wang, Qifei Zhang, Bin Zhao
Infrared Multi-Scale Target Detection Based on Improved YOLOv11 and Spatiotemporal Features

Infrared images have the inherent drawbacks of limited effective information and a low signal-to-noise ratio, which makes it challenging to discern targets from the background clutter in the spatial domain. To mitigate this challenge, we propose a multi-scale infrared detection model based on time-spatial feature fusion and the YOLOv11n architecture. In particular, we design the PTF module to fuse spatial features, enabling the model to extract robust, complementary spatiotemporal features while mitigating the impact of infrared noise on detection accuracy. Given the prevalence of background clutter in infrared images, which often generates erroneous feature points and degrades bounding box localization, we developed the EGF module to integrate edge information into multi-scale features and accurately localize key feature points. Extensive evaluations on the HIT-UAV public dataset confirm our model's superiority, gaining 7.8% and 12.4% improvements in mAP50 and mAP50:95 against mainstream models.

Yiqing Li, Ke Xu
Hierarchical Attention-Driven Dynamic Graph Neural Networks for Accurate Supply Chain Demand Forecasting

Supply chain demand forecasting is fundamental to supply chain management. Precise forecasts optimize inventory, cut costs, and boost efficiency. However, traditional methods struggle in existing scenarios. They can’t clearly define hierarchical relationships among supply chain enterprises, leading to information transfer and integration issues. Also, they fail to adapt quickly to dynamic changes like adjusted cooperation and logistics route shifts. Therefore, this paper proposes a supply chain demand forecasting model based on hierarchical attention mechanism and dynamic graph neural network. The hierarchical attention mechanism is used to deeply explore the relationships within and across the levels of the supply chain, helping the model understand complex structures. The dynamic graph structure learning module tracks the changes in the supply chain in real time and adaptively adjusts the graph structure. Additionally, we expand the model into a probabilistic model to quantify the uncertainty of predictions, providing more information for decision-making. We conducted experiments on the SupplyGraph dataset, comparing the performance of our model with several GNN baseline models. The experimental results demonstrate that our model outperforms the baseline model in RMSE, MAE and other metrics.

Xiaowei Liu, Qingxiang Wang, Xiumei Wei, Hu Liang
DHCBR: Evaluating the Influence of Supply Chain Complex Network Nodes Based on ResNet

Due to the impact of sudden events, the security of global supply chains is facing serious challenges. Modeling supply chain networks as complex networks and analyzing them can effectively identify key nodes, thereby providing support for downstream risk analysis. Currently, various methods have been proposed to identify influential nodes in complex networks by constructing network topological features. However, due to the unique characteristics of supply chain networks, there is a lack of targeted methods for identifying influential nodes in this field. To address this issue, this paper proposes a framework for identifying influential nodes in complex networks—DHCBR, based on ResNet and node feature representation. By analyzing supply chain networks, we propose using node degree, H-index, and clustering coefficient to characterize node information, and we evaluate the algorithm using the ResNet model. Experimental results show that DHCBR can effectively identify influential nodes in supply chain networks and has the potential to be generalized to other real-world networks. Furthermore, through a comparison of time overhead with other methods, DHCBR demonstrates good applicability when dealing with large-scale networks.

Zhihao Zhang, Jinghua Yan, Xingyu Fu, Taiyao Zhang, Zhou Zhou
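The three node descriptors named above (degree, H-index, local clustering coefficient) can be computed directly with networkx, as in the sketch below on a toy graph standing in for a supply chain network; the ResNet-based scoring of these features is not reproduced here.

```python
# Sketch: degree, H-index, and local clustering coefficient per node, the
# feature triple the abstract uses to characterize supply chain network nodes.
import networkx as nx


def h_index(graph: nx.Graph, node) -> int:
    """Largest h such that the node has >= h neighbors of degree >= h."""
    degs = sorted((graph.degree(n) for n in graph.neighbors(node)), reverse=True)
    h = 0
    for i, d in enumerate(degs, start=1):
        if d >= i:
            h = i
    return h


G = nx.karate_club_graph()            # stand-in for a supply chain network
features = {
    node: (G.degree(node), h_index(G, node), nx.clustering(G, node))
    for node in G.nodes
}
print(features[0])                    # (degree, H-index, clustering) of node 0
```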
An Efficient DNN Training Method with Progressive Pruning

As the complexity of deep neural networks (DNNs) grows, there is an increasing demand for efficient and lightweight training methods, especially for edge devices with limited resources. As a model compression technique, pruning is a promising solution to improve training efficiency by removing redundant neurons and connections. It typically involves two approaches: train-prune-retrain, which incurs additional resource costs, and pruning-while-training, which struggles to balance model size and accuracy in a one-step manner. In contrast, progressive pruning provides more granular control over the pruning schedule and intensity, facilitating a better balance between model sparsity and accuracy. Inspired by this, we propose a two-stage progressive pruning-based training method. In the first stage, the top-k weights are identified and updated using the learning rate, while the remaining weights are adjusted only through a weight decay mechanism. In the second stage, progressive pruning is applied to the tracked weights, with the untracked weights being pruned at the beginning. During the progressive pruning, a nested model library is generated, which enables efficient switching between different balance points of sparsity and accuracy. When applied to resource-limited edge devices, the library provides greater flexibility to trade off resource cost and model accuracy. This method is applied to train VGG-S, DenseNet, MobileNetV2, and ResNet18, using the CIFAR-10, CIFAR-100, and SVHN datasets. Experimental results show that our method achieves a 20× compression rate with less than 1% accuracy loss compared to baseline models (trained without pruning). Compared to existing pruning-based training methods, our method yields the least accuracy loss and convergence time, while maintaining a similar compression rate.

Wenyan Luo, Zhaohui Guo, Qiang Liu
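The core mechanic behind such progressive pruning is a magnitude-based top-k mask whose sparsity target grows over training; a minimal sketch follows. The cubic ramp schedule is a common choice assumed for illustration, and the two-stage weight-decay treatment and nested model library are simplified away.

```python
# Sketch: magnitude-based top-k masking with a sparsity target that grows over
# training steps (progressive pruning).
import torch


def sparsity_at(step: int, total_steps: int, final_sparsity: float = 0.95) -> float:
    """Cubic ramp from 0 to the final sparsity (a common progressive schedule)."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


def topk_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(round(weight.numel() * (1.0 - sparsity))))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()


weight = torch.randn(64, 64)
for step in (0, 250, 500, 1000):
    mask = topk_mask(weight, sparsity_at(step, total_steps=1000))
    pruned = weight * mask            # applied around each optimizer step
    print(f"step {step}: kept {int(mask.sum())} of {weight.numel()} weights")
```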
TPKD: Teacher-Pruned Knowledge Distillation for Point Cloud-Based 3D Object Detection

Advanced point cloud-based 3D object detectors often suffer from considerable computational overhead, limiting their deployment in resource-constrained scenarios such as autonomous driving, smart cities, and robotics. To address this issue, we propose a teacher-pruned knowledge distillation framework that combines a teacher pruning process and a rewind-based label-switching strategy for efficient 3D object detection. Rather than transferring knowledge directly, our method exploits the beneficial effects of unstructured global pruning and structured channel pruning to generate high-quality soft labels. Additionally, we propose a rewind-based label-switching strategy combined with a multi-cycle learning rate schedule to improve the performance of teacher-pruned knowledge transfer. Extensive experiments conducted on both the Waymo and the KITTI datasets validate the effectiveness of our approach. Specifically, our CP-voxel (×0.75) model maintains accuracy while achieving 1.8 × and 1.4 × reductions in FLOPs and latency, respectively. And our accuracy-preserving SECOND (×0.5) variant attains 4.0 × and 1.3 × reductions in FLOPs and latency, respectively.

Fuyang Li, Liang Xiao, Dawei Zhao, Qi Zhu, Yiming Nie, Bin Dai
Network Protocol Security Evaluation via LLM-Enhanced Fuzzing in Extended ProFuzzBench

Network protocol is of paramount importance as it serves as the bridge for secure communication within and between networks, and protocol security is vital for ensuring the integrity, confidentiality, and availability of data transmission. Nonetheless, protocol testing is a challenging task. We propose LLMFuzz, a novel LLM-based network protocol testing approach, and extend ProFuzzBench by integrating three advanced fuzzers (AFLNet Legion, ChatAFL, and LLMFuzz) along with three protocols (HTTP, MQTT, and Modbus). Experimental results demonstrate that our newly proposed and incorporated LLMFuzz outperforms the baseline fuzzer AFLNet and state-of-the-art fuzzer ChatAFL across multiple network protocols and evaluation metrics.

Hanwen Gong, Chen Yan, Yinxing Xue, Yan Guo
Backmatter
Title
Advanced Intelligent Computing Technology and Applications
Edited by
De-Shuang Huang
Wei Chen
Yijie Pan
Haiming Chen
Copyright Year
2025
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-95-0009-3
Print ISBN
978-981-95-0008-6
DOI
https://doi.org/10.1007/978-981-95-0009-3

The PDF files of this book have been created in accordance with the PDF/UA-1 standard to improve accessibility. This includes screen reader support, described non-text content (images, graphics), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility and welcome inquiries regarding the accessibility of our products. If you have questions or accessibility needs, please contact us at accessibilitysupport@springernature.com.
