
Advanced Intelligent Computing Technology and Applications

21st International Conference, ICIC 2025, Ningbo, China, July 26–29, 2025, Proceedings, Part XXVII

  • 2025
  • Book

About this book

The 20-volume set LNCS 15842-15861, together with the 4-volume set LNAI 15862-15865 and the 4-volume set LNBI 15866-15869, constitutes the refereed proceedings of the 21st International Conference on Intelligent Computing, ICIC 2025, held in Ningbo, China, during July 26-29, 2025.

The 1206 papers presented in these proceedings books were carefully reviewed and selected from 4032 submissions. They deal with emerging and challenging topics in artificial intelligence, machine learning, pattern recognition, bioinformatics, and computational biology.

Table of Contents

Frontmatter

Healthcare Informatics Theory and Methods

Frontmatter
GaMNet: A Hybrid Network with Gabor Fusion and NMamba for Efficient 3D Glioma Segmentation

Gliomas are aggressive brain tumors that pose serious health risks. Deep learning aids in lesion segmentation, but CNN and Transformer-based models often lack context modeling or demand heavy computation, limiting real-time use on mobile medical devices. We propose GaMNet, integrating the NMamba module for global modeling and a multi-scale CNN for efficient local feature extraction. To improve interpretability and mimic the human visual system, we apply Gabor filters at multiple scales. Our method achieves high segmentation accuracy with fewer parameters and faster computation. Extensive experiments show GaMNet outperforms existing methods, notably reducing false positives and negatives, which enhances the reliability of clinical diagnosis.
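
The abstract does not give GaMNet's Gabor settings; as a loose, minimal sketch of the multi-scale Gabor filtering idea, the snippet below builds a small filter bank with OpenCV and applies it to a single 2D slice. The kernel sizes, orientations, and the random demo input are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: a multi-scale, multi-orientation Gabor filter bank applied to
# one 2D slice. Illustrative only; GaMNet applies Gabor filtering inside the
# network and its parameters are not specified in the abstract.
import cv2
import numpy as np

def gabor_bank_responses(slice_2d, scales=(3, 7, 15), n_orientations=4):
    """Filter one 2D image with Gabor kernels at several scales/orientations."""
    responses = []
    for ksize in scales:
        for i in range(n_orientations):
            theta = i * np.pi / n_orientations
            kernel = cv2.getGaborKernel(
                (ksize, ksize), sigma=ksize / 3.0, theta=theta,
                lambd=ksize / 2.0, gamma=0.5, psi=0.0)
            responses.append(cv2.filter2D(slice_2d, cv2.CV_32F, kernel))
    return np.stack(responses, axis=0)  # (scales * orientations, H, W)

demo = np.random.rand(64, 64).astype(np.float32)
print(gabor_bank_responses(demo).shape)  # (12, 64, 64)
```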

Chengwei Ye, Huanzhen Zhang, Yufei Lin, Kangsheng Wang, Linuo Xu, Shuyan Liu
RSEF: Enhancing Fairness and Accuracy in Hematopoietic Stem Cell Transplantation Survival Prediction Through Race-Stratified Ensemble Framework

This research introduces the Race-Stratified Ensemble Framework (RSEF) to address accuracy and fairness in survival prediction for patients undergoing hematopoietic stem cell transplantation (HCT). Despite HCT’s critical role in treating hematologic malignancies, racial disparities in prognosis persist, with traditional models overlooking fairness. RSEF integrates fairness through a four-stage architecture: data preparation and stratification, base model training, direct model optimization, and metamodel ensemble with risk score calibration. Key innovations include a dual-objective transformation strategy, race-specific data processing, data auditing, a multi-model ensemble, and race-specific risk score calibration. Experiments on the CIBMTR dataset show that RSEF achieves a stratified concordance index of 0.6918, improving performance by 1.5 percentage points and reducing racial group variability by 50%. Performance gains are most notable among African American and Hispanic patients, demonstrating RSEF’s effectiveness in reducing racial bias in medical predictions. This study offers an equitable HCT survival prediction method and new insights into algorithmic fairness in medical AI, potentially advancing fairness in precision medicine.

Tianxiang Xu, Jiahao Li, Chenyu Liu, Chang Liu, Jianhe Li, Kangsheng Wang, Changbang Li
The Cyber-Physical System of Oral Health Monitoring: A Data-Driven Approach for Inference

This study proposes a data-driven Cyber-Physical System (CPS) for oral health monitoring that embeds an innovative deep learning model. By integrating sensor data from the physical world with artificial intelligence (AI) algorithms from the cyber world, the system establishes a closed loop from data collection and diagnostic inference to feedback-driven decision-making. The embedded deep learning model, the Composite Attention Backbone Segmentation Network (CBASNet), is designed to analyze oral panoramic X-ray images, enabling multi-pathology diagnosis and the identification of restorative methods. Experimental results demonstrate that the proposed system excels in oral health monitoring and disease diagnosis, achieving state-of-the-art performance with 91.9% Bbox mAP, 88.4% Segm mAP, and 87.7% Dice, surpassing traditional diagnostic methods and existing technologies. These results highlight the system's ability to improve the early detection rate of oral diseases, enhance treatment outcomes, and provide an innovative solution for intelligent diagnosis in dentistry.

Shuo Wang, Qiankun Li, Wanxiu Xu, Dezhi Yuan, Baokang Wu, Qingqin Xu
MSSTDCN: A Multi-Scale Spatiotemporal Deep Convolutional Network Based on Power Spectral Density for Cross-Subject Epileptic Seizure Detection

Deep learning has improved subject-specific epileptic seizure detection from EEG signals, but cross-subject performance remains challenging due to inter-subject variability in factors such as age, seizure type, and gender. To tackle this challenge, we present a Multi-Scale Spatiotemporal Deep Convolutional Network (MSSTDCN) enhanced with Power Spectral Density (PSD) feature extraction for improved generalization. The framework includes signal preprocessing, data augmentation, and class balancing to boost robustness. PSD features from multiple frequency bands are integrated and classified to distinguish seizures from interictal states. Experiments show that MSSTDCN achieves 87.07% accuracy on the CHB-MIT dataset, outperforming baselines by 2.39%, and 88.05% on the Siena dataset, with a 2.36% improvement. These results highlight the model’s strong cross-subject generalization and potential for clinical epilepsy diagnosis.
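
One plausible reading of the PSD feature-extraction step is sketched below with SciPy's Welch estimator; the band boundaries, sampling rate, and window length are assumptions, not the paper's settings.

```python
# Hedged sketch: band-wise power spectral density (PSD) features for one EEG
# window, using Welch's method. Band limits and window length are illustrative.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 70)}

def band_psd_features(eeg, fs=256):
    """eeg: (channels, samples). Returns (channels, n_bands) mean band power."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, idx].mean(axis=-1))
    return np.stack(feats, axis=-1)

x = np.random.randn(23, 256 * 10)      # 23 channels, 10 s at 256 Hz
print(band_psd_features(x).shape)       # (23, 5)
```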

Jibin Shou, Jingyuan Wang, Peipei Gu, Meiyan Xu, Jiayang Guo, Duo Chen, Wenhong Li
Ocular Disease Classification Based on Heterogeneous Interaction Among Visual, Diagnostic Semantics, and Generative Knowledge

Accurate diagnosis of ocular diseases is crucial for early detection and effective treatment. While recent vision-language interaction methods have achieved significant performance in medical diagnosis, existing approaches rely solely on convolutional neural networks for visual feature extraction, overlooking critical diagnostic textual data, and individual-specific traits. To address these limitations, this paper proposes an innovative framework that integrates visual, diagnostic semantic, and generative knowledge for robust ocular disease classification. The framework consists of three key modules: a Scalable Diagnostic Information Network (SDIN), a Heterogeneous Graph Attention Network (HGAT), and a Fusion Decision Module (FDM). SDIN combines binocular fundus images, diagnostic keywords, and external information generated by a Large Language Model into a comprehensive graph-based representation. HGAT employs a dual-level attention mechanism to enhance feature representation across heterogeneous modalities, while FDM effectively integrates features from binocular images for accurate multi-label ocular disease classification. Additionally, patient-specific information, such as age and gender, is encoded to further refine the feature representation. Experiments conducted on the OIA-ODIR dataset demonstrate that the proposed model achieves state-of-the-art performance, with a 1.19% improvement in final score over previous methods, showcasing its effectiveness and robustness in diverse clinical scenarios.

Zechang Xiong, Zhenyan Ji, Jiuqian Dai, Hui Liu, Wenhui Chen, Shen Yin, Jose Enrique Armendariz-Inigo
Energy Efficiency Evaluation Method Based on CNN-BiGRU-Attention in University Integrated Energy System

Energy consumption in universities has steadily risen in recent years due to the continuous expansion of school infrastructure. This trend highlights the urgent need for effective energy evaluation and conservation. To address this challenge, this paper proposes a CNN-BiGRU-Attention (CBA)-based energy efficiency evaluation method for university integrated energy systems. First, we construct an energy efficiency evaluation index system for universities from four dimensions: energy consumption, environment, economy, and reliability. We then propose a method to calculate energy efficiency values based on the improved combined weights assignment and TOPSIS method. This method strengthens the data foundation for model training by optimizing weight allocation and data fusion. Furthermore, we propose an energy efficiency evaluation method based on CBA. This method combines spatial feature extraction, time series data processing, and dynamic weight allocation to comprehensively evaluate the energy efficiency of the university integrated energy system using raw data from electricity load, cooling load, heating load, etc. The experimental results show that the proposed model improves R2 by 1.341% and reduces RMSE by 1.657 on average across the four dimensions compared to CNN. The model leads to a more accurate and comprehensive evaluation of university energy efficiency.
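
A generic TOPSIS scoring routine is sketched below to illustrate the "combined weights + TOPSIS" step; the indicator matrix, weights, and benefit/cost flags are made-up examples, and the paper's improved weight-combination scheme is not reproduced.

```python
# Hedged sketch: standard TOPSIS closeness scores given pre-computed combined
# weights. Indicator values and weights are illustrative placeholders.
import numpy as np

def topsis_scores(X, weights, benefit=None):
    """X: (n_alternatives, n_criteria) raw indicator matrix."""
    benefit = np.ones(X.shape[1], dtype=bool) if benefit is None else benefit
    Z = X / np.linalg.norm(X, axis=0)          # vector-normalise each criterion
    V = Z * weights                            # apply combined weights
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)             # closeness coefficient in [0, 1]

X = np.array([[0.82, 120.0, 3.1], [0.75, 98.0, 2.4], [0.90, 150.0, 3.6]])
w = np.array([0.5, 0.3, 0.2])                  # combined weights (toy values)
print(topsis_scores(X, w, benefit=np.array([True, False, True])))
```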

Jie Cao, Shuang Liang, Nan Qu, Yang Xi, Haoran Wang
Adaptive Balancing and Progressive Self-Training: An Effective Semi-Supervised Method for Histopathological Image Segmentation

In the field of histopathology, semantic segmentation is essential for the diagnosis of diseases, the evaluation of treatment effectiveness, and the prediction of outcomes. However, the task of assigning pixel-level labels to histopathology images manually is a process that requires significant time and effort. In this work, we introduce an advanced semi-supervised framework for semantic segmentation in histopathology imaging, called AEL-SPST, which effectively merges Adaptive Equalization Learning (AEL) and Selective Progressive Self-Training (SPST) to overcome the intrinsic difficulties posed by scarce labeled data and class imbalance in histopathology images. The AEL adopts a dual-network structure, where the instructor network is tasked with producing pseudo-labels, and the learner network performs online learning. Its core component, Adaptive Equalization Sampling (AES), is central to monitoring and dynamically adapting to category-specific performance during training, with the aim of improving segmentation accuracy for underrepresented classes. For the pseudo-label generation process, SPST employs a reliability assessment mechanism to filter and rank the pseudo-labels based on their consistency and accuracy. The framework selectively retrains the model, first focusing on the reliable pseudo-labels and subsequently incorporating the less reliable ones. This approach not only enhances the robustness and precision of pseudo-labels but also significantly reduces reliance on extensive labeled datasets. In more intuitive terms, AEL structures the instructional methodology, while SPST cultivates a curriculum-based learning process that gradually assimilates complex data. Rigorous experimental assessments conducted across two difficult datasets, BCSS and LUAD-HistoSeg, demonstrate our method’s outstanding performance, achieving average improvements of 3.5% and 6.35% respectively.

Rui Xu, Yanhao Wang, Nan Zhou, Shiming Shen, Hang Wan
Multi-Modality and Multi-Grained Transformer for Accurate Radiology Report Generation

Automatic generation of radiology reports, which aims to produce textual descriptions corresponding to radiology images, has become an important task for medical AI. However, existing methods tend to overlook core elements of radiologists’ diagnostic workflow, making it difficult to establish image–clinical history relevance and to structurally disentangle disease semantics into diagnostic concepts and their manifestations, thereby compromising clinical coherence and diagnostic precision. To address this, we propose the Multi-Modality and Multi-Grained Transformer (MMMGT), a framework that incorporates elements of the radiologist’s diagnostic workflow by embedding clinical reasoning patterns into both the encoding and decoding processes. (1) In the encoding process, the Multi-Modality Semantic Encoder (MMSE) integrates visual features with clinical history embeddings via cross-modal attention, dynamically adjusting attention weights to focus on abnormal regions and mitigate visual bias. (2) The Multi-Grained Semantic Decoder (MGSD) generates structured topic–state pairs to mimic the hierarchical diagnostic process. It further incorporates a clinical alignment signal to enhance consistency with disease labels. Extensive evaluations on the IU-Xray and MIMIC-CXR datasets demonstrate the effectiveness of MMMGT in achieving clinical coherence and diagnostic precision, attaining state-of-the-art results with 0.520 BLEU-1 and 0.445 ROUGE-L (reflecting clinical narrative consistency) on IU-Xray, and a 0.411 F1 score (indicating diagnostic accuracy) for clinical efficacy on MIMIC-CXR. Ablation studies validate the contribution of each module.

Hongzhao Li, Liangzhi Zhang, Xiangrong Zhong, Jingpu Zhang, Shuo Feng, Shupan Li
Fine-Tuning Large Language Models for Early Mental Health Intervention in China: A Culturally Adapted Triage and Therapy Framework

Mental health challenges in China are rising, yet early intervention remains limited due to stigma, high costs, and therapist shortages. This paper presents a culturally adapted Large Language Model (LLM) that integrates Cognitive Behavioral Therapy (CBT), psychodynamic therapy, and psychoanalysis to provide scalable, anonymous support. Using 10,000 anonymized dialogues from “壹心理,” we generate multi-framework responses with GPT-4o-mini and fine-tune Qwen-0.5B using Guided Reinforcement Parameter Optimization (GRPO). Evaluated against GPT-3.5, untuned Qwen-0.5B, and licensed therapists, our model shows superior performance in triage relevance and user motivation, despite its compact size. The system reduces labeling effort, aligns with Chinese cultural norms, and offers a promising direction for AI-assisted early intervention.

Yuting Shi
Dynamic Bidirectional Attentional Mamba Model for EEG-Based Motor Imagery Classification

Convolutional Neural Networks (CNNs) and Transformer-based models have been extensively explored in brain–computer interface (BCI) systems based on motor imagery (MI) for decoding electroencephalogram (EEG) signals. However, CNNs are limited in modeling long-range dependencies, while the self-attention mechanism of Transformers is constrained by its quadratic computational complexity as input size increases. Recently, State Space Models (SSMs) like Mamba, with hardware-aware designs, have shown strong capability in long-sequence modeling, offering linear computational complexity. Inspired by this, we propose DBAM-EEG, a model designed for efficient decoding of EEG-based MI signals. DBAM-EEG consists of three main modules: attention bidirectional convolution (ABC) block, Vision Mamba (Vim) Encoder, and weighted residual temporal convolutional network (WRTCN). The ABC block encodes the MI-EEG signals into advanced temporal sequence representations, the Vim Encoder focuses on capturing discriminative temporal features within the sequences, and the WRTCN extracts high-level temporal features. On the BCI Competition IV-2a dataset, DBAM-EEG achieved classification accuracies of 87.65% and 71.51% under subject-dependent and subject-independent paradigms, respectively, surpassing the performance of state-of-the-art methods.

Bin Liu, Qianzi Shen, Zijian Wang, Yanting Zhang, Cairong Yan
A Review of Non-invasive Brain-Computer Interface Rehabilitation Research for Stroke

Non-invasive brain-computer interfaces (BCIs) have attracted growing attention in stroke rehabilitation research and clinical applications. This review comprehensively analyzes the current status, progress, and challenges of non-invasive BCI-based rehabilitation research. It systematically explores the application scenarios, existing hurdles, and future directions of non-invasive BCIs in stroke rehabilitation, aiming to offer valuable reference for researchers and promote further development and innovation in this field.

Yanhua Deng, Changwu Ke, Nan Chen, Sha Gu
Secure Multicenter Medical Model Inference from Homomorphic Encryption

To address the privacy requirements of sensitive medical data, we present a multicenter diagnostic inference framework based on homomorphic encryption (HE). Leveraging the CKKS scheme implemented in the Lattigo library with 128-bit security, our method enables efficient and privacy-preserving inference for diseases such as bladder cancer, breast cancer, and sepsis. By exploiting the sparsity of LASSO parameters, we significantly reduce the computational overhead of encrypted inference. Notably, our LASSO-based analysis reveals a potential therapeutic target for bladder cancer. Experimental results across multiple datasets show that encrypted inference achieves performance comparable to plaintext inference, demonstrating that strong privacy can be preserved without sacrificing diagnostic performance.
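
The sparsity idea can be illustrated in plaintext: fit a LASSO model, keep only its non-zero coefficients, and score new samples over that reduced feature set. The snippet below uses scikit-learn with synthetic data as an assumption-laden stand-in; the actual encrypted inner product under CKKS (via the Go library Lattigo) is not shown.

```python
# Hedged sketch of the sparsity argument only: the encrypted inference in the
# paper runs the reduced inner product under CKKS; here we show why LASSO
# sparsity shrinks the number of ciphertext multiplications needed per sample.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))                 # e.g. gene-expression features
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)
active = np.flatnonzero(model.coef_)            # indices of non-zero weights
print(f"{active.size} of {X.shape[1]} features survive LASSO selection")

# Inference now needs only |active| (encrypted) multiplications per sample
# instead of one per original feature.
x_new = rng.normal(size=500)
score = x_new[active] @ model.coef_[active] + model.intercept_
print(score)
```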

Jingwei Chen, Chen Yang, Yuwen Chen, Kunhua Zhong, Wenqiang Yang, Jiang Liu, Wenyuan Wu, Bin Yi
A 3D Liver and Tumor Segmentation Method Based on U-Mamba and Efficient Paired Attention

Accurate segmentation of the liver and tumor is crucial for clinical diagnosis. However, variations in liver size, similarity to adjacent organs, and the small, irregular nature of tumors pose significant challenges for automated methods. To tackle these problems, this paper proposes a 3D liver and tumor segmentation network named U-EPM, which is based on U-Mamba and efficient paired attention. In order to make full use of spatial information and establish global dependencies among features, this paper introduces an efficient paired attention block into the U-Mamba block and names this combination the efficient paired attention Mamba module (EPM). In the EPM module, local details are first captured through convolution operations. The resulting features are then fed into the efficient paired attention block to further extract and fuse spatial and channel information. Finally, the Mamba block is applied to capture global contextual dependencies. During the decoder stage, a triple-path skip connection block (TSB) with branch weights is employed to fully leverage the information from the encoder, thus enhancing the segmentation performance of the model. Experimental results on the Abdomen CT, ATLAS, and private MRI datasets indicate that U-EPM demonstrates excellent performance in the tasks of liver and tumor segmentation.

Shili Yang, Xiaolong Zhang, Xiaoli Lin, He Deng, Hongwei Ren
Mamba-Enhanced Decoder with Prototype Consistency for Semi-Supervised Medical Image Segmentation

Convolutional neural networks (CNNs) excel at extracting local features but are limited by small receptive fields. Transformers are good at capturing global relationships but incur high computational costs at high resolutions. To address these issues, a Mamba-Enhanced Decoder framework combined with Prototype Consistency learning (MEDPC) is introduced. The Mamba-enhanced auxiliary decoder establishes long-distance dependencies and effectively captures complex shape structure features. In addition, to effectively capture category associations among pixels, we introduce a prototype consistency loss to enhance the ability to distinguish category features. With 20% labeled samples, our method achieves 91.1% Dice performance on the LA dataset.

Dong Chen, Yunrong Zhang, Xiaonan Li, Haibin Ma, Liang Tian, Lei Li
BES-UNet: A Boundary Enhanced Sparse Attention UNet for Skin Lesion Segmentation

To address the challenge of simultaneously achieving accurate segmentation and computational efficiency in skin lesion segmentation, this paper proposes a boundary-enhanced sparse attention network (BES-UNet). The network achieves high-precision segmentation and efficient inference of skin lesions, especially those with fuzzy boundaries and irregular shapes, through the following two key innovations: (1) a bidirectional enhanced adaptive learning framework is proposed, which achieves the collaborative optimization of accurate boundary positioning and overall regional segmentation; (2) a boundary-aware dynamic sparse attention mechanism is designed, which uses a content-adaptive dynamic sparsification strategy and a multi-scale shared memory space to significantly reduce model computational complexity while improving the ability to identify lesion edges. Experiments on the ISIC2017 and ISIC2018 public datasets show that BES-UNet has a strong advantage in segmenting fuzzy or irregular boundary regions, with an average intersection over union (mIoU) of up to 81.98% and a Dice similarity coefficient (DSC) of 89.65%. The F1 score for boundary regions is as high as 60.17%, an improvement of 1.05–2.22 percentage points over existing methods. Ablation experiments further confirmed the effectiveness of each component, providing a feasible solution for high-precision skin lesion segmentation in resource-constrained environments.

Cheng-Le Qu, Jing-Rui Xu, Zheng-Yue Song
Memory and Time: A Psychology-Informed Depression Detection

Current computational methods for depression detection via social media primarily rely on static textual analysis, overlooking two key aspects: the temporal evolution of depressive symptoms and the psychological mechanisms behind emotional memory persistence and decay. Existing approaches often treat posts as isolated instances, and simply combining temporal and semantic embeddings can lead to entangled, hard-to-interpret representations. To address these limitations, we propose a novel framework that integrates psychological theories of memory with deep temporal modeling. Our approach introduces a dynamic memory system that simulates natural emotional fading while preserving impactful experiences, alongside a neural architecture that captures symptom progression by jointly modeling content and temporal dynamics. It incorporates an exponential forgetting mechanism, adaptive thresholding for emotional persistence, and a temporal-aware transformer to identify long-term depressive patterns. By aligning computational modeling with psychological insights, our method not only detects expressed symptoms but also traces their development over time. Experiments on eRisk2017 and eRisk2018 show state-of-the-art results, confirming that modeling temporal-psychological dynamics enhances both detection accuracy and clinical interpretability, offering a promising tool for early mental health intervention.
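
As a hedged illustration of the exponential forgetting mechanism, the sketch below decays each post's weight with its age while letting highly emotional posts persist; the decay rate, persistence floor, and toy embeddings are assumptions rather than the authors' formulation.

```python
# Hedged sketch: exponentially decayed weighting of a user's post embeddings,
# one plausible instantiation of an "exponential forgetting" memory with an
# adaptive persistence floor for emotionally intense posts.
import numpy as np

def decayed_memory(embeddings, ages_days, intensities, lam=0.05, keep=0.1):
    """Weight each post by exp(-lam * age); intense posts keep a floor weight."""
    w = np.exp(-lam * np.asarray(ages_days, dtype=float))
    w = np.maximum(w, keep * np.asarray(intensities))    # persistence floor
    w = w / w.sum()
    return (w[:, None] * embeddings).sum(axis=0)          # memory-weighted summary

posts = np.random.randn(6, 32)                 # 6 posts, 32-d embeddings
ages = [1, 3, 10, 30, 90, 180]                 # days since each post
emo = [0.2, 0.9, 0.1, 0.8, 0.3, 0.95]          # emotional intensity scores
print(decayed_memory(posts, ages, emo).shape)  # (32,)
```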

Xiangyu Zheng, Junyu Lu, Qingtao Cheng, Yu Su
WMCTCF: A Wavelet and Multi-scale Convolution Based Transformer Cross-Modal Framework for Early Diagnosis of Alzheimer’s Disease

Cross-modal deep learning algorithms have achieved significant results in the classification of Alzheimer’s disease (AD). However, current fusion methods primarily rely on 3D-based approaches, whose complex algorithms may introduce redundant features and show limitations in extracting discriminative representations as well as effectively integrating heterogeneous data sources. Furthermore, the limited availability of medical data poses additional constraints on fully leveraging the strengths of Transformer models. To overcome these challenges, we propose WMCTCF, a novel Transformer-based cross-modal deep learning framework specifically designed for early AD detection. The proposed method integrates 2D MRI, clinical data, and genetic information, introducing an innovative strategy for image feature extraction by combining wavelet transform with the Swin-Transformer, which effectively captures both global and local features. Additionally, a multi-scale convolution mechanism combined with a Transformer encoder is designed for non-image modalities to better model long-range dependencies in clinical and genetic data. Experimental results demonstrate that WMCTCF achieves an ACC of 0.9859 and an AUC of 0.9947 on the ADNI dataset, marking a significant performance improvement over traditional 3D algorithms. The proposed WMCTCF provides a novel and effective solution for multi-modal AD diagnosis tasks.

Shang Wang, Ling Wang, Yunyi Qin, Donglin Xie, Jin-Xing Liu, Chunhou Zheng, Lili Han, Xinchun Cui
Cali-rPPG: A Unified Uncertainty-Aware Framework for Remote Photoplethysmography

Vision-based remote photoplethysmography (rPPG) enables non-contact heart rate measurement using consumer-grade cameras. However, methods for assessing the reliability of rPPG predictions in out-of-distribution scenarios remain limited. To address this challenge, we propose Cali-rPPG, a unified uncertainty-aware framework with modular design that provides reliable reference-free uncertainty estimations. The framework comprises three modules: an rPPG module that generates heart rate estimations, an uncertainty module that predicts heart rate distributions, and a calibration module that produces calibrated heart rate distributions whose uncertainty aligns with actual prediction errors. We comprehensively evaluate this framework across fifteen existing rPPG methods on five public databases, demonstrating its effectiveness. The proposed framework offers significant value for applications requiring reliable rPPG measurements in real-world settings. Code available at: https://github.com/stzzz99289/calirppg

Tianze Shi, Yunjia Sun, Siqing Ye, Tao Wang

Biomedical Informatics Theory and Methods

Frontmatter
Noise-Aware Self-supervised Electrocardiogram Anomaly Detection

Electrocardiographic (ECG) anomaly detection aims to detect abnormal cardiac behaviors in the ECG. However, existing time-series anomaly detection methods have challenges in extracting key ECG features. To enhance the model’s ability to extract ECG features, we propose a simple yet effective self-supervised anomaly detection technique, named Noise Distribution Aware Autoencoder (NDAAE). This method utilizes mixed noisy ECG data to increase the feature extraction difficulty and aids model learning through a self-supervised task to capture more effective ECG features. Specifically, we first randomly inject six different types of noise signals into the raw ECG data. Then, using an autoencoder, we reconstruct the noisy ECG data back to the original noise-free ECG data. Finally, we establish a self-supervised task from the perspectives of low-dimensional feature space similarity measurement and noise classification, in order to improve the model’s ability to extract features. The method enables the model to obtain more effective ECG representations and improve the anomaly detection accuracy. We evaluate our proposed method on the real ECG dataset, and empirical results show that our proposal outperforms existing methods in detecting anomalies.

Jiawei Luo, Peng Chen, Haoyi Fan, Chunyi Guo, Zongmin Wang
A Multimodal Small Feature Set-Based Assisted Alzheimer’s Disease Diagnosis

Memory impairment and cognitive decline are hallmarks of Alzheimer’s disease (AD), a degenerative neurological disorder. An early and accurate diagnosis is essential for prompt action to limit its course. Recently, multimodal data integration has emerged as a promising strategy to increase the accuracy of AD diagnoses. These techniques reduce the risk of overfitting and increase diagnostic accuracy by integrating data from many sources. High-dimensional multimodal data, however, require substantial computing power. To address this, small feature sets are extracted to lower dimensionality and improve model generalization. In this study, we describe a multimodal small feature set-based approach to AD diagnosis that combines between-group difference analysis with a multilayer attention mechanism. Using intergroup difference analysis, we filtered 114 SNPs, 77 RGV features, and 18 sMRI brain areas to create a low-redundancy small feature set. To create an AD diagnosis model using a multimodal small feature set, we integrated a convolutional neural network (CNN), a multilayer perceptron (MLP), and a multilayer attentional mechanism. The model achieved an average accuracy of 96.14%. Furthermore, by identifying important brain regions and genetic markers associated with AD, this approach provides insights into the disease’s genesis. Although diagnostic performance based on a multimodal small feature set may not match that of models using the full data, our approach outperforms numerous modern AD diagnostic techniques. Critical diagnostic information is retained while data dimensionality is efficiently reduced using the multimodal small feature set-based technique. It enhances computational efficiency and model interpretability, providing new avenues for the early detection of neurological diseases.

Pengfei Tian, Qian Wang, Yang Xi
Two-Stage Multi-stained Cell Analysis with the Segment Anything Model for Pathological Image Segmentation

The Segment Anything Model (SAM) has emerged as a promising tool for image segmentation in the age of large-scale models. Despite being trained on an extensive dataset of over 11 million natural images, the nuanced and critical nature of pathological images challenges SAM’s applicability. Leveraging an expansive dataset of 28,786 patches across five modalities, we introduce a two-stage segmentation strategy tailored for SAM. The initial stage deploys convolutional neural network-based density regression models for cell detection, while the subsequent stage uses these detected regions as prompts to guide SAM’s segmentation task. Our model has achieved outstanding performance on both stages and provides a general paradigm for the detection and segmentation of pathological cells. In addition, this paper offers an extensive evaluation of SAM’s performance in pathological image segmentation, with a specific focus on multi-stained cell segmentation. Our results indicate that SAM’s performance varies across distinct pathological modalities when guided by bounding box prompts, underscoring the need to develop a more robust model for pathological image segmentation in the future.

Jinke Li, Fang Yan, Xiaofan Zhang, Liangjing Yang
WTPAN-Net: Epileptic Seizure Prediction Model Based on Wavelet Convolutions and Attention Mechanisms

Epilepsy is a prevalent chronic neurological disorder, and the unpredictability of seizures significantly impacts patients’ quality of life. Electroencephalography (EEG) signals hold considerable potential for seizure prediction. Existing methods focus on either short-term or long-term EEG features, neglecting their integration and limiting seizure prediction efficacy. In this study, we propose a novel deep learning architecture, the WTConv-Parallelised Attention Network (WTPAN-Net), which integrates two key modules: the Wavelet Convolution (WTConv) module for capturing multi-scale features in the wavelet transform domain through dilated receptive fields, addressing the non-smooth nature of EEG signals, and the Parallelised Attention (PPA) module for dynamically fusing local and global features through a parallelised attention mechanism, enhancing the model’s sensitivity to key pre-seizure patterns. Experiments on the CHB-MIT dataset demonstrate WTPAN-Net’s effectiveness, achieving accuracy, sensitivity, and specificity levels above 95%. Ablation studies confirm the critical roles of both modules, and comparisons with state-of-the-art models show WTPAN-Net’s superior predictive performance, offering a reliable solution for epilepsy seizure prediction.

Shuoyue Jiao, Lin Chen, Changyu Zhang, Jun Guan
FCGR-Enhancer: A Lightweight Multi-scale CNN Model for Super-Enhancer Identification via Chaos Game Representation

Enhancer identification is a critical task in genomic sequence analysis, particularly in distinguishing typical enhancers from super-enhancers. In this study, we propose a lightweight and interpretable deep learning framework based on frequency chaos game representation (FCGR) and a multi-branch convolutional architecture for enhancer classification. Unlike existing methods that rely on sequence embedding or large-scale pretrained models, our approach encodes raw DNA sequences into multi-resolution FCGR matrices and preserves their original scale without resizing, effectively capturing complementary structural features across different k-mer granularities. To leverage these heterogeneous representations, we introduce a multi-branch convolutional neural network that processes each FCGR resolution independently before feature fusion, avoiding information loss and interpolation artifacts. Furthermore, attention mechanisms, including Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM), are incorporated at both the branch and fusion levels to enhance informative feature extraction. Extensive experiments on both human and mouse enhancer datasets demonstrate the effectiveness of our approach. The best-performing variant, A3-CBAM, achieves state-of-the-art performance with an F1-score of 0.749 on the human dataset and 0.778 on the mouse dataset, outperforming several competitive baselines. The proposed model not only achieves superior performance but also maintains a compact architecture with minimal computational overhead. Our results validate the importance of multi-scale representation and branch-wise attention modeling in enhancer classification, providing a promising direction for interpretable and efficient genomic sequence analysis.
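
A minimal sketch of building one FCGR matrix at a single k-mer resolution follows; the quadrant-to-base mapping is one common convention (conventions differ across implementations), and the paper's multi-branch network and attention modules are not reproduced.

```python
# Hedged sketch: frequency chaos game representation (FCGR) matrix for a DNA
# sequence at resolution k, i.e. a 2^k x 2^k grid of normalised k-mer counts.
import numpy as np

def fcgr(seq, k=4):
    """Return a 2^k x 2^k matrix of normalised k-mer frequencies."""
    bits = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}  # quadrant bits
    n = 2 ** k
    mat = np.zeros((n, n))
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if any(b not in bits for b in kmer):
            continue                              # skip ambiguous bases (e.g. N)
        x = y = 0
        for b in kmer:                            # first base = coarsest bit
            bx, by = bits[b]
            x, y = (x << 1) | bx, (y << 1) | by
        mat[y, x] += 1
        total += 1
    return mat / max(total, 1)

print(fcgr("ACGTACGTTTGCA", k=2))                 # 4 x 4 frequency matrix
```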

Huan Liu, Yidong He, Lingyun Luo, Pingjian Ding
Predicting Potential Associations Between Microbes and Diseases Using Graph Attention Auto-encoder and PU Learning

Over the past decade, a large body of research has underscored the intricate link between the occurrence of various complex diseases in humans and microbial communities, with microorganisms exerting significant impacts on human health through modulating biological processes. Biological experiments excel in accuracy for identifying disease-related microbes but suffer from prolonged timelines and excessive costs. The development of predictive models capable of accurately identifying disease-related microbes can effectively mitigate labor and time costs. In our investigation, a deep learning model named GRNMDA was developed for predicting human Microbe-Disease Association (MDA) by fully leveraging diverse biological data to construct the features of microbial and disease entities. Initially, our model calculated the functional similarities and Gaussian Interaction Profile (GIP) kernel similarities for each microbe and disease. To create a holistic similarity matrix for microbes and diseases, these features underwent fusion. Afterward, the feature representation of each microbe-disease pair was captured using a Graph Attention Auto-Encoder. Subsequently, based on the idea of Positive Unlabeled Learning (PU Learning), spectral clustering and the Light Gradient Boosting Machine (LightGBM) algorithm were combined to select reliable negative samples. Finally, the extracted MDA features, along with the chosen negative samples, were used as input data, and a modified residual network was constructed to predict potential MDAs. Using the model on HMDAD, extensive tests showed remarkable results: AUC of 96.47% and accuracy of 98.87%. Compared to baseline models, improvements were up to 1.56% and 6.97%, respectively. The model’s efficacy and dependability were further demonstrated by case studies on conditions including colorectal cancer.
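
The GIP kernel similarity step follows a standard formulation; the sketch below computes it from a toy binary association matrix, assuming the usual bandwidth convention (gamma' = 1).

```python
# Hedged sketch: Gaussian Interaction Profile (GIP) kernel similarity computed
# from a binary microbe-disease association matrix; bandwidth is adapted to the
# mean squared norm of the interaction profiles, the common default.
import numpy as np

def gip_kernel(A, gamma_prime=1.0):
    """A: (n, m) binary association matrix. Returns (n, n) row-wise GIP kernel."""
    norms_sq = (A ** 2).sum(axis=1)
    gamma = gamma_prime / norms_sq.mean()                 # adaptive bandwidth
    # squared Euclidean distances between interaction profiles
    d2 = norms_sq[:, None] + norms_sq[None, :] - 2 * A @ A.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

A = (np.random.rand(30, 40) < 0.1).astype(float)          # toy associations
K_microbe = gip_kernel(A)                                 # microbe-microbe
K_disease = gip_kernel(A.T)                               # disease-disease
print(K_microbe.shape, K_disease.shape)
```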

Jie Zheng, Lingyun Dai, Feng Li, Rong Zhu
Cancer Subtype Recognition Algorithm Based on Hierarchical Attention and Contrastive Learning

Cancer, as a multifactorial regulated disease, demands precise subtype classification to support individualized treatment design and enhance survival outcomes. Although existing multi-omics data analysis reveals various facets of cancer progression, traditional methods overemphasize the consensus information across omics and neglect extracting omics-specific features, leading to challenges in addressing complex intergroup interactions and feature fusion problems. To address this problem, this study proposes a deep learning model based on a variational autoencoder, called Deep Multi-view Hierarchical Attention-based Clustering Learning (DMHACL). The model integrates multiple independent variational autoencoders through a multi-view encoder and combines them with a contrastive loss function to capture omics-specific variables and latent shared representations in different omics. Then, a hierarchical attention mechanism is introduced to fuse consistent representations across omics. In addition, this study uses a self-supervised deep embedding-based clustering algorithm to enhance clustering performance. Experimental results on six public cancer genome atlas datasets demonstrate that DMHACL outperforms five other competitive methods in cancer subtype identification. Recognizing cancer subtypes by DMHACL is biologically meaningful and interpretable in breast cancer case studies.

Yueqiao Ma, Xianguo Zhang
GEPD: GAN-Enhanced Generalizable Model for EEG-Based Detection of Parkinson’s Disease

Electroencephalography has been established as an effective method for detecting Parkinson’s disease, which is typically diagnosed early. Current Parkinson’s disease detection methods have shown significant success within individual datasets; however, the variability in detection methods across different EEG datasets and the small size of each dataset pose challenges for training a generalizable model for cross-dataset scenarios. To address these issues, this paper proposes a GAN-enhanced generalizable model, named GEPD, specifically for EEG-based cross-dataset classification of Parkinson’s disease. First, we design a generative network that creates fusion EEG data by controlling the distribution similarity between generated data and real data. In addition, an EEG signal quality assessment model is designed to ensure the quality of the generated data. Second, we design a classification network that utilizes a combination of multiple convolutional neural networks to effectively capture the time-frequency characteristics of EEG signals, while maintaining a generalizable structure and ensuring easy convergence. This work is dedicated to utilizing intelligent methods to study pathological manifestations, aiming to facilitate the diagnosis and monitoring of neurological diseases. The evaluation results demonstrate that our model performs comparably to state-of-the-art models in cross-dataset settings, achieving an accuracy of 84.3% and an F1-score of 84.0%, showcasing the generalizability of the proposed model.

Qian Zhang, Ruilin Zhang, Biaokai Zhu, Jun Xiao, Yifan Liu, Xun Han, Zhe Wang
CSDT-Net: Integrating Color Space Normalization and Deformable Transformer for Robust Breast Cancer Diagnosis

Breast cancer is one of the most prevalent cancers in women worldwide, with histopathological examination serving as the diagnostic gold standard. However, variations in staining protocols across institutions pose challenges for computer-aided diagnosis (CAD) systems. While Vision Transformer (ViT) excels at capturing global features, its limited ability to extract fine-grained lesion details and reliance on large-scale data hinder its direct application in medical image analysis. To address these challenges, we propose the Color Space Deformable Transformer Network (CSDT-Net), which integrates Color Space Normalization Layer (CSNL) and Deformable Transformer Layer (DTL) for robust breast cancer diagnosis. The CSNL mitigates color and brightness variations via multi-color space conversion and feature normalization, while the DTL leverages deformable self-attention to enhance fine-grained lesion representation. Experimental results on BACH and BRACS datasets show that CSDT-Net achieves state-of-the-art performance with accuracy rates of 91.38% and 89.53%, respectively, outperforming existing methods.

Longsheng Song, Jiale Wang, Haiyu Huang
A Novel Multichannel EEG Analysis Method Using Multiscale Graph Convolution and Cross Attention Transformer for Depression Detection

Depression, a major mental health disorder, has been increasingly prevalent worldwide. The diagnosis of depression through multichannel EEG topology emerges as a promising research direction. We therefore introduce a method named MGFormer, designed to explore the complex interactions among channels and unearth underlying patterns within topological structures. Specifically, we propose a channel information aggregation strategy. Leveraging the capabilities of graph convolutional networks combined with internal across receptive fields, this approach flexibly extracts the channel’s neighboring features and captures spatial information at varying propagation depths. Compared to traditional GNN-based methods, this mechanism overcomes the limitations of node information aggregation and pays more attention to the personalized needs of each channel. To optimize this process, we employ a precomputation technique that facilitates the parallel acquisition of these features. Moreover, we develop an information fusion strategy based on the cross-attention Transformer to enhance the dynamic interaction between different modalities. By exchanging query vectors, the model enhances information integration. Our method is verified on the HUSM and MODMA datasets. The model’s accuracy reaches 99.46% and 91.67%, respectively. We observe that depressed individuals exhibit significant differences in frontal and temporal EEG patterns. This work underscores the contribution of multichannel spatiotemporal features in depression detection, offering valuable support for its auxiliary diagnosis.

Xin Chen, Yici Liu, Zidong Liu, Yuhang Liu, Jean-Louis Coatrieux, Huazhong Shu
Multi-level Feature Enhancement Method for Lung Parenchyma Segmentation

Accurate segmentation of lung parenchyma plays a crucial role in computer-aided diagnosis systems for pulmonary carcinoma. While existing deep learning-based segmentation architectures demonstrate competent performance in processing large and well-defined lung regions, they exhibit notable limitations in capturing small and indistinct parenchymal areas. To address this critical challenge, we propose a novel Multi-level Feature Enhancement Network (MFE-Net) architecture based on U-Net framework. Firstly, we develop a Multi-Receptive Field Fusion Module (MRFB) to replace conventional convolutional blocks in the encoder pathway. This hierarchical architecture enables simultaneous extraction of multi-scale contextual information through parallel dilated convolution branches with varying dilation rates. Secondly, we introduce a Mixed Local Channel Attention (MLCA) mechanism within skip connections to establish cross-level feature interactions. Thirdly, we formulate a hybrid loss function combining Binary Cross-Entropy (BCE) and Dice loss to optimize both pixel-wise classification accuracy and regional shape consistency, particularly beneficial for segmenting marginal regions and small parenchymal lesions. In order to verify the effectiveness of the method, we conducted extensive experiments on the LIDC-IDRI lung CT dataset, and the experimental results showed that the overall average Dice Similarity Coefficient (D-S) of this network reached 93.58%. The proposed architecture demonstrates dual advantages in pulmonary tissue segmentation: it significantly enhances delineation accuracy for subtle, low-contrast parenchymal regions while preserving precise segmentation performance for well-defined macroscopic structures.
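
A minimal PyTorch sketch of a BCE + Dice hybrid loss of the kind described; the equal weighting of the two terms is an assumption, since the abstract does not state the exact formulation.

```python
# Hedged sketch: hybrid BCE + Dice loss for binary lung-parenchyma masks.
# The 0.5/0.5 weighting is illustrative, not the paper's setting.
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, bce_weight=0.5, eps=1e-6):
    """logits, target: (B, 1, H, W); target values in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - ((2 * inter + eps) / (denom + eps)).mean()
    return bce_weight * bce + (1 - bce_weight) * dice

pred = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.7).float()
print(bce_dice_loss(pred, mask).item())
```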

Tianyang Li, Yuhong Nie
Multilevel Residual Sleep Stage Classification Based on Dual-Stream Spatiotemporal 3D Convolutional Neural Networks

Sleep, as one of the key functions of the brain, is crucial for maintaining the physical and mental energy required for daily activities. Sleep staging classification is a common method for monitoring the quality of human sleep. Previous sleep classification methods were unable to effectively recognize the complex spatiotemporal features of EEG signals, leading to inadequate classification results. In this paper, we propose a Dual-Stream Spatiotemporal 3D Convolutional Neural Network (DST-3DCNN). Our method processes features in both the temporal and spatial streams through dual branches, learning the intrinsic connections between electroencephalogram (EEG) channels to better analyze the spatiotemporal characteristics of signals. We also employ a multi-level residual network to fuse features from the temporal and spatial domains, enhancing the effectiveness of feature fusion. Moreover, to reduce the complexity of the network, we incorporate an efficient channel attention mechanism into DST-3DCNN, which enhances the perception of important features. Experimental results on two real-world datasets show that our model achieves an accuracy of 82.4% in sleep staging classification on ISRUC-S3, with an F1-score of 0.816 and a kappa value of 0.763, demonstrating strong competitiveness. The accuracy on the ISRUC-S1 dataset is 82.0%, the F1-score is 0.794, and the kappa value is 0.762. This study provides a new technical framework for automatic sleep staging and also offers a new approach for processing complex EEG signals.

Liu Yingying, Li Huifu, Wan Yuchai, Zhang Xun, Cui Letian
RWNet: A Recursive Wavelet-Driven Network for Deformable Medical Image Registration

Deformable image registration is crucial for medical imaging tasks such as disease diagnosis and treatment planning. Recent advances have relied on either transformer- or attention-based deep learning registration architectures for dense deformation field estimation, resulting in high complexity but with insufficient granularity. Moreover, the insufficient capability of multi-scale resolution modeling still poses a challenge to the accurate and effective registration of large volume deformations. Therefore, we propose a Recursive Wavelet-driven Network (RWNet) by integrating multi-level wavelet sub-bands and a recursive strategy to facilitate multi-scale deformation representation learning for deformation field prediction. Specifically, a pure convolutional encoder with discrete wavelet transform is developed for deep feature mining at multiple scales with different frequency components without high-weight attentions. Then, a deformation fusion-based field estimation method is carefully designed, which combines the frequency-driven field with the spatially-enhanced one to facilitate the reconstruction of the displacement field. Finally, a step-by-step recursive strategy is adopted by integrating high-level features to iteratively refine transformations in a coarse-to-fine manner. Extensive experiments on two publicly available brain MRI datasets demonstrate the superior performance against existing registration methods.

Yuqing Tong, Ting Zhang, Guoqiang Wang
A Mutually Reinforcing Semi-supervised Active Learning Framework for Lung Surgical Section Image Classification

Deep learning is widely used in medical image analysis, but obtaining large-scale annotation data is challenging due to the need for expert medical professionals to perform the annotations. This paper proposes a semi-supervised active learning framework to classify medical images with limited annotated data, helping clinicians assess lung tumor risk levels. This study designs an active learning strategy that estimates the uncertainty of samples based on their training dynamics, allowing for the selection of the most informative samples for manual labeling. To address the cold start problem of active learning, the framework constructs an initial labeled dataset with a uniform distribution using unsupervised training and clustering techniques. The semi-supervised learning framework strategically combines limited annotated samples with abundant unlabeled datasets during model training. This study conducts supervised training on the labeled data, while consistency regularization and pseudo-labeling techniques are applied to the unlabeled data. The framework effectively combines the advantages of active learning and semi-supervised learning. Compared to traditional semi-supervised learning frameworks, this framework effectively prioritizes the selection of samples on the decision boundary for labeling. Moreover, in each iteration of the active learning training process, the framework integrates the latest semi-supervised trained model, further improving the selection performance of active learning. Extensive evaluations on a clinical lung tumor surgical lesion slice image dataset show that the proposed framework achieves 88.79% accuracy, 88.74% precision, and an F1-score of 0.9147, surpassing existing baseline methods.

Lewen Nie, Qizhi Huang, Gansen Zhao, Jinji Yang, Haiyu Zhou
FPGS-Net: An Enhanced U-Net-Based Architecture for Retinal Vessel Segmentation Integrating Fusion Pooling and Guided-Attention Skip Modules

Retinal vascular abnormalities are key diagnostic indicators for various diseases, including glaucoma, cataracts, hypertension, diabetes, and arteriosclerosis. Due to the complex structure of retinal images, low contrast between blood vessels and the background, and blurred microvascular structures, retinal vessel segmentation is highly challenging. To tackle these issues, this paper introduces a multi-scale attention-guided fusion network (FPGS-Net), which aims to enhance the accuracy of automatic segmentation. The network consists of a multi-scale feature convolution module, which alleviates the semantic gap between vessels and the background and enhances noise suppression, thus preserving microvascular features. The Fusion Pooling Module (FPM) enhances the model’s ability to capture microvessels by integrating multi-scale features. It preserves local details while ensuring the completeness of global semantic information. Additionally, a guided attention skip module (GSM) combines attention mechanisms with feature fusion strategies to resolve semantic discrepancies between shallow and deep features, improving global feature extraction and structural integrity. To address the class imbalance of vascular pixels and enhance the network’s focus on vascular regions during training, a hybrid loss function is proposed, integrating the Dice coefficient and the Focal Tversky coefficient. Experimental results on the DRIVE, CHASE_DB1, and STARE datasets demonstrate that FPGS-Net outperforms existing state-of-the-art methods, achieving superior segmentation performance and validating the model’s effectiveness.
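
A hedged PyTorch sketch of a Dice + Focal Tversky hybrid loss in the spirit of the one described; the alpha/beta/gamma values and the mixing weight are illustrative defaults, not the paper's settings.

```python
# Hedged sketch: Dice + Focal Tversky hybrid loss for binary vessel masks.
# alpha/beta trade off false negatives vs. false positives; gamma focuses
# training on hard examples. All values here are placeholders.
import torch

def dice_focal_tversky_loss(probs, target, alpha=0.7, beta=0.3,
                            gamma=0.75, mix=0.5, eps=1e-6):
    """probs, target: (B, 1, H, W); probs already in [0, 1]."""
    dims = (1, 2, 3)
    tp = (probs * target).sum(dims)
    fp = (probs * (1 - target)).sum(dims)
    fn = ((1 - probs) * target).sum(dims)
    dice = 1.0 - ((2 * tp + eps) / (2 * tp + fp + fn + eps)).mean()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    focal_tversky = ((1.0 - tversky) ** gamma).mean()
    return mix * dice + (1 - mix) * focal_tversky

p = torch.rand(2, 1, 128, 128)
t = (torch.rand(2, 1, 128, 128) > 0.9).float()
print(dice_focal_tversky_loss(p, t).item())
```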

Yanmin Niu, Zhangyu Gao, Hang Wen
A New Method for Detecting Cancer Driver Genes by Constructing a Heterogeneous Network with Test-Time Training from Multi-view

Cancer results from the accumulation of driver gene mutations. Therefore, identifying cancer driver genes is a key issue for the effective treatment and diagnosis of cancer. Previous studies have focused on integrating gene networks with multi-omics datasets and using graph neural networks (GNNs) to improve prediction performance. However, relying on GNNs is insufficient to fully integrate the biological information contained in multi-omics data and the interaction information within multi-gene relationship networks. In this paper, we propose a new method, called T3HGCN, which constructs a heterogeneous graph convolutional network (HGCN) from multi-view using six gene relationship networks, and introduces a test-time training framework for the HGCN to address the above challenges. T3HGCN consists of two self-supervised contrastive learning tasks, global contrastive learning and local contrastive learning, which can enable the model to effectively learn the global interaction information of each network and the biological information of each gene, respectively. The experimental results show that, compared to the existing methods, our model achieves better results on the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC).

Mingxin Zhang, Jiayi Gao, Yuanhao Fan, Juan Wang
AMKD: Adaptive Multi-modality Knowledge Distillation for Pathological Survival Analysis

While cancer survival analysis through multi-modal integration of histopathology imaging and transcriptomic profiling provides comprehensive prognostic insights, transcriptomic profiling remains technically demanding and cost-prohibitive in clinical practice. However, existing multi-modal survival analysis frameworks have critical dependence on complete data availability during clinical deployment, rendering them operationally impractical given the incompleteness of real-world patient datasets. Considering the clinical reality of incomplete modality availability, implementing cross-modal knowledge fusion during training offers a viable solution to this dilemma, enabling the deployed histopathology-based model to maintain robust prognostic performance even when transcriptomic data are absent. In this work, we propose the Adaptive Multi-modality Knowledge Distillation (AMKD), which enables robust survival prediction using only pathology slides during inference. AMKD introduces (1) a gene-guided pathology knowledge enhancement module that refines multi-modal knowledge from the multi-model teacher, and (2) an adaptive redundancy reduction loss that dynamically balances knowledge transfer based on discrepancies between teacher-student performance during prediction. Evaluated on four TCGA datasets, AMKD achieves state-of-the-art performance (average C-index: 0.669), outperforming both unimodal and multi-modal methods.

Yangfan Xu, Linghan Cai, Yifeng Wang, Hailun Cheng, Fengchun Liu, Runming Wang, Yongbing Zhang
HED-Net: Hybrid Encoder and Decoder Network for Medical Image Segmentation

Medical image segmentation is crucial for supporting clinicians in locating lesion regions, diagnosing diseases, and planning treatment. In recent years, the Transformer architecture has become increasingly popular for medical image segmentation due to its outstanding ability in modeling long-range dependencies. Many methods combine CNN and Transformer to overcome the limitations of Transformer in local feature extraction. However, most of these methods fail to effectively utilize both CNN and Transformer to fully extract local and global semantic information. To address this issue, we propose a novel Hybrid Encoder and Decoder Network (HED-Net) for medical image segmentation, which effectively extracts local and global features by constructing a hybrid CNN-Transformer encoder and decoder. In the encoder, a mixed encoder structure is built by alternately using CNN and Transformer to extract local and global semantic information. In the decoder, a Parallel Feature Extraction Module (PFEM) is designed to combine CNN and Transformer in parallel, generating hierarchical representations while fusing low-level and high-level features to enhance segmentation performance. Experimental results show that HED-Net achieves Dice scores of 81.78% and 90.01% on the Synapse and ACDC datasets, respectively, significantly outperforming most state-of-the-art medical image segmentation networks.

Junran Liang, Wenjie Luo
YOSAM: A YOLO and MedSAM-Based Framework for Automatic Measurement of Fetal Head Circumference in Ultrasound Images

Accurate measurement of fetal head circumference (HC) in ultrasound images remains essential yet challenging for obstetric assessment, primarily due to anatomical variations across gestational stages and inherent imaging artifacts. In response to these limitations, we introduce YOSAM, a novel framework for fetal HC measurement that synergistically combines YOLOv11-based detection with our enhanced MedSAM-AD model. The MedSAM-AD integrates an Adapter layer for domain-specific feature adaptation and a Dimensional Reciprocal Attention Mixing Transformer (D-RAMiT) block for a joint spatial-channel attention mechanism into the MedSAM architecture. Within our cascaded framework, YOLOv11 first generates bounding boxes to localize the fetal head, serving as spatial prompts for MedSAM-AD to perform precise segmentation. The segmented fetal head is then processed with Canny edge detection and elliptical fitting to compute HC. Experimental results show that our approach achieves outstanding performance on standard biometric metrics of the HC18 dataset, attaining a Dice Similarity Coefficient (DSC) of 98.06 ± 1.06%, a Difference (DF) of 0.13 ± 2.47 mm, an Absolute Difference (AD) of 1.76 ± 1.74 mm, and a Hausdorff Distance (HD) of 1.18 ± 0.71 mm. With HD as the principal criterion, our method achieves state-of-the-art performance in fetal head boundary delineation.
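
The final measurement step (Canny edges, ellipse fitting, circumference) can be sketched with OpenCV as below, using Ramanujan's perimeter approximation on a toy mask; the pixel spacing and the mask itself are placeholders, and the YOLOv11/MedSAM-AD segmentation stage is not reproduced.

```python
# Hedged sketch: fit an ellipse to the edge of a binary fetal-head mask and
# approximate head circumference (HC) with Ramanujan's perimeter formula.
import cv2
import numpy as np

def head_circumference_mm(mask, mm_per_pixel=0.1):
    """mask: (H, W) binary fetal-head segmentation (0/1)."""
    edges = cv2.Canny((mask * 255).astype(np.uint8), 50, 150)
    pts = cv2.findNonZero(edges)                      # edge pixel coordinates
    (cx, cy), (d1, d2), angle = cv2.fitEllipse(pts)   # full axis lengths (px)
    a, b = d1 / 2.0, d2 / 2.0                         # semi-axes
    h = ((a - b) ** 2) / ((a + b) ** 2)
    perimeter = np.pi * (a + b) * (1 + 3 * h / (10 + np.sqrt(4 - 3 * h)))
    return perimeter * mm_per_pixel

# Toy mask: a filled, rotated ellipse standing in for the segmented head.
mask = np.zeros((256, 256), np.uint8)
cv2.ellipse(mask, (128, 128), (90, 60), 15, 0, 360, 1, thickness=-1)
print(round(head_circumference_mm(mask), 1), "mm")
```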

Zhihao Li, Liyan Chen, Li Lu, Ye Ding, Xiuxiu Hao
SAPS-ViM: Spatial Aggregation Prefix Synergistic Vision Mamba for Wheat Diseases Classification

Wheat diseases are among the main factors affecting healthy growth, yield, and quality, causing an annual global reduction in grain production of approximately 14%, a figure that continues to grow. In this study, we propose the SAPS-ViM network, a novel method that integrates a bidirectional state-space model with an advanced convolutional neural network (CNN) architecture. SAPS-ViM utilizes Multi-Scale Depthwise Convolutions (MDWC) and Multi-Step Adaptive Gated Aggregation (MAGA), which are fused into a Spatial Aggregation Block (SAB) and integrated into the Vision Mamba framework. This synergistic combination improves the network's feature learning capability by efficiently capturing contextual relationships and spatial information. Experimental results show that SAPS-ViM achieves a favorable balance between model complexity and performance through the effective coordination of multi-step depthwise convolution, adaptive gating mechanisms, and the bidirectional state-space model. Evaluation on the WPDD and LWDCD-Pro datasets demonstrates that our method achieves a classification accuracy improvement of 3.06% and 2.43%, respectively, compared to Vision Mamba, with an average accuracy increase of 4.13% over other mainstream networks of similar parameter scales. SAPS-ViM sets a new industry benchmark with superior classification accuracy, significantly exceeding established methods.

Siyuan Qin, Jinsong Wu
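
As a rough illustration of the multi-scale depthwise convolution idea mentioned in the abstract above, the PyTorch sketch below runs parallel depthwise convolutions at several kernel sizes and mixes channels with a pointwise convolution. The kernel sizes, the summation, and the class name are our assumptions; the paper's MDWC, MAGA, and SAB designs are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiScaleDepthwiseConv(nn.Module):
    """Parallel depthwise convolutions at several kernel sizes, summed and mixed."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.pointwise = nn.Conv2d(channels, channels, 1)  # channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = sum(branch(x) for branch in self.branches)    # multi-scale aggregation
        return self.pointwise(out)
```
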
Multidimensional EEG Signal Analysis and Vision Transformer-Masked Autoencoder-Based Image Processing for Alzheimer's Disease Detection

Alzheimer's disease (AD) is a neurodegenerative disorder that severely impairs patients’ cognitive functions and daily living abilities. EEG signals possess unique advantages in the early diagnosis of cognitive dysfunction. This study aims to explore an effective AD diagnostic method using EEG signals and deep learning techniques. We conducted signal analysis and image processing on EEG data from 65 subjects, including 36 AD patients and 29 healthy controls. Image feature extraction was performed using a vision transformer and a masked autoencoder, and a feature-level fusion strategy was employed to construct multidimensional EEG features. The study found that using a multilayer perceptron for AD detection achieved an accuracy of 96.92%, a sensitivity of 97.22%, and a specificity of 96.55%. The results validate the effectiveness of single-modal multidimensional EEG features in AD diagnosis and provide a scientific basis for the application of deep learning in the early diagnosis of AD.

Shu Xiang, Haobo Ling, Shiwei Chen, Meihong Wu
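
The abstract above describes feature-level fusion of Vision Transformer and Masked Autoencoder features followed by a multilayer perceptron classifier. A minimal sketch of such a fusion head is shown below; the feature dimensions, concatenation as the fusion operator, and the hidden size are assumptions on our part, since the study's exact fusion strategy is not specified here.

```python
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    """Concatenate ViT and MAE feature vectors, then classify AD vs. control with an MLP."""
    def __init__(self, vit_dim: int = 768, mae_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vit_dim + mae_dim, hidden),  # fused feature vector in
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, 2),                  # two classes: AD, healthy control
        )

    def forward(self, vit_feat: torch.Tensor, mae_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([vit_feat, mae_feat], dim=-1))
```
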
HeartOx: Efficient Multi-task Learning for Contactless Heart Rate and Blood Oxygen Estimation

Remote photoplethysmography (rPPG) noninvasively measures physiological parameters by analyzing facial blood flow variations, showing broad prospects in medical monitoring and health management applications. However, existing deep learning methods are limited to single-task estimation (HR or SpO₂ alone), failing to capture the intrinsic correlation between these vital signs. We therefore present HeartOx, a multi-task deep learning framework capable of simultaneously achieving accurate non-contact estimation of both blood oxygen saturation and heart rate. HeartOx comprises an SpO₂ branch and an HR branch and is optimized by combining a heart rate loss and a blood oxygen loss. Moreover, to improve HR estimation, we design a plug-and-play Spatio-temporal SE Attention Stack Conv module (STSE). It enhances key features while reducing noise and redundancy, using stacked spatio-temporal convolutions to better capture rPPG signal dynamics and relationships. The results show that our multi-task model achieves comparable or better performance on concurrent HR and SpO₂ estimation compared with existing task-specific models.

Mengqi Wang, Mingyu Gu, Jing Cai, Yanbing Xue
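
The joint optimization described above (combining a heart rate loss with a blood oxygen loss) can be sketched as a weighted sum of two regression losses. This is only an assumed formulation: the choice of MSE for both terms and the unit weights are ours, not the authors'.

```python
import torch
import torch.nn as nn

class MultiTaskVitalLoss(nn.Module):
    """Weighted sum of HR and SpO2 regression losses (loss types and weights are assumptions)."""
    def __init__(self, hr_weight: float = 1.0, spo2_weight: float = 1.0):
        super().__init__()
        self.hr_weight = hr_weight
        self.spo2_weight = spo2_weight
        self.mse = nn.MSELoss()

    def forward(self, hr_pred, hr_true, spo2_pred, spo2_true):
        # Total loss = w_hr * L_HR + w_spo2 * L_SpO2
        return (self.hr_weight * self.mse(hr_pred, hr_true)
                + self.spo2_weight * self.mse(spo2_pred, spo2_true))
```
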
PISynergy: A Triplet Interaction and Causal Interpretation Framework for Drug Synergy Prediction

Combination therapy is critical in cancer treatment, yet accurately identifying synergistic drug combinations and their mechanisms remains challenging due to the poor interpretability and limited generalization of existing models. Here, we introduce PISynergy, a novel synergy prediction framework that integrates precise synergy classification with causal interpretability. PISynergy consists of: (1) a triplet interaction attention (TIA)-based prediction module that explicitly models drug–drug–cell line interactions, strengthened by cross-layer residual connections for enhanced stability; and (2) a causal interpreter employing an encoder-decoder architecture with causal constraints to uncover key drug substructures and gene features underpinning synergistic interactions. Evaluations on two public datasets demonstrated that PISynergy significantly outperformed existing methods, with improvements of up to 13% in core metrics and substantially reduced prediction variance. In challenging cold-start scenarios, it exhibited remarkable generalization. Ablation studies confirmed that both TIA and the residual connections are essential for accuracy and robustness. Importantly, the causal interpreter revealed biologically meaningful substructures and validated pathways. Additionally, literature-supported novel synergistic pairs emerged from the top-ranked predictions. Thus, PISynergy provides an accurate, generalizable, and causally interpretable platform for drug synergy prediction, facilitating trustworthy insights and experimental validation of novel therapeutic combinations.

Haitao Li, Long Zheng, Yiwei Chen, Junjie Li, Chunhou Zheng, Junjie Wang, Yansen Su
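
One simple way to model the drug–drug–cell line interaction described in this abstract is to treat the three embeddings as a short token sequence and apply self-attention over it, as in the PyTorch sketch below. The embedding dimension, residual-plus-norm layout, and mean pooling are our assumptions and do not reproduce the paper's TIA module.

```python
import torch
import torch.nn as nn

class TripletInteractionHead(nn.Module):
    """Self-attention over a (drug A, drug B, cell line) token triplet, then a synergy classifier."""
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, drug_a, drug_b, cell):
        # Each input: (B, dim) embedding produced by upstream drug / cell-line encoders
        tokens = torch.stack([drug_a, drug_b, cell], dim=1)   # (B, 3, dim)
        out, _ = self.attn(tokens, tokens, tokens)            # triplet interaction
        out = self.norm(out + tokens)                         # residual connection
        return self.classifier(out.mean(dim=1))               # synergy logit
```
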
Dense Depth-Supervised Simultaneous Localization and Mapping for Robust Bronchoscopic Navigation

Bronchoscopic navigation plays an essential role in helping surgeons perceive the location of the bronchoscope and in providing guidance for minimally invasive procedures. Vision-based navigation methods, particularly visual SLAM, have gained popularity due to their convenience and simple configuration. However, the poor texture and low contrast of bronchoscopic images can cause conventional visual SLAM-based navigation systems to break down because of insufficient feature matches and a lack of reliable keypoints for pose estimation. This work proposes a new dense depth-supervised visual SLAM framework that leverages the recent success of detector-free matchers and monocular depth estimation to maintain continuous and stable tracking for autonomous bronchoscopic navigation. Specifically, our framework first employs a dense detector-free feature-matching model to obtain more widely distributed matching pairs between low-quality bronchoscopic images. Moreover, we train an accurate monocular dense depth estimation model and integrate it into visual SLAM to obtain more 3D points for camera pose estimation via Perspective-n-Point (PnP), thereby improving tracking performance. We collected CT scans and bronchoscopic videos from a hospital for evaluation. The experimental results demonstrate that our proposed framework achieves more accurate and continuous bronchoscopic navigation than state-of-the-art visual SLAM methods.

Xiuling Huang, Wenkang Fan, Hao Fang, Xiongbiao Luo
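
The depth-supervised PnP step described above can be sketched with OpenCV: back-project matched keypoints in a reference frame using the predicted dense depth, then solve for the current camera pose with RANSAC PnP. This is a minimal sketch under our own assumptions (pinhole intrinsics, no distortion, metrically scaled depth); it is not the authors' SLAM pipeline.

```python
import cv2
import numpy as np

def pose_from_depth_matches(pts_ref, pts_cur, depth_ref, K):
    """Back-project reference keypoints with predicted depth, then solve PnP.

    pts_ref, pts_cur: (N, 2) matched pixel coordinates in the reference / current frame.
    depth_ref: (H, W) predicted dense depth for the reference frame.
    K: 3x3 camera intrinsics.
    """
    K = np.asarray(K, dtype=np.float64)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = pts_ref[:, 0], pts_ref[:, 1]
    z = depth_ref[v.astype(int), u.astype(int)]                 # sample depth at keypoints
    pts3d = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts_cur.astype(np.float32), K, None)
    return rvec, tvec   # pose of the current frame relative to the reference frame
```
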
Backmatter
Title: Advanced Intelligent Computing Technology and Applications
Editors: De-Shuang Huang, Chuanlei Zhang, Qinhu Zhang, Yijie Pan
Copyright Year: 2025
Publisher: Springer Nature Singapore
Electronic ISBN: 978-981-95-0033-8
Print ISBN: 978-981-95-0032-1
DOI: https://doi.org/10.1007/978-981-95-0033-8

PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.
