
2025 | Book

Pattern Recognition and Computer Vision

7th Chinese Conference, PRCV 2024, Urumqi, China, October 18–20, 2024, Proceedings, Part XIV

Edited by: Zhouchen Lin, Ming-Ming Cheng, Ran He, Kurban Ubul, Wushouer Silamu, Hongbin Zha, Jie Zhou, Cheng-Lin Liu

Publisher: Springer Nature Singapore

Book series: Lecture Notes in Computer Science

About this book

This 15-volume set LNCS 15031-15045 constitutes the refereed proceedings of the 7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024, held in Urumqi, China, during October 18–20, 2024.

The 579 full papers presented were carefully reviewed and selected from 1526 submissions. The papers cover various topics in the broad areas of pattern recognition and computer vision, including machine learning, pattern classification and cluster analysis, neural network and deep learning, low-level vision and image processing, object detection and recognition, 3D vision and reconstruction, action recognition, video analysis and understanding, document analysis and recognition, biometrics, medical image analysis, and various applications.

Table of contents

Frontmatter

Medical Image Processing and Analysis I

Frontmatter
A Fine-Grained Recurrent Network for Image Segmentation via Vector Field Guided Refinement

Since the segmentation of regions of interest in medical images is important for clinicians, current research pursues high segmentation precision. However, the utilization and fusion of multi-scale hierarchical feature maps remain unsatisfactory. In this paper, we propose a fine-grained recurrent network (FRNet) for medical image segmentation via vector field guided refinement, which is based on the encoder-decoder structure. In the proposed network, the encoder uses a CNN to extract multi-scale feature maps. Then, in the decoder, to make better use of the multi-scale semantic features and obtain finer features, we design a new fine-grained recurrent unit for refining feature maps and score maps. Additionally, a vector field embedded in the decoder guides the improvement of upsampling accuracy and the rectification of edge segmentation. Experimental results on four datasets demonstrate that the proposed FRNet not only improves segmentation precision but also works flexibly with different CNN-based backbones.

Xinxin Shan, Yao Li, Fang Chen, Dongchu Wang, Yifan Deng
Semi-supervised Medical Image Segmentation with Strong/Weak Task-Aware Consistency

Semi-supervised learning (SSL) is increasingly employed in medical image segmentation, primarily due to the scarcity of well-labeled data, which requires precise and technical annotations at the pixel level. This study aims to elucidate task-invariant and task-specific dependencies among common representation tasks. Initially, strong- and weak-correlation tasks at various levels are categorized with respect to pixel-level segmentation. Subsequently, we introduce the Task-Aware Smoothness (TAS) Assumption, which capitalizes on task-aware perturbations within a single model while promoting task-aware consistency across correlated tasks. Building upon this assumption, we propose a novel Unified Task-aware Consistency (UniTask) framework to simultaneously unify and reinforce both strong and weak task-aware consistency for SSL. UniTask integrates two auxiliary branches onto a single backbone, each dedicated to one of the two types of correlated tasks. Specifically, our network consists of a medical segmentation (MS) branch at the pixel level, a level-set (LS) branch at the geometry level from a strong-correlation perspective, and a point set (PS) branch at the point level from a weak-correlation viewpoint. Consequently, our UniTask, with the incorporation of two additional tasks, facilitates interactions and induces inherent segmentation perturbations at three distinct levels, thereby promoting both supervised and semi-supervised learning. The proposed methods undergo extensive evaluation on inner cell mass (ICM) and left atrium (LA) datasets. Encouragingly, our strategies yield clear improvements over state-of-the-art methods, validating the efficacy of our hypothesis.

Hua Wang, Linwei Qiu, Yiming Li, Jingfei Hu, Jicong Zhang
Steerable Pyramid Transform Enables Robust Left Ventricle Quantification

Predicting cardiac indices has long been a focal point in the medical imaging community. While various deep learning models have demonstrated success in quantifying cardiac indices, they remain susceptible to mild input perturbations, e.g., spatial transformations, image distortions, and adversarial attacks. This vulnerability undermines confidence in using learning-based automated systems for diagnosing cardiovascular diseases. In this work, we describe a simple yet effective method to learn robust models for left ventricle (LV) quantification, encompassing cavity and myocardium areas, directional dimensions, and regional wall thicknesses. Our success hinges on employing the biologically inspired steerable pyramid transform (SPT) for fixed front-end processing, which offers three main benefits. First, the basis functions of SPT align with the anatomical structure of LV and the geometric features of the measured indices. Second, SPT facilitates weight sharing across different orientations as a form of parameter regularization and naturally captures the scale variations of LV. Third, the residual highpass subband can be conveniently discarded, promoting robust feature learning. Extensive experiments on the Cardiac-Dig benchmark show that our SPT-augmented model not only achieves reasonable prediction accuracy compared to state-of-the-art methods, but also exhibits significantly improved robustness against input perturbations. Code is available at https://github.com/yangyangyang127/RobustLV .

Xiangyang Zhu, Kede Ma, Wufeng Xue
Semantics Guided Disentangled GAN for Chest X-Ray Image Rib Segmentation

Label annotation for chest X-ray image rib segmentation is time-consuming and laborious, and the labeling quality heavily relies on the medical knowledge of annotators. To reduce the dependency on annotated data, existing works often utilize generative adversarial networks (GANs) to generate training data. However, GAN-based methods overlook the nuanced information specific to individual organs, which degrades the generation quality of chest X-ray images. Hence, we propose a novel Semantics guided Disentangled GAN (SD-GAN), which fully utilizes the semantic information of different organs to generate high-quality training data for chest X-ray image rib segmentation. In particular, we use three ResNet50 branches to disentangle features of different organs, and then use a decoder to combine the features and generate the corresponding images. To ensure that the generated images correspond semantically to the input organ labels, we employ a semantics guidance module to perform semantic guidance on the generated images. To evaluate the efficacy of SD-GAN in generating high-quality samples, we introduce a modified TransUNet (MTUNet), a specialized segmentation network designed for multi-scale contextual information extraction and multi-branch decoding, effectively tackling the challenge of organ overlap. We also propose a new chest X-ray image dataset (CXRS). It includes 1250 samples from various medical institutions. Lungs, clavicles, and 24 ribs are simultaneously annotated on each chest X-ray image. The visualization and quantitative results demonstrate the efficacy of SD-GAN in generating high-quality chest X-ray image-mask pairs. Using the generated data, our trained MTUNet overcomes the limitations of data scale and outperforms other segmentation networks.

Lili Huang, Dexin Ma, Xiaowei Zhao, Chenglong Li, Haifeng Zhao, Jin Tang, Chuanfu Li
MedPrompt: Cross-modal Prompting for Multi-task Medical Image Translation

The ability to translate medical images across different modalities is crucial for synthesizing missing data and aiding in clinical diagnosis. However, existing learning-based techniques have limitations when it comes to capturing cross-modal and global features. These techniques are often tailored to specific pairs of modalities, limiting their practical utility, especially considering the variability of missing modalities in different cases. In this study, we introduce MedPrompt, a multi-task framework designed to efficiently translate diverse modalities. Our framework incorporates the Self-adaptive Prompt Block, which dynamically guides the translation network to handle different modalities effectively. To encode the cross-modal prompt efficiently, we introduce the Prompt Extraction Block and the Prompt Fusion Block. Additionally, we leverage the Transformer model to enhance the extraction of global features across various modalities. Through extensive experimentation involving five datasets and four pairs of modalities, we demonstrate that our proposed model achieves state-of-the-art visual quality and exhibits excellent generalization capability. The results highlight the effectiveness and versatility of MedPrompt in addressing the challenges associated with cross-modal medical image translation.

Xuhang Chen, Shenghong Luo, Chi-Man Pun, Shuqiang Wang
Enhancing Hippocampus Segmentation: SwinUNETR Model Optimization with CPS

Deep learning techniques have made remarkable strides in medical image segmentation, overcoming many challenges associated with traditional methods. Despite their success, these techniques typically rely on large amounts of manually annotated data, which is both costly and requires expert knowledge for accurate annotation. Additionally, the need for substantial computational power, especially when processing three-dimensional images, further complicates their application. To address these challenges, this paper presents a novel optimization method called Combining Parallel and Sequential Strategy (CPS). This method leverages an efficient parameter transfer learning strategy that integrates the strengths of LoRA and Adapter. CPS retains the original knowledge structure of the pre-trained model while updating only a minimal number of parameters, thereby reducing the risk of overfitting. We employ CPS to enhance the state-of-the-art SwinUNETR model for medical image segmentation. Initially pre-trained on the BraTS2021 dataset, this enhanced model is subsequently applied to three hippocampal datasets. The results reveal that CPS significantly outperforms existing methods, increasing the Dice coefficient by an average of 1.14% and decreasing the HD95 by an average of 0.767 compared to the LoRA method. These findings highlight the effectiveness of our fine-tuning method in leveraging limited data resources, marking a significant advancement in the field of hippocampus segmentation.

Wangang Cheng, Guanghua He, Hancan Zhu
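
A minimal PyTorch sketch of the parallel-plus-sequential idea behind CPS, wrapping a frozen linear layer with a LoRA-style branch and a bottleneck adapter; the module name, rank, and bottleneck width are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class CPSLinear(nn.Module):
    """Frozen linear layer with a parallel LoRA branch and a sequential adapter.

    Illustrative combination of a 'parallel' (LoRA-style) and a 'sequential'
    (adapter-style) parameter-efficient branch; not the authors' exact design.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, bottleneck: int = 64):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # keep pre-trained weights fixed
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Parallel low-rank update (LoRA-style): W x + B A x
        self.lora_down = nn.Linear(d_in, rank, bias=False)
        self.lora_up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.lora_up.weight)         # start as an identity mapping
        # Sequential bottleneck adapter applied after the (augmented) output
        self.adapter = nn.Sequential(
            nn.Linear(d_out, bottleneck), nn.GELU(), nn.Linear(bottleneck, d_out)
        )

    def forward(self, x):
        h = self.base(x) + self.lora_up(self.lora_down(x))   # parallel branch
        return h + self.adapter(h)                            # sequential residual branch

# Usage: wrap the projection layers of a pre-trained transformer block.
layer = CPSLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 196, 768))
```
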
Uncertainty-Inspired Credible Pseudo-Labeling in Semi-Supervised Medical Image Segmentation

Semi-Supervised Medical Image Segmentation (SSMIS) has significantly reduced the need for manual labeling by utilizing unlabeled data and has made considerable progress; however, there are still issues with errors from noisy pseudo-labels and limited utilization of pseudo-label information. To address these two challenges, we propose a novel Uncertainty-Inspired Credible Pseudo-Labeling (UCPL) framework for SSMIS. UCPL leverages uncertainty estimation, which indicates the reliability of predictions, to guide the Semi-Supervised Learning (SSL) process. Boosted by this uncertainty estimation, UCPL benefits from acquiring more reliable pseudo-labels and enhances learning efficiency from unlabeled data. Specifically, our approach starts by estimating uncertainty to obtain uncertainty maps. These uncertainty maps then guide the proposed Class-aware Uncertainty Region-Paste (CURP) and Uncertainty-aware Thresholding (UAT). CURP selectively replaces the most uncertain regions in unlabeled images with matching class regions from labeled images, improving the credibility of pseudo-labels. By considering the model's real-time learning state through uncertainty, the proposed UAT dynamically adjusts the confidence threshold and balances the involvement of pseudo-labels against the noise they contain. Experiments on two public medical image segmentation datasets reveal that our method outperforms existing SSL methods. The code will be released at https://github.com/Duckyee728/UCPL.git.

Zhiyu Zheng, Liang Lv, Bo Ni
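
A minimal sketch of uncertainty-aware pseudo-label thresholding in the spirit of UAT: per-pixel entropy serves as the uncertainty map, and the confidence threshold is relaxed when the batch is, on average, still uncertain. The threshold schedule and constants are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def entropy_uncertainty(logits):
    """Per-pixel predictive entropy as an uncertainty map, shape (B, H, W)."""
    p = torch.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1)

def uncertainty_aware_mask(logits, base_tau=0.9):
    """Keep pseudo-labels only where confidence exceeds a dynamic threshold.

    The threshold is lowered when the model is on average still uncertain,
    a simple stand-in for the 'real-time learning state' used by UAT.
    """
    prob, pseudo = torch.softmax(logits, dim=1).max(dim=1)
    u = entropy_uncertainty(logits)
    # normalise mean uncertainty by the entropy of a uniform distribution
    u_mean = (u.mean() / torch.log(torch.tensor(float(logits.shape[1])))).clamp(0, 1)
    tau = base_tau * (1.0 - 0.5 * u_mean)       # higher uncertainty -> lower threshold
    keep = prob.ge(tau).float()                  # mask of credible pixels
    return pseudo, keep

# Usage inside a semi-supervised training step:
logits_u = torch.randn(4, 3, 64, 64)             # predictions on an unlabeled batch
pseudo, keep = uncertainty_aware_mask(logits_u)
loss_u = (F.cross_entropy(logits_u, pseudo, reduction="none") * keep).sum() / keep.sum().clamp_min(1.0)
```
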
MFPNet: Mixed Feature Perception Network for Automated Skin Lesion Segmentation

Skin lesion segmentation methods are capable of autonomously segmenting the lesion area, thereby providing precise support for lesion diagnosis and treatment. In dermoscopic images, global contextual information facilitates the prediction of the lesion's central area, while local detail information aids in predicting the lesion's boundary. Nevertheless, the majority of existing networks do not fully exploit these two types of features. Consequently, this paper proposes a Mixed Feature Perception Network (MFPNet) that effectively integrates global contextual features and local features to proficiently segment the lesion area. Additionally, we devise a Multi-scale Feature Perception Module (MFPM) that mitigates the loss of spatial information during downsampling, thereby assisting the backbone in reconstructing mixed features. Ultimately, we employ an Adaptive Decoder (AD) to amalgamate the features transmitted by the encoder and the upper-level decoder to enhance the network's robustness. Experiments demonstrate that MFPNet surpasses other state-of-the-art methods on both the ISIC2018 dataset and the PH² dataset. Our code is open-source and available at https://github.com/XYQ1517/MFPNet.

Youqiang Xiong, Di Yuan, Lu Li, Xiu Shu
LD-BSAM: Combined Latent Diffusion with Bounding SAM for HIFU Target Region Segmentation

The performance of the segment anything model (SAM) is satisfactory on natural images, but it exhibits obvious performance degradation and limited generalization ability in the context of high-intensity focused ultrasound (HIFU) treatment monitoring images. There are two main problems: (1) the dataset of HIFU treatment monitoring images is excessively rare due to data collection limitations and privacy protection; (2) the noise and artefacts of B-mode ultrasonic images lead to low contrast and blurred boundaries in the target region. In this work, we propose a method combining Latent Diffusion with a bounding SAM for HIFU target region segmentation, called LD-BSAM. We design and incorporate a data filtering module in the Latent Diffusion model to generate high-quality HIFU ultrasound surveillance images to assist in BSAM training. At the same time, an innovative feature extractor and a bounding extractor are added to SAM to extract the HIFU treatment target region more accurately. The experimental results show that the ultrasound surveillance images generated by the Latent Diffusion model in this paper exhibit better FID and LPIPS metrics than those of other generative models. Compared with 19 other state-of-the-art segmentation models, the model in this paper performs best on the ultrasound surveillance dataset of uterine fibroids for HIFU clinical treatment. To further explore the generalisability of the proposed algorithm, validation was continued on the breast ultrasound public datasets (BUSI, BUSC, BUS) and the thyroid ultrasound public dataset TN3K. The code, data and models will be released at https://github.com/425877/LD-BSAM.

Jintao Zhai, Feng Tian, Fangfang Ju, Xiao Zou, Shengyou Qian
Hierarchical Decoder with Parallel Transformer and CNN for Medical Image Segmentation

With the success of Transformers, hybrid Transformer and CNN methods have gained considerable popularity in medical image segmentation. These methods utilize a hybrid architecture that combines Transformers and CNNs to fuse global and local information, supplemented by a pyramid structure to facilitate multi-scale interaction. However, they encounter two primary limitations: (i) Transformers struggle to capture complete global information due to the sliding-window nature of the convolutional operator, and (ii) the pyramid structure within a single decoder fails to provide sufficient multi-scale interaction necessary for restoring detailed features at higher levels. In this paper, we introduce the Hierarchical Decoder with Parallel Transformer and CNN (HiPar), a novel architecture designed to address these limitations. Firstly, we present a parallel structure of Transformer and CNN to maximize the capture of both global and local features. Subsequently, we propose a hierarchical decoder to model multi-scale information and progressively restore spatial details. Additionally, we incorporate lightweight components to enhance the efficiency of feature representation. Extensive experiments demonstrate that our HiPar achieves state-of-the-art results on three popular medical image segmentation benchmarks: Synapse, ACDC and GlaS.

Shijie Li, Yu Gong, Qingyuan Xiang, Zheng Li
Class-Aware Cross Pseudo Supervision Framework for Semi-Supervised Multi-organ Segmentation in Abdominal CT Scans

Automatic multi-organ segmentation in abdominal computed tomography (CT) scans is crucial for accurate computer-aided diagnosis. Nowadays, numerous semi-supervised learning (SSL) techniques have been introduced to leverage the vast amount of unlabeled data. However, the class imbalance issue still impedes accurate segmentation, particularly for small organs in multi-organ segmentation. To address this issue, we propose a class-aware cross pseudo supervision (C²PS) framework, which is built upon the cross pseudo supervision (CPS) method. Specifically, our approach enhances network learning for small organs in unlabeled data through a dynamic threshold-based consistency (DTC) loss, while a dedicated organ-specific weighted (OSW) loss is designed for labeled data. We make full use of the label distributions of each organ and the pseudo-label distributions of each organ output by the model in order to direct the model's attention towards smaller organs. Results on public benchmarks show that our method outperforms competing SSL techniques, as demonstrated by improved mean Dice (2.36%–3.44%) and mean Jaccard (2.67%–3.51%) on the FLARE2022 dataset, and improved mean Dice (0.93%–1.81%) and mean Jaccard (0.99%–2.13%) on the AMOS2022 dataset.

Deqian Yang, Haochen Zhao, Gaojie Jin, Hui Meng, Lijun Zhang
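
A minimal sketch of an organ-specific weighted loss in the spirit of the OSW loss: per-class weights are derived from organ label frequencies so that smaller organs contribute more to the loss. The weighting formula and class count are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def organ_size_weights(labels, num_classes, eps=1e-6):
    """Per-class weights inversely related to organ size in the labeled batch.

    Smaller organs receive larger weights; a sketch of the idea behind an
    organ-specific weighted (OSW) loss, not the paper's exact formula.
    """
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    freq = counts / counts.sum().clamp_min(eps)
    w = 1.0 / (freq + eps)
    return w / w.sum() * num_classes             # normalise weights around 1

def osw_loss(logits, labels, num_classes):
    weights = organ_size_weights(labels, num_classes).to(logits.device)
    return F.cross_entropy(logits, labels, weight=weights)

# Usage on a labeled abdominal CT batch (13 organ classes + background = 14):
logits = torch.randn(2, 14, 96, 96)
labels = torch.randint(0, 14, (2, 96, 96))
loss = osw_loss(logits, labels, num_classes=14)
```
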
APAN: Anti-curriculum Pseudo-Labelling and Adversarial Noises Training for Semi-supervised Medical Image Classification

Pseudo-label semi-supervised learning (SSL) has gained extensive adoption in medical image analysis, with many methods allocating pseudo-labels to unlabeled samples through probability thresholds. However, this approach is prone to introducing incorrect pseudo-labels, leading to confirmation bias, and it cannot effectively handle multi-class and multi-label problems. Additionally, the unselected samples are often ignored and not fully utilized. To address these issues, a novel SSL method named APAN is proposed in this paper, utilizing anti-curriculum learning and adversarial noise training. Instead of employing a probability threshold approach, we opt to select high-information samples for pseudo-labeling, thereby empowering our model to effectively address multi-label and multi-class problems. Moreover, we introduce adversarial noise to smooth the decision boundary on unselected samples rather than discarding them outright. We conducted extensive experiments on two public medical image datasets, Chest X-Ray14 and ISIC2018, demonstrating the feasibility of our approach.

Junfan Chen, Jun Yang, Anfei Fan, Jinyin Jia, Chiyu Zhang, Wei Li
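
A minimal sketch of anti-curriculum, information-based sample selection as described in the abstract: instead of a fixed probability threshold, unlabeled samples are ranked by predictive entropy and the most informative ones are pseudo-labeled first (a multi-label case is shown). The selection size and entropy measure are assumptions.

```python
import torch

def select_high_information(logits, k):
    """Pick the k most informative (highest-entropy) unlabeled samples.

    Anti-curriculum selection sketch: rank samples by summed per-label binary
    entropy and pseudo-label the hardest ones first; not the paper's exact rule.
    """
    p = torch.sigmoid(logits)                                     # multi-label probabilities
    ent = -(p * p.clamp_min(1e-8).log() + (1 - p) * (1 - p).clamp_min(1e-8).log())
    info = ent.sum(dim=1)                                         # per-sample information
    selected = info.topk(k).indices
    pseudo = (p[selected] > 0.5).float()                          # multi-label pseudo-labels
    return selected, pseudo

# Usage on a batch of Chest X-Ray14 predictions (14 findings, multi-label):
logits = torch.randn(32, 14)
idx, pseudo = select_high_information(logits, k=8)
```
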
Multi-Modal Learning for Predicting the Progression of Transarterial Chemoembolization Therapy in Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) is marked by high morbidity and is often diagnosed in middle or late stages. Transarterial chemoembolization (TACE) stands as the current standard of care for intermediate-stage HCC patients. Nevertheless, the tumor's heterogeneity significantly impacts patient prognosis. In this paper, a new dynamic multi-modal graph network fusing multi-sequence magnetic resonance imaging (MRI) is proposed to predict the prognosis of HCC patients after TACE treatment. The model comprises a spatial graph convolution module focusing on active regions within the tumor, a multi-module dynamic fusion module capturing the potential relationship between the tumor and the liver, and a cross-modal topology fusion module using topological information to guide the multi-sequence MRI fusion. Our method achieved the best results compared with state-of-the-art methods, with an ACC of 75.27%, AUC of 76.69%, F1 of 73.84%, C-index of 0.6978, and HR of 3.1988.

Lingzhi Tang, Haibo Shao, Jinzhu Yang, Jiachen Xu, Jiao Li, Yong Feng, Jiayuan Liu, Song Sun, Qisen Wang
Growing with the Help of Multiple Teachers: Lightweight and Noise-Resistant Student Model for Medical Image Classification

In recent years, the development of medical imaging technology has transformed imaging solutions from laboratory-based to point-of-care imaging with real-time capabilities. However, these point-of-care devices are often constrained by environmental factors such as ambient light and noise, leading to poor image quality and consequently affecting their diagnostic accuracy. Furthermore, due to the need for lightweight models in point-of-care devices, traditional models fail to meet requirements in terms of computational resources, model parameters, and inference time. Therefore, to address the aforementioned issues, this paper proposes an optimized lightweight student model that focuses on residual information. A lightweight structure based on Shift MLP is designed on the residual branch of the model to enhance the model's capability to acquire spatial feature information at multiple scales. Simultaneously, we propose a multi-teacher distillation strategy to improve the accuracy and noise resistance of the student model. Firstly, we introduce an adaptive learning approach based on auxiliary teachers, leveraging unlabeled and noisy data for adaptive learning to enhance the model's robustness. Then, we design a global teacher model to enhance the accuracy of the student model and indirectly improve the teaching ability of the auxiliary teacher model, thereby achieving knowledge transfer at a global level. We evaluate our approach on two public medical image classification datasets, and the results demonstrate that while almost maintaining accuracy, we reduce the number of parameters by 38 times, decrease computational complexity by 11 times, and achieve an inference time of only 18.94 ms on CPU.

Yucheng Song, Jincan Wang, Yifan Ge, Zhifang Liao, Peng Lan, Jia Guo, Lifeng Li
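
A minimal sketch of combining supervision from a global teacher and an auxiliary teacher with standard soft-target distillation; the temperature and loss weights are illustrative assumptions, and the paper's adaptive learning on noisy data is not reproduced here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss (Hinton-style) at temperature T."""
    p_t = torch.softmax(teacher_logits / T, dim=1)
    log_p_s = torch.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def multi_teacher_loss(student_logits, global_t_logits, aux_t_logits,
                       labels=None, w_global=0.5, w_aux=0.3):
    """Combine a global teacher and an auxiliary teacher for one student.

    Loss weights are illustrative; any adaptive weighting is not shown.
    """
    loss = w_global * kd_loss(student_logits, global_t_logits) \
         + w_aux * kd_loss(student_logits, aux_t_logits)
    if labels is not None:                        # labeled (clean) samples only
        loss = loss + F.cross_entropy(student_logits, labels)
    return loss

# Usage: teachers run in eval mode under no_grad; the student is trained.
s = torch.randn(8, 5, requires_grad=True)
g, a = torch.randn(8, 5), torch.randn(8, 5)
loss = multi_teacher_loss(s, g, a, labels=torch.randint(0, 5, (8,)))
```
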
DRA-CN: A Novel Dual-Resolution Attention Capsule Network for Histopathology Image Classification

The automatic classification of histopathological images plays a crucial role in cancer diagnosis. However, most existing medical image classification studies based on Capsule Network (CapsNet) suffer from issues such as overly localized feature extraction, redundancy in low-level capsule information within routing mechanisms, and inadequate learning of category-specific features for multiclass tasks. This work proposes a novel Dual-Resolution Attention Capsule Network (DRA-CN) tailored for histopathological image classification, which achieves precise classification. During the image feature learning step, DRA-CN introduces a dynamic routing optimization strategy for capsule features and a dual-resolution attention feature fusion strategy to enhance the network's capability to capture image information. To strengthen category learning, DRA-CN integrates a category attention block to enhance classification performance. Experimental results demonstrate the superior performance of DRA-CN in histopathological image classification tasks, outperforming existing models. On the ChaoYang multi-class dataset, DRA-CN achieved a classification accuracy of 97.50%, an F1-score of 84.26%, a precision of 84.54%, and a recall of 83.98%.

Palidan Tursun, Siyu Li, Min Li, Xiaoyi Lv, Cheng Chen, Chen Chen, Yunling Wang
A Mask Guided Network for Self-supervised Low-Dose CT Imaging

Self-supervised low-dose Computed Tomography (LDCT) imaging methods have demonstrated significant clinical potential as they can train an efficient denoising model without high-quality normal-dose CT (NDCT) images. However, existing methods only focus on improving the overall quality of the images, potentially resulting in the loss of details in critical areas when subjected to high levels of noise. To address this issue, we develop a mask guided network to enhance the quality of the desired regions in a self-supervised manner. Firstly, an adaptive organ segmentation model is trained with efficient fine-tuning strategies based on the Segment Anything Model. Secondly, we utilize the proposed segmentation model to generate mask embeddings for each LDCT image, incorporating both positional information of the target (e.g., liver and kidney) and latent image features. Finally, we propose a novel noise reduction network that incorporates mask embedding into self-supervised learning to recover high-quality CT images. Comprehensive comparisons and analyses on two datasets demonstrate that the proposed method achieves excellent performance in suppressing overall noise and improving imaging quality in key areas.

Qianyu Wu, Yunbo Gu
Dental Diagnosis from X-Ray Panoramic Radiography Images: A Dataset and A Hybrid Framework

Deep neural networks have displayed promising performance in various fields, including biometrics, medical image processing and analysis, as well as dental healthcare. However, deep learning solutions have not yet become the norm in routine dental practice, mainly due to the scarcity of dental datasets. To address this challenge, we have built a dataset called the Quadruple Dental X-ray Panoramic (Quad-DXP) Dataset, specifically targeted at the recognition of dental disease and treatment. This dataset annotates nine types of dental issues (disease or treatment), making it the dental panoramic dataset with the most abundant annotation types so far. We further propose a framework for dental pathological issue identification on panoramic radiographs. This framework takes a panoramic X-ray image as input, feeds it into a series of neural network modules, and then produces recognition results for dental disease/treatment and enumeration detection. We have achieved satisfactory experimental results under the supervision of dentists and experts, which proves the effectiveness and reliability of our framework in dental diagnosis. This work can assist dentists in formulating treatment plans and improving dental healthcare.

Gege Shan, Xiaoliang Ma, Xiaojie Bai, Hongzhou Zhu, Ting Wang, Shengji Zhu, Lei Wang
Edge-Guided Bidirectional-Attention Residual Network for Polyp Segmentation

Precise polyp segmentation provides important information in the early detection of colorectal cancer in clinical practice. However, it is a challenging task for two major reasons: 1) the color and texture of polyps are very similar to surrounding mucosa especially in the edge area; 2) the polyps often vary largely in scale, shape and location. To this end, we propose an edge-guided bidirectional-attention residual network (EBRNet) equipped with an edge-guided bidirectional-attention residual module (EBRM) and a context enrichment layer (CEL). The proposed EBRM focuses on both foreground and background regions for detail recovery and noise suppression to capture the camouflaged polyps in cluttered tissue, and introduces edge cues for accurate boundaries. The CEL enriches the contextual semantics in multiple levels to adaptively detect the polyps in various sizes, shapes and locations. Extensive experiments on five benchmark datasets demonstrate that our EBRNet performs favorably against most state-of-the-art methods under different evaluation metrics. The source code will be publicly available at https://github.com/LanhooNg/EBRNet .

Lanhu Wu, Miao Zhang, Yongri Piao, Zhiwei Li, Huchuan Lu
From Coarse to Fine: A Novel Colon Polyp Segmentation Method Like Human Observation

Colon polyp screening is critical for the prevention of colon cancer, and the use of colon polyp segmentation to assist physicians in identifying potential polyps can improve detection efficiency and reduce misdiagnoses and missed diagnoses. However, polyp segmentation encounters the following challenges: (1) the size and shape of polyps vary widely; (2) the edge between polyps and the surrounding normal area is not obvious. To address the above challenges, a novel colon polyp segmentation method like human observation (LHONet) is proposed. This approach aims to align the polyp segmentation network more closely with human cognitive processes. First, the rough outline of the colon polyp image is identified to roughly understand the size and shape of the polyp, and then the polyp edge is finely segmented. The network structure consists of three modules: the Rough Outline Generation (ROG) module is designed to generate the rough outline of colon polyps; the Edge Information Extraction (EIE) module extracts the edge information of the polyps more accurately by combining with classical edge detection techniques; and the Outline Feature Clarifying (OFC) module is devised to supplement the edge information into the rough outline to realize the accurate segmentation of polyps. The method was compared with other methods on five datasets, EndoScene, CVC-ClinicDB, KvasirSEG, CVC-ColonDB, and ETIS-LaribPolypDB, achieving mDice scores of 90.79%, 94.46%, 92.13%, 82.00%, and 82.29%, respectively. The codes are available at https://github.com/heyeying/LHONet.

Wei Wang, Huiying Sun, Xin Wang
Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-label Medical Image Classification

The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased notable zero-shot classification abilities on medical images. However, these methods have limitations in leveraging extensive pre-trained knowledge from broader datasets, and often depend on manual prompt construction by expert radiologists. By automating the process of prompt tuning, prompt learning techniques have emerged as an efficient way to adapt VLMs to downstream tasks. Yet, existing CoOp-based strategies fall short in producing class-specific prompts for unseen categories, limiting generalizability in fine-grained scenarios. To overcome these constraints, we introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP). Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on the prior knowledge of multi-modal features. Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts. Comparative evaluations on various multi-label chest radiograph datasets affirm the superiority of our approach against leading medical vision-language and multi-label prompt learning methods. The source code is available at https://github.com/fallingnight/PsPG.

Yaoqin Ye, Junjie Zhang, Hongwei Shi
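
A minimal sketch of an RNN-based decoder that autoregressively emits a sequence of prompt embeddings conditioned on image and class-name features, in the spirit of PsPG; the dimensions, the GRU cell, and the conditioning scheme are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class PseudoPromptDecoder(nn.Module):
    """Autoregressively generate a class-tailored prompt sequence.

    A sketch of an RNN-based pseudo-prompt generator conditioned on
    image and class features; not the authors' exact implementation.
    """
    def __init__(self, img_dim=512, cls_dim=512, prompt_dim=512, prompt_len=16):
        super().__init__()
        self.prompt_len = prompt_len
        self.init_h = nn.Linear(img_dim + cls_dim, prompt_dim)  # condition on image + class
        self.cell = nn.GRUCell(prompt_dim, prompt_dim)
        self.start = nn.Parameter(torch.zeros(1, prompt_dim))   # learned start token

    def forward(self, img_feat, cls_feat):                      # (B, img_dim), (B, cls_dim)
        h = torch.tanh(self.init_h(torch.cat([img_feat, cls_feat], dim=-1)))
        tok = self.start.expand(img_feat.size(0), -1)
        prompts = []
        for _ in range(self.prompt_len):                        # autoregressive unrolling
            h = self.cell(tok, h)
            tok = h                                             # feed the output back in
            prompts.append(h)
        return torch.stack(prompts, dim=1)                      # (B, prompt_len, prompt_dim)

# Usage: the generated pseudo-prompts would be prepended to the class-name
# embedding and passed to the (frozen) text encoder of a CLIP-like VLM.
decoder = PseudoPromptDecoder()
prompts = decoder(torch.randn(4, 512), torch.randn(4, 512))
```
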
Multi-Perspective Text-Guided Multimodal Fusion Network for Brain Tumor Segmentation

In brain tumor segmentation research, there is considerable interest in fully exploring the potential of all modalities. Most multi-modal fusion segmentation methods rely mainly on traditional discrete label representation learning, focusing solely on utilizing image data for segmentation tasks. The creation of multi-modal medical imaging datasets requires specialized knowledge and is time-consuming, making it difficult to achieve large-scale datasets. Networks that rely solely on images are prone to bottlenecks due to the limitations in the quantity and quality of available images. With the emergence of pre-trained visual-language models, establishing spatial structural consistency between image and text data enables text information to serve as a prompt, guiding models to achieve significant performance. This approach also aids in establishing spatial structural consistency between image and text data. Inspired by these insights, we propose a multi-perspective text-guided multi-modal fusion segmentation network. This network provides semantic guidance for feature extraction and fusion, and for output result deorthogonality, through modal and class text prompts, respectively. Our method outperforms existing approaches, achieving superior segmentation performance as demonstrated by evaluation on the BraTS2020 and BraTS2021 datasets.

Huanping Zhang, Yi Zhang, Guoxia Xu, Jiangpeng Zheng, Meng Zhao
Continual Learning for Fundus Image Segmentation

Accurately segmenting biomarkers in fundus images is crucial for the recognition of retinal diseases. While most existing segmentation methods are fully supervised and limited to handling single tasks, fundus image datasets labelled with partial classes are sequentially constructed from various medical institutions in real-world scenarios. Consequently, dynamically extending a model to new datasets and classes is essential for training a unified segmentation model. It is rather challenging without the availability of previous datasets and annotations due to storage and privacy restrictions. In this paper, we propose a novel replay-free continual segmentation method for fundus images. To address the issue of overlapping class regions in fundus images, we introduce a learning procedure that generates pseudo labels separately for old classes. Additionally, to tackle the problem of imbalanced class distribution in fundus images, we combine class-agnostic knowledge distillation and class prototype contrastive learning for old knowledge transfer to achieve a better balance between stability and plasticity. Furthermore, considering the impact of different task orders on the quality of generated pseudo labels, we employ a dynamic forget gate to selectively utilize knowledge distillation for further improving plasticity. Through extensive experiments conducted on multiple fundus image datasets with varying task orders, our proposed method demonstrates its effectiveness by significantly outperforming other comparison methods at every incremental step.

Yufan Liu
Embedded Deep Learning Based CT Images for Rifampicin Resistant Tuberculosis Diagnosis

In the treatment of tuberculosis (TB), drug-resistant tuberculosis arises when Mycobacterium tuberculosis undergoes genetic mutations or acquires resistance through horizontal gene transfer. Identifying the treatment response of TB patients to Rifampicin, a principal medication for TB treatment, is essential for healthcare professionals to make timely and accurate diagnoses. Not only can this approach save on the costs and duration of TB treatment, but it also helps prevent the disease’s spread and fatalities. Traditional methods for diagnosing Rifampicin-resistant TB involve molecular biology tests and drug susceptibility testing, which are time-consuming, expensive, and labor-intensive. To assist physicians in diagnosing the treatment response of TB patients to Rifampicin more rapidly and efficiently, this study introduces a computer-aided diagnostic algorithm based on Embedded Deep Learning (EDL). Initially, CT images from target patients at two imaging centers were collected. The classifier model used in this research combines image preprocessing techniques, three convolutional neural networks, and decision fusion technology to enhance the model’s classification efficiency and reduce overfitting. Additionally, the Grad-CAM model was utilized for visualizing the areas of lesions. In the test sets from both centers, the Embedded Deep Learning Model (EDL Model) demonstrated superior performance over other models by combining hard voting or soft voting mechanisms, with an average accuracy improvement of 3.16–16.87%, AUC increase of 3.05–12.66%, and F1-score enhancement of 6.38–22.49%. The diagnostic tool developed in this research for assisting in the diagnosis of TB patients’ response to Rifampicin treatment has significant clinical potential, particularly in settings lacking specialized radiological expertise.

Wenjun Li, Jiaojiao Xiang, Huan Peng, Wanjun Ma, Weijun Liang
Combining Segment Anything Model with Domain-Specific Knowledge for Semi-Supervised Learning in Medical Image Segmentation

The Segment Anything Model (SAM) exhibits a capability to segment a wide array of objects in natural images, serving as a versatile perceptual tool for various downstream image segmentation tasks. In contrast, medical image segmentation tasks often rely on domain-specific knowledge (DSK). In this paper, we propose a novel method (SamDSK) that combines the segmentation foundation model (i.e., SAM) with domain-specific knowledge (DSK) for reliable utilization of unlabeled images in building a medical image segmentation model. Our new method is iterative and consists of two main stages: (1) segmentation model training; (2) expanding the labeled set by using the trained segmentation model, an unlabeled set, SAM, and domain-specific knowledge. These two stages are repeated until no more samples are added to the labeled set. A novel optimal-matching-based method is developed for combining the SAM-generated segmentation proposals and pixel-level and image-level DSK for constructing annotations of unlabeled images in the iterative stage (2). In experiments, we demonstrate the effectiveness of our proposed method for breast cancer segmentation in ultrasound images, polyp segmentation in endoscopic images, and skin lesion segmentation in dermoscopic images. The code is available at github.com/yizhezhang2000/SamDSK.

Yizhe Zhang, Tao Zhou, Ye Wu, Pengfei Gu, Shuo Wang
Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation

Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contrast compared to the background, further complicating the segmentation task. To address these challenges, we present the first large-scale perirectal metastatic lymph node CT image dataset called Meply, which encompasses pixel-level annotations of 269 patients diagnosed with rectal cancer. Furthermore, we introduce a novel lymph-node segmentation model named CoSAM. The CoSAM utilizes sequence-based detection to guide the segmentation of metastatic lymph nodes in rectal cancer, contributing to improved localization performance for the segmentation model. It comprises three key components: sequence-based detection module, segmentation module, and collaborative convergence unit. To evaluate the effectiveness of CoSAM, we systematically compare its performance with several popular segmentation methods using the Meply dataset. The code can be accessed at: https://github.com/kanydao/CoSAM .

Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Jun Li, Peiquan Jin
Swin-HAUnet: A Swin-Hierarchical Attention Unet For Enhanced Medical Image Segmentation

Medical image segmentation plays a pivotal role in computer-aided diagnosis and treatment planning. Traditional segmentation approaches often struggle to balance global and local context, either capturing overall anatomical structures or focusing on minute details, but not both. This paper introduces the Swin-Hierarchical Attention Unet (Swin-HAUnet), which harmonizes this dichotomy by integrating global contextual insights with local feature enhancement. The proposed network architecture employs a hybrid approach, leveraging an advanced transformer-based encoder to process wide-ranging contextual information and an attention-enhanced decoder to refine the segmentation of nuanced and intricate anatomical features. We performed experiments on two publicly available datasets, the Synapse multi-organ segmentation CT dataset and the UW-Madison dataset. The Swin-HAUnet shows a marked improvement in performance, achieving a Dice similarity coefficient of 79.91%, a notable increase of 1.26% over the baseline model on Synapse datasets. These results underscore the model’s effectiveness in complex segmentation tasks and the importance of attention mechanisms in medical image analysis.

Jiarong Chen, Xuyang Zhang, Rongwen Li, Peng Zhou
ODC-SA Net: Orthogonal Direction Enhancement and Scale Aware Network for Polyp Segmentation

Accurate polyp segmentation is crucial for the early detection of colorectal cancer. However, existing polyp detection methods sometimes ignore multi-directional features and the drastic scale changes of concealed targets. To address these challenges, we design an Orthogonal Direction Enhancement and Scale Aware Network (ODC-SA Net) for polyp segmentation. The Orthogonal Direction Convolutional (ODC) block can extract multi-directional features using transposed rectangular convolution kernels that form sets of orthogonal feature vector bases, which addresses the issue of random changes in feature direction. Additionally, the Multi-scale Fusion Attention (MSFA) mechanism is proposed to emphasize scale changes in both spatial and channel dimensions, enhancing the segmentation accuracy for polyps of varying sizes. An Extraction with Re-attention (ERA) module is used to recombine effective features, and a Shallow Reverse Attention (SRA) mechanism is used to enhance polyp edges with low-level information. Extensive experiments conducted on public datasets demonstrate that the performance of this model is superior to that of state-of-the-art methods.

Chenhao Xu, Yudian Zhang, Kaiye Xu, Haijiang Zhu
Two-Stage Multi-scale Feature Fusion for Small Medical Object Segmentation

Accurate segmentation of small abnormal regions or anatomical structures in medical images, such as brain tumors and inferior alveolar nerve canals, is crucial for early disease diagnosis and surgical treatments. Most convolutional neural network-based medical image segmentation models still struggle with segmenting small objects due to limited pixel information and unclear features. To address this issue, we propose a method for small medical object segmentation based on multi-scale feature fusion and two-stage joint learning. In the first stage, a coarse segmentation network extracts multi-level features from the medical image and aggregates multi-scale high-level features to obtain a coarse-scaled result that includes location information about the objects. Then, a spatial transformation module passes the output of the coarse-scaled stage to the fine-scaled stage via pixel-wise spatial weighting. In the second stage, a shallow denoising module utilizes deep features as guidance for shallow ones to suppress background noise. This results in clearer multi-scale features that enable accurate segmentation mask generation for small targets. The proposed method was tested on three small object segmentation tasks, achieving top-ranking metrics such as mean Dice. These results demonstrate the superiority of our method over other state-of-the-art methods. The code is available at https://github.com/wth-stack/MFTN.

Tianhao Wang, Xinli Xu, Cheng Zheng, Haixia Long, Haigen Hu, Qiu Guan, Jianmin Yang
A Two-Stage Automatic Collateral Scoring Framework Based on Brain Vessel Segmentation

Accurately assessing the collateral circulation is critically essential for making acute ischemic stroke treatment plans. Current automatic methods usually rely on a single-stage CNN classifier, which typically requires a huge amount of data for training and thus struggles to cope with the challenge of limited data in clinical practice. To achieve an objective and efficient collateral circulation assessment under small datasets, we propose a two-stage automatic collateral scoring framework composed of a brain vessel segmentation and a scoring classifier. In the segmentation stage, we introduce an improved U-Net named BVU-Net, which can address the diverse and scattered brain vessel morphology in CTA images and achieve more precise segmentation results. In the assessment stage, we propose the Seg-based Vessel Indicator Set (SVIS), comprising four vessel quantification indicators extracted from the output masks of BVU-Net. Using SVIS, classifiers are evaluated on a small clinical dataset of 191 patients. The experiment results demonstrate that the proposed framework significantly outperforms single-stage CNN classifiers, showing substantial advantages and providing a valuable reference for clinical decision-making.

Tianxu Zhang, Hui Huang, Yan Ma, Bingcang Huang, Weiping Lu, Ao Xu
SPARK: Cross-Guided Knowledge Distillation with Spatial Position Augmentation for Medical Image Segmentation

For medical image segmentation, some distillation methods have yielded impressive results, but in these approaches student models normally fail to obtain the knowledge they most need to focus on during the feature distillation procedure. Therefore, in this paper, we propose Spatial Position Augmentation cRoss-guided Knowledge distillation, called SPARK. Specifically, we first enhance intermediate features by spatial position, allowing student models to acquire the knowledge most relevant to segmentation and thus segment at more precise locations. Furthermore, we design a novel Cross-Guided Distillation (CGD) in which the student model can acquire old knowledge to avoid forgetting, and acquire new knowledge from the teacher model to obtain the learning direction. Thanks to that, student models can segment more precisely, especially for small targets. Besides, by transferring knowledge from well-trained but heavy teacher models to another lightweight model, we address the problem that most existing segmentation models depend on massive storage and complex computations and cannot be used in current clinical settings. To validate the effectiveness of the proposed method, we conduct experiments on two widely used CT datasets, LiTS17 and KiTS19. The results demonstrate that our approach significantly improves the segmentation performance of lightweight models, with improvements of up to 32.69% in the Dice coefficient score.

Lingbing Xu, Zhiyuan Wang, Weitao Song, Yi Ji, Chunping Liu
VATBoost-Net: Integrating Enhanced Feature Perturbation and Detail Enhancement for Medical Image Segmentation

Medical image segmentation is of great help to clinical practice. However, due to the high annotation costs and substantial time requirements for medical images, fully supervised medical image segmentation models often suffer from insufficient data volume, leading to poor performance. To tackle the annotation challenge of medical image segmentation, we introduce a semi-supervised learning algorithm called vatMatch, which relies on strong-weak consistency perturbation. It leverages partially annotated data to guide the training on unlabeled data and can achieve results comparable to fully supervised learning. Additionally, due to the uneven distribution of feature information across various channels in medical images, existing mainstream Transformer-based methods are not effective in extracting feature information from individual image channels, and typically incur significant computational resource usage during the feature extraction process. To extract more feature information across different channels using minimal computational resources, we propose a Pixel Fusion Module (PFA) and an Adaptive Large Kernel Convolution (ALK) to enhance feature information extraction. By integrating PFA, ALK, and vatMatch, we propose a detail-enhanced attention network called VATBoost-Net, which not only addresses the annotation challenge but also enhances the capability of extracting image feature information. Experimental results demonstrate the effectiveness of VATBoost-Net. In particular, our method outperforms the current state-of-the-art (SOTA) semi-supervised learning segmentation method UniMatch on the ACDC dataset.

Baichen Liu, Jiaxin Cai, Shunzhi Zhu
DTIL-Net: Dual-Task Interactive Learning Network for Automated Grading of Diabetic Retinopathy and Macular Edema

Diabetic retinopathy (DR) has been a leading cause of blindness and is commonly accompanied by the complication of diabetic macular oedema (DME). Automatic grading of diabetic retinopathy and diabetic macular oedema can reduce the risk of blindness. However, previous studies have focused only on grading DR or DME, often ignoring the interactive association between these two diseases. In this paper, we introduce the dual-task interactive learning network (DTIL-Net) for automated grading of DR and DME. DTIL-Net aims to explore and exploit the potential correlation between DR and DME. It consists of two main components: the attention module (AM) and the dual branch exchange module (DBEM). Specifically, we propose the Attention Module to create independent branches of lesion representations for these two diseases, thus enabling cross-channel interactions between high-level semantic features. Further, we introduce the Dual Branch Exchange Module (DBEM) to facilitate the exchange of lesion feature information between the two branches through feature squeezing and excitation operations, thus establishing an intrinsic link between DR and DME. In addition, considering the fine-grained nature of lesion features, we introduce a local diversity discrimination loss (LDD loss) to encourage the network to focus on more discriminative lesion regions. Extensive experiments on the Messidor-1 and IDRiD datasets show that DTIL-Net achieves superior results over existing state-of-the-art methods. On the Messidor-1 dataset, DR and DME classification outperforms most other methods (DR: AUC 97.0%, Acc 93.0%; DME: AUC 92.6%, Acc 91.2%).

Jie Long, Yumei Tan, Shuxiang Song, Haiying Xia
DeformSegNet: Segmentation Network Fused with Deformation Field for Pancreatic CT Scans

Accurate pancreatic segmentation in CT scans is increasingly important in the early detection and diagnosis of pancreatic cancer. Though deep neural network-based pancreatic segmentation methods have achieved significant success in recent years, due to the low contrast and indistinct boundaries of the pancreas in CT scans, precise segmentation of small pancreatic tumors remains an extremely challenging task. To address this challenge, we propose a novel network named DeformSegNet (DSN), which is composed of three components: a Localization Module, a Deformation Field Module and an Adaptive Segmentation Module. First of all, the Localization Module is employed to capture the local Region of Interest (ROI) of a small pancreatic lesion. The local ROI is then passed through the Deformation Field Module, which relocates it to the center of a new image and magnifies it to the same size as the original input, enhancing the neighborhood information and texture. After that, the Adaptive Segmentation Module is used to perform fine segmentation on the local image. Finally, the Deformation Field Module is employed again to adjust the segmentation output back to its original size. Experiments are conducted on two datasets: NIH and a private dataset provided by collaborating hospitals. The proposed network achieved Dice Similarity Coefficient (DSC) scores of 87.02±4.12% and 85.94%, respectively, demonstrating that our proposed method outperforms most SOTA models.

Dezhang Ye, Qiu Guan, Zehan Zhang, Jianmin Yang, Haigen Hu, Yang Chen, Feng Chen
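
A minimal sketch of the re-centre-and-magnify step that a deformation field module could perform on a localized ROI, implemented here with an axis-aligned affine grid and bilinear sampling; the ROI box and the transform parameterization are illustrative assumptions, not the paper's deformation field.

```python
import torch
import torch.nn.functional as F

def roi_to_theta(box, H, W):
    """Affine parameters mapping a full-size grid onto an ROI box (x1, y1, x2, y2).

    Sampling with this theta re-centres and magnifies the ROI to fill the whole
    canvas (axis-aligned only; a sketch, not the paper's deformation field).
    """
    x1, y1, x2, y2 = box
    sx, sy = (x2 - x1) / W, (y2 - y1) / H                 # ROI scale
    tx, ty = (x1 + x2) / W - 1.0, (y1 + y2) / H - 1.0     # ROI centre in [-1, 1] coords
    return torch.tensor([[sx, 0.0, tx], [0.0, sy, ty]]).unsqueeze(0)

def zoom_roi(img, box):
    _, _, H, W = img.shape
    theta = roi_to_theta(box, H, W).to(img.dtype)
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)  # ROI fills the output image

# Usage: zoom into a pancreas ROI, segment the zoomed crop, then a transform
# describing the reverse mapping would place the mask back at original size.
ct_slice = torch.randn(1, 1, 256, 256)
zoomed = zoom_roi(ct_slice, box=(96, 80, 192, 176))
```
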
InsSegLN: A Novel 3D Instance Segmentation Method for Mediastinal Lymph Node

The status of mediastinal lymph nodes plays an important role in accurate clinical staging, treatment selection and prognosis improvement of cancer patients. However, the contrast between lymph nodes and surrounding tissues in computed tomography (CT) images is low, and the size of lymph nodes varies, making manual identification and statistics of lymph nodes time-consuming and inefficient. Moreover, there is little research on instance segmentation of 3D volumetric images. In this paper, we propose a novel 3D image instance segmentation framework called InsSegLN and establish a benchmark for the challenging mediastinal lymph node instance segmentation task. InsSegLN is the first end-to-end 3D instance segmentation model directly processing 3D volumetric data for lymph nodes. It divides the instance segmentation task into a segmentation subtask and a detection subtask, and obtains the instance segmentation result by integrating the results of the two subtasks. To solve problems such as blurred edges and large size differences of lymph nodes, we build a new backbone and improve the feature pyramid and detection heads, and we use border-core representations to supervise training, which helps the model identify individual touching lymph nodes. Finally, we provide a simple but effective method to integrate the detection result with the segmentation result. We conducted experiments on a public dataset and an in-house dataset of mediastinal lymph nodes and validated the effectiveness of our improvements through an ablation study. Compared with the baseline method, on the public dataset, InsSegLN improves AP from 0.1971 to 0.2605 when the IoU threshold is set to 0.5, and improves mAP from 0.0764 to 0.1269 when the IoU threshold ranges from 0.5 to 0.9. InsSegLN also achieves significant performance improvement on our in-house dataset, showing the effectiveness of our method.

Jingyu Xie
RRANet: A Reverse Region-Aware Network with Edge Difference for Accurate Breast Tumor Segmentation in Ultrasound Images

Breast UltraSound (BUS) image segmentation is crucial for the diagnosis and analysis of breast cancer. However, most existing methods for BUS tend to overlook vital edge information. Meanwhile, noise, similar intensity distributions, and varying tumor shapes and sizes lead to severe missed detections and false detections. To address these issues, we propose a Reverse Region-Aware Network with Edge Difference, called RRANet, which learns edge information and region information from low-level features and high-level features, respectively. Specifically, we first design Edge Difference Convolution (EDC) to fully mine edge information. EDC aggregates intensity and gradient information to obtain edge details of low-level features in both horizontal and vertical directions. Next, we propose a Multi-Scale Adaptive Module (MSAM) that can effectively extract global information from high-level features. MSAM encodes features in the spatial dimension, which expands the receptive field and captures more local contextual information. In addition, we develop the Reverse Region-Aware Module (RRAM) to gradually refine the global information. This module can establish the relationship between region and edge cues while correcting some erroneous predictions. Finally, the edge information and global information are fused to improve the prediction accuracy of BUS images. Extensive experiments on three challenging public BUS datasets show that our model outperforms several state-of-the-art medical image segmentation methods.

Zhengyu Chen, Xiaoning Song, Yang Hua, Wenjie Zhang
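
A minimal sketch of aggregating intensity and horizontal/vertical gradient information on low-level features, in the spirit of the Edge Difference Convolution; the fixed Sobel kernels and the 1x1 fusion are illustrative assumptions rather than the paper's EDC definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeDifferenceConv(nn.Module):
    """Fuse a vanilla convolution (intensity) with fixed horizontal/vertical
    difference filters (gradient) on low-level features.

    A sketch of the intensity + gradient aggregation idea; kernels and the
    fusion layer are illustrative assumptions, not the paper's EDC.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.in_ch = in_ch
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        k = torch.stack([sobel_x, sobel_y]).unsqueeze(1)               # (2, 1, 3, 3)
        self.register_buffer("grad_kernel", k.repeat(in_ch, 1, 1, 1))  # depthwise filters
        self.fuse = nn.Conv2d(in_ch * 2 + out_ch, out_ch, 1)

    def forward(self, x):
        intensity = self.conv(x)
        grad = F.conv2d(x, self.grad_kernel, padding=1, groups=self.in_ch)
        return self.fuse(torch.cat([intensity, grad], dim=1))

# Usage on shallow encoder features of a BUS image:
edc = EdgeDifferenceConv(32, 32)
out = edc(torch.randn(2, 32, 128, 128))
```
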
Learning Frequency and Structure in UDA for Medical Object Detection

In medical imaging applications, particularly in cardiac and skeletal analysis, anatomical structure detection is crucial for diagnosing cardiac and other diseases. However, the domain gap between images acquired from different sources or modalities poses a significant challenge and impedes model generalization across diverse patient populations and imaging conditions. Bridging this gap is particularly essential in image-based diagnosis, where subtle variations in anatomical structures and imaging characteristics can profoundly impact diagnostic performance. Taking fetal cardiac ultrasound images as an example, this paper proposes a novel method for unsupervised domain adaptive fetal cardiac structure detection. The method integrates both the frequency-based distributional properties and the anatomical structural information inherent in medical images. Specifically, we introduce a Frequency Distribution Alignment (FDA) module and an Organ Structure Alignment (OSA) module to mitigate detection misalignment across different hospital settings. We demonstrate the effectiveness of these modules through extensive experiments. Our method significantly improves the performance of fetal cardiac structure detection, enabling adaptation to diverse hospital scenarios and showcasing its potential in addressing domain gaps in medical imaging.

Liwen Wang, Xiaoyan Zhang, Guannan He, Ying Tan, Shengli Li, Bin Pu, Zhe Jin, Wen Sha, Xingbo Dong
Skin Lesion Segmentation Method Based on Global Pixel Weighted Focal Loss

Utilizing deep neural networks for automatic segmentation of skin lesion images represents a significant advancement in current research. The issue of class imbalance poses a major challenge in most skin lesion datasets, as skin cancer patients are often far fewer in number than patients with other common skin conditions. This disparity makes it difficult for the network to learn features of minority lesion classes, leading to suboptimal segmentation results. To address this problem, we propose the Global Pixel Weighted Focal Loss (GPW-FL) function. Unlike the Focal Loss used in object detection to tackle foreground-background class imbalance, GPW-FL applies a modulation term to each pixel in an image to adjust its weight. Our approach focuses on reducing the loss assigned to all pixels in well-segmented lesion images during training. Specifically, GPW-FL utilizes the Dice loss during training to assess the overall segmentation status of input lesions, enabling the network to pay more attention to poorly segmented skin lesions and improve segmentation performance. To evaluate the effectiveness of GPW-FL, we integrate it into a traditional U-Net segmentation network for training. Experimental results on the ISIC2018 dermoscopic skin lesion dataset and the XJUSL clinical skin lesion dataset demonstrate that our proposed loss function is robust on class-imbalanced datasets, and its segmentation performance surpasses that of baseline models and other state-of-the-art segmentation models.

Aolun Li, Jinmiao Song, Long Yu, Shuang Liang, Shengwei Tian, Xin Fan, Zhezhe Zhu, Xiangzuo Huo
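
A minimal sketch of a global pixel-weighted focal loss: the per-image soft Dice score modulates the weight of every pixel in that image, so well-segmented lesions contribute less to training. The (1 - Dice)^gamma modulation is an assumption about the exact form, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def soft_dice(prob, target, eps=1e-6):
    """Per-image soft Dice score for binary masks of shape (B, 1, H, W)."""
    dims = (1, 2, 3)
    inter = (prob * target).sum(dims)
    return (2 * inter + eps) / (prob.sum(dims) + target.sum(dims) + eps)

def gpw_focal_loss(logits, target, gamma=2.0):
    """Down-weight all pixels of images that are already well segmented.

    The per-image modulation (1 - Dice)^gamma sketches the global
    pixel-weighting idea; the paper's exact weighting may differ.
    """
    prob = torch.sigmoid(logits)
    w = (1.0 - soft_dice(prob.detach(), target)) ** gamma          # (B,)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (w.view(-1, 1, 1, 1) * bce).mean()

# Usage on a batch of dermoscopic lesion masks:
logits = torch.randn(4, 1, 128, 128)
target = (torch.rand(4, 1, 128, 128) > 0.5).float()
loss = gpw_focal_loss(logits, target)
```
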
Competing Dual-Network with Pseudo-Supervision Rectification for Semi-Supervised Medical Image Segmentation

Semi-supervised medical image segmentation utilizes a large number of unlabeled images in combination with a limited number of labeled images for model training and optimization, significantly reducing the reliance on large-scale labeled images. However, due to the model's cognitive biases, the distribution gap between labeled and unlabeled images, and potential noise in the pseudo-supervision process, learning robust representations from a large number of unlabeled images is still a challenging task. To address these issues, we propose a new framework of Competing Dual-Network with Pseudo-Supervision Rectification (CDPR), which integrates a bidirectional copy-paste mechanism for single image pairs and a pseudo-supervision rectification strategy into the architecture of the competing dual-network. Through the competing dual-network, we encourage two segmentation networks to engage in mutual learning and competition, which helps break the model's cognitive biases. We utilize the bidirectional copy-paste technique for single image pairs to establish a consistent learning strategy for both labeled and unlabeled data, thereby better aligning the data distribution. Finally, by optimizing the pseudo-supervised loss, the negative impact of potential noise on the model's segmentation performance during the pseudo-supervision stage is effectively alleviated. Experimental results on the benchmark dataset demonstrate that our method achieves outstanding performance compared with several state-of-the-art methods.

Ping Zhou, Feng Chen, Bingwen Hu, Zhen Tang, Heng Liu, Meiyu Du
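
A minimal sketch of bidirectional copy-paste for a single labeled/unlabeled image pair: a shared mask pastes a labeled patch onto the unlabeled image and vice versa, and supervision mixes ground truth with pseudo-labels using the same mask. The mask shape and placement are illustrative assumptions, not the paper's exact mechanism.

```python
import torch

def center_mask(shape, ratio=0.5, device="cpu"):
    """Binary mask with a centred rectangle (1 inside, 0 outside)."""
    B, _, H, W = shape
    m = torch.zeros(B, 1, H, W, device=device)
    h, w = int(H * ratio), int(W * ratio)
    top, left = (H - h) // 2, (W - w) // 2
    m[:, :, top:top + h, left:left + w] = 1.0
    return m

def bidirectional_copy_paste(img_l, lbl_l, img_u, pseudo_u):
    """Mix one labeled and one unlabeled image in both directions.

    Supervision for each mixed image combines the ground truth and the
    pseudo-label with the same mask; a sketch, not the paper's pipeline.
    """
    m = center_mask(img_l.shape, device=img_l.device)
    img_in = m * img_l + (1 - m) * img_u          # labeled patch on unlabeled background
    img_out = m * img_u + (1 - m) * img_l         # unlabeled patch on labeled background
    lbl_in = m * lbl_l + (1 - m) * pseudo_u
    lbl_out = m * pseudo_u + (1 - m) * lbl_l
    return (img_in, lbl_in), (img_out, lbl_out)

# Usage: feed both mixed images to the two competing networks and supervise
# each with the corresponding mixed label map.
img_l, img_u = torch.randn(1, 1, 112, 112), torch.randn(1, 1, 112, 112)
lbl_l = torch.randint(0, 2, (1, 1, 112, 112)).float()
pseudo_u = torch.randint(0, 2, (1, 1, 112, 112)).float()
pair_in, pair_out = bidirectional_copy_paste(img_l, lbl_l, img_u, pseudo_u)
```
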
Scribbled-Supervised Meibomian Gland Segmentation via Perturbation and Conflict in Dual-Branch Network

Meibomian gland dysfunction (MGD) is the leading cause of dry eye, and the accurate segmentation of meibomian glands in infrared meibography images is essential for its evaluation and diagnosis. However, obtaining pixel-wise manual annotation remains a highly time-consuming and knowledge-intensive task, severely hindering the progress of segmentation technologies. Using weak annotations, such as scribbles, has demonstrated promise in reducing annotation cost. However, these weakly supervised methods still suffer from the limited supervision of sparse annotations. In this paper, we propose a novel scribble-supervised segmentation model for meibomian glands, exploring the perturbation and conflict characteristics inherent in the dual-branch structure. Our model leverages an additional branch to induce perturbations, generating pseudo-labels by dynamically mixing the predictions of both branches. This approach enriches the supervision information and mitigates the risk of convergence to local optima. Meanwhile, an uncertainty-based separated self-training strategy is introduced to handle conflicting predictions, guiding the model to discern and extract valuable information from predictions with varying confidence levels. Experimental results on an internal dataset demonstrate that our approach achieves outstanding performance. It outperforms existing state-of-the-art methods, even with just 30% of the usual annotation amount.

Lingjie Lin, Kunfeng Lai, Yushun Huang, Li Li, Jiawen Lin
Backmatter
Metadata
Title
Pattern Recognition and Computer Vision
Edited by
Zhouchen Lin
Ming-Ming Cheng
Ran He
Kurban Ubul
Wushouer Silamu
Hongbin Zha
Jie Zhou
Cheng-Lin Liu
Copyright year
2025
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9784-96-7
Print ISBN
978-981-9784-95-0
DOI
https://doi.org/10.1007/978-981-97-8496-7