2021 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II

Editors: Prof. Dr. Marleen de Bruijne, Prof. Dr. Philippe C. Cattin, Stéphane Cotin, Nicolas Padoy, Prof. Stefanie Speidel, Yefeng Zheng, Caroline Essert

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

The eight-volume set LNCS 12901, 12902, 12903, 12904, 12905, 12906, 12907, and 12908 constitutes the refereed proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2021, held in Strasbourg, France, in September/October 2021.*

The 542 revised full papers presented were carefully reviewed and selected from 1809 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: image segmentation

Part II: machine learning - self-supervised learning; machine learning - semi-supervised learning; and machine learning - weakly supervised learning

Part III: machine learning - advances in machine learning theory; machine learning - domain adaptation; machine learning - federated learning; machine learning - interpretability / explainability; and machine learning - uncertainty

Part IV: image registration; image-guided interventions and surgery; surgical data science; surgical planning and simulation; surgical skill and work flow analysis; and surgical visualization and mixed, augmented and virtual reality

Part V: computer aided diagnosis; integration of imaging with non-imaging biomarkers; and outcome/disease prediction

Part VI: image reconstruction; clinical applications - cardiac; and clinical applications - vascular

Part VII: clinical applications - abdomen; clinical applications - breast; clinical applications - dermatology; clinical applications - fetal imaging; clinical applications - lung; clinical applications - neuroimaging - brain development; clinical applications - neuroimaging - DWI and tractography; clinical applications - neuroimaging - functional brain networks; clinical applications - neuroimaging – others; and clinical applications - oncology

Part VIII: clinical applications - ophthalmology; computational (integrative) pathology; modalities - microscopy; modalities - histopathology; and modalities - ultrasound

*The conference was held virtually.

Table of Contents

Frontmatter

Machine Learning - Self-Supervised Learning

Frontmatter
SSLP: Spatial Guided Self-supervised Learning on Pathological Images

Nowadays, there is an urgent requirement for self-supervised learning (SSL) on whole slide pathological images (WSIs) to relieve the demand for fine-grained expert annotations. However, the performance of SSL algorithms on WSIs has long lagged behind their supervised counterparts. To close this gap, in this paper, we fully explore the intrinsic characteristics of WSIs and propose SSLP: Spatial Guided Self-supervised Learning on Pathological Images. We argue that patch-wise spatial proximity is a significant characteristic of WSIs which, if properly employed, provides abundant supervision for free. Specifically, we explore three types of semantic invariance: 1) self-invariance between different augmented views of the same patch, 2) intra-invariance among patches within a spatial neighborhood, and 3) inter-invariance with their corresponding neighbors in the feature space. As a result, our SSLP model achieves 82.9% accuracy and 85.7% AUC on CAMELYON linear classification and 95.2% fine-tuning accuracy on cross-disease classification on NCTCRC, outperforming the previous state-of-the-art algorithm and matching the performance of a supervised counterpart.

Jiajun Li, Tiancheng Lin, Yi Xu
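
For readers wanting a concrete picture of the intra-invariance term above, the sketch below builds a spatial-neighbour positive mask from patch grid coordinates on a slide; the neighbourhood radius and the PyTorch formulation are illustrative assumptions, not the authors' released implementation.

```python
import torch

def spatial_positive_mask(coords, radius=1):
    """coords: (N, 2) integer grid positions of N patches tiled from one WSI."""
    # Chebyshev distance between every pair of patch positions
    d = (coords[:, None, :] - coords[None, :, :]).abs().max(dim=-1).values
    mask = d <= radius                # neighbours within `radius` grid steps are positives
    mask.fill_diagonal_(False)        # the self-pair is handled by the self-invariance term
    return mask                       # (N, N) boolean mask of intra-invariance positives
```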
Segmentation of Left Atrial MR Images via Self-supervised Semi-supervised Meta-learning

Deep learning algorithms for cardiac MRI segmentation depend heavily upon abundant, labelled data located at a single medical centre. Clinical settings, however, contain abundant, unlabelled and scarce, labelled data located across distinct medical centres. To account for this, we propose a unified pre-training framework, entitled self-supervised semi-supervised meta-learning (S⁴ML), that exploits distributed labelled and unlabelled data to quickly and reliably perform cardiac MRI segmentation given scarce, labelled data from a potentially different distribution. We show that S⁴ML outperforms baseline methods when adapting to data from a novel medical centre, cardiac chamber, and MR sequence. We also show that this behaviour holds even in extremely low-data regimes.

Dani Kiyasseh, Albert Swiston, Ronghua Chen, Antong Chen
Deformed2Self: Self-supervised Denoising for Dynamic Medical Imaging

Image denoising is of great importance for medical imaging systems, since it can improve image quality for disease diagnosis and downstream image analyses. In a variety of applications, dynamic imaging techniques are utilized to capture the time-varying features of the subject, where multiple images are acquired for the same subject at different time points. Although the signal-to-noise ratio of each time frame is usually limited by the short acquisition time, the correlation among different time frames can be exploited to improve denoising results with shared information across time frames. With the success of neural networks in computer vision, supervised deep learning methods show prominent performance in single-image denoising, but they rely on large datasets with clean-vs-noisy image pairs. Recently, several self-supervised deep denoising models have been proposed, achieving promising results without needing the pairwise ground truth of clean images. In the field of multi-image denoising, however, very little work has been done on extracting correlated information from multiple slices for denoising with self-supervised deep learning methods. In this work, we propose Deformed2Self, an end-to-end self-supervised deep learning framework for dynamic imaging denoising. It combines single-image and multi-image denoising to improve image quality and uses a spatial transformer network to model motion between different slices. Further, it only requires a single noisy image with a few auxiliary observations at different time frames for training and inference. Evaluations on phantom and in vivo data with different noise statistics show that our method has comparable performance to other state-of-the-art unsupervised or self-supervised denoising methods and outperforms them under high noise levels.

Junshen Xu, Elfar Adalsteinsson
Imbalance-Aware Self-supervised Learning for 3D Radiomic Representations

Radiomics can quantify the properties of regions of interest in medical image data. Classically, they account for pre-defined statistics of shape, texture, and other low-level image features. Alternatively, deep learning-based representations are derived from supervised learning but require expensive annotations and often suffer from overfitting and data imbalance issues. In this work, we address the challenge of learning the representation of a 3D medical image for effective quantification under data imbalance. We propose a self-supervised representation learning framework to learn high-level features of 3D volumes as a complement to existing radiomics features. Specifically, we demonstrate how to learn image representations in a self-supervised fashion using a 3D Siamese network. More importantly, we deal with data imbalance by exploiting two unsupervised strategies: a) sample re-weighting, and b) balancing the composition of training batches. When combining the learned self-supervised features with traditional radiomics, we show significant improvement in brain tumor classification and lung cancer staging tasks covering MRI and CT imaging modalities. Code is available at https://github.com/hongweilibran/imbalanced-SSL.

Hongwei Li, Fei-Fei Xue, Krishna Chaitanya, Shengda Luo, Ivan Ezhov, Benedikt Wiestler, Jianguo Zhang, Bjoern Menze
Self-supervised Visual Representation Learning for Histopathological Images

Self-supervised learning provides a possible solution to extract effective visual representations from unlabeled histopathological images. However, existing methods either fail to make good use of domain-specific knowledge, or rely on side information like spatial proximity and magnification. In this paper, we propose CS-CO, a hybrid self-supervised visual representation learning method tailored for histopathological images, which integrates advantages of both generative and discriminative models. The proposed method consists of two self-supervised learning stages: cross-stain prediction (CS) and contrastive learning (CO), both of which are designed based on domain-specific knowledge and do not require side information. A novel data augmentation approach, stain vector perturbation, is specifically proposed to serve contrastive learning. Experimental results on the public dataset NCT-CRC-HE-100K demonstrate the superiority of the proposed method for histopathological image visual representation. Under the common linear evaluation protocol, our method achieves 0.915 eight-class classification accuracy with only 1,000 labeled samples, which is about 1.3% higher than a fully-supervised ResNet18 classifier trained with all 89,434 labeled training samples. Our code is available at https://github.com/easonyang1996/CS-CO.

Pengshuai Yang, Zhiwei Hong, Xiaoxu Yin, Chengzhan Zhu, Rui Jiang
Contrastive Learning with Continuous Proxy Meta-data for 3D MRI Classification

Traditional supervised learning with deep neural networks requires a tremendous amount of labelled data to converge to a good solution. For 3D medical images, it is often impractical to build a large homogeneous annotated dataset for a specific pathology. Self-supervised methods offer a new way to learn a representation of the images in an unsupervised manner with a neural network. In particular, contrastive learning has shown great promise by (almost) matching the performance of fully-supervised CNNs on vision tasks. Nonetheless, this method does not take advantage of available meta-data, such as a participant’s age, viewed as prior knowledge. Here, we propose to leverage continuous proxy meta-data in the contrastive learning framework by introducing a new loss called y-Aware InfoNCE loss. Specifically, we improve the positive sampling during pre-training by adding more positive examples whose proxy meta-data are similar to the anchor’s, assuming they share similar discriminative semantic features. With our method, a 3D CNN model pre-trained on 10⁴ multi-site healthy brain MRI scans can extract relevant features for three classification tasks: schizophrenia, bipolar diagnosis and Alzheimer’s detection. When fine-tuned, it also outperforms a 3D CNN trained from scratch on these tasks, as well as state-of-the-art self-supervised methods. Our code is made publicly available.

Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, Michele Wessa, Paolo Brambilla, Pauline Favre, Mircea Polosan, Colm McDonald, Camille Marie Piguet, Mary Phillips, Lisa Eyler, Edouard Duchesnay, the Alzheimer’s Disease Neuroimaging Initiative
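
As a rough illustration of the idea, the following sketch weights InfoNCE positives with a Gaussian kernel on a continuous proxy label such as age; the kernel choice, bandwidth `sigma`, and temperature `tau` are assumptions for illustration, not the paper's exact y-Aware InfoNCE implementation.

```python
import torch
import torch.nn.functional as F

def y_aware_info_nce(z1, z2, y, sigma=5.0, tau=0.1):
    """z1, z2: (N, D) embeddings of two augmented views; y: (N,) continuous meta-data."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                                  # (N, N) scaled cosine similarities
    # kernel on the proxy meta-data: samples with similar y become soft positives
    w = torch.exp(-((y[:, None] - y[None, :]) ** 2) / (2 * sigma ** 2))
    w = w / w.sum(dim=1, keepdim=True)                       # normalise positive weights per anchor
    log_p = F.log_softmax(sim, dim=1)
    return -(w * log_p).sum(dim=1).mean()
```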
Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-supervised Learning

The objective of this work is to segment any arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice (i.e., semi-automatic 3D segmentation). We show that high accuracy can be achieved by simply propagating the 2D slice segmentation with an affinity matrix between consecutive slices, which can be learnt in a self-supervised manner, namely slice reconstruction. Specifically, we compare our proposed framework, termed Sli2Vol, with supervised approaches and two other unsupervised/self-supervised slice registration approaches on 8 public datasets (both CT and MRI scans), spanning 9 different SOIs. Without any parameter tuning, the same model achieves superior performance with Dice scores (0–100 scale) of over 80 for most of the benchmarks, including ones that are unseen during training. Our results show the generalizability of the proposed approach across data from different machines and with different SOIs: a major use case of semi-automatic segmentation methods where fully supervised approaches would normally struggle.

Pak-Hei Yeung, Ana I. L. Namburete, Weidi Xie
Self-supervised Longitudinal Neighbourhood Embedding

Longitudinal MRIs are often used to capture the gradual deterioration of brain structure and function caused by aging or neurological diseases. Analyzing this data via machine learning generally requires a large number of ground-truth labels, which are often missing or expensive to obtain. Reducing the need for labels, we propose a self-supervised strategy for representation learning named Longitudinal Neighborhood Embedding (LNE). Motivated by concepts in contrastive learning, LNE explicitly models the similarity between trajectory vectors across different subjects. We do so by building a graph in each training iteration defining neighborhoods in the latent space so that the progression direction of a subject follows the direction of its neighbors. This results in a smooth trajectory field that captures the global morphological change of the brain while maintaining the local continuity. We apply LNE to longitudinal T1w MRIs of two neuroimaging studies: a dataset composed of 274 healthy subjects, and Alzheimer’s Disease Neuroimaging Initiative (ADNI, N = 632). The visualization of the smooth trajectory vector field and superior performance on downstream tasks demonstrate the strength of the proposed method over existing self-supervised methods in extracting information associated with normal aging and in revealing the impact of neurodegenerative disorders. The code is available at https://github.com/ouyangjiahong/longitudinal-neighbourhood-embedding.

Jiahong Ouyang, Qingyu Zhao, Ehsan Adeli, Edith V. Sullivan, Adolf Pfefferbaum, Greg Zaharchuk, Kilian M. Pohl
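
The sketch below illustrates the neighbourhood constraint in its simplest form: each subject's latent trajectory is pushed to align with the mean trajectory of its k nearest neighbours in latent space. The k-NN graph construction and the cosine alignment term are illustrative assumptions rather than the exact LNE objective.

```python
import torch
import torch.nn.functional as F

def trajectory_alignment_loss(z_t0, z_t1, k=5):
    """z_t0, z_t1: (N, D) latent codes of N subjects at an earlier and a later visit."""
    delta = z_t1 - z_t0                                    # per-subject trajectory vectors
    dist = torch.cdist(z_t0, z_t0)                         # pairwise distances in latent space
    knn = dist.topk(k + 1, largest=False).indices[:, 1:]   # k nearest neighbours, excluding self
    neigh_dir = delta[knn].mean(dim=1)                     # average neighbour trajectory
    return (1.0 - F.cosine_similarity(delta, neigh_dir, dim=1)).mean()
```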
Self-supervised Multi-modal Alignment for Whole Body Medical Imaging

This paper explores the use of self-supervised deep learning in medical imaging in cases where two scan modalities are available for the same subject. Specifically, we use a large publicly-available dataset of over 20,000 subjects from the UK Biobank with both whole body Dixon technique magnetic resonance (MR) scans and also dual-energy x-ray absorptiometry (DXA) scans. We make three contributions: (i) We introduce a multi-modal image-matching contrastive framework that is able to learn to match different-modality scans of the same subject with high accuracy. (ii) Without any adaptation, we show that the correspondences learnt during this contrastive training step can be used to perform automatic cross-modal scan registration in a completely unsupervised manner. (iii) Finally, we use these registrations to transfer segmentation maps from the DXA scans to the MR scans, where they are used to train a network to segment anatomical regions without requiring ground-truth MR examples. To aid further research, our code is publicly available (https://github.com/rwindsor1/biobank-self-supervised-alignment).

Rhydian Windsor, Amir Jamaludin, Timor Kadir, Andrew Zisserman
SimTriplet: Simple Triplet Representation Learning with a Single GPU

Contrastive learning is a key technique of modern self-supervised learning. The broader accessibility of earlier approaches was hindered by the need for heavy computational resources (e.g., at least 8 GPUs or 32 TPU cores) to accommodate large-scale negative samples or momentum encoders. The more recent SimSiam approach addresses such key limitations via stop-gradient without momentum encoders. In medical image analysis, multiple instances can be obtained from the same patient or tissue. Inspired by these advances, we propose a simple triplet representation learning (SimTriplet) approach for pathological images. The contribution of the paper is three-fold: (1) The proposed SimTriplet method takes advantage of the multi-view nature of medical images beyond self-augmentation; (2) The method maximizes both intra-sample and inter-sample similarities via triplets from positive pairs, without using negative samples; and (3) Mixed precision training is employed to enable training with only a single GPU with 16 GB memory. By learning from 79,000 unlabeled pathological patch images, SimTriplet achieved 10.58% better performance compared with supervised learning. It also achieved 2.13% better performance compared with SimSiam. Our proposed SimTriplet can achieve decent performance using only 1% labeled data. The code and data are available at https://github.com/hrlblab/SimTriplet.

Quan Liu, Peter C. Louis, Yuzhe Lu, Aadarsh Jha, Mengyang Zhao, Ruining Deng, Tianyuan Yao, Joseph T. Roland, Haichun Yang, Shilin Zhao, Lee E. Wheless, Yuankai Huo
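
A minimal sketch of the negative-free triplet objective, in the SimSiam style with a predictor head and stop-gradient; the encoder `f`, predictor `h`, and the equal weighting of the two terms are assumptions for illustration, not the released SimTriplet code.

```python
import torch.nn.functional as F

def neg_cos(p, z):
    # negative cosine similarity with stop-gradient on the target branch (SimSiam-style)
    return -F.cosine_similarity(p, z.detach(), dim=1).mean()

def simtriplet_loss(f, h, x_anchor, x_aug, x_neighbor):
    """x_aug: augmented view of the anchor patch; x_neighbor: an adjacent patch of the same tissue."""
    z_a, z_g, z_n = f(x_anchor), f(x_aug), f(x_neighbor)
    intra = 0.5 * (neg_cos(h(z_a), z_g) + neg_cos(h(z_g), z_a))   # intra-sample similarity
    inter = 0.5 * (neg_cos(h(z_a), z_n) + neg_cos(h(z_n), z_a))   # inter-sample similarity
    return intra + inter
```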
Lesion-Based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images

Manually annotating medical images is extremely expensive, especially for large-scale datasets. Self-supervised contrastive learning has been explored to learn feature representations from unlabeled images. However, unlike natural images, the application of contrastive learning to medical images is relatively limited. In this work, we propose a self-supervised framework, namely lesion-based contrastive learning for automated diabetic retinopathy (DR) grading. Instead of taking entire images as the input in the common contrastive learning scheme, lesion patches are employed to encourage the feature extractor to learn representations that are highly discriminative for DR grading. We also investigate different data augmentation operations in defining our contrastive prediction task. Extensive experiments are conducted on the publicly-accessible dataset EyePACS, demonstrating that our proposed framework performs outstandingly on DR grading in terms of both linear evaluation and transfer capacity evaluation.

Yijin Huang, Li Lin, Pujin Cheng, Junyan Lyu, Xiaoying Tang
SAR: Scale-Aware Restoration Learning for 3D Tumor Segmentation

Automatic and accurate tumor segmentation on medical images is in high demand to assist physicians with diagnosis and treatment. However, it is difficult to obtain the massive amounts of annotated training data required by deep-learning models, as the manual delineation process is often tedious and requires expertise. Although self-supervised learning (SSL) schemes have been widely adopted to address this problem, most SSL methods focus only on global structure information, ignoring the key distinguishing features of tumor regions: local intensity variation and large size distribution. In this paper, we propose Scale-Aware Restoration (SAR), an SSL method for 3D tumor segmentation. Specifically, a novel proxy task, i.e. scale discrimination, is formulated to pre-train the 3D neural network combined with the self-restoration task. Thus, the pre-trained model learns multi-level local representations through multi-scale inputs. Moreover, an adversarial learning module is further introduced to learn modality-invariant representations from multiple unlabeled source datasets. We demonstrate the effectiveness of our method on two downstream tasks: i) brain tumor segmentation and ii) pancreas tumor segmentation. Compared with state-of-the-art 3D SSL methods, our proposed approach significantly improves segmentation accuracy. Besides, we analyze its advantages from multiple perspectives, such as data efficiency, performance, and convergence speed.

Xiaoman Zhang, Shixiang Feng, Yuhang Zhou, Ya Zhang, Yanfeng Wang
Self-supervised Correction Learning for Semi-supervised Biomedical Image Segmentation

Biomedical image segmentation plays a significant role in computer-aided diagnosis. However, existing CNN-based methods rely heavily on massive manual annotations, which are very expensive and require huge human resources. In this work, we adopt a coarse-to-fine strategy and propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation. Specifically, we design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting, respectively. In the first phase, only the segmentation branch is used to obtain a relatively rough segmentation result. In the second phase, we mask the detected lesion regions on the original image based on the initial segmentation map, and feed it, together with the original image, into the network again to perform inpainting and segmentation simultaneously. For labeled data, this process is supervised by the segmentation annotations, and for unlabeled data, it is guided by the inpainting loss of the masked lesion regions. Since the two tasks rely on similar feature information, the unlabeled data effectively enhance the network’s representation of the lesion regions and further improve the segmentation performance. Moreover, a gated feature fusion (GFF) module is designed to incorporate the complementary features from the two tasks. Experiments on three medical image segmentation datasets for different tasks, including polyp, skin lesion and fundus optic disc segmentation, demonstrate the outstanding performance of our method compared with other semi-supervised approaches. The code is available at https://github.com/ReaFly/SemiMedSeg.

Ruifei Zhang, Sishuo Liu, Yizhou Yu, Guanbin Li
SpineGEM: A Hybrid-Supervised Model Generation Strategy Enabling Accurate Spine Disease Classification with a Small Training Dataset

Most deep-learning based magnetic resonance image (MRI) analysis methods require large amounts of labelling work done manually by specialists, which is laborious and time-consuming. In this paper, we aim to develop a hybrid-supervised model generation strategy, called SpineGEM, which can economically generate a high-performing deep learning model for the classification of multiple pathologies of lumbar degeneration disease (LDD). A unique self-supervised learning process is adopted to generate a pre-trained model, with no pathology labels or human interventions required. Anatomical prior information is explicitly integrated into the self-supervised process through auto-generated pixel-wise masks (using MRI-SegFlow: a system with unique voting processes for unsupervised deep learning-based segmentation) of vertebral bodies (VBs) and intervertebral discs (IVDs). With fine-tuning on a small dataset, the model can produce accurate pathology classifications. Our SpineGEM is validated on the Hong Kong Disc Degeneration Cohort (HKDDC) dataset with pathologies including Schneiderman Score, Disc Bulging, Pfirrmann Grading and Schmorl’s Node. Results show that, compared with training from scratch (n = 1280), the model generated through SpineGEM (n = 320) achieves higher classification accuracy with much less supervision (~5% higher mean precision and ~4% higher mean recall).

Xihe Kuang, Jason Pui Yin Cheung, Xiaowei Ding, Teng Zhang
Contrastive Learning of Relative Position Regression for One-Shot Object Localization in 3D Medical Images

Deep learning networks have shown promising performance for object localization in medical images, but require large amounts of annotated data for supervised training. To address this problem, we propose: 1) a novel contrastive learning method which embeds the anatomical structure by predicting the Relative Position Regression (RPR) between any two patches from the same volume; 2) a one-shot framework for organ and landmark localization in volumetric medical images. Our main idea is that tissues and organs in different human bodies share similar relative positions and context. Therefore, we can predict the relative positions of their non-local patches and thus locate the target organ. Our one-shot localization framework is composed of three parts: 1) a deep network trained to project an input patch into a 3D latent vector representing its anatomical position; 2) a coarse-to-fine framework containing two projection networks, providing more accurate localization of the target; 3) based on the coarse-to-fine model, we transfer organ bounding-box (B-box) detection to locating six extreme points along the x, y and z directions in the query volume. Experiments on multi-organ localization from head-and-neck (HaN) and abdominal CT volumes showed that our method achieves competitive performance in real time, and is more accurate and 10⁵ times faster than template matching methods under the same setting for one-shot localization in 3D medical images. Code is available at https://github.com/HiLab-git/RPR-Loc.

Wenhui Lei, Wei Xu, Ran Gu, Hao Fu, Shaoting Zhang, Shichuan Zhang, Guotai Wang
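
The proxy task can be pictured as below: two patches from the same volume are encoded and a small head regresses their 3D offset, with the patch-centre coordinates providing free supervision. The network shapes and the smooth-L1 loss are illustrative assumptions, not the exact RPR-Loc configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPRHead(nn.Module):
    def __init__(self, encoder, feat_dim=128):
        super().__init__()
        self.encoder = encoder                  # a 3D CNN mapping a patch to a feat_dim vector
        self.fc = nn.Linear(2 * feat_dim, 3)    # regress the (dx, dy, dz) offset

    def forward(self, patch_a, patch_b):
        za, zb = self.encoder(patch_a), self.encoder(patch_b)
        return self.fc(torch.cat([za, zb], dim=1))

def rpr_loss(model, patch_a, patch_b, centre_a, centre_b):
    """centre_*: (N, 3) float coordinates of the patch centres (free supervision)."""
    return F.smooth_l1_loss(model(patch_a, patch_b), centre_b - centre_a)
```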
Topological Learning and Its Application to Multimodal Brain Network Integration

A long-standing challenge in multimodal brain network analyses is to integrate topologically different brain networks obtained from diffusion and functional MRI in a coherent statistical framework. Existing multimodal frameworks will inevitably destroy the topological difference of the networks. In this paper, we propose a novel topological learning framework that integrates networks of different topology through persistent homology. Such a challenging task is made possible through the introduction of a new topological loss that bypasses intrinsic computational bottlenecks and thus enables us to perform various topological computations and optimizations with ease. We validate the topological loss in extensive statistical simulations with ground truth to assess its effectiveness in discriminating networks. Among many possible applications, we demonstrate the versatility of the topological loss in a twin imaging study, where we determine the extent to which brain networks are genetically heritable.

Tananun Songdechakraiwut, Li Shen, Moo Chung
One-Shot Medical Landmark Detection

The success of deep learning methods relies on the availability of a large number of datasets with annotations; however, curating such datasets is burdensome, especially for medical images. To relieve such a burden for the landmark detection task, we explore the feasibility of using only a single annotated image and propose a novel framework named Cascade Comparing to Detect (CC2D) for one-shot landmark detection. CC2D consists of two stages: 1) self-supervised learning (CC2D-SSL) and 2) training with pseudo-labels (CC2D-TPL). CC2D-SSL captures the consistent anatomical information in a coarse-to-fine fashion by comparing cascade feature representations and generates predictions on the training set. CC2D-TPL further improves the performance by training a new landmark detector with those predictions. The effectiveness of CC2D is evaluated on a widely-used public dataset for cephalometric landmark detection, on which it achieves a competitive detection accuracy of 86.25.01% within 4.0 mm, comparable to state-of-the-art semi-supervised methods that use far more than one training image. Our code is available at https://github.com/ICT-MIRACLE-lab/Oneshot_landmark_detection.

Qingsong Yao, Quan Quan, Li Xiao, S. Kevin Zhou
Implicit Field Learning for Unsupervised Anomaly Detection in Medical Images

We propose a novel unsupervised out-of-distribution detection method for medical images based on implicit field image representations. In our approach, an auto-decoder feed-forward neural network learns the distribution of healthy images in the form of a mapping between spatial coordinates and probabilities over a proxy for tissue types. At inference time, the learnt distribution is used to retrieve, from a given test image, a restoration, i.e. an image maximally consistent with the input but belonging to the healthy distribution. Anomalies are localized using the voxel-wise probability predicted by our model for the restored image. We tested our approach on the task of unsupervised localization of gliomas in brain MR images and compared it to several other VAE-based anomaly detection methods. Results show that the proposed technique substantially outperforms them (average DICE 0.640 vs 0.518 for the best performing VAE-based alternative) while also requiring considerably less computing time.

Sergio Naval Marimont, Giacomo Tarroni
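
A minimal sketch of the auto-decoder idea described above, assuming a per-image latent code concatenated with normalised voxel coordinates and decoded into tissue-type probabilities; layer sizes and the number of proxy classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    def __init__(self, latent_dim=128, n_classes=4, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes))          # logits over the tissue-type proxy

    def forward(self, z, coords):
        """z: (N, latent_dim) code of one image, repeated per point; coords: (N, 3) in [-1, 1]."""
        return self.mlp(torch.cat([z, coords], dim=1))
```

At test time the latent code is optimised so the field best explains the input (the restoration), and voxels with low predicted probability under that restored field are flagged as anomalous.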
Dual-Consistency Semi-supervised Learning with Uncertainty Quantification for COVID-19 Lesion Segmentation from CT Images

The novel coronavirus disease 2019 (COVID-19) characterized by atypical pneumonia has caused millions of deaths worldwide. Automatically segmenting lesions from chest Computed Tomography (CT) is a promising way to assist doctors in COVID-19 screening, treatment planning, and follow-up monitoring. However, voxel-wise annotations are extremely expert-demanding and scarce, especially when it comes to novel diseases, while an abundance of unlabeled data could be available. To tackle the challenge of limited annotations, in this paper, we propose an uncertainty-guided dual-consistency learning network (UDC-Net) for semi-supervised COVID-19 lesion segmentation from CT images. Specifically, we present a dual-consistency learning scheme that simultaneously imposes image transformation equivalence and feature perturbation invariance to effectively harness the knowledge from unlabeled data. We then quantify the segmentation uncertainty in two forms and employ them together to guide the consistency regularization for more reliable unsupervised learning. Extensive experiments showed that our proposed UDC-Net improves the fully supervised method by 6.3% in Dice and outperforms other competitive semi-supervised approaches by significant margins, demonstrating high potential in real-world clinical practice. (Code is available at https://github.com/poiuohke/UDC-Net ).

Yanwen Li, Luyang Luo, Huangjing Lin, Hao Chen, Pheng-Ann Heng
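
A rough sketch of the two consistency terms, assuming a hypothetical segmentation network exposing `return_features=True` and a `decode` method; the transform handling, noise scale, and MSE form are assumptions for illustration, not the UDC-Net code.

```python
import torch
import torch.nn.functional as F

def dual_consistency(net, x, transform, noise_std=0.1):
    logits, feats = net(x, return_features=True)               # hypothetical signature
    # 1) image-transformation equivalence: transform-then-predict vs predict-then-transform
    logits_t, _ = net(transform(x), return_features=True)
    loss_img = F.mse_loss(torch.softmax(logits_t, 1),
                          transform(torch.softmax(logits, 1)).detach())
    # 2) feature-perturbation invariance: perturbed features should decode to the same prediction
    logits_p = net.decode(feats + noise_std * torch.randn_like(feats))   # hypothetical method
    loss_feat = F.mse_loss(torch.softmax(logits_p, 1),
                           torch.softmax(logits, 1).detach())
    return loss_img + loss_feat
```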
Contrastive Pre-training and Representation Distillation for Medical Visual Question Answering Based on Radiology Images

One of the primary challenges facing medical visual question answering (Med-VQA) is the lack of large-scale well-annotated datasets for training. To overcome this challenge, this paper proposes a two-stage pre-training framework by learning transferable feature representations of radiology images and distilling a lightweight visual feature extractor for Med-VQA. Specifically, we leverage large amounts of unlabeled radiology images to train three teacher models for the body regions of brain, chest, and abdomen respectively via contrastive learning. Then, we distill the teacher models to a lightweight student model that can be used as a universal visual feature extractor for any Med-VQA system. The lightweight feature extractor can be readily fine-tuned on the training radiology images of any Med-VQA dataset, saving the annotation effort while preventing overfitting to small-scale training data. The effectiveness and advantages of the pre-trained model are demonstrated by extensive experiments with state-of-the-art Med-VQA methods on existing benchmarks. The source code and the pre-training dataset can be downloaded from https://github.com/awenbocc/cprd .

Bo Liu, Li-Ming Zhan, Xiao-Ming Wu
Positional Contrastive Learning for Volumetric Medical Image Segmentation

The success of deep learning heavily depends on the availability of large labeled training sets. However, it is hard to obtain large labeled datasets in the medical image domain because of strict privacy concerns and costly labeling efforts. Contrastive learning, an unsupervised learning technique, has proven powerful in learning image-level representations from unlabeled data. The learned encoder can then be transferred or fine-tuned to improve the performance of downstream tasks with limited labels. A critical step in contrastive learning is the generation of contrastive data pairs, which is relatively simple for natural image classification but quite challenging for medical image segmentation due to the existence of the same tissue or organ across the dataset. As a result, when applied to medical image segmentation, most state-of-the-art contrastive learning frameworks inevitably introduce a lot of false negative pairs and result in degraded segmentation quality. To address this issue, we propose a novel positional contrastive learning (PCL) framework to generate contrastive data pairs by leveraging the position information in volumetric medical images. Experimental results on CT and MRI datasets demonstrate that the proposed PCL method can substantially improve the segmentation performance compared to existing methods in both the semi-supervised setting and the transfer learning setting. (Code available at github.com/dewenzeng/positional_cl).

Dewen Zeng, Yawen Wu, Xinrong Hu, Xiaowei Xu, Haiyun Yuan, Meiping Huang, Jian Zhuang, Jingtong Hu, Yiyu Shi
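
The position-guided pairing can be sketched as follows: slice embeddings whose normalised positions along the volume axis fall within a threshold are treated as positives in an InfoNCE-style loss. The threshold and temperature values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def positional_contrastive_loss(z, pos, tau=0.1, thresh=0.1):
    """z: (N, D) slice embeddings; pos: (N,) slice positions normalised to [0, 1]."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (torch.abs(pos[:, None] - pos[None, :]) < thresh).float()
    pos_mask.fill_diagonal_(0)                                 # exclude self-pairs
    log_p = F.log_softmax(sim.masked_fill(eye, -1e9), dim=1)
    denom = pos_mask.sum(dim=1).clamp(min=1)
    return -((pos_mask * log_p).sum(dim=1) / denom).mean()
```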
Longitudinal Self-supervision to Disentangle Inter-patient Variability from Disease Progression

The problem of building disease progression models from longitudinal data has long been addressed with parametric mixed-effect models. They provide interpretable models at the cost of modeling assumptions on the progression profiles and their variability across subjects. Their deep learning counterparts, on the other hand, thrive on flexible data-driven modeling, and additional interpretability - or, as far as generative models are involved, disentanglement of latent variables with respect to generative factors - comes from additional constraints. In this work, we propose a deep longitudinal model designed to disentangle inter-patient variability from an estimated disease progression timeline. We do not seek an explicit mapping between age and disease stage, but learn the latter solely from the ordering between visits using a differentiable ranking loss. Furthermore, we encourage inter-patient variability to be encoded in a separate latent space, where for each patient a single representation is learned from its set of visits, with a constraint of invariance under permutation of the visits. The modularity of the network architecture allows us to apply our model to various data types: a synthetic image dataset with known generative factors, cognitive assessments and neuroimaging data. We show that, combined with our patient encoder, the ranking loss on visits helps the model exceed supervised counterparts, in particular in terms of disease staging.

Raphaël Couronné, Paul Vernhet, Stanley Durrleman
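
As a small illustration of learning staging from visit order alone, the sketch below penalises a predicted stage that does not increase from an earlier to a later visit, using a soft (logistic) ranking penalty; this is an assumed stand-in for the paper's differentiable ranking loss, not its exact form.

```python
import torch
import torch.nn.functional as F

def visit_ordering_loss(stage_earlier, stage_later, margin=0.0):
    """stage_*: (N,) predicted disease-stage scalars for paired visits of the same patients."""
    # encourages stage_later > stage_earlier + margin via a smooth hinge
    return F.softplus(margin - (stage_later - stage_earlier)).mean()
```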
Self-supervised Vessel Enhancement Using Flow-Based Consistencies

Vessel segmentation is an essential task in many clinical applications. Although supervised methods have achieved state-of-the-art performance, acquiring expert annotation is laborious and mostly limited to two-dimensional datasets with a small sample size. On the contrary, unsupervised methods rely on handcrafted features to detect tube-like structures such as vessels. However, those methods require complex pipelines involving several hyper-parameters and design choices, rendering the procedure sensitive, dataset-specific, and not generalizable. We propose a self-supervised method with a limited number of hyper-parameters that is generalizable across modalities. Our method uses tube-like structure properties, such as connectivity, profile consistency, and bifurcation, to introduce inductive bias into a learning algorithm. To model those properties, we generate a vector field that we refer to as a flow. Our experiments on various public datasets in 2D and 3D show that our method performs better than unsupervised methods while learning useful transferable features from unlabeled data. Unlike generic self-supervised methods, ours learns vessel-relevant features that transfer to supervised approaches, which is essential when the amount of annotated data is limited.

Rohit Jena, Sumedha Singla, Kayhan Batmanghelich
Unsupervised Contrastive Learning of Radiomics and Deep Features for Label-Efficient Tumor Classification

Tumor classification is important for decision support in precision medicine. Computer-aided diagnosis with convolutional neural networks relies on large amounts of annotated data, which can be costly to obtain. To address the poor predictive ability caused by tumor heterogeneity and inadequate labeled image data, a self-supervised learning method combined with radiomics is proposed to learn rich visual representations of tumors without human supervision. A self-supervised pretext task, namely “Radiomics-Deep Feature Correspondence”, is formulated to maximize agreement between the radiomics view and the deep learning view of the same sample in the latent space. The presented self-supervised model is evaluated on two public medical image datasets of thyroid nodules and kidney tumors and achieves high scores in linear evaluations. Furthermore, fine-tuning the pre-trained network leads to better scores than train-from-scratch models on the tumor classification task and shows label-efficient performance using small training datasets. This shows that injecting prior radiomics knowledge about tumors into the representation space can build a more powerful self-supervised method.

Ziteng Zhao, Guanyu Yang
Learning 4D Infant Cortical Surface Atlas with Unsupervised Spherical Networks

A spatiotemporal (4D) cortical surface atlas during infancy plays an important role in surface-based visualization, normalization and analysis of dynamic early brain development. Conventional atlas construction methods typically rely on classical group-wise registration on sub-populations and ignore longitudinal constraints, thus having three main issues: 1) constructing templates only at discrete time points; 2) resulting in longitudinal inconsistency among atlases at different ages; and 3) requiring extremely long runtimes. To address these issues, in this paper, we propose a fast unsupervised learning-based surface atlas construction framework incorporating longitudinal constraints to enforce the within-subject temporal correspondence in the atlas space. To handle the difficulty of learning large deformations, we propose a multi-level multi-modal spherical registration network to perform cortical surface registration in a coarse-to-fine manner. Thus, only small deformations need to be estimated at each resolution level by the registration network, which further improves registration accuracy and atlas quality. Our constructed 4D infant cortical surface atlas, based on 625 longitudinal scans from 291 infants, is temporally continuous, in contrast to the state-of-the-art UNC 4D Infant Surface Atlas, which only provides atlases at a few discrete sparse time points. By evaluating the intra- and inter-subject spatial normalization accuracy after alignment onto the atlas, our atlas demonstrates more detailed and fine-grained cortical patterns, thus leading to higher accuracy in surface registration.

Fenqiang Zhao, Zhengwang Wu, Li Wang, Weili Lin, Shunren Xia, Gang Li, the UNC/UMN Baby Connectome Project Consortium
Multimodal Representation Learning via Maximization of Local Mutual Information

We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that the sum of local mutual information is typically a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.

Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M. Wells
Inter-regional High-Level Relation Learning from Functional Connectivity via Self-supervision

In recent studies, we have witnessed the applicability of deep learning methods to resting-state functional magnetic resonance image (rs-fMRI) analysis and its use for brain disease diagnosis, e.g., autism spectrum disorder (ASD). However, it remains challenging to learn discriminative representations from raw BOLD signals or functional connectivity (FC) with a limited number of samples. In this paper, we propose a simple but efficient representation learning method for FC in a self-supervised learning manner. Specifically, we devise a proxy task of estimating randomly masked seed-based functional networks from the remaining ones in FC, to discover the complex high-level relations among brain regions, which are not directly observable from an input FC. Thanks to the random masking strategy in our proxy task, it also has the effect of augmenting training samples, thus allowing for robust training. With the feature representation network pretrained in a self-supervised manner, we then construct a decision network for the downstream task of ASD diagnosis. To validate the effectiveness of our proposed method, we used the ABIDE dataset, which collects subjects from multiple sites, and our proposed method showed superiority over the comparison methods in various metrics.

Wonsik Jung, Da-Woon Heo, Eunjin Jeon, Jaein Lee, Heung-Il Suk
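
The proxy task can be sketched as masking random seed-based rows of each functional connectivity matrix and training a network to reconstruct them; the mask ratio and MSE objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def masked_fc_loss(model, fc, mask_ratio=0.2):
    """fc: (N, R, R) functional connectivity matrices over R regions of interest."""
    N, R, _ = fc.shape
    mask = torch.rand(N, R, device=fc.device) < mask_ratio   # which seed-based networks to hide
    fc_in = fc.clone()
    fc_in[mask] = 0.0                                         # zero out the masked rows
    recon = model(fc_in)                                      # (N, R, R) reconstruction
    return F.mse_loss(recon[mask], fc[mask])                  # loss only on the hidden entries
```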

Machine Learning - Semi-Supervised Learning

Frontmatter
Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

Semi-supervised learning has attracted great attention in the field of machine learning, especially for medical image segmentation tasks, since it alleviates the heavy burden of collecting abundant densely annotated data for training. However, most existing methods underestimate the importance of challenging regions (e.g. small branches or blurred edges) during training. We believe that these unlabeled regions may contain more crucial information for minimizing the model’s prediction uncertainty and should be emphasized in the training process. Therefore, in this paper, we propose a novel Mutual Consistency Network (MC-Net) for semi-supervised left atrium segmentation from 3D MR images. Particularly, our MC-Net consists of one encoder and two slightly different decoders, and the prediction discrepancies of the two decoders are transformed into an unsupervised loss by our designed cycled pseudo label scheme to encourage mutual consistency. Such mutual consistency encourages the two decoders to produce consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions. We evaluate our MC-Net on the public Left Atrium (LA) database and it obtains impressive performance gains by exploiting the unlabeled data effectively. Our MC-Net outperforms six recent semi-supervised methods for left atrium segmentation and sets a new state-of-the-art performance on the LA database.

Yicheng Wu, Minfeng Xu, Zongyuan Ge, Jianfei Cai, Lei Zhang
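
A minimal sketch of the mutual consistency idea: each decoder is regularised towards a sharpened (low-entropy) version of the other decoder's output. The sharpening temperature and MSE form are assumptions for illustration, not the exact cycled pseudo label scheme.

```python
import torch
import torch.nn.functional as F

def sharpen(p, t=0.5):
    p = p.pow(1.0 / t)
    return p / p.sum(dim=1, keepdim=True)      # low-entropy pseudo probabilities

def mutual_consistency_loss(logits_a, logits_b):
    pa, pb = torch.softmax(logits_a, 1), torch.softmax(logits_b, 1)
    # each decoder learns from the sharpened output of the other (no gradient through the target)
    return F.mse_loss(pa, sharpen(pb).detach()) + F.mse_loss(pb, sharpen(pa).detach())
```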
Semi-supervised Meta-learning with Disentanglement for Domain-Generalised Medical Image Segmentation

Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between source and unseen domains. Recently, gradient-based meta-learning approaches where the training data are split into meta-train and meta-test sets to simulate and handle the domain shifts during training have shown improved generalisation performance. However, the current fully supervised meta-learning approaches are not scalable for medical image segmentation, where large effort is required to create pixel-wise annotations. Meanwhile, in a low data regime, the simulated domain shifts may not approximate the true domain shifts well across source and unseen domains. To address this problem, we propose a novel semi-supervised meta-learning framework with disentanglement. We explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image allows unlabeled data to be used to better approximate the true domain shifts for meta-learning. Hence, the model can achieve better generalisation performance, especially when there is a limited amount of labeled data. Experiments show that the proposed method is robust on different segmentation tasks and achieves state-of-the-art generalisation performance on two public benchmarks. Code is publicly available at: https://github.com/vios-s/DGNet .

Xiao Liu, Spyridon Thermos, Alison O’Neil, Sotirios A. Tsaftaris
Efficient Semi-supervised Gross Target Volume of Nasopharyngeal Carcinoma Segmentation via Uncertainty Rectified Pyramid Consistency

Gross Target Volume (GTV) segmentation plays an irreplaceable role in radiotherapy planning for Nasopharyngeal Carcinoma (NPC). Although Convolutional Neural Networks (CNNs) have achieved good performance for this task, they rely on a large set of labeled images for training, which is expensive and time-consuming to acquire. In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation. Concretely, we extend a backbone segmentation network to produce pyramid predictions at different scales. The pyramid prediction network (PPNet) is supervised by the ground truth of labeled images and a multi-scale consistency loss for unlabeled images, motivated by the fact that predictions at different scales for the same input should be similar and consistent. However, due to the different resolutions of these predictions, encouraging them to be consistent at each pixel directly has low robustness and may lose some fine details. To address this problem, we further design a novel uncertainty rectifying module to enable the framework to gradually learn from meaningful and reliable consensual regions at different scales. Experimental results on a dataset with 258 NPC MR images showed that with only 10% or 20% of the images labeled, our method largely improved the segmentation performance by leveraging the unlabeled images, and it also outperformed five state-of-the-art semi-supervised segmentation methods. Moreover, with only 50% of the images labeled, URPC achieved an average Dice score of 82.74%, which is close to fully supervised learning. Code is available at: https://github.com/HiLab-git/SSL4MIS.

Xiangde Luo, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Nianyong Chen, Guotai Wang, Shaoting Zhang
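
A rough sketch of pyramid consistency with a simple uncertainty weighting: multi-scale predictions are upsampled, compared against their mean, and voxels with high inter-scale variance are down-weighted. The exponential weighting is an assumption, not the paper's exact rectification module.

```python
import torch
import torch.nn.functional as F

def urpc_consistency(pyramid_logits, out_size):
    """pyramid_logits: list of (N, C, d, h, w) logits at different scales; out_size: (D, H, W)."""
    probs = torch.stack([F.interpolate(torch.softmax(l, 1), size=out_size,
                                       mode='trilinear', align_corners=False)
                         for l in pyramid_logits], dim=0)     # (S, N, C, D, H, W)
    mean = probs.mean(dim=0)
    var = probs.var(dim=0).mean(dim=1, keepdim=True)          # inter-scale variance as uncertainty
    w = torch.exp(-var)                                       # trust consistent (low-variance) voxels
    return (w * (probs - mean.unsqueeze(0)).pow(2).mean(dim=0)).mean()
```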
Few-Shot Domain Adaptation with Polymorphic Transformers

Deep neural networks (DNNs) trained on one set of medical images often experience a severe performance drop on unseen test images, due to various domain discrepancies between the training images (source domain) and the test images (target domain), which raises a domain adaptation issue. In clinical settings, it is difficult to collect enough annotated target domain data in a short period. Few-shot domain adaptation, i.e., adapting a trained model with a handful of annotations, is highly practical and useful in this case. In this paper, we propose a Polymorphic Transformer (Polyformer), which can be incorporated into any DNN backbone for few-shot domain adaptation. Specifically, after the Polyformer layer is inserted into a model trained on the source domain, it extracts a set of prototype embeddings, which can be viewed as a “basis” of the source-domain features. On the target domain, the Polyformer layer adapts by only updating a projection layer which controls the interactions between image features and the prototype embeddings. All other model weights (except BatchNorm parameters) are frozen during adaptation. Thus, the chance of overfitting the annotations is greatly reduced, and the model can perform robustly on the target domain after being trained on a few annotated images. We demonstrate the effectiveness of the Polyformer on two medical segmentation tasks (i.e., optic disc/cup segmentation and polyp segmentation). The source code of the Polyformer is released at https://github.com/askerlee/segtran.

Shaohua Li, Xiuchao Sui, Jie Fu, Huazhu Fu, Xiangde Luo, Yangqin Feng, Xinxing Xu, Yong Liu, Daniel S. W. Ting, Rick Siow Mong Goh
Lesion Segmentation and RECIST Diameter Prediction via Click-Driven Attention and Dual-Path Connection

Measuring lesion size is an important step in assessing tumor growth and monitoring disease progression and therapy response in oncology image analysis. Although it is tedious and highly time-consuming, radiologists have to perform this task routinely and manually using the RECIST criteria (Response Evaluation Criteria In Solid Tumors). Even though lesion segmentation may be the more accurate and clinically more valuable measure, physicians cannot manually segment lesions in routine practice, since this would require far heavier labor. In this paper, we present a prior-guided dual-path network (PDNet) to segment common types of lesions throughout the whole body and to predict their RECIST diameters accurately and automatically. Similar to [23], a click guidance from radiologists is the only requirement. There are two key characteristics in PDNet: 1) learning lesion-specific attention matrices in parallel from the click prior information with the proposed prior encoder, named click-driven attention; 2) aggregating the extracted multi-scale features comprehensively by introducing top-down and bottom-up connections in the proposed decoder, named dual-path connection. Experiments show the superiority of our proposed PDNet in lesion segmentation and RECIST diameter prediction using the DeepLesion dataset and an external test set. PDNet learns comprehensive and representative deep image features for our tasks and produces more accurate results on both lesion segmentation and RECIST diameter prediction.

Youbao Tang, Ke Yan, Jinzheng Cai, Lingyun Huang, Guotong Xie, Jing Xiao, Jingjing Lu, Gigin Lin, Le Lu
Reciprocal Learning for Semi-supervised Segmentation

Semi-supervised learning has recently been employed for medical image segmentation due to the challenge of acquiring sufficient manual annotations, which are an important prerequisite for building high-performance deep learning methods. Since unlabeled data are generally abundant, most existing semi-supervised approaches focus on how to make full use of both limited labeled data and abundant unlabeled data. In this paper, we propose a novel semi-supervised strategy called reciprocal learning for medical image segmentation, which can be easily integrated into any CNN architecture. Concretely, reciprocal learning works with a pair of networks, one as a student and one as a teacher. The student model learns from pseudo labels generated by the teacher. Furthermore, the teacher updates its parameters autonomously according to the reciprocal feedback signal of how well the student performs on the labeled set. Extensive experiments on two public datasets show that our method outperforms current state-of-the-art semi-supervised segmentation methods, demonstrating the potential of our strategy for challenging semi-supervised problems. The code is publicly available at https://github.com/XYZach/RLSSS.

Xiangyun Zeng, Rian Huang, Yuming Zhong, Dong Sun, Chu Han, Di Lin, Dong Ni, Yi Wang
Disentangled Sequential Graph Autoencoder for Preclinical Alzheimer’s Disease Characterizations from ADNI Study

Given a population of longitudinal neuroimaging measurements defined on a brain network, exploiting temporal dependencies within the sequence of data and the corresponding latent variables defined on the graph (i.e., the network encoding relationships between regions of interest (ROIs)) can greatly benefit characterization of the brain. Here, it is important to distinguish time-variant (e.g., longitudinal measures) and time-invariant (e.g., gender) components so they can be analyzed individually. For this, we propose an innovative and ground-breaking Disentangled Sequential Graph Autoencoder which leverages a Sequential Variational Autoencoder (SVAE), graph convolution and a semi-supervised framework together to learn a latent space composed of time-variant and time-invariant latent variables, characterizing a disentangled representation of the measurements over the entire set of ROIs. Incorporating target information in the decoder with a supervised loss lets us achieve more effective representation learning towards improved classification. We validate our proposed method on longitudinal cortical thickness data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Our method outperforms baselines built on traditional techniques, demonstrating its benefits for effective longitudinal data representation for label prediction and longitudinal data generation.

Fan Yang, Rui Meng, Hyuna Cho, Guorong Wu, Won Hwa Kim
POPCORN: Progressive Pseudo-Labeling with Consistency Regularization and Neighboring

Semi-supervised learning (SSL) uses unlabeled data to compensate for the scarcity of annotated images and the lack of generalization to unseen domains, two common problems in medical segmentation tasks. In this work, we propose POPCORN, a novel method combining consistency regularization and pseudo-labeling designed for image segmentation. The proposed framework uses high-level regularization to constrain our segmentation model to use similar latent features for images with similar segmentations. POPCORN estimates a proximity graph to select data from the easiest samples to the more difficult ones, in order to ensure accurate pseudo-labeling and to limit confirmation bias. Applied to multiple sclerosis lesion segmentation, our method demonstrates competitive results compared to other state-of-the-art SSL strategies.

Reda Abdellah Kamraoui, Vinh-Thong Ta, Nicolas Papadakis, Fanny Compaire, José V. Manjon, Pierrick Coupé
3D Semantic Mapping from Arthroscopy Using Out-of-Distribution Pose and Depth and In-Distribution Segmentation Training

Minimally invasive surgery (MIS) has many documented advantages, but the surgeon’s limited visual contact with the scene can be problematic. Hence, systems that can help surgeons navigate, such as a method that can produce a 3D semantic map, can compensate for the limitation above. In theory, we can borrow 3D semantic mapping techniques developed for robotics, but this requires finding solutions to the following challenges in MIS: 1) semantic segmentation, 2) depth estimation, and 3) pose estimation. In this paper, we propose the first 3D semantic mapping system from knee arthroscopy that solves the three challenges above. Using out-of-distribution non-human datasets, where pose could be labeled, we jointly train depth+pose estimators using self-supervised and supervised losses. Using an in-distribution human knee dataset, we train a fully-supervised semantic segmentation system to label arthroscopic image pixels into femur, ACL, and meniscus. Taking testing images from human knees, we combine the results from these two systems to automatically create 3D semantic maps of the human knee. The result of this work opens the pathway to the generation of intra-operative 3D semantic mapping, registration with pre-operative data, and robotic-assisted arthroscopy. Source code: https://github.com/YJonmo/EndoMapNet .

Yaqub Jonmohamadi, Shahnewaz Ali, Fengbei Liu, Jonathan Roberts, Ross Crawford, Gustavo Carneiro, Ajay K. Pandey
Semi-Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation

Multi-modal learning, which uses unpaired labeled data from multiple modalities to boost the performance of deep learning models on each individual modality, has attracted a lot of interest in medical image segmentation recently. However, existing unpaired multi-modal learning methods require a considerable amount of labeled data from both modalities to obtain satisfactory segmentation results, and such data are not easy to obtain in reality. In this paper, we investigate the use of unlabeled data for label-efficient unpaired multi-modal learning, with a focus on the scenario where labeled data is scarce and unlabeled data is abundant. We term this new problem Semi-Supervised Unpaired Multi-Modal Learning and propose a novel deep co-training framework for it. Specifically, our framework consists of two segmentation networks, one trained for each modality. Unlabeled data is effectively used to learn two image translation networks for translating images across modalities. Thus, labeled data from one modality can be employed, after image translation, for training the segmentation network in the other modality. To prevent overfitting in the label-scarce scenario, we introduce a new semantic consistency loss to regularize the predictions of an image and its translation from the two segmentation networks to be semantically consistent. We further design a novel class-balanced deep co-training scheme to effectively leverage the valuable complementary information from both modalities to boost the segmentation performance. We verify the effectiveness of our framework on two medical image segmentation tasks, where it outperforms existing methods significantly.

Lei Zhu, Kaiyuan Yang, Meihui Zhang, Ling Ling Chan, Teck Khim Ng, Beng Chin Ooi
Implicit Neural Distance Representation for Unsupervised and Supervised Classification of Complex Anatomies

The task of 3D shape classification is closely related to finding a good representation of the shapes. In this study, we focus on surface representations of complex anatomies and on how such representations can be utilized for supervised and unsupervised classification. We present a novel Implicit Neural Distance Representation based on unsigned distance fields (UDFs). The UDFs can be embedded into a low-dimensional latent space, which is optimized using only the shape itself. We demonstrate that this self-optimized latent space holds important global shape information useful for reconstructing the anatomies, and that unsupervised clustering of the latent vectors successfully separates different anatomies (left atrium, left/right ear canals, and human faces). Finally, we show how the representation can be used for gender classification of human face geometries, which is a notoriously hard problem.

Kristine Aavild Juhl, Xabier Morales, Ole de Backer, Oscar Camara, Rasmus Reinhold Paulsen
3D Graph-S2Net: Shape-Aware Self-ensembling Network for Semi-supervised Segmentation with Bilateral Graph Convolution

Semi-supervised learning (SSL) algorithms have attracted much attention in medical image segmentation because they use unlabeled data to mitigate the challenge of acquiring pixel-wise annotations. However, most existing SSL methods neglect the geometric shape constraints of objects, leading to unsatisfactory and non-smooth boundaries. In this paper, we propose a shape-aware semi-supervised 3D medical image segmentation network, named 3D Graph-S2Net, which incorporates flexible shape information and learns duality constraints between semantics and geometry in the graph domain. Specifically, our method consists of two parts: a multi-task learning network (3D S2Net) and a graph-based cross-task module (3D BGCM). The 3D S2Net improves the existing self-ensembling model (i.e., the Mean Teacher model) by adding a signed distance map (SDM) prediction task, which encodes richer features of object shape and surface. Moreover, the 3D BGCM explores the co-occurrence relations between the semantic segmentation and SDM prediction tasks, so that the network learns stronger semantic and geometric correspondences from both labeled and unlabeled data. Experimental results on the Atrial Segmentation Challenge confirm that our 3D Graph-S2Net outperforms state-of-the-art methods in semi-supervised segmentation.
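For reference, a common way to compute the signed distance map target used by an SDM prediction head is sketched below; the sign convention and normalisation are illustrative and may differ from the paper's variant:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """mask: binary 3D numpy array. Positive outside the object, negative inside."""
    outside = distance_transform_edt(mask == 0)   # distance to the object surface
    inside = distance_transform_edt(mask == 1)    # distance to the background
    sdm = outside - inside
    return sdm / (np.abs(sdm).max() + 1e-8)       # normalise to roughly [-1, 1]
```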

Huimin Huang, Nan Zhou, Lanfen Lin, Hongjie Hu, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong
Duo-SegNet: Adversarial Dual-Views for Semi-supervised Medical Image Segmentation

Segmentation of images is a long-standing challenge in medical AI, mainly because training a neural network to perform image segmentation requires a significant amount of pixel-level annotated data, which is often unavailable. To address this issue, we propose a semi-supervised image segmentation technique based on the concept of multi-view learning. In contrast to prior art, we introduce an adversarial form of dual-view training and employ a critic to formulate the learning problem in multi-view training as a min-max problem. Thorough quantitative and qualitative evaluations on several datasets indicate that our proposed method outperforms state-of-the-art medical image segmentation algorithms consistently and comfortably. The code is publicly available at https://github.com/himashi92/Duo-SegNet .

Himashi Peiris, Zhaolin Chen, Gary Egan, Mehrtash Harandi
Neighbor Matching for Semi-supervised Learning

Consistency regularization has shown superiority in deep semi-supervised learning, where pseudo-labels are commonly estimated conditioned on each single sample and its perturbations. However, such a strategy ignores the relations between data points and can give rise to error accumulation when a sample and all of its perturbations are misclassified. To address this issue, we propose Neighbor Matching, a pseudo-label estimator that propagates labels to unlabeled samples from their neighboring ones (labeled samples with the same semantic category) during training in an online manner. Different from existing methods, for an unlabeled sample, our Neighbor Matching defines a mapping function that predicts its pseudo-label conditioned on itself and its local manifold. Concretely, the local manifold is constructed by a memory padding module that memorizes the embeddings and labels of labeled data across different mini-batches. We experiment with two distinct benchmark datasets for semi-supervised classification of thoracic disease and skin lesions, and the results demonstrate the superiority of our approach over other state-of-the-art methods. Source code is publicly available at https://github.com/renzhenwang/neighbor-matching .
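A minimal sketch of a memory-based pseudo-label estimator in this spirit is shown below; the buffer size, temperature and cosine-similarity weighting are assumptions for illustration rather than the exact design of the memory padding module:

```python
import torch
import torch.nn.functional as F

class LabeledMemory:
    """Circular buffer of labeled embeddings and one-hot labels across mini-batches."""
    def __init__(self, dim, num_classes, size=1024):
        self.feats = torch.zeros(size, dim)
        self.labels = torch.zeros(size, num_classes)
        self.size, self.ptr, self.count = size, 0, 0

    @torch.no_grad()
    def push(self, feats, onehot):
        n = feats.shape[0]
        idx = (self.ptr + torch.arange(n)) % self.size
        self.feats[idx] = feats.detach().cpu()
        self.labels[idx] = onehot.detach().cpu().float()
        self.ptr = (self.ptr + n) % self.size
        self.count = min(self.count + n, self.size)

    @torch.no_grad()
    def pseudo_labels(self, query, temperature=0.1):
        """Soft pseudo-label = similarity-weighted average of memory labels."""
        feats, labels = self.feats[: self.count], self.labels[: self.count]
        sim = F.normalize(query.detach().cpu(), dim=1) @ F.normalize(feats, dim=1).T
        weights = F.softmax(sim / temperature, dim=1)      # (batch, memory)
        return weights @ labels                            # (batch, num_classes)
```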

Renzhen Wang, Yichen Wu, Huai Chen, Lisheng Wang, Deyu Meng
Tripled-Uncertainty Guided Mean Teacher Model for Semi-supervised Medical Image Segmentation

Due to the difficulty of accessing a large amount of labeled data, semi-supervised learning is becoming an attractive solution for medical image segmentation. To make use of unlabeled data, current popular semi-supervised methods (e.g., temporal ensembling, mean teacher) mainly impose data-level and model-level consistency on unlabeled data. In this paper, we argue that in addition to these strategies, we can further utilize auxiliary tasks and consider task-level consistency to better leverage unlabeled data for segmentation. Specifically, we introduce two auxiliary tasks, i.e., a foreground and background reconstruction task for capturing semantic information and a signed distance field (SDF) prediction task for imposing a shape constraint, and explore the mutual promotion effect between the two auxiliary tasks and the segmentation task within a mean teacher architecture. Moreover, to handle the potential bias of the teacher model caused by annotation scarcity, we develop a tripled-uncertainty guided framework to encourage the three tasks in the teacher model to generate more reliable pseudo labels. When calculating uncertainty, we propose an uncertainty weighted integration (UWI) strategy for yielding the segmentation predictions of the teacher. Extensive experiments on the public ACDC 2017 and PROMISE12 datasets demonstrate the effectiveness of our method. Code is available at https://github.com/DeepMedLab/Tri-U-MT .
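Two of the ingredients mentioned above, the EMA teacher update and an uncertainty estimate from perturbed teacher passes, can be sketched as follows; the noise model, number of samples and entropy-based uncertainty are illustrative simplifications (the paper's UWI strategy combines three task-specific uncertainties):

```python
import torch

@torch.no_grad()
def update_teacher(student, teacher, alpha=0.99):
    """Exponential moving average of student weights into the teacher."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1 - alpha)

@torch.no_grad()
def teacher_uncertainty(teacher, image, n_samples=8, noise_std=0.05):
    """Mean prediction and entropy map from several noise-perturbed passes."""
    probs = []
    for _ in range(n_samples):
        noisy = image + torch.randn_like(image) * noise_std
        probs.append(torch.softmax(teacher(noisy), dim=1))
    mean_p = torch.stack(probs).mean(0)
    entropy = -(mean_p * torch.log(mean_p + 1e-8)).sum(dim=1)   # high = unreliable
    return mean_p, entropy
```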

Kaiping Wang, Bo Zhan, Chen Zu, Xi Wu, Jiliu Zhou, Luping Zhou, Yan Wang
Learning with Noise: Mask-Guided Attention Model for Weakly Supervised Nuclei Segmentation

Deep convolutional neural networks have been highly effective in segmentation tasks. However, high performance often requires large datasets with high-quality annotations, especially for segmentation, which requires precise pixel-wise labelling. The difficulty of generating high-quality datasets often constrains progress in such areas. To alleviate this issue, we propose a weakly supervised learning method for nuclei segmentation that only requires annotation of the nuclear centroids. To train the segmentation model with point annotations, we first generate boundary- and superpixel-based masks as pseudo ground truth labels to train a segmentation network that is enhanced by a mask-guided attention auxiliary network. To further improve the accuracy of supervision, we then apply Confident Learning to correct the pseudo labels at the pixel level for refined training. Our method shows highly competitive performance on cell nuclei segmentation in histopathology images on two public datasets. Our code is available at: https://github.com/RuoyuGuo/MaskGA_Net .

Ruoyu Guo, Maurice Pagnucco, Yang Song
Order-Guided Disentangled Representation Learning for Ulcerative Colitis Classification with Limited Labels

Ulcerative colitis (UC) classification, which is an important task for endoscopic diagnosis, involves two main difficulties. First, endoscopic images with annotations about UC (positive or negative) are usually limited. Second, they show large variability in their appearance depending on the location in the colon. The second difficulty in particular prevents us from using existing semi-supervised learning techniques, which are the common remedy for the first difficulty. In this paper, we propose a practical semi-supervised learning method for UC classification that newly exploits two additional features, the location in the colon (e.g., left colon) and the image capturing order, both of which are often attached to individual images in endoscopic image sequences. The proposed method can efficiently extract the information essential for UC classification by a disentanglement process using those features. Experimental results demonstrate that the proposed method outperforms several existing semi-supervised learning methods in the classification task, even with a small number of annotated images.

Shota Harada, Ryoma Bise, Hideaki Hayashi, Kiyohito Tanaka, Seiichi Uchida
Semi-supervised Contrastive Learning for Label-Efficient Medical Image Segmentation

The success of deep learning methods in medical image segmentation tasks heavily depends on a large amount of labeled data to supervise the training. On the other hand, the annotation of biomedical images requires domain knowledge and can be laborious. Recently, contrastive learning has demonstrated great potential for learning latent representations of images even without any labels. Existing works have explored its application to biomedical image segmentation where only a small portion of data is labeled, through a pre-training phase based on self-supervised contrastive learning without using any labels, followed by a supervised fine-tuning phase on the labeled portion of the data only. In this paper, we establish that by including the limited label information in the pre-training phase, it is possible to boost the performance of contrastive learning. We propose a supervised local contrastive loss that leverages limited pixel-wise annotations to force pixels with the same label to cluster together in the embedding space. Such a loss requires pixel-wise computation, which can be expensive for large images, and we further propose two strategies, downsampling and block division, to address this issue. We evaluate our methods on two public biomedical image datasets of different modalities. With different amounts of labeled data, our methods consistently outperform state-of-the-art contrast-based methods and other semi-supervised learning techniques.
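A hedged sketch of such a supervised, downsampled pixel-wise contrastive loss follows; the feature and label shapes, pooling stride and temperature are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def supervised_local_contrastive(feat, label, stride=8, tau=0.07):
    """feat: (C, H, W) embedding map; label: (H, W) integer labels (-1 = unlabeled)."""
    f = F.avg_pool2d(feat.unsqueeze(0), stride).squeeze(0)        # downsample features
    y = F.interpolate(label[None, None].float(), size=f.shape[-2:], mode='nearest').view(-1)
    z = F.normalize(f.flatten(1).t(), dim=1)                      # (N, C) unit embeddings
    keep = y >= 0
    z, y = z[keep], y[keep]
    sim = z @ z.t() / tau
    sim = sim - torch.eye(len(z)) * 1e9                           # exclude self-pairs
    pos = (y[None, :] == y[:, None]).float() - torch.eye(len(z))  # same-label positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```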

Xinrong Hu, Dewen Zeng, Xiaowei Xu, Yiyu Shi
Functional Magnetic Resonance Imaging Data Augmentation Through Conditional ICA

Advances in computational cognitive neuroimaging research depend on the availability of large amounts of labeled brain imaging data, but such data are scarce and expensive to generate. While powerful data generation mechanisms, such as Generative Adversarial Networks (GANs), have been designed in the last decade for computer vision, such improvements have not yet carried over to brain imaging. A likely reason is that GAN training is ill-suited to the noisy, high-dimensional and small-sample data available in functional neuroimaging. In this paper, we introduce Conditional Independent Components Analysis (Conditional ICA): a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique that leverages abundant resting-state data to create images by sampling from an ICA decomposition. We then propose a mechanism to condition the generator on classes observed with few samples. We first show that the generative mechanism is successful at synthesizing data indistinguishable from observations, and that it yields gains in classification accuracy in brain decoding problems. In particular, it outperforms GANs while being much easier to optimize and interpret. Lastly, Conditional ICA enhances classification accuracy in eight datasets without further parameter tuning.
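The core generative idea can be illustrated with a few lines of scikit-learn; the Gaussian class-conditional sampling in the source space below is a deliberate simplification, and the function and parameter names are assumptions rather than the paper's implementation:

```python
import numpy as np
from sklearn.decomposition import FastICA

def conditional_ica_augment(rest_maps, class_maps, n_new=100, n_components=50, seed=0):
    """rest_maps: (n_rest, n_voxels) resting-state data; class_maps: labeled maps of one class."""
    rng = np.random.default_rng(seed)
    ica = FastICA(n_components=n_components, random_state=seed)
    ica.fit(rest_maps)                       # learn components from abundant rest data
    codes = ica.transform(class_maps)        # encode the few labeled maps of this class
    mu, sigma = codes.mean(axis=0), codes.std(axis=0) + 1e-6
    new_codes = rng.normal(mu, sigma, size=(n_new, n_components))
    return ica.inverse_transform(new_codes)  # synthetic maps for the given class
```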

Badr Tajini, Hugo Richard, Bertrand Thirion
Scalable Joint Detection and Segmentation of Surgical Instruments with Weak Supervision

Computer vision models, such as object segmentation, detection and tracking, have the potential to assist surgeons intra-operatively and improve the quality and outcomes of minimally invasive surgery. Different work streams towards instrument detection include segmentation, bounding box localisation and classification. While segmentation models offer much more granular results, bounding box annotations are easier to produce at scale. To combine the granularity of segmentation approaches with the scalability of bounding box-based models, a multi-task model for joint bounding box detection and segmentation of surgical instruments is proposed. The model consists of a shared backbone and three independent heads for the tasks of classification, bounding box regression, and segmentation. Using adaptive losses together with simple yet effective weakly-supervised label inference, the proposed model uses weak labels to learn to segment surgical instruments with only a fraction of the dataset requiring segmentation masks. Results suggest that instrument detection and segmentation tasks share intrinsic challenges and that jointly learning from both reduces the burden of annotating masks at scale. Experimental validation shows that the proposed model obtains results comparable to those of single-task state-of-the-art detection and segmentation models, while only requiring a fraction of the dataset to be annotated with masks. Specifically, the proposed model obtained 0.81 weighted average precision (wAP) and 0.73 mean intersection-over-union (IoU) on the Endovis2018 dataset with 1% annotated masks, while performing joint detection and segmentation at more than 20 frames per second.

Ricardo Sanchez-Matilla, Maria Robu, Imanol Luengo, Danail Stoyanov

Machine Learning - Weakly Supervised Learning

Frontmatter
Weakly-Supervised Universal Lesion Segmentation with Regional Level Set Loss

Accurately segmenting a variety of clinically significant lesions from whole body computed tomography (CT) scans is a critical task in precision oncology imaging, denoted as universal lesion segmentation (ULS). Manual annotation is the current clinical practice, but it is highly time-consuming and inconsistent for longitudinal tumor assessment. Effectively training an automatic segmentation model is desirable but relies heavily on a large amount of pixel-wise labelled data. Existing weakly-supervised segmentation approaches often struggle with regions near the lesion boundaries. In this paper, we present a novel weakly-supervised universal lesion segmentation method by building an attention-enhanced model based on the High-Resolution Network (HRNet), named AHRNet, and propose a regional level set (RLS) loss for optimizing lesion boundary delineation. AHRNet provides advanced high-resolution deep image features by incorporating a decoder, dual-attention and scale-attention mechanisms, which are crucial for accurate lesion segmentation. RLS can optimize the model reliably and effectively in a weakly-supervised fashion, forcing the segmentation to stay close to the lesion boundary. Extensive experimental results demonstrate that our method achieves the best performance on the publicly available large-scale DeepLesion dataset and a hold-out test set.
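As a rough illustration of the family of region-based level-set losses referred to above, a Chan-Vese-style term on a soft segmentation is sketched below; this is a generic example under assumed tensor shapes, not the paper's exact RLS formulation:

```python
import torch

def region_level_set_loss(prob, image, eps=1e-6):
    """prob: (B, 1, H, W) soft foreground map; image: (B, 1, H, W) intensities."""
    inside, outside = prob, 1.0 - prob
    c1 = (image * inside).sum(dim=(2, 3)) / (inside.sum(dim=(2, 3)) + eps)    # mean inside
    c2 = (image * outside).sum(dim=(2, 3)) / (outside.sum(dim=(2, 3)) + eps)  # mean outside
    c1, c2 = c1[..., None, None], c2[..., None, None]
    # low when intensities inside/outside the soft region are each homogeneous
    energy = (image - c1) ** 2 * inside + (image - c2) ** 2 * outside
    return energy.mean()
```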

Youbao Tang, Jinzheng Cai, Ke Yan, Lingyun Huang, Guotong Xie, Jing Xiao, Jingjing Lu, Gigin Lin, Le Lu
Bounding Box Tightness Prior for Weakly Supervised Image Segmentation

This paper presents a weakly supervised image segmentation method that adopts tight bounding box annotations. It proposes generalized multiple instance learning (MIL) and smooth maximum approximation to integrate the bounding box tightness prior into a deep neural network in an end-to-end manner. In generalized MIL, positive bags are defined by parallel crossing lines with a set of different angles, and negative bags are defined as individual pixels outside of any bounding box. Two variants of smooth maximum approximation, i.e., the α-softmax function and the α-quasimax function, are exploited to overcome the numerical instability introduced by the maximum function in bag prediction. The proposed approach was evaluated on two public medical datasets using the Dice coefficient. The results demonstrate that it outperforms state-of-the-art methods. The code is available at https://github.com/wangjuan313/wsis-boundingbox .
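One common form of these two smooth-maximum surrogates can be written compactly as below; the α value and the example bag are illustrative, and the paper's exact definitions may differ slightly:

```python
import torch

def alpha_softmax(x, alpha=4.0, dim=-1):
    """Weighted average that approaches max(x) as alpha grows."""
    w = torch.softmax(alpha * x, dim=dim)
    return (w * x).sum(dim=dim)

def alpha_quasimax(x, alpha=4.0, dim=-1):
    """(1/alpha) * log-sum-exp, debiased by log(n); also approaches max(x)."""
    n = torch.tensor(float(x.shape[dim]))
    return (torch.logsumexp(alpha * x, dim=dim) - torch.log(n)) / alpha

# Bag-level prediction for one crossing line of per-pixel foreground probabilities
line_probs = torch.tensor([0.05, 0.10, 0.92, 0.30])
print(alpha_softmax(line_probs), alpha_quasimax(line_probs))
```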

Juan Wang, Bin Xia
OXnet: Deep Omni-Supervised Thoracic Disease Detection from Chest X-Rays

Chest X-ray (CXR) is the most common diagnostic X-ray examination for screening various thoracic diseases. Automatically localizing lesions from CXR is promising for alleviating radiologists' reading burden. However, CXR datasets often come with abundant image-level annotations but scarce lesion-level annotations, and more often, no annotations at all. Thus far, unifying different supervision granularities to develop thoracic disease detection algorithms has not been comprehensively addressed. In this paper, we present OXnet, to the best of our knowledge the first deep omni-supervised thoracic disease detection network, which uses as much available supervision as possible for CXR diagnosis. We first introduce supervised learning via a one-stage detection model. Then, we inject a global classification head into the detection model and propose dual attention alignment to guide the global gradient to the local detection branch, which enables learning lesion detection from image-level annotations. We also impose intra-class compactness and inter-class separability with global prototype alignment to further enhance the global information learning. Moreover, we leverage a soft focal loss to distill the soft pseudo-labels of unlabeled data generated by a teacher model. Extensive experiments on a large-scale chest X-ray dataset show that the proposed OXnet outperforms competitive methods with significant margins. Further, we investigate omni-supervision under various annotation granularities and corroborate that OXnet is a promising choice to mitigate the plight of annotation shortage for medical image diagnosis (code is available at https://github.com/LLYXC/OXnet ).

Luyang Luo, Hao Chen, Yanning Zhou, Huangjing Lin, Pheng-Ann Heng
Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled and unseen target domain, and is usually trained on data from both domains. Access to the source domain data at the adaptation stage, however, is often limited, due to data storage or privacy issues. To alleviate this, in this work, we target source-free UDA for segmentation, and propose to adapt an "off-the-shelf" segmentation model pre-trained on the source domain to the target domain, with an adaptive batch-wise normalization statistics adaptation framework. Specifically, the domain-specific low-order batch statistics, i.e., mean and variance, are gradually adapted with an exponential momentum decay scheme, while the consistency of the domain-shareable high-order batch statistics, i.e., scaling and shifting parameters, is explicitly enforced by our optimization objective. The transferability of each channel is measured adaptively and then used to balance the contribution of each channel. Moreover, the proposed source-free UDA framework is orthogonal to unsupervised learning methods, e.g., self-entropy minimization, which can thus simply be added on top of our framework. Extensive experiments on the BraTS 2018 database show that our source-free UDA framework outperforms existing source-relaxed UDA methods for the cross-subtype UDA segmentation task and yields comparable results for the cross-modality UDA segmentation task, compared with supervised UDA methods that use the source data.
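The low-order statistics adaptation can be sketched as follows; the momentum schedule, step count, loader format and the freezing of all learnable parameters are illustrative simplifications of the framework described above:

```python
import torch
import torch.nn as nn

def adapt_bn_statistics(model, target_loader, device, steps=100, m0=0.9, decay=0.96):
    """Refresh BatchNorm running mean/variance on target data with decaying momentum."""
    model.train()                            # BN layers update running stats in train mode
    for p in model.parameters():
        p.requires_grad_(False)              # only the statistics are adapted in this sketch
    with torch.no_grad():
        for step, (images, _) in enumerate(target_loader):
            if step >= steps:
                break
            momentum = m0 * (decay ** step)  # exponential momentum decay
            for m in model.modules():
                if isinstance(m, (nn.BatchNorm2d, nn.BatchNorm3d)):
                    m.momentum = momentum
            model(images.to(device))         # forward pass refreshes running mean/var
    model.eval()
    return model
```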

Xiaofeng Liu, Fangxu Xing, Chao Yang, Georges El Fakhri, Jonghye Woo
Quality-Aware Memory Network for Interactive Volumetric Image Segmentation

Despite recent progress in automatic medical image segmentation techniques, fully automatic results usually fail to meet clinical requirements and typically need further refinement. In this work, we propose a quality-aware memory network for interactive segmentation of 3D medical images. Given user guidance on an arbitrary slice, an interaction network is first employed to obtain an initial 2D segmentation. The quality-aware memory network then propagates the initial segmentation estimate bidirectionally over the entire volume. Subsequent refinement based on additional user guidance on other slices can be incorporated in the same manner. To further facilitate interactive segmentation, a quality assessment module is introduced to suggest the next slice to segment based on the current segmentation quality of each slice. The proposed network has two appealing characteristics: 1) the memory-augmented network offers the ability to quickly encode past segmentation information, which can be retrieved for the segmentation of other slices; 2) the quality assessment module enables the model to directly estimate the quality of its segmentation predictions, which allows an active learning paradigm where users preferentially label the lowest-quality slice for multi-round refinement. The proposed network leads to a robust interactive segmentation engine, which generalizes well to various types of user annotations (e.g., scribbles, boxes). Experimental results on various medical datasets demonstrate the superiority of our approach in comparison with existing techniques.

Tianfei Zhou, Liulei Li, Gustav Bredell, Jianwu Li, Ender Konukoglu
Improving Pneumonia Localization via Cross-Attention on Medical Images and Reports

Localization and characterization of diseases like pneumonia are primary steps in a clinical pipeline, facilitating detailed clinical diagnosis and subsequent treatment planning. Additionally, such location-annotated datasets can provide a pathway for deep learning models to be used for downstream tasks. However, acquiring quality annotations is expensive in terms of human effort and usually requires domain expertise. On the other hand, medical reports contain a plethora of information about both pneumonia characteristics and its location. In this paper, we propose a novel weakly-supervised attention-driven deep learning model that leverages information encoded in medical reports during training to facilitate better localization. Our model also performs classification of attributes that are associated with pneumonia and extracted from medical reports for supervision. Both classification and localization are trained jointly, and once trained, the model can be used for both the localization and characterization of pneumonia using only the input image. We explore and analyze the model using chest X-ray datasets and demonstrate qualitatively and quantitatively that the introduction of textual information improves pneumonia localization. We showcase quantitative results on two datasets, MIMIC-CXR and Chest X-ray-8, and we also showcase severity characterization on a COVID-19 dataset.

Riddhish Bhalodia, Ali Hatamizadeh, Leo Tam, Ziyue Xu, Xiaosong Wang, Evrim Turkbey, Daguang Xu
Combining Attention-Based Multiple Instance Learning and Gaussian Processes for CT Hemorrhage Detection

Intracranial hemorrhage (ICH) is a life-threatening emergency with high rates of mortality and morbidity. Rapid and accurate detection of ICH is crucial for patients to receive timely treatment. To achieve automatic diagnosis of ICH, most deep learning models rely on huge amounts of slice labels for training. Unfortunately, the manual annotation of CT slices by radiologists is time-consuming and costly. To diagnose ICH, in this work, we propose an attention-based multiple instance learning (Att-MIL) approach implemented through the combination of an attention-based convolutional neural network (Att-CNN) and a variational Gaussian process for multiple instance learning (VGPMIL). Only scan-level labels are necessary for training. Our method (a) trains the model using scan labels and assigns each slice an attention weight, which can be used to provide slice-level predictions, and (b) uses the VGPMIL model based on low-dimensional features extracted by the Att-CNN to obtain improved predictions at both slice and scan levels. To analyze the performance of the proposed approach, our model was trained on 1150 scans from an RSNA dataset and evaluated on 490 scans from the external CQ500 dataset. Our method outperforms other methods using the same scan-level training and achieves comparable or even better results than methods relying on slice-level annotations.
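The attention pooling at the heart of attention-MIL approaches can be sketched as follows; the embedding sizes and the plain (non-gated) attention form are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    """Pools slice embeddings into a scan embedding; attention weights give slice scores."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, slice_feats):                          # (num_slices, dim)
        scores = self.w(torch.tanh(self.V(slice_feats)))     # (num_slices, 1)
        attn = torch.softmax(scores, dim=0)                  # slice-level weights
        bag = (attn * slice_feats).sum(dim=0)                # (dim,) scan embedding
        return bag, attn.squeeze(-1)

pool = AttentionMILPool()
bag_embedding, slice_weights = pool(torch.randn(30, 512))    # e.g. a 30-slice CT scan
```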

Yunan Wu, Arne Schmidt, Enrique Hernández-Sánchez, Rafael Molina, Aggelos K. Katsaggelos
CPNet: Cycle Prototype Network for Weakly-Supervised 3D Renal Compartments Segmentation on CT Images

Renal compartment segmentation aims to extract the 3D structure of renal compartments from abdominal CTA images and is of great significance for the diagnosis and treatment of kidney diseases. However, due to unclear compartment boundaries, thin compartment structures and large anatomical variation in 3D kidney CT images, deep-learning based renal compartment segmentation is a challenging task. We propose a novel weakly supervised learning framework, Cycle Prototype Network, for 3D renal compartment segmentation. It has three innovations: (1) A Cycle Prototype Learning (CPL) scheme is proposed to learn consistency for generalization. It learns from pseudo labels through the forward process and learns consistency regularization through the reverse process. The two processes make the model robust to noise and label-efficient. (2) We propose a Bayes Weakly Supervised Module (BWSM) based on cross-period prior knowledge. It learns prior knowledge from cross-period unlabeled data and performs error correction automatically, thus generating accurate pseudo labels. (3) We present a Fine Decoding Feature Extractor (FDFE) for fine-grained feature extraction. It combines global morphology information and local detail information to obtain feature maps with sharp detail, so the model achieves fine segmentation of thin structures. Extensive experiments demonstrate the strong performance of our approach: our model achieves Dice scores of 79.1% and 78.7% with only four labeled images, an improvement of about 20% over the typical prototype model PANet [16].

Song Wang, Yuting He, Youyong Kong, Xiaomei Zhu, Shaobo Zhang, Pengfei Shao, Jean-Louis Dillenseger, Jean-Louis Coatrieux, Shuo Li, Guanyu Yang
Observational Supervision for Medical Image Classification Using Gaze Data

Deep learning models have demonstrated favorable performance on many medical image classification tasks. However, they rely on expensive hand-labeled datasets that are time-consuming to create. In this work, we explore a new supervision source for training deep learning models: gaze data that is passively and cheaply collected during a clinician's workflow. We focus on three medical imaging tasks, including classifying chest X-ray scans for pneumothorax and brain MRI slices for metastasis, two of which we curated gaze data for. The gaze data consists of a sequence of fixation locations on the image from an expert trying to identify an abnormality, and therefore contains rich information about the image that can be used as a powerful supervision source. We first identify a set of gaze features and show that they indeed contain class-discriminative information. Then, we propose two methods for incorporating gaze features into deep learning pipelines. When no task labels are available, we combine multiple gaze features to extract weak labels and use them as the sole source of supervision (Gaze-WS). When task labels are available, we propose to use the gaze features as auxiliary task labels in a multi-task learning framework (Gaze-MTL). On three medical image classification tasks, our Gaze-WS method without task labels comes within 5 AUROC points (1.7 precision points) of models trained with task labels. With task labels, our Gaze-MTL method improves performance by 2.4 AUROC points (4 precision points) over multiple baselines.

Khaled Saab, Sarah M. Hooper, Nimit S. Sohoni, Jupinder Parmar, Brian Pogatchnik, Sen Wu, Jared A. Dunnmon, Hongyang R. Zhang, Daniel Rubin, Christopher Ré
Inter Extreme Points Geodesics for End-to-End Weakly Supervised Image Segmentation

We introduce InExtremIS, a weakly supervised 3D approach to train a deep image segmentation network using particularly weak train-time annotations: only 6 extreme clicks at the boundary of the objects of interest. Our fully-automatic method is trained end-to-end and does not require any test-time annotations. From the extreme points, 3D bounding boxes are extracted around the objects of interest. Then, deep geodesics connecting the extreme points are generated to increase the amount of "annotated" voxels within the bounding boxes. Finally, a weakly supervised regularised loss derived from a Conditional Random Field formulation is used to encourage prediction consistency over homogeneous regions. Extensive experiments are performed on a large open dataset for Vestibular Schwannoma segmentation. InExtremIS obtained competitive performance, approaching full supervision and significantly outperforming other weakly supervised techniques based on bounding boxes. Moreover, given a fixed annotation time budget, InExtremIS outperformed full supervision. Our code and data are available online.

Reuben Dorent, Samuel Joutard, Jonathan Shapey, Aaron Kujawa, Marc Modat, Sébastien Ourselin, Tom Vercauteren
Efficient and Generic Interactive Segmentation Framework to Correct Mispredictions During Clinical Evaluation of Medical Images

Semantic segmentation of medical images is an essential first step in computer-aided diagnosis systems for many applications. However, given the many disparate imaging modalities and inherent variations in patient data, it is difficult to consistently achieve high accuracy using modern deep neural networks (DNNs). This has led researchers to propose interactive image segmentation techniques where a medical expert can interactively correct the output of a DNN to the desired accuracy. However, these techniques often need separate training data with the associated human interactions, and do not generalize across diseases and types of medical images. In this paper, we suggest a novel conditional inference technique for DNNs that takes the intervention by a medical expert as test-time constraints and performs inference conditioned upon these constraints. Our technique is generic and can be used for medical images from any modality. Unlike other methods, our approach can correct multiple structures simultaneously and add structures missed in the initial segmentation. We report that user annotation time is reduced by factors of 13.3, 12.5, 17.8, 10.2, and 12.4 compared to full human annotation for nucleus, multiple cells, liver and tumor, organ, and brain segmentation respectively, and report time savings of 2.8, 3.0, 1.9, 4.4, and 8.6 fold compared to other interactive segmentation techniques. Our method can be useful to clinicians for diagnosis and post-surgical follow-up with minimal intervention from the medical expert. The source code and detailed results are available here [1].

Bhavani Sambaturu, Ashutosh Gupta, C. V. Jawahar, Chetan Arora
Learning Whole-Slide Segmentation from Inexact and Incomplete Labels Using Tissue Graphs

Segmenting histology images into diagnostically relevant regions is imperative to support timely and reliable decisions by pathologists. To this end, computer-aided techniques have been proposed to delineate relevant regions in scanned histology slides. However, these techniques necessitate large task-specific datasets of annotated pixels, which are tedious, time-consuming, expensive, and infeasible to acquire for many histology tasks. Thus, weakly-supervised semantic segmentation techniques have been proposed to leverage weak supervision, which is cheaper and quicker to acquire. In this paper, we propose SegGini, a weakly-supervised segmentation method using graphs that can utilize weak multiplex annotations, i.e., inexact and incomplete annotations, to segment arbitrary and large images, scaling from tissue microarrays (TMAs) to whole slide images (WSIs). Formally, SegGini constructs a tissue-graph representation for an input image, where the graph nodes depict tissue regions. It then performs weakly-supervised segmentation via node classification using inexact image-level labels, incomplete scribbles, or both. We evaluated SegGini on two public prostate cancer datasets containing TMAs and WSIs. Our method achieved state-of-the-art segmentation performance on both datasets for various annotation settings while being comparable to a pathologist baseline. Code and models are available at: https://github.com/histocartography/seg-gini .

Valentin Anklin, Pushpak Pati, Guillaume Jaume, Behzad Bozorgtabar, Antonio Foncubierta-Rodriguez, Jean-Philippe Thiran, Mathilde Sibony, Maria Gabrani, Orcun Goksel
Label-Set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation

Deep neural networks have increased the accuracy of automatic segmentation; however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions, i.e., loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalisation of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. We achieve a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of anatomically normal fetuses or with open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions .
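One simple way to handle partially segmented images, in the same spirit as the label-set losses discussed above, is to marginalise the predicted probabilities over each annotated label set before computing a soft Dice; the sketch below is a generic illustration under assumed shapes and label-set encoding, not the paper's exact leaf-Dice definition:

```python
import torch

def marginal_soft_dice(probs, partial_gt, label_sets, eps=1e-6):
    """probs: (C, ...) softmax output; partial_gt: (...) ints indexing label_sets;
    label_sets[s] lists the fine labels merged into super-label s in this annotation."""
    loss = 0.0
    for s, fine_labels in enumerate(label_sets):
        p = probs[fine_labels].sum(dim=0)              # marginalised probability of set s
        g = (partial_gt == s).float()
        inter = (p * g).sum()
        loss += 1.0 - (2 * inter + eps) / (p.sum() + g.sum() + eps)
    return loss / len(label_sets)

# Hypothetical example: fine labels 4 and 5 were annotated jointly as one super-label
# label_sets = [[0], [4, 5], [1, 2, 3, 6, 7, 8]]
```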

Lucas Fidon, Michael Aertsen, Doaa Emam, Nada Mufti, Frédéric Guffens, Thomas Deprest, Philippe Demaerel, Anna L. David, Andrew Melbourne, Sébastien Ourselin, Jan Deprest, Tom Vercauteren
Backmatter
Metadata
Title
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021
Editors
Prof. Dr. Marleen de Bruijne
Prof. Dr. Philippe C. Cattin
Stéphane Cotin
Nicolas Padoy
Prof. Stefanie Speidel
Yefeng Zheng
Caroline Essert
Copyright Year
2021
Electronic ISBN
978-3-030-87196-3
Print ISBN
978-3-030-87195-6
DOI
https://doi.org/10.1007/978-3-030-87196-3
