Skip to main content

2021 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III

Editors: Prof. Dr. Marleen de Bruijne, Prof. Dr. Philippe C. Cattin, Stéphane Cotin, Nicolas Padoy, Prof. Stefanie Speidel, Yefeng Zheng, Caroline Essert

Publisher: Springer International Publishing

Book Series : Lecture Notes in Computer Science


About this book

The eight-volume set LNCS 12901, 12902, 12903, 12904, 12905, 12906, 12907, and 12908 constitutes the refereed proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2021, held in Strasbourg, France, in September/October 2021.*

The 542 revised full papers presented were carefully reviewed and selected from 1809 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: image segmentation

Part II: machine learning - self-supervised learning; machine learning - semi-supervised learning; and machine learning - weakly supervised learning

Part III: machine learning - advances in machine learning theory; machine learning - domain adaptation; machine learning - federated learning; machine learning - interpretability / explainability; and machine learning - uncertainty

Part IV: image registration; image-guided interventions and surgery; surgical data science; surgical planning and simulation; surgical skill and work flow analysis; and surgical visualization and mixed, augmented and virtual reality

Part V: computer aided diagnosis; integration of imaging with non-imaging biomarkers; and outcome/disease prediction

Part VI: image reconstruction; clinical applications - cardiac; and clinical applications - vascular

Part VII: clinical applications - abdomen; clinical applications - breast; clinical applications - dermatology; clinical applications - fetal imaging; clinical applications - lung; clinical applications - neuroimaging - brain development; clinical applications - neuroimaging - DWI and tractography; clinical applications - neuroimaging - functional brain networks; clinical applications - neuroimaging – others; and clinical applications - oncology

Part VIII: clinical applications - ophthalmology; computational (integrative) pathology; modalities - microscopy; modalities - histopathology; and modalities - ultrasound

*The conference was held virtually.

Table of Contents


Machine Learning - Advances in Machine Learning Theory

Towards Robust General Medical Image Segmentation

The reliability of Deep Learning systems depends on their accuracy but also on their robustness against adversarial perturbations to the input data. Several attacks and defenses have been proposed to improve the performance of Deep Neural Networks under the presence of adversarial noise in the natural image domain. However, robustness in computer-aided diagnosis for volumetric data has only been explored for specific tasks and with limited attacks. We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG). Our results show that ROG is capable of generalizing across different tasks of the MSD and largely surpasses the state-of-the-art under sophisticated adversarial attacks.

Laura Daza, Juan C. Pérez, Pablo Arbeláez
Joint Motion Correction and Super Resolution for Cardiac Segmentation via Latent Optimisation

In cardiac magnetic resonance (CMR) imaging, a 3D high-resolution segmentation of the heart is essential for detailed description of its anatomical structures. However, due to the limit of acquisition duration and respiratory/cardiac motion, stacks of multi-slice 2D images are acquired in clinical routine. The segmentation of these images provides a low-resolution representation of cardiac anatomy, which may contain artefacts caused by motion. Here we propose a novel latent optimisation framework that jointly performs motion correction and super resolution for cardiac image segmentations. Given a low-resolution segmentation as input, the framework accounts for inter-slice motion in cardiac MR imaging and super-resolves the input into a high-resolution segmentation consistent with input. A multi-view loss is incorporated to leverage information from both short-axis view and long-axis view of cardiac imaging. To solve the inverse problem, iterative optimisation is performed in a latent space, which ensures the anatomical plausibility. This alleviates the need of paired low-resolution and high-resolution images for supervised learning. Experiments on two cardiac MR datasets show that the proposed framework achieves high performance, comparable to state-of-the-art super-resolution approaches and with better cross-domain generalisability and anatomical plausibility. The codes are available at .

Shuo Wang, Chen Qin, Nicolò Savioli, Chen Chen, Declan P. O’Regan, Stuart Cook, Yike Guo, Daniel Rueckert, Wenjia Bai
Targeted Gradient Descent: A Novel Method for Convolutional Neural Networks Fine-Tuning and Online-Learning

A convolutional neural network (ConvNet) is usually trained and then tested using images drawn from the same distribution. To generalize a ConvNet to various tasks often requires a complete training dataset that consists of images drawn from different tasks. In most scenarios, it is nearly impossible to collect every possible representative dataset as a priori. The new data may only become available after the ConvNet is deployed in clinical practice. ConvNet, however, may generate artifacts on out-of-distribution testing samples. In this study, we present Targeted Gradient Descent (TGD), a novel fine-tuning method that can extend a pre-trained network to a new task without revisiting data from the previous task while preserving the knowledge acquired from previous training. To a further extent, the proposed method also enables online learning of patient-specific data. The method is built on the idea of reusing a pre-trained ConvNet’s redundant kernels to learn new knowledge. We compare the performance of TGD to several commonly used training approaches on the task of Positron emission tomography (PET) image denoising. Results from clinical images show that TGD generated results on par with training-from-scratch while significantly reducing data preparation and network training time. More importantly, it enables online learning on the testing study to enhance the network’s generalization capability in real-world applications.

Junyu Chen, Evren Asma, Chung Chan
A Hierarchical Feature Constraint to Camouflage Medical Adversarial Attacks

Deep neural networks for medical images are extremely vulnerable to adversarial examples (AEs), which poses security concerns on clinical decision-making. Recent findings have shown that existing medical AEs are easy to detect in feature space. To better understand this phenomenon, we thoroughly investigate the characteristic of traditional medical AEs in feature space. Specifically, we first perform a stress test to reveal the vulnerability of medical images and compare them to natural images. Then, we theoretically prove that the existing adversarial attacks manipulate the prediction by continuously optimizing the vulnerable representations in a fixed direction, leading to outlier representations in feature space. Interestingly, we find this vulnerability is a double-edged sword that can be exploited to help hide AEs in the feature space. We propose a novel hierarchical feature constraint (HFC) as an add-on to existing white-box attacks, which encourages hiding the adversarial representation in the normal feature distribution. We evaluate the proposed method on two public medical image datasets, namely Fundoscopy and Chest X-Ray. Experimental results demonstrate the superiority of our HFC as it bypasses an array of state-of-the-art adversarial medical AEs detector more efficiently than competing adaptive attacks. Our code is available at .

Qingsong Yao, Zecheng He, Yi Lin, Kai Ma, Yefeng Zheng, S. Kevin Zhou
Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation

Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images. Compared with the 2D counterparts, 3D convolutions can capture the spatial context in three dimensions. Nevertheless, models employing 3D convolutions introduce more trainable parameters and are more computationally complex, which may lead easily to model overfitting especially for medical applications with limited available training data. This paper aims to improve the effectiveness and efficiency of 3D convolutions by introducing a novel Group Shift Pointwise Convolution (GSP-Conv). GSP-Conv simplifies 3D convolutions into pointwise ones with $$1\times 1\times 1$$ 1 × 1 × 1 kernels, which dramatically reduces the number of model parameters and FLOPs (e.g. $$27\times $$ 27 × fewer than 3D convolutions with $$3\times 3\times 3$$ 3 × 3 × 3 kernels). Naïve pointwise convolutions with limited receptive fields cannot make full use of the spatial image context. To address this problem, we propose a parameter-free operation, Group Shift (GS), which shifts the feature maps along different spatial directions in an elegant way. With GS, pointwise convolutions can access features from different spatial locations, and the limited receptive fields of pointwise convolutions can be compensated. We evaluate the proposed method on two datasets, PROMISE12 and BraTS18. Results show that our method, with substantially decreased model complexity, achieves comparable or even better performance than models employing 3D convolutions.

Junjun He, Jin Ye, Cheng Li, Diping Song, Wanli Chen, Shanshan Wang, Lixu Gu, Yu Qiao

Machine Learning - Attention Models

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Transformer architecture has emerged to be successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation. UTNet applies self-attention modules in both encoder and decoder for capturing long-range dependency at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism along with relative position encoding that reduces the complexity of self-attention operation significantly from $$O(n^2)$$ O ( n 2 ) to approximate O(n). A new self-attention decoder is also proposed to recover fine-grained details from the skipped connections in the encoder. Our approach addresses the dilemma that Transformer requires huge amounts of data to learn vision inductive bias. Our hybrid layer design allows the initialization of Transformer into convolutional networks without a need of pre-training. We have evaluated UTNet on the multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness against the state-of-the-art approaches, holding the promise to generalize well on other medical image segmentations.

Yunhe Gao, Mu Zhou, Dimitris N. Metaxas
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.

Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu
Continuous-Time Deep Glioma Growth Models

The ability to estimate how a tumor might evolve in the future could have tremendous clinical benefits, from improved treatment decisions to better dose distribution in radiation therapy. Recent work has approached the glioma growth modeling problem via deep learning and variational inference, thus learning growth dynamics entirely from a real patient data distribution. So far, this approach was constrained to predefined image acquisition intervals and sequences of fixed length, which limits its applicability in more realistic scenarios. We overcome these limitations by extending Neural Processes, a class of conditional generative models for stochastic time series, with a hierarchical multi-scale representation encoding including a spatio-temporal attention mechanism. The result is a learned growth model that can be conditioned on an arbitrary number of observations, and that can produce a distribution of temporally consistent growth trajectories on a continuous time axis. On a dataset of 379 patients, the approach successfully captures both global and finer-grained variations in the images, exhibiting superior performance compared to other learned growth models.

Jens Petersen, Fabian Isensee, Gregor Köhler, Paul F. Jäger, David Zimmerer, Ulf Neuberger, Wolfgang Wick, Jürgen Debus, Sabine Heiland, Martin Bendszus, Philipp Vollmuth, Klaus H. Maier-Hein
Spine-Transformers: Vertebra Detection and Localization in Arbitrary Field-of-View Spine CT with Transformers

In this paper, we address the problem of automatic detection and localization of vertebrae in arbitrary Field-Of-View (FOV) Spine CT. We propose a novel transformers-based 3D object detection method that views automatic detection of vertebrae in arbitrary FOV CT scans as an one-to-one set prediction problem. The main components of the new framework, called Spine-Transformers, are an one-to-one set based global loss that forces unique predictions and a light-weighted transformer architecture equipped with skip connections and learnable positional embeddings for encoder and decoder, respectively. It reasons about the relations of different levels of vertebrae and the global volume context to directly output all vertebrae in parallel. We additionally propose an inscribed sphere-based object detector to replace the regular box-based object detector for a better handling of volume orientation variation. Comprehensive experiments are conducted on two public datasets and one in-house dataset. The experimental results demonstrate the efficacy of the present approach. A reference implementation of our method can be found at: .

Rong Tao, Guoyan Zheng
Multi-view Analysis of Unregistered Medical Images Using Cross-View Transformers

Multi-view medical image analysis often depends on the combination of information from multiple views. However, differences in perspective or other forms of misalignment can make it difficult to combine views effectively, as registration is not always possible. Without registration, views can only be combined at a global feature level, by joining feature vectors after global pooling. We present a novel cross-view transformer method to transfer information between unregistered views at the level of spatial feature maps. We demonstrate this method on multi-view mammography and chest X-ray datasets. On both datasets, we find that a cross-view transformer that links spatial feature maps can outperform a baseline model that joins feature vectors after global pooling.

Gijs van Tulder, Yao Tong, Elena Marchiori

Machine Learning - Domain Adaptation

Stain Mix-Up: Unsupervised Domain Generalization for Histopathology Images

Computational histopathology studies have shown that stain color variations considerably hamper the performance. Stain color variations indicate the slides exhibit greatly different color appearance due to the diversity of chemical stains, staining procedures, and slide scanners. Previous approaches tend to improve model robustness via data augmentation or stain color normalization. However, they still suffer from generalization to new domains with unseen stain colors. In this study, we address the issue of unseen color domain generalization in histopathology images by encouraging the model to adapt varied stain colors. To this end, we propose a novel data augmentation method, stain mix-up, which incorporates the stain colors of unseen domains into training data. Unlike previous mix-up methods employed in computer vision, the proposed method constructs the combination of stain colors without using any label information, hence enabling unsupervised domain generalization. Extensive experiments are conducted and demonstrate that our method is general enough to different tasks and stain methods, including H&E stains for tumor classification and hematological stains for bone marrow cell instance segmentation. The results validate that the proposed stain mix-up can significantly improves the performance on the unseen domains.

Jia-Ren Chang, Min-Sheng Wu, Wei-Hsiang Yu, Chi-Chung Chen, Cheng-Kung Yang, Yen-Yu Lin, Chao-Yuan Yeh
A Unified Hyper-GAN Model for Unpaired Multi-contrast MR Image Translation

Cross-contrast image translation is an important task for completing missing contrasts in clinical diagnosis. However, most existing methods learn separate translator for each pair of contrasts, which is inefficient due to many possible contrast pairs in real scenarios. In this work, we propose a unified Hyper-GAN model for effectively and efficiently translating between different contrast pairs. Hyper-GAN consists of a pair of hyper-encoder and hyper-decoder to first map from the source contrast to a common feature space, and then further map to the target contrast image. To facilitate the translation between different contrast pairs, contrast-modulators are designed to tune the hyper-encoder and hyper-decoder adaptive to different contrasts. We also design a common space loss to enforce that multi-contrast images of a subject share a common feature space, implicitly modeling the shared underlying anatomical structures. Experiments on two datasets of IXI and BraTS 2019 show that our Hyper-GAN achieves state-of-the-art results in both accuracy and efficiency, e.g., improving more than 1.47 and 1.09 dB in PSNR on two datasets with less than half the amount of parameters.

Heran Yang, Jian Sun, Liwei Yang, Zongben Xu
Generative Self-training for Cross-Domain Unsupervised Tagged-to-Cine MRI Synthesis

Self-training based unsupervised domain adaptation (UDA) has shown great potential to address the problem of domain shift, when applying a trained deep learning model in a source domain to unlabeled target domains. However, while the self-training UDA has demonstrated its effectiveness on discriminative tasks, such as classification and segmentation, via the reliable pseudo-label selection based on the softmax discrete histogram, the self-training UDA for generative tasks, such as image synthesis, is not fully investigated. In this work, we propose a novel generative self-training (GST) UDA framework with continuous value prediction and regression objective for cross-domain image synthesis. Specifically, we propose to filter the pseudo-label with an uncertainty mask, and quantify the predictive confidence of generated images with practical variational Bayes learning. The fast test-time adaptation is achieved by a round-based alternative optimization scheme. We validated our framework on the tagged-to-cine magnetic resonance imaging (MRI) synthesis problem, where datasets in the source and target domains were acquired from different scanners or centers. Extensive validations were carried out to verify our framework against popular adversarial training UDA methods. Results show that our GST, with tagged MRI of test subjects in new target domains, improved the synthesis quality by a large margin, compared with the adversarial training UDA methods.

Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Timothy Reese, Jerry L. Prince, Georges El Fakhri, Jonghye Woo
Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation

Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g. change of image appearances or contrasts caused by different scanners, unexpected imaging artifacts etc. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance, and increased robustness against various unforeseen imaging artifacts compared to strong baseline methods. Particularly, cooperative training with latent space data augmentation yields 15% improvement in terms of average Dice score when compared to a standard training method.

Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert
Controllable Cardiac Synthesis via Disentangled Anatomy Arithmetic

Acquiring annotated data at scale with rare diseases or conditions remains a challenge. It would be extremely useful to have a method that controllably synthesizes images that can correct such underrepresentation. Assuming a proper latent representation, the idea of a “latent vector arithmetic” could offer the means of achieving such synthesis. A proper representation must encode the fidelity of the input data, preserve invariance and equivariance, and permit arithmetic operations. Motivated by the ability to disentangle images into spatial anatomy (tensor) factors and accompanying imaging (vector) representations, we propose a framework termed “disentangled anatomy arithmetic”, in which a generative model learns to combine anatomical factors of different input images such that when they are re-entangled with the desired imaging modality (e.g. MRI), plausible new cardiac images are created with the target characteristics. To encourage a realistic combination of anatomy factors after the arithmetic step, we propose a localized noise injection network that precedes the generator. Our model is used to generate realistic images, pathology labels, and segmentation masks that are used to augment the existing datasets and subsequently improve post-hoc classification and segmentation tasks. Code is publicly available at .

Spyridon Thermos, Xiao Liu, Alison O’Neil, Sotirios A. Tsaftaris
CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation. The convolutional operations used in these networks, however, inevitably have limitations in modeling the long-range dependency due to their inductive bias of locality and weight sharing. Although Transformer was born to address this issue, it suffers from extreme computational and spatial complexities in processing high-resolution 3D feature maps. In this paper, we propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation. Under this framework, the CNN is constructed to extract feature representations and an efficient deformable Transformer (DeTrans) is built to model the long-range dependency on the extracted feature maps. Different from the vanilla Transformer which treats all image positions equally, our DeTrans pays attention only to a small set of key positions by introducing the deformable self-attention mechanism. Thus, the computational and spatial complexities of DeTrans have been greatly reduced, making it possible to process the multi-scale and high-resolution feature maps, which are usually of paramount importance for image segmentation. We conduct an extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset that covers 11 major human organs. The results indicate that our CoTr leads to a substantial performance improvement over other CNN-based, transformer-based, and hybrid methods on the 3D multi-organ segmentation task. Code is available at: .

Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia
Harmonization with Flow-Based Causal Inference

Heterogeneity in medical data, e.g., from data collected at different sites and with different protocols in a clinical study, is a fundamental hurdle for accurate prediction using machine learning models, as such models often fail to generalize well. This paper leverages a recently proposed normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM), in order to achieve harmonization of such data. A causal model is used to model observed effects (brain magnetic resonance imaging data) that result from known confounders (site, gender and age) and exogenous noise variables. Our formulation exploits the bijection induced by flow for the purpose of harmonization. We infer the posterior of exogenous variables, intervene on observations, and draw samples from the resultant SCM to obtain counterfactuals. This approach is evaluated extensively on multiple, large, real-world medical datasets and displayed better cross-domain generalization compared to state-of-the-art algorithms. Further experiments that evaluate the quality of confounder-independent data generated by our model using regression and classification tasks are provided.

Rongguang Wang, Pratik Chaudhari, Christos Davatzikos
Uncertainty-Aware Label Rectification for Domain Adaptive Mitochondria Segmentation

Mitochondria segmentation from electron microscopy images has seen great progress, especially for learning-based methods. However, since the learning of model requires massive annotations, it is time and labour expensive to learn a specific model for each acquired dataset. On the other hand, it is challenging to generalize a learned model to datasets of unknown species or those acquired by unknown devices, mainly due to the difference of data distributions. In this paper, we study unsupervised domain adaptation to enhance the generalization capacity, where no annotation for target datasets is required. We start from an effective solution, which learns the target data distribution with pseudo labels predicted by a source-domain model. However, the obtained pseudo labels are usually noisy due to the domain gap. To address this issue, we propose an uncertainty-aware model to rectify noisy labels. Specifically, we insert Monte-Carlo dropout layers to a UNet backbone, where the uncertainty is measured by the standard deviation of predictions. Experiments on MitoEM and FAFB datasets demonstrate the superior performance of proposed model, in terms of the adaptations between different species and acquisition devices.

Siqi Wu, Chang Chen, Zhiwei Xiong, Xuejin Chen, Xiaoyan Sun
Semantic Consistent Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation

Unsupervised domain adaptation (UDA) for cross-modality medical image segmentation has shown great progress by domain-invariant feature learning or image appearance translation. Feature-level adaptation based methods learn good domain-invariant features in classification tasks but usually cannot detect domain shift at the pixel level and are not able to achieve good results in dense semantic segmentation tasks. Image appearance adaptation based methods translate images into different styles with good appearance, but semantic consistency is hard to maintain and results in poor cross-modality segmentation. In this paper, we propose intra- and cross-modality semantic consistency (ICMSC) for UDA and our key insight is that the segmentation of synthesised images in different styles should be consistent. Specifically, our model consists of an image translation module and a domain-specific segmentation module. The image translation module is a standard CycleGAN, while the segmentation module contains two domain-specific segmentation networks. The intra-modality semantic consistency (IMSC) forces the reconstructed image after a cycle to be segmented in the same way as the original input image, while the cross-modality semantic consistency (CMSC) encourages the synthesised images after translation to be segmented exactly the same as before translation. Comprehensive experiments on two different datasets (cardiac and hip) demonstrate that our proposed method outperforms other UDA state-of-the-art methods by a large margin.

Guodong Zeng, Till D. Lerch, Florian Schmaranzer, Guoyan Zheng, Jürgen Burger, Kate Gerber, Moritz Tannast, Klaus Siebenrock, Nicolas Gerber
Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation

Domain Adaptation (DA) methods are widely used in medical image segmentation tasks to tackle the problem of differently distributed train (source) and test (target) data. We consider the supervised DA task with a limited number of annotated samples from the target domain. It corresponds to one of the most relevant clinical setups: building a sufficiently accurate model on the minimum possible amount of annotated data. Existing methods mostly fine-tune specific layers of the pretrained Convolutional Neural Network (CNN). However, there is no consensus on which layers are better to fine-tune, e.g. the first layers for images with low-level domain shift or the deeper layers for images with high-level domain shift. To this end, we propose SpotTUnet – a CNN architecture that automatically chooses the layers which should be optimally fine-tuned. More specifically, on the target domain, our method additionally learns the policy that indicates whether a specific layer should be fine-tuned or reused from the pretrained network. We show that our method performs at the same level as the best of the non-flexible fine-tuning methods even under the extreme scarcity of annotated data. Secondly, we show that SpotTUnet policy provides a layer-wise visualization of the domain shift impact on the network, which could be further used to develop robust domain generalization methods. In order to extensively evaluate SpotTUnet performance, we use a publicly available dataset of brain MR images (CC359), characterized by explicit domain shift. We release a reproducible experimental pipeline ( ).

Ivan Zakazov, Boris Shirokikh, Alexey Chernyavskiy, Mikhail Belyaev
FoldIt: Haustral Folds Detection and Segmentation in Colonoscopy Videos

Haustral folds are colon wall protrusions implicated for high polyp miss rate during optical colonoscopy procedures. If segmented accurately, haustral folds can allow for better estimation of missed surface and can also serve as valuable landmarks for registering pre-treatment virtual (CT) and optical colonoscopies, to guide navigation towards the anomalies found in pre-treatment scans. We present a novel generative adversarial network, FoldIt, for feature-consistent image translation of optical colonoscopy videos to virtual colonoscopy renderings with haustral fold overlays. A new transitive loss is introduced in order to leverage ground truth information between haustral fold annotations and virtual colonoscopy renderings. We demonstrate the effectiveness of our model on real challenging optical colonoscopy videos as well as on textured virtual colonoscopy videos with clinician-verified haustral fold annotations. All code and scripts to reproduce the experiments of this paper will be made available via our Computational Endoscopy Platform at .

Shawn Mathew, Saad Nadeem, Arie Kaufman
Reference-Relation Guided Autoencoder with Deep CCA Restriction for Awake-to-Sleep Brain Functional Connectome Prediction

The difficulty of acquiring resting-state fMRI of early developing children under the same condition leads to a dedicated protocol, i.e., scanning younger infants during sleep and older children during being awake, respectively. However, the obviously different brain activities of sleep and awake states arouse a new challenge of awake-to-sleep connectome prediction/translation, which remains unexplored despite its importance in the longitudinally-consistent delineation of brain functional development. Due to the data scarcity and huge differences between natural images and geometric data (e.g., brain connectome), existing methods tailored for image translation generally fail in predicting functional connectome from awake to sleep. To fill this critical gap, we unprecedentedly propose a novel reference-relation guided autoencoder with deep CCA restriction (R2AE-dCCA) for awake-to-sleep connectome prediction. Specifically, 1) A reference-autoencoder (RAE) is proposed to realize a guided generation from the source domain to the target domain. The limited paired data are thus greatly augmented by including the combinations of all the age-restricted neighboring subjects as the references, while the target-specific pattern is fully learned; 2) A relation network is then designed and embedded into RAE, which utilizes the similarity in the source domain to determine the belief-strength of the reference during prediction; 3) To ensure that the learned relation in the source domain can effectively guide the generation in the target domain, a deep CCA restriction is further employed to maintain the neighboring relation during translation; 4) New validation metrics dedicated for connectome prediction are also proposed. Experimental results showed that our proposed R2AE-dCCA produces better prediction accuracy and well maintains the modular structure of brain functional connectome in comparison with state-of-the-art methods.

Dan Hu, Weiyan Yin, Zhengwang Wu, Liangjun Chen, Li Wang, Weili Lin, Gang Li, UNC/UMN Baby Connectome Project Consortium
Domain Composition and Attention for Unseen-Domain Generalizable Medical Image Segmentation

Domain generalizable model is attracting increasing attention in medical image analysis since data is commonly acquired from different institutes with various imaging protocols and scanners. To tackle this challenging domain generalization problem, we propose a Domain Composition and Attention-based network (DCA-Net) to improve the ability of domain representation and generalization. First, we present a domain composition method that represents one certain domain by a linear combination of a set of basis representations (i.e., a representation bank). Second, a novel plug-and-play parallel domain preceptor is proposed to learn these basis representations and we introduce a divergence constraint function to encourage the basis representations are as divergent as possible. Then, a domain attention module is proposed to learn the linear combination coefficients of the basis representations. The result of liner combination is used to calibrate the feature maps of an input image, which enables the model to generalize to different and even unseen domains. We validate our method on public prostate MRI dataset acquired from six different institutions with apparent domain shift. Experimental results show that our proposed model can generalizes well on different and even unseen domains and it outperforms state-of-the-art methods on the multi-domain prostate segmentation task. Code is available at .

Ran Gu, Jingyang Zhang, Rui Huang, Wenhui Lei, Guotai Wang, Shaoting Zhang
Fully Test-Time Adaptation for Image Segmentation

When adopting a model from the source domain to the target domain, its performance usually degrades due to the domain shift problem. In clinical practice, the source data usually cannot be accessed during adaptation for privacy policy and the label for the target domain is in shortage because of the high cost of professional labeling. Therefore, it is worth considering how to efficiently adopt a pretrained model with only unlabeled data from the target domain. In this paper, we propose a novel fully test-time unsupervised adaptation method for image segmentation based on Regional Nuclear-norm (RN) and Contour Regularization (CR). The RN loss is specially designed for segmentation tasks to efficiently improve discriminability and diversity of prediction. The CR loss constrains the continuity and connectivity to enhance the relevance between pixels and their neighbors. Instead of retraining all parameters, we modify only the parameters in batch normalization layers with only a few epochs. We demonstrate the effectiveness and efficiency of the proposed method in the pancreas and liver segmentation dataset from the Medical Segmentation Decathlon and CHAOS challenge.

Minhao Hu, Tao Song, Yujun Gu, Xiangde Luo, Jieneng Chen, Yinan Chen, Ya Zhang, Shaoting Zhang
OLVA: Optimal Latent Vector Alignment for Unsupervised Domain Adaptation in Medical Image Segmentation

This paper addresses the domain shift problem for segmentation. As a solution, we propose OLVA, a novel and lightweight unsupervised domain adaptation method based on a Variational Auto-Encoder (VAE) and Optimal Transport (OT) theory. Thanks to the VAE, our model learns a shared cross-domain latent space that follows a normal distribution, which reduces the domain shift. To guarantee valid segmentations, our shared latent space is designed to model the shape rather than the intensity variations. We further rely on an OT loss to match and align the remaining discrepancy between the two domains in the latent space. We demonstrate OLVA’s effectiveness for the segmentation of multiple cardiac structures on the public Multi-Modality Whole Heart Segmentation (MM-WHS) dataset, where the source domain consists of annotated 3D MR images and the unlabelled target domain of 3D CTs. Our results show remarkable improvements with an additional margin of $$12.5\%$$ 12.5 % dice score over concurrent generative training approaches.

Dawood Al Chanti, Diana Mateus
Prototypical Interaction Graph for Unsupervised Domain Adaptation in Surgical Instrument Segmentation

Surgical instrument segmentation is fundamental for the advanced computer-assisted system. The variability of the surgical scene, a major obstacle in this task, leads to the domain shift problem. Unsupervised domain adaptation (UDA) technique can be employed to solve this problem and adapt the model to various surgical scenarios. However, existing UDA methods ignore the relationship among different categories, hindering the model learning discriminative features from a global view. Additionally, the adversarial strategy utilized in these methods only narrows down the domain gap at the end of the network, leading to the poor feature alignment. To tackle above mentioned problems, we advance a semantic-prototype interaction graph (SePIG) framework for surgical instrument type segmentation to grasp the category-level relationship and further align the feature distribution. The proposed framework consists of prototypical inner-interaction graph (PI-Graph) and prototypical cross-interaction graph (PC-Graph). In PI-Graph, EM-Grouping module is designed to generate multi-prototypes representing the semantic information adequately. Then, propagation is performed upon these multi-prototypes to communicate semantic information inner each domain. Aiming at narrowing down the domain gaps, the PC-Graph constructs hierarchical graphs upon multi-prototypes and category centers, and conducts dynamic reasoning to exchange the correlated information among two domains. Extensive experiments on the EndoVis Instrument Segmentation 2017 $$\rightarrow $$ → 2018 scenarios demonstrate the superiority of our SePIG framework compared with state-of-the-art methods. Code is available at .

Jie Liu, Xiaoqing Guo, Yixuan Yuan
Unsupervised Domain Adaptation for Small Bowel Segmentation Using Disentangled Representation

We present a novel unsupervised domain adaptation method for small bowel segmentation based on feature disentanglement. To make the domain adaptation more controllable, we disentangle intensity and non-intensity features within a unique two-stream auto-encoding architecture, and selectively adapt the non-intensity features that are believed to be more transferable across domains. The segmentation prediction is performed by aggregating the disentangled features. We evaluated our method using intravenous contrast-enhanced abdominal CT scans with and without oral contrast, which are used as source and target domains, respectively. The proposed method showed clear improvements in terms of three different metrics compared to other domain adaptation methods that are without the feature disentanglement. The method brings small bowel segmentation closer to clinical application.

Seung Yeon Shin, Sungwon Lee, Ronald M. Summers
Data-Driven Mapping Between Functional Connectomes Using Optimal Transport

Functional connectomes derived from functional magnetic resonance imaging have long been used to understand the functional organization of the brain. Nevertheless, a connectome is intrinsically linked to the atlas used to create it. In other words, a connectome generated from one atlas is different in scale and resolution compared to a connectome generated from another atlas. Being able to map connectomes and derived results between different atlases without additional pre-processing is a crucial step in improving interpretation and generalization between studies that use different atlases. Here, we use optimal transport, a powerful mathematical technique, to find an optimum mapping between two atlases. This mapping is then used to transform time series from one atlas to another in order to reconstruct a connectome. We validate our approach by comparing transformed connectomes against their “gold-standard” counterparts (i.e., connectomes generated directly from an atlas) and demonstrate the utility of transformed connectomes by applying these connectomes to predictive models based on a different atlas. We show that these transformed connectomes are significantly similar to their “gold-standard” counterparts and maintain individual differences in brain-behavior associations, demonstrating both the validity of our approach and its utility in downstream analyses. Overall, our approach is a promising avenue to increase the generalization of connectome-based results across different atlases.

Javid Dadashkarimi, Amin Karbasi, Dustin Scheinost
EndoUDA: A Modality Independent Segmentation Approach for Endoscopy Imaging

Gastrointestinal (GI) cancer precursors require frequent monitoring for risk stratification of patients. Automated segmentation methods can help to assess risk areas more accurately, and assist in therapeutic procedures or even removal. In clinical practice, addition to the conventional white-light imaging (WLI), complimentary modalities such as narrow-band imaging (NBI) and fluorescence imaging are used. While, today most segmentation approaches are supervised and only concentrated on a single modality dataset, this work exploits to use a target-independent unsupervised domain adaptation (UDA) technique that is capable to generalize to an unseen target modality. In this context, we propose a novel UDA-based segmentation method that couples the variational autoencoder and U-Net with a common EfficientNet-B4 backbone, and uses a joint loss for latent-space optimization for target samples. We show that our model can generalize to unseen target NBI (target) modality when trained using only WLI (source) modality. Our experiments on both upper and lower GI endoscopy data show the effectiveness of our approach compared to naive supervised approach and state-of-the-art UDA segmentation methods.

Numan Celik, Sharib Ali, Soumya Gupta, Barbara Braden, Jens Rittscher
Style Transfer Using Generative Adversarial Networks for Multi-site MRI Harmonization

Large data initiatives and high-powered brain imaging analyses require the pooling of MR images acquired across multiple scanners, often using different protocols. Prospective cross-site harmonization often involves the use of a phantom or traveling subjects. However, as more datasets are becoming publicly available, there is a growing need for retrospective harmonization, pooling data from sites not originally coordinated together. Several retrospective harmonization techniques have shown promise in removing cross-site image variation. However, most unsupervised methods cannot distinguish between image-acquisition based variability and cross-site population variability, so they require that datasets contain subjects or patient groups with similar clinical or demographic information. To overcome this limitation, we consider cross-site MRI image harmonization as a style transfer problem rather than a domain transfer problem. Using a fully unsupervised deep-learning framework based on a generative adversarial network (GAN), we show that MR images can be harmonized by inserting the style information encoded from a reference image directly, without knowing their site/scanner labels a priori. We trained our model using data from five large-scale multi-site datasets with varied demographics. Results demonstrated that our style-encoding model can harmonize MR images, and match intensity profiles, successfully, without relying on traveling subjects. This model also avoids the need to control for clinical, diagnostic, or demographic information. Moreover, we further demonstrated that if we included diverse enough images into the training set, our method successfully harmonized MR images collected from unseen scanners and protocols, suggesting a promising novel tool for ongoing collaborative studies.

Mengting Liu, Piyush Maiti, Sophia Thomopoulos, Alyssa Zhu, Yaqiong Chai, Hosung Kim, Neda Jahanshad

Machine Learning - Federated Learning

Federated Semi-supervised Medical Image Classification via Inter-client Relation Matching

Federated learning (FL) has emerged with increasing popularity to collaborate distributed medical institutions for training deep networks. However, despite existing FL algorithms only allow the supervised training setting, most hospitals in realistic usually cannot afford the intricate data labeling due to absence of budget or expertise. This paper studies a practical yet challenging FL problem, named Federated Semi-supervised Learning (FSSL), which aims to learn a federated model by jointly utilizing the data from both labeled and unlabeled clients (i.e., hospitals). We present a novel approach for this problem, which improves over traditional consistency regularization mechanism with a new inter-client relation matching scheme. The proposed learning scheme explicitly connects the learning across labeled and unlabeled clients by aligning their extracted disease relationships, thereby mitigating the deficiency of task knowledge at unlabeled clients and promoting discriminative information from unlabeled samples. We validate our method on two large-scale medical image classification datasets. The effectiveness of our method has been demonstrated with the clear improvements over state-of-the-arts as well as the thorough ablation analysis on both tasks (Code will be made available at ).

Quande Liu, Hongzheng Yang, Qi Dou, Pheng-Ann Heng
FedPerl: Semi-supervised Peer Learning for Skin Lesion Classification

Skin cancer is one of the most deadly cancers worldwide. Yet, it can be reduced by early detection. Recent deep-learning methods have shown a dermatologist-level performance in skin cancer classification. Yet, this success demands a large amount of centralized data, which is oftentimes not available. Federated learning has been recently introduced to train machine learning models in a privacy-preserved distributed fashion demanding annotated data at the clients, which is usually expensive and not available, especially in the medical field. To this end, we propose $$\texttt {FedPerl}$$ FedPerl , a semi-supervised federated learning method that utilizes peer learning from social sciences and ensemble averaging from committee machines to build communities and encourage its members to learn from each other such that they produce more accurate pseudo labels. We also propose the peer anonymization (PA) technique as a core component of $$\texttt {FedPerl}$$ FedPerl . PA preserves privacy and reduces the communication cost while maintaining the performance without additional complexity. We validated our method on 38,000 skin lesion images collected from 4 publicly available datasets. $$\texttt {FedPerl}$$ FedPerl achieves superior performance over the baselines and state-of-the-art $$\texttt {SSFL}$$ SSFL by 15.8%, and 1.8% respectively. Further, $$\texttt {FedPerl}$$ FedPerl shows less sensitivity to noisy clients ( ).

Tariq Bdair, Nassir Navab, Shadi Albarqouni
Personalized Retrogress-Resilient Framework for Real-World Medical Federated Learning

Nowadays, deep learning methods with large-scale datasets can produce clinically useful models for computer-aided diagnosis. However, the privacy and ethical concerns are increasingly critical, which make it difficult to collect large quantities of data from multiple institutions. Federated Learning (FL) provides a promising decentralized solution to train model collaboratively by exchanging client models instead of private data. However, the server aggregation of existing FL methods is observed to degrade the model performance in real-world medical FL setting, which is termed as retrogress. To address this problem, we propose a personalized retrogress-resilient framework to produce a superior personalized model for each client. Specifically, we devise a Progressive Fourier Aggregation (PFA) at the server to achieve more stable and effective global knowledge gathering by integrating client models from low-frequency to high-frequency gradually. Moreover, with an introduced deputy model to receive the aggregated server model, we design a Deputy-Enhanced Transfer (DET) strategy at the client and conduct three steps of Recover-Exchange-Sublimate to ameliorate the personalized local model by transferring the global knowledge smoothly. Extensive experiments on real-world dermoscopic FL dataset prove that our personalized retrogress-resilient framework outperforms state-of-the-art FL methods, as well as the generalization on an out-of-distribution cohort. The code and dataset are available at .

Zhen Chen, Meilu Zhu, Chen Yang, Yixuan Yuan
Federated Whole Prostate Segmentation in MRI with Personalized Neural Architectures

Building robust deep learning-based models requires diverse training data, ideally from several sources. However, these datasets cannot be combined easily because of patient privacy concerns or regulatory hurdles, especially if medical data is involved. Federated learning (FL) is a way to train machine learning models without the need for centralized datasets. Each FL client trains on their local data while only sharing model parameters with a global server that aggregates the parameters from all clients. At the same time, each client’s data can exhibit differences and inconsistencies due to the local variation in the patient population, imaging equipment, and acquisition protocols. Hence, the federated learned models should be able to adapt to the local particularities of a client’s data. In this work, we combine FL with an AutoML technique based on local neural architecture search by training a “supernet”. Furthermore, we propose an adaptation scheme to allow for personalized model architectures at each FL client’s site. The proposed method is evaluated on four different datasets from 3D prostate MRI and shown to improve the local models’ performance after adaptation through selecting an optimal path through the AutoML supernet.

Holger R. Roth, Dong Yang, Wenqi Li, Andriy Myronenko, Wentao Zhu, Ziyue Xu, Xiaosong Wang, Daguang Xu
Federated Contrastive Learning for Volumetric Medical Image Segmentation

Supervised deep learning needs a large amount of labeled data to achieve high performance. However, in medical imaging analysis, each site may only have a limited amount of data and labels, which makes learning ineffective. Federated learning (FL) can help in this regard by learning a shared model while keeping training data local for privacy. Traditional FL requires fully-labeled data for training, which is inconvenient or sometimes infeasible to obtain due to high labeling cost and the requirement of expertise. Contrastive learning (CL), as a self-supervised learning approach, can effectively learn from unlabeled data to pre-train a neural network encoder, followed by fine-tuning for downstream tasks with limited annotations. However, when adopting CL in FL, the limited data diversity on each client makes federated contrastive learning (FCL) ineffective. In this work, we propose an FCL framework for volumetric medical image segmentation with limited annotations. More specifically, we exchange the features in the FCL pre-training process such that diverse contrastive data are provided to each site for effective local CL while keeping raw data private. Based on the exchanged features, global structural matching further leverages the structural similarity to align local features to the remote ones such that a unified feature space can be learned among different sites. Experiments on a cardiac MRI dataset show the proposed framework substantially improves the segmentation performance compared with state-of-the-art techniques.

Yawen Wu, Dewen Zeng, Zhepeng Wang, Yiyu Shi, Jingtong Hu
Federated Contrastive Learning for Decentralized Unlabeled Medical Images

A label-efficient paradigm in computer vision is based on self-supervised contrastive pre-training on unlabeled data followed by fine-tuning with a small number of labels. Making practical use of a federated computing environment in the clinical domain and learning on medical images poses specific challenges. In this work, we propose FedMoCo, a robust federated contrastive learning (FCL) framework, which makes efficient use of decentralized unlabeled medical data. FedMoCo has two novel modules: metadata transfer, an inter-node statistical data augmentation module, and self-adaptive aggregation, an aggregation module based on representational similarity analysis. To the best of our knowledge, this is the first FCL work on medical images. Our experiments show that FedMoCo can consistently outperform FedAvg, a seminal federated learning framework, in extracting meaningful representations for downstream tasks. We further show that FedMoCo can substantially reduce the amount of labeled data required in a downstream task, such as COVID-19 detection, to achieve a reasonable performance.

Nanqing Dong, Irina Voiculescu

Machine Learning - Interpretability/Explainability

Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features

Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks’ prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network’s output. Original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-agnostic feature importance metrics on NIH Chest X-ray8 and BrixIA datasets. The code ( ) is publicly available.

Ashkan Khakzar, Yang Zhang, Wejdene Mansour, Yuezhi Cai, Yawei Li, Yucheng Zhang, Seong Tae Kim, Nassir Navab
Demystifying T1-MRI to FDG-PET Image Translation via Representational Similarity

Recent development of image-to-image translation techniques has enabled the generation of rare medical images (e.g., PET) from common ones (e.g., MRI). Beyond the potential benefits of the reduction in scanning time, acquisition cost, and radiation exposure risks, the translation models in themselves are inscrutable black boxes. In this work, we propose two approaches to demystify the image translation process, where we particularly focus on the T1-MRI to PET translation. First, we adopt the representational similarity analysis and discover that the process of T1-MR to PET image translation includes the stages of brain tissue segmentation and brain region recognition, which unravels the relationship between the structural and functional neuroimaging data. Second, based on our findings, an Explainable and Simplified Image Translation (ESIT) model is proposed to demonstrate the capability of deep learning models for extracting gray matter volume information and identifying brain regions related to normal aging and Alzheimer’s disease, which untangles the biological plausibility hidden in deep learning models.

Chia-Hsiang Kao, Yong-Sheng Chen, Li-Fen Chen, Wei-Chen Chiu
Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation

The subject of ‘fairness’ in artificial intelligence (AI) refers to assessing AI algorithms for potential bias based on demographic characteristics such as race and gender, and the development of algorithms to address this bias. Most applications to date have been in computer vision, although some work in healthcare has started to emerge. The use of deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, and such techniques are starting to be translated into clinical practice. However, no work has yet investigated the fairness of such models. In this work, we perform such an analysis for racial/gender groups, focusing on the problem of training data imbalance, using a nnU-Net model trained and evaluated on cine short axis cardiac MR data from the UK Biobank dataset, consisting of 5,903 subjects from 6 different racial groups. We find statistically significant differences in Dice performance between different racial groups. To reduce the racial bias, we investigated three strategies: (1) stratified batch sampling, in which batch sampling is stratified to ensure balance between racial groups; (2) fair meta-learning for segmentation, in which a DL classifier is trained to classify race and jointly optimized with the segmentation model; and (3) protected group models, in which a different segmentation model is trained for each racial group. We also compared the results to the scenario where we have a perfectly balanced database. To assess fairness we used the standard deviation (SD) and skewed error ratio (SER) of the average Dice values. Our results demonstrate that the racial bias results from the use of imbalanced training data, and that all proposed bias mitigation strategies improved fairness, with the best SD and SER resulting from the use of protected group models.

Esther Puyol-Antón, Bram Ruijsink, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Reza Razavi, Andrew P. King
An Interpretable Approach to Automated Severity Scoring in Pelvic Trauma

Pelvic ring disruptions result from blunt injury mechanisms and are often found in patients with multi-system trauma. To grade pelvic fracture severity in trauma victims based on whole-body CT, the Tile AO/OTA classification is frequently used. Due to the high volume of whole-body trauma CTs generated in busy trauma centers, an automated approach to Tile classification would provide substantial value, e. g., to prioritize the reading queue of the attending trauma radiologist. In such scenario, an automated method should perform grading based on a transparent process and based on interpretable features to enable interaction with human readers and lower their workload by offering insights from a first automated read of the scan. This paper introduces an automated yet interpretable pelvic trauma decision support system to assist radiologists in fracture detection and Tile grade classification. The method operates similarly to human interpretation of CT scans and first detects distinct pelvic fractures on CT with high specificity using a Faster-RCNN model that are then interpreted using a structural causal model based on clinical best practices to infer an initial Tile grade. The Bayesian causal model and finally, the object detector are then queried for likely co-occurring fractures that may have been rejected initially due to the highly specific operating point of the detector, resulting in an updated list of detected fractures and corresponding final Tile grade. Our method is transparent in that it provides finding location and type using the object detector, as well as information on important counterfactuals that would invalidate the system’s recommendation and achieves an AUC of 83.3%/85.1% for translational/rotational instability. Despite being designed for human-machine teaming, our approach does not compromise on performance compared to previous black-box approaches.

Anna Zapaishchykova, David Dreizin, Zhaoshuo Li, Jie Ying Wu, Shahrooz Faghihroohi, Mathias Unberath
Scalable, Axiomatic Explanations of Deep Alzheimer’s Diagnosis from Heterogeneous Data

Deep Neural Networks (DNNs) have an enormous potential to learn from complex biomedical data. In particular, DNNs have been used to seamlessly fuse heterogeneous information from neuroanatomy, genetics, biomarkers, and neuropsychological tests for highly accurate Alzheimer’s disease diagnosis. On the other hand, their black-box nature is still a barrier for the adoption of such a system in the clinic, where interpretability is absolutely essential. We propose Shapley Value Explanation of Heterogeneous Neural Networks (SVEHNN) for explaining the Alzheimer’s diagnosis made by a DNN from the 3D point cloud of the neuroanatomy and tabular biomarkers. Our explanations are based on the Shapley value, which is the unique method that satisfies all fundamental axioms for local explanations previously established in the literature. Thus, SVEHNN has many desirable characteristics that previous work on interpretability for medical decision making is lacking. To avoid the exponential time complexity of the Shapley value, we propose to transform a given DNN into a Lightweight Probabilistic Deep Network without re-training, thus achieving a complexity only quadratic in the number of features. In our experiments on synthetic and real data, we show that we can closely approximate the exact Shapley value with a dramatically reduced runtime and can reveal the hidden knowledge the network has learned from the data.

Sebastian Pölsterl, Christina Aigner, Christian Wachinger
SPARTA: An Integrated Stability, Discriminability, and Sparsity Based Radiomic Feature Selection Approach

In order to ensure that a radiomics-based machine learning model will robustly generalize to new, unseen data (which may harbor significant variations compared to the discovery cohort), radiomic features are often screened for stability via test/retest or cross-site evaluation. However, as stability screening is often conducted independent of the feature selection process, the resulting feature set may not be simultaneously optimized for discriminability, stability, as well as sparsity. In this work, we present a novel radiomic feature selection approach termed SPARse sTable lAsso (SPARTA), uniquely developed to identify a highly discriminative and sparse set of features which are also stable to acquisition or institution variations. The primary contribution of this work is the integration of feature stability as a generalizable regularization term into a least absolute shrinkage and selection operator (LASSO)-based optimization function. Secondly, we utilize a unique non-convex sparse relaxation approach inspired by proximal algorithms to provide a computationally efficient convergence guarantee for our novel algorithm. SPARTA was evaluated on three different multi-institutional imaging cohorts to identify the most relevant radiomic features for distinguishing: (a) healthy from diseased lesions in 147 prostate cancer patients via T2-weighted MRI, (b) healthy subjects from Crohn’s disease patients via 170 CT enterography scans, and (c) responders and non-responders to chemoradiation in 82 rectal cancer patients via T2w MRI. When compared to 3 state-of-the-art feature selection schemes, features selected via SPARTA yielded significantly higher classifier performance on unseen data in multi-institutional validation (hold-out AUCs of 0.91, 0.91, and 0.93 in the 3 cohorts).

Amir Reza Sadri, Sepideh Azarianpour Esfahani, Prathyush Chirra, Jacob Antunes, Pavithran Pattiam Giriprakash, Patrick Leo, Anant Madabhushi, Satish E. Viswanath
The Power of Proxy Data and Proxy Networks for Hyper-parameter Optimization in Medical Image Segmentation

Deep learning models for medical image segmentation are primarily data-driven. Models trained with more data lead to improved performance and generalizability. However, training is a computationally expensive process because multiple hyper-parameters need to be tested to find the optimal setting for best performance. In this work, we focus on accelerating the estimation of hyper-parameters by proposing two novel methodologies: proxy data and proxy networks. Both can be useful for estimating hyper-parameters more efficiently. We test the proposed techniques on CT and MR imaging modalities using well-known public datasets. In both cases using one dataset for building proxy data and another data source for external evaluation. For CT, the approach is tested on spleen segmentation with two datasets. The first dataset is from the medical segmentation decathlon (MSD), where the proxy data is constructed, the secondary dataset is utilized as an external validation dataset. Similarly, for MR, the approach is evaluated on prostate segmentation where the first dataset is from MSD and the second dataset is PROSTATEx. First, we show higher correlation to using full data for training when testing on the external validation set using smaller proxy data than a random selection of the proxy data. Second, we show that a high correlation exists for proxy networks when compared with the full network on validation Dice score. Third, we show that the proposed approach of utilizing a proxy network can speed up an AutoML framework for hyper-parameter search by 3.3 $$\times $$ × , and by 4.4 $$\times $$ × if proxy data and proxy network are utilized together.

Vishwesh Nath, Dong Yang, Ali Hatamizadeh, Anas A. Abidin, Andriy Myronenko, Holger R. Roth, Daguang Xu
Fighting Class Imbalance with Contrastive Learning

Medical image datasets are hard to collect, expensive to label, and often highly imbalanced. The last issue is underestimated, as typical average metrics hardly reveal that the often very important minority classes have a very low accuracy. In this paper, we address this problem by a feature embedding that balances the classes using contrastive learning as an alternative to the common cross-entropy loss. The approach is largely orthogonal to existing sampling methods and can be easily combined with those. We show on the challenging ISIC2018 and APTOS2019 datasets that the approach improves especially the accuracy of minority classes without negatively affecting the majority ones.

Yassine Marrakchi, Osama Makansi, Thomas Brox
Interpretable Gender Classification from Retinal Fundus Images Using BagNets

Deep neural networks (DNNs) are able to predict a person’s gender from retinal fundus images with high accuracy, even though this task is usually considered hardly possible by ophthalmologists. Therefore, it has been an open question which features allow reliable discrimination between male and female fundus images. To study this question, we used a particular DNN architecture called BagNet, which extracts local features from small image patches and then averages the class evidence across all patches. The BagNet performed on par with the more sophisticated Inception-v3 model, showing that the gender information can be read out from local features alone. BagNets also naturally provide saliency maps, which we used to highlight the most informative patches in fundus images. We found that most evidence was provided by patches from the optic disc and the macula, with patches from the optic disc providing mostly male and patches from the macula providing mostly female evidence. Although further research is needed to clarify the exact nature of this evidence, our results suggest that there are localized structural differences in fundus images between genders. Overall, we believe that BagNets may provide a compelling alternative to the standard DNN architectures also in other medical image analysis tasks, as they do not require post-hoc explainability methods.

Indu Ilanchezian, Dmitry Kobak, Hanna Faber, Focke Ziemssen, Philipp Berens, Murat Seçkin Ayhan
Explainable Classification of Weakly Annotated Wireless Capsule Endoscopy Images Based on a Fuzzy Bag-of-Colour Features Model and Brain Storm Optimization

Wireless capsule endoscopy (WCE) constitutes a medical imaging technology developed for the endoscopic exploration of the gastrointestinal (GI) tract, whereas it provides a more comfortable examination method, in comparison to the conventional endoscopy technologies. In this paper, we propose a novel Explainable Fuzzy Bag-of-Words (XFBoW) feature extraction model, for the classification of weakly annotated WCE images. A comparative advantage of the proposed model over state-of-the-art feature extractors is that it can provide an explainable classification outcome, even with conventional classification schemes, such as Support Vector Machines. The explanations that can be derived are based on the similarity of the image content with the content of the training images, used for the construction of the model. The feature extraction process relies on data clustering and fuzzy sets. Clustering is used to encode the image content into visual words. These words are subsequently used for the formation of fuzzy sets to enable a linguistic characterization of similarities with the training images. A state-of-the-art Brain Storm Optimization algorithm is used as an optimizer to define the most appropriate number of visual words and fuzzy sets and also the fittest parameters of the classifier, in order to optimally classify the WCE images. The training of XFBoW is performed using only image-level, semantic labels instead of detailed, pixel-level annotations. The proposed method is investigated on real datasets that include a variety of GI abnormalities. The results show that XFBoW outperforms several state-of-the-art methods, while providing the advantage of explainability.

Michael Vasilakakis, Georgia Sovatzidi, Dimitris K. Iakovidis
Towards Semantic Interpretation of Thoracic Disease and COVID-19 Diagnosis Models

Convolutional neural networks are showing promise in the automatic diagnosis of thoracic pathologies on chest x-rays. Their black-box nature has sparked many recent works to explain the prediction via input feature attribution methods (aka saliency methods). However, input feature attribution methods merely identify the importance of input regions for the prediction and lack semantic interpretation of model behavior. In this work, we first identify the semantics associated with internal units (feature maps) of the network. We proceed to investigate the following questions; Does a regression model that is only trained with COVID-19 severity scores implicitly learn visual patterns associated with thoracic pathologies? Does a network that is trained on weakly labeled data (e.g. healthy, unhealthy) implicitly learn pathologies? Moreover, we investigate the effect of pretraining and data imbalance on the interpretability of learned features. In addition to the analysis, we propose semantic attribution to semantically explain each prediction. We present our findings using publicly available chest pathologies (CheXpert [5], NIH ChestX-ray8 [25]) and COVID-19 datasets (BrixIA [20], and COVID-19 chest X-ray segmentation dataset [4]). The Code ( ) is publicly available.

Ashkan Khakzar, Sabrina Musatian, Jonas Buchberger, Icxel Valeriano Quiroz, Nikolaus Pinger, Soroosh Baselizadeh, Seong Tae Kim, Nassir Navab
A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or uncommon situations in the training environment. Domain experts typically go through the tedious process of inspecting the failure cases manually, identifying failure modes and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process through answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types?; (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated through measuring the intra- and inter-type generalisation after fine-tuning and introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and model repairment. We evaluate its utility on a classification and an object detection tasks. Our code is available at .

Thomas Henn, Yasukazu Sakamoto, Clément Jacquet, Shunsuke Yoshizawa, Masamichi Andou, Stephen Tchen, Ryosuke Saga, Hiroyuki Ishihara, Katsuhiko Shimizu, Yingzhen Li, Ryutaro Tanno
Using Causal Analysis for Conceptual Deep Learning Explanation

Model explainability is essential for the creation of trustworthy Machine Learning models in healthcare. An ideal explanation resembles the decision-making process of a domain expert and is expressed using concepts or terminology that is meaningful to the clinicians. To provide such explanation, we first associate the hidden units of the classifier to clinically relevant concepts. We take advantage of radiology reports accompanying the chest X-ray images to define concepts. We discover sparse associations between concepts and hidden units using a linear sparse logistic regression. To ensure that the identified units truly influence the classifier’s outcome, we adopt tools from Causal Inference literature and, more specifically, mediation analysis through counterfactual interventions. Finally, we construct a low-depth decision tree to translate all the discovered concepts into a straightforward decision rule, expressed to the radiologist. We evaluated our approach on a large chest x-ray dataset, where our model produces a global explanation consistent with clinical knowledge.

Sumedha Singla, Stephen Wallace, Sofia Triantafillou, Kayhan Batmanghelich
A Spherical Convolutional Neural Network for White Matter Structure Imaging via dMRI

Diffusion Magnetic Resonance Imaging (dMRI) is a powerful non-invasive and in-vivo imaging modality for probing brain white matter structure. Convolutional neural networks (CNNs) have been shown to be a powerful tool for many computer vision problems where the signals are acquired on a regular grid and where translational invariance is important. However, as we are considering dMRI signals that are acquired on a sphere, rotational invariance, rather than translational, is desired. In this work, we propose a spherical CNN model with fully spectral domain convolutional and non-linear layers. It provides rotational invariance and is adapted to the real nature of dMRI signals and uniform random distribution of sampling points. The proposed model is positively evaluated on the problem of estimation of neurite orientation dispersion and density imaging (NODDI) parameters on the data from Human Connectome Project (HCP).

Sara Sedlar, Abib Alimi, Théodore Papadopoulo, Rachid Deriche, Samuel Deslauriers-Gauthier
Sharpening Local Interpretable Model-Agnostic Explanations for Histopathology: Improved Understandability and Reliability

Being accountable for the signed reports, pathologists may be wary of high-quality deep learning outcomes if the decision-making is not understandable. Applying off-the-shelf methods with default configurations such as Local Interpretable Model-Agnostic Explanations (LIME) is not sufficient to generate stable and understandable explanations. This work improves the application of LIME to histopathology images by leveraging nuclei annotations, creating a reliable way for pathologists to audit black-box tumor classifiers. The obtained visualizations reveal the sharp, neat and high attention of the deep classifier to the neoplastic nuclei in the dataset, an observation in line with clinical decision making. Compared to standard LIME, our explanations show improved understandability for domain-experts, report higher stability and pass the sanity checks of consistency to data or initialization changes and sensitivity to network parameters. This represents a promising step in giving pathologists tools to obtain additional information on image classification models. The code and trained models are available on GitHub.

Mara Graziani, Iam Palatnik de Sousa, Marley M. B. R. Vellasco, Eduardo Costa da Silva, Henning Müller, Vincent Andrearczyk
Improving the Explainability of Skin Cancer Diagnosis Using CBIR

Explainability is a key feature for computer-aided diagnosis systems. This property not only helps doctors understand their decisions, but also allows less experienced practitioners to improve their knowledge. Skin cancer diagnosis is a field where explainability is of critical importance, as lesions of different classes often exhibit confounding characteristics. This work proposes a deep neural network (DNN) for skin cancer diagnosis that provides explainability through content-based image retrieval. We explore several state-of-the-art approaches to improve the feature space learned by the DNN, namely contrastive, distillation, and triplet losses. We demonstrate that the combination of these regularization losses with the categorical cross-entropy leads to the best performances on melanoma classification, and results in a hybrid DNN that simultaneously: i) classifies the images; and ii) retrieves similar images justifying the diagnosis. The code is available at .

Catarina Barata, Carlos Santiago
PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging

Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a “thorn in the side” of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds, on network performance after training which explicitly quantify the possibility of overfitting. In this work, we explore recent advances using the PAC-Bayesian framework to provide bounds on generalization error for large (stochastic) networks. While previous efforts focus on classification in larger natural image datasets (e.g., MNIST and CIFAR-10), we apply these techniques to both classification and segmentation in a smaller medical imagining dataset: the ISIC 2018 challenge set. We observe the resultant bounds are competitive compared to a simpler baseline, while also being more explainable and alleviating the need for holdout sets.

Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, Seong Jae Hwang

Machine Learning – Uncertainty

Medical Matting: A New Perspective on Medical Segmentation with Uncertainty

In medical image segmentation, it is difficult to mark ambiguous areas accurately with binary masks, especially when dealing with small lesions. Therefore, it is a challenge for radiologists to reach a consensus by using binary masks under the condition of multiple annotations. However, these uncertain areas may contain anatomical structures that are conducive to diagnosis. Uncertainty is introduced to study these situations. Nevertheless, the uncertainty is usually measured by the variances between predictions in a multiple trial way. It is not intuitive, and there is no exact correspondence in the image. Inspired by image matting, we introduce matting as a soft segmentation method and a new perspective to deal with and represent uncertain regions into medical scenes, namely medical matting. More specifically, because there is no available medical matting dataset, we first labeled two medical datasets with alpha matte. Secondly, the matting methods applied to the natural image are not suitable for the medical scene, so we propose a new architecture to generate binary masks and alpha matte in a row. Thirdly, the uncertainty map is introduced to highlight the ambiguous regions from the binary results and improve the matting performance. Evaluated on these datasets, the proposed model outperformed state-of-the-art matting algorithms by a large margin, and alpha matte is proved to be a more efficient labeling form than a binary mask.

Lin Wang, Lie Ju, Donghao Zhang, Xin Wang, Wanji He, Yelin Huang, Zhiwen Yang, Xuan Yao, Xin Zhao, Xiufen Ye, Zongyuan Ge
Confidence-Aware Cascaded Network for Fetal Brain Segmentation on MR Images

Fetal brain segmentation from Magnetic Resonance (MR) images is a fundamental step in brain development study and early diagnosis. Although progress has been made, performance still needs to be improved especially for the images with motion artifacts (due to unpredictable fetal movement) and/or changes of magnetic field. In this paper, we propose a novel confidence-aware cascaded framework to accurately extract fetal brain from MR image. Different from the existing coarse-to-fine techniques, our two-stage strategy aims to segment brain region and simultaneously produce segmentation confidence for each slice in 3D MR image. Then, the image slices with high-confidence scores are leveraged to guide brain segmentation of low-confidence image slices, especially on the brain regions with blurred boundaries. Furthermore, a slice consistency loss is also proposed to enhance the relationship among the segmentations of adjacent slices. Experimental results on fetal brain MRI dataset show that our proposed model achieves superior performance, and outperforms several state-of-the-art methods.

Xukun Zhang, Zhiming Cui, Changan Chen, Jie Wei, Jingjiao Lou, Wenxin Hu, He Zhang, Tao Zhou, Feng Shi, Dinggang Shen
Orthogonal Ensemble Networks for Biomedical Image Segmentation

Despite the astonishing performance of deep-learning based approaches for visual tasks such as semantic segmentation, they are known to produce miscalibrated predictions, which could be harmful for critical decision-making processes. Ensemble learning has shown to not only boost the performance of individual models but also reduce their miscalibration by averaging independent predictions. In this scenario, model diversity has become a key factor, which facilitates individual models converging to different functional solutions. In this work, we introduce Orthogonal Ensemble Networks (OEN), a novel framework to explicitly enforce model diversity by means of orthogonal constraints. The proposed method is based on the hypothesis that inducing orthogonality among the constituents of the ensemble will increase the overall model diversity. We resort to a new pairwise orthogonality constraint which can be used to regularize a sequential ensemble training process, resulting on improved predictive performance and better calibrated model outputs. We benchmark the proposed framework in two challenging brain lesion segmentation tasks –brain tumor and white matter hyper-intensity segmentation in MR images. The experimental results show that our approach produces more robust and well-calibrated ensemble models and can deal with challenging tasks in the context of biomedical image segmentation.

Agostina J. Larrazabal, César Martínez, Jose Dolz, Enzo Ferrante
Learning to Predict Error for MRI Reconstruction

In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error by decomposing the latter into random and systematic errors, and showing that the former is equivalent to the variance of the random error. In addition, we observe that current methods unnecessarily compromise performance by modifying the model and training loss to estimate the target and uncertainty jointly. We show that estimating them separately without modifications improves performance. Following this, we propose a novel method that estimates the target labels and magnitude of the prediction error in two steps. We demonstrate this method on a large-scale MRI reconstruction task, and achieve significantly better results than the state-of-the-art uncertainty estimation methods.

Shi Hu, Nicola Pezzotti, Max Welling
Uncertainty-Guided Progressive GANs for Medical Image Translation

Image-to-image translation plays a vital role in tackling various medical imaging tasks such as attenuation correction, motion correction, undersampled reconstruction, and denoising. Generative adversarial networks have been shown to achieve the state-of-the-art in generating high fidelity images for these tasks. However, the state-of-the-art GAN-based frameworks do not estimate the uncertainty in the predictions made by the network that is essential for making informed medical decisions and subsequent revision by medical experts and has recently been shown to improve the performance and interpretability of the model. In this work, we propose an uncertainty-guided progressive learning scheme for image-to-image translation. By incorporating aleatoric uncertainty as attention maps for GANs trained in a progressive manner, we generate images of increasing fidelity progressively. We demonstrate the efficacy of our model on three challenging medical image translation tasks, including PET to CT translation, undersampled MRI reconstruction, and MRI motion artefact correction. Our model generalizes well in three different tasks and improves performance over state of the art under full-supervision and weak-supervision with limited data. Code is released here: .

Uddeshya Upadhyay, Yanbei Chen, Tobias Hepp, Sergios Gatidis, Zeynep Akata
Variational Topic Inference for Chest X-Ray Report Generation

Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.

Ivona Najdenkoska, Xiantong Zhen, Marcel Worring, Ling Shao
Uncertainty Aware Deep Reinforcement Learning for Anatomical Landmark Detection in Medical Images

Deep reinforcement learning (DRL) is a promising technique for anatomical landmark detection in 3D medical images and a useful first step in automated medical imaging pathology detection. However, deployment of landmark detection in a pathology detection pipeline requires a self-assessment process to identify out-of-distribution images for manual review. We therefore propose a novel method derived from the full-width-half-maxima of q-value probability distributions for estimating the uncertainty of a distributional deep q-learning (dist-DQN) landmark detection agent. We trained two dist-DQN models targeting the locations of knee fibular styloid and intercondylar eminence of the tibia, using 1552 MR sequences (Sagittal PD, PDFS and T2FS) with an approximate 75%, 5%, 20% training, validation, and test split. Error for the two landmarks was 3.25 ± 0.12 mm and 3.06 ± 0.10 mm respectively (mean ± standard error). Mean error for the two landmarks was 28% lower than a non-distributional DQN baseline (3.16 ± 0.11 mm vs 4.36 ± 0.27 mm). Additionally, we demonstrate that the dist-DQN derived uncertainty metric has an AUC of 0.91 for predicting out-of-distribution images with a specificity of 0.77 at sensitivity 0.90, illustrating the double benefit of improved error rate and the ability to defer reviews to experts.

James Browning, Micha Kornreich, Aubrey Chow, Jayashri Pawar, Li Zhang, Richard Herzog, Benjamin L. Odry
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021
Prof. Dr. Marleen de Bruijne
Prof. Dr. Philippe C. Cattin
Stéphane Cotin
Nicolas Padoy
Prof. Stefanie Speidel
Yefeng Zheng
Caroline Essert
Copyright Year
Electronic ISBN
Print ISBN

Premium Partner