
2020 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2020

23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I

Edited by: Prof. Anne L. Martel, Purang Abolmaesumi, Danail Stoyanov, Diana Mateus, Maria A. Zuluaga, S. Kevin Zhou, Daniel Racoceanu, Prof. Leo Joskowicz

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The seven-volume set LNCS 12261, 12262, 12263, 12264, 12265, 12266, and 12267 constitutes the refereed proceedings of the 23rd International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2020, held in Lima, Peru, in October 2020. The conference was held virtually due to the COVID-19 pandemic.

The 542 revised full papers presented were carefully reviewed and selected from 1809 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: machine learning methodologies

Part II: image reconstruction; prediction and diagnosis; cross-domain methods and reconstruction; domain adaptation; machine learning applications; generative adversarial networks

Part III: CAI applications; image registration; instrumentation and surgical phase detection; navigation and visualization; ultrasound imaging; video image analysis

Part IV: segmentation; shape models and landmark detection

Part V: biological, optical, microscopic imaging; cell segmentation and stain normalization; histopathology image analysis; ophthalmology

Part VI: angiography and vessel analysis; breast imaging; colonoscopy; dermatology; fetal imaging; heart and lung imaging; musculoskeletal imaging

Part VII: brain development and atlases; DWI and tractography; functional brain networks; neuroimaging; positron emission tomography

Table of Contents

Frontmatter

Machine Learning Methodologies

Frontmatter
Attention, Suggestion and Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation

Despite their great success, deep learning based segmentation methods still face a critical obstacle: the difficulty of acquiring sufficient training data due to high annotation costs. In this paper, we propose a deep active learning framework that combines the attention gated fully convolutional network (ag-FCN) and the distribution discrepancy based active learning algorithm (dd-AL) to significantly reduce the annotation effort by iteratively annotating the most informative samples to train the ag-FCN for better segmentation performance. Our framework is evaluated on the 2015 MICCAI Gland Segmentation dataset and the 2017 MICCAI 6-month infant brain MRI Segmentation dataset. Experimental results show that our framework can achieve state-of-the-art segmentation performance by using only a portion of the training data.

Haohan Li, Zhaozheng Yin
Scribble2Label: Scribble-Supervised Cell Segmentation via Self-generating Pseudo-Labels with Consistency

Segmentation is a fundamental process in microscopic cell image analysis. With recent advances in deep learning, more accurate and high-throughput cell segmentation has become feasible. However, most existing deep learning-based cell segmentation algorithms require fully annotated ground-truth cell labels, which are time-consuming and labor-intensive to generate. In this paper, we introduce Scribble2Label, a novel weakly-supervised cell segmentation framework that exploits only a handful of scribble annotations without full segmentation labels. The core idea is to combine pseudo-labeling and label filtering to generate reliable labels from weak supervision. For this, we leverage the consistency of predictions by iteratively averaging the predictions to improve pseudo labels. We demonstrate the performance of Scribble2Label by comparing it to several state-of-the-art cell segmentation methods with various cell image modalities, including bright-field, fluorescence, and electron microscopy. We also show that our method performs robustly across different levels of scribble detail, confirming that only a few scribble annotations are required in real use cases.

Hyeonsoo Lee, Won-Ki Jeong
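
The consistency idea above, averaging predictions over iterations and keeping only confident pixels, lends itself to a compact sketch. The following is an illustrative PyTorch fragment, not the authors' code: the EMA decay, the confidence threshold tau, and all names are assumptions for exposition.

```python
import torch

def update_ema_prediction(ema_pred, new_pred, decay=0.9):
    """Exponential moving average of per-pixel foreground probabilities."""
    if ema_pred is None:
        return new_pred.detach()
    return decay * ema_pred + (1.0 - decay) * new_pred.detach()

def pseudo_labels_from_ema(ema_pred, scribble_mask, scribble_labels, tau=0.8):
    """Combine sparse scribbles with confident averaged predictions.

    Pixels covered by a scribble keep their annotated label; unscribbled
    pixels receive a pseudo-label only where the averaged prediction is
    consistently confident (above tau or below 1 - tau). Returns labels
    and a validity mask selecting pixels that contribute to the loss.
    """
    labels = torch.zeros_like(ema_pred)
    valid = torch.zeros_like(ema_pred, dtype=torch.bool)

    # Scribbled pixels: trust the annotation.
    labels[scribble_mask] = scribble_labels[scribble_mask]
    valid |= scribble_mask

    # Unscribbled but confident pixels: trust the averaged prediction.
    confident = (~scribble_mask) & ((ema_pred > tau) | (ema_pred < 1 - tau))
    labels[confident] = (ema_pred[confident] > 0.5).float()
    valid |= confident
    return labels, valid
```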
Are Fast Labeling Methods Reliable? A Case Study of Computer-Aided Expert Annotations on Microscopy Slides

Deep-learning-based pipelines have shown the potential to revolutionize microscopy image diagnostics by providing visual augmentations and evaluations to a trained pathology expert. However, to match human performance, the methods rely on the availability of vast amounts of high-quality labeled data, which poses a significant challenge. To circumvent this, augmented labeling methods, also known as expert-algorithm collaboration, have recently become popular. However, potential biases introduced by this operation mode and their effects on training deep neural networks are not entirely understood. This work aims to shed light on some of these effects by providing a case study for three pathologically relevant diagnostic settings. Ten trained pathology experts performed a labeling task first without and later with computer-generated augmentation. To investigate different biasing effects, we intentionally introduced errors to the augmentation. In total, the pathology experts annotated 26,015 cells on 1,200 images in this novel annotation study. Backed by this extensive data set, we found that the concordance of multiple experts was significantly increased in the computer-aided setting versus the unaided annotation. However, a significant percentage of the deliberately introduced false labels was not identified by the experts.

Christian Marzahl, Christof A. Bertram, Marc Aubreville, Anne Petrick, Kristina Weiler, Agnes C. Gläsel, Marco Fragoso, Sophie Merz, Florian Bartenschlager, Judith Hoppe, Alina Langenhagen, Anne-Katherine Jasensky, Jörn Voigt, Robert Klopfleisch, Andreas Maier
Deep Reinforcement Active Learning for Medical Image Classification

In this paper, we propose a deep reinforcement learning algorithm for active learning on medical image data. Although deep learning has achieved great success in medical image processing, it relies on a large amount of labeled data for training, which is expensive and time-consuming to obtain. Active learning, which follows a strategy to select and annotate informative samples, is an effective approach to alleviate this issue. However, most existing methods of active learning adopt a hand-designed strategy, which cannot handle the dynamic procedure of classifier training. To address this issue, we model the procedure of active learning as a Markov decision process, and propose a deep reinforcement learning algorithm to learn a dynamic policy for active learning. To achieve this, we employ the actor-critic approach and apply the deep deterministic policy gradient algorithm to train the model. We conduct experiments on two kinds of medical image data sets, and the results demonstrate that our method is able to learn a better strategy than existing hand-designed ones.

Jingwen Wang, Yuguang Yan, Yubing Zhang, Guiping Cao, Ming Yang, Michael K. Ng
An Effective Data Refinement Approach for Upper Gastrointestinal Anatomy Recognition

Accurate recognition of anatomy sites is important for evaluating the quality of esophagogastroduodenoscopy (EGD) examinations. However, because some anatomy sites have similar appearances and anatomical landmarks are lacking, gastric-anatomy image annotations are less than accurate, and annotations by doctors with various experience levels vary widely. Deep learning–based systems trained on these noisy annotations have poor recognition performance. In this work, we propose a novel data refinement approach to alleviate the problem of noisy annotations and improve upper gastrointestinal anatomy recognition performance. In essence, we introduce a new uncertainty inference module for deep convolutional neural networks (CNNs) and leverage Bayesian uncertainty estimates to select possibly noisy data. In addition, we employ an ensemble of semi-supervised learning models to rectify noisy labels and produce refined training data. We validate the proposed approach via controlled experiments on CIFAR-10, in which the noise rate is adjusted and the noisy data are known. Classification accuracy improves markedly with the refined dataset, outperforming state-of-the-art robust training methods such as MentorNet and Co-teaching. An evaluation on the upper gastrointestinal anatomy recognition task proves that our proposed method effectively improves recognition accuracy for real, noisy clinical data. The proposed data refinement approach reduces the human effort needed to filter out and manually rectify noisy annotations. It can also be applied to wider scenarios where accurate expert labeling is expensive.

Li Quan, Yan Li, Xiaoyi Chen, Ni Zhang
Synthetic Sample Selection via Reinforcement Learning

Synthesizing realistic medical images provides a feasible solution to the shortage of training data in deep learning based medical image recognition systems. However, the quality control of synthetic images for data augmentation purposes is under-investigated, and some of the generated images are not realistic and may contain misleading features that distort the data distribution when mixed with real images. Thus, the effectiveness of those synthetic images in medical image recognition systems cannot be guaranteed when they are added randomly without quality assurance. In this work, we propose a reinforcement learning (RL) based synthetic sample selection method that learns to choose synthetic images containing reliable and informative features. A transformer based controller is trained via proximal policy optimization (PPO) using the validation classification accuracy as the reward. The selected images are mixed with the original training data for improved training of image recognition systems. To validate our method, we take pathology image recognition as an example and conduct extensive experiments on two histopathology image datasets. In experiments on a cervical dataset and a lymph node dataset, the image classification performance is improved by 8.1% and 2.3%, respectively, when utilizing high-quality synthetic images selected by our RL framework. Our proposed synthetic sample selection method is general and has great potential to boost the performance of various medical image recognition systems given limited annotation.

Jiarong Ye, Yuan Xue, L. Rodney Long, Sameer Antani, Zhiyun Xue, Keith C. Cheng, Xiaolei Huang
Dual-Level Selective Transfer Learning for Intrahepatic Cholangiocarcinoma Segmentation in Non-enhanced Abdominal CT

Automatic and accurate Intrahepatic Cholangiocarcinoma (ICC) segmentation in non-enhanced abdominal CT images can provide significant assistance for clinical decision making. While deep neural networks offer an effective tool for ICC segmentation, collecting large amounts of annotated data for deep network training may not be practical for this kind of applications. To this end, transfer learning approaches utilize abundant data from similar tasks and transfer the prior-learned knowledge to achieve better results. In this paper, we propose a novel Dual-level Selective Transfer Learning (DSTL) model for ICC segmentation, which selects similar information at global and local levels from a source dataset and produces transfer learning using the selected hierarchical information. Besides the basic segmentation networks, our DSTL model is composed of a global information selection network (GISNet) and a local information selection network (LISNet). The GISNet is utilized to output weights for global information selection and to mitigate the gap between the source and target tasks. The LISNet outputs weights for local information selection. Experimental results show that our DSTL model achieves superior ICC segmentation performance and outperforms the original and image selection based transfer learning and joint training strategies. To the best of our knowledge, this is the first method for ICC segmentation in non-enhanced abdominal CT.

Wenzhe Wang, Qingyu Song, Jiarong Zhou, Ruiwei Feng, Tingting Chen, Wenhao Ge, Danny Z. Chen, S. Kevin Zhou, Weilin Wang, Jian Wu
BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture

U-Net has become one of the state-of-the-art deep learning-based approaches for modern computer vision tasks such as semantic segmentation, super resolution, image denoising, and inpainting. Previous extensions of U-Net have focused mainly on modifying its existing building blocks or developing new functional modules for performance gains. As a result, these variants usually lead to a non-negligible increase in model complexity. To tackle this issue in such U-Net variants, in this paper we present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters. Our proposed bi-directional skip connections can be directly adopted into any encoder-decoder architecture to further enhance its capabilities in various task domains. We evaluated our method on various medical image analysis tasks and the results show that our BiO-Net significantly outperforms the vanilla U-Net as well as other state-of-the-art methods. Our code is available at https://github.com/tiangexiang/BiO-Net.

Tiange Xiang, Chaoyi Zhang, Dongnan Liu, Yang Song, Heng Huang, Weidong Cai
Constrain Latent Space for Schizophrenia Classification via Dual Space Mapping Net

Mining potential biomarkers of schizophrenia (SCZ) while performing classification is essential for SCZ research. However, most related studies only perform a simple binary classification with high-dimensional neuroimaging features that ignore individuals’ unique clinical symptoms, and the biomarkers mined in this way are more susceptible to confounding factors such as demographics. To address these issues, we propose a novel end-to-end framework, named Dual Space Mapping Net (DSM-Net), that maps the neuroimaging features and clinical symptoms to a shared decoupled latent space, thereby constraining the latent space to a solution space associated with detailed symptoms of SCZ. Briefly, taking functional connectivity patterns and the Positive and Negative Syndrome Scale (PANSS) scores as input views, DSM-Net maps the inputs to a shared decoupled latent space that is more discriminative. Moreover, with an invertible space mapping sub-network, DSM-Net transforms multi-view learning into multi-task learning and provides regression of PANSS scores as an extra benefit. We evaluate the proposed DSM-Net on multi-site SCZ data in a leave-one-site-out cross-validation setting, and the experimental results illustrate the effectiveness of DSM-Net in classification and regression performance and in unearthing neuroimaging biomarkers with individual specificity, population commonality, and reduced influence of confounding factors.

Weiyang Shi, Kaibin Xu, Ming Song, Lingzhong Fan, Tianzi Jiang
Have You Forgotten? A Method to Assess if Machine Learning Models Have Forgotten Data

In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity. Let us consider a scenario where several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but now one of the providers decides to leave. This provider requests that their data (hereafter the query dataset) be removed from the databases, and also that the model ‘forgets’ their data. In this paper, for the first time, we address the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and the distribution of a model’s output. We establish statistical methods that compare the target’s outputs with outputs of models trained with different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage studies on what information a model retains and to inspire extensions in more complex settings.

Xiao Liu, Sotirios A. Tsaftaris
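
The core test, comparing the target model's output distribution on the query dataset against that of reference models, can be illustrated with a standard two-sample test. The choice of the Kolmogorov–Smirnov test and all names below are assumptions for illustration; the paper's statistical machinery is more elaborate.

```python
import numpy as np
from scipy.stats import ks_2samp

def forgetting_score(target_outputs, reference_outputs, alpha=0.05):
    """Compare the target model's outputs on the query dataset against
    outputs of a reference model trained without that data.

    Both inputs are 1-D arrays of per-sample scores (e.g., max softmax
    probability on the query set). A small p-value means the two output
    distributions differ, i.e., the target model still behaves unlike a
    model that never saw the query data.
    """
    stat, p_value = ks_2samp(target_outputs, reference_outputs)
    return {"statistic": stat, "p_value": p_value,
            "likely_retained": p_value < alpha}

# Illustrative usage with synthetic scores:
rng = np.random.default_rng(0)
with_data = rng.beta(8, 2, size=500)      # confident outputs
without_data = rng.beta(4, 4, size=500)   # less confident outputs
print(forgetting_score(with_data, without_data))
```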
Learning and Exploiting Interclass Visual Correlations for Medical Image Classification

Deep neural network-based medical image classifiers often use “hard” labels for training, where the probability of the correct category is 1 and those of the others are 0. However, these hard targets can make the networks over-confident in their predictions and prone to overfitting the training data, affecting model generalization and adaptation. Studies have shown that label smoothing and softening can improve classification performance. Nevertheless, existing approaches are either non-data-driven or limited in applicability. In this paper, we present the Class-Correlation Learning Network (CCL-Net), which learns interclass visual correlations from the given training data and produces soft labels to help with classification tasks. Instead of letting the network directly learn the desired correlations, we propose to learn them implicitly via distance metric learning of class-specific embeddings with a lightweight plugin CCL block. An intuitive loss based on a geometrical explanation of correlation is designed to bolster learning of the interclass correlations. We further present end-to-end training of the proposed CCL block as a plugin head together with the classification backbone while generating soft labels on the fly. Our experimental results on the International Skin Imaging Collaboration 2018 dataset demonstrate effective learning of the interclass correlations from training data, as well as consistent performance improvements upon several widely used modern network structures with the CCL block.

Dong Wei, Shilei Cao, Kai Ma, Yefeng Zheng
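
One way to realize the distance-based soft labels described above is a softmax over negative distances between learned class embeddings. A hedged sketch, assuming PyTorch; the temperature and function names are illustrative, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def soft_labels_from_embeddings(class_emb, targets, temperature=0.1):
    """Turn distances between class-specific embeddings into soft labels.

    class_emb: (C, D) tensor of learned class embeddings.
    targets:   (B,) integer class indices.
    Classes whose embeddings lie close to the target class receive more
    probability mass than distant ones, replacing one-hot targets.
    """
    # Pairwise Euclidean distances between class embeddings: (C, C).
    dist = torch.cdist(class_emb, class_emb)
    # Closer classes -> stronger correlation -> larger soft probability.
    soft = F.softmax(-dist / temperature, dim=1)
    return soft[targets]          # (B, C) soft target distributions

def soft_label_loss(logits, soft_targets):
    """Cross-entropy against soft targets (KL divergence up to a constant)."""
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```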
Feature Preserving Smoothing Provides Simple and Effective Data Augmentation for Medical Image Segmentation

CNNs represent the current state of the art for image classification, as well as for image segmentation. Recent work suggests that CNNs for image classification suffer from a bias towards texture, and that reducing it can increase the network’s accuracy. We hypothesize that CNNs for medical image segmentation might suffer from a similar bias. We propose to reduce it by augmenting the training data with feature preserving smoothing, which reduces noise and high-frequency textural features, while preserving semantically meaningful boundaries. Experiments on multiple medical image segmentation tasks confirm that, especially when limited training data is available or a domain shift is involved, feature preserving smoothing can indeed serve as a simple and effective augmentation technique.

Rasha Sheikh, Thomas Schultz
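
As a concrete instance of feature preserving smoothing used as augmentation, an edge-preserving bilateral filter applied with some probability works as a drop-in transform. The paper evaluates several such filters; the bilateral filter and the parameters below are generic OpenCV choices for illustration, not the authors' settings.

```python
import numpy as np
import cv2

def smooth_augment(image, p=0.5, d=9, sigma_color=75, sigma_space=75):
    """Randomly replace an image with an edge-preserving smoothed copy.

    Bilateral filtering suppresses noise and fine texture while keeping
    semantically meaningful boundaries, nudging a segmentation network
    away from texture cues and towards shape.
    """
    if np.random.rand() > p:
        return image
    return cv2.bilateralFilter(image.astype(np.float32), d,
                               sigma_color, sigma_space)
```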
Deep kNN for Medical Image Classification

Human-level diagnostic performance from intelligent systems often depends on large sets of training data. However, the amount of data available for model training may be limited for some diseases, causing the widely adopted deep learning models to generalize poorly. One simple alternative for small-class prediction is the traditional k-nearest neighbor (kNN) classifier. However, due to the non-parametric characteristics of kNN, it is difficult to combine kNN classification with the learning of the feature extractor. This paper proposes an end-to-end learning strategy that unifies kNN classification and the feature extraction procedure. The basic idea is to enforce that each training sample and its K nearest neighbors belong to the same class while learning the feature extractor. Experiments on multiple small-class and class-imbalanced medical image datasets showed that the proposed deep kNN outperforms both kNN and other strong classifiers.

Jiaxin Zhuang, Jiabin Cai, Ruixuan Wang, Jianguo Zhang, Wei-Shi Zheng
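
The constraint that each training sample and its K nearest neighbors share a class can be written as a differentiable batch-level loss. This is a simplified surrogate of the paper's objective, assuming PyTorch; the margin and names are illustrative.

```python
import torch
import torch.nn.functional as F

def deep_knn_loss(features, labels, k=3, margin=0.5):
    """Encourage each sample's K nearest neighbours to share its class.

    features: (B, D) embeddings from the feature extractor.
    labels:   (B,) integer labels.
    Same-class nearest neighbours are pulled together, while nearest
    neighbours of a different class are pushed outside a margin.
    """
    features = F.normalize(features, dim=1)
    dist = torch.cdist(features, features)                         # (B, B)
    dist = dist + torch.eye(len(dist), device=dist.device) * 1e6   # mask self

    knn_dist, knn_idx = dist.topk(k, largest=False)                # (B, K)
    same_class = labels[knn_idx] == labels.unsqueeze(1)            # (B, K)

    # Same-class neighbours should be close; others pushed past the margin.
    pull = (knn_dist * same_class).sum() / same_class.sum().clamp(min=1)
    push = F.relu(margin - knn_dist)[~same_class]
    push = push.mean() if push.numel() > 0 else knn_dist.new_zeros(())
    return pull + push
```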
Learning Semantics-Enriched Representation via Self-discovery, Self-classification, and Self-restoration

Medical images are naturally associated with rich semantics about the human anatomy, reflected in an abundance of recurring anatomical patterns, offering unique potential to foster deep semantic representation learning and yield semantically more powerful models for different medical applications. But how exactly such strong yet free semantics embedded in medical images can be harnessed for self-supervised learning remains largely unexplored. To this end, we train deep models to learn semantically enriched visual representations by self-discovery, self-classification, and self-restoration of the anatomy underneath medical images, resulting in a semantics-enriched, general-purpose, pre-trained 3D model, named Semantic Genesis. We compare our Semantic Genesis with all publicly available pre-trained models, whether self-supervised or fully supervised, on six distinct target tasks covering both classification and segmentation in various medical modalities (i.e., CT, MRI, and X-ray). Our extensive experiments demonstrate that Semantic Genesis significantly exceeds all of its 3D counterparts as well as the de facto ImageNet-based transfer learning in 2D. This performance is attributed to our novel self-supervised learning framework, which encourages deep models to learn compelling semantic representations from the abundant anatomical patterns arising from consistent anatomies embedded in medical images. Code and pre-trained Semantic Genesis models are available at https://github.com/JLiangLab/SemanticGenesis.

Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Zongwei Zhou, Michael B. Gotway, Jianming Liang
DECAPS: Detail-Oriented Capsule Networks

Capsule Networks (CapsNets) have been shown to be a promising alternative to Convolutional Neural Networks (CNNs). However, they often fall short of state-of-the-art accuracy on large-scale, high-dimensional datasets. We propose a Detail-Oriented Capsule Network (DECAPS) that combines the strengths of CapsNets with several novel techniques to boost classification accuracy. First, DECAPS uses an Inverted Dynamic Routing (IDR) mechanism to group lower-level capsules into heads before sending them to higher-level capsules. This strategy enables capsules to selectively attend to small but informative details within the data that may be lost during pooling operations in CNNs. Second, DECAPS employs a Peekaboo training procedure, which encourages the network to focus on fine-grained information through a second-level attention scheme. Finally, a distillation process improves the robustness of DECAPS by averaging over the original and attended image region predictions. We provide extensive experiments on the CheXpert and RSNA Pneumonia datasets to validate the effectiveness of DECAPS. Our networks achieve state-of-the-art accuracy not only in classification (increasing the average area under ROC curves from 87.24% to 92.82% on the CheXpert dataset) but also in the weakly-supervised localization of diseased areas (increasing average precision from 41.7% to 80% on the RSNA Pneumonia detection dataset).

Aryan Mobiny, Pengyu Yuan, Pietro Antonio Cicalese, Hien Van Nguyen
Federated Simulation for Medical Imaging

Labelling data is expensive and time-consuming, especially for domains such as medical imaging that contain volumetric imaging data and require expert knowledge. Exploiting a larger pool of labeled data available across multiple centers, as in federated learning, has also seen limited success, since current deep learning approaches do not generalize well to images acquired with scanners from different manufacturers. We aim to address these problems in a common, learning-based image simulation framework which we refer to as Federated Simulation. We introduce a physics-driven generative approach consisting of two learnable neural modules: 1) a module that synthesizes 3D cardiac shapes along with their materials, and 2) a CT simulator that renders these into realistic 3D CT volumes, with annotations. Since the model of geometry and material is disentangled from the imaging sensor, it can effectively be trained across multiple medical centers. We show that our data synthesis framework improves downstream segmentation performance on several datasets.

Daiqing Li, Amlan Kar, Nishant Ravikumar, Alejandro F. Frangi, Sanja Fidler
Continual Learning of New Diseases with Dual Distillation and Ensemble Strategy

Most intelligent diagnosis systems are developed for one or a few specific diseases, whereas medical specialists can diagnose all diseases of a certain organ or tissue. Since it is often difficult to collect data for all diseases, it would be desirable if an intelligent system could initially diagnose a few diseases and then continually learn to diagnose more and more diseases as data for these new classes arrive in the future. However, current intelligent systems are characterised by catastrophic forgetting of old knowledge when learning new classes. In this paper, we propose a new continual learning framework that alleviates this issue by simultaneously distilling both old knowledge and recently learned new knowledge, and by ensembling the class-specific knowledge from the previous classifier and the newly learned classifier. Experiments showed that the proposed method outperforms state-of-the-art methods on multiple medical and natural image datasets.

Zhuoyun Li, Changhong Zhong, Ruixuan Wang, Wei-Shi Zheng
Learning to Segment When Experts Disagree

Recent years have seen an increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels, especially in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical annotation collection process, different clinical experts provide their estimates of the “true” segmentation labels under the influence of their levels of expertise and biases. Treating these noisy labels blindly as the ground truth can adversely affect the performance of supervised segmentation models. In this work, we present a neural network architecture for jointly learning, from noisy observations alone, both the reliability of individual annotators and the true segmentation label distributions. The separation of the annotators’ characteristics from the true segmentation label is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the training data. Our method can also be viewed as a translation of STAPLE, an established label aggregation framework proposed by Warfield et al. [1], to the supervised learning paradigm. We demonstrate our method first on a generic segmentation task using MNIST data and then adapt it for use with MRI scans of multiple sclerosis (MS) patients for lesion labelling. Our method shows considerable improvement over the relevant baselines on both datasets in terms of segmentation accuracy and estimation of annotator reliability, particularly when only a single label is available per image. An open-source implementation of our approach can be found at https://github.com/UCLBrain/MSLS.

Le Zhang, Ryutaro Tanno, Kevin Bronik, Chen Jin, Parashkev Nachev, Frederik Barkhof, Olga Ciccarelli, Daniel C. Alexander
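
The two ingredients described above, per-annotator confusion matrices applied to an estimated true label distribution plus a trace penalty that encourages the estimated annotators to be maximally unreliable, can be sketched as follows. Shapes, names, and the trace weight are assumptions; this is not the authors' code.

```python
import torch
import torch.nn.functional as F

def annotator_loss(true_probs, confusion_logits, noisy_labels, trace_weight=0.1):
    """Jointly model true segmentation and per-annotator reliability.

    true_probs:       (B, C, H, W) estimated true label distribution.
    confusion_logits: (A, C, C) one learnable confusion matrix per annotator,
                      rows normalised so CM[a, i, j] = P(annotator a says j | true i).
    noisy_labels:     (A, B, H, W) integer labels from each annotator.
    Minimising the trace pushes estimated annotators towards unreliability,
    separating annotator noise from the underlying label.
    """
    cm = F.softmax(confusion_logits, dim=2)             # (A, C, C)
    # Predicted distribution over each annotator's observed labels:
    # noisy_probs[a, b, j] = sum_i true_probs[b, i] * cm[a, i, j]
    noisy_probs = torch.einsum('bihw,aij->abjhw', true_probs, cm)
    nll = F.nll_loss(noisy_probs.clamp_min(1e-8).log().flatten(0, 1),
                     noisy_labels.flatten(0, 1))
    trace = cm.diagonal(dim1=1, dim2=2).sum(dim=1).mean()
    return nll + trace_weight * trace
```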
Deep Disentangled Hashing with Momentum Triplets for Neuroimage Search

Neuroimaging has been widely used in computer-aided clinical diagnosis and treatment, and the rapid growth of neuroimage repositories introduces great challenges for efficient neuroimage search. Existing image search methods often use a triplet loss to capture high-order relationships between samples. However, we find that the traditional triplet loss struggles to push positive and negative sample pairs apart so that their Hamming distance discrepancy exceeds a small fixed value. This may reduce the discriminative ability of the learned hash codes and degrade search performance. To address this issue, we propose a deep disentangled momentum hashing (DDMH) framework for neuroimage search. Specifically, we first investigate the original triplet loss and find that this loss function is determined by the inner product of hash code pairs. Accordingly, we disentangle hash code norms and hash code directions and analyze the role of each part. By decoupling the loss function from the hash code norm, we propose a unique disentangled triplet loss, which can effectively push positive and negative sample pairs apart by the desired Hamming distance discrepancy for hash codes of different lengths. We further develop a momentum triplet strategy to address the problem of insufficient triplet samples caused by the small batch size required for 3D neuroimages. With the proposed disentangled triplet loss and the momentum triplet strategy, we design an end-to-end trainable deep hashing framework for neuroimage search. Comprehensive empirical evidence on three neuroimage datasets shows that DDMH outperforms several state-of-the-art methods in neuroimage search.

Erkun Yang, Dongren Yao, Bing Cao, Hao Guan, Pew-Thian Yap, Dinggang Shen, Mingxia Liu
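
Since the Hamming distance between binary codes b1, b2 of length L equals (L - b1·b2)/2, the triplet loss is driven by inner products and hence by code norms. Normalizing the continuous codes before the triplet comparison removes the norm term, which is one simplified reading of the disentangling idea; a hedged PyTorch sketch, with the margin as an illustrative choice.

```python
import torch
import torch.nn.functional as F

def disentangled_triplet_loss(anchor, positive, negative, margin=0.4):
    """Triplet loss on hash-code directions only.

    Normalising the continuous codes removes the norm contribution to
    the inner product, so the margin acts purely on code direction and
    is comparable across code lengths.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negative, dim=1)
    sim_ap = (a * p).sum(dim=1)     # cosine similarity, norm-free
    sim_an = (a * n).sum(dim=1)
    return F.relu(sim_an - sim_ap + margin).mean()
```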
Learning Joint Shape and Appearance Representations with Metamorphic Auto-Encoders

Transformation-based methods for shape analysis offer a consistent framework to model the geometrical content of images. Most often relying on diffeomorphic transforms, they however lack the ability to properly handle texture and differing topological content. Conversely, modern deep learning methods offer a very efficient way to analyze image textures. Building on the theory of metamorphoses, which models images as combined intensity-domain and spatial-domain transforms of a prototype, we introduce the “metamorphic” auto-encoding architecture. This class of neural networks is interpreted as a Bayesian generative and hierarchical model, allowing the joint estimation of the network parameters, a representative prototype of the training images, and the relative importance of the geometrical and texture contents. We give arguments for the practical relevance of the learned prototype and of the Euclidean latent-space metric, achieved thanks to an explicit normalization layer. Finally, the ability of the proposed architecture to learn joint, relevant shape and appearance representations from image collections is illustrated on the BraTS 2018 dataset, showing in particular an encouraging step towards personalized numerical simulation of tumors with data-driven models.

Alexandre Bône, Paul Vernhet, Olivier Colliot, Stanley Durrleman
Collaborative Learning of Cross-channel Clinical Attention for Radiotherapy-Related Esophageal Fistula Prediction from CT

Early prognosis of radiotherapy-related esophageal fistula is of great significance in devising personalized stratification and optimal treatment plans for esophageal cancer (EC) patients. Effectively fusing multi-level radiographic visual descriptors guided by diagnostic considerations is a challenging task. We propose an end-to-end, clinical-knowledge-enhanced, multi-level cross-channel feature extraction and aggregation model. First, clinical attention is represented by contextual CT, the segmented tumor, and anatomical surroundings from nine views of planes. Then, for each view, a Cross-Channel-Atten Network is proposed, with CNN blocks for multi-level feature extraction, a cross-channel convolution module for multi-domain clinical knowledge embedding at the same feature level, and an attention mechanism for the final adaptive fusion of multi-level cross-domain radiographic features. Experimental results and an ablation study on 558 EC patients show that our model outperforms the other methods in comparisons with or without multi-view, multi-domain knowledge, and multi-level attentional features. Visual analysis of attention maps shows that the network learns to focus on the tumor and organs of interest, including the esophagus, trachea, and mediastinal connective tissues.

Hui Cui, Yiyue Xu, Wanlong Li, Linlin Wang, Henry Duh
Learning Bronchiole-Sensitive Airway Segmentation CNNs by Feature Recalibration and Attention Distillation

Training deep convolutional neural networks (CNNs) for airway segmentation is challenging due to the sparse supervisory signals caused by the severe class imbalance between long, thin airways and background. In view of the intricate pattern of tree-like airways, the segmentation model should pay extra attention to the morphology and distribution characteristics of airways. We propose a CNN-based airway segmentation method with superior sensitivity to tenuous peripheral bronchioles. We first present a feature recalibration module to make the best use of learned features. Spatial information of features is properly integrated to retain the relative priority of activated regions, which benefits the subsequent channel-wise recalibration. Then, an attention distillation module is introduced to reinforce airway-specific representation learning. High-resolution attention maps with fine airway details are passed down from late layers to earlier layers iteratively to enrich context knowledge. Extensive experiments demonstrate the considerable performance gain brought by the two proposed modules. Compared with state-of-the-art methods, our method extracts many more branches while maintaining competitive overall segmentation performance.

Yulei Qin, Hao Zheng, Yun Gu, Xiaolin Huang, Jie Yang, Lihui Wang, Yue-Min Zhu
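
Attention distillation, passing a late layer's fine attention map down to an earlier layer, reduces to an auxiliary regression loss between normalized activation maps. A sketch for 3D features, assuming PyTorch; the squared-activation attention and the normalization choice are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_map(features):
    """Spatial attention as the channel-wise mean of squared activations."""
    att = features.pow(2).mean(dim=1, keepdim=True)      # (B, 1, D, H, W)
    norm = att.flatten(1).max(dim=1).values.clamp_min(1e-8)
    return att / norm.view(-1, 1, 1, 1, 1)

def attention_distillation_loss(early_feats, late_feats):
    """Pass fine-grained attention from a late layer down to an earlier one.

    The late layer's attention map is upsampled to the early layer's
    resolution and the early layer is trained to mimic it, enriching
    context for thin peripheral airways.
    """
    early_att = attention_map(early_feats)
    late_att = attention_map(late_feats)
    late_att = F.interpolate(late_att, size=early_att.shape[2:],
                             mode='trilinear', align_corners=False)
    return F.mse_loss(early_att, late_att.detach())
```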
Learning Rich Attention for Pediatric Bone Age Assessment

Bone Age Assessment (BAA) is a challenging clinical practice in pediatrics, requiring rich attention on multiple anatomical Regions of Interest (RoIs). Recently developed deep learning methods address the challenge in BAA with a hard-crop attention mechanism, which segments or detects the discriminative RoIs for meticulous analysis. Great strides have been made; however, these methods impose severe requirements on precise RoI annotation, complex network design, and expensive computation. In this paper, we show it is possible to learn rich attention without the need for complicated network design or precise annotation: a simple module is all it takes. The proposed Rich Attention Network (RA-Net) is composed of a flexible baseline network and a lightweight Rich Attention module (RAm). Taking the feature map from the baseline network, the RA module is optimized to generate attention with discriminability and diversity, so that the deep network can learn rich pattern attention and representation. With this artful design, we enable an end-to-end framework for BAA without RoI annotation. RA-Net brings a significant performance margin with negligible additional overhead in parameters and computation. Extensive experiments verify that our method yields state-of-the-art performance on the public RSNA dataset with a mean absolute error (MAE) of 4.10 months.

Chuanbin Liu, Hongtao Xie, Yunyan Yan, Zhendong Mao, Yongdong Zhang
Weakly Supervised Organ Localization with Attention Maps Regularized by Local Area Reconstruction

Fully supervised methods with numerous densely labeled training data have achieved accurate localization results for anatomical structures. However, obtaining such a dedicated dataset usually requires clinical expertise and a time-consuming annotation process. In this work, we tackle the organ localization problem under the setting of image-level annotations. Previous work on Class Activation Maps (CAMs) and their derivatives has proved that discriminative regions of images can be located with basic classification networks. To improve the representative capacity of the attention maps generated by CAMs, a novel learning-based Local Area Reconstruction (LAR) method is proposed. Our weakly supervised organ localization network, OLNet, can generate high-resolution attention maps that preserve fine-detailed target anatomical structures. Online-generated pseudo ground truth is utilized to impose geometric constraints on the attention maps. Extensive experiments on an in-house chest CT dataset and the Kidney Tumor Segmentation Benchmark (KiTS19) show that our approach provides promising localization results from both saliency map and semantic segmentation perspectives.

Heng Guo, Minfeng Xu, Ying Chi, Lei Zhang, Xian-Sheng Hua
High-Order Attention Networks for Medical Image Segmentation

Segmentation is a fundamental task in medical image analysis. Current state-of-the-art Convolutional Neural Networks for medical image segmentation capture local context information using fixed-shape receptive fields and feature detectors with position-invariant weights, which limits robustness to input variance, such as medical objects of varying sizes, shapes, and domains. In order to capture global context information, we propose High-order Attention (HA), a novel attention module with adaptive receptive fields and dynamic weights. HA allows each pixel to have its own global attention map that models its relationship to all other pixels. In particular, HA constructs the attention map through graph transduction and thus captures highly relevant context information at high order. Consequently, feature maps at each position are selectively aggregated as a weighted sum of feature maps at all positions. We further embed the proposed HA module into an efficient encoder-decoder structure for medical image segmentation, namely the High-order Attention Network (HANet). Extensive experiments are conducted on four benchmark sets for three tasks, i.e., REFUGE and Drishti-GS1 for optic disc/cup segmentation, DRIVE for blood vessel segmentation, and LUNA for lung segmentation. The results justify the effectiveness of the new attention module for medical image segmentation.

Fei Ding, Gang Yang, Jun Wu, Dayong Ding, Jie Xv, Gangwei Cheng, Xirong Li
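
Graph-transduction attention can be illustrated by building a row-stochastic pixel-affinity matrix and taking its powers, so that relevance propagates through intermediate pixels. A compact sketch, assuming PyTorch; the dot-product affinity is an illustrative choice, not necessarily the paper's construction.

```python
import torch
import torch.nn.functional as F

def high_order_attention(features, order=2):
    """Aggregate features with powers of a pixel-affinity matrix.

    features: (B, C, H, W). A first-order attention map A relates every
    pixel to every other pixel; multiplying A by itself propagates
    relevance through intermediate pixels, capturing higher-order context.
    """
    b, c, h, w = features.shape
    x = features.flatten(2)                          # (B, C, N), N = H*W
    affinity = torch.einsum('bcn,bcm->bnm', x, x) / (c ** 0.5)
    att = F.softmax(affinity, dim=-1)                # row-stochastic (B, N, N)
    for _ in range(order - 1):
        att = torch.bmm(att, att)                    # higher-order relations
    out = torch.bmm(x, att.transpose(1, 2))          # weighted sum over pixels
    return out.view(b, c, h, w)
```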
NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification

The segmentation and classification of different types of nuclei play an important role in discriminating and diagnosing the initiation, development, invasion, metastasis, and therapeutic response of tumors of various organs. Recently, deep learning methods based on attention mechanisms have achieved good results in nuclei semantic segmentation. However, the design of attention module architectures relies heavily on the experience of researchers and a large number of experiments. Therefore, to avoid this manual design and achieve better performance, we propose a new Neural Architecture Search-based Spatial and Channel joint Attention Module (NAS-SCAM) to obtain better spatial and channel weighting effects. To the best of our knowledge, this is the first application of NAS to the attention mechanism. At the same time, we use a synchronous search strategy to search architectures independently for different attention modules in the same network structure. We verify the superiority of our method over state-of-the-art attention modules and networks on the public MoNuSAC 2020 dataset. We make our code and model available at https://github.com/ZuhaoLiu/NAS-SCAM.

Zuhao Liu, Huan Wang, Shaoting Zhang, Guotai Wang, Jin Qi
Scientific Discovery by Generating Counterfactuals Using Image Translation

Model explanation techniques play a critical role in understanding the source of a model’s performance and making its decisions transparent. Here we investigate if explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions: first, we propose a framework to convert predictions from explanation techniques to a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work [30] showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model’s performance and human understanding.

Arunachalam Narayanaswamy, Subhashini Venugopalan, Dale R. Webster, Lily Peng, Greg S. Corrado, Paisan Ruamviboonsuk, Pinal Bavishi, Michael Brenner, Philip C. Nelson, Avinash V. Varadarajan
Interpretable Deep Models for Cardiac Resynchronisation Therapy Response Prediction

Advances in deep learning (DL) have resulted in impressive accuracy in some medical image classification tasks, but often deep models lack interpretability. The ability of these models to explain their decisions is important for fostering clinical trust and facilitating clinical translation. Furthermore, for many problems in medicine there is a wealth of existing clinical knowledge to draw upon, which may be useful in generating explanations, but it is not obvious how this knowledge can be encoded into DL models - most models are learnt either from scratch or using transfer learning from a different domain. In this paper we address both of these issues. We propose a novel DL framework for image-based classification based on a variational autoencoder (VAE). The framework allows prediction of the output of interest from the latent space of the autoencoder, as well as visualisation (in the image domain) of the effects of crossing the decision boundary, thus enhancing the interpretability of the classifier. Our key contribution is that the VAE disentangles the latent space based on ‘explanations’ drawn from existing clinical knowledge. The framework can predict outputs as well as explanations for these outputs, and also raises the possibility of discovering new biomarkers that are separate (or disentangled) from the existing knowledge. We demonstrate our framework on the problem of predicting response of patients with cardiomyopathy to cardiac resynchronization therapy (CRT) from cine cardiac magnetic resonance images. The sensitivity and specificity of the proposed model on the task of CRT response prediction are 88.43% and 84.39% respectively, and we showcase the potential of our model in enhancing understanding of the factors contributing to CRT response.

Esther Puyol-Antón, Chen Chen, James R. Clough, Bram Ruijsink, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Marc Elliott, Vishal Mehta, Daniel Rueckert, Christopher A. Rinaldi, Andrew P. King
Encoding Visual Attributes in Capsules for Explainable Medical Diagnoses

Convolutional neural network based systems have largely failed to be adopted in many high-risk application areas, including healthcare, military, security, transportation, finance, and legal, due to their highly uninterpretable “black-box” nature. Towards addressing this deficiency, we teach a novel multi-task capsule network to improve the explainability of predictions by embodying the same high-level language used by human experts. Our explainable capsule network, X-Caps, encodes high-level visual object attributes within the vectors of its capsules, then forms predictions based solely on these human-interpretable features. To encode attributes, X-Caps utilizes a new routing sigmoid function to independently route information from child capsules to parents. Further, to provide radiologists with an estimate of model confidence, we train our network on a distribution of expert labels, modeling inter-observer agreement and penalizing over/under-confidence during training, supervised by human experts’ agreement. X-Caps simultaneously learns attribute and malignancy scores from a multi-center dataset of over 1000 CT scans of lung cancer screening patients. We demonstrate that a simple 2D capsule network can outperform a state-of-the-art deep dense dual-path 3D CNN at capturing visually interpretable high-level attributes and malignancy prediction, while providing malignancy prediction scores approaching those of non-explainable 3D CNNs. To the best of our knowledge, this is the first study to investigate capsule networks for making predictions based on radiologist-level interpretable attributes, and their application to medical image diagnosis. Code is publicly available at https://github.com/lalonderodney/X-Caps.

Rodney LaLonde, Drew Torigian, Ulas Bagci
Interpretability-Guided Content-Based Medical Image Retrieval

When encountering a dubious diagnostic case, radiologists typically search public or internal databases for similar cases that would help them in their decision-making process. This search represents a massive burden on their workflow, as it considerably reduces their time to diagnose new cases. It is, therefore, of utmost importance to replace this manually intensive search with an automatic content-based image retrieval system. However, general content-based image retrieval systems are often not helpful in the context of medical imaging, since they do not consider the fact that relevant information in medical images is typically spatially constricted. In this work, we explore the use of interpretability methods to localize relevant regions of images, leading to more focused feature representations and, therefore, to improved medical image retrieval. As a proof of concept, experiments were conducted using a publicly available chest X-ray dataset, with results showing that the proposed interpretability-guided image retrieval translates the similarity measure of an experienced radiologist better than state-of-the-art image retrieval methods. Furthermore, it also improves the class-consistency of the top retrieved results and enhances the interpretability of the whole system by accompanying the retrieval with visual explanations.

Wilson Silva, Alexander Poellinger, Jaime S. Cardoso, Mauricio Reyes
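
One way to realize interpretability-guided retrieval is to pool backbone features under a saliency map produced by an attribution method, then rank by cosine similarity. A hedged sketch in PyTorch; all names are illustrative and the pooling scheme is an assumption.

```python
import torch
import torch.nn.functional as F

def saliency_weighted_descriptor(feature_map, saliency):
    """Pool features only where an interpretability method says to look.

    feature_map: (B, C, H, W) backbone features.
    saliency:    (B, 1, h, w) saliency/attribution map for the same image.
    Weighting features by the resized, normalised saliency map focuses
    the retrieval descriptor on the clinically relevant region.
    """
    sal = F.interpolate(saliency, size=feature_map.shape[2:],
                        mode='bilinear', align_corners=False)
    sal = sal / sal.flatten(1).sum(dim=1).clamp_min(1e-8).view(-1, 1, 1, 1)
    desc = (feature_map * sal).sum(dim=(2, 3))       # (B, C)
    return F.normalize(desc, dim=1)

def retrieve(query_desc, database_desc, k=5):
    """Top-k retrieval by cosine similarity of saliency-weighted descriptors."""
    sims = query_desc @ database_desc.t()            # (Bq, Bd)
    return sims.topk(k, dim=1).indices
```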
Domain Aware Medical Image Classifier Interpretation by Counterfactual Impact Analysis

The success of machine learning methods for computer vision tasks has driven a surge in computer-assisted prediction for medicine and biology. Based on a data-driven relationship between input image and pathological classification, these predictors deliver unprecedented accuracy. Yet the numerous approaches trying to explain the causality of this learned relationship have fallen short: time constraints and coarse, diffuse, and at times misleading results, caused by the employment of heuristic techniques like Gaussian noise and blurring, have hindered their clinical adoption. In this work, we discuss and overcome these obstacles by introducing a neural-network-based attribution method, applicable to any trained predictor. Our solution identifies salient regions of an input image in a single forward pass by measuring the effect of local image perturbations on a predictor’s score. We replace heuristic techniques with a strong, neighborhood-conditioned inpainting approach, avoiding anatomically implausible, hence adversarial, artifacts. We evaluate on public mammography data and compare against existing state-of-the-art methods. Furthermore, we exemplify the approach’s generalizability by demonstrating results on chest X-rays. Our solution shows, both quantitatively and qualitatively, a significant reduction in localization ambiguity and clearer results, without sacrificing time efficiency.

Dimitrios Lenis, David Major, Maria Wimmer, Astrid Berg, Gert Sluiter, Katja Bühler
Towards Emergent Language Symbolic Semantic Segmentation and Model Interpretability

Recent advances in methods focused on the grounding problem have resulted in techniques that can be used to construct a symbolic language associated with a specific domain. Inspired by how humans communicate complex ideas through language, we developed a generalized Symbolic Semantic (S2) framework for interpretable segmentation. Unlike adversarial models (e.g., GANs), we explicitly model cooperation between two agents, a Sender and a Receiver, that must cooperate to achieve a common goal. The Sender receives information from a high layer of a segmentation network and generates a symbolic sentence derived from a categorical distribution. The Receiver obtains the symbolic sentences and cogenerates the segmentation mask. In order for the model to converge, the Sender and Receiver must learn to communicate using a private language. We apply our architecture to segment tumors in the TCGA dataset. A UNet-like architecture is used to generate input to the Sender network, which produces a symbolic sentence, and a Receiver network cogenerates the segmentation mask based on the sentence. Our S2 segmentation framework achieved similar or better performance compared with state-of-the-art segmentation methods. In addition, our results suggest direct interpretation of the symbolic sentences to discriminate between normal and tumor tissue, tumor morphology, and other image characteristics.

Alberto Santamaria-Pang, James Kubricht, Aritra Chowdhury, Chitresh Bhushan, Peter Tu
Meta Corrupted Pixels Mining for Medical Image Segmentation

Deep neural networks have achieved satisfactory performance in a wide range of medical image analysis tasks. However, training a deep neural network requires a large number of samples with high-quality annotations. In medical image segmentation, it is very laborious and expensive to acquire precise pixel-level annotations. Aiming at training deep segmentation models on datasets with possibly corrupted annotations, we propose a novel Meta Corrupted Pixels Mining (MCPM) method based on a simple meta mask network. Our method automatically estimates a weighting map to evaluate the importance of every pixel in the learning of the segmentation network. The meta mask network, which takes the loss-value map of the predicted segmentation results as input, is capable of identifying corrupted pixels and allocating small weights to them. An alternating algorithm is adopted to train the segmentation network and the meta mask network simultaneously. Extensive experimental results on the LIDC-IDRI and LiTS datasets show that our method outperforms state-of-the-art approaches devised for coping with corrupted annotations.

Jixin Wang, Sanping Zhou, Chaowei Fang, Le Wang, Jinjun Wang
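
The meta mask idea, a small network mapping a per-pixel loss map to a per-pixel weight map, can be sketched directly. The architecture below is an illustrative stand-in, not the authors' design, and the meta-training loop that updates the mask network on clean data is omitted.

```python
import torch
import torch.nn as nn

class MetaMaskNet(nn.Module):
    """Tiny network mapping a per-pixel loss map to a per-pixel weight map.

    Pixels with suspiciously large loss (likely corrupted annotations)
    can be down-weighted; the mask network itself is meta-trained on a
    small clean set.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, loss_map):                  # (B, 1, H, W)
        return self.net(loss_map)

def weighted_segmentation_loss(pixel_loss, meta_mask_net):
    """Reweight a (B, 1, H, W) pixel-wise loss map with the meta mask."""
    weights = meta_mask_net(pixel_loss.detach())  # no grad through weights
    return (weights * pixel_loss).mean()
```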
UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation

Aggregating multi-level feature representations plays a critical role in achieving robust volumetric medical image segmentation, which is important for auxiliary diagnosis and treatment. Unlike recent neural architecture search (NAS) methods, which typically search for the optimal operators in each network layer but lack a good strategy for searching feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies and the block-wise operators in the encoder-decoder network. UXNet has several appealing benefits. (1) It significantly improves the flexibility of the classical UNet architecture, which only aggregates feature representations of encoder and decoder at equivalent resolutions. (2) A continuous relaxation of UXNet is carefully designed, enabling its searching scheme to be performed in an efficient differentiable manner. (3) Extensive experiments demonstrate the effectiveness of UXNet compared with recent NAS methods for medical image segmentation. The architecture discovered by UXNet outperforms existing state-of-the-art models in terms of Dice on several public 3D medical image segmentation benchmarks, especially at boundary locations and for tiny tissues. The searching computational cost of UXNet is modest, enabling a best-performing network to be found in less than 1.5 days on two TitanXP GPUs.

Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo
Difficulty-Aware Meta-learning for Rare Disease Diagnosis

Rare diseases exist in extremely low-data regimes, unlike common diseases with large amounts of available labeled data. Hence, training a neural network to classify rare diseases with a few per-class data samples is very challenging and has so far attracted very little attention. In this paper, we present a difficulty-aware meta-learning method to address rare disease classification and demonstrate its capability to classify dermoscopy images. Our key approach is to first train and construct a meta-learning model from data of common diseases, then adapt the model to perform rare disease classification. To achieve this, we develop a difficulty-aware meta-learning method that dynamically monitors the importance of learning tasks during the meta-optimization stage. To evaluate our method, we use the recent ISIC 2018 skin lesion classification dataset, and show that with only five samples per class, our model can quickly adapt to classify unseen classes with a high AUC of 83.3%. We also evaluated several rare disease classification results on the public Dermofit Image Library to demonstrate the potential of our method for real clinical practice.

Xiaomeng Li, Lequan Yu, Yueming Jin, Chi-Wing Fu, Lei Xing, Pheng-Ann Heng
Few Is Enough: Task-Augmented Active Meta-learning for Brain Cell Classification

Deep Neural Networks (DNNs) must constantly cope with distribution changes in the input data when the task of interest or the data collection protocol changes. Retraining a network from scratch to combat this issue incurs a significant cost. Meta-learning aims to deliver an adaptive model that is sensitive to these underlying distribution changes, but requires many tasks during the meta-training process. In this paper, we propose a tAsk-auGmented actIve meta-LEarning (AGILE) method to efficiently adapt DNNs to new tasks using a small number of training examples. AGILE combines a meta-learning algorithm with a novel task augmentation technique which we use to generate an initial adaptive model. It then uses Bayesian dropout uncertainty estimates to actively select the most difficult samples when updating the model to a new task. This allows AGILE to learn with fewer tasks and a few informative samples, achieving high performance with a limited dataset. We perform our experiments on the brain cell classification task and compare the results to a plain meta-learning model trained from scratch. We show that the proposed task-augmented meta-learning framework can learn to classify new cell types after a single gradient step with a limited number of training samples. We show that active learning with Bayesian uncertainty can further improve the performance when the number of training samples is extremely small. Using only 1% of the training data and a single update step, we achieved 90% accuracy on the new cell type classification task, a 50-percentage-point improvement over a state-of-the-art meta-learning algorithm.

Pengyu Yuan, Aryan Mobiny, Jahandar Jahanipour, Xiaoyang Li, Pietro Antonio Cicalese, Badrinath Roysam, Vishal M. Patel, Maric Dragan, Hien Van Nguyen
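
The active-selection step relies on Monte Carlo dropout: several stochastic forward passes give a predictive distribution whose entropy ranks sample difficulty. A minimal sketch, assuming a PyTorch classifier with dropout layers; names and the entropy criterion are illustrative.

```python
import torch

def mc_dropout_uncertainty(model, inputs, n_samples=20):
    """Predictive entropy from Monte Carlo dropout forward passes."""
    model.train()                     # keep dropout active at inference
    with torch.no_grad():
        probs = torch.stack([model(inputs).softmax(dim=1)
                             for _ in range(n_samples)]).mean(dim=0)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (B,)

def select_most_difficult(model, pool_inputs, budget):
    """Pick the `budget` most uncertain pool samples for the next update."""
    scores = mc_dropout_uncertainty(model, pool_inputs)
    return scores.topk(budget).indices
```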
Automatic Data Augmentation for 3D Medical Image Segmentation

Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It enriches the diversity of training samples, which is essential in medical image segmentation tasks because 1) medical image datasets are typically small, increasing the risk of overfitting, and 2) the shape and modality of different objects such as organs or tumors are unique, requiring a customized data augmentation policy. However, most data augmentation implementations are hand-crafted and suboptimal in medical image processing. To fully exploit the potential of data augmentation, we propose an efficient algorithm to automatically search for the optimal augmentation strategies. We formulate the coupled optimization with respect to network weights and augmentation parameters in a differentiable form by means of stochastic relaxation. This formulation allows us to apply alternating gradient-based methods to solve it, i.e., a stochastic natural gradient method with adaptive step size. To the best of our knowledge, this is the first time that differentiable automatic data augmentation has been employed in medical image segmentation tasks. Our numerical experiments demonstrate that the proposed approach significantly outperforms the existing built-in data augmentation of state-of-the-art models.

Ju Xu, Mengzhang Li, Zhanxing Zhu
MS-NAS: Multi-scale Neural Architecture Search for Medical Image Segmentation

The recent breakthroughs of Neural Architecture Search (NAS) have motivated various applications in medical image segmentation. However, most existing work either simply relies on hyper-parameter tuning or sticks to a fixed network backbone, thereby limiting the underlying search space for identifying more efficient architectures. This paper presents a Multi-Scale NAS (MS-NAS) framework featuring a multi-scale search space, from network backbone to cell operation, and a multi-scale fusion capability to fuse features of different sizes. To mitigate the computational overhead of the larger search space, a partial channel connection scheme and a two-step decoding method are utilized while maintaining optimization quality. Experimental results show that, on various segmentation datasets, MS-NAS outperforms the state-of-the-art methods with 0.6–5.4% mIOU and 0.4–3.5% DSC improvements, while computational resource consumption is reduced by 18.0–24.9%.

Xingang Yan, Weiwen Jiang, Yiyu Shi, Cheng Zhuo
Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs by Comparing Image Representations

In the deep learning era, pretrained models play an important role in medical image analysis, with ImageNet pretraining widely adopted as the default choice. However, it is undeniable that there exists an obvious domain gap between natural images and medical images. To bridge this gap, we propose a new pretraining method which learns from 700k radiographs without manual annotations. We call our method Comparing to Learn (C2L) because it learns robust features by comparing different image representations. To verify the effectiveness of C2L, we conduct comprehensive ablation studies and evaluate it on different tasks and datasets. The experimental results on radiographs show that C2L can significantly outperform ImageNet pretraining and previous state-of-the-art approaches. Code and models are available at https://github.com/funnyzhou/C2L_MICCAI2020 .

Hong-Yu Zhou, Shuang Yu, Cheng Bian, Yifan Hu, Kai Ma, Yefeng Zheng
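
C2L's core idea of learning by comparing image representations can be illustrated with a standard InfoNCE-style contrastive loss, sketched below; the batch pairing, temperature, and normalization are common conventions, not necessarily the paper's exact formulation.

```python
# Minimal InfoNCE-style contrastive loss: z1[i] and z2[i] are representations
# of two views of the same radiograph; other rows in the batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, tau=0.1):
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # positives on the diagonal
```
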
Dual-Task Self-supervision for Cross-modality Domain Adaptation

Data annotation is an expensive and time-consuming issue for deep learning based medical image analysis. To ease the need for annotations, domain adaptation has recently been introduced to generalize neural networks from a labeled source domain to an unlabeled target domain without much performance degradation. In this paper, we propose a novel target-domain self-supervision for domain adaptation by constructing an edge generation auxiliary task to assist the primary segmentation task, so as to extract better target representations and improve target segmentation performance. Besides, in order to leverage the detailed information contained in low-level features, we propose a hierarchical low-level adversarial learning mechanism to encourage low-level features to be domain-uninformative in a hierarchical way, so that the segmentation performance can benefit from low-level features without being affected by domain shift. Combining these two approaches, we develop a cross-modality domain adaptation framework which employs dual-task collaboration for target-domain self-supervision and encourages low-level detailed features to be domain-uninformative for better alignment. Our proposed framework achieves state-of-the-art results on public cross-modality segmentation datasets.

Yingying Xue, Shixiang Feng, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang
Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-Efficient Cardiac Segmentation

Medical image annotations are prohibitively time-consuming and expensive to obtain. To alleviate annotation scarcity, many approaches have been developed to efficiently utilize extra information, e.g., semi-supervised learning, which further explores plentiful unlabeled data, and domain adaptation, including multi-modality learning and unsupervised domain adaptation, which resorts to prior knowledge from an additional modality. In this paper, we aim to investigate the feasibility of simultaneously leveraging abundant unlabeled data and well-established cross-modality data for annotation-efficient medical image segmentation. To this end, we propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher, where the student model not only learns from labeled target data (e.g., CT), but also explores unlabeled target data and labeled source data (e.g., MR) via two teacher models. Specifically, the student model learns the knowledge of unlabeled target data from an intra-domain teacher by encouraging prediction consistency, as well as the shape priors embedded in labeled source data from an inter-domain teacher via knowledge distillation. Consequently, the student model can effectively exploit the information from all three data resources and comprehensively integrate them to achieve improved performance. We conduct extensive experiments on the MM-WHS 2017 dataset and demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data, outperforming semi-supervised learning and domain adaptation methods by a large margin.

Kang Li, Shujun Wang, Lequan Yu, Pheng-Ann Heng
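
A hedged sketch of a Dual-Teacher-style objective as described above: a supervised loss on labeled target data, a consistency term with the intra-domain teacher on unlabeled target data, and a distillation term from the inter-domain teacher on source data. The loss weights, temperature, and choice of MSE/KL terms are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a Dual-Teacher-style student loss. Logits are assumed to
# be segmentation outputs of shape (B, C, H, W); weights/temperature assumed.
import torch.nn.functional as F

def dual_teacher_loss(stu_lab, labels, stu_unlab, intra_t_unlab,
                      stu_src, inter_t_src, w_con=0.1, w_kd=0.1, T=2.0):
    sup = F.cross_entropy(stu_lab, labels)                     # labeled target data
    con = F.mse_loss(stu_unlab.softmax(dim=1),                 # intra-domain teacher
                     intra_t_unlab.softmax(dim=1))
    kd = F.kl_div((stu_src / T).log_softmax(dim=1),            # inter-domain teacher
                  (inter_t_src / T).softmax(dim=1),
                  reduction="batchmean") * (T * T)
    return sup + w_con * con + w_kd * kd
```
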
Test-Time Unsupervised Domain Adaptation

Convolutional neural networks trained on publicly available medical imaging datasets (source domain) rarely generalise to different scanners or acquisition protocols (target domain). This motivates the active field of domain adaptation. While some approaches to the problem require labelled data from the target domain, others adopt an unsupervised approach to domain adaptation (UDA). Evaluating UDA methods consists of measuring the model's ability to generalise to unseen data in the target domain. In this work, we argue that this is not as useful as adapting to the test set directly. We therefore propose an evaluation framework in which we perform test-time UDA on each subject separately. We show that models adapted to a specific target subject from the target domain outperform a domain adaptation method which has seen more data of the target domain but not this specific target subject. This result supports the thesis that unsupervised domain adaptation should be used at test time, even if only a single target-domain subject is available.

Thomas Varsavsky, Mauricio Orbes-Arteaga, Carole H. Sudre, Mark S. Graham, Parashkev Nachev, M. Jorge Cardoso
Self Domain Adapted Network

Domain shift is a major problem for deploying deep networks in clinical practice. Network performance drops significantly on (target) images obtained differently from the (source) training data. Due to a lack of target label data, most work has focused on unsupervised domain adaptation (UDA). Current UDA methods need both source and target data to train models which perform image translation (harmonization) or learn domain-invariant features. However, training a model for each target domain is time-consuming and computationally expensive, and even infeasible when target domain data are scarce or source data are unavailable due to data privacy. In this paper, we propose a novel self domain adapted network (SDA-Net) that can rapidly adapt itself to a single test subject at the testing stage, without using extra data or training a UDA model. The SDA-Net consists of three parts: adaptors, a task model, and auto-encoders. The latter two are pre-trained offline on labeled source images. The task model performs tasks like synthesis, segmentation, or classification, which may suffer from the domain shift problem. At the testing stage, the adaptors are trained to transform the input test image and features so as to reduce the domain shift as measured by the auto-encoders, and thus perform domain adaptation. We validated our method on retinal layer segmentation from different OCT scanners and on T1-to-T2 synthesis with T1 images from different MRI scanners and with different imaging parameters. Results show that our SDA-Net, given a single test subject and a short self-adaptation time at the testing stage, achieves significant improvements.

Yufan He, Aaron Carass, Lianrui Zuo, Blake E. Dewey, Jerry L. Prince
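
The test-time self-adaptation loop can be sketched as follows: only the adaptor is updated, driven by how well the source-trained auto-encoder reconstructs the task model's features on the adapted input. The function names, and the assumption that the task model exposes its intermediate features, are illustrative.

```python
# Hedged sketch of test-time self-adaptation: the adaptor is the only
# trainable part; autoencoder reconstruction error is the domain-shift proxy.
# `task_features` returning intermediate features is an assumption.
import torch
import torch.nn.functional as F

def adapt_to_subject(adaptor, task_features, autoencoder, x, steps=10, lr=1e-3):
    opt = torch.optim.Adam(adaptor.parameters(), lr=lr)
    for _ in range(steps):
        feats = task_features(adaptor(x))             # features of adapted input
        loss = F.mse_loss(autoencoder(feats), feats)  # high when domain-shifted
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adaptor(x)   # adapted image, ready for the frozen task model
```
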
Entropy Guided Unsupervised Domain Adaptation for Cross-Center Hip Cartilage Segmentation from MRI

Hip cartilage damage is a major predictor of the clinical outcome of surgical correction for femoroacetabular impingement (FAI) and hip dysplasia. Automatic segmentation of hip cartilage is an essential prior step in assessing cartilage damage status. Deep convolutional neural networks have shown great success in various automated medical image segmentation tasks, but testing on domain-shifted datasets (e.g. images obtained from different centers) can lead to severe performance losses, and creating annotations for each center is particularly expensive. Unsupervised Domain Adaptation (UDA) addresses this challenge by transferring knowledge from a domain with labels (source domain) to a domain without labels (target domain). In this paper, we propose an entropy-guided domain adaptation method to address this challenge. Specifically, we first trained our model with a supervised loss on the source domain, which enables low-entropy predictions on source-like images. Two discriminators were then used to minimize the gap between the source and target domains with respect to the alignment of feature and entropy distributions: the feature map discriminator $D_F$ and the entropy map discriminator $D_E$. $D_F$ aligns the feature maps of the two domains, while $D_E$ matches the target segmentation to the low-entropy predictions characteristic of the source domain. The results of comprehensive experiments on cross-center MRI hip cartilage segmentation show the effectiveness of this method.

Guodong Zeng, Florian Schmaranzer, Till D. Lerch, Adam Boschung, Guoyan Zheng, Jürgen Burger, Kate Gerber, Moritz Tannast, Klaus Siebenrock, Young-Jo Kim, Eduardo N. Novais, Nicolas Gerber
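
The entropy maps that $D_E$ operates on are simply the per-pixel Shannon entropy of the softmax output, as in the short sketch below (channel layout (B, C, H, W) assumed).

```python
# Per-pixel entropy map from segmentation logits; high values mark the
# uncertain regions that the entropy discriminator tries to align.
import torch

def entropy_map(logits, eps=1e-12):
    p = torch.softmax(logits, dim=1)            # (B, C, H, W) class probabilities
    return -(p * (p + eps).log()).sum(dim=1)    # (B, H, W) entropy per pixel
```
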
User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation

Mask-based annotation of medical images, especially for 3D data, is a bottleneck in developing reliable machine learning models. Using minimal-labor user interactions (UIs) to guide the annotation is promising, but challenges remain on how best to harmonize the mask prediction with the UIs. To address this, we propose the user-guided domain adaptation (UGDA) framework, which uses prediction-based adversarial domain adaptation (PADA) to model the combined distribution of UIs and mask predictions. The UIs are then used as anchors to guide and align the mask prediction. Importantly, UGDA can both learn from unlabelled data and model the high-level semantic meaning behind different UIs. We test UGDA on annotating pathological livers using a clinically comprehensive dataset of 927 patient studies. Using only extreme-point UIs, we achieve a mean (worst-case) performance of 96.1% (94.9%), compared to 93.0% (87.0%) for deep extreme points (DEXTR). Furthermore, we show that UGDA can retain this state-of-the-art performance even when seeing only a fraction of the available UIs, demonstrating robust and reliable UI-guided segmentation with extremely minimal labor demands.

Ashwin Raju, Zhanghexuan Ji, Chi Tung Cheng, Jinzheng Cai, Junzhou Huang, Jing Xiao, Le Lu, ChienHung Liao, Adam P. Harrison
SALAD: Self-supervised Aggregation Learning for Anomaly Detection on X-Rays

Deep anomaly detection models using a supervised mode of learning usually work under a closed-set assumption and suffer from overfitting to the rare anomalies seen during training, which hinders their applicability in real scenarios. In addition, obtaining annotations for X-rays is very time-consuming and requires extensive training of radiologists. Hence, training anomaly detection in a fully unsupervised or self-supervised fashion would be advantageous, allowing a significant reduction of the time radiologists spend on reports. In this paper, we present SALAD, an end-to-end deep self-supervised methodology for anomaly detection on X-ray images. The proposed method is based on an optimization strategy in which a deep neural network is encouraged to represent prototypical local patterns of the normal data in the embedding space. During training, we record the prototypical patterns of normal training samples in a memory bank. Our anomaly score is then derived by measuring similarity to a weighted combination of normal prototypical patterns within the memory bank, without using any anomalous patterns. We present extensive experiments on the challenging NIH Chest X-ray and MURA datasets, which indicate that our algorithm improves over state-of-the-art methods by a wide margin.

Behzad Bozorgtabar, Dwarikanath Mahapatra, Guillaume Vray, Jean-Philippe Thiran
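
A rough sketch of the memory-bank anomaly score described above: the test embedding is compared against a softmax-weighted combination of stored normal prototypes. The similarity measure, temperature, and shapes are assumptions rather than the paper's exact scoring function.

```python
# Hedged sketch: anomaly score as dissimilarity to a weighted combination of
# normal prototypes in a memory bank. Shapes and temperature are assumptions.
import torch
import torch.nn.functional as F

def anomaly_score(embedding, memory_bank, tau=0.1):
    # embedding: (D,); memory_bank: (M, D) prototypes of normal patterns
    sims = F.cosine_similarity(embedding.unsqueeze(0), memory_bank)   # (M,)
    w = torch.softmax(sims / tau, dim=0)              # attention over prototypes
    recon = (w.unsqueeze(1) * memory_bank).sum(dim=0) # weighted normal prototype
    return 1.0 - F.cosine_similarity(embedding, recon, dim=0)  # high = anomalous
```
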
Scribble-Based Domain Adaptation via Co-segmentation

Although deep convolutional networks have reached state-of-the-art performance in many medical image segmentation tasks, they have typically demonstrated poor generalisation capability. To be able to generalise from one domain (e.g. one imaging modality) to another, domain adaptation has to be performed. While supervised methods may lead to good performance, they require fully annotating additional data, which may not be an option in practice. In contrast, unsupervised methods do not need additional annotations but are usually unstable and hard to train. In this work, we propose a novel weakly-supervised method. Instead of requiring detailed but time-consuming annotations, scribbles on the target domain are used to perform domain adaptation. This paper introduces a new formulation of domain adaptation based on structured learning and co-segmentation. Our method is easy to train, thanks to the introduction of a regularised loss. The framework is validated on Vestibular Schwannoma segmentation (T1 to T2 scans). Our proposed method outperforms unsupervised approaches and achieves performance comparable to a fully-supervised approach.

Reuben Dorent, Samuel Joutard, Jonathan Shapey, Sotirios Bisdas, Neil Kitchen, Robert Bradford, Shakeel Saeed, Marc Modat, Sébastien Ourselin, Tom Vercauteren
Source-Relaxed Domain Adaptation for Image Segmentation

Domain adaptation (DA) has drawn great interest for its capacity to adapt a model trained on labeled source data to perform well on unlabeled or weakly labeled target data from a different domain. Most common DA techniques require concurrent access to the input images of both the source and target domains. In practice, however, it is common that source images are not available in the adaptation phase. This is a very frequent DA scenario in medical imaging, for instance when the source and target images come from different clinical sites. We propose a novel formulation for adapting segmentation networks which relaxes such a constraint. Our formulation is based on minimizing a label-free entropy loss defined over target-domain data, which we further guide with a domain-invariant prior on the segmentation regions. Many priors can be used, derived from anatomical information. Here, a class-ratio prior is learned via an auxiliary network and integrated in the form of a Kullback–Leibler (KL) divergence in our overall loss function. We show the effectiveness of our prior-aware entropy minimization in adapting spine segmentation across different MRI modalities. Our method yields comparable results to several state-of-the-art adaptation techniques, even though it has access to less information, the source images being absent in the adaptation phase. Our straightforward adaptation strategy only uses one network, contrary to popular adversarial techniques, which cannot perform without the presence of the source images. Our framework can be readily used with various priors and segmentation problems.

Mathilde Bateson, Hoel Kervadec, Jose Dolz, Hervé Lombaert, Ismail Ben Ayed
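
The prior-aware loss can be sketched directly from the abstract: an entropy term over target-domain predictions plus a KL divergence between a class-ratio prior and the predicted class proportions. The tensor layout and loss weighting below are assumptions.

```python
# Hedged sketch of source-relaxed adaptation: entropy minimization on target
# predictions, guided by a KL term toward a (learned) class-ratio prior.
import torch

def adaptation_loss(logits, class_ratio_prior, lam=1.0, eps=1e-12):
    p = torch.softmax(logits, dim=1)                  # (B, C, H, W)
    entropy = -(p * (p + eps).log()).sum(dim=1).mean()
    ratio = p.mean(dim=(0, 2, 3))                     # predicted class proportions
    kl = (class_ratio_prior *
          ((class_ratio_prior + eps) / (ratio + eps)).log()).sum()
    return entropy + lam * kl
```
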
Region-of-Interest Guided Supervoxel Inpainting for Self-supervision

Self-supervised learning has proven invaluable in making the best use of all available data in biomedical image segmentation. One particularly simple and effective mechanism for self-supervision is inpainting, the task of predicting arbitrary missing areas based on the rest of an image. In this work, we focus on image inpainting as the self-supervised proxy task and propose two novel structural changes to further enhance performance. Our method can be regarded as an efficient addition to self-supervision: we guide the process of generating images to inpaint by using supervoxel-based masking instead of random masking, and we focus on the area to be segmented in the primary task, which we term the region-of-interest. We postulate that these additions force the network to learn semantics that are more attuned to the primary task, and test our hypotheses on two applications: brain tumour and white matter hyperintensities segmentation. We empirically show that our proposed approach consistently outperforms both supervised CNNs without any self-supervision and conventional inpainting-based self-supervision methods, on both large and small training set sizes.

Subhradeep Kayal, Shuai Chen, Marleen de Bruijne
Harnessing Uncertainty in Domain Adaptation for MRI Prostate Lesion Segmentation

The need for training data can impede the adoption of novel imaging modalities for learning-based medical image analysis. Domain adaptation methods partially mitigate this problem by translating training data from a related source domain to a novel target domain, but typically assume that a one-to-one translation is possible. Our work addresses the challenge of adapting to a more informative target domain, where multiple target samples can emerge from a single source sample. In particular, we consider translating from mp-MRI to VERDICT, a richer MRI modality involving an optimized acquisition protocol for cancer characterization. We explicitly account for the inherent uncertainty of this mapping and exploit it to generate multiple outputs conditioned on a single input. Our results show that this allows us to extract systematically better image representations for the target domain when used in tandem with both simple CycleGAN-based baselines and more powerful approaches that integrate discriminative segmentation losses and/or residual adapters. When compared to its deterministic counterparts, our approach yields substantial improvements across a broad range of dataset sizes, increasingly strong baselines, and evaluation measures.

Eleni Chiou, Francesco Giganti, Shonit Punwani, Iasonas Kokkinos, Eleftheria Panagiotaki
Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation

Deep learning methods show promising results for overlapping cervical cell instance segmentation. However, training a model with good generalization ability demands voluminous pixel-level annotations, which are quite expensive and time-consuming to acquire. In this paper, we propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy via knowledge distillation. We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining (MMT-PSM), which consists of a teacher and a student network during training. The two networks are encouraged to be consistent at both the feature and semantic levels under small perturbations. The teacher's self-ensemble predictions from K-time augmented samples are used to construct reliable pseudo-labels for optimizing the student. We design a novel strategy to estimate the sensitivity to perturbations for each proposal and to select informative samples from massive cases to facilitate fast and effective semantic distillation. In addition, to eliminate the unavoidable noise from the background region, we propose to use the predicted segmentation mask as guidance to enforce feature distillation in the foreground region. Experiments show that the proposed method significantly improves performance compared with the supervised method learned from labeled data only, and outperforms state-of-the-art semi-supervised methods. Code: https://github.com/SIAAAAAA/MMT-PSM .

Yanning Zhou, Hao Chen, Huangjing Lin, Pheng-Ann Heng
DMNet: Difference Minimization Network for Semi-supervised Segmentation in Medical Images

Semantic segmentation is an important task in medical image analysis. In general, training high-performance models requires a large amount of labeled data. However, collecting labeled data is typically difficult, especially for medical images. Several semi-supervised methods have been proposed to use unlabeled data to facilitate learning. Most of these methods use a self-training framework, in which the model cannot be well trained if the pseudo masks predicted by the model itself are of low quality. Co-training is another widely used semi-supervised method in medical image segmentation: it uses two models and makes them learn from each other. None of these methods is end-to-end. In this paper, we propose a novel end-to-end approach, called difference minimization network (DMNet), for semi-supervised semantic segmentation. To use unlabeled data, DMNet adopts two decoder branches and minimizes the difference between the soft masks generated by the two decoders. In this manner, each decoder can learn under the supervision of the other decoder, so both can be improved at the same time. Also, to make the model generalize better, we force the model to generate low-entropy masks on unlabeled data so that the decision boundary of the model lies in low-density regions. Meanwhile, an adversarial training strategy is adopted to learn a discriminator which encourages the model to generate more accurate masks. Experiments on a kidney tumor dataset and a brain tumor dataset show that our method outperforms both supervised and semi-supervised baselines, achieving the best performance.

Kang Fang, Wu-Jun Li
Double-Uncertainty Weighted Method for Semi-supervised Learning

Though deep learning has achieved advanced performance recently, it remains a challenging task in the field of medical imaging, as obtaining reliable labeled training data is time-consuming and expensive. In this paper, we propose a double-uncertainty weighted method for semi-supervised segmentation based on the teacher-student model. The teacher model provides guidance for the student model by penalizing their inconsistent predictions on both labeled and unlabeled data. We train the teacher model using Bayesian deep learning to obtain double uncertainty, i.e. segmentation uncertainty and feature uncertainty. This is the first work to extend segmentation uncertainty estimation to feature uncertainty, which reveals the capability to capture information among channels. A learnable uncertainty consistency loss is designed for the unsupervised learning process, in an interactive manner between prediction and uncertainty. With no ground truth for supervision, it can still incentivize more accurate teacher predictions and help the model reduce uncertain estimations. Furthermore, the proposed double uncertainty serves as a weight on each inconsistency penalty to balance and harmonize the supervised and unsupervised training processes. We validate the proposed feature uncertainty and loss function through qualitative and quantitative analyses. Experimental results show that our method outperforms state-of-the-art uncertainty-based semi-supervised methods on two public medical datasets.

Yixin Wang, Yao Zhang, Jiang Tian, Cheng Zhong, Zhongchao Shi, Yang Zhang, Zhiqiang He
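
The role of uncertainty as a weight on the consistency penalty can be sketched as below: a per-pixel teacher uncertainty map down-weights unreliable guidance. The exponential weighting is one common choice and an assumption here, not necessarily the paper's learnable formulation.

```python
# Hedged sketch: consistency penalty weighted by teacher uncertainty so that
# uncertain pixels contribute less. exp(-u) weighting is an assumption.
import torch

def weighted_consistency(student_probs, teacher_probs, uncertainty):
    w = torch.exp(-uncertainty)                                # (B, H, W); certain -> ~1
    diff = ((student_probs - teacher_probs) ** 2).mean(dim=1)  # per-pixel error
    return (w * diff).sum() / w.sum().clamp_min(1e-12)
```
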
Shape-Aware Semi-supervised 3D Semantic Segmentation for Medical Images

Semi-supervised learning has attracted much attention in medical image segmentation due to the challenge of acquiring pixel-wise image annotations, a crucial step in building high-performance deep learning methods. Most existing semi-supervised segmentation approaches either tend to neglect geometric constraints on object segments, leading to incomplete object coverage, or impose a strong shape prior that requires extra alignment. In this work, we propose a novel shape-aware semi-supervised segmentation strategy to leverage abundant unlabeled data and to enforce a geometric shape constraint on the segmentation output. To achieve this, we develop a multi-task deep network that jointly predicts the semantic segmentation and the signed distance map (SDM) of object surfaces. During training, we introduce an adversarial loss between the predicted SDMs of labeled and unlabeled data so that our network can capture shape-aware features more effectively. Experiments on the Atrial Segmentation Challenge dataset show that our method outperforms current state-of-the-art approaches with improved shape estimation, which validates its efficacy. Code is available at https://github.com/kleinzcy/SASSnet .

Shuailin Li, Chuyu Zhang, Xuming He
Local and Global Structure-Aware Entropy Regularized Mean Teacher Model for 3D Left Atrium Segmentation

Emerging self-ensembling methods have achieved promising semi-supervised segmentation performance on medical images by forcing consistent predictions for unannotated data under different perturbations. However, the consistency loss only penalizes independent pixel-level predictions, so structure-level information in the predictions is not exploited during learning. In view of this, we propose a novel structure-aware entropy regularized mean teacher model to address this limitation. Specifically, we first introduce the entropy minimization principle to the student network, encouraging it to produce high-confidence predictions for unannotated images. Based on this, we design a local structural consistency loss to encourage the consistency of inter-voxel similarities within the same local region of predictions from the teacher and student networks. To further capture structural dependencies, we enforce global structural consistency by matching the weighted self-information maps between the two networks. In this way, our model can minimize the prediction uncertainty of unannotated images and, more importantly, capture local and global structural information and their complementarity. We evaluate the proposed method on a publicly available 3D left atrium MR image dataset. Experimental results demonstrate that our method achieves better segmentation performance than state-of-the-art approaches in scenarios with limited annotated images.

Wenlong Hang, Wei Feng, Shuang Liang, Lequan Yu, Qiong Wang, Kup-Sze Choi, Jing Qin
Improving Dense Pixelwise Prediction of Epithelial Density Using Unsupervised Data Augmentation for Consistency Regularization

Although the amount of medical data keeps increasing, data annotations are scarce and often very difficult to obtain. This is even harder in cases involving multi-modal imaging or data, such as radiology-pathology correlation. In this regard, semi-supervised learning has the potential to leverage unlabeled data for improved medical image analysis. Herein, we propose a semi-supervised learning framework for dense pixelwise prediction in multi-parametric magnetic resonance imaging (mpMRI). The proposed method predicts epithelial density on a per-voxel basis in mpMRI. The ground-truth annotations are only obtainable from the corresponding pathology images, which are often unavailable for mpMRI. By introducing unsupervised data augmentation and supervised training signal annealing strategies during training, the proposed method utilizes both labeled and unlabeled mpMRI in an efficient and effective manner. The experimental results demonstrate that the proposed framework is effective in improving the stability and accuracy of the density prediction. It achieves a mean absolute error of 6.493, compared to 7.353 for its supervised learning counterpart, outperforming other competing methods. These results suggest that the semi-supervised learning framework could help resolve the scarcity of medical data and annotations, in particular for radiology-pathology correlation.

Minh Nguyen Nhat To, Sandeep Sankineni, Sheng Xu, Baris Turkbey, Peter A. Pinto, Vanessa Moreno, Maria Merino, Bradford J. Wood, Jin Tae Kwak
Knowledge-Guided Pretext Learning for Utero-Placental Interface Detection

Modern machine learning systems, such as convolutional neural networks, rely on a rich collection of training data to learn discriminative representations. In many medical imaging applications, unfortunately, collecting a large set of well-annotated data is prohibitively expensive. To overcome this data shortage and facilitate representation learning, we develop Knowledge-guided Pretext Learning (KPL), which learns anatomy-related image representations in a pretext task under the guidance of knowledge from the downstream target task. In the context of utero-placental interface detection in placental ultrasound, we find that KPL substantially improves the quality of the learned representations without consuming data from external sources such as ImageNet. It outperforms the widely adopted supervised pre-training and self-supervised learning approaches across model capacities and dataset scales. Our results suggest that pretext learning is a promising direction for representation learning in medical image analysis, especially in the small data regime.

Huan Qi, Sally Collins, J. Alison Noble
Self-supervised Depth Estimation to Regularise Semantic Segmentation in Knee Arthroscopy

Intra-operative automatic semantic segmentation of knee joint structures can assist surgeons during knee arthroscopy in terms of situational awareness. However, due to poor imaging conditions (e.g., low texture, overexposure), automatic semantic segmentation is a challenging scenario, which justifies the scarce literature on this topic. In this paper, we propose a novel self-supervised monocular depth estimation to regularise the training of semantic segmentation in knee arthroscopy. To further regularise the depth estimation, we pre-train the model with clean images of routine objects captured by the stereo arthroscope (presenting none of the poor imaging conditions and with rich texture information). We fine-tune this model to produce both the semantic segmentation and the self-supervised monocular depth using stereo arthroscopic images taken from inside the knee. Using a dataset containing 3868 arthroscopic images captured during cadaveric knee arthroscopy with semantic segmentation annotations, 2000 stereo image pairs of cadaveric knee arthroscopy, and 2150 stereo image pairs of routine objects, we show that our semantic segmentation regularised by self-supervised depth estimation produces more accurate segmentations than a state-of-the-art semantic segmentation approach trained exclusively with semantic segmentation annotations.

Fengbei Liu, Yaqub Jonmohamadi, Gabriel Maicas, Ajay K. Pandey, Gustavo Carneiro
Semi-supervised Medical Image Classification with Global Latent Mixing

Computer-aided diagnosis via deep learning relies on large-scale annotated data sets, which can be costly when involving expert knowledge. Semi-supervised learning (SSL) mitigates this challenge by leveraging unlabeled data. One effective SSL approach is to regularize the local smoothness of neural functions via perturbations around single data points. In this work, we argue that regularizing the global smoothness of neural functions, by filling the void between data points, can further improve SSL. We present a novel SSL approach that trains the neural network on linear mixings of labeled and unlabeled data, at both the input and latent spaces, in order to regularize different portions of the network. We evaluated the presented model on two distinct medical image data sets, for semi-supervised classification of thoracic disease and skin lesions, demonstrating its improved performance over SSL with local perturbations and over SSL with global mixing at the input space only. Our code is available at https://github.com/Prasanna1991/LatentMixing .

Prashnna Kumar Gyawali, Sandesh Ghimire, Pradeep Bajracharya, Zhiyuan Li, Linwei Wang
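
The linear mixing applied at both the input and latent spaces is ordinary mixup; a minimal helper is sketched below, assuming one-hot or soft labels and a typical Beta concentration parameter. The same operation can be applied to intermediate activations.

```python
# Minimal mixup helper: convex combination of a batch with a shuffled copy of
# itself. Labels are assumed one-hot / soft; alpha=0.4 is a common assumption.
import torch

def mixup(x, y, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    y_mix = lam * y + (1.0 - lam) * y[idx]
    return x_mix, y_mix
```
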
Self-Loop Uncertainty: A Novel Pseudo-Label for Semi-supervised Medical Image Segmentation

Witnessing the success of deep neural networks in natural image processing, an increasing number of studies have developed deep-learning-based frameworks for medical image segmentation. However, since the pixel-wise annotation of medical images is laborious and expensive, the amount of annotated data is usually insufficient to train a neural network well. In this paper, we propose a semi-supervised approach to train neural networks with limited labeled data and a large quantity of unlabeled images for medical image segmentation. A novel pseudo-label (named self-loop uncertainty), generated by recurrently optimizing the neural network with a self-supervised task, is adopted as the ground truth for the unlabeled images to augment the training set and boost segmentation accuracy. The proposed self-loop uncertainty can be seen as an approximation of the uncertainty estimate yielded by ensembling multiple models, with a significant reduction in inference time. Experimental results on two publicly available datasets demonstrate the effectiveness of our semi-supervised approach.

Yuexiang Li, Jiawei Chen, Xinpeng Xie, Kai Ma, Yefeng Zheng
Semi-supervised Classification of Diagnostic Radiographs with NoTeacher: A Teacher that is Not Mean

Deep learning approaches offer strong performance for radiology image classification, but are bottlenecked by the need for large labeled training datasets. Semi-supervised learning (SSL) methods that can leverage small labeled datasets alongside larger unlabeled datasets offer potential for reducing labeling cost. However, few studies have demonstrated gains of SSL for real-world radiology image classification. Here, we adapt three leading SSL methods (Mean Teacher, Virtual Adversarial Training, Pseudo-labeling) for radiograph classification and characterize their performance on two public X-ray and CT classification benchmarks. We observe that Mean Teacher can achieve good performance gains in the low labeled data regime, but is sensitive to hyperparameters and susceptible to confirmation bias. To address these issues, we introduce a novel SSL method named NoTeacher. This method incorporates a probabilistic graphical model to maximize mutual agreement between student networks, thereby eliminating the need for a teacher network. We show that NoTeacher outperforms contemporary SSL baselines by enforcing better consistency regularization, and achieves over 90% of the fully supervised AUROC with less than 5% of the labeling budget.

Balagopal Unnikrishnan, Cuong Manh Nguyen, Shafa Balaram, Chuan Sheng Foo, Pavitra Krishnaswamy
Predicting Potential Propensity of Adolescents to Drugs via New Semi-supervised Deep Ordinal Regression Model

Drug addiction among young people is one of the most severe problems in the real world, imposing a huge financial and emotional burden on families and societies. Predicting potential inclination to drugs at earlier ages could therefore prevent significant harm. In this paper, we propose a new semi-supervised deep ordinal regression model to predict the possible propensity of adolescents to marijuana using the diffusion MRI-derived mean diffusivity (MD) from 148 Regions of Interest (ROIs). Traditional deep ordinal regression models cannot be directly applied to our biomedical problem, which has only a small amount of labeled data, not enough to train deep learning models. Thus, we design a semi-supervised learning mechanism for deep ordinal regression, such that both labeled and unlabeled data can be used to enhance model training. In our experiments, we use the ABCD dataset, which contains MRI images of the adolescents under study and their Likert-scale answers to a questionnaire about marijuana. Experimental results on the ABCD dataset validate the superior performance of our new method. Our study provides an inexpensive way to predict drug tendency using brain MRI data.

Alireza Ganjdanesh, Kamran Ghasedi, Liang Zhan, Weidong Cai, Heng Huang
Deep Q-Network-Driven Catheter Segmentation in 3D US by Hybrid Constrained Semi-supervised Learning and Dual-UNet

Catheter segmentation in 3D ultrasound is important for computer-assisted cardiac intervention. However, a large number of labeled images is required to train a successful deep convolutional neural network (CNN) to segment the catheter, which is expensive and time-consuming. In this paper, we propose a novel catheter segmentation approach which requires fewer annotations than supervised learning methods, yet achieves better performance. Our scheme uses deep Q-learning as a pre-localization step, which avoids voxel-level annotation and can efficiently localize the target catheter. Given the detected catheter, a patch-based Dual-UNet is applied to segment the catheter in the 3D volumetric data. To train the Dual-UNet with limited labeled images and leverage the information in unlabeled images, we propose a novel semi-supervised scheme which exploits unlabeled images based on hybrid constraints from predictions. Experiments show that the proposed scheme achieves higher performance than state-of-the-art semi-supervised methods and demonstrate that our method is able to learn from large-scale unlabeled images.

Hongxu Yang, Caifeng Shan, Alexander F. Kolen, Peter H. N. de With
Domain Adaptive Relational Reasoning for 3D Multi-organ Segmentation

In this paper, we present a novel unsupervised domain adaptation (UDA) method, named Domain Adaptive Relational Reasoning (DARR), to generalize 3D multi-organ segmentation models to medical data collected from different scanners and/or protocols (domains). Our method is inspired by the fact that the spatial relationships between internal structures in medical images are relatively fixed, e.g., a spleen is always located at the tail of a pancreas, which serves as a latent variable to transfer the knowledge shared across multiple domains. We formulate the spatial relationship by solving a jigsaw puzzle task, i.e., recovering a CT scan from its shuffled patches, and jointly train it with the organ segmentation task. To guarantee the transferability of the learned spatial relationship to multiple domains, we additionally introduce two schemes: 1) employing a super-resolution network, also jointly trained with the segmentation model, to standardize medical images from different domains to a certain spatial resolution; 2) adapting the spatial relationship for a test image by test-time jigsaw puzzle training. Experimental results show that our method improves performance on target datasets by 29.60% DSC on average, without using any data from the target domain during training.

Shuhao Fu, Yongyi Lu, Yan Wang, Yuyin Zhou, Wei Shen, Elliot Fishman, Alan Yuille
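
The jigsaw-puzzle pretext task amounts to shuffling image patches and asking the network to recover the permutation; a 2D toy version is sketched below. The paper operates on 3D CT, and the grid size here is an assumption.

```python
# Toy 2D jigsaw pretext: shuffle a grid of patches and return the permutation
# the network must recover. Illustrative only; the paper works on 3D CT.
import numpy as np

def shuffle_patches(img, grid=3, rng=None):
    rng = rng or np.random.default_rng()
    h, w = img.shape[0] // grid, img.shape[1] // grid
    patches = [img[i * h:(i + 1) * h, j * w:(j + 1) * w]
               for i in range(grid) for j in range(grid)]
    perm = rng.permutation(grid * grid)
    rows = [np.concatenate([patches[perm[r * grid + c]] for c in range(grid)],
                           axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0), perm   # (shuffled image, target labels)
```
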
Realistic Adversarial Data Augmentation for MR Image Segmentation

Neural network-based approaches can achieve high accuracy in various medical image segmentation tasks. However, they generally require large labelled datasets for supervised learning. Acquiring and manually labelling a large medical dataset is expensive and sometimes impractical due to data sharing and privacy issues. In this work, we propose an adversarial data augmentation method for training neural networks for medical image segmentation. Instead of generating pixel-wise adversarial attacks, our model generates plausible and realistic signal corruptions, modelling the intensity inhomogeneities caused by a common type of artefact in MR imaging: bias field. The proposed method does not rely on generative networks and can be used as a plug-in module for general segmentation networks in both supervised and semi-supervised learning. Using cardiac MR imaging, we show that such an approach can improve the generalization ability and robustness of models as well as provide significant improvements in low-data scenarios.

Chen Chen, Chen Qin, Huaqi Qiu, Cheng Ouyang, Shuo Wang, Liang Chen, Giacomo Tarroni, Wenjia Bai, Daniel Rueckert
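
The corruption model is a smooth multiplicative bias field. The sketch below draws a random low-order polynomial field over a 2D image; note that the paper optimizes the field adversarially rather than sampling it at random, and the degree/strength values are illustrative assumptions.

```python
# Hedged sketch of a smooth multiplicative bias field for a 2D image. The
# paper searches the field adversarially; here it is sampled at random.
import numpy as np

def random_bias_field(shape, degree=3, strength=0.3, rng=None):
    rng = rng or np.random.default_rng()
    ys = np.linspace(-1.0, 1.0, shape[0])
    xs = np.linspace(-1.0, 1.0, shape[1])
    X, Y = np.meshgrid(xs, ys)                      # both (H, W)
    field = np.zeros(shape)
    for i in range(degree + 1):                     # low-order 2D polynomial
        for j in range(degree + 1 - i):
            field += rng.uniform(-1, 1) * (X ** i) * (Y ** j)
    field = strength * field / max(np.abs(field).max(), 1e-8)
    return 1.0 + field                              # multiply the image by this
```
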
Learning to Segment Anatomical Structures Accurately from One Exemplar

Accurate segmentation of critical anatomical structures is at the core of medical image analysis, and the main bottleneck lies in gathering the requisite expert-labeled image annotations in a scalable manner. Methods that can produce accurate anatomical structure segmentation without a large amount of fully annotated training images are highly desirable. In this work, we propose the Contour Transformer Network (CTN), a one-shot anatomy segmentor with a naturally built-in human-in-the-loop mechanism. Segmentation is formulated as learning a contour evolution behavior process based on graph convolutional networks (GCNs). Training our CTN model requires only one labeled image exemplar and leverages additional unlabeled data through newly introduced loss functions that measure the global shape and appearance consistency of contours. We demonstrate that our one-shot learning method significantly outperforms non-learning-based methods and performs competitively with state-of-the-art fully supervised deep learning approaches. With minimal human-in-the-loop editing feedback, the segmentation performance can be further improved and tailored towards the observer's desired outcomes, which can facilitate clinician-designed imaging-based biomarker assessments (to support personalized quantitative clinical diagnosis) and outperforms fully supervised baselines.

Yuhang Lu, Weijian Li, Kang Zheng, Yirui Wang, Adam P. Harrison, Chihung Lin, Song Wang, Jing Xiao, Le Lu, Chang-Fu Kuo, Shun Miao
Uncertainty Estimates as Data Selection Criteria to Boost Omni-Supervised Learning

For many medical applications, large quantities of imaging data are routinely obtained but it can be difficult and time-consuming to obtain high-quality labels for that data. We propose a novel uncertainty-based method to improve the performance of segmentation networks when limited manual labels are available in a large dataset. We estimate segmentation uncertainty on unlabeled data using test-time augmentation and test-time dropout. We then use uncertainty metrics to select unlabeled samples for further training in a semi-supervised learning framework. Compared to random data selection, our method gives a significant boost in Dice coefficient for semi-supervised volume segmentation on the EADC-ADNI/HARP MRI dataset and the large-scale INTERGROWTH-21st ultrasound dataset. Our results show a greater performance boost on the ultrasound dataset, suggesting that our method is most useful with data of lower or more variable quality.

Lorenzo Venturini, Aris T. Papageorghiou, J. Alison Noble, Ana I. L. Namburete
Extreme Consistency: Overcoming Annotation Scarcity and Domain Shifts

Supervised learning has proved effective for medical image analysis. However, it can utilize only the small labeled portion of data; it fails to leverage the large amounts of unlabeled data that are often available in medical image datasets. Supervised models are further handicapped by domain shifts, when the labeled dataset fails to cover different protocols or ethnicities. In this paper, we introduce extreme consistency, which overcomes these limitations by maximally leveraging unlabeled data from the same or a different domain in a teacher-student semi-supervised paradigm. Extreme consistency is the process of sending an extreme transformation of a given image to the student network and then constraining its prediction to be consistent with the teacher network's prediction for the original image. The extreme nature of our consistency loss distinguishes our method from related works that yield suboptimal performance by exercising only mild prediction consistency. Our method is 1) auto-didactic, as it requires no extra expert annotations; 2) versatile, as it handles both domain shift and limited annotation problems; 3) generic, as it is readily applicable to classification, segmentation, and detection tasks; and 4) simple to implement, as it requires no adversarial training. We evaluate our method for the tasks of lesion and retinal vessel segmentation in skin and fundus images. Our experiments demonstrate a significant performance gain over both modern supervised networks and recent semi-supervised models. This performance is attributed to the strong regularization enforced by extreme consistency, which enables the student network to learn how to handle extreme variants of both labeled and unlabeled images. This enhances the network's ability to tackle the inevitable same- and cross-domain data variability during inference.

Gaurav Fotedar, Nima Tajbakhsh, Shilpa Ananth, Xiaowei Ding
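
One extreme-consistency step for segmentation can be sketched as follows; the spatial transform must be applied to the teacher's prediction as well, so the two outputs are aligned (flips and rotations satisfy this). The transform family and the MSE consistency loss here are assumptions.

```python
# Hedged sketch of one extreme-consistency step for segmentation. `transform`
# must act identically on images and probability maps (e.g. flips, rotations).
import torch
import torch.nn.functional as F

def extreme_consistency_loss(student, teacher, x, transform):
    with torch.no_grad():
        target = transform(torch.softmax(teacher(x), dim=1))  # aligned teacher output
    pred = torch.softmax(student(transform(x)), dim=1)        # student sees hard view
    return F.mse_loss(pred, target)
```
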
Spatio-Temporal Consistency and Negative Label Transfer for 3D Freehand US Segmentation

The manual segmentation of multiple organs in 3D ultrasound (US) sequences and volumes for quantitative analysis is very expensive and time-consuming. Fully supervised segmentation methods still require the collection of large volumes of annotated data, while unlabeled images are abundant. In this work, we propose a novel semi-automatic deep learning approach modeled as a weak-label learning problem: given a few 2D annotations for selected slices, the goal is to propagate the masks to the entire sequence. To this end, we make use of both positive and negative constraints induced by the incomplete labels to penalize the segmentation loss function. Our model is composed of one encoder and two decoders, modeling the segmentation and an auxiliary reconstruction task. Moreover, we account for spatio-temporal information by deploying a Convolutional Long Short-Term Memory module. Our findings suggest that the reconstruction decoder and the spatio-temporal information lead to a better geometric estimation of the mask shape. We apply the model to the task of lower-limb muscle segmentation in a dataset of 44 patients and 6160 images.

Vanessa Gonzalez Duque, Dawood Al Chanti, Marion Crouzier, Antoine Nordez, Lilian Lacourpaille, Diana Mateus
Characterizing Label Errors: Confident Learning for Noisy-Labeled Image Segmentation

Convolutional neural networks (CNNs) have achieved remarkable performance in image processing owing to their capability to fit huge amounts of data. However, if the training data are corrupted by noisy labels, the resulting performance may deteriorate. In the domain of medical image analysis, this dilemma becomes extremely severe, because medical image annotation requires medical expertise and clinical experience, which inevitably introduce subjectivity. In this paper, we design a novel algorithm based on the teacher-student architecture for noisy-labeled medical image segmentation. We introduce the confident learning (CL) method to identify corrupted labels and endow the CNN with robustness to such noise. Specifically, the CL technique is applied in the teacher model to characterize the suspected wrong-labeled pixels. Since the resulting noise identification maps are not sufficiently precise, a spatial label smoothing regularization technique is utilized to generate soft-corrected masks for training the student model. Because our method identifies and revises the noisy labels of the training data at the pixel level, rather than simply assigning lower weights to noisy masks, it outperforms the state-of-the-art method on the noisy-labeled image segmentation task on the JSRT dataset, especially when the training data are severely corrupted by noise.

Minqing Zhang, Jiantao Gao, Zhen Lyu, Weibing Zhao, Qin Wang, Weizhen Ding, Sheng Wang, Zhen Li, Shuguang Cui
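
The confident-learning step of flagging suspect labels can be illustrated with a much-simplified rule: estimate a per-class confidence threshold and flag samples confidently predicted as a class other than their given label. This toy version (assuming every class appears among the labels) stands in for the full CL machinery used in the paper.

```python
# Toy stand-in for confident learning: flag samples whose out-of-sample
# prediction is confidently a class other than the given label. Assumes
# every class occurs at least once among `given_labels`.
import numpy as np

def suspect_label_errors(probs, given_labels):
    n, c = probs.shape                     # probs: (N, C); given_labels: (N,)
    thresholds = np.array([probs[given_labels == k, k].mean() for k in range(c)])
    predicted = probs.argmax(axis=1)
    confident = probs[np.arange(n), predicted] >= thresholds[predicted]
    return confident & (predicted != given_labels)   # boolean mask of suspects
```
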
Leveraging Undiagnosed Data for Glaucoma Classification with Teacher-Student Learning

Recently, deep learning has been applied to the glaucoma classification task with performance comparable to that of human experts. However, a well-trained deep learning model demands a large quantity of properly labeled data, which is relatively expensive since accurate labeling of glaucoma requires years of specialist training. To alleviate this problem, we propose a glaucoma classification framework which takes advantage not only of properly labeled images, but also of undiagnosed images without glaucoma labels. More specifically, the proposed framework is adapted from the teacher-student learning paradigm: the teacher model encodes the wrapped information of undiagnosed images into a latent feature space, while the student model learns from the teacher through knowledge transfer to improve glaucoma classification. For model training, we propose a novel training strategy that simulates real-world teaching practice, named "Learning To Teach with Knowledge Transfer (L2T-KT)", and establish a "Quiz Pool" as the teacher's optimization target. Experiments show that the proposed framework is able to utilize the undiagnosed data effectively to improve glaucoma prediction performance.

Junde Wu, Shuang Yu, Wenting Chen, Kai Ma, Rao Fu, Hanruo Liu, Xiaoguang Di, Yefeng Zheng
Difficulty-Aware Glaucoma Classification with Multi-rater Consensus Modeling

Medical images are generally labeled by multiple experts before the final ground-truth labels are determined. Consensus or disagreement among experts regarding individual images reflects the gradeability and difficulty level of the image. However, when such images are used for model training, only the final ground-truth label is utilized, while the critical information contained in the raw multi-rater gradings, regarding the image being an easy or hard case, is discarded. In this paper, we aim to take advantage of the raw multi-rater gradings to improve deep learning model performance for the glaucoma classification task. Specifically, a multi-branch model structure is proposed to predict the most sensitive, the most specific, and a balanced fused result for the input images. In order to encourage the sensitivity branch and the specificity branch to generate consistent results for consensus labels and opposite results for disagreement labels, a consensus loss is proposed to constrain the output of the two branches. Meanwhile, the consistency or inconsistency between the predictions of the two branches indicates whether the image is an easy or hard case, which is further utilized to encourage the balanced fusion branch to concentrate more on the hard cases. Compared with models trained only with the final ground-truth labels, the proposed method using multi-rater consensus information achieves superior performance, and it can also estimate the difficulty level of individual input images when making the prediction.

Shuang Yu, Hong-Yu Zhou, Kai Ma, Cheng Bian, Chunyan Chu, Hanruo Liu, Yefeng Zheng
Intra-operative Forecasting of Growth Modulation Spine Surgery Outcomes with Spatio-Temporal Dynamic Networks

Vertebral Body Growth Modulation (VBGM) allows treating mild to severe spinal deformations by tethering vertebral bodies together, helping to preserve lower back flexibility. Forecasting the outcome of VBGM in skeletally immature patients remains elusive, with several factors involved in corrective vertebral tethering, but could help orthopaedic surgeons plan and tailor VBGM procedures prior to surgery. We introduce a novel intra-operative framework for forecasting the outcomes of VBGM surgery in scoliosis patients. The method is based on spatio-temporal corrective networks, which learn the similarity in segmental corrections between patients and integrate a long-term shifting mechanism designed to cope with differences in onset-to-surgery timing between patients in the training set. The model captures dynamic geometric dependencies in scoliosis patients and ensures long-term dependency with the temporal dynamics of curve evolution. The loss function of the network introduces a regularization term based on a learned group-average piecewise-geodesic path to ensure the generated corrective transformations are coherent with the observed evolution of spine corrections at follow-up exams. The network was trained on 695 3D spine models and tested on 72 patients using a set of pre-operative spine reconstructions as inputs. The spatio-temporal network predicted outputs with errors of 2.1 ± 0.9 mm on 3D anatomical landmarks, yielding geometries similar to ground-truth reconstructions.

William Mandel, Stefan Parent, Samuel Kadoury
Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation

2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room (OR) that can analyze and support clinical activities. However, the lack of annotated data and the complexity of state-of-the-art pose estimation approaches limit the deployment of such techniques inside the OR. In this work, we propose to use knowledge distillation in a teacher/student framework to harness the knowledge present in a large-scale non-annotated dataset and in an accurate but complex multi-stage teacher network, in order to train a lightweight network for joint 2D/3D pose estimation. The teacher network also exploits the unlabeled data to generate both hard and soft labels useful for improving the student predictions. The easily deployable network trained with this effective self-supervision strategy performs on par with the teacher network on MVOR+, an extension of the public MVOR dataset in which all persons have been fully annotated, thus providing a viable solution for real-time 2D/3D human pose estimation in the OR.

Vinkle Srivastav, Afshin Gangi, Nicolas Padoy
Knowledge Distillation from Multi-modal to Mono-modal Segmentation Networks

The joint use of multiple imaging modalities for medical image segmentation has been widely studied in recent years. The fusion of information from different modalities has been demonstrated to improve segmentation accuracy, with respect to mono-modal segmentation, in several applications. However, acquiring multiple modalities is usually not possible in a clinical setting, due to the limited number of physicians and scanners and to constraints on cost and scan time; most of the time, only one modality is acquired. In this paper, we propose KD-Net, a framework to transfer knowledge from a trained multi-modal network (teacher) to a mono-modal one (student). The proposed method is an adaptation of the generalized distillation framework, where the student network is trained on a subset (one modality) of the teacher's inputs (n modalities). We illustrate the effectiveness of the proposed framework on brain tumor segmentation with the BraTS 2018 dataset. Using different architectures, we show that the student network effectively learns from the teacher and always outperforms the baseline mono-modal network in terms of segmentation accuracy.

Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, Pietro Gori
Heterogeneity Measurement of Cardiac Tissues Leveraging Uncertainty Information from Image Segmentation

Identifying arrhythmia substrates and quantifying their heterogeneity has great potential to provide critical guidance for radiofrequency ablation. However, quantitative analysis of heterogeneity on cardiac optical coherence tomography (OCT) images is lacking. In this paper, we conduct the first study on quantifying cardiac tissue heterogeneity from human OCT images. Our proposed method applies a dropout-based Monte Carlo sampling technique to measure model uncertainty. The heterogeneity information is extracted by decoupling the intra-/inter-tissue heterogeneity and the tissue boundary uncertainty from the uncertainty measurement. We empirically demonstrate that our model can highlight subtle features in OCT images, and that the heterogeneity information extracted is positively correlated with the tissue heterogeneity information from corresponding histology images.

Ziyi Huang, Yu Gan, Theresa Lye, Haofeng Zhang, Andrew Laine, Elsa D. Angelini, Christine Hendon
Efficient Shapley Explanation for Features Importance Estimation Under Uncertainty

Complex deep learning models have shown impressive power in analyzing high-dimensional medical image data. To increase trust in applying deep learning models in the medical field, it is essential to understand why a particular prediction was reached. Estimating the importance of data features is an important approach to understanding both the model and the underlying properties of the data. Shapley value explanation (SHAP) is a technique to fairly evaluate the input feature importance of a given model. However, existing SHAP-based explanation works have limitations: 1) computational complexity, which hinders their application to high-dimensional medical image data; 2) sensitivity to noise, which can lead to serious errors. Therefore, we propose an uncertainty estimation method for the feature importance results calculated by SHAP, and we theoretically justify the method under a Shapley value framework. Finally, we evaluate our methods on MNIST and a public neuroimaging dataset. We show the potential of our method to discover disease-related biomarkers from neuroimaging data.

Xiaoxiao Li, Yuan Zhou, Nicha C. Dvornek, Yufeng Gu, Pamela Ventola, James S. Duncan
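For orientation, a generic permutation-sampling Shapley estimator is sketched below; it is not the authors' estimator, but it shows where an uncertainty estimate naturally arises: the per-feature spread across sampled permutations. The function `f`, the baseline, and the sample count are assumptions.

```python
# Permutation-sampling approximation of Shapley values, with the standard
# deviation across permutations as a simple per-feature uncertainty proxy.
import numpy as np

def shapley_with_uncertainty(f, x, baseline, n_perm=200):
    """f: callable mapping a feature vector to a scalar model output."""
    d = x.shape[0]
    contribs = np.zeros((n_perm, d))
    for p in range(n_perm):
        order = np.random.permutation(d)
        z = baseline.copy()
        prev = f(z)
        for j in order:
            z[j] = x[j]                  # add feature j to the coalition
            cur = f(z)
            contribs[p, j] = cur - prev  # marginal contribution of j
            prev = cur
    return contribs.mean(axis=0), contribs.std(axis=0)
```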
Cartilage Segmentation in High-Resolution 3D Micro-CT Images via Uncertainty-Guided Self-training with Very Sparse Annotation

Craniofacial syndromes often involve skeletal defects of the head. Studying the development of the chondrocranium (the part of the endoskeleton that protects the brain and other sense organs) is crucial to understanding genotype-phenotype relationships and to the early detection of skeletal malformation. Our goal is to segment craniofacial cartilages in 3D micro-CT images of embryonic mice stained with phosphotungstic acid. However, due to the high image resolution, complex object structures, and low contrast, delineating fine-grained structures in these images is very challenging, even manually. In particular, only experts can differentiate the cartilages, and it is unrealistic to manually label whole volumes for deep learning model training. We propose a new framework to progressively segment cartilages in high-resolution 3D micro-CT images using extremely sparse annotation (e.g., annotating only a few selected slices in a volume). Our model consists of a lightweight fully convolutional network (FCN) that accelerates training and generates pseudo labels (PLs) for unlabeled slices. We account for the reliability of the PLs using a bootstrap-ensemble-based uncertainty quantification method, and the framework gradually learns from the PLs, guided by the uncertainty estimates, via self-training. Experiments show that our method achieves high segmentation accuracy compared to prior art and obtains further performance gains through iterative self-training.

Hao Zheng, Susan M. Motch Perrine, M. Kathleen Pitirri, Kazuhiko Kawasaki, Chaoli Wang, Joan T. Richtsmeier, Danny Z. Chen
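A simplified sketch of one uncertainty-guided self-training round, under our own assumptions (the `train_fn`/`predict_fn` interfaces, the variance threshold, and the triple-based data layout are illustrative, not taken from the paper): a bootstrap ensemble scores pseudo-label reliability, and only low-variance voxels join the next training round.

```python
# One self-training round: train an ensemble on bootstrap resamples of the
# sparsely labeled slices, then pseudo-label unlabeled volumes, keeping a
# reliability mask of voxels whose ensemble variance is low.
import numpy as np

def resample(data):
    idx = np.random.randint(len(data), size=len(data))
    return [data[i] for i in idx]

def self_training_round(train_fn, predict_fn, labeled, unlabeled,
                        n_boot=5, var_thresh=0.05):
    # Each labeled item is assumed to be (volume, label, mask); true labels
    # carry an all-ones mask, pseudo labels a reliability mask.
    models = [train_fn(resample(labeled)) for _ in range(n_boot)]
    new_labeled = list(labeled)
    for vol in unlabeled:
        preds = np.stack([predict_fn(m, vol) for m in models])  # (n_boot, ...)
        mean, var = preds.mean(axis=0), preds.var(axis=0)
        mask = var < var_thresh                 # reliable voxels only
        new_labeled.append((vol, mean.round(), mask))
    return new_labeled
```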
Probabilistic 3D Surface Reconstruction from Sparse MRI Information

Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should quickly predict accurate, well-localised, high-resolution models, evaluate prediction uncertainty, and work with as little input data as possible. Current state-of-the-art (SOTA) deep learning methods for 3D reconstruction, however, often produce only shapes of limited variability in a canonical position, or lack uncertainty evaluation. In this paper, we present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction. Our method can reconstruct large surface meshes from three quasi-orthogonal MR imaging slices using limited training sets, while modelling the location of each mesh vertex through a Gaussian distribution. Prior shape information is encoded using a built-in linear principal component analysis (PCA) model. Extensive experiments on cardiac MR data show that our probabilistic approach successfully assesses prediction uncertainty while qualitatively and quantitatively outperforming SOTA methods in shape prediction. Unlike SOTA methods, ours properly localises and orients the prediction through a spatially aware neural network.

Katarína Tóthová, Sarah Parisot, Matthew Lee, Esther Puyol-Antón, Andrew King, Marc Pollefeys, Ender Konukoglu
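A rough sketch of the two ingredients the abstract combines, under our own assumptions about tensor shapes: a linear PCA shape model decodes predicted coefficients into vertex coordinates, and a heteroscedastic Gaussian negative log-likelihood trains a per-vertex variance alongside the mean, yielding aleatoric uncertainty.

```python
# PCA decoding plus Gaussian NLL: the network would predict shape
# coefficients z and per-vertex log-variances; both are illustrative here.
import torch

def decode_mesh(z, mean_shape, pca_basis):
    # mean_shape: (3V,), pca_basis: (3V, K), z: (N, K) -> (N, 3V) coordinates
    return mean_shape + z @ pca_basis.T

def gaussian_nll(pred_vertices, log_var, target_vertices):
    # Large predicted variance down-weights the squared error but is itself
    # penalised, so the model learns where its predictions are unreliable.
    inv_var = torch.exp(-log_var)
    return (inv_var * (pred_vertices - target_vertices) ** 2 + log_var).mean()
```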
Can You Trust Predictive Uncertainty Under Real Dataset Shifts in Digital Pathology?

Deep learning-based algorithms have shown great promise for assisting pathologists in detecting lymph node metastases when evaluated on predictive accuracy. However, for clinical adoption we need to know what happens when the test data deviate substantially from the training distribution. In such settings, we should estimate the uncertainty of the predictions, so that we know when to trust the model (and when not to). Here, we (i) investigate currently popular methods for improving the calibration of predictive uncertainty, (ii) compare the performance and calibration of these methods under clinically relevant in-distribution dataset shifts, and (iii) evaluate their ability to detect out-of-distribution samples from a histological cancer type not seen during training. Of the investigated methods, we show that deep ensembles are more robust with respect to both performance and calibration under in-distribution dataset shifts and allow us to better detect incorrect predictions. Our results also demonstrate that current methods for uncertainty quantification cannot necessarily detect all dataset shifts, and we emphasize the importance of monitoring and controlling the input distribution when deploying deep learning for digital pathology.

Jeppe Thagaard, Søren Hauberg, Bert van der Vegt, Thomas Ebstrup, Johan D. Hansen, Anders B. Dahl
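As a hedged illustration of the evaluation machinery involved (not the paper's code; the sklearn-style `predict_proba` interface is an assumption): a deep ensemble averages the softmax outputs of independently trained networks, and calibration is commonly summarized with the expected calibration error (ECE).

```python
# Deep-ensemble prediction and a standard binned ECE computation.
import numpy as np

def ensemble_predict(models, x):
    """Average the class-probability outputs of independently trained nets."""
    return np.mean([m.predict_proba(x) for m in models], axis=0)

def expected_calibration_error(conf, correct, n_bins=10):
    """Gap between confidence and accuracy, weighted by bin occupancy.
    conf: predicted confidence per sample; correct: 0/1 correctness."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```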
Deep Generative Model for Synthetic-CT Generation with Uncertainty Predictions

MR-only radiation treatment planning is attractive because of the superior soft-tissue definition of MRI compared to CT and the elimination of the uncertainty introduced by CT-MRI registration. To facilitate MR-only planning, synthetic-CT (sCT) algorithms (for electron density correction) are required for dose calculation. Deep neural networks for sCT generation are useful because of their predictive power, but their lack of uncertainty information is a concern for clinical implementation. We investigated the feasibility of using a conditional generative adversarial network (cGAN) to generate sCTs with accompanying uncertainty maps. Dropout-based variational inference was used to account for uncertainty in the trained model, and the cGAN loss function was combined with an additional term so that the network learns which regions of the input data are associated with highly variable outputs. On a dataset of 105 brain cancer patients, our results demonstrate that the network generates well-calibrated uncertainty predictions and produces sCTs with accuracy equivalent to previously reported deterministic models.

Matt Hemsley, Brige Chugh, Mark Ruschin, Young Lee, Chia-Lin Tseng, Greg Stanisz, Angus Lau
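A hedged sketch of the two uncertainty ingredients mentioned above, under the assumption (ours, not the paper's) that the generator returns both a synthetic CT and a log-variance map: a variance-weighted reconstruction term captures aleatoric uncertainty, and test-time dropout sampling captures model uncertainty.

```python
# Heteroscedastic reconstruction term plus dropout-based sampling; the
# generator interface and sample count are illustrative assumptions.
import torch

def heteroscedastic_l1(pred_ct, log_var, true_ct):
    # Variance-weighted reconstruction term added to the usual cGAN loss:
    # high predicted variance attenuates the error but is itself penalised.
    return (torch.exp(-log_var) * (pred_ct - true_ct).abs() + log_var).mean()

def sample_sct(generator, mr, n_samples=20):
    generator.train()  # keep dropout active for variational-inference sampling
    with torch.no_grad():
        samples = torch.stack([generator(mr)[0] for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)  # sCT and epistemic map
```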
Backmatter
Metadata
Title
Medical Image Computing and Computer Assisted Intervention – MICCAI 2020
Edited by
Prof. Anne L. Martel
Purang Abolmaesumi
Danail Stoyanov
Diana Mateus
Maria A. Zuluaga
S. Kevin Zhou
Daniel Racoceanu
Prof. Leo Joskowicz
Copyright Year
2020
Electronic ISBN
978-3-030-59710-8
Print ISBN
978-3-030-59709-2
DOI
https://doi.org/10.1007/978-3-030-59710-8
