2020 | Book

Interpretable and Annotation-Efficient Learning for Medical Image Computing

Third International Workshop, iMIMIC 2020, Second International Workshop, MIL3ID 2020, and 5th International Workshop, LABELS 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 4–8, 2020, Proceedings

Editors: Jaime Cardoso, Hien Van Nguyen, Nicholas Heller, Pedro Henriques Abreu, Ivana Isgum, Wilson Silva, Ricardo Cruz, Jose Pereira Amorim, Vishal Patel, Badri Roysam, Kevin Zhou, Steve Jiang, Ngan Le, Khoa Luu, Raphael Sznitman, Veronika Cheplygina, Diana Mateus, Emanuele Trucco, Samaneh Abbasi

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed joint proceedings of the Third International Workshop on Interpretability of Machine Intelligence in Medical Image Computing, iMIMIC 2020, the Second International Workshop on Medical Image Learning with Less Labels and Imperfect Data, MIL3ID 2020, and the 5th International Workshop on Large-scale Annotation of Biomedical data and Expert Label Synthesis, LABELS 2020, held in conjunction with the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2020, in Lima, Peru, in October 2020.

The 8 full papers presented at iMIMIC 2020, the 11 full papers presented at MIL3ID 2020, and the 10 full papers presented at LABELS 2020 were carefully reviewed and selected from 16 submissions to iMIMIC, 28 submissions to MIL3ID, and 12 submissions to LABELS. The iMIMIC papers focus on the challenges and opportunities related to the interpretability of machine learning systems in the context of medical imaging and computer-assisted intervention. The MIL3ID papers deal with best practices in medical image learning with label scarcity and data imperfection. The LABELS papers present a variety of approaches for dealing with a limited number of labels, from semi-supervised learning to crowdsourcing.

Table of Contents

Frontmatter

iMIMIC 2020

Frontmatter
Assessing Attribution Maps for Explaining CNN-Based Vertebral Fracture Classifiers
Abstract
Automated evaluation of vertebral fracture status on computed tomography (CT) scans acquired for various purposes (opportunistic CT) may substantially enhance the vertebral fracture detection rate. Convolutional neural networks (CNNs) have shown promising performance in numerous tasks, but their black-box nature may hinder acceptance by physicians. We aim (a) to evaluate CNN architectures for osteoporotic fracture discrimination as part of a pipeline localizing and classifying vertebrae in CT images and (b) to evaluate the benefit of using attribution maps to explain a network’s decision. Training different model architectures on 3D patches containing vertebrae, we show that CNNs permit highly accurate discrimination of the fracture status of individual vertebrae. Explanations were computed using selected attribution methods: the Gradient, Gradient * Input, Guided BackProp, and SmoothGrad algorithms. Quantitative and visual tests were conducted to evaluate the meaningfulness of the explanations (sanity checks). The explanations were found to depend on the model architecture, the realization of the parameters, and the precise position of the target object of interest.
Eren Bora Yilmaz, Alexander Oliver Mader, Tobias Fricke, Jaime Peña, Claus-Christian Glüer, Carsten Meyer
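The attribution methods named in this abstract (Gradient, Gradient * Input, SmoothGrad) are generic and not specific to the fracture pipeline. Below is a minimal PyTorch sketch of the three, assuming a hypothetical `model` that maps a 3D vertebra patch to class logits; the noise level and sample count are illustrative defaults, not the paper's settings.

```python
import torch

def gradient_attribution(model, x, target_class):
    """Plain gradient of the target logit w.r.t. the input patch."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.detach()

def gradient_x_input(model, x, target_class):
    """Gradient * Input attribution."""
    return gradient_attribution(model, x, target_class) * x

def smoothgrad(model, x, target_class, n_samples=25, noise_std=0.1):
    """SmoothGrad: average plain gradients over noisy copies of the input."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = x + noise_std * torch.randn_like(x)
        grads += gradient_attribution(model, noisy, target_class)
    return grads / n_samples
```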
Projective Latent Interventions for Understanding and Fine-Tuning Classifiers
Abstract
High-dimensional latent representations learned by neural network classifiers are notoriously hard to interpret. Especially in medical applications, model developers and domain experts desire a better understanding of how these latent representations relate to the resulting classification performance. We present Projective Latent Interventions (PLIs), a technique for retraining classifiers by back-propagating manual changes made to low-dimensional embeddings of the latent space. The back-propagation is based on parametric approximations of t-distributed stochastic neighbourhood embeddings. PLIs allow domain experts to control the latent decision space in an intuitive way in order to better match their expectations. For instance, the performance for specific pairs of classes can be enhanced by manually separating the class clusters in the embedding. We evaluate our technique on a real-world scenario in fetal ultrasound imaging.
Andreas Hinterreiter, Marc Streit, Bernhard Kainz
Interpretable CNN Pruning for Preserving Scale-Covariant Features in Medical Imaging
Abstract
Image scale carries crucial information in medical imaging, e.g. the size and spatial frequency of local structures, lesions, tumors and cell nuclei. With feature transfer being a common practice, scale-invariant features implicitly learned from pretraining on ImageNet tend to be preferred over scale-covariant features. The pruning strategy in this paper proposes a way to maintain scale covariance in the transferred features. Deep learning interpretability is used to analyze the layer-wise encoding of scale information for popular architectures such as InceptionV3 and ResNet50. Interestingly, the covariance of scale peaks at central layers and decreases close to the softmax. Motivated by these results, our pruning strategy removes the layers where invariance to scale is learned. The pruning operation leads to marked improvements in the regression of both nuclei areas and magnification levels of histopathology images. These are relevant applications for enlarging existing medical datasets with open-access images such as those of PubMed Central. All experiments are performed on publicly available data and the code is shared on GitHub.
Mara Graziani, Thomas Lompech, Henning Müller, Adrien Depeursinge, Vincent Andrearczyk
Improving the Performance and Explainability of Mammogram Classifiers with Local Annotations
Abstract
Cancer prediction models, which deeply impact human lives, must provide explanations for their predictions. We study a simple extension of a cancer mammogram classifier, trained with image-level annotations, to facilitate the built-in generation of prediction explanations. This extension also enables the classifier to learn from local annotations of malignant findings, if such are available. We tested this extended classifier for different percentages of local annotations in the training data. We evaluated the generated explanations by their level of agreement with (i) local annotations of malignant findings, and (ii) perturbation-based explanations, produced by the LIME method, which estimates the effect of each image segment on the classification score. Our results demonstrate an improvement in classification performance and explainability when local annotations are added to the training data. We observe that training with only 20–40% of the local annotations is sufficient to achieve improved performance and explainability comparable to a classifier trained with the entire set of local annotations.
Lior Ness, Ella Barkan, Michal Ozery-Flato
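LIME, the perturbation-based explainer used for comparison above, is a public library whose use can be sketched generically. In this sketch the classifier is a stand-in (the trained mammogram model would replace it), and the toy image, segmentation defaults, and sample count are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np
from lime import lime_image

def predict_proba(batch):
    # Placeholder classifier: replace with the trained mammogram model.
    # Takes an (N, H, W, 3) array, returns (N, 2) class probabilities.
    scores = batch.mean(axis=(1, 2, 3))
    return np.stack([1.0 - scores, scores], axis=1)

image = np.random.rand(128, 128, 3)          # toy stand-in for a mammogram patch

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,
    predict_proba,
    top_labels=1,
    hide_color=0,            # perturbed segments are blanked out
    num_samples=1000,        # number of perturbed copies used to fit LIME
)
# Segments that push the prediction towards the top class
_, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False
)
```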
Improving Interpretability for Computer-Aided Diagnosis Tools on Whole Slide Imaging with Multiple Instance Learning and Gradient-Based Explanations
Abstract
Deep learning methods are widely used in medical applications to assist medical doctors in their daily routines. While performance reaches expert level, interpretability (highlighting how and what a trained model learned and why it makes a specific decision) is the next important challenge deep learning methods need to address to be fully integrated into the medical field. In this paper, we address the question of interpretability in the context of whole slide image (WSI) classification. We formalize the design of WSI classification architectures and propose a piece-wise interpretability approach, relying on gradient-based methods, feature visualization and the multiple instance learning context. We aim to explain how the decision is made based on tile-level scoring, how these tile scores are decided, and which features are used and relevant for the task. After training two WSI classification architectures on the Camelyon-16 WSI dataset, highlighting the discriminative features learned, and validating our approach with pathologists, we propose a novel manner of computing interpretability slide-level heat-maps, based on the extracted features, that improves tile-level classification performance by more than 29% in terms of tile-level AUC.
Antoine Pirovano, Hippolyte Heuberger, Sylvain Berlemont, Saïd Ladjal, Isabelle Bloch
Explainable Disease Classification via Weakly-Supervised Segmentation
Abstract
Deep learning based approaches to Computer Aided Diagnosis (CAD) typically pose the problem as an image classification (Normal or Abnormal) problem. These systems achieve high to very high accuracy for the specific diseases they are trained to detect, but lack an explanation for the provided decision/classification result. The activation maps which correspond to decisions do not correlate well with regions of interest for specific diseases. This paper examines this problem and proposes an approach which mimics the clinical practice of looking for evidence prior to diagnosis. A CAD model is learnt using a mixed set of information: class labels for the entire training set of images, plus a rough localisation of suspect regions as an extra input for a smaller subset of training images to guide the learning. The proposed approach is illustrated with detection of diabetic macular edema (DME) from OCT slices. Results of testing on a large public dataset show that with just a third of the images having roughly segmented fluid-filled regions, the classification accuracy is on par with state-of-the-art methods while providing a good explanation in the form of an anatomically accurate heatmap/region of interest. The proposed solution is then adapted to breast cancer detection from mammographic images. Good evaluation results on public datasets underscore the generalisability of the proposed solution.
Aniket Joshi, Gaurav Mishra, Jayanthi Sivaswamy
Reliable Saliency Maps for Weakly-Supervised Localization of Disease Patterns
Abstract
Training convolutional neural networks with image-based labels leads to black-box image classification results. Saliency maps offer localization cues for class-relevant patterns, without requiring costly pixel-based labels. We show a failure mode of recently proposed weakly-supervised localization models: models may highlight the wrong input region yet classify correctly across all samples. Subsequently, we tested multiple architecture modifications and propose two simple but effective training approaches, based on two-stage learning and optional bounding-box guidance, that avoid such misleading projections. Our saliency maps localize pneumonia patterns reliably and significantly better than gradCAM in terms of localization scores and expert radiologist ratings.
Maximilian Möller, Matthias Kohl, Stefan Braunewell, Florian Kofler, Benedikt Wiestler, Jan S. Kirschke, Björn H. Menze, Marie Piraud
Explainability for Regression CNN in Fetal Head Circumference Estimation from Ultrasound Images
Abstract
The measurement of fetal head circumference (HC) is performed throughout the pregnancy to monitor fetus growth using ultrasound (US) images. Recently, methods that directly predict the biometric from images, instead of resorting to segmentation, have emerged. In our previous work, we proposed such a method, based on a regression convolutional neural network (CNN). While deep learning methods are the gold standard in most image processing tasks, they are often considered black boxes and fail to provide interpretable decisions. In this paper, we investigate various saliency map methods to leverage their ability to explain the predicted value of the regression CNN. Since saliency map methods have mostly been developed for classification CNNs, we provide an interpretation for regression saliency maps, as well as an adaptation of a perturbation-based quantitative evaluation of explanation methods. Results obtained on a public dataset of ultrasound images show that some saliency maps indeed exhibit the head contour as the most relevant feature for assessing the head circumference, and that the map quality depends on the backbone architecture and on whether the prediction error is low or high.
Jing Zhang, Caroline Petitjean, Florian Yger, Samia Ainouz
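The perturbation-based quantitative evaluation adapted above can be stated generically: mask the most salient pixels and measure how much the regressed head circumference changes; a faithful map should produce a large change when only a small fraction of pixels is removed. A minimal PyTorch sketch follows, assuming a hypothetical regression `model` and a precomputed `saliency` map; it is not the authors' exact protocol.

```python
import torch

@torch.no_grad()
def perturbation_curve(model, image, saliency, fractions=(0.01, 0.05, 0.10), fill=0.0):
    """Mask the top-k% most salient pixels and record the change in the
    regressed value (e.g. head circumference in pixels)."""
    base = model(image).item()
    order = torch.argsort(saliency.abs().flatten(), descending=True)
    changes = {}
    for frac in fractions:
        k = int(frac * order.numel())
        masked = image.clone().flatten()
        masked[order[:k]] = fill
        changes[frac] = abs(model(masked.view_as(image)).item() - base)
    return changes  # a faithful map yields large changes already at small fractions
```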

MIL3ID 2020

Frontmatter
Recovering the Imperfect: Cell Segmentation in the Presence of Dynamically Localized Proteins
Abstract
Deploying off-the-shelf segmentation networks on biomedical data has become common practice, yet if structures of interest in an image sequence are visible only temporarily, existing frame-by-frame methods fail. In this paper, we provide a solution for segmentation of imperfect data through time, based on temporal propagation and uncertainty estimation. We integrate uncertainty estimation into the Mask R-CNN network and propagate motion-corrected segmentation masks from frames with low uncertainty to frames with high uncertainty, to handle temporary loss of signal for segmentation. We demonstrate the value of this approach over frame-by-frame segmentation and regular temporal propagation on data from human embryonic kidney (HEK293T) cells transiently transfected with a fluorescent protein that moves in and out of the nucleus over time. The method presented here will empower microscopic experiments aimed at understanding molecular and cellular function.
Özgün Çiçek, Yassine Marrakchi, Enoch Boasiako Antwi, Barbara Di Ventura, Thomas Brox
Semi-supervised Instance Segmentation with a Learned Shape Prior
Abstract
To date, most instance segmentation approaches are based on supervised learning that requires a considerable amount of annotated object contours as training ground truth. Here, we propose a framework that searches for the target object based on a shape prior. The shape prior model is learned with a variational autoencoder that requires only a very limited amount of training data: in our experiments, a few dozen object shape patches from the target dataset, as well as purely synthetic shapes, were sufficient to achieve results on par with supervised methods with full access to training data on two out of three cell segmentation datasets. Our method with a synthetic shape prior was superior to pre-trained supervised models with access to limited domain-specific training data on all three datasets. Since the learning of prior models requires shape patches, whether from real or synthetic data, we call this framework semi-supervised learning. The code is available to the public (https://github.com/looooongChen/shape_prior_seg).
Long Chen, Weiwen Zhang, Yuli Wu, Martin Strauch, Dorit Merhof
COMe-SEE: Cross-modality Semantic Embedding Ensemble for Generalized Zero-Shot Diagnosis of Chest Radiographs
Abstract
Zero-shot learning, in spite of its recent popularity, remains an unexplored area for medical image analysis. We introduce a first-of-its-kind generalized zero-shot learning (GZSL) framework that utilizes information from two different imaging modalities (CT and x-ray) for the diagnosis of chest radiographs. Our model makes use of CT radiology reports to create a semantic space consisting of signatures corresponding to different chest diseases and conditions. We introduce a CrOss-Modality Semantic Embedding Ensemble (COMe-SEE) for zero-shot diagnosis of chest x-rays by relating an input x-ray to a signature in the semantic space. The ensemble, designed using a novel semantic saliency preserving autoencoder, utilizes the visual and the semantic saliency to facilitate GZSL. The use of an ensemble not only helps in dealing with noise but also makes our model useful across different datasets. Experiments on two publicly available datasets show that the proposed model can be trained using one dataset and still be applied to data from another source for zero-shot diagnosis of chest x-rays.
Angshuman Paul, Thomas C. Shen, Niranjan Balachandar, Yuxing Tang, Yifan Peng, Zhiyong Lu, Ronald M. Summers
Semi-supervised Machine Learning with MixMatch and Equivalence Classes
Abstract
Semi-supervised methods have an increasing impact on computer vision tasks, making use of scarce labels on large datasets, yet these approaches have not been well translated to medical imaging. Of particular interest, the MixMatch method achieves significant performance improvement over popular semi-supervised learning methods with scarce labels on the CIFAR-10 dataset. In a complementary approach, Nullspace Tuning on equivalence classes offers the potential to leverage multiple subject scans when the ground truth for the subject is unknown. This work is the first to (1) explore MixMatch with Nullspace Tuning in the context of medical imaging and (2) characterize the impacts of the methods with diminishing labels. We consider two distinct medical imaging domains: skin lesion diagnosis and lung cancer prediction. In both cases we evaluate models trained with diminishing labeled data using supervised, MixMatch, and Nullspace Tuning methods, as well as MixMatch combined with Nullspace Tuning. MixMatch combined with Nullspace Tuning achieves an AUC of 0.755 in lung cancer diagnosis with only 200 labeled subjects on the National Lung Screening Trial, and a balanced multi-class accuracy of 77% with only 779 labeled examples on HAM10000. This performance is similar to that of fully supervised methods when all labels are available. In advancing data-driven methods in medical imaging, it is important to consider the use of current state-of-the-art semi-supervised learning methods from the greater machine learning community and their impact on the limitations of data acquisition and annotation.
Colin B. Hansen, Vishwesh Nath, Riqiang Gao, Camilo Bermudez, Yuankai Huo, Kim L. Sandler, Pierre P. Massion, Jeffrey D. Blume, Thomas A. Lasko, Bennett A. Landman
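The MixMatch ingredient referenced above, guessing labels for unlabeled scans by averaging predictions over augmentations and then sharpening them, is a generic step that can be sketched independently of the lung and skin applications. The `augment` function and the K/temperature values below are assumptions following common MixMatch defaults, not necessarily the paper's configuration.

```python
import torch
import torch.nn.functional as F

def guess_labels(model, unlabeled_batch, augment, k=2, temperature=0.5):
    """MixMatch-style label guessing: average softmax predictions over K
    augmentations of each unlabeled image, then sharpen with a temperature."""
    model.eval()
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(augment(unlabeled_batch)), dim=1) for _ in range(k)]
        ).mean(dim=0)
    sharpened = probs ** (1.0 / temperature)
    return sharpened / sharpened.sum(dim=1, keepdim=True)  # pseudo-label targets
```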
Non-contrast CT Liver Segmentation Using CycleGAN Data Augmentation from Contrast Enhanced CT
Abstract
Non-contrast CT is often preferred in clinical screening, but segmentation of such CT data is more challenging than contrast-enhanced CT (CTce) segmentation due to the low contrast at tissue boundaries and the scarcity of supervised training data. To alleviate the manual labelling work of radiologists, we generate training samples for a 3D U-Net segmentation network by transforming an existing CTce liver segmentation dataset into non-contrast CT styled volumes with CycleGAN. We validated the performance of CycleGAN in both unsupervised and hybrid supervised training strategies. The results show that using CycleGAN in unsupervised segmentation can achieve higher mean Dice coefficients than the fully supervised manner in liver segmentation. Hybrid training on generated samples and target-task samples can improve the generalization ability of segmentation.
Chongchong Song, Baochun He, Hongyu Chen, Shuangfu Jia, Xiaoxia Chen, Fucang Jia
Uncertainty Estimation in Medical Image Localization: Towards Robust Anterior Thalamus Targeting for Deep Brain Stimulation
Abstract
Atlas-based methods are the standard approaches for automatic targeting of the Anterior Nucleus of the Thalamus (ANT) for Deep Brain Stimulation (DBS), but they are known to lack robustness when anatomic differences between atlases and subjects are large. To improve the localization robustness, we propose a novel two-stage deep learning (DL) framework, where the first stage identifies and crops the thalamus regions from the whole-brain MRI and the second stage performs per-voxel regression on the cropped volume to localize the targets at the finest resolution scale. To address the issue of data scarcity, we train the models with pseudo labels created from the available labeled data using multi-atlas registration. To assess the performance of the proposed framework, we validate two sampling-based uncertainty estimation techniques, namely Monte Carlo Dropout (MCDO) and Test-Time Augmentation (TTA), on the second-stage localization network. Moreover, we propose a novel uncertainty estimation metric called maximum activation dispersion (MAD) to estimate the image-wise uncertainty for localization tasks. Our results show that the proposed method achieved more robust localization performance than the traditional multi-atlas method, and that TTA could further improve the robustness. Moreover, the epistemic and hybrid uncertainty estimated by MAD could be used to detect unreliable localizations, and the magnitude of the uncertainty estimated by MAD could reflect the degree of unreliability of the rejected predictions.
Han Liu, Can Cui, Dario J. Englot, Benoit M. Dawant
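Monte Carlo Dropout, one of the two sampling-based uncertainty estimators validated above, has a generic recipe: keep dropout active at test time and treat the spread of repeated predictions as epistemic uncertainty. A minimal PyTorch sketch, assuming a hypothetical localization network that contains dropout layers; the sample count is illustrative.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, volume, n_samples=20):
    """Keep dropout active at inference and collect repeated predictions;
    the prediction variance serves as an epistemic uncertainty estimate."""
    model.eval()
    for m in model.modules():  # re-enable only the dropout layers
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(volume) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)
```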
A Case Study of Transfer of Lesion-Knowledge
Abstract
All organs in the human body are susceptible to cancer, and we now have a growing store of images of lesions in different parts of the body. This, along with the acknowledged ability of neural-network methods to analyse image data, would suggest that accurate models for lesions can now be constructed by a deep neural network. However, an important difficulty arises from the lack of annotated images from various parts of the body. Our proposed approach to address the issue of scarce training data for a target organ is to apply a form of transfer learning: that is, to adapt a model constructed for one organ to another for which there are minimal or no annotations. After consultation with medical specialists, we note that there are several discriminating visual features between malignant and benign lesions that occur consistently across organs. Therefore, in principle, these features boost the case for transfer learning on lesion images across organs. However, this has never been previously investigated. In this paper, we investigate whether lesion knowledge can be transferred across organs. Specifically, as a case study, we examine the transfer of a lesion model from the brain to the lungs and from the lungs to the brain. We evaluate the efficacy of transferring a brain-lesion model to the lung, and a lung-lesion model to the brain, by comparing against a model constructed (a) without model transfer (i.e. random weights) and (b) using model transfer from a lesion-agnostic dataset (ImageNet). In all cases, our lesion models perform substantially better. These results point to the potential utility of transferring lesion knowledge across organs other than those considered here.
Soundarya Krishnan, Rishab Khincha, Lovekesh Vig, Tirtharaj Dash, Ashwin Srinivasan
Transfer Learning with Joint Optimization for Label-Efficient Medical Image Anomaly Detection
Abstract
Many medical imaging applications require robust capabilities for automated image anomaly detection. Supervised deep learning approaches can be employed for such tasks, but pose large data collection and annotation burdens. To address this challenge, recent works have proposed advanced unsupervised, semi-supervised or transfer learning based deep learning methods for label-efficient image anomaly detection. However, these methods often require extensive hyperparameter tuning to achieve good performance, and have yet to be demonstrated in data-scarce, domain-centric applications with nuanced normal-vs-anomaly distinctions. Here, we propose a practical label-efficient anomaly detection method that fine-tunes a pre-trained model on a small target-domain dataset. Our approach employs a joint optimization framework to enhance discriminative power for anomaly detection. In evaluations on two benchmark medical image datasets, we demonstrate (a) strong performance gains over state-of-the-art baselines and (b) increased label efficiency over standard fine-tuning approaches. Importantly, our approach reduces the need for large annotated datasets, requires minimal hyperparameter tuning, and shows a stronger performance boost for more challenging anomalies (Supplement: http://s000.tinyupload.com/?file_id=24916959421870989415).
Xintong Li, Huijuan Yang, Zhiping Lin, Pavitra Krishnaswamy
Unsupervised Wasserstein Distance Guided Domain Adaptation for 3D Multi-domain Liver Segmentation
Abstract
Deep neural networks have shown exceptional learning capability and generalizability in the source domain when massive labeled data is provided. However, well-trained models often fail in the target domain due to domain shift. Unsupervised domain adaptation aims to improve network performance when applying robust models trained on medical images from source domains to a new target domain. In this work, we present an approach based on the Wasserstein distance guided disentangled representation to achieve 3D multi-domain liver segmentation. Concretely, we embed images onto a shared content space capturing shared feature-level information across domains, and domain-specific appearance spaces. Existing mutual information-based representation learning approaches often fail to capture complete representations in multi-domain medical imaging tasks. To mitigate these issues, we utilize the Wasserstein distance to learn a more complete representation, and introduce a content discriminator to further facilitate the representation disentanglement. Experiments demonstrate that our method outperforms the state of the art on the multi-modality liver segmentation task.
Chenyu You, Junlin Yang, Julius Chapiro, James S. Duncan
HydraMix-Net: A Deep Multi-task Semi-supervised Learning Approach for Cell Detection and Classification
Abstract
Semi-supervised techniques have removed the barriers of large-scale labelled sets by exploiting unlabelled data to improve the performance of a model. In this paper, we propose HydraMix-Net, a semi-supervised deep multi-task classification and localization approach for the field of medical imaging, where labelling is time consuming and costly. Firstly, pseudo labels are generated using the model's predictions on an augmented set of unlabelled images, with averaging. The high-entropy predictions are further sharpened to reduce the entropy and are then mixed with the labelled set for training. The model is trained in a multi-task learning manner with a noise-tolerant joint loss for classification and localization, and achieves better performance when given limited data in contrast to a simple deep model. On DLBCL data it achieves 80% accuracy, in contrast to a simple CNN achieving 70% accuracy, when given only 100 labelled examples.
Raja Muhammad Saad Bashir, Talha Qaiser, Shan E Ahmed Raza, Nasir M Rajpoot
Semi-supervised Classification of Chest Radiographs
Abstract
To train deep learning models in a supervised fashion, we need a significant amount of training data, but in most medical imaging scenarios, there is a lack of annotated data available. In this paper, we compare state-of-the-art semi-supervised classification methods in a medical imaging scenario. We evaluate the performance of different approaches in a chest radiograph classification task using the ChestX-ray14 dataset. We adapted methods based on pseudo-labeling and consistency regularization to perform multi-label classification and to use a state-of-the-art model architecture in chest radiograph classification. Our proposed approaches resulted in average AUCs up to 0.6691 with only 25 labeled samples per class, and an average AUC of 0.7182 when using only 2% of the labeled data, achieving results superior to previous approaches on semi-supervised chest radiograph classification.
Eduardo H. P. Pooch, Pedro Ballester, Rodrigo C. Barros
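Pseudo-labeling in the multi-label setting described above can be reduced to thresholding per-class sigmoid outputs on unlabeled radiographs and keeping only confident assignments in the loss. A minimal PyTorch sketch, assuming a hypothetical `model` with one sigmoid output per ChestX-ray14 finding; the thresholds are illustrative, not the authors' values.

```python
import torch

def multilabel_pseudo_labels(model, unlabeled_images, pos_thresh=0.95, neg_thresh=0.05):
    """Assign per-class pseudo-labels only where the sigmoid output is
    confidently positive or negative; everything in between is masked out."""
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(unlabeled_images))    # (N, 14)
    labels = (probs >= pos_thresh).float()
    mask = (probs >= pos_thresh) | (probs <= neg_thresh)  # entries allowed in the loss
    return labels, mask.float()
```

The returned mask would multiply a per-class binary cross-entropy term, so uncertain entries do not contribute to the unlabeled loss.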

LABELS 2020

Frontmatter
Risk of Training Diagnostic Algorithms on Data with Demographic Bias
Abstract
One of the critical challenges in machine learning applications is to have fair predictions. There are numerous recent examples in various domains that convincingly show that algorithms trained with biased datasets can easily lead to erroneous or discriminatory conclusions. This is even more crucial in clinical applications, where predictive algorithms are designed mainly based on a given set of medical images and demographic variables such as age, sex and race are not taken into account. In this work, we conduct a survey of the MICCAI 2018 proceedings to investigate common practice in medical image analysis applications. Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used, and the diagnosis is purely based on images. In order to highlight the importance of considering demographics in diagnosis tasks, we used a publicly available dataset of skin lesions. We then demonstrate that a classifier with an overall area under the curve (AUC) of 0.83 has variable performance between 0.76 and 0.91 on subgroups based on age and sex, even though the training set was relatively balanced. Moreover, we show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup. Finally, we discuss the implications of these results and provide recommendations for further research.
Samaneh Abbasi-Sureshjani, Ralf Raumanns, Britt E. J. Michels, Gerard Schouten, Veronika Cheplygina
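The subgroup analysis above (an overall AUC of 0.83 that varies between 0.76 and 0.91 across age and sex subgroups) amounts to computing the AUC separately on each demographic slice of the test set. A minimal scikit-learn sketch; the data frame layout and column names are assumptions for illustration.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_aucs(df, score_col="pred_score", label_col="malignant",
                  group_cols=("sex", "age_group")):
    """Compute the ROC AUC overall and on every demographic subgroup separately."""
    results = {"overall": roc_auc_score(df[label_col], df[score_col])}
    for col in group_cols:
        for value, subset in df.groupby(col):
            if subset[label_col].nunique() == 2:  # AUC needs both classes present
                results[f"{col}={value}"] = roc_auc_score(subset[label_col],
                                                          subset[score_col])
    return results
```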
Semi-weakly Supervised Learning for Prostate Cancer Image Classification with Teacher-Student Deep Convolutional Networks
Abstract
Deep Convolutional Neural Networks (CNNs) are the backbone of state-of-the-art methods to automatically analyze Whole Slide Images (WSIs) of digital tissue slides. One challenge in training fully-supervised CNN models with WSIs is providing the required amount of costly, manually annotated data. This paper presents a semi-weakly supervised model for classifying prostate cancer tissue. The approach follows a teacher-student learning paradigm that allows combining a small amount of annotated data (tissue microarrays with regions of interest traced by pathologists) with a large amount of weakly-annotated data (whole slide images with labels extracted from the diagnostic reports). The task of the teacher model is to annotate the weakly-annotated images. The student is trained with the pseudo-labeled images annotated by the teacher and fine-tuned with the small amount of strongly annotated data. The methods are evaluated on the task of classifying the four Gleason patterns and the Gleason score in prostate cancer images. Results show that the teacher-student approach significantly improves the performance of the fully-supervised CNN, both at the Gleason pattern level in tissue microarrays (κ = 0.594 ± 0.022 vs. κ = 0.559 ± 0.034, respectively) and at the Gleason score level in WSIs (κ = 0.403 ± 0.046 vs. κ = 0.273 ± 0.12, respectively). Our approach opens the possibility of transforming large weakly-annotated (and unlabeled) datasets into valuable sources of supervision for training robust CNN models in computational pathology.
Sebastian Otálora, Niccolò Marini, Henning Müller, Manfredo Atzori
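The teacher-student paradigm above follows a generic control flow: the teacher, trained on strongly annotated tissue microarrays, pseudo-labels the weakly annotated whole-slide patches; the student trains on those pseudo-labels and is then fine-tuned on the strong annotations. A minimal PyTorch-style sketch of that flow, assuming hypothetical `train_one_epoch` and data containers; it shows the paradigm, not the authors' exact schedule or confidence filtering.

```python
import torch

def teacher_student_training(teacher, student, strong_loader, weak_images,
                             train_one_epoch, epochs=10, conf_thresh=0.9):
    """Teacher pseudo-labels the weak data; student trains on it, then
    fine-tunes on the small strongly annotated set."""
    teacher.eval()
    with torch.no_grad():
        probs = torch.softmax(teacher(weak_images), dim=1)
        conf, pseudo = probs.max(dim=1)
    keep = conf >= conf_thresh                      # discard low-confidence tiles
    pseudo_set = list(zip(weak_images[keep], pseudo[keep]))
    for _ in range(epochs):
        train_one_epoch(student, pseudo_set)        # stage 1: pseudo-labeled data
    for _ in range(epochs):
        train_one_epoch(student, strong_loader)     # stage 2: fine-tune on strong labels
    return student
```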
Are Pathologist-Defined Labels Reproducible? Comparison of the TUPAC16 Mitotic Figure Dataset with an Alternative Set of Labels
Abstract
Pathologist-defined labels are the gold standard for histopathological datasets, regardless of well-known limitations in consistency for some tasks. To date, several datasets of mitotic figures are available and have been used for the development of promising deep learning-based algorithms. In order to assess the robustness of those algorithms and the reproducibility of their methods, it is necessary to test on several independent datasets. The influence of the different labeling methods of these available datasets is currently unknown. To tackle this, we present an alternative set of labels for the images of the auxiliary mitosis dataset of the TUPAC16 challenge. In addition to manual mitotic figure screening, we used a novel, algorithm-aided labeling process that allowed us to minimize the risk of missing rare mitotic figures in the images. All potential mitotic figures were independently assessed by two pathologists. The novel, publicly available set of labels contains 1,999 mitotic figures (+28.80%) and additionally includes 10,483 labels of cells with high similarity to mitotic figures (hard examples). We found a significant difference when comparing F1 scores between the original label set (0.549) and the new alternative label set (0.735) using a standard deep learning object detection architecture. The models trained on the alternative set showed higher overall confidence values, suggesting a higher overall label consistency. The findings of the present study show that pathologist-defined labels may vary significantly, resulting in notable differences in model performance. Comparisons of deep learning-based algorithms between independent datasets with different labeling methods should be done with caution.
Christof A. Bertram, Mitko Veta, Christian Marzahl, Nikolas Stathonikos, Andreas Maier, Robert Klopfleisch, Marc Aubreville
EasierPath: An Open-Source Tool for Human-in-the-Loop Deep Learning of Renal Pathology
Abstract
Considerable morphological phenotyping studies in nephrology have emerged in the past few years, aiming to discover hidden regularities between clinical and imaging phenotypes. Such studies have been largely enabled by deep learning based image analysis to extract sparsely located target objects (e.g., glomeruli) on high-resolution whole slide images (WSI). However, such methods need to be trained using labor-intensive, high-quality annotations, ideally labeled by pathologists. Inspired by the recent “human-in-the-loop” strategy, we developed EasierPath, an open-source tool that integrates human physicians and deep learning algorithms for efficient large-scale pathological image quantification as a loop. Using EasierPath, physicians are able to (1) optimize the recall and precision of deep learning object detection outcomes adaptively, (2) seamlessly refine deep learning outcomes using either EasierPath or the prevalent ImageScope software without changing the physician's user habits, and (3) manage and phenotype each object with user-defined classes. As a use case of EasierPath, we present the procedure of curating large-scale glomeruli in an efficient human-in-the-loop fashion (with two loops). In our experiments, EasierPath saved 57% of the annotation effort to curate 8,833 glomeruli during the second loop, while the average precision of glomerular detection increased from 0.504 to 0.620. The EasierPath software has been released as open source to enable large-scale glomerular prototyping. The code can be found at https://github.com/yuankaihuo/EasierPath.
Zheyu Zhu, Yuzhe Lu, Ruining Deng, Haichun Yang, Agnes B. Fogo, Yuankai Huo
Imbalance-Effective Active Learning in Nucleus, Lymphocyte and Plasma Cell Detection
Abstract
An Imbalance-Effective Active Learning (IEAL) based deep neural network algorithm is proposed for the automatic detection of nuclei, lymphocytes and plasma cells in hepatitis diagnosis. The active sampling approach reduces the training sample annotation cost and mitigates the extreme imbalance among nucleus, lymphocyte and plasma cell samples. A Bayesian U-Net model is developed by incorporating IEAL with the basic U-Net. Testing results obtained using an in-house dataset consisting of 43 whole slide images (300 images of size 256 × 256) show that the proposed method achieves equal or better performance than a basic U-Net classifier while using less than half the number of annotated samples.
Chao-Ting Li, Hung-Wen Tsai, Tseng-Lung Yang, Jung-Chi Lin, Nan-Haw Chow, Yu Hen Hu, Kuo-Sheng Cheng, Pau-Choo Chung
Labeling of Multilingual Breast MRI Reports
Abstract
Medical reports are an essential medium for recording a patient's condition throughout a clinical trial. They contain valuable information that can be extracted to generate the large labeled datasets needed for the development of clinical tools. However, the majority of medical reports are stored in an unregularized format, and a trained human annotator (typically a doctor) must manually assess and label each case, resulting in an expensive and time-consuming procedure. In this work, we present a framework for developing a multilingual breast MRI report classifier using a custom-built language representation called LAMBR. Our proposed method overcomes practical challenges faced in clinical settings, and we demonstrate improved performance in extracting labels from medical reports when compared with conventional approaches.
Chen-Han Tsai, Nahum Kiryati, Eli Konen, Miri Sklair-Levy, Arnaldo Mayer
Predicting Scores of Medical Imaging Segmentation Methods with Meta-learning
Abstract
Deep learning has led to state-of-the-art results for many medical imaging tasks, such as the segmentation of different anatomical structures. With the increased numbers of deep learning publications and openly available code, the approach to choosing a model for a new task becomes more complicated, while time and (computational) resources are limited. A possible solution to choosing a model efficiently is meta-learning, a learning method in which prior performance of a model is used to predict its performance for new tasks. We investigate meta-learning for segmentation across ten datasets of different organs and modalities. We propose four ways to represent each dataset by meta-features: one based on statistical features of the images and three based on deep learning features. We use support vector regression and deep neural networks to learn the relationship between the meta-features and prior model performance. On three external test datasets these methods give Dice scores within 0.10 of the true performance. These results demonstrate the potential of meta-learning in medical imaging.
Tom van Sonsbeek, Veronika Cheplygina
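The regression step described above, predicting a model's Dice score on a new dataset from meta-features, can be sketched with scikit-learn's support vector regression. The feature values and scores below are toy numbers for illustration; the paper's actual meta-features are statistical and deep-learning descriptors of each dataset.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row: meta-features of one (dataset, segmentation model) pair,
# e.g. image statistics or pooled deep features. Values here are toy numbers.
meta_features = np.array([
    [0.42, 120.3, 0.08],
    [0.55,  98.1, 0.12],
    [0.31, 143.7, 0.05],
    [0.60, 110.4, 0.15],
])
dice_scores = np.array([0.81, 0.74, 0.88, 0.69])    # observed prior performance

regressor = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
regressor.fit(meta_features, dice_scores)

new_dataset = np.array([[0.48, 105.0, 0.10]])       # meta-features of an unseen dataset
print(regressor.predict(new_dataset))               # predicted Dice score
```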
Labelling Imaging Datasets on the Basis of Neuroradiology Reports: A Validation Study
Abstract
Natural language processing (NLP) shows promise as a means to automate the labelling of hospital-scale neuroradiology magnetic resonance imaging (MRI) datasets for computer vision applications. To date, however, there has been no thorough investigation into the validity of this approach, including determining the accuracy of report labels compared to image labels as well as examining the performance of non-specialist labellers. In this work, we draw on the experience of a team of neuroradiologists who labelled over 5000 MRI neuroradiology reports as part of a project to build a dedicated deep learning-based neuroradiology report classifier. We show that, in our experience, assigning binary labels (i.e. normal vs abnormal) to images from reports alone is highly accurate. In contrast to the binary labels, however, the accuracy of more granular labelling is dependent on the category, and we highlight reasons for this discrepancy. We also show that downstream model performance is reduced when labelling of training reports is performed by a non-specialist. To allow other researchers to accelerate their research, we make our refined abnormality definitions and labelling rules available, as well as our easy-to-use radiology report labelling tool which helps streamline this process.
David A. Wood, Sina Kafiabadi, Aisha Al Busaidi, Emily Guilhem, Jeremy Lynch, Matthew Townend, Antanas Montvila, Juveria Siddiqui, Naveen Gadapa, Matthew Benger, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth
Semi-supervised Learning for Instrument Detection with a Class Imbalanced Dataset
Abstract
The automated recognition of surgical instruments in surgical videos is an essential factor for the evaluation and analysis of surgery. The analysis of surgical instrument localization information can help in analyses related to surgical evaluation and decision making during surgery. To solve the problem of the localization of surgical instruments, we used an object detector with bounding box labels to train the localization of the surgical tools shown in a surgical video. In this study, we propose a semi-supervised learning-based training method to solve the class imbalance between surgical instruments, which makes it challenging to train the detectors of the surgical instruments. First, we labeled gastrectomy videos for gastric cancer performed in 24 cases of robotic surgery to detect the initial bounding box of the surgical instruments. Next, a trained instrument detector was used to discern the unlabeled videos, and new labels were added to the tools causing class imbalance based on the previously acquired statistics of the labeled videos. We also performed object tracking-based label generation in the spatio-temporal domain to obtain accurate label information from the unlabeled videos in an automated manner. We were able to generate dense labels for the surgical instruments lacking labels through bidirectional object tracking using a single object tracker; thus, we achieved improved instrument detection in a fully or semi-automated manner.
Jihun Yoon, Jiwon Lee, SungHyun Park, Woo Jin Hyung, Min-Kook Choi
Paying Per-Label Attention for Multi-label Extraction from Radiology Reports
Abstract
Training medical image analysis models requires large amounts of expertly annotated data which is time-consuming and expensive to obtain. Images are often accompanied by free-text radiology reports which are a rich source of information. In this paper, we tackle the automated extraction of structured labels from head CT reports for imaging of suspected stroke patients, using deep learning. Firstly, we propose a set of 31 labels which correspond to radiographic findings (e.g. hyperdensity) and clinical impressions (e.g. haemorrhage) related to neurological abnormalities. Secondly, inspired by previous work, we extend existing state-of-the-art neural network models with a label-dependent attention mechanism. Using this mechanism and simple synthetic data augmentation, we are able to robustly extract many labels with a single model, classified according to the radiologist’s reporting (positive, uncertain, negative). This approach can be used in further research to effectively extract many labels from medical text.
Patrick Schrempf, Hannah Watson, Shadia Mikhael, Maciej Pajak, Matúš Falis, Aneta Lisowska, Keith W. Muir, David Harris-Birtill, Alison Q. O’Neil
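A label-dependent attention mechanism of the kind extended above can be sketched generically: each of the 31 labels gets its own attention query over the encoder's token representations, so different labels can attend to different parts of the report before being classified as positive, uncertain, or negative. A minimal PyTorch module; the dimensions and encoder interface are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PerLabelAttention(nn.Module):
    """One attention vector per label over the encoder's token states,
    followed by a shared per-label classifier (positive / uncertain / negative)."""
    def __init__(self, hidden_dim=256, num_labels=31, num_classes=3):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_states):                        # (B, T, H) encoder output
        scores = torch.einsum("bth,lh->blt", token_states, self.label_queries)
        attn = torch.softmax(scores, dim=-1)                # (B, L, T) attention per label
        label_repr = torch.einsum("blt,bth->blh", attn, token_states)
        return self.classifier(label_repr)                  # (B, L, num_classes)

# Usage on dummy encoder output: 2 reports, 100 tokens, 256-dim token states
logits = PerLabelAttention()(torch.randn(2, 100, 256))
```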
Correction to: Interpretable and Annotation-Efficient Learning for Medical Image Computing
Jaime Cardoso, Hien Van Nguyen, Nicholas Heller, Pedro Henriques Abreu, Ivana Isgum, Wilson Silva, Ricardo Cruz, Jose Pereira Amorim, Vishal Patel, Badri Roysam, Kevin Zhou, Steve Jiang, Ngan Le, Khoa Luu, Raphael Sznitman, Veronika Cheplygina, Diana Mateus, Emanuele Trucco, Samaneh Abbasi
Backmatter
Metadata
Title
Interpretable and Annotation-Efficient Learning for Medical Image Computing
Editors
Jaime Cardoso
Hien Van Nguyen
Nicholas Heller
Pedro Henriques Abreu
Ivana Isgum
Wilson Silva
Ricardo Cruz
Jose Pereira Amorim
Vishal Patel
Badri Roysam
Kevin Zhou
Steve Jiang
Ngan Le
Khoa Luu
Raphael Sznitman
Veronika Cheplygina
Diana Mateus
Emanuele Trucco
Samaneh Abbasi
Copyright Year
2020
Electronic ISBN
978-3-030-61166-8
Print ISBN
978-3-030-61165-1
DOI
https://doi.org/10.1007/978-3-030-61166-8
