
About this Book

The four-volume set LNCS 11070, 11071, 11072, and 11073 constitutes the refereed proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2018, held in Granada, Spain, in September 2018.

The 373 revised full papers presented were carefully reviewed and selected from 1068 submissions in a double-blind review process. The papers have been organized in the following topical sections:
Part I: Image Quality and Artefacts; Image Reconstruction Methods; Machine Learning in Medical Imaging; Statistical Analysis for Medical Imaging; Image Registration Methods.
Part II: Optical and Histology Applications: Optical Imaging Applications; Histology Applications; Microscopy Applications; Optical Coherence Tomography and Other Optical Imaging Applications. Cardiac, Chest and Abdominal Applications: Cardiac Imaging Applications; Colorectal, Kidney and Liver Imaging Applications; Lung Imaging Applications; Breast Imaging Applications; Other Abdominal Applications.
Part III: Diffusion Tensor Imaging and Functional MRI: Diffusion Tensor Imaging; Diffusion Weighted Imaging; Functional MRI; Human Connectome. Neuroimaging and Brain Segmentation Methods: Neuroimaging; Brain Segmentation Methods.
Part IV: Computer Assisted Intervention: Image Guided Interventions and Surgery; Surgical Planning, Simulation and Work Flow Analysis; Visualization and Augmented Reality. Image Segmentation Methods: General Image Segmentation Methods, Measures and Applications; Multi-Organ Segmentation; Abdominal Segmentation Methods; Cardiac Segmentation Methods; Chest, Lung and Spine Segmentation; Other Segmentation Applications.



Correction to: Towards a Glaucoma Risk Index Based on Simulated Hemodynamics from Fundus Images

In the paper titled “Towards a glaucoma risk index based on simulated hemodynamics from fundus images”, the acknowledgement has been updated.

José Ignacio Orlando, João Barbosa Breda, Karel van Keer, Matthew B. Blaschko, Pablo J. Blanco, Carlos A. Bulant

Optical and Histology Applications: Optical Imaging Applications


Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks

In contrast to semantic segmentation, instance segmentation assigns a unique label to each individual instance of the same class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time. The architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to exploit temporal video information. Furthermore, we train the network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout a video. These embeddings are then clustered across subsequent video frames to produce the final tracked instance segmentations. We evaluate the recurrent hourglass network by segmenting left ventricles in MR videos of the heart, where it outperforms a network that does not incorporate video information. Furthermore, we show the applicability of the cosine embedding loss by segmenting leaf instances on still images of plants. Finally, we evaluate the framework for instance segmentation and tracking on six datasets of the ISBI cell tracking challenge, where it shows state-of-the-art performance.

Christian Payer, Darko Štern, Thomas Neff, Horst Bischof, Martin Urschler
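
The cosine-similarity embedding loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation (which operates on ConvGRU feature maps across video frames); the function name and the specific pull/push terms below are illustrative assumptions:

```python
import numpy as np

def cosine_embedding_loss(embeddings, instance_ids):
    """Toy cosine embedding loss: pixels of an instance should have
    embeddings close (cosine ~ 1) to their instance mean, while pixels
    of other instances should be near-orthogonal (cosine ~ 0) to it.

    embeddings:   (N, D) array of per-pixel embedding vectors
    instance_ids: (N,)   integer instance label per pixel
    """
    eps = 1e-8
    unit = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    loss = 0.0
    for inst in np.unique(instance_ids):
        same = instance_ids == inst
        mean = unit[same].mean(axis=0)
        mean /= np.linalg.norm(mean) + eps
        cos = unit @ mean
        loss += np.mean(1.0 - cos[same])          # pull own pixels toward the mean
        if (~same).any():
            loss += np.mean(np.abs(cos[~same]))   # push other pixels to orthogonality
    return loss
```

In the full method, embeddings trained with such a loss are clustered among subsequent frames to link instances over time.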

Skin Lesion Classification in Dermoscopy Images Using Synergic Deep Learning

Automated skin lesion classification in dermoscopy images is an essential way to improve diagnostic performance and reduce melanoma deaths. Although deep learning has shown proven advantages in image classification over traditional methods that rely on handcrafted features, classifying skin lesions remains challenging due to significant intra-class variation and inter-class similarity. In this paper, we propose a synergic deep learning (SDL) model to address this issue, which not only uses dual deep convolutional neural networks (DCNNs) but also enables them to mutually learn from each other. Specifically, we concatenate the image representations learned by both DCNNs as the input of a synergic network, which has a fully connected structure and predicts whether the pair of input images belongs to the same class. We train the SDL model in an end-to-end manner under the supervision of the classification error of each DCNN and the synergic error. We evaluated our SDL model on the ISIC 2016 Skin Lesion Classification dataset and achieved state-of-the-art performance.

Jianpeng Zhang, Yutong Xie, Qi Wu, Yong Xia
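
The synergic supervision signal can be sketched as follows. This is a simplified reading of the abstract: the synergic target is 1 when a pair of inputs shares a class, and the pair loss is added to the two per-network classification losses. The function names and the `lam` weighting are assumptions, not the paper's notation:

```python
import numpy as np

def synergic_labels(labels_a, labels_b):
    """Binary synergic target: 1 if the paired inputs share a class."""
    return (np.asarray(labels_a) == np.asarray(labels_b)).astype(float)

def sdl_total_loss(ce_a, ce_b, synergic_pred, labels_a, labels_b, lam=1.0):
    """Combine the two DCNN classification losses (precomputed scalars
    ce_a, ce_b) with a binary cross-entropy on the synergic prediction."""
    t = synergic_labels(labels_a, labels_b)
    eps = 1e-8
    p = np.clip(np.asarray(synergic_pred, float), eps, 1 - eps)
    synergic_ce = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
    return ce_a + ce_b + lam * synergic_ce
```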

SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks

Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for the automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model structured as an encoder-decoder network. The encoder is constructed from dilated residual layers, while the decoder consists of a pyramid pooling network followed by three convolution layers. Unlike traditional methods employing a cross-entropy loss, we formulate a new loss function by combining Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the boundaries of melanoma regions. The robustness of the proposed model was evaluated on two public databases: the ISBI 2016 and 2017 challenges for skin lesion analysis towards melanoma detection. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable of segmenting about 100 images of size $$384\times 384$$ per second on a recent GPU.

Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Farhan Akram, Syeda Furruka Banu, Adel Saleh, Vivek Kumar Singh, Forhad U. H. Chowdhury, Saddam Abdulwahab, Santiago Romani, Petia Radeva, Domenec Puig
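
A minimal sketch of a combined NLL + EPE segmentation loss, under the assumption that the EPE term here penalizes the Euclidean distance between the spatial gradients of the predicted probability map and the ground-truth mask (the paper's exact EPE definition may differ):

```python
import numpy as np

def nll_epe_loss(pred, target, lam=1.0):
    """pred:   (H, W) probability map in (0, 1)
    target: (H, W) binary mask.
    NLL term: pixel-wise binary negative log-likelihood.
    EPE term: mean Euclidean distance between the spatial gradients
    of prediction and target, penalizing boundary mismatch."""
    eps = 1e-8
    p = np.clip(pred, eps, 1 - eps)
    nll = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    gy_p, gx_p = np.gradient(pred)
    gy_t, gx_t = np.gradient(target.astype(float))
    epe = np.mean(np.sqrt((gx_p - gx_t) ** 2 + (gy_p - gy_t) ** 2))
    return nll + lam * epe
```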

$$\beta $$ -Hemolysis Detection on Cultured Blood Agar Plates by Convolutional Neural Networks

The recent introduction of Full Laboratory Automation (FLA) systems in clinical microbiology opens access to huge streams of high-definition diagnostic images representing bacterial colonies on culturing plates. In this context, the presence of $$\beta $$ -hemolysis is a key diagnostic sign for assessing the presence and virulence of pathogens like streptococci and for characterizing major respiratory tract infections. Since it can manifest in a wide variety of shapes, dimensions and intensities, reliable automated detection of $$\beta $$ -hemolysis is a challenging task that has never before been tackled in its full complexity. To this aim, we follow a deep learning approach operating on a database of 1500 fully annotated dual-light (top-lit and back-lit) blood agar plate images collected from FLA systems operating in ordinary clinical conditions. Patch-based training and test sets are obtained with the help of an ad-hoc, total-recall region proposal technique. A DenseNet convolutional neural network architecture, dimensioned and trained to classify patch candidates, achieves 98.9% precision with a recall of 98.9%, leading to an overall 90% precision and 99% recall on a plate basis, where the occurrence of false negatives needs to be minimized. As the first approach able to detect $$\beta $$ -hemolysis on a whole-plate basis, the obtained results open new opportunities for supporting diagnostic decisions, with an expected high impact on the efficiency and accuracy of the laboratory workflow.

Mattia Savardi, Sergio Benini, Alberto Signoroni

A Pixel-Wise Distance Regression Approach for Joint Retinal Optical Disc and Fovea Detection

This paper introduces a novel strategy for simultaneously locating two key anatomical landmarks in retinal images of the eye fundus, namely the optic disc and the fovea. Instead of attempting to classify each pixel as belonging to the background, the optic disc, or the fovea center, which would lead to a highly class-imbalanced setting, the problem is reformulated as a pixel-wise regression task. The regressed quantity is the distance to the closest landmark of interest. A fully convolutional deep neural network is optimized to predict this distance for each image location, implicitly casting the problem into a per-pixel multi-task learning approach by which a globally consistent distribution of distances across the entire image can be learned. Once trained, the two minimal distances predicted by the model are selected as the locations of the optic disc and the fovea. The joint learning of every pixel position relative to the optic disc and the fovea favors an automatic understanding of the overall anatomical distribution. This results in an effective technique that can detect both locations simultaneously, as opposed to previous methods that handle the two tasks separately. Comprehensive experimental results on a large public dataset validate the proposed approach.

Maria Ines Meyer, Adrian Galdran, Ana Maria Mendonça, Aurélio Campilho
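
The distance-regression target and the landmark readout can be sketched in a few lines of NumPy. The greedy non-maximum-suppression readout (`detect_landmarks` with a `suppress` window) is an assumption for illustration, not necessarily the authors' selection rule:

```python
import numpy as np

def distance_target(shape, landmarks):
    """Per-pixel regression target: distance to the closest landmark.
    landmarks: list of (row, col), e.g. optic disc and fovea centers."""
    rows, cols = np.indices(shape)
    dists = [np.sqrt((rows - r) ** 2 + (cols - c) ** 2) for r, c in landmarks]
    return np.min(dists, axis=0)

def detect_landmarks(pred_dist, n=2, suppress=5):
    """Pick the n lowest-distance locations, greedily suppressing a
    neighbourhood around each pick so both landmarks are recovered."""
    d = pred_dist.astype(float).copy()
    found = []
    for _ in range(n):
        r, c = np.unravel_index(np.argmin(d), d.shape)
        found.append((int(r), int(c)))
        d[max(r - suppress, 0):r + suppress + 1,
          max(c - suppress, 0):c + suppress + 1] = np.inf
    return found
```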

Deep Random Walk for Drusen Segmentation from Fundus Images

This paper presents a deep random walk technique for drusen segmentation from fundus images. It is formulated as a deep learning architecture that learns deep representations from fundus images and specifies optimal pixel-pixel affinities. Specifically, the proposed architecture is composed of three main parts: a deep feature extraction module that learns both semantic-level and low-level representations of the image, an affinity learning module that produces the pixel-pixel affinities forming the transition matrix of the random walk, and a random walk module that propagates manual labels. The power of our technique comes from the fact that the learning procedures for deep image representations and pixel-pixel affinities are driven by the random walk process. The accuracy of our proposed algorithm surpasses state-of-the-art drusen segmentation techniques, as validated on the public STARE and DRIVE databases.

Fang Yan, Jia Cui, Yu Wang, Hong Liu, Hui Liu, Benzheng Wei, Yilong Yin, Yuanjie Zheng
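
The random walk module's label propagation step can be sketched as follows, assuming affinities have already been produced by the learned affinity module (here they are just given as a matrix; in the paper they come from the deep features and are trained end-to-end):

```python
import numpy as np

def random_walk_propagate(affinity, seed_labels, steps=50):
    """Propagate seed labels through a row-stochastic transition matrix.
    affinity:    (N, N) nonnegative pixel-pixel affinities
    seed_labels: (N, K) one-hot rows for labelled pixels, zeros elsewhere."""
    T = affinity / affinity.sum(axis=1, keepdims=True)  # transition matrix
    labels = seed_labels.astype(float)
    seeded = seed_labels.sum(axis=1) > 0
    for _ in range(steps):
        labels = T @ labels
        labels[seeded] = seed_labels[seeded]  # clamp the known labels
    return labels.argmax(axis=1)
```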

Retinal Artery and Vein Classification via Dominant Sets Clustering-Based Vascular Topology Estimation

The classification of the retinal vascular tree into arteries and veins is important in understanding the relation between vascular changes and a wide spectrum of diseases. In this paper, we propose a novel framework that is capable of making the artery/vein (A/V) distinction in retinal color fundus images. We successfully adapt the concept of dominant sets clustering and formalize the retinal vessel topology estimation and A/V classification problems as a pairwise clustering problem. Dominant sets clustering is a graph-theoretic approach that has been proven to work well in data clustering. The proposed approach has been applied to three public databases (INSPIRE, DRIVE and VICAVR) and achieved high accuracies of 91.0%, 91.2%, and 91.0%, respectively. Furthermore, we have made manual annotations of vessel topology for these databases, and these annotations will be released publicly to facilitate other researchers in the community working on the same and related topics.

Yitian Zhao, Jianyang Xie, Pan Su, Yalin Zheng, Yonghuai Liu, Jun Cheng, Jiang Liu
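
Dominant sets are classically extracted by replicator dynamics on the affinity matrix; a minimal sketch is below. This illustrates the generic graph-theoretic tool the abstract names, not the authors' full topology-estimation pipeline:

```python
import numpy as np

def dominant_set(A, iters=200, tol=1e-8):
    """Extract one dominant set from a symmetric affinity matrix A
    (zero diagonal) via replicator dynamics: x <- x * (A x) / (x' A x).
    The support of the converged x indicates the cluster members."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x
```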

Towards a Glaucoma Risk Index Based on Simulated Hemodynamics from Fundus Images

Glaucoma is the leading cause of irreversible but preventable blindness in the world. Its major treatable risk factor is intra-ocular pressure, although other biomarkers are being explored to improve the understanding of the pathophysiology of the disease. It has recently been observed that glaucoma induces changes in the ocular hemodynamics. However, its effects on the functional behavior of the retinal arterioles have not been studied yet. In this paper we propose a first approach for characterizing those changes using computational hemodynamics. The retinal blood flow is simulated using a 0D model for a steady, incompressible, non-Newtonian fluid in rigid domains. The simulation is performed on patient-specific arterial trees extracted from fundus images. We also propose a novel feature representation technique to summarize the outcomes of the simulation stage in a fixed-length feature vector that can be used for classification studies. Our experiments on a new database of fundus images show that our approach is able to capture representative changes in the hemodynamics of glaucomatous patients. Code and data are publicly available in .

José Ignacio Orlando, João Barbosa Breda, Karel van Keer, Matthew B. Blaschko, Pablo J. Blanco, Carlos A. Bulant

A Framework for Identifying Diabetic Retinopathy Based on Anti-noise Detection and Attention-Based Fusion

Automatic diagnosis of diabetic retinopathy (DR) using retinal fundus images is a challenging problem because images of low-grade DR may contain only a few tiny lesions, which are difficult to perceive even for human experts. Annotations in the form of lesion bounding boxes may help deep learning models solve the problem, but fully annotated samples of this type are usually expensive to obtain. Missing annotations (i.e., true lesions not included in the annotations) act as noise and can affect learning models negatively. Besides, how to utilize lesion information for identifying DR should be considered carefully, because different types of lesions may be used to distinguish different DR grades. In this paper, we propose a new framework for unifying lesion detection and DR identification. Our lesion detection model first determines the missing annotated samples to reduce their impact on the model, and extracts lesion information. Our attention-based network then fuses original images and lesion information to identify DR. Experimental results show that our detection model can considerably reduce the impact of missing annotations and that our attention-based network can learn weights between the original images and lesion information for distinguishing different DR grades. Our approach outperforms state-of-the-art methods on two grand challenge retina datasets, EyePACS and Messidor.

Zhiwen Lin, Ruoqian Guo, Yanjie Wang, Bian Wu, Tingting Chen, Wenzhe Wang, Danny Z. Chen, Jian Wu

Deep Supervision with Additional Labels for Retinal Vessel Segmentation Task

Automatic analysis of retinal fundus images is of vital importance in diagnosing retinopathy. Accurately segmenting vessels is a fundamental step in analysing retinal images. However, it is usually difficult due to varying imaging conditions, low image contrast, and the appearance of pathologies such as micro-aneurysms. In this paper, we propose a novel method based on deep neural networks to solve this problem. We utilize a U-net with residual connections to detect vessels. To achieve better accuracy, we introduce an edge-aware mechanism that converts the original task into a multi-class task by adding additional labels for boundary areas. In this way, the network pays more attention to the boundary areas of vessels and achieves better performance, especially in detecting tiny vessels. Besides, side-output layers are applied to provide deep supervision and thereby help convergence. We train and evaluate our model on three databases: DRIVE, STARE, and CHASE_DB1. Experimental results show that our method achieves performance comparable to the state of the art, with an AUC of 97.99% on DRIVE, and an efficient running time compared to the state-of-the-art methods.

Yishuo Zhang, Albert C. S. Chung
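
The edge-aware label conversion (binary vessel mask to a multi-class map with an extra boundary class) can be sketched with plain morphology. The 4-connected structuring element and the single-class boundary band are illustrative assumptions:

```python
import numpy as np

def _dilate(m):
    """One step of binary dilation with a 4-connected cross element."""
    p = np.pad(m, 1)
    return p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:] | p[1:-1, 1:-1]

def _erode(m):
    """One step of binary erosion with the same element."""
    p = np.pad(m, 1, constant_values=True)
    return p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:] & p[1:-1, 1:-1]

def add_boundary_class(mask, width=1):
    """Convert a binary vessel mask into a 3-class label map:
    0 background, 1 vessel interior, 2 boundary band around vessel edges."""
    m = mask.astype(bool)
    grown, shrunk = m, m
    for _ in range(width):
        grown, shrunk = _dilate(grown), _erode(shrunk)
    labels = np.zeros(mask.shape, dtype=int)
    labels[m] = 1
    labels[grown & ~shrunk] = 2
    return labels
```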

A Multi-task Network to Detect Junctions in Retinal Vasculature

Junctions in the retinal vasculature are key points for extracting its topology, but they vary in appearance depending on vessel density, width and branching/crossing angles. The complexity of junction patterns is usually accompanied by a scarcity of labels, which discourages the use of very deep networks for their detection. We propose a multi-task network, generating labels for vessel interior, centerline, edges and junction patterns, to provide additional information that facilitates junction detection. After the initial detection of potential junctions in junction-selective probability maps, candidate locations are re-examined in centerline probability maps to verify that they connect at least 3 branches. Experiments on the DRIVE and IOSTAR datasets showed that our method outperformed a recent study in which a popular deep network was trained as a classifier to find junctions. Moreover, the proposed approach is applicable to unseen datasets with the same degree of success, after training it only once.

Fatmatülzehra Uslu, Anil Anthony Bharath

A Multitask Learning Architecture for Simultaneous Segmentation of Bright and Red Lesions in Fundus Images

Recent CNN architectures have established state-of-the-art results in a large range of medical imaging applications. We propose an extension to the U-Net architecture relying on multi-task learning: while keeping a single encoding module, multiple decoding modules are used for concurrent segmentation tasks. We propose improvements to the encoding module based on the latest CNN developments: residual connections at every scale, mixed pooling for spatial compression and large kernels for convolutions at the lowest scale. We also use dense connections within the different scales based on multi-size pooling regions. We use this new architecture to jointly detect and segment red and bright retinal lesions, which are essential biomarkers of diabetic retinopathy. Each of the two categories is handled by a specialized decoding module. Segmentation outputs are refined with conditional random fields (CRF) as RNN, and the network is trained end-to-end with an effective Kappa-based loss function. Preliminary results on a public dataset show, for the segmentation of red (resp. bright) lesions, a sensitivity of 66.9% (resp. 75.3%) and a specificity of 99.8% (resp. 99.9%).

Clément Playout, Renaud Duval, Farida Cheriet
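
One common way to build a Kappa-based loss is to compute Cohen's kappa from soft predictions and minimize its complement; the sketch below is a plausible reading of the abstract for the binary case, not the authors' exact formulation:

```python
import numpy as np

def soft_kappa_loss(pred, target):
    """1 - Cohen's kappa computed from soft binary predictions.
    pred:   (N,) predicted probabilities
    target: (N,) ground-truth labels in {0, 1}."""
    pred = np.asarray(pred, float)
    target = np.asarray(target, float)
    po = np.mean(pred * target + (1 - pred) * (1 - target))   # observed agreement
    pe = (np.mean(pred) * np.mean(target)
          + (1 - np.mean(pred)) * (1 - np.mean(target)))      # chance agreement
    return 1.0 - (po - pe) / (1.0 - pe + 1e-8)
```

Unlike plain cross-entropy, this objective already discounts the agreement expected by chance, which matters for the highly imbalanced lesion masks described above.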

Uniqueness-Driven Saliency Analysis for Automated Lesion Detection with Applications to Retinal Diseases

Saliency is important in medical image analysis for detection and segmentation tasks. We propose a new method to extract uniqueness-driven saliency based on the uniqueness of intensity and spatial distributions within images. The main novelty of this saliency feature is that it is powerful in detecting different types of lesions in different types of images without the need to tune parameters for different problems. To evaluate its effectiveness, we have applied our method to the detection of lesions in retinal images. Four different types of lesions (exudates, hemorrhages, microaneurysms and leakage) from 7 independent public retinal image datasets of diabetic retinopathy and malarial retinopathy were studied, and the experimental results show that the proposed method is superior to the state-of-the-art methods.

Yitian Zhao, Yalin Zheng, Yifan Zhao, Yonghuai Liu, Zhili Chen, Peng Liu, Jiang Liu

Multiscale Network Followed Network Model for Retinal Vessel Segmentation

The shape of retinal blood vessels plays a critical role in the early diagnosis of diabetic retinopathy. However, it remains challenging to accurately segment the blood vessels, particularly the capillaries, in color retinal images. In this paper, we propose the multiscale network followed network (MS-NFN) model to address this issue. This model consists of an ‘up-pool’ NFN submodel and a ‘pool-up’ NFN submodel, in which max-pooling layers and up-sampling layers generate multiscale feature maps. In each NFN, the first multiscale network converts an image patch into a probabilistic retinal vessel map, and the following multiscale network further refines the map. The refined probabilistic retinal vessel maps produced by both NFNs are averaged to construct the segmentation result. We evaluated this model on the Digital Retinal Images for Vessel Extraction (DRIVE) dataset and the Child Heart and Health Study (CHASE_DB1) dataset. Our results indicate that the NFN structure we designed is able to produce a performance gain and that the proposed MS-NFN model achieves state-of-the-art retinal vessel segmentation accuracy on both datasets.

Yicheng Wu, Yong Xia, Yang Song, Yanning Zhang, Weidong Cai

Optical and Histology Applications: Histology Applications


Predicting Cancer with a Recurrent Visual Attention Model for Histopathology Images

Automatically recognizing cancers from multi-gigapixel whole slide histopathology images is one of the challenges facing machine and deep learning based solutions for digital pathology. Currently, most automatic systems for histopathology are not scalable to large images and hence require a patch-based representation; a sub-optimal solution, as it incurs significant additional computational cost and, more importantly, loses contextual information. We present a novel attention-based model for predicting cancer from histopathology whole slide images. The proposed model is capable of attending to the most discriminative regions of an image by adaptively selecting a limited sequence of locations and only processing the selected areas of tissue. We demonstrate the utility of the proposed model on the slide-based prediction of macro and micro metastases in sentinel lymph nodes of breast cancer patients. We achieve competitive results with state-of-the-art convolutional networks while automatically identifying discriminative areas of tissue.

Aïcha BenTaieb, Ghassan Hamarneh

A Deep Model with Shape-Preserving Loss for Gland Instance Segmentation

Segmenting gland instances in histology images requires not only separating glands from a complex background but also identifying each gland individually via accurate boundary detection. This is a very challenging task due to heavy background noise, tiny gaps between adjacent glands, and the “coalescence” problem arising from adhesive gland instances. State-of-the-art methods adopted multi-channel/multi-task deep models to separately accomplish pixel-wise gland segmentation and boundary detection, yielding high model complexity and difficulties in training. In this paper, we present a unified deep model with a new shape-preserving loss which facilitates the training for both pixel-wise gland segmentation and boundary detection simultaneously. The proposed shape-preserving loss helps significantly reduce the model complexity and makes the training process more controllable. Compared with the current state-of-the-art methods, the proposed deep model with the shape-preserving loss achieves the best overall performance on the 2015 MICCAI Gland Challenge dataset. In addition, the flexibility of integrating the proposed shape-preserving loss into any learning-based medical image segmentation network offers great potential for further performance improvement in other applications.

Zengqiang Yan, Xin Yang, Kwang-Ting Tim Cheng

Model-Based Refinement of Nonlinear Registrations in 3D Histology Reconstruction

Recovering the 3D structure of a stack of histological sections (3D histology reconstruction) requires a linearly aligned reference volume in order to minimize z-shift and “banana effect”. Reconstruction can then be achieved by computing 2D registrations between each section and its corresponding resampled slice in the volume. However, these registrations are often inaccurate due to their inter-modality nature and to the strongly nonlinear deformations introduced by histological processing. Here we introduce a probabilistic model of spatial deformations to efficiently refine these registrations, without the need to revisit the imaging data. Our method takes as input a set of nonlinear registrations between pairs of 2D images (within or across modalities), and uses Bayesian inference to estimate the most likely spanning tree of latent transformations that generated the measured deformations. Results on synthetic and real data show that our algorithm can effectively 3D reconstruct the histology while being robust to z-shift and banana effect. An implementation of the approach, which is compatible with a wide array of existing registration methods, is available at JEI’s website: .

Juan Eugenio Iglesias, Marco Lorenzi, Sebastiano Ferraris, Loïc Peter, Marc Modat, Allison Stevens, Bruce Fischl, Tom Vercauteren

Invasive Cancer Detection Utilizing Compressed Convolutional Neural Network and Transfer Learning

Identification of invasive cancer in Whole Slide Images (WSIs) is crucial for tumor staging as well as treatment planning. However, precise manual delineation of tumor regions is challenging, tedious and time-consuming, so automatic invasive cancer detection in WSIs is of significant importance. Recently, Convolutional Neural Network (CNN) based approaches have advanced invasive cancer detection, but their computational burden becomes a barrier in clinical applications. In this work, we propose to detect invasive cancer employing a lightweight network in a fully convolutional fashion without model ensembles. In order to improve the small network’s detection accuracy, we utilize the “soft labels” of a large-capacity network to supervise its training process. Additionally, we adopt a teacher-guided loss to help the small network better learn from the intermediate layers of the high-capacity network. With this suite of approaches, our network is extremely efficient as well as accurate. The proposed method is validated on two large-scale WSI datasets. Our approach runs in an average time of 0.6 and 3.6 min per WSI with a single GPU on our gastric cancer dataset and CAMELYON16, respectively, about 5 times faster than Google Inception V3, while achieving average FROC scores of $$81.1\%$$ and $$85.6\%$$, on par with Google Inception V3. The proposed method requires fewer high-performance computing resources than state-of-the-art methods, which makes invasive cancer diagnosis more practical in clinical use.

Bin Kong, Shanhui Sun, Xin Wang, Qi Song, Shaoting Zhang
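
Supervising a small network with a large network's “soft labels” is the classic knowledge-distillation recipe; a minimal sketch of such a blended objective is below (the temperature `T`, the `alpha` weighting, and the function names are illustrative assumptions, and the paper additionally uses a teacher-guided loss on intermediate layers not shown here):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with cross-entropy against the
    teacher's temperature-softened predictions ('soft labels')."""
    eps = 1e-12
    p_s = softmax(student_logits)
    hard = -np.mean(np.log(p_s[np.arange(len(hard_labels)), hard_labels] + eps))
    p_t = softmax(teacher_logits, T)
    p_s_T = softmax(student_logits, T)
    soft = -np.mean(np.sum(p_t * np.log(p_s_T + eps), axis=-1))
    return alpha * hard + (1 - alpha) * soft
```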

Which Way Round? A Study on the Performance of Stain-Translation for Segmenting Arbitrarily Dyed Histological Images

Image-to-image translation based on convolutional neural networks has recently gained popularity. In particular, approaches relying on generative adversarial networks that facilitate unpaired training open new opportunities for image analysis. Making use of an unpaired image-to-image translation approach, we propose a methodology to perform stain-independent segmentation of histological whole slide images that requires annotated training data for one single stain only. In this experimental study, we propose and investigate two different pipelines for performing stain-independent segmentation, which are evaluated with three different stain combinations showing different degrees of difficulty. Whereas one pipeline directly translates the images to be evaluated and uses a segmentation model trained on original data, the other “way round” translates the training data in order to finally segment the original images. The results exhibit good performance, especially for the first approach, and provide evidence that the direction of translation plays a crucial role in the final segmentation accuracy.

Michael Gadermayr, Vitus Appel, Barbara M. Klinkhammer, Peter Boor, Dorit Merhof

Graph CNN for Survival Analysis on Whole Slide Pathological Images

Deep neural networks have been used in survival prediction by providing high-quality features. However, few works have noticed the significant role of topological features in whole slide pathological images (WSIs). Learning topological features on WSIs requires dense computation, and the optimal topological representation of WSIs is still ambiguous. Moreover, how to fully utilize the topological features of WSIs in survival prediction is an open question. Therefore, we propose to model each WSI as a graph, and then develop a graph convolutional neural network (graph CNN) with attention learning that better serves survival prediction by rendering optimal graph representations of WSIs. Extensive experiments on real lung and brain carcinoma WSIs have demonstrated its effectiveness.

Ruoyu Li, Jiawen Yao, Xinliang Zhu, Yeqing Li, Junzhou Huang

Fully Automated Blind Color Deconvolution of Histopathological Images

Most whole-slide histological images are stained with hematoxylin and eosin dyes. Slide stain separation or color deconvolution is a crucial step within the digital pathology workflow. In this paper, the blind color deconvolution problem is formulated within the Bayesian framework. Our model takes into account both spatial relations among image pixels and similarity to a given reference color-vector matrix. Using Variational Bayes inference, an efficient new blind color deconvolution method is proposed which provides a fully automated procedure to estimate all the unknowns in the problem. A comparison with classical and current state-of-the-art color deconvolution algorithms, using real images with known ground truth hematoxylin and eosin values, has been carried out demonstrating the superiority of the proposed approach.

Natalia Hidalgo-Gavira, Javier Mateos, Miguel Vega, Rafael Molina, Aggelos K. Katsaggelos
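
For context, the classical non-blind color deconvolution that this paper generalizes (Ruifrok–Johnston style, via Beer-Lambert optical density and a fixed reference stain matrix) can be sketched as follows; the paper's contribution is estimating the stain matrix blindly within a Bayesian model, which this baseline does not do:

```python
import numpy as np

# Commonly used H&E reference stain vectors (R, G, B); the paper's method
# estimates the color-vector matrix blindly, here it is fixed for illustration.
HE_REF = np.array([[0.650, 0.704, 0.286],   # hematoxylin
                   [0.072, 0.990, 0.105]])  # eosin

def color_deconvolve(rgb, stains=HE_REF):
    """Separate stain concentrations via Beer-Lambert optical density.
    rgb: (H, W, 3) image with values in (0, 1]. Returns (H, W, n_stains)."""
    od = -np.log(np.clip(rgb, 1e-6, 1.0))               # optical density
    M = stains / np.linalg.norm(stains, axis=1, keepdims=True)
    conc, *_ = np.linalg.lstsq(M.T, od.reshape(-1, 3).T, rcond=None)
    return conc.T.reshape(rgb.shape[:2] + (stains.shape[0],))
```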

Improving Whole Slide Segmentation Through Visual Context - A Systematic Study

While challenging, the dense segmentation of histology images is a necessary first step to assess changes in tissue architecture and cellular morphology. Although specific convolutional neural network architectures have been applied with great success to the problem, few effectively incorporate visual context information from multiple scales. With this paper, we present a systematic comparison of different architectures to assess how including multi-scale information affects segmentation performance. A publicly available breast cancer dataset and a locally collected prostate cancer dataset are utilised for this study. The results support our hypothesis that visual context and scale play a crucial role in histology image classification problems.

Korsuk Sirinukunwattana, Nasullah Khalid Alham, Clare Verrill, Jens Rittscher

Adversarial Domain Adaptation for Classification of Prostate Histopathology Whole-Slide Images

Automatic and accurate Gleason grading of histopathology tissue slides is crucial for prostate cancer diagnosis, treatment, and prognosis. Histopathology tissue slides from different institutions usually show heterogeneous appearances because of different tissue preparation and staining procedures, so a predictive model learned on one domain may not be directly applicable to a new domain. Here we propose to adopt unsupervised domain adaptation to transfer the discriminative knowledge obtained from the source domain to the target domain without requiring labeling of images at the target domain. The adaptation is achieved through adversarial training to find an invariant feature space, together with the proposed Siamese architecture on the target domain, which adds a regularization that is appropriate for whole-slide images. We validate the method on two prostate cancer datasets and obtain significant classification improvement of Gleason scores as compared with the baseline models.

Jian Ren, Ilker Hacihaliloglu, Eric A. Singer, David J. Foran, Xin Qi

Rotation Equivariant CNNs for Digital Pathology

We propose a new model for digital pathology segmentation, based on the observation that histopathology images are inherently symmetric under rotation and reflection. Utilizing recent findings on rotation equivariant CNNs, the proposed model leverages these symmetries in a principled manner. We present a visual analysis showing improved stability on predictions, and demonstrate that exploiting rotation equivariance significantly improves tumor detection performance on a challenging lymph node metastases dataset. We further present a novel derived dataset to enable principled comparison of machine learning models, in combination with an initial benchmark. Through this dataset, the task of histopathology diagnosis becomes accessible as a challenging benchmark for fundamental machine learning research.

Bastiaan S. Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, Max Welling

A Probabilistic Model Combining Deep Learning and Multi-atlas Segmentation for Semi-automated Labelling of Histology

Thanks to their high resolution and contrast enhanced by different stains, histological images are becoming increasingly widespread in atlas construction. Building atlases with histology requires manual delineation of a set of regions of interest on a large number of sections. This process is tedious, time-consuming, and rather inefficient due to the high similarity of adjacent sections. Here we propose a probabilistic model for semi-automated segmentation of stacks of histological sections, in which the user manually labels a sparse set of sections (e.g., one every n) and lets the algorithm complete the segmentation of the remaining sections automatically. The proposed model integrates in a principled manner two families of segmentation techniques that have been very successful in brain imaging: multi-atlas segmentation (MAS) and convolutional neural networks (CNNs). Within this model, we derive a Generalised Expectation Maximisation algorithm to compute the most likely segmentation. Experiments on the Allen dataset show that the model successfully combines the strengths of both techniques (effective label propagation of MAS, and robustness to misregistration of CNNs), and produces significantly more accurate results than using either of them independently.

Alessia Atzeni, Marnix Jansen, Sébastien Ourselin, Juan Eugenio Iglesias

BESNet: Boundary-Enhanced Segmentation of Cells in Histopathological Images

We propose a novel deep learning method called Boundary-Enhanced Segmentation Network (BESNet) for the detection and semantic segmentation of cells in histopathological images. The semantic segmentation of small regions using fully convolutional networks typically suffers from inaccuracies around the boundaries of small structures, such as cells, because the probabilities often become blurred. In this work, we propose a new network structure that encodes input images to feature maps, similar to U-net, but utilizes two decoding paths that restore the original image resolution. One decoding path enhances the boundaries of cells, which can be used to improve the quality of the entire cell segmentation achieved in the other decoding path. We explore two strategies for enhancing the boundaries of cells: (1) skip connections of feature maps, and (2) adaptive weighting of loss functions. In (1), the feature maps from the boundary decoding path are concatenated with those of the decoding path for entire cell segmentation. In (2), the loss for entire cell segmentation is adaptively weighted where boundaries are not strongly enhanced, because detecting such parts is difficult. The detection rate of ganglion cells was 80.0% with 1.0 false positives per histopathology slice. The mean Dice index, representing segmentation accuracy, was 74.0%. BESNet achieved similar detection performance and higher segmentation accuracy than comparable U-net architectures without our modifications.

Hirohisa Oda, Holger R. Roth, Kosuke Chiba, Jure Sokolić, Takayuki Kitasaka, Masahiro Oda, Akinari Hinoki, Hiroo Uchida, Julia A. Schnabel, Kensaku Mori
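As an aside for readers unfamiliar with the metric reported above, the Dice index used to quantify segmentation accuracy can be computed from two binary masks as follows (a generic sketch, not the authors' code):

```python
import numpy as np

def dice_index(pred, gt):
    """Dice similarity coefficient between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())
```

A value of 1.0 indicates perfect overlap; the 74.0% reported above corresponds to a Dice index of 0.74.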

Panoptic Segmentation with an End-to-End Cell R-CNN for Pathology Image Analysis

The morphological clues of various cancer cells are essential for pathologists to determine the stages of cancers. In order to obtain this quantitative morphological information, we present an end-to-end network for panoptic segmentation of pathology images. Recently, many methods have been proposed that focus on semantic-level or instance-level cell segmentation. Unlike existing cell segmentation methods, the proposed network unifies detecting and localizing objects and assigning pixel-level class information to regions with large overlaps, such as the background. This unification is achieved by jointly optimizing a novel semantic loss, the bounding-box loss of the Region Proposal Network (RPN), the RPN classifier loss, a background-foreground classifier loss in the segmentation head (instead of a class-specific loss), the bounding-box loss of the proposed cell object, and the mask loss of the cell object. The results demonstrate that the proposed method not only outperforms state-of-the-art approaches on the 2017 MICCAI Digital Pathology Challenge dataset, but also provides an effective, end-to-end solution for the panoptic segmentation challenge.

Donghao Zhang, Yang Song, Dongnan Liu, Haozhe Jia, Siqi Liu, Yong Xia, Heng Huang, Weidong Cai

Integration of Spatial Distribution in Imaging-Genetics

To better understand diseases such as cancer, it is crucial for computational inference to quantify the spatial distribution of various cell types within a tumor. To this end, we used Ripley’s K-statistic, which captures the spatial distribution patterns at different scales of both individual point sets and interactions between multiple point sets. We propose to improve the expressivity of histopathology image features by incorporating this descriptor to capture potential cellular interactions, especially interactions between lymphocytes and epithelial cells. We demonstrate the utility of the Ripley’s K-statistic by analyzing digital slides from 710 TCGA breast invasive carcinoma (BRCA) patients. In particular, we consider its use in the context of imaging-genetics to understand correlations between gene expression and image features using canonical correlation analysis (CCA). Our analysis shows that including these spatial features leads to more significant associations between image features and gene expression.

Vaishnavi Subramanian, Weizhao Tang, Benjamin Chidester, Jian Ma, Minh N. Do
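To illustrate the descriptor at the heart of the abstract above, a naive (edge-uncorrected) Ripley's K for a single 2D point set can be sketched as follows; the authors additionally use cross-type K functions between point sets (e.g. lymphocytes vs. epithelial cells), which this minimal version omits:

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K for a 2D point set, without edge correction:
    K(r) = #{ordered pairs (i, j), i != j, with d(i, j) < r} / (n * lambda),
    where lambda = n / area is the point intensity."""
    points = np.asarray(points, float)
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # exclude self-pairs
    lam = n / area                # intensity (points per unit area)
    return np.array([(d < r).sum() / (n * lam) for r in radii])
```

For a completely random (Poisson) pattern, K(r) is approximately pi * r^2; clustering at scale r pushes K(r) above that baseline, dispersion below it.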

Multiple Instance Learning for Heterogeneous Images: Training a CNN for Histopathology

Multiple instance (MI) learning with a convolutional neural network enables end-to-end training in the presence of weak image-level labels. We propose a new method for aggregating predictions from smaller regions of the image into an image-level classification by using the quantile function. The quantile function provides a more complete description of the heterogeneity within each image, improving image-level classification. We also adapt image augmentation to the MI framework by randomly selecting cropped regions on which to apply MI aggregation during each epoch of training. This provides a mechanism to study the importance of MI learning. We validate our method on five different classification tasks for breast tumor histology and provide a visualization method for interpreting local image classifications that could lead to future insights into tumor heterogeneity.

Heather D. Couture, J. S. Marron, Charles M. Perou, Melissa A. Troester, Marc Niethammer
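The quantile-function aggregation described above can be sketched in a few lines; this is a simplified illustration of the idea (summarizing per-patch predictions by sampled quantiles), not the authors' trained pipeline:

```python
import numpy as np

def quantile_aggregate(patch_probs, qs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Image-level feature vector: the quantile function of per-patch
    predictions sampled at a few levels, capturing within-image
    heterogeneity rather than just the mean or max."""
    return np.quantile(np.asarray(patch_probs, dtype=float), qs)
```

An image-level classifier can then be trained on this fixed-length vector, so that weak image-level labels supervise the patch-level model end to end.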

Optical and Histology Applications: Microscopy Applications


Cell Detection with Star-Convex Polygons

Automatic detection and segmentation of cells and nuclei in microscopy images is important for many biological applications. Recent successful learning-based approaches include per-pixel cell segmentation with subsequent pixel grouping, or localization of bounding boxes with subsequent shape refinement. In situations of crowded cells, these can be prone to segmentation errors, such as falsely merging bordering cells or suppressing valid cell instances due to the poor approximation with bounding boxes. To overcome these issues, we propose to localize cell nuclei via star-convex polygons, which are a much better shape representation as compared to bounding boxes and thus do not need shape refinement. To that end, we train a convolutional neural network that predicts for every pixel a polygon for the cell instance at that position. We demonstrate the merits of our approach on two synthetic datasets and one challenging dataset of diverse fluorescence microscopy images.

Uwe Schmidt, Martin Weigert, Coleman Broaddus, Gene Myers
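The star-convex polygon representation above amounts to predicting, for each pixel inside a cell, the distance to the object boundary along a fixed set of radial directions. A minimal ray-marching sketch of that target computation (an illustration, not the authors' implementation) is:

```python
import numpy as np

def star_distances(mask, center, n_rays=32):
    """Distances from `center` to the boundary of a binary object mask
    along n_rays equally spaced directions -- the star-convex polygon
    parameterization of the object at that pixel."""
    h, w = mask.shape
    dists = np.zeros(n_rays)
    for k, ang in enumerate(np.linspace(0, 2 * np.pi, n_rays, endpoint=False)):
        dy, dx = np.sin(ang), np.cos(ang)
        r = 0.0
        while True:  # march outward until leaving the object or the image
            y = int(round(center[0] + r * dy))
            x = int(round(center[1] + r * dx))
            if not (0 <= y < h and 0 <= x < w) or not mask[y, x]:
                break
            r += 1.0
        dists[k] = r
    return dists
```

The network in the paper regresses these per-pixel distance vectors directly, so no bounding-box refinement step is needed.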

Deep Convolutional Gaussian Mixture Model for Stain-Color Normalization of Histopathological Images

Automated microscopic analysis of stained histopathological images is degraded by the amount of color and intensity variation in the data. This paper presents a novel unsupervised probabilistic approach that integrates a convolutional neural network (CNN) and a Gaussian mixture model (GMM) in a unified framework, which jointly optimizes the modeling and the normalization of the color and intensity of hematoxylin- and eosin-stained (H&E) histological images. In contrast to conventional GMM-based methods, which are applied only to the color distribution of the data for stain color normalization, our proposal learns to cluster the tissue structures according to their shape and appearance and simultaneously fits a multivariate GMM to the data. This approach is more robust than a standard GMM in the presence of strong staining variations because fitting the GMM is conditioned on the appearance of tissue structures in the density channel of an image. Through end-to-end gradient descent optimization, the network learns to maximize the log-likelihood of the data given the estimated parameters of the multivariate Gaussian distributions. Our method needs no ground truth, no shape or color assumptions about image contents, and no manual tuning of parameters and thresholds, which makes it applicable to a wide range of histopathological images. Experiments show that our proposed method outperforms state-of-the-art algorithms in terms of achieving higher color constancy.

Farhad Ghazvinian Zanjani, Svitlana Zinger, Peter H. N. de With

Learning to Segment 3D Linear Structures Using Only 2D Annotations

We propose a loss function for training a Deep Neural Network (DNN) to segment volumetric data that accommodates ground-truth annotations of 2D projections of the training volumes, instead of annotations of the 3D volumes themselves. As a consequence, we significantly decrease the amount of annotation needed for a given training set. We apply the proposed loss to train DNNs for segmentation of vascular and neural networks in microscopy images and demonstrate only a marginal accuracy loss associated with the significant reduction in annotation effort. The lower labor cost of deploying DNNs brought by our method can contribute to wide adoption of these techniques for the analysis of 3D images of linear structures.

Mateusz Koziński, Agata Mosinska, Mathieu Salzmann, Pascal Fua
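The core idea of supervising a 3D prediction with 2D annotations can be sketched by comparing maximum-intensity projections of the predicted volume against the annotated projections. This is a simplified illustration of the concept; the paper's actual loss function is more elaborate:

```python
import numpy as np

def projection_loss(pred_vol, ann_xy, ann_xz, ann_yz):
    """Mean squared error between the three max-intensity projections of a
    predicted 3D probability volume (shape D x H x W) and 2D annotations
    of those projections."""
    loss = ((pred_vol.max(axis=0) - ann_xy) ** 2).mean()   # H x W view
    loss += ((pred_vol.max(axis=1) - ann_xz) ** 2).mean()  # D x W view
    loss += ((pred_vol.max(axis=2) - ann_yz) ** 2).mean()  # D x H view
    return loss
```

Because the projection operation is differentiable, gradients flow back into the 3D prediction even though only 2D labels exist.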

A Multiresolution Convolutional Neural Network with Partial Label Training for Annotating Reflectance Confocal Microscopy Images of Skin

We describe a new multiresolution “nested encoder-decoder” convolutional network architecture and use it to annotate morphological patterns in reflectance confocal microscopy (RCM) images of human skin for aiding cancer diagnosis. Skin cancers are the most common types of cancers, melanoma being the most deadly among them. RCM is an effective, non-invasive pre-screening tool for skin cancer diagnosis, with the required cellular resolution. However, the images are complex, low-contrast, and highly variable, so it requires months to years of expert-level training for clinicians to be able to make accurate assessments. In this paper we address classifying four key clinically important structural/textural patterns in RCM images. The occurrence and morphology of these patterns are used by clinicians for the diagnosis of melanomas. The large size of RCM images, the large variance of pattern size, the large scale range over which patterns appear, the class imbalance in collected images, and the lack of fully labelled images all make this a challenging problem to address, even with automated machine learning tools. We designed a novel nested U-net architecture to cope with these challenges, and a selective loss function to handle partial labeling. Trained and tested on 56 melanoma-suspicious, partially labelled, 12k × 12k-pixel images, our network automatically annotated RCM images for these diagnostic patterns with high sensitivity and specificity, providing consistent labels for unlabelled sections of the test images. We believe that providing such annotation in a fast manner will aid clinicians in achieving diagnostic accuracy and, perhaps more importantly, dramatically facilitate clinical training, thus enabling much more rapid adoption of RCM into widespread clinical use. In addition, our adaptation of the U-net architecture provides an intrinsically multiresolution deep network that may be useful in other challenging biomedical image analysis applications.

Alican Bozkurt, Kivanc Kose, Christi Alessi-Fox, Melissa Gill, Jennifer Dy, Dana Brooks, Milind Rajadhyaksha

Weakly-Supervised Learning-Based Feature Localization for Confocal Laser Endomicroscopy Glioma Images

Confocal Laser Endomicroscopy (CLE) is a novel handheld fluorescence imaging technology that has shown promise for rapid intraoperative diagnosis of brain tumor tissue. Currently, CLE is capable of image display only and lacks an automatic system to aid the surgeon in diagnostically analyzing the images. The goal of this project was to develop a computer-aided diagnostic approach for CLE imaging of human glioma with a feature localization function. Despite the tremendous progress in object detection and image segmentation methods in recent years, most such methods require large annotated datasets for training. However, manual annotation of thousands of histopathology images by physicians is costly and time-consuming. To overcome this problem, we constructed a Weakly-Supervised Learning (WSL)-based model for feature localization that trains on image-level annotations and then localizes incidences of a class of interest in the test image. We developed a novel convolutional neural network for diagnostic feature localization from CLE images by employing a novel multiscale activation map that is laterally inhibited and collaterally integrated. To validate our method, we compared the model output to the manual annotation performed by four neurosurgeons on test images. The model achieved 88% mean accuracy and 86% mean intersection over union on intermediate features, and 87% mean accuracy and 88% mean intersection over union on restrictive fine features, while outperforming the other state-of-the-art methods tested. This system can improve accuracy and efficiency in the characterization of CLE images of glioma tissue during surgery, and may augment intraoperative decision-making regarding the tumor margin and improve brain tumor resection.

Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Claudio Cavallo, Xiaochun Zhao, Sirin Gandhi, Leandro Borba Moreira, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

Synaptic Partner Prediction from Point Annotations in Insect Brains

High-throughput electron microscopy allows recording of large stacks of neural tissue with sufficient resolution to extract the wiring diagram of the underlying neural network. Current efforts to automate this process focus mainly on the segmentation of neurons. However, in order to recover a wiring diagram, synaptic partners need to be identified as well. This is especially challenging in insect brains like Drosophila melanogaster, where one presynaptic site is associated with multiple postsynaptic elements. Here we propose a 3D U-Net architecture to directly identify pairs of voxels that are pre- and postsynaptic to each other. To that end, we formulate the problem of synaptic partner identification as a classification problem on long-range edges between voxels to encode both the presence of a synaptic pair and its direction. This formulation allows us to directly learn from synaptic point annotations instead of more expensive voxel-based synaptic cleft or vesicle annotations. We evaluate our method on the MICCAI 2016 CREMI challenge and improve over the current state of the art, producing 3% fewer errors than the next best method (Code at: ).

Julia Buhmann, Renate Krause, Rodrigo Ceballos Lentini, Nils Eckstein, Matthew Cook, Srinivas Turaga, Jan Funke

Synaptic Cleft Segmentation in Non-isotropic Volume Electron Microscopy of the Complete Drosophila Brain

Neural circuit reconstruction at single-synapse resolution is increasingly recognized as crucially important for deciphering the function of biological nervous systems. Volume electron microscopy in serial transmission or scanning mode has been demonstrated to provide the necessary resolution to segment or trace all neurites and to annotate all synaptic connections. Automatic annotation of synaptic connections has been done successfully in near-isotropic electron microscopy of vertebrate model organisms. Results on non-isotropic data in insect models, however, are not yet on par with human annotation. We designed a new 3D U-Net architecture to optimally represent isotropic fields of view in non-isotropic data. We used regression on a signed distance transform of manually annotated synaptic clefts of the CREMI challenge dataset to train this model and observed significant improvement over the state of the art. We developed open-source software for optimized parallel prediction on very large volumetric datasets and applied our model to predict synaptic clefts in a 50-teravoxel dataset of the complete Drosophila brain. Our model generalizes well to areas far away from where training data was available.

Larissa Heinrich, Jan Funke, Constantin Pape, Juan Nunez-Iglesias, Stephan Saalfeld
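The signed distance transform regression target mentioned above can be sketched as follows, assuming SciPy is available; the clip value here is a hypothetical choice for illustration, not the paper's setting:

```python
import numpy as np
from scipy import ndimage

def signed_distance_target(mask, clip=10.0):
    """Signed Euclidean distance transform of a binary cleft mask:
    positive inside the annotated cleft, negative outside, clipped so the
    regression target stays bounded far from any cleft."""
    mask = np.asarray(mask, bool)
    inside = ndimage.distance_transform_edt(mask)    # distance to background
    outside = ndimage.distance_transform_edt(~mask)  # distance to foreground
    return np.clip(inside - outside, -clip, clip)
```

Regressing a smooth signed distance rather than a hard binary label gives the network a dense, informative target even near thin cleft boundaries.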

Weakly Supervised Representation Learning for Endomicroscopy Image Analysis

This paper proposes a weakly-supervised representation learning framework for probe-based confocal laser endomicroscopy (pCLE). Unlike previous frame-based and mosaic-based methods, the proposed framework adopts deep convolutional neural networks and integrates frame-based feature learning, global diagnosis prediction, and local tumor detection into a unified end-to-end model. The latent objects in pCLE mosaics are inferred via semantic label propagation, and the deep convolutional neural networks are trained with a composite loss function. Experiments on 700 pCLE samples demonstrate that the proposed method, trained with only global supervision, achieves higher accuracy on global and local diagnosis prediction.

Yun Gu, Khushi Vyas, Jie Yang, Guang-Zhong Yang

DeepHCS: Bright-Field to Fluorescence Microscopy Image Conversion Using Deep Learning for Label-Free High-Content Screening

In this paper, we propose a novel image processing method, DeepHCS, to transform bright-field microscopy images into synthetic fluorescence images of cell nuclei biomarkers commonly used in high-content drug screening. The main motivation of the proposed work is to automatically generate virtual biomarker images from conventional bright-field images, which can greatly reduce time-consuming and laborious tissue preparation efforts and improve the throughput of the screening process. DeepHCS uses bright-field images and their corresponding cell nuclei staining (DAPI) fluorescence images as a set of image pairs to train a series of end-to-end deep convolutional neural networks. By leveraging a state-of-the-art deep learning method, the proposed method can produce synthetic fluorescence images comparable to real DAPI images with high accuracy. We demonstrate the efficacy of this method using a real glioblastoma drug screening dataset with various quality metrics, including PSNR, SSIM, cell viability correlation (CVC), the area under the curve (AUC), and the IC50.

Gyuhyun Lee, Jeong-Woo Oh, Mi-Sun Kang, Nam-Gu Her, Myoung-Hee Kim, Won-Ki Jeong
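Of the quality metrics named above, PSNR is the most directly computable; a generic sketch (not the authors' evaluation code) for images scaled to a known data range is:

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a synthesized image and a
    reference, given the maximum possible pixel value (data_range)."""
    pred = np.asarray(pred, float)
    target = np.asarray(target, float)
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher values indicate closer agreement between the synthetic fluorescence image and the real DAPI image; SSIM, by contrast, also accounts for local structure.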

Optical and Histology Applications: Optical Coherence Tomography and Other Optical Imaging Applications


A Cascaded Refinement GAN for Phase Contrast Microscopy Image Super Resolution

Phase contrast microscopy is a widely-used non-invasive technique for monitoring live cells over time. High-throughput biological experiments require both a wide view (i.e., a low microscope magnification) to monitor the entire cell population and a high magnification to resolve the details of individual cells, which are hard to achieve simultaneously. In this paper, we propose a cascaded refinement Generative Adversarial Network (GAN) for phase contrast microscopy image super-resolution. Our algorithm uses optics-related data enhancement and super-resolves a phase contrast microscopy image in a coarse-to-fine fashion, with a new loss function consisting of a content loss and an adversarial loss. The proposed algorithm is both qualitatively and quantitatively evaluated on a dataset of 500 phase contrast microscopy images, showing its superior performance for super-resolving phase contrast microscopy images. The proposed algorithm provides a computational solution for achieving high magnification of individual cells' details and a wide view of cell populations at the same time, which will benefit the microscopy community.

Liang Han, Zhaozheng Yin

Multi-context Deep Network for Angle-Closure Glaucoma Screening in Anterior Segment OCT

A major cause of irreversible visual impairment is angle-closure glaucoma, which can be screened through imagery from Anterior Segment Optical Coherence Tomography (AS-OCT). Previous computational diagnostic techniques address this screening problem by extracting specific clinical measurements or handcrafted visual features from the images for classification. In this paper, we instead propose to learn from training data a discriminative representation that may capture subtle visual cues not modeled by predefined features. Based on clinical priors, we formulate this learning with the proposed Multi-Context Deep Network (MCDN) architecture, in which parallel Convolutional Neural Networks are applied to particular image regions, and at corresponding scales, known to be informative for clinically diagnosing angle-closure glaucoma. The output feature maps of the parallel streams are merged into a classification layer to produce the deep screening result. Moreover, we incorporate estimated clinical parameters to further enhance performance. On a clinical AS-OCT dataset, our system is validated through comparisons to previous screening methods.

Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, Baskaran Mani, Meenakshi Mahesh, Tin Aung, Jiang Liu

Analysis of Morphological Changes of Lamina Cribrosa Under Acute Intraocular Pressure Change

Glaucoma is the second leading cause of blindness worldwide. Despite active research efforts driven by the importance of diagnosis and treatment of this degenerative optic neuropathy, the relationship between structural and functional changes along the glaucomatous evolution is still not clearly understood. Dynamic changes of the lamina cribrosa (LC) in the presence of intraocular pressure (IOP) have been suggested to play a significant role in optic nerve damage, which motivates the proposed research to explore the relationship of changes in the 3D structure of the LC collagen meshwork to clinical diagnosis. We introduce a framework to quantify 3D dynamic morphological changes of the LC under acute IOP changes in a series of swept-source optical coherence tomography (SS-OCT) scans taken under different pressure states. Analysis of SS-OCT images faces challenges due to low signal-to-noise ratio, anisotropic resolution, and observation variability caused by subject and ocular motions. We adapt unbiased diffeomorphic atlas building, which serves multiple purposes critical for this analysis. Analysis of the deformation fields yields the desired global and local information on pressure-induced geometric changes. Deformation variability, estimated with repeated images of a healthy volunteer without IOP elevation, is found to be an order of magnitude smaller than pressure-induced changes and thus illustrates the feasibility of the proposed framework. Results in a clinical study with healthy, glaucoma suspect, and glaucoma subjects demonstrate the potential of the proposed method for non-invasive in vivo analysis of LC dynamics, potentially leading to early prediction and diagnosis of glaucoma.

Mathilde Ravier, Sungmin Hong, Charly Girot, Hiroshi Ishikawa, Jenna Tauber, Gadi Wollstein, Joel Schuman, James Fishbaugh, Guido Gerig

Beyond Retinal Layers: A Large Blob Detection for Subretinal Fluid Segmentation in SD-OCT Images

Purpose: To automatically segment neurosensory retinal detachment (NRD)-associated subretinal fluid in spectral domain optical coherence tomography (SD-OCT) images by constructing a Hessian-based aggregate generalized Laplacian of Gaussian (gLoG) algorithm, without the use of retinal layer segmentation. Methods: The B-scan is first filtered into small blob candidate regions based on local convexity by aggregating the log-scale-normalized convolution responses of each individual gLoG filter. Two Hessian-based regional features are extracted from the aggregate response map. Pooled with regional intensity, the feature vectors are fed into an unsupervised clustering algorithm. By voting the blob candidates into superpixels, the initial subretinal fluid regions are obtained. Finally, an active contour with a narrowband implementation is utilized to obtain integrated segmentations. Results: A testing data set with 23 longitudinal SD-OCT cube scans from 12 eyes of 12 patients is used to evaluate the proposed algorithm. Compared with two independent experts' manual segmentations, our algorithm obtained a mean true positive volume fraction of 95.15%, a positive predictive value of 93.65%, and a Dice similarity coefficient of 94.35%, respectively. Conclusions: Without retinal layer segmentation, the proposed algorithm produces higher segmentation accuracy compared with state-of-the-art methods that rely on retinal layer segmentation results. Our model may provide reliable subretinal fluid segmentations for NRD from SD-OCT images.

Zexuan Ji, Qiang Chen, Menglin Wu, Sijie Niu, Wen Fan, Songtao Yuan, Quansen Sun
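The aggregation of scale-normalized Laplacian-of-Gaussian responses described above can be sketched with SciPy as a per-pixel maximum over scales; this is a simplified illustration of the blob-evidence map, with hypothetical sigma values, not the authors' full gLoG filter bank (which also varies orientation and elongation):

```python
import numpy as np
from scipy import ndimage

def aggregate_log(image, sigmas=(2.0, 4.0, 8.0)):
    """Bright-blob evidence map: the maximum over scales of the
    scale-normalized negative Laplacian-of-Gaussian response
    -sigma^2 * (LoG * image), which peaks at bright-blob centers."""
    img = np.asarray(image, float)
    responses = [-(s ** 2) * ndimage.gaussian_laplace(img, s) for s in sigmas]
    return np.max(responses, axis=0)
```

The sigma^2 normalization makes responses comparable across scales, so blobs of different sizes compete fairly in the aggregate map.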

Automated Choroidal Neovascularization Detection for Time Series SD-OCT Images

Choroidal neovascularization (CNV), caused by new blood vessels in the choroid growing through Bruch's membrane, is an important manifestation of terminal age-related macular degeneration (AMD). Automated CNV detection in three-dimensional (3D) spectral-domain optical coherence tomography (SD-OCT) images is still a huge challenge. This paper presents an automated CNV detection method based on an object tracking strategy for time series SD-OCT volumetric images. In our proposed scheme, experts only need to manually calibrate the CNV lesion area at the first time point for each patient, and the CNV at the following time points is then detected automatically. In order to fully represent the spatial consistency of CNV, a 3D histogram of oriented gradients (3D-HOG) feature is constructed for the generation of a random forest model. Finally, the similarity between training and testing samples is measured for model updating. Experiments on 258 SD-OCT cubes from 12 eyes of 12 patients with CNV demonstrate that our results have a high correlation with the manual segmentations. The average correlation coefficient and overlap ratio for the CNV projection area are 0.907 and 83.96%, respectively.

Yuchun Li, Sijie Niu, Zexuan Ji, Wen Fan, Songtao Yuan, Qiang Chen

CapsDeMM: Capsule Network for Detection of Munro’s Microabscess in Skin Biopsy Images

This paper presents an approach for automatic detection of Munro's microabscess in the stratum corneum (SC) of human skin biopsies in order to realize machine-assisted diagnosis of psoriasis. The challenge of detecting neutrophils in the presence of nucleated cells is solved using recent advances in deep learning. Separation of the SC layer, extraction of patches from the layer, and classification of the patches with respect to the presence or absence of neutrophils form the basis of the overall approach, which is achieved through the integration of a U-Net-based segmentation network and a capsule network for classification. The novel design of the present capsule net leads to a drastic reduction in the number of parameters without any noticeable compromise in overall performance. The research further addresses the challenge of dealing with megapixel images (in 10X) vis-à-vis gigapixel ones (in 40X). The promising results of an experiment on a dataset of 273 real-life images show that a practical system based on the present research is possible. The implementation of our system is available at .

Anabik Pal, Akshay Chaturvedi, Utpal Garain, Aditi Chandra, Raghunath Chatterjee, Swapan Senapati

Webly Supervised Learning for Skin Lesion Classification

Within medical imaging, manual curation of sufficiently many well-labeled samples is cost-, time- and scale-prohibitive. To improve the representativeness of the training dataset, we present, for the first time, an approach to utilize large amounts of freely available web data through web-crawling. To handle the noisy and weak nature of web annotations, we propose a two-step transfer-learning-based training process with a robust loss function, termed Webly Supervised Learning (WSL), to train deep models for the task. We also leverage search-by-image to improve the specificity of our web-crawling and reduce cross-domain noise. Within WSL, we explicitly model the noise structure between classes and incorporate it to selectively distill knowledge from the web data during model training. To demonstrate the improved performance due to WSL, we benchmarked on a publicly available 10-class fine-grained skin lesion classification dataset and report a significant improvement in top-1 classification accuracy from 71.25% to 80.53% due to the incorporation of web supervision.

Fernando Navarro, Sailesh Conjeti, Federico Tombari, Nassir Navab

Feature Driven Local Cell Graph (FeDeG): Predicting Overall Survival in Early Stage Lung Cancer

The local spatial arrangement of nuclei in histopathology images has been shown to have prognostic value in the context of different cancers. In order to capture this nuclear architectural information locally, local cell cluster graph based measurements have been proposed. However, conventional approaches to cell graph construction utilize only nuclear spatial proximity and do not differentiate between cell types while constructing the graph. In this paper, we present feature-driven local cell graphs (FeDeG), a new approach to constructing local cell graphs by simultaneously considering spatial proximity and attributes of the individual nuclei (e.g. shape, size, texture). In addition, we designed a new set of quantitative graph-derived metrics to be extracted from FeDeGs, in turn capturing the interplay between different local cell clusters. We evaluated the efficacy of FeDeG features in a digitized H&E-stained tissue micro-array (TMA) image cohort consisting of 434 early-stage non-small cell lung cancer cases for predicting short-term (<5 years) vs. long-term (>5 years) survival. Across 100 runs of 10-fold cross-validation, a linear discriminant classifier in conjunction with the 15 most predictive FeDeG features identified via the Wilcoxon Rank Sum Test (WRST) yielded an average AUC of 0.68. By comparison, four state-of-the-art pathomic classifiers and a deep-learning-based classifier had corresponding AUCs of 0.56, 0.54, 0.61, 0.62, and 0.55, respectively.

Cheng Lu, Xiangxue Wang, Prateek Prasanna, German Corredor, Geoffrey Sedor, Kaustav Bera, Vamsidhar Velcheti, Anant Madabhushi
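The key departure from conventional cell graphs described above, connecting nuclei only when they are both spatially close and similar in their attributes, can be sketched as follows. The threshold parameters `spatial_r` and `feat_r` are hypothetical illustrations; the paper's actual FeDeG construction is more involved:

```python
import numpy as np

def fedeg_edges(coords, feats, spatial_r=30.0, feat_r=0.5):
    """Connect two nuclei with an edge only if they are within spatial_r
    of each other AND their (normalized) attribute vectors differ by less
    than feat_r -- so clusters form among similar, nearby cells."""
    n = len(coords)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(coords[i] - coords[j]) < spatial_r
            similar = np.linalg.norm(feats[i] - feats[j]) < feat_r
            if close and similar:
                edges.append((i, j))
    return edges
```

Graph metrics (cluster counts, sizes, inter-cluster distances) are then computed on the connected components of this edge set.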

Cardiac, Chest and Abdominal Applications: Cardiac Imaging Applications


Towards Accurate and Complete Registration of Coronary Arteries in CTA Images

Coronary computed tomography angiography (CCTA) is now widely used. By combining multiple intra-subject CCTA images from different dates or different phases, cardiologists can monitor disease progression, and researchers can explore the patterns of coronary artery motion and change within a cardiac cycle. For direct comparison and high efficiency, alignment of the arteries is necessary. In this paper, we propose an automated method for accurate and complete registration of coronary arteries. Our method includes bifurcation matching, segment registration, and a novel approach to further improve the completeness of registration by combining the previous results with a level set algorithm. Our method is evaluated using 36 CCTA image pairs captured at different dates or different phases. The average distance error is 0.044 ± 0.008 mm and the average correct registration rate is 90.7%.

Shaowen Zeng, Jianjiang Feng, Yunqiang An, Bin Lu, Jiwen Lu, Jie Zhou

Quantifying Tensor Field Similarity with Global Distributions and Optimal Transport

Strain tensor fields quantify tissue deformation and are important for functional analysis of moving organs such as the heart and the tongue. Strain data can be readily obtained using medical imaging. However, quantification of similarity between different data sets is difficult. Strain patterns vary in space and time, and are inherently multidimensional. Also, the same type of mechanical deformation can be applied to different shapes; hence, automatic quantification of similarity should be unaffected by the geometry of the objects being deformed. In the pattern recognition literature, shapes and vector fields have been classified via global distributions. This study uses a distribution of mechanical properties (a 3D histogram), and the Wasserstein distance from optimal transport theory is used to measure histogram similarity. To evaluate the method’s consistency in matching deformations across different objects, the proposed approach was used to sort strain fields according to their similarity. Performance was compared to sorting via maximum shear distribution (a 1D histogram) and tensor residual magnitude in perfectly registered objects. The technique was also applied to correlate muscle activation to muscular contraction observed via tagged MRI. The results show that the proposed approach accurately matches deformation regardless of the shape of the object being deformed. Sorting accuracy surpassed 1D shear distribution and was on par with residual magnitude, but without the need for registration between objects.

Arnold D. Gomez, Maureen L. Stone, Philip V. Bayly, Jerry L. Prince
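The paper compares 3D property histograms via the Wasserstein distance; in the 1D case (as for the maximum-shear baseline), optimal transport between histograms reduces to the area between their cumulative distribution functions. A minimal sketch, assuming the two histograms share a uniform bin grid:

```python
import numpy as np

def wasserstein_1d(hist_p, hist_q, bin_width=1.0):
    """1-Wasserstein distance between two 1D histograms on a shared grid.

    For 1D distributions, optimal transport reduces to the area between
    the two cumulative distribution functions.
    """
    p = hist_p / hist_p.sum()                       # normalize to probabilities
    q = hist_q / hist_q.sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum() * bin_width
```

Moving all mass one bin over, for instance, yields a distance of exactly one bin width; identical histograms yield zero. The 3D case used in the paper requires a full optimal transport solver rather than this closed form.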

Cardiac Motion Scoring with Segment- and Subject-Level Non-local Modeling

Motion scoring of the cardiac myocardium is of paramount importance for early detection and diagnosis of various cardiac diseases. It aims at classifying regional wall motion into one of four types (normal, hypokinetic, akinetic, and dyskinetic) and is extremely challenging due to the complex deformation of the myocardium and the subtle inter-class differences between motion patterns. All existing work on automated motion analysis has focused on binary abnormality detection, avoiding the much more demanding motion scoring, which is urgently required in clinical practice yet has never been investigated before. In this work, we propose Cardiac-MOS, the first method for cardiac motion scoring from cardiac MR sequences, based on a deep convolutional neural network. Due to the locality of convolution, relationships between distant, non-local responses of the feature map, which are closely related to motion differences between segments, cannot be captured. In Cardiac-MOS, such non-local relationships are modeled with non-local neural networks within each segment and across all segments of one subject, i.e., segment- and subject-level non-local modeling, leading to a clear performance improvement. In addition, Cardiac-MOS can effectively extract motion information from MR sequences of various lengths by interpolating the convolution kernels along the temporal dimension, and can therefore be applied to MR sequences from multiple sources. Experiments on 1440 myocardium segments of 90 subjects from short-axis MR sequences of multiple lengths show that Cardiac-MOS achieves reliable performance, with a correlation of 0.926 for motion score index estimation and an accuracy of 77.4% for motion scoring. Cardiac-MOS also outperforms all existing work on binary abnormality detection. As the first automatic motion scoring solution, Cardiac-MOS demonstrates great potential for future clinical application.

Wufeng Xue, Gary Brahm, Stephanie Leung, Ogla Shmuilovich, Shuo Li

Computational Heart Modeling for Evaluating Efficacy of MRI Techniques in Predicting Appropriate ICD Therapy

The objective of this study is to use individualized computational heart models to evaluate the efficacy of myocardial infarct (MI) mass, determined by two different MRI techniques, in predicting patient risk for post-MI ventricular tachycardia (VT). 27 patients with MI underwent late gadolinium-enhanced MRI using inversion-recovery fast gradient echo (IR-FGRE) and multi-contrast late enhancement (MCLE) prior to implantable cardioverter defibrillator (ICD) implantation and were followed up for 6–46 months. The myocardium, infarct core (IC), and border zone (BZ) were segmented from the images using previously validated techniques. The segmented structures were then reconstructed as a high-resolution label map in 3D. Individualized image-based computational models were built separately for each imaging technique, and simulations of propensity to VT were conducted with each model. The imaging methods were evaluated for sensitivity and specificity by comparing simulated inducibility of VT to clinical outcome (appropriate ICD therapy). Twelve patients had at least one appropriate ICD therapy for VT at follow-up. For both MCLE and IR-FGRE, the outcomes of the VT simulations were significantly different between the groups with and without ICD therapy. Between IR-FGRE and MCLE, the virtual models built using the latter may have yielded higher sensitivity and specificity in predicting appropriate ICD therapy.

Eranga Ukwatta, Plamen Nikolov, Natalia Trayanova, Graham Wright

Multiview Two-Task Recursive Attention Model for Left Atrium and Atrial Scars Segmentation

Late Gadolinium Enhanced Cardiac MRI (LGE-CMRI) for detecting atrial scars in atrial fibrillation (AF) patients has recently emerged as a promising technique to stratify patients, guide ablation therapy and predict treatment success. Visualisation and quantification of scar tissue require segmentation of both the left atrium (LA) and the high-intensity scar regions from LGE-CMRI images. These two segmentation tasks are challenging due to the nulling of the healthy tissue signal, low signal-to-noise ratio, and often limited image quality in these patients. Most approaches require manual supervision and/or a second bright-blood MRI acquisition for anatomical segmentation. Segmenting both the LA anatomy and the scar tissue automatically from a single LGE-CMRI acquisition is therefore in high demand. In this study, we propose a novel, fully automated multiview two-task (MVTT) recursive attention model that works directly on LGE-CMRI images and combines sequential learning with dilated residual learning to segment the LA (including the attached pulmonary veins) and delineate the atrial scars simultaneously via an innovative attention model. Compared to other state-of-the-art methods, the proposed MVTT achieves a compelling improvement, enabling the generation of a patient-specific anatomical and atrial scar assessment model.

Jun Chen, Guang Yang, Zhifan Gao, Hao Ni, Elsa Angelini, Raad Mohiaddin, Tom Wong, Yanping Zhang, Xiuquan Du, Heye Zhang, Jennifer Keegan, David Firmin

Learning Interpretable Anatomical Features Through Deep Generative Models: Application to Cardiac Remodeling

Alterations in the geometry and function of the heart define well-established causes of cardiovascular disease. However, current approaches to the diagnosis of cardiovascular diseases often rely on subjective human assessment and manual analysis of medical images. Both factors limit the sensitivity with which complex structural and functional phenotypes can be quantified. Deep learning approaches have recently achieved success in tasks such as classification and segmentation of medical images, but their feature extraction and decision processes lack interpretability, limiting their value in clinical diagnosis. In this work, we propose a 3D convolutional generative model for the automatic classification of images from patients with cardiac diseases associated with structural remodeling. The model leverages interpretable task-specific anatomic patterns learned from 3D segmentations. It further allows the learned pathology-specific remodeling patterns to be visualised and quantified in the original input space of the images. This approach yields high accuracy in the categorization of healthy and hypertrophic cardiomyopathy subjects when tested on unseen MR images from our own multi-centre dataset (100%) as well as on the ACDC MICCAI 2017 dataset (90%). We believe that the proposed deep learning approach is a promising step towards the development of interpretable classifiers for the medical imaging domain, which may help clinicians to improve diagnostic accuracy and enhance patient risk stratification.

Carlo Biffi, Ozan Oktay, Giacomo Tarroni, Wenjia Bai, Antonio De Marvao, Georgia Doumou, Martin Rajchl, Reem Bedair, Sanjay Prasad, Stuart Cook, Declan O’Regan, Daniel Rueckert

Joint Learning of Motion Estimation and Segmentation for Cardiac MR Image Sequences

Cardiac motion estimation and segmentation play important roles in quantitatively assessing cardiac function and diagnosing cardiovascular diseases. In this paper, we propose a novel deep learning method for joint estimation of motion and segmentation from cardiac MR image sequences. The proposed network consists of two branches: a cardiac motion estimation branch built on a novel unsupervised Siamese-style recurrent spatial transformer network, and a cardiac segmentation branch based on a fully convolutional network. In particular, a joint multi-scale feature encoder is learned by optimizing the segmentation branch and the motion estimation branch simultaneously. This enables weakly-supervised segmentation by taking advantage of features learned without supervision in the motion estimation branch from a large amount of unannotated data. Experimental results using cardiac MRI images from 220 subjects show that joint learning of both tasks is complementary and that the proposed models significantly outperform the competing methods in terms of accuracy and speed.

Chen Qin, Wenjia Bai, Jo Schlemper, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Daniel Rueckert

Multi-Input and Dataset-Invariant Adversarial Learning (MDAL) for Left and Right-Ventricular Coverage Estimation in Cardiac MRI

Cardiac functional parameters, such as the Ejection Fraction (EF) and Cardiac Output (CO) of both ventricles, are the most immediate indicators of normal or abnormal cardiac function. Computing these parameters requires accurate measurement of ventricular volumes at end-diastole (ED) and end-systole (ES). Accurate volume measurements depend on the correct identification of basal and apical slices in cardiac magnetic resonance (CMR) sequences that provide full coverage of both the left (LV) and right (RV) ventricles. This paper proposes a novel adversarial learning (AL) approach based on convolutional neural networks (CNNs) that detects and localizes the basal/apical slices in an image volume independently of image-acquisition parameters, such as imaging device, magnetic field strength, and variations in protocol execution. The proposed model is trained on multiple cohorts of different provenance and uses image features from different MRI viewing planes to learn the appearance and predict the position of the basal and apical planes. To the best of our knowledge, this is the first work tackling fully automatic detection and position regression of basal/apical slices in CMR volumes in a dataset-invariant manner. We achieve this by maximizing the ability of a CNN to regress the position of basal/apical slices within a single dataset, while minimizing the ability of a classifier to discriminate image features between different data sources. Our results show superior performance over state-of-the-art methods.

Le Zhang, Marco Pereañez, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Alejandro F. Frangi

Factorised Spatial Representation Learning: Application in Semi-supervised Myocardial Segmentation

The success and generalisation of deep learning algorithms depend heavily on learning good feature representations. In medical imaging, this entails representing anatomical information as well as properties related to the specific imaging setting. Anatomical information is required to perform further analysis, whereas imaging information is key to disentangling scanner variability and potential artefacts. The ability to factorise these would allow algorithms to be trained only on the information relevant to the task. To date, such factorisation has not been attempted. In this paper, we propose a methodology for latent space factorisation relying on the cycle-consistency principle. As an example application, we consider cardiac MR segmentation, where we separate information related to the myocardium from other features related to imaging and surrounding substructures. We demonstrate the proposed method’s utility in a semi-supervised setting: we use very few labelled images together with many unlabelled images to train a myocardium segmentation neural network. Specifically, we achieve performance comparable to fully supervised networks using a fraction of the labelled images in experiments on ACDC and a dataset from the Edinburgh Imaging Facility QMRI. Code will be made available.

Agisilaos Chartsias, Thomas Joyce, Giorgos Papanastasiou, Scott Semple, Michelle Williams, David Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris

High-Dimensional Bayesian Optimization of Personalized Cardiac Model Parameters via an Embedded Generative Model

The estimation of patient-specific tissue properties in the form of model parameters is important for personalized physiological models. However, these tissue properties vary spatially across the underlying anatomical model, presenting a significant challenge of high-dimensional (HD) optimization in the presence of limited measurement data. A common solution for reducing the dimension of the parameter space is to explicitly partition the anatomical mesh, either into a fixed small number of segments or into a multi-scale hierarchy. This anatomy-based reduction of the parameter space presents a fundamental bottleneck to parameter estimation, resulting in solutions that are either too low in resolution to reflect tissue heterogeneity or too high in dimension to be reliably estimated within feasible computation. In this paper, we present a novel concept that embeds a generative variational auto-encoder (VAE) into the objective function of Bayesian optimization, providing an implicit low-dimensional (LD) search space that represents the generative code of the HD spatially-varying tissue properties. In addition, the VAE-encoded knowledge about the generative code is used to guide the exploration of the search space. The presented method is applied to estimating tissue excitability in a cardiac electrophysiological model. Synthetic- and real-data experiments demonstrate its ability to improve the accuracy of parameter estimation with more than a 10x gain in efficiency.

Jwala Dhamala, Sandesh Ghimire, John L. Sapp, B. Milan Horáček, Linwei Wang

Generative Modeling and Inverse Imaging of Cardiac Transmembrane Potential

Noninvasive reconstruction of cardiac transmembrane potential (TMP) from surface electrocardiograms (ECG) involves an ill-posed inverse problem. Model-constrained regularization is powerful for incorporating rich physiological knowledge about spatiotemporal TMP dynamics. These models are controlled by high-dimensional physical parameters which, if fixed, can introduce model errors and reduce the accuracy of TMP reconstruction. Simultaneous adaptation of these parameters during TMP reconstruction, however, is difficult due to their high dimensionality. We introduce a novel model-constrained inference framework that replaces conventional physiological models with a deep generative model trained to generate TMP sequences from low-dimensional generative factors. Using a variational auto-encoder (VAE) with long short-term memory (LSTM) networks, we train the VAE decoder to learn the conditional likelihood of the TMP, while the encoder learns the prior distribution of the generative factors. These two components allow us to develop an efficient algorithm that simultaneously infers the generative factors and the TMP signals from ECG data. Synthetic- and real-data experiments demonstrate that the presented method significantly improves the accuracy of TMP reconstruction compared with methods constrained by conventional physiological models or without physiological constraints.

Sandesh Ghimire, Jwala Dhamala, Prashnna Kumar Gyawali, John L. Sapp, Milan Horacek, Linwei Wang

Pulmonary Vessel Tree Matching for Quantifying Changes in Vascular Morphology

Invasive right-sided heart catheterization (RHC) is currently the gold standard for assessing treatment effects in pulmonary vascular diseases, such as chronic thromboembolic pulmonary hypertension (CTEPH). Quantifying morphological changes by matching vascular trees (pre- and post-treatment) may provide a non-invasive alternative for assessing hemodynamic changes. In this work, we propose a method for quantifying morphological changes consisting of three steps: constructing vascular trees from the detected pulmonary vessels, matching vascular trees while preserving local tree topology, and quantifying local morphological changes based on Poiseuille’s law (changes in radius⁻⁴, Δr⁻⁴). Subsequently, the median and interquartile range (IQR) of all local Δr⁻⁴ values were calculated as global measurements of morphological change. The vascular tree matching method was validated on 10 synthetic trees, and the relation between clinical RHC parameters and the quantified morphological changes was investigated in 14 CTEPH patients, pre- and post-treatment. In the evaluation with synthetic trees, the proposed method achieved an average residual distance of 3.09 ± 1.28 mm, a substantial improvement over the coherent point drift method (4.32 ± 1.89 mm) and a method with global-local topology preservation (3.92 ± 1.59 mm). In the clinical evaluation, the morphological changes (IQR of Δr⁻⁴) were significantly correlated with the changes in RHC examinations, ΔsPAP (R = −0.62, p = 0.019) and ΔmPAP (R = −0.56, p = 0.038). Quantifying morphological changes may provide a non-invasive assessment of treatment effects in CTEPH, consistent with hemodynamic changes measured by invasive RHC.

Zhiwei Zhai, Marius Staring, Hideki Ota, Berend C. Stoel
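The local quantification step can be illustrated directly: by Poiseuille’s law, flow resistance in a vessel segment scales with r⁻⁴, so the per-branch change Δr⁻⁴ and its median/IQR over all matched branches are simple to compute. A hypothetical sketch (the radius arrays stand in for matched branch radii, pre- and post-treatment):

```python
import numpy as np

def delta_r4(radii_pre, radii_post):
    """Per-branch change in r^-4 (proportional to Poiseuille flow resistance)."""
    return radii_post ** -4.0 - radii_pre ** -4.0

def summarize_changes(radii_pre, radii_post):
    """Global measurements: median and interquartile range of all local changes."""
    d = delta_r4(np.asarray(radii_pre, float), np.asarray(radii_post, float))
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    return med, q3 - q1  # (median, IQR)
```

A negative Δr⁻⁴ corresponds to a widened branch (reduced resistance), which is the expected direction after successful treatment.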

MuTGAN: Simultaneous Segmentation and Quantification of Myocardial Infarction Without Contrast Agents via Joint Adversarial Learning

Simultaneous segmentation and full quantification (estimation of all diagnostic indices) of the myocardial infarction (MI) area are crucial for early diagnosis and surgical planning. Current clinical methods still suffer from high risk, poor reproducibility, and long procedure times. In this study, a multitask generative adversarial network (MuTGAN) is proposed as a contrast-free, stable and automatic clinical tool to segment and quantify MIs simultaneously. MuTGAN consists of generator and discriminator modules and is implemented by three seamlessly connected networks: a spatio-temporal feature extraction network comprehensively learns the morphological and kinematic abnormalities of the left ventricle through a novel three-dimensional successive convolution; a joint feature learning network learns the complementarity between segmentation and quantification through innovative inter- and intra-skip connections; and a task relatedness network learns the intrinsic pattern between tasks to increase the accuracy of the estimates through adversarial learning. MuTGAN minimizes a generalized divergence to directly optimize the distribution of the estimates through this competition process, achieving pixel-level segmentation and full quantification of MIs. Our proposed method yielded a pixel classification accuracy of 96.46%, and the mean absolute error of the MI centroid was 0.977 mm, on 140 clinical subjects. These results indicate the potential of our proposed method for aiding standardized MI assessments.

Chenchu Xu, Lei Xu, Gary Brahm, Heye Zhang, Shuo Li

More Knowledge Is Better: Cross-Modality Volume Completion and 3D+2D Segmentation for Intracardiac Echocardiography Contouring

Using catheter ablation to treat atrial fibrillation increasingly relies on intracardiac echocardiography (ICE) for an anatomical delineation of the left atrium and the pulmonary veins that enter the atrium. However, it is a challenge to build an automatic contouring algorithm because ICE is noisy and provides only a limited 2D view of the 3D anatomy. This work provides the first automatic solution to segment the left atrium and the pulmonary veins from ICE. In this solution, we demonstrate the benefit of building a cross-modality framework that can leverage a database of diagnostic images to supplement the less available interventional images. To this end, we develop a novel deep neural network approach that uses the (i) 3D geometrical information provided by a position sensor embedded in the ICE catheter and the (ii) 3D image appearance information from a set of computed tomography cardiac volumes. We evaluate the proposed approach over 11,000 ICE images collected from 150 clinical patients. Experimental results show that our model is significantly better than a direct 2D image-to-image deep neural network segmentation, especially for less-observed structures.

Haofu Liao, Yucheng Tang, Gareth Funka-Lea, Jiebo Luo, Shaohua Kevin Zhou

Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio

The cardiothoracic ratio (CTR), a clinical metric of heart size in chest X-rays (CXRs), is a key indicator of cardiomegaly. Manual measurement of CTR is time-consuming and can be affected by human subjectivity, making it desirable to design computer-aided systems that assist clinicians in the diagnosis process. Automatic CTR estimation through chest organ segmentation, however, requires large amounts of pixel-level annotated data, which is often unavailable. To alleviate this problem, we propose an unsupervised domain adaptation framework based on adversarial networks. The framework learns domain invariant feature representations from openly available data sources to produce accurate chest organ segmentation for unlabeled datasets. Specifically, we propose a model that enforces our intuition that prediction masks should be domain independent. Hence, we introduce a discriminator that distinguishes segmentation predictions from ground truth masks. We evaluate our system’s prediction based on the assessment of radiologists and demonstrate the clinical practicability for the diagnosis of cardiomegaly. We finally illustrate on the JSRT dataset that the semi-supervised performance of our model is also very promising.

Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, Eric Xing
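Once heart and lung-field segmentation masks are available, the CTR itself is a simple ratio of maximal horizontal extents. The sketch below is illustrative, assuming binary masks and using the lung-field width as a proxy for the inner thoracic diameter (the function names are ours, not from the paper):

```python
import numpy as np

def width_from_mask(mask):
    """Maximal horizontal extent (in pixels) of a binary mask."""
    cols = np.where(mask.any(axis=0))[0]  # columns containing any foreground
    return 0 if cols.size == 0 else cols[-1] - cols[0] + 1

def cardiothoracic_ratio(heart_mask, lung_mask):
    """CTR = widest heart diameter / widest thoracic (lung-field) diameter."""
    return width_from_mask(heart_mask) / width_from_mask(lung_mask)
```

A CTR above roughly 0.5 on a PA chest film is the conventional threshold suggestive of cardiomegaly, which is why accurate chest organ segmentation directly drives the clinical metric.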

TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-Rays

The chest X-ray (CXR) is by far the most commonly performed radiological examination for screening and diagnosis of many cardiac and pulmonary diseases. There is an immense world-wide shortage of physicians capable of providing rapid and accurate interpretation of this study. A radiologist-driven analysis of over two million CXR reports generated an ontology including the 40 most prevalent pathologies on CXR. By manually tagging a relatively small set of sentences, we were able to construct a training set of 959k studies. A deep learning model was trained to predict the findings given the patient frontal and lateral scans. For 12 of the findings we compare the model performance against a team of radiologists and show that in most cases the radiologists agree on average more with the algorithm than with each other.

Jonathan Laserson, Christine Dan Lantsman, Michal Cohen-Sfady, Itamar Tamir, Eli Goz, Chen Brestel, Shir Bar, Maya Atar, Eldad Elnekave

Localization and Labeling of Posterior Ribs in Chest Radiographs Using a CRF-regularized FCN with Local Refinement

Localization and labeling of posterior ribs in radiographs is an important task and a prerequisite for, e.g., quality assessment, image registration, and automated diagnosis. In this paper, we propose an automatic, general approach for localizing spatially correlated landmarks using a fully convolutional network (FCN) regularized by a conditional random field (CRF) and apply it to rib localization. A reduced CRF state space in form of localization hypotheses (generated by the FCN) is used to make CRF inference feasible, potentially missing correct locations. Thus, we propose a second CRF inference step searching for additional locations. To this end, we introduce a novel “refine” label in the first inference step. For “refine”-labeled nodes, small subgraphs are extracted and a second inference is performed on all image pixels. The approach is thoroughly evaluated on 642 images of the public Indiana chest X-ray collection, achieving a landmark localization rate of 94.6%.

Alexander Oliver Mader, Jens von Berg, Alexander Fabritz, Cristian Lorenz, Carsten Meyer

Evaluation of Collimation Prediction Based on Depth Images and Automated Landmark Detection for Routine Clinical Chest X-Ray Exams

The aim of this study was to evaluate the performance of a machine learning algorithm applied to depth images for the automated computation of X-ray beam collimation parameters in radiographic chest examinations including posterior-anterior (PA) and left-lateral (LAT) views. Our approach used as intermediate step a trained classifier for the detection of internal lung landmarks that were defined on X-ray images acquired simultaneously with the depth image. The landmark detection algorithm was evaluated retrospectively in a 5-fold cross validation experiment on the basis of 89 patient data sets acquired in clinical settings. Two auto-collimation algorithms were devised and their results were compared to the reference lung bounding boxes defined on the X-ray images and to the manual collimation parameters set by the radiologic technologists.

Julien Sénégas, Axel Saalbach, Martin Bergtholdt, Sascha Jockel, Detlef Mentrup, Roman Fischbach

Efficient Active Learning for Image Classification and Segmentation Using a Sample Selection and Conditional Generative Adversarial Network

Training robust deep learning (DL) systems for medical image classification or segmentation is challenging due to the limited number of images covering different disease types and severities. We propose an active learning (AL) framework to select the most informative samples to add to the training data. We use conditional generative adversarial networks (cGANs) to generate realistic chest X-ray images with different disease characteristics by conditioning the generation on a real image sample. Informative samples to add to the training set are identified using a Bayesian neural network. Experiments show that our proposed AL framework is able to achieve state-of-the-art performance using about 35% of the full dataset, thus saving significant time and effort over conventional methods.

Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Mauricio Reyes

Iterative Attention Mining for Weakly Supervised Thoracic Disease Pattern Localization in Chest X-Rays

Given image labels as the only supervisory signal, we focus on harvesting/mining thoracic disease localizations from chest X-ray images. Harvesting such localizations from existing datasets allows for the creation of improved data sources for computer-aided diagnosis and retrospective analyses. We train a convolutional neural network (CNN) for image classification and propose an attention mining (AM) strategy to improve the model’s sensitivity, or saliency, to disease patterns. The intuition behind AM is that once the most salient disease area is blocked or hidden from the CNN model, it will pay attention to alternative image regions while still attempting to make correct predictions. However, the model must be properly constrained during AM; otherwise, it may overfit to uncorrelated image parts and forget the valuable knowledge learned from the original image classification task. To alleviate such side effects, we design a knowledge preservation (KP) loss, which minimizes the discrepancy between responses to X-ray images from the original and the updated networks. Furthermore, we modify the CNN model to include multi-scale aggregation (MSA), improving its localization ability for small-scale disease findings, e.g., lung nodules. We validate our method on the publicly available ChestX-ray14 dataset, outperforming a class activation map (CAM)-based approach and demonstrating the value of our novel framework for mining disease locations.

Jinzheng Cai, Le Lu, Adam P. Harrison, Xiaoshuang Shi, Pingjun Chen, Lin Yang

Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation

Automatic parsing of anatomical objects in X-ray images is critical to many clinical applications, in particular image-guided intervention and workflow automation. Existing deep network models require a large amount of labeled data. However, obtaining accurate pixel-wise labels in X-ray images relies heavily on skilled clinicians due to the large overlaps of anatomy and the complex texture patterns. On the other hand, organs in 3D CT scans preserve clearer structures and sharper boundaries and thus can be easily delineated. In this paper, we propose a novel model framework for learning automatic X-ray image parsing from labeled CT scans. Specifically, a Dense Image-to-Image network (DI2I) for multi-organ segmentation is first trained on X-ray-like Digitally Reconstructed Radiographs (DRRs) rendered from 3D CT volumes. We then introduce a Task Driven Generative Adversarial Network (TD-GAN) architecture to achieve simultaneous style transfer and parsing for unseen real X-ray images. TD-GAN consists of a modified cycle-GAN substructure for pixel-to-pixel translation between DRRs and X-ray images, and an added module leveraging the pre-trained DI2I to enforce segmentation consistency. The TD-GAN framework is general and can be easily adapted to other learning tasks. In numerical experiments, we validate the proposed model on 815 DRRs and 153 topograms. While the vanilla DI2I without any adaptation fails completely at segmenting the topograms, the proposed model requires no topogram labels and provides a promising average Dice of 85%, at the same level of accuracy as supervised training (88%).

Yue Zhang, Shun Miao, Tommaso Mansi, Rui Liao

Cardiac, Chest and Abdominal Applications: Colorectal, Kidney and Liver Imaging Applications


Towards Automated Colonoscopy Diagnosis: Binary Polyp Size Estimation via Unsupervised Depth Learning

In colon cancer screening, estimating polyp size from colonoscopy images or videos alone is difficult even for expert physicians, although polyp size is important for diagnosis. For a fully automated computer-aided diagnosis (CAD) pipeline, a robust and precise polyp size estimation method is highly desirable. However, estimating the size of a three-dimensional object from a two-dimensional image is ill-posed due to the lack of three-dimensional spatial information. To circumvent this challenge, we formulate a relaxed form of size estimation as a binary classification problem and solve it with a new deep neural network architecture: BseNet. This relaxed form of size estimation is a two-category classification, under versus over a polyp dimension criterion that provokes different clinical treatments (resecting the polyp or not). BseNet estimates a depth map from an input colonoscopic RGB image using unsupervised deep learning and integrates the RGB and computed depth information into four-channel RGB-D imagery, which is subsequently encoded by BseNet to extract deep RGB-D image features and classify polyps into two size categories: under and over 10 mm. For the evaluation of BseNet, a large dataset of colonoscopic videos totaling over 16 hours was constructed. We evaluate the accuracy of both binary polyp size estimation and polyp detection, since detection is a prerequisite step of a fully automated CAD system. The experimental results show that our proposed BseNet achieves 79.2% accuracy for binary polyp-size classification. We also combine image feature extraction by BseNet with classification of short video clips using a long short-term memory (LSTM) network. Polyp detection (whether the video clip contains a polyp or not) shows 88.8% sensitivity when employing the spatio-temporal image feature extraction and classification.

Hayato Itoh, Holger R. Roth, Le Lu, Masahiro Oda, Masashi Misawa, Yuichi Mori, Shin-ei Kudo, Kensaku Mori

RIIS-DenseNet: Rotation-Invariant and Image Similarity Constrained Densely Connected Convolutional Network for Polyp Detection

Colorectal cancer is the leading cause of cancer-related deaths. Most colorectal cancers are believed to arise from benign adenomatous polyps. Automatic methods for polyp detection in Wireless Capsule Endoscopy (WCE) images are desirable, but the results of current approaches are limited by object rotation and high intra-class variability. To address these problems, we propose a rotation-invariant and image similarity constrained Densely Connected Convolutional Network (RIIS-DenseNet) model. We first introduce the Densely Connected Convolutional Network (DenseNet), which enables maximum information flow among layers through a densely connected mechanism, to provide an end-to-end polyp detection workflow. A rotation-invariant regularization constraint is then introduced to explicitly enforce that the learned features of training samples and of their rotated versions are mapped close to each other. An image similarity constraint is further proposed, imposing image category information on the features to maintain small intra-class scatter. Our method achieves an accuracy of 95.62% for polyp detection. Extensive experiments on the WCE dataset show that our method has superior performance compared with state-of-the-art methods.
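The rotation-invariant regularization idea, penalizing the distance between features of an image and of its rotated copies, can be sketched as follows. This is an illustrative NumPy stand-in, not the RIIS-DenseNet training code; the feature arrays and the function name are assumptions:

```python
import numpy as np

def rotation_invariance_loss(feat, feat_rot):
    """Mean squared distance between features of the original samples
    and features of their rotated versions; minimizing this pushes the
    two feature sets toward the same point in feature space."""
    return float(np.mean((feat - feat_rot) ** 2))

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 16))                  # features of 8 training patches
f_rot = f + 0.01 * rng.normal(size=(8, 16))   # features of their rotated copies
print(rotation_invariance_loss(f, f_rot))     # small value: near-invariant features
```

In practice this term would be added to the classification loss with a weighting factor and minimized jointly during training.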

Yixuan Yuan, Wenjian Qin, Bulat Ibragimov, Bin Han, Lei Xing

Interaction Techniques for Immersive CT Colonography: A Professional Assessment

CT Colonography (CTC) is considered the leading imaging technique for colorectal cancer (CRC) screening. However, conventional CTC systems rely on clumsy 2D input devices and stationary flat displays that make it hard to perceive the colon structure in 3D. To visualize such anatomically complex data, the immersion and freedom of movement afforded by Virtual Reality (VR) systems hold promise for helping clinicians improve 3D reading, hence enabling more expeditious diagnoses. To this end, we propose iCOLONIC, a set of interaction techniques using VR to perform CTC reading. iCOLONIC combines immersive Fly-Through navigation with positional tracking, multi-scale representations and mini-maps to guide radiologists and surgeons while navigating throughout the colon. Contrary to stationary VR solutions, iCOLONIC allows users to walk freely within a workspace to analyze both local and global 3D features. To assess whether our non-stationary VR approach can assist clinicians in improving 3D colon reading and 3D perception, we conducted a user study with three senior radiologists, three senior general surgeons and one neuroradiology intern. Results from formal evaluation sessions demonstrate iCOLONIC's usability and feasibility, as the proposed interaction techniques were seen to improve spatial awareness and promote more fluent navigation. Moreover, participants remarked that our approach shows great potential to speed up the screening process.

Daniel Simões Lopes, Daniel Medeiros, Soraia Figueiredo Paulo, Pedro Brasil Borges, Vitor Nunes, Vasco Mascarenhas, Marcos Veiga, Joaquim Armando Jorge

Quasi-automatic Colon Segmentation on T2-MRI Images with Low User Effort

About 50% of the patients consulting a gastroenterology clinic report symptoms without detectable cause. Clinical researchers are interested in analyzing the volumetric evolution of colon segments under the effect of different diets and diseases. These studies require non-invasive abdominal MRI scans without the use of any contrast agent. In this work, we propose a colon segmentation framework designed to support T2-weighted abdominal MRI scans obtained from an unprepared colon. The segmentation process is based on an efficient and accurate quasi-automatic approach that drastically reduces the specialist's interaction and effort with respect to other state-of-the-art solutions, while decreasing the overall segmentation cost. The algorithm relies on a novel probabilistic tubularity filter, the detection of the colon medial line, probabilistic information extracted from a training set and a final unsupervised clustering. The experimental results show the benefits of our approach for clinical use.

B. Orellana, E. Monclús, P. Brunet, I. Navazo, Á. Bendezú, F. Azpiroz

Ordinal Multi-modal Feature Selection for Survival Analysis of Early-Stage Renal Cancer

Existing studies have demonstrated that combining genomic data and histopathological images can stratify cancer patients with distinct prognoses better than using a single biomarker, since different biomarkers may provide complementary information. However, these multi-modal data, most of which are high-dimensional, may contain redundant features that deteriorate the performance of a prognosis model, and it has therefore become a challenging problem to select the informative features for survival analysis from redundant and heterogeneous feature groups. Existing feature selection methods assume that the survival information of one patient is independent of another's, and thus miss the ordinal relationship among the survival times of different patients. To address this issue, we make use of the important ordinal survival information among different patients and propose an ordinal sparse canonical correlation analysis (OSCCA) framework to simultaneously identify important image features and eigengenes for survival analysis. Specifically, we formulate our framework based on a sparse canonical correlation analysis model, which aims at finding the best linear projections such that the highest correlation between the selected image features and eigengenes is achieved. In addition, we add constraints to ensure that the ordinal survival information of different patients is preserved after projection. We evaluate the effectiveness of our method on an early-stage renal cell carcinoma dataset. Experimental results demonstrate that the selected features correlate strongly with survival, by which we can achieve better patient stratification than with competing methods.

Wei Shao, Jun Cheng, Liang Sun, Zhi Han, Qianjin Feng, Daoqiang Zhang, Kun Huang

Noninvasive Determination of Gene Mutations in Clear Cell Renal Cell Carcinoma Using Multiple Instance Decisions Aggregated CNN

Kidney clear cell renal cell carcinoma (ccRCC) is the major sub-type of RCC, constituting one of the most common cancers worldwide and accounting for a steadily increasing mortality rate, with 350,000 new cases recorded in 2012. Understanding the underlying genetic mutations in ccRCC provides crucial information enabling malignancy staging and patient survival estimation, and thus plays a vital role in accurate ccRCC diagnosis, prognosis, treatment planning, and response assessment. Although the underlying gene mutations can be identified by whole genome sequencing of the ccRCC following invasive nephrectomy or kidney biopsy procedures, recent studies have suggested that such mutations may be noninvasively identified by studying image features of the ccRCC in Computed Tomography (CT) data. Such image feature identification currently relies on laborious manual processes based on visual inspection of 2D image slices that are time-consuming and subjective. In this paper, we propose a convolutional neural network approach for automatic detection of underlying ccRCC gene mutations from 3D CT volumes. We aggregate the mutation-presence/absence decisions for all the ccRCC slices in a kidney into a robust singular decision that determines whether the interrogated kidney bears a specific mutation or not. When validated on clinical CT datasets of 267 patients from the TCIA database, our method detected gene mutations with 94% accuracy.

Mohammad Arafat Hussain, Ghassan Hamarneh, Rafeef Garbi

Combining Convolutional and Recurrent Neural Networks for Classification of Focal Liver Lesions in Multi-phase CT Images

Computer-aided diagnosis (CAD) systems are useful for assisting radiologists with clinical diagnoses by classifying focal liver lesions (FLLs) based on multi-phase computed tomography (CT) images. Although many studies have been conducted in this field, two challenges remain. First, the temporal enhancement pattern is hard to represent effectively. Second, both local and global information about lesions is necessary for this task. In this paper, we propose a deep learning framework, called ResGL-BDLSTM, which combines a residual deep neural network (ResNet) with global and local pathways (ResGL Net) and a bi-directional long short-term memory (BD-LSTM) model for the task of focal liver lesion classification in multi-phase CT images. In addition, we propose a novel loss function to train the framework, composed of an inter-loss and an intra-loss, which improves the robustness of the framework. The proposed framework outperforms state-of-the-art approaches by achieving a 90.93% mean accuracy.

Dong Liang, Lanfen Lin, Hongjie Hu, Qiaowei Zhang, Qingqing Chen, Yutaro Iwamoto, Xianhua Han, Yen-Wei Chen

Construction of a Spatiotemporal Statistical Shape Model of Pediatric Liver from Cross-Sectional Data

This paper proposes a spatiotemporal statistical shape model of the pediatric liver, which has potential applications in computer-aided diagnosis of the abdomen. Shapes are analyzed in the space of a level set function, which has computational advantages over the diffeomorphic framework commonly employed in conventional studies. We first calculate the time-varying average of the mean shape development using a kernel regression technique with adaptive bandwidth. Then, eigenshape modes for every timepoint are calculated using principal component analysis with an additional regularization term that ensures the smoothness of the temporal change of the eigenshape modes. To further improve performance, we apply data augmentation using a level set-based nonlinear morphing technique. The proposed algorithm was evaluated on spatiotemporal statistical shape modeling of the liver using 42 manually segmented livers from children whose ages ranged from approximately 2 weeks to 95 months. Our method achieved higher generalization ability and specificity compared with conventional methods.

Atsushi Saito, Koyo Nakayama, Antonio R. Porras, Awais Mansoor, Elijah Biggs, Marius George Linguraru, Akinobu Shimizu

Deep 3D Dose Analysis for Prediction of Outcomes After Liver Stereotactic Body Radiation Therapy

Accurate and precise dose delivery is the key factor for radiation therapy (RT) success. Currently, RT planning is based on optimization of oversimplified dose-volume metrics that consider all human organs to be homogeneous. The limitations of such an approach result in suboptimal treatments with poor outcomes: short survival, early cancer recurrence and radiation-induced toxicity of healthy organs. This paper pioneers the concept of deep 3D dose analysis for outcome prediction after liver stereotactic body RT (SBRT). The presented work develops tools for unifying dose plans into the same anatomy space, classifies dose plans using convolutional neural networks with transfer learning from anatomy images, and assembles the first volumetric liver atlas of the critical-to-spare liver regions. The concept is validated on prediction of post-SBRT survival and local cancer progression using a clinical database of primary and metastatic liver SBRTs. The risks of negative SBRT outcomes are quantitatively estimated for individual liver segments.

Bulat Ibragimov, Diego A. S. Toesca, Yixuan Yuan, Albert C. Koong, Daniel T. Chang, Lei Xing

Liver Lesion Detection from Weakly-Labeled Multi-phase CT Volumes with a Grouped Single Shot MultiBox Detector

We present a focal liver lesion detection model leveraging custom-designed multi-phase computed tomography (CT) volumes, which reflects real-world clinical lesion detection practice, using a Single Shot MultiBox Detector (SSD). We show that grouped convolutions effectively harness the richer information of multi-phase data for the object detection model, whereas a naive application of SSD suffers from a generalization gap. We trained and evaluated the modified SSD model and recently proposed variants on our CT dataset of 64 subjects with five-fold cross validation. Our model achieved a 53.3% average precision score and ran in under three seconds per volume, outperforming the original model and state-of-the-art variants. The results show that the one-stage object detection model is a practical solution that runs in near real time and can learn an unbiased feature representation from a large-volume real-world detection dataset, requiring less tedious and time-consuming construction of weak phase-level bounding-box labels.

Sang-gil Lee, Jae Seok Bae, Hyunjae Kim, Jung Hoon Kim, Sungroh Yoon

A Diagnostic Report Generator from CT Volumes on Liver Tumor with Semi-supervised Attention Mechanism

Automatically generating interpretable diagnostic reports for computed tomography (CT) volumes is a new challenge for computer-aided diagnosis (CAD). In this paper, we propose a novel multimodal data and knowledge linking framework between CT volumes and textual reports with a semi-supervised attention mechanism. This multimodal framework includes a CT slice segmentation model and a language model. The semi-supervised attention mechanism paves the way for visually interpreting the underlying reasons that support the diagnosis results. This multi-task deep neural network is trained end-to-end. We not only quantitatively evaluate system performance (76.6% in terms of BLEU@4), but also qualitatively visualize the attention heat maps of the framework on a liver tumor dataset.

Jiang Tian, Cong Li, Zhongchao Shi, Feiyu Xu

Less is More: Simultaneous View Classification and Landmark Detection for Abdominal Ultrasound Images

An abdominal ultrasound examination, the most common ultrasound examination, requires substantial manual effort to acquire standard abdominal organ views, annotate the views in text, and record clinically relevant organ measurements. Hence, automatic view classification and landmark detection of the organs can be instrumental in streamlining the examination workflow. However, this is a challenging problem given not only the inherent difficulties of the ultrasound modality, e.g., low contrast and large variations, but also the heterogeneity across tasks, i.e., one classification task for all views, and one landmark detection task for each relevant view. While convolutional neural networks (CNN) have demonstrated more promising outcomes on ultrasound image analytics than traditional machine learning approaches, it is impractical to deploy multiple networks (one per task) given the limited computational and memory resources on most existing ultrasound scanners. To overcome such limits, we propose a multi-task learning framework that handles all the tasks with a single network. This network performs view classification and landmark detection simultaneously; it is also equipped with global convolutional kernels, coordinate constraints, and a conditional adversarial module to boost performance. In an experimental study based on 187,219 ultrasound images, the proposed simplified approach achieves (1) view classification accuracy better than the agreement between two clinical experts and (2) landmark-based measurement errors on par with inter-user variability. The multi-task approach also benefits from sharing feature extraction across all tasks during training and, as a result, outperforms approaches that address each task individually.

Zhoubing Xu, Yuankai Huo, JinHyeong Park, Bennett Landman, Andy Milkowski, Sasa Grbic, Shaohua Zhou

Cardiac, Chest and Abdominal Applications: Lung Imaging Applications


Deep Active Self-paced Learning for Accurate Pulmonary Nodule Segmentation

Automatic and accurate pulmonary nodule segmentation in lung Computed Tomography (CT) volumes plays an important role in computer-aided diagnosis of lung cancer. However, this task is challenging due to target/background voxel imbalance and the lack of voxel-level annotation. In this paper, we propose a novel deep region-based network, called Nodule R-CNN, for efficiently detecting pulmonary nodules in 3D CT images while simultaneously generating a segmentation mask for each instance. We also propose a novel Deep Active Self-paced Learning (DASL) strategy, based on a combination of Active Learning and Self-Paced Learning (SPL) schemes, to reduce annotation effort and make use of unannotated samples. Experimental results on the public LIDC-IDRI dataset show that our Nodule R-CNN achieves state-of-the-art results on pulmonary nodule segmentation, and that Nodule R-CNN trained with the DASL strategy performs much better than the same model trained without DASL on the same amount of annotated samples.

Wenzhe Wang, Yifei Lu, Bian Wu, Tingting Chen, Danny Z. Chen, Jian Wu

CT-Realistic Lung Nodule Simulation from 3D Conditional Generative Adversarial Networks for Robust Lung Segmentation

Data availability plays a critical role in the performance of deep learning systems. This challenge is especially acute in the medical image domain, particularly when pathologies are involved, due to two factors: (1) the limited number of cases, and (2) large variations in location, scale, and appearance. In this work, we investigate whether augmenting a dataset with artificially generated lung nodules can improve the robustness of the progressive holistically nested network (P-HNN) model for pathological lung segmentation of CT scans. To achieve this goal, we develop a 3D generative adversarial network (GAN) that effectively learns lung nodule property distributions in 3D space. In order to embed the nodules within their background context, we condition the GAN on a volume of interest whose central part, containing the nodule, has been erased. To further improve realism and blending with the background, we propose a novel multi-mask reconstruction loss. We train our method on over 1000 nodules from the LIDC dataset. Qualitative results demonstrate the effectiveness of our method compared to the state of the art. We then use our GAN to generate simulated training images in which nodules lie on the lung border, cases where the published P-HNN model struggles. Qualitative and quantitative results demonstrate that, armed with these simulated images, the P-HNN model learns to better segment lung regions under these challenging situations. As a result, our system provides a promising means to help overcome the data paucity that commonly afflicts medical imaging.

Dakai Jin, Ziyue Xu, Youbao Tang, Adam P. Harrison, Daniel J. Mollura

Fast CapsNet for Lung Cancer Screening

Lung cancer has been the leading cause of cancer-related deaths in the past several years. A major challenge in lung cancer screening is the detection of lung nodules from computed tomography (CT) scans. State-of-the-art approaches in automated lung nodule classification use deep convolutional neural networks (CNNs). However, these networks require a large number of training samples to generalize well. This paper investigates the use of capsule networks (CapsNets) as an alternative to CNNs. We show that CapsNets significantly outperform CNNs when the number of training samples is small. To increase computational efficiency, we propose a consistent dynamic routing mechanism that results in a 3× speedup of CapsNet. Finally, we show that the original image reconstruction method of CapsNets performs poorly on lung nodule data. We propose an efficient alternative, called the convolutional decoder, that yields lower reconstruction error and higher classification accuracy.

Aryan Mobiny, Hien Van Nguyen

Mean Field Network Based Graph Refinement with Application to Airway Tree Extraction

We present tree extraction in 3D images as a graph refinement task of obtaining a subgraph from an over-complete input graph. To this end, we formulate an approximate Bayesian inference framework on undirected graphs using mean field approximation (MFA). Mean field networks are used for inference, based on the interpretation that iterations of MFA can be seen as feed-forward operations in a neural network. This allows us to learn the model parameters from training data using the back-propagation algorithm. We demonstrate the usefulness of the model for extracting airway trees from 3D chest CT data. We first obtain probability images using a voxel classifier that distinguishes airways from background, and use Bayesian smoothing to model individual airway branches. This yields joint Gaussian density estimates of position, orientation and scale as node features of the input graph. The performance of the method is compared with two baselines: the first uses probability images from a trained voxel classifier with region growing, similar to one of the best-performing methods in the EXACT'09 airway challenge, and the second is based on Bayesian smoothing of these probability images. Using centerline distance as the error measure, the presented method shows significant improvement over both baselines.
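The idea of unrolling MFA iterations as feed-forward layers can be illustrated on a toy binary pairwise model. This is a generic mean field sketch under assumed unary and pairwise potentials, not the graph refinement model of the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_iterations(theta, W, n_iter=10):
    """Mean field approximation for a binary pairwise MRF.
    Each iteration corresponds to one 'layer' of a mean field network:
    q_i <- sigmoid(theta_i + sum_j W_ij q_j),
    where theta holds unary potentials (n,) and W symmetric pairwise
    weights (n, n). Returns approximate marginals q in (0, 1)."""
    q = sigmoid(theta)              # initialize from the unaries
    for _ in range(n_iter):
        q = sigmoid(theta + W @ q)  # parallel mean field update
    return q

# Three nodes on a chain: strong 'on' unary, strong 'off' unary, weak 'on'.
theta = np.array([2.0, -2.0, 0.5])
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
q = mean_field_iterations(theta, W)
print(q)  # approximate marginals, pulled together by the pairwise terms
```

Because each update is differentiable, stacking a fixed number of such updates yields a network whose parameters (here `theta` and `W`) can be trained by back-propagation, which is the interpretation the abstract relies on.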

Raghavendra Selvan, Max Welling, Jesper H. Pedersen, Jens Petersen, Marleen de Bruijne

Automated Pulmonary Nodule Detection: High Sensitivity with Few Candidates

Automated pulmonary nodule detection plays an important role in lung cancer diagnosis. In this paper, we propose a pulmonary nodule detection framework that achieves high sensitivity with few candidates. First, the Feature Pyramid Network (FPN), which leverages multi-level features, is applied to detect nodule candidates that cover almost all true positives. Then, redundant candidates are removed by a simple but effective Conditional 3-Dimensional Non-Maximum Suppression (Conditional 3D-NMS). Moreover, a novel Attention 3D CNN (Attention 3D-CNN), which efficiently utilizes contextual information, is proposed to further remove the overwhelming majority of false positives. The proposed method yields a sensitivity of 95.8% at 2 false positives per scan on the LUng Nodule Analysis 2016 (LUNA16) dataset, which is competitive with the current published state-of-the-art methods.
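For readers unfamiliar with the candidate-pruning step, standard greedy 3D non-maximum suppression (the baseline the paper's Conditional 3D-NMS builds on, not the conditional variant itself) can be sketched as:

```python
import numpy as np

def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (z1, y1, x1, z2, y2, x2)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol = lambda c: np.prod(c[3:] - c[:3])
    return inter / (vol(a) + vol(b) - inter)

def nms_3d(boxes, scores, thr=0.3):
    """Greedy NMS: keep the highest-scoring candidate, drop candidates
    overlapping it above `thr`, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = int(order[0])
        keep.append(i)
        order = order[1:][[iou_3d(boxes[i], boxes[j]) < thr for j in order[1:]]]
    return keep

boxes = np.array([[0, 0, 0, 4, 4, 4],     # candidate A
                  [0, 0, 1, 4, 4, 5],     # candidate B, overlaps A heavily
                  [10, 10, 10, 14, 14, 14]], float)  # distant candidate C
scores = np.array([0.9, 0.8, 0.7])
print(nms_3d(boxes, scores))  # [0, 2]: B is suppressed by A
```

The conditional variant described in the abstract additionally adapts the suppression behavior to the candidates, but the greedy skeleton above is the common core.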

Bin Wang, Guojun Qi, Sheng Tang, Liheng Zhang, Lixi Deng, Yongdong Zhang

Deep Learning from Label Proportions for Emphysema Quantification

We propose an end-to-end deep learning method that learns to estimate emphysema extent from proportions of the diseased tissue. These proportions were visually estimated by experts using a standard grading system, in which grades correspond to intervals (label example: 1–5% of diseased tissue). The proposed architecture encodes the knowledge that the labels represent a volumetric proportion. A custom loss is designed to learn with intervals. Thus, during training, our network learns to segment the diseased tissue such that its proportions fit the ground truth intervals. Our architecture and loss combined improve the performance substantially (8% ICC) compared to a more conventional regression network. We outperform traditional lung densitometry and two recently published methods for emphysema quantification by a large margin (at least 7% AUC and 15% ICC), and achieve near-human-level performance. Moreover, our method generates emphysema segmentations that predict the spatial distribution of emphysema at human level.
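The interval-based loss idea, penalizing a predicted diseased-tissue proportion only when it falls outside the ground-truth grade interval, can be sketched as follows. This is an illustrative stand-in, not the authors' custom loss:

```python
def interval_loss(pred_proportion, lo, hi):
    """Zero when the predicted proportion lies inside the ground-truth
    interval [lo, hi]; otherwise the distance to the nearest bound."""
    return max(lo - pred_proportion, 0.0) + max(pred_proportion - hi, 0.0)

# Grade '1-5% of diseased tissue' corresponds to the interval [0.01, 0.05].
print(interval_loss(0.03, 0.01, 0.05))           # 0.0 (inside the interval)
print(round(interval_loss(0.08, 0.01, 0.05), 4)) # 0.03 (above the upper bound)
```

During training, the network's segmentation output would be aggregated into a volume proportion and penalized with a term of this shape, so any segmentation whose proportion fits the labeled interval incurs no loss.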

Gerda Bortsova, Florian Dubost, Silas Ørting, Ioannis Katramados, Laurens Hogeweg, Laura Thomsen, Mathilde Wille, Marleen de Bruijne

Tumor-Aware, Adversarial Domain Adaptation from CT to MRI for Lung Cancer Segmentation

We present an adversarial domain adaptation based deep learning approach for automatic tumor segmentation from T2-weighted MRI. Our approach is composed of two steps: (i) a tumor-aware unsupervised cross-domain adaptation (CT to MRI), followed by (ii) semi-supervised tumor segmentation using a Unet trained with synthesized MRIs and a limited number of original MRIs. We introduce a novel target-specific loss, called tumor-aware loss, for unsupervised cross-domain adaptation that helps preserve tumors on synthesized MRIs produced from CT images. In comparison, state-of-the-art adversarial networks trained without our tumor-aware loss produced MRIs with ill-preserved or missing tumors. All networks were trained using labeled CT images from 377 patients with non-small cell lung cancer obtained from the Cancer Imaging Archive and unlabeled T2w MRIs from a completely unrelated cohort of 6 patients with pre-treatment and 36 on-treatment scans. Next, we combined the 6 labeled pre-treatment MRI scans with the synthesized MRIs to boost tumor segmentation accuracy through semi-supervised learning. Semi-supervised training of a cycle-GAN produced a segmentation accuracy of 0.66, computed using the Dice similarity coefficient (DSC). Our method trained with only synthesized MRIs produced an accuracy of 0.74, while the same method trained in a semi-supervised setting produced the best accuracy of 0.80 on the test set. Our results show that tumor-aware adversarial domain adaptation helps achieve reasonably accurate cancer segmentation from limited MRI data by leveraging large CT datasets.
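The DSC values quoted above follow the standard overlap formula, DSC = 2|A ∩ B| / (|A| + |B|), which can be computed directly from binary masks (a generic sketch, not code from the paper):

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0

a = np.zeros((8, 8), int); a[2:6, 2:6] = 1   # predicted tumor mask (16 px)
b = np.zeros((8, 8), int); b[3:7, 3:7] = 1   # reference tumor mask (16 px)
print(round(dice_score(a, b), 4))  # 0.5625: 2*9 overlapping px / (16 + 16)
```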

Jue Jiang, Yu-Chi Hu, Neelam Tyagi, Pengpeng Zhang, Andreas Rimner, Gig S. Mageras, Joseph O. Deasy, Harini Veeraraghavan

From Local to Global: A Holistic Lung Graph Model

Lung image analysis is an essential part of the assessment of pulmonary diseases. Through visual inspection of CT scans, radiologists detect abnormal patterns in the lung parenchyma, aiming to establish a timely diagnosis and thus improve patient outcome. However, in a generalized disorder of the lungs, such as pulmonary hypertension, the changes in organ tissue can be elusive, requiring additional invasive studies to confirm the diagnosis. We present a graph model that quantifies lung texture in a holistic approach, improving the discrimination between pathologies with similar local changes. The approach extracts local state-of-the-art 3D texture descriptors from an automatically generated geometric parcellation of the lungs. The global texture distribution is encoded in a weighted graph that characterizes the correlations among neighboring organ regions. A dataset of 125 patients with suspicion of a pulmonary vascular pathology was used to evaluate our method. Three classes containing 47 pulmonary hypertension, 31 pulmonary embolism and 47 control cases were classified in a one-vs.-one setup. An area under the curve of up to 0.85 was obtained by adding directionality to the edges of the graph architecture. The approach was able to identify diseased patients and to distinguish pathologies with abnormal local and global blood perfusion defects.

Yashin Dicente Cid, Oscar Jiménez-del-Toro, Alexandra Platon, Henning Müller, Pierre-Alexandre Poletti

S4ND: Single-Shot Single-Scale Lung Nodule Detection

The most recent lung nodule detection studies rely on computationally expensive multi-stage frameworks to detect nodules in CT scans. To address this computational challenge and provide better performance, in this paper we propose S4ND, a new deep learning based method for lung nodule detection. Our approach uses a single feed-forward pass of a single network for detection. The whole detection pipeline is designed as a single 3D Convolutional Neural Network (CNN) with dense connections, trained in an end-to-end manner. S4ND does not require any further post-processing or user guidance to refine detection results. Experimentally, we compared our network with the current state-of-the-art object detection network in computer vision (SSD) as well as the state-of-the-art published method for lung nodule detection (3D DCNN). Using the 888 publicly available CT scans from the LUNA challenge dataset, we showed that the proposed method outperforms the current literature both in efficiency and accuracy, achieving an average FROC score of 0.897. We also provide an in-depth analysis of our proposed network to shed light on the unclear paradigms of tiny object detection.

Naji Khosravan, Ulas Bagci

Vascular Network Organization via Hough Transform (VaNgOGH): A Novel Radiomic Biomarker for Diagnosis and Treatment Response

As a “hallmark of cancer”, tumor-induced angiogenesis is one of the most important mechanisms of a tumor’s adaptation to changes in nutrient requirements. The angiogenic activity of certain tumors has been found to be predictive of a patient’s ultimate response to therapeutic intervention. This raises the question of whether there are differences in vessel arrangement, and corresponding convolutedness, between tumors that appear phenotypically similar but respond differently to treatment. Even though textural radiomics and deep learning-based approaches have been shown to distinguish disease aggressiveness and assess therapeutic response, these descriptors do not specifically capture differences in vessel characteristics. Moreover, most existing approaches have attempted to model disease characteristics just within tumor confines, or right outside them, but do not consider explicit parenchymal vessel morphology. In this work, we introduce VaNgOGH (Vascular Network Organization via Hough transform), a new descriptor of the architectural disorder of the tumor’s vascular network. We demonstrate the efficacy of VaNgOGH on two clinically challenging problems: (a) predicting pathologically complete response (pCR) in breast cancer prior to treatment (BCa, N = 76) and (b) distinguishing benign nodules from malignant non-small cell lung cancer (LCa, N = 81). For both tasks, VaNgOGH had a test area under the receiver operating characteristic curve (AUC of 0.75 for BCa and 0.68 for LCa) higher than, or comparable to, state-of-the-art radiomic approaches (0.75 and 0.62, respectively) and convolutional neural networks (0.67 and 0.66). Interestingly, when a known radiomic signature was used in conjunction with VaNgOGH, the BCa AUC increased to 0.79.

Nathaniel Braman, Prateek Prasanna, Mehdi Alilou, Niha Beig, Anant Madabhushi

DeepEM: Deep 3D ConvNets with EM for Weakly Supervised Pulmonary Nodule Detection

Recently, deep learning has witnessed widespread adoption in various medical image applications. However, training complex deep neural nets requires large-scale datasets labeled with ground truth, which are often unavailable in many medical image domains. For instance, to train a deep neural net to detect pulmonary nodules in lung computed tomography (CT) images, current practice is to manually label nodule locations and sizes in many CT images to construct a sufficiently large training dataset, which is costly and difficult to scale. On the other hand, electronic medical records (EMR) contain plenty of partial information on the content of each medical image. In this work, we explore how to tap this vast, but currently unexplored, data source to improve pulmonary nodule detection. We propose DeepEM, a novel deep 3D ConvNet framework augmented with expectation-maximization (EM), to mine weakly supervised labels in EMRs for pulmonary nodule detection. Experimental results show that DeepEM leads to 1.5% and 3.9% average improvements in free-response receiver operating characteristic (FROC) scores on the LUNA16 and Tianchi datasets, respectively, demonstrating the utility of incomplete information in EMRs for improving deep learning algorithms.

Wentao Zhu, Yeeleng S. Vang, Yufang Huang, Xiaohui Xie

Statistical Framework for the Definition of Emphysema in CT Scans: Beyond Density Mask

Lung parenchyma destruction (emphysema) is a major factor in the description of Chronic Obstructive Pulmonary Disease (COPD) and its prognosis. It is defined as an abnormal enlargement of air spaces distal to the terminal bronchioles and the destruction of alveolar walls. In CT imaging, the presence of emphysema is observed as a local decrease in lung density, and the diagnosis is usually established when more than 5% of the lung falls below −950 HU, the so-called emphysema density mask. There is still debate, however, about the definition of this percentage, and many researchers set it depending on the population under study. Additionally, the −950 HU threshold may vary depending on factors such as the slice thickness or the respiratory phase of the acquisition. In this paper we propose (1) a statistical framework that provides an automatic definition of the density threshold based on a statistical characterization of air and lung parenchyma, and (2) a statistical test for emphysema detection that accounts for the CT noise characteristics. Results show that this novel statistical framework improves the quantification of emphysema against a visual reference and improves the association of emphysema with pulmonary function tests.
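The classic density mask criterion that the paper improves upon is simple to state in code: compute the percentage of lung voxels below −950 HU and compare it with the 5% cutoff (a generic sketch of the baseline, not the proposed statistical framework):

```python
import numpy as np

def emphysema_percentage(lung_hu, threshold=-950.0):
    """Classic density mask: percentage of lung voxels whose attenuation
    (in Hounsfield units) falls below the threshold."""
    lung_hu = np.asarray(lung_hu)
    return 100.0 * np.count_nonzero(lung_hu < threshold) / lung_hu.size

# Toy lung region: 100 voxels, 8 of them below -950 HU.
voxels = np.full(100, -860.0)
voxels[:8] = -970.0
pct = emphysema_percentage(voxels)
print(pct, pct > 5.0)  # 8.0 True -> flagged as emphysema under the 5% rule
```

The paper's contribution is precisely to replace the fixed `threshold` and fixed 5% cutoff with values derived from a statistical characterization of air and parenchyma, and with a noise-aware test.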

Gonzalo Vegas-Sánchez-Ferrero, Raúl San José Estépar
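The classical density-mask rule that this work moves beyond can be computed in a few lines. This is a minimal sketch of the standard fixed-threshold baseline (the function name is ours), not the proposed statistical framework:

```python
import numpy as np

def emphysema_density_mask(hu_volume, lung_mask, threshold_hu=-950.0):
    """Classic density-mask score: fraction of lung voxels whose
    attenuation falls below the HU threshold. Emphysema is
    conventionally flagged when this fraction exceeds about 5%."""
    lung_hu = hu_volume[lung_mask]          # restrict to segmented lung
    return float((lung_hu < threshold_hu).mean())
```

The paper's contribution is to replace the fixed `threshold_hu` with a threshold derived from the statistical characterization of air and parenchyma in each scan.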

Cardiac, Chest and Abdominal Applications: Breast Imaging Applications


Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification

This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns the intrinsic features of tumors while the adversarial network enforces segmentations to be similar to the ground truth. Experiments performed on dozens of malignant tumors extracted from the public DDSM dataset and from our in-house private dataset confirm our hypothesis with very high Dice coefficient and Jaccard index (>94% and >89%, respectively), outperforming the scores obtained by other state-of-the-art approaches. Furthermore, in order to portray significant morphological features of the segmented tumor, a specific Convolutional Neural Network (CNN) has also been designed for classifying the segmented tumor areas into four shape types (irregular, lobular, oval and round), which provides an overall accuracy of about 72% on the DDSM dataset.

Vivek Kumar Singh, Santiago Romani, Hatem A. Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Saddam Abdulwahab, Jordina Torrents-Barrena, Adel Saleh, Miguel Arquez, Meritxell Arenas, Domenec Puig
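For reference, the Dice coefficient and Jaccard index reported above are simple overlap ratios between the predicted and ground-truth binary masks; a minimal sketch (the function name is ours, and empty masks are not handled):

```python
import numpy as np

def dice_jaccard(pred, truth):
    """Overlap scores between two non-empty binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    jaccard = inter / union
    return float(dice), float(jaccard)
```

The two metrics are monotonically related (Dice = 2J/(1+J)), which is why the >94% Dice and >89% Jaccard figures move together.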

A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer

The TNM stage is the major determinant of breast cancer prognosis and treatment. The essential part of TNM stage classification is whether the cancer has metastasized to the regional lymph nodes (N-stage). Pathologic N-stage (pN-stage) is commonly determined by pathologists detecting metastases in histological slides. However, this diagnostic procedure is prone to misinterpretation and normally requires extensive time from pathologists because of the sheer volume of data that needs thorough review. Automated detection of lymph node metastasis and pN-stage prediction has great potential to reduce this workload and help the pathologist. Recent advances in convolutional neural networks (CNN) have brought significant improvements in histological slide analysis, but accuracy is not yet optimal because of the difficulty of handling gigapixel images. In this paper, we propose a robust and effective method for metastasis detection and pN-stage classification in breast cancer from multiple gigapixel pathology images. The pN-stage is predicted by combining a patch-level CNN-based metastasis detector with a slide-level lymph node classifier. The proposed framework achieves a state-of-the-art quadratic weighted kappa score of 0.9203 on the Camelyon17 dataset, outperforming the previous winning method of the Camelyon17 challenge.

Byungjae Lee, Kyunghyun Paeng
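The quadratic weighted kappa used for the Camelyon17 ranking measures agreement on an ordinal scale (here, pN-stages), penalizing disagreements by their squared distance. This is a generic sketch of the metric, not the challenge's evaluation code:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels:
    1 means perfect agreement, 0 chance-level, negative worse than chance."""
    O = np.zeros((n_classes, n_classes))       # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    hist_t = O.sum(axis=1)
    hist_p = O.sum(axis=0)
    E = np.outer(hist_t, hist_p) / len(y_true)  # expected matrix by chance
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2     # quadratic penalty weights
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Because the penalty grows quadratically with distance, confusing pN0 with pN3 hurts the score far more than confusing adjacent stages.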

3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes

While deep convolutional neural networks (CNN) have been successfully applied to 2D image analysis, it is still challenging to apply them to 3D medical images, especially when the within-slice resolution is much higher than the between-slice resolution. We propose a 3D Anisotropic Hybrid Network (AH-Net) that transfers convolutional features learned from 2D images to 3D anisotropic volumes. Such a transfer inherits the desired strong generalization capability for within-slice information while naturally exploiting between-slice information for more effective modelling. We experiment with the proposed 3D AH-Net on two different medical image analysis tasks, namely lesion detection from Digital Breast Tomosynthesis volumes, and liver and liver tumor segmentation from Computed Tomography volumes, and obtain state-of-the-art results.

Siqi Liu, Daguang Xu, S. Kevin Zhou, Olivier Pauly, Sasa Grbic, Thomas Mertelmeier, Julia Wicklein, Anna Jerebko, Weidong Cai, Dorin Comaniciu

Deep Generative Breast Cancer Screening and Diagnosis

Mammography is the primary modality for breast cancer screening, aiming to reduce breast cancer mortality through early detection. However, robust screening that is less hampered by misdiagnoses remains a challenge. Deep learning methods have shown strong applicability to various medical image datasets, primarily thanks to their powerful feature learning capability. Such successful applications are, however, often overshadowed by limitations in real medical settings: dependency on lesion annotations and discrepancies in data type between the training and deployment datasets. To address these critical challenges, we developed DiaGRAM (Deep GeneRAtive Multi-task), which is built upon a combination of Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). The enhanced feature learning with the GAN, combined with hybrid training on both regions of interest (ROI) and whole images, results in higher classification performance and an effective end-to-end scheme. DiaGRAM is capable of robust prediction, even for a small dataset without lesion annotations, via its transfer learning capacity. DiaGRAM achieves an AUC of 88.4% on DDSM and 92.5% on the challenging INbreast dataset despite its small size.

Shayan Shams, Richard Platania, Jian Zhang, Joohyun Kim, Kisung Lee, Seung-Jong Park

Integrate Domain Knowledge in Training CNN for Ultrasonography Breast Cancer Diagnosis

Breast cancer is the most common cancer in women, and ultrasound imaging is one of the most widely used approaches for its diagnosis. In this paper, we propose to adopt Convolutional Neural Networks (CNN) to classify ultrasound images and predict tumor malignancy. CNNs are successful algorithms for image recognition tasks and have achieved human-level performance in real applications. To improve the performance of the CNN in breast cancer diagnosis, we integrated domain knowledge and conducted multi-task learning during training. After training, a radiologist visually inspected the class activation maps of the last convolutional layer of the trained network to evaluate the results. Our results show that the CNN classifier not only gives reasonable performance in predicting breast cancer, but also proposes potential lesion regions, which could be integrated into breast ultrasound systems in the future.

Jiali Liu, Wanyu Li, Ningbo Zhao, Kunlin Cao, Youbing Yin, Qi Song, Hanbo Chen, Xuehao Gong

Small Lesion Classification in Dynamic Contrast Enhancement MRI for Breast Cancer Early Detection

Classification of small lesions is of great importance for the early detection of breast cancer. The small size of lesions makes handcrafted features ineffective for practical applications, while the relatively small datasets also pose challenges for deep learning based classification methods. Dynamic Contrast Enhancement MRI (DCE-MRI) is widely used for women at high risk of breast cancer, and dynamic features become more important in the case of small lesions. To extract more dynamic information, we propose a method for processing sequence data to encode the DCE-MRI, and design a new structure, dense convolutional LSTM, by adding a dense block to the convolutional LSTM unit. Faced with the huge number of parameters in a deep neural network, we add semantic priors as constraints to improve generalization performance. Four latent attributes are extracted from diagnostic reports and pathological results and are predicted together with the benign-or-malignant classification. Predicting the latent attributes as auxiliary tasks helps the training of the deep neural network, which makes it possible to train a complex network with a small dataset and achieve a satisfactory result. Our method improves the accuracy from 0.625, obtained by ResNet, to 0.847.

Hao Zheng, Yun Gu, Yulei Qin, Xiaolin Huang, Jie Yang, Guang-Zhong Yang

Thermographic Computational Analyses of a 3D Model of a Scanned Breast

Breast cancer is the most common type of cancer among women. Cancer cells are characterized by higher metabolic activity and superior vascularization compared to healthy cells. The internal heat generated by tumors travels to the skin surface, where an infrared camera is capable of detecting small temperature variations on the dermal surface. Breast cancer diagnosis using only thermal images is still not accepted by the medical community, which makes another exam necessary to confirm the disease. This work presents a methodology that allows identification of breast cancer using only simulated thermal images. Experiments are performed on a three-dimensional breast geometry obtained with 3D digital scanning. The procedure starts with the 3D scanning of a model of a real female breast using a Picza LPX-600RE 3D Laser Scanner to generate the breast's virtual geometry. This virtual 3D model is then used to simulate the heat transfer phenomena using a finite element model (FEM). The simulated thermal images of the breast surface are obtained via the FEM model. Based on the temperature difference between a healthy breast and a breast with cancer, it is possible to identify the presence of a tumor by analyzing the largest thermal amplitudes. Results obtained with the FEM model indicate that it is possible to identify breast cancer using only infrared images.

Alisson Augusto Azevedo Figueiredo, Gabriela Lima Menegaz, Henrique Coelho Fernandes, Gilmar Guimaraes

Y-Net: Joint Segmentation and Classification for Diagnosis of Breast Biopsy Images

In this paper, we introduce a conceptually simple network for generating discriminative tissue-level segmentation masks for the purpose of breast cancer diagnosis. Our method efficiently segments different types of tissues in breast biopsy images while simultaneously predicting a discriminative map for identifying important areas in an image. Our network, Y-Net, extends and generalizes U-Net by adding a parallel branch for discriminative map generation and by supporting convolutional block modularity, which allows the user to adjust network efficiency without altering the network topology. Y-Net delivers state-of-the-art segmentation accuracy while learning 6.6× fewer parameters than its closest competitors. The added descriptive power of Y-Net's discriminative segmentation masks improves diagnostic classification accuracy by 7% over state-of-the-art methods for diagnostic classification. Source code is publicly available.

Sachin Mehta, Ezgi Mercan, Jamen Bartlett, Donald Weaver, Joann G. Elmore, Linda Shapiro

Cardiac, Chest and Abdominal Applications: Other Abdominal Applications


AutoDVT: Joint Real-Time Classification for Vein Compressibility Analysis in Deep Vein Thrombosis Ultrasound Diagnostics

We propose a dual-task convolutional neural network (CNN) to fully automate the real-time diagnosis of deep vein thrombosis (DVT). DVT can be reliably diagnosed through evaluation of vascular compressibility at anatomically defined landmarks in streams of ultrasound (US) images. The combined real-time evaluation of these tasks has never been achieved before. As proof of concept, we evaluate our approach on two selected landmarks of the femoral vein, which can be identified with high accuracy by our approach. Our CNN is able to identify whether a vein fully compresses, with an F1 score of more than 90%, while manual pressure is applied with the ultrasound probe. Fully compressible veins robustly rule out DVT, and such patients do not need to be referred for further specialist examination. We have evaluated our method on 1150 compression image sequences of 5–10 s from 115 healthy volunteers, which results in a dataset of approximately 200k labelled images. Our method yields a theoretical inference frame rate of more than 500 fps, and we thoroughly evaluate the performance of 15 possible configurations.

Ryutaro Tanno, Antonios Makropoulos, Salim Arslan, Ozan Oktay, Sven Mischkewitz, Fouad Al-Noor, Jonas Oppenheimer, Ramin Mandegaran, Bernhard Kainz, Mattias P. Heinrich
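The F1 score reported for compressibility classification is the harmonic mean of precision and recall; a minimal sketch computed from raw counts (the counting granularity, per frame or per sequence, is not specified here, so the function takes plain counts):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts of
    true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An F1 above 90% thus requires both precision and recall to be high, which matters clinically: missed incompressible veins (false negatives for DVT) are the costly error.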

MRI Measurement of Placental Perfusion and Fetal Blood Oxygen Saturation in Normal Pregnancy and Placental Insufficiency

The placenta is essential for a successful pregnancy outcome. Inadequate placental development leads to poor placental perfusion and placental insufficiency, responsible for one third of antenatal stillbirths. Current imaging modalities provide poor clinical assessment of placental perfusion and pregnancy outcome. In this work we propose a technique to estimate the vascular properties of retro-placental myometrial and placental perfusion. The fetal blood saturation is a relative unknown, so we describe a method to simultaneously estimate the fetal blood volume and the fetal blood T2 relaxation time, from which this parameter can be derived. This information may prove useful for predicting if and when a placenta will fail, and thus when a small baby must be delivered to have the best neurological outcome. We report differences in vascular compartments and saturation values observed between 5 normal pregnancies and 2 complicated by placental insufficiency.

Rosalind Aughwane, Magdalena Sokolska, Alan Bainbridge, David Atkinson, Giles Kendall, Jan Deprest, Tom Vercauteren, Anna L. David, Sébastien Ourselin, Andrew Melbourne

Automatic Lacunae Localization in Placental Ultrasound Images via Layer Aggregation

Accurate localization of structural abnormalities is a precursor for image-based prenatal assessment of adverse conditions. For clinical screening and diagnosis of abnormally invasive placenta (AIP), a life-threatening obstetric condition, qualitative and quantitative analysis of ultrasonic patterns correlated to placental lesions such as placental lacunae (PL) is challenging and time-consuming to perform even for experienced sonographers. There is a need for automated placental lesion localization that does not rely on expensive human annotations such as detailed manual segmentation of anatomical structures. In this paper, we investigate PL localization in 2D placental ultrasound images. First, we demonstrate the effectiveness of generating confidence maps from weak dot annotations in localizing PL as an alternative to expensive manual segmentation. Then we propose a layer aggregation structure based on iterative deep aggregation (IDA) for PL localization. Models with this structure were evaluated with 10-fold cross-validations on an AIP database (containing 3,440 images with 9,618 labelled PL from 23 AIP and 11 non-AIP participants). Experimental results demonstrate that the model with the proposed structure yielded the highest mean average precision (mAP = 35.7%), surpassing all other baseline models (32.6%, 32.2%, 29.7%). We argue that features from shallower stages can contribute to PL localization more effectively using the proposed structure. To our knowledge, this is the first successful application of machine learning to placental lesion analysis and has the potential to be adapted for other clinical scenarios in breast, liver, and prostate cancer imaging.

Huan Qi, Sally Collins, J. Alison Noble

A Decomposable Model for the Detection of Prostate Cancer in Multi-parametric MRI

Institutions that specialize in prostate MRI acquire different MR sequences owing to variability in scanning procedure and scanner hardware. We propose a novel prostate cancer detector that can operate in the absence of MR imaging sequences. Our novel prostate cancer detector first trains a forest of random ferns on all MR sequences and then decomposes these random ferns into a sum of MR sequence-specific random ferns enabling predictions to be made in the absence of one or more of these MR sequences. To accomplish this, we first show that a sum of random ferns can be exactly represented by another random fern and then we propose a method to approximately decompose an arbitrary random fern into a sum of random ferns. We show that our decomposed detector can maintain good performance when some MR sequences are omitted.

Nathan Lay, Yohannes Tsehay, Yohan Sumathipala, Ruida Cheng, Sonia Gaur, Clayton Smith, Adrian Barbu, Le Lu, Baris Turkbey, Peter L. Choyke, Peter Pinto, Ronald M. Summers
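The representability claim above, that a sum of random ferns can be exactly represented by another random fern, can be illustrated with a toy example in which each fern is a lookup table indexed by a tuple of binary feature tests; the combined fern indexes on the concatenated tests. This is only an illustration of the claim, not the paper's decomposition algorithm (which goes in the opposite, approximate direction):

```python
from itertools import product

def fern_predict(table, feats):
    """A random fern maps a tuple of binary test outcomes to a score."""
    return table[tuple(feats)]

def sum_ferns(table_a, n_feats_a, table_b, n_feats_b):
    """Build a single fern over the concatenated feature tests whose
    output equals the sum of the two input ferns' outputs."""
    combined = {}
    for bits_a in product((0, 1), repeat=n_feats_a):
        for bits_b in product((0, 1), repeat=n_feats_b):
            combined[bits_a + bits_b] = table_a[bits_a] + table_b[bits_b]
    return combined
```

Note the combined table has 2^(n_feats_a + n_feats_b) entries, the product of the two input table sizes, which is why going the other way, decomposing one fern into a sum of smaller sequence-specific ferns, generally requires an approximation.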

Direct Automated Quantitative Measurement of Spine via Cascade Amplifier Regression Network

Automated quantitative measurement of the spine (i.e., estimation of multiple indices such as heights, widths, and areas of the vertebral bodies and discs) is of the utmost importance in the diagnosis of clinical spinal diseases such as osteoporosis, intervertebral disc degeneration, and lumbar disc herniation, yet remains a significant challenge due to the variability of spine structures and the high dimensionality of the indices to be estimated. In this paper, we propose a novel cascade amplifier regression network (CARN), which includes the CARN architecture and a local shape-constrained manifold regularization (LSCMR) loss function, to achieve accurate, direct, automated multiple-index estimation. The CARN architecture is composed of a cascade amplifier network (CAN) for expressive feature embedding and a linear regression model for multiple-index estimation. The CAN consists of cascaded amplifier units (AUs), which are used for selective feature reuse by stimulating effective features and suppressing redundant features while propagating feature maps between adjacent layers, thus obtaining an expressive feature embedding. During training, the LSCMR is utilized to alleviate overfitting and generate realistic estimates by learning the distribution of the multiple indices. Experiments on MR images of 195 subjects show that the proposed CARN achieves impressive performance, with mean absolute errors of 1.2496 ± 1.0624 mm, 1.2887 ± 1.0992 mm, and 1.2692 ± 1.0811 mm for the estimation of 15 disc heights, 15 vertebral body heights, and all indices, respectively. The proposed method has great potential in the diagnosis of clinical spinal diseases.

Shumao Pang, Stephanie Leung, Ilanit Ben Nachum, Qianjin Feng, Shuo Li

Estimating Achilles Tendon Healing Progress with Convolutional Neural Networks

Quantitative assessment of treatment progress in the healing of the Achilles tendon - injuries to which are among the most common musculoskeletal disorders in modern medical practice - is typically a long and complex process: multiple MRI protocols need to be acquired and analysed by radiology experts for proper assessment. In this paper, we propose to significantly reduce the complexity of this process with a novel method based on a pre-trained convolutional neural network. We first train our neural network on over 500,000 2D axial cross-sections from over 3,000 3D MRI studies to classify MRI images as belonging to a healthy or injured class, depending on the patient's condition. We then take the outputs of a modified pre-trained network and apply linear regression on the PCA-reduced feature space to assess treatment progress. Our method allows reducing the amount of data that needs to be registered during the MRI scan by up to 5-fold without any information loss. Furthermore, we are able to predict the healing-process phase with accuracy equal to human experts in 3 out of 6 main criteria. Finally, contrary to current approaches to healing assessment that rely on a radiologist's subjective opinion, our method allows objective comparison of different treatment methods, which can lead to faster patient recovery.

Norbert Kapinski, Jakub Zielinski, Bartosz A. Borucki, Tomasz Trzcinski, Beata Ciszkowska-Lyson, Krzysztof S. Nowinski

