
2019 | Book

Machine Learning in Medical Imaging

10th International Workshop, MLMI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13, 2019, Proceedings


About this book

This book constitutes the proceedings of the 10th International Workshop on Machine Learning in Medical Imaging, MLMI 2019, held in conjunction with MICCAI 2019, in Shenzhen, China, in October 2019.

The 78 papers presented in this volume were carefully reviewed and selected from 158 submissions. They focus on major trends and challenges in the area, aiming to identify new cutting-edge techniques and their uses in medical imaging. Topics addressed include: deep learning, generative adversarial learning, ensemble learning, sparse learning, multi-task learning, multi-view learning, manifold learning, and reinforcement learning, with their applications to medical image analysis, computer-aided detection and diagnosis, multi-modality fusion, image reconstruction, image retrieval, cellular image analysis, molecular imaging, digital pathology, etc.

Table of Contents

Frontmatter
Brain MR Image Segmentation in Small Dataset with Adversarial Defense and Task Reorganization

Medical image segmentation is challenging, especially when dealing with small datasets of 3D MR images. Encoding the variation of brain anatomical structures across individual subjects is difficult, and the challenge is compounded by the limited number of well-labeled subjects available for training. In this study, we aim to address the issue of brain MR image segmentation with small datasets. First, concerning the limited number of training images, we adopt adversarial defense to augment the training data and thereby increase the robustness of the network. Second, inspired by prior knowledge of neural anatomies, we reorganize the segmentation tasks of different regions into several groups in a hierarchical way. Third, the task reorganization extends to the semantic level, as we incorporate an additional object-level classification task to contribute high-order visual features toward the pixel-level segmentation task. In experiments we validate our method by segmenting gray matter, white matter, and several major regions on a challenge dataset. With only seven subjects for training, the proposed method achieves a Dice score of 84.46% on the onsite test set.

Xuhua Ren, Lichi Zhang, Dongming Wei, Dinggang Shen, Qian Wang
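As a concrete illustration of the adversarial-defense augmentation described above, the sketch below perturbs a training batch with the fast gradient sign method (FGSM). The paper does not specify which attack it uses; FGSM, the `model` interface, and the `epsilon` budget are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def fgsm_augment(model, images, labels, epsilon=0.01):
    """Return adversarially perturbed copies of a training batch (FGSM).

    Training on these images alongside the clean ones is one common way
    to realize adversarial-defense data augmentation; the paper's exact
    attack and budget may differ.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    return (images + epsilon * images.grad.sign()).detach()
```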
Spatial Regularized Classification Network for Spinal Dislocation Diagnosis

Spinal dislocation diagnosis manifests the typical characteristics of fine-grained visual categorization tasks, i.e., low inter-class variance and high intra-class variance. A purely data-driven approach towards an automated spinal dislocation diagnosis method would demand not only a large volume of training data but also fine-grained labels, which is impractical in medical scenarios. In this paper, we attempt to utilize the expert knowledge that the spinal edges are crucial for dislocation diagnosis to guide model training, and explore a data-knowledge dual-driven approach for spinal dislocation diagnosis. Specifically, to embed the expert knowledge into the classification networks, we introduce a spatial regularization term to constrain the location of the discriminative regions of spinal CT images. Extensive experimental analysis has shown that the proposed method gains 0.18%–4.79% in AUC, and the gain is more significant for smaller training sets. Moreover, the spatial regularization yields more discriminative and interpretable features.

Bolin Lai, Shiqi Peng, Guangyu Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang, Hui Zhao
Globally-Aware Multiple Instance Classifier for Breast Cancer Screening

Deep learning models designed for visual classification tasks on natural images have become prevalent in medical image analysis. However, medical images differ from typical natural images in many ways, such as significantly higher resolutions and smaller regions of interest. Moreover, both the global structure and local details play important roles in medical image analysis tasks. To address these unique properties of medical images, we propose a neural network that is able to classify breast cancer lesions utilizing information from both a global saliency map and multiple local patches. The proposed model outperforms the ResNet-based baseline and achieves radiologist-level performance in the interpretation of screening mammography. Although our model is trained only with image-level labels, it is able to generate pixel-level saliency maps that provide localization of possible malignant findings.

Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
Advancing Pancreas Segmentation in Multi-protocol MRI Volumes Using Hausdorff-Sine Loss Function

Computing pancreatic morphology in 3D radiological scans could provide significant insight into a medical condition. However, segmenting the pancreas in magnetic resonance imaging (MRI) remains challenging due to high inter-patient variability. Also, the resolution and speed of MRI scanning introduce artefacts that blur the pancreas boundaries between overlapping anatomical structures. This paper proposes a dual-stage automatic segmentation method: (1) a deep neural network is trained to address the problem of vague organ boundaries in highly class-imbalanced data; this network integrates a novel loss function to rigorously optimise boundary delineation using the modified Hausdorff metric and a sinusoidal component; (2) given a test MRI volume, the output of the trained network predicts a sequence of targeted 2D pancreas classes that are reconstructed as a volumetric binary mask. An energy-minimisation approach fuses a learned digital contrast model to suppress the intensities of non-pancreas classes, which, combined with the binary volume, performs a refined segmentation in 3D while revealing dense boundary detail. Experiments are performed on two diverse MRI datasets containing 180 and 120 scans, on which the proposed approach achieves mean Dice scores of 84.1 ± 4.6% and 85.7 ± 2.3%, respectively. This approach is statistically stable and outperforms state-of-the-art methods on MRI.

Hykoush Asaturyan, E. Louise Thomas, Julie Fitzpatrick, Jimmy D. Bell, Barbara Villarini
WSI-Net: Branch-Based and Hierarchy-Aware Network for Segmentation and Classification of Breast Histopathological Whole-Slide Images

This paper proposes WSI-Net, a novel network for segmentation and classification of gigapixel breast whole-slide images (WSIs). WSI-Net can segment patches from a WSI into three types: non-malignant, ductal carcinoma in situ, and invasive ductal carcinoma. It adds a parallel classification branch on top of the lower layers of the semantic segmentation model DeepLab. This branch can quickly identify and discard non-malignant patches in advance, so that the higher layers of DeepLab focus only on the remaining, possibly cancerous inputs. This strategy accelerates inference and robustly improves segmentation performance. For training WSI-Net, a hierarchy-aware loss function is proposed that combines pixel-level and patch-level losses, capturing the pathological hierarchical relationships between pixels in each patch. By aggregating patch segmentation results from WSI-Net, we generate a segmentation map for the WSI and extract its morphological features for WSI-level classification. Experimental results show that WSI-Net is fast, robust, and effective on our benchmark dataset.

Haomiao Ni, Hong Liu, Kuansong Wang, Xiangdong Wang, Xunjian Zhou, Yueliang Qian
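A minimal sketch of a hierarchy-aware objective of the kind described above, combining a pixel-level segmentation loss with a patch-level classification loss. The weighting `lam` and the choice of cross-entropy for both terms are assumptions, not the paper's published formulation.

```python
import torch.nn.functional as F

def hierarchy_aware_loss(pixel_logits, pixel_labels,
                         patch_logits, patch_labels, lam=1.0):
    """Combine pixel-level and patch-level supervision for one patch."""
    loss_pixel = F.cross_entropy(pixel_logits, pixel_labels)  # segmentation
    loss_patch = F.cross_entropy(patch_logits, patch_labels)  # classification
    return loss_pixel + lam * loss_patch
```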
Lesion Detection with Deep Aggregated 3D Contextual Feature and Auxiliary Information

Detecting different kinds of lesions in computed tomography (CT) scans at the same time is a difficult but important task for a computer-aided diagnosis (CADx) system. Compared to single-lesion detection methods, our lesion detection method considers additional intra-class differences. In this work, we present a CT image analysis framework for lesion detection. Our model is developed based on a dense region-based fully convolutional network (Dense R-FCN) using 3D context and is equipped with a dense auxiliary loss (DAL) scheme for end-to-end learning. It fuses shallow, medium, and deep features to meet the needs of detecting lesions of various sizes. Owing to its densely connected structure, it is called Dense R-FCN. Meanwhile, the DAL supervises the intermediate hidden layers in order to maximize the use of shallow-layer information, which benefits the detection results, especially for small lesions. Experimental results on the DeepLesion dataset corroborate the efficacy of our method.

Han Zhang, Albert C. S. Chung
MSAFusionNet: Multiple Subspace Attention Based Deep Multi-modal Fusion Network

It is common for doctors to simultaneously consider multi-modal information in diagnosis. However, how to use multi-modal medical images effectively has not been fully studied in the field of deep learning within such a context. In this paper, we address the task of end-to-end segmentation based on multi-modal data and propose a novel deep learning framework, the multiple subspace attention-based deep multi-modal fusion network (hereafter referred to as MSAFusionNet). More specifically, MSAFusionNet consists of three main components: (1) a multiple subspace attention model that contains inter-attention modules and generalized squeeze-and-excitation modules, (2) a multi-modal fusion network which leverages CNN-LSTM layers to integrate sequential multi-modal input images, and (3) a densely-dilated U-Net as the encoder-decoder backbone for image segmentation. Experiments on the ISLES 2018 dataset have shown that MSAFusionNet achieves state-of-the-art segmentation accuracy.

Sen Zhang, Changzheng Zhang, Lanjun Wang, Cixing Li, Dandan Tu, Rui Luo, Guojun Qi, Jiebo Luo
DCCL: A Benchmark for Cervical Cytology Analysis

Medical imaging analysis has witnessed impressive progress in recent years thanks to the development of large-scale labeled datasets. However, in many fields, including cervical cytology, a large well-annotated benchmark dataset remains missing. In this paper, we introduce by far the largest cervical cytology dataset, called Deep Cervical Cytological Lesions (DCCL). DCCL contains 14,432 image patches of around $$1{,}200\times 2{,}000$$ pixels, cropped from 1,167 whole slide images collected from four medical centers and scanned by one of three kinds of digital slide scanners. Besides patch-level labels, cell-level labels are provided: 27,972 lesion cells were labeled with bounding boxes according to The 2014 Bethesda System by six board-certified pathologists with eight years of experience on average. We also use deep learning models to generate baseline performance for lesion cell detection and cell type classification on DCCL. We believe this dataset can serve as a valuable resource and platform for researchers to develop new algorithms and pipelines for advanced cervical cancer diagnosis and prevention.

Changzheng Zhang, Dong Liu, Lanjun Wang, Yaoxin Li, Xiaoshi Chen, Rui Luo, Shuanlong Che, Hehua Liang, Yinghua Li, Si Liu, Dandan Tu, Guojun Qi, Pifu Luo, Jiebo Luo
Smartphone-Supported Malaria Diagnosis Based on Deep Learning

Malaria remains a major burden on global health, causing about half a million deaths every year. The objective of this work is to develop a fast, automated, smartphone-supported malaria diagnostic system. Our proposed system is the first to use both image processing and deep learning methods on a smartphone to detect malaria parasites in thick blood smears. The underlying detection algorithm is based on an iterative method for parasite candidate screening and a convolutional neural network (CNN) for feature extraction and classification. The system runs on Android phones and can process blood smear images taken by the smartphone camera when attached to the eyepiece of a microscope. We tested the system on 50 normal patients and 150 abnormal patients. The accuracies of the system at the patch level and patient level are 97% and 78%, respectively. The AUC values at the patch level and patient level are, respectively, 98% and 85%. Our system could aid malaria diagnosis in resource-limited regions, without depending on extensive diagnostic expertise or expensive diagnostic equipment.

Feng Yang, Hang Yu, Kamolrat Silamut, Richard J. Maude, Stefan Jaeger, Sameer Antani
Children’s Neuroblastoma Segmentation Using Morphological Features

Neuroblastoma (NB) is a common type of cancer in children that can develop in the neck, chest, or abdomen. It causes about 15% of cancer deaths in children. However, the automatic segmentation of NB in CT images has received little attention, mostly because children's CT images have much lower contrast than adults', especially for those aged less than one year. Furthermore, neuroblastomas can develop in different body parts and usually vary in size and have irregular shapes, which adds to the difficulty of NB segmentation. In view of these issues, we propose a morphologically constrained end-to-end NB segmentation approach that takes the sizes and shapes of tumors into consideration for more accurate boundaries. The morphological features of neuroblastomas are predicted as an auxiliary task while performing segmentation and used as additional supervision for the segmentation prediction. We collected 248 CT scans from distinct patients with manually annotated labels to establish a dataset for NB segmentation. Our method is evaluated on this dataset as well as the public BraTS 2018 dataset, and experimental results show that the morphological constraints can improve the performance of medical image segmentation networks.

Shengyang Li, Xiaoyun Zhang, Xiaoxia Wang, Yumin Zhong, Xiaofen Yao, Ya Zhang, Yanfeng Wang
GFD Faster R-CNN: Gabor Fractal DenseNet Faster R-CNN for Automatic Detection of Esophageal Abnormalities in Endoscopic Images

Esophageal cancer is ranked as the sixth most fatal cancer type. Most esophageal cancers are believed to arise from overlooked abnormalities in the esophagus. The early detection of these abnormalities is considered challenging due to their varied appearance and random location throughout the esophagus. In this paper, a novel Gabor Fractal DenseNet Faster R-CNN (GFD Faster R-CNN) is proposed: a two-input network adapted from Faster R-CNN to address the challenges of esophageal abnormality detection. First, a Gabor Fractal (GF) image is generated using various Gabor filter responses at different orientations and scales, obtained from the original endoscopic image, which strengthens the fractal texture information within the image. Secondly, we incorporate a Densely Connected Convolutional Network (DenseNet) as the backbone network to extract features from the original endoscopic image and the generated GF image separately; the DenseNet provides a reduction in trained parameters while supporting network accuracy and enabling a maximum flow of information. Features extracted from the GF and endoscopic images are fused through bilinear fusion before the ROI pooling stage in Faster R-CNN, providing a rich feature representation that boosts the performance of the final detection. The proposed architecture was trained and tested on two different datasets independently: Kvasir (1000 images) and MICCAI'15 (100 images). Extensive experiments have been carried out to evaluate the performance of the model, with a recall of 0.927 and precision of 0.942 for the Kvasir dataset, and a recall of 0.97 and precision of 0.92 for the MICCAI'15 dataset, demonstrating high detection performance compared to the state of the art.

Noha Ghatwary, Massoud Zolgharni, Xujiong Ye
Deep Active Lesion Segmentation

Lesion segmentation is an important problem in computer-assisted diagnosis that remains challenging due to the prevalence of low-contrast, irregular boundaries that are unamenable to shape priors. We introduce Deep Active Lesion Segmentation (DALS), a fully automated segmentation framework that leverages the powerful nonlinear feature extraction abilities of fully Convolutional Neural Networks (CNNs) and the precise boundary delineation abilities of Active Contour Models (ACMs). Our DALS framework benefits from an improved level-set ACM formulation with a per-pixel-parameterized energy functional and a novel multiscale encoder-decoder CNN that learns an initialization probability map along with parameter maps for the ACM. We evaluate our lesion segmentation model on a new Multiorgan Lesion Segmentation (MLS) dataset that contains images of various organs, including brain, liver, and lung, across different imaging modalities—MR and CT. Our results demonstrate favorable performance compared to competing methods, especially for small training datasets.

Ali Hatamizadeh, Assaf Hoogi, Debleena Sengupta, Wuyue Lu, Brian Wilcox, Daniel Rubin, Demetri Terzopoulos
Infant Brain Deformable Registration Using Global and Local Label-Driven Deep Regression Learning

Accurate image registration is important for quantifying dynamic brain development in the first year of life. However, deformable registration of infant brain magnetic resonance (MR) images is challenging because: (1) there are large anatomical and appearance variations in these longitudinal images; and (2) there is a one-to-many correspondence in appearance between global anatomical regions and the small local regions therein. In this paper, we apply a deformable registration scheme based on global and local label-driven learning with convolutional neural networks (CNNs). Two to-be-registered patches are fed into a U-Net-like regression network. A dense displacement field (DDF) is then obtained by optimizing the loss function between many pairs of label patches. Global and local label patch pairs are leveraged to drive registration only during the training stage. During inference, the resulting 3D DDF is obtained by inputting two new MR images to the trained network. The highlight is that the global tissues, i.e., white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF), and the local hippocampi are well aligned at the same time without any prior ground-truth deformation. Especially for the local hippocampi, the Dice ratios between two aligned images are greatly improved. Experimental results are reported for intra-subject and inter-subject registration of infant brain MR images between different time points, yielding higher accuracy in both global and local tissues compared with state-of-the-art registration methods.

Shunbo Hu, Lintao Zhang, Guoqiang Li, Mingtao Liu, Deqian Fu, Wenyin Zhang
A Relation Hashing Network Embedded with Prior Features for Skin Lesion Classification

Deep neural networks have become an effective tool for solving end-to-end classification problems and are suitable for many diagnostic settings. However, the success of such deep models often depends on a large number of training samples with annotations. Moreover, deep networks do not leverage the power of domain knowledge, which is usually essential for diagnostic decisions. Here we propose a novel relation hashing network via meta-learning to address the problem of skin lesion classification with prior features. In particular, we present a deep relation network to capture and memorize the relations among different samples. To employ prior domain knowledge, we construct hybrid-prior feature representations via joint meta-learning based on handcrafted models and deep-learned features. To utilize the fast and efficient computation of representation learning, we further create a hashing hybrid-prior feature representation by incorporating deep hashing into hybrid-prior representation learning, and then integrate it into our proposed network. Final recognition is obtained from our hashing relation network by learning to compare the hashing hybrid-prior features of samples. Experimental results on the ISIC Skin 2017 dataset demonstrate that our hashing relation network achieves state-of-the-art performance on the task of skin lesion classification.

Wenbo Zheng, Chao Gou, Lan Yan
End-to-End Adversarial Shape Learning for Abdomen Organ Deep Segmentation

Automatic segmentation of abdominal organs in medical imaging has many potential applications in clinical workflows. Recently, state-of-the-art performance for organ segmentation has been achieved by deep learning models, i.e., convolutional neural networks (CNNs). However, it is challenging to train conventional CNN-based segmentation models to be aware of the shape and topology of organs. In this work, we tackle this problem by introducing a novel end-to-end shape learning architecture, the organ point-network. It takes deep learning features as inputs and generates organ shape representations as points located on the organ surface. We then present a novel adversarial shape learning objective function to optimize the point-network to better capture shape information. We train the point-network together with a CNN-based segmentation model in a multi-task fashion so that the shared network parameters can benefit from both shape learning and segmentation tasks. We demonstrate our method on three challenging abdominal organs: liver, spleen, and pancreas. The point-network generates surface points with fine-grained details, which is found to be critical for improving organ segmentation. Consequently, the deep segmentation model is improved by the introduced shape learning, as significantly better Dice scores are observed for spleen and pancreas segmentation.

Jinzheng Cai, Yingda Xia, Dong Yang, Daguang Xu, Lin Yang, Holger Roth
Privacy-Preserving Federated Brain Tumour Segmentation

Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs.

Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M. Jorge Cardoso, Andrew Feng
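One standard differential-privacy mechanism in this setting clips each client's model update and adds calibrated Gaussian noise before the update is shared. The sketch below shows that pattern; the clipping norm, noise scale, and function name are illustrative assumptions rather than the paper's exact protocol.

```python
import torch

def privatize_update(update, clip_norm=1.0, noise_std=0.01):
    """Clip a client's update (list of tensors) to a global L2 norm and
    add Gaussian noise before sharing it with the federated server."""
    total_norm = torch.cat([p.flatten() for p in update]).norm()
    scale = min(1.0, clip_norm / (total_norm.item() + 1e-12))
    return [p * scale + noise_std * torch.randn_like(p) for p in update]
```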
Residual Attention Generative Adversarial Networks for Nuclei Detection on Routine Colon Cancer Histology Images

The automatic detection of nuclei in pathological images plays an important role in the diagnosis and prognosis of cancers. Most nuclei detection algorithms are based on the assumption that the nuclei centers should have larger responses than their surroundings in the probability map of the pathological image, which in turn transforms the detection or localization problem into finding the local maxima of the probability map. However, existing studies use regression algorithms to determine the probability map and neglect to take the spatial contiguity within the probability map into consideration. In order to capture the higher-order consistency within the generated probability map, we propose an approach called Residual Attention Generative Adversarial Network (RAGAN) for nuclei detection. Specifically, the objective function of the RAGAN model combines a detection term with an adversarial term. The adversarial term adopts a generator called Residual Attention U-Net (RAU-Net) to produce probability maps that cannot be distinguished from the ground truth. Based on the adversarial model, we can simultaneously estimate the probabilities of many pixels with high-order consistency, from which we can derive a more accurate probability map. We evaluate our method on a public colorectal adenocarcinoma image dataset with 29,756 nuclei. Experimental results show that our method achieves an F1 score of 0.847 (with a precision of 0.859 and a recall of 0.836) for nuclei detection, which is superior to conventional methods.

Junwei Li, Wei Shao, Zhongnian Li, Weida Li, Daoqiang Zhang
Semi-supervised Multi-task Learning with Chest X-Ray Images

Discriminative models that require full supervision are inefficacious in the medical imaging domain when large labeled datasets are unavailable. By contrast, generative modeling—i.e., learning data generation and classification—facilitates semi-supervised training with limited labeled data. Moreover, generative modeling can be advantageous in accomplishing multiple objectives for better generalization. We propose a novel multi-task learning model for jointly learning a classifier and a segmentor, from chest X-ray images, through semi-supervised learning. In addition, we propose a new loss function that combines absolute KL divergence with Tversky loss (KLTV) to yield faster convergence and better segmentation performance. Based on our experimental results using a novel segmentation model, an Adversarial Pyramid Progressive Attention U-Net (APPAU-Net), we hypothesize that KLTV can be more effective for generalizing multi-tasking models while being competitive in segmentation-only tasks.

Abdullah-Al-Zubaer Imran, Demetri Terzopoulos
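A sketch of a KLTV-style objective as described above: an absolute KL-divergence term plus a soft Tversky loss. The Tversky weights `alpha`/`beta`, the mixing weight `lam`, and the precise KL formulation are assumptions; `pred` and `target` are taken to be per-pixel probabilities.

```python
import torch

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-6):
    """Soft Tversky loss: 1 - TP / (TP + alpha*FN + beta*FP)."""
    tp = (pred * target).sum()
    fn = ((1 - pred) * target).sum()
    fp = (pred * (1 - target)).sum()
    return 1 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def kltv_loss(pred, target, lam=1.0, eps=1e-8):
    """Absolute KL divergence between target and prediction, plus the
    Tversky loss (one plausible reading of the KLTV combination)."""
    kl = (target * torch.log((target + eps) / (pred + eps))).sum().abs()
    return kl + lam * tversky_loss(pred, target)
```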
Novel Bi-directional Images Synthesis Based on WGAN-GP with GMM-Based Noise Generation

A novel WGAN-GP-based model is proposed in this study to fulfill bi-directional synthesis of medical images for the first time. GMM-based noise generated from the Glow model is newly incorporated into the WGAN-GP-based model to better reflect the heterogeneity commonly seen in medical images, which is beneficial for producing high-quality synthesized medical images. Both the conventional “down-sampling”-like synthesis and the more challenging “up-sampling”-like synthesis are realized through the newly introduced model, which is thoroughly evaluated both qualitatively and quantitatively against several popular deep learning-based models. The superiority of the new model is substantiated from a statistical perspective through a series of rigorous experiments on a multi-modal MRI database of 355 real demented patients.

Wei Huang, Mingyuan Luo, Xi Liu, Peng Zhang, Huijun Ding, Dong Ni
Pseudo-labeled Bootstrapping and Multi-stage Transfer Learning for the Classification and Localization of Dysplasia in Barrett’s Esophagus

Patients suffering from Barrett’s Esophagus (BE) are at an increased risk of developing esophageal adenocarcinoma, and early detection is crucial for a good prognosis. To aid endoscopists with the early detection of this preliminary stage of esophageal cancer, this work concentrates on improving the state of the art for the computer-aided classification and localization of dysplastic lesions in BE. To this end, we employ a large-scale endoscopic dataset, consisting of 494,355 images, to pre-train several instances of the proposed GastroNet architecture, after which several datasets that are increasingly closer to the target domain are used in a multi-stage transfer learning strategy. Finally, ensembling is used to evaluate the results on a prospectively gathered external test set. Results from the performed experiments show that the proposed model improves on the state of the art on all measured metrics. More specifically, compared to the best performing state-of-the-art model, the specificity is improved by more than 20% while preserving sensitivity at a high level, thereby reducing the false positive rate substantially. Our algorithm also significantly outperforms the state of the art on the localization metrics, where the intersection of all experts is correctly indicated in approximately 92% of the cases.

Joost van der Putten, Jeroen de Groof, Fons van der Sommen, Maarten Struyvenberg, Svitlana Zinger, Wouter Curvers, Erik Schoon, Jacques Bergman, Peter H. N. de With
Anatomy-Aware Self-supervised Fetal MRI Synthesis from Unpaired Ultrasound Images

Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for anomaly screening; for this, ultrasound (US) is employed. While expert sonographers are adept at reading US images, MR images are much easier for non-experts to interpret. Hence, in this paper we seek to produce images with MRI-like appearance directly from clinical US images. Our own clinical motivation is to seek a way to communicate US findings to patients or clinical professionals unfamiliar with US, but in medical image analysis such a capability is potentially useful, for instance, for US-MRI registration or fusion. Our model is self-supervised and end-to-end trainable. Specifically, based on the assumption that the US and MRI data share a similar anatomical latent space, we first utilise an extractor to determine shared latent features, which are then used for data synthesis. Since paired data was unavailable for our study (and is rare in practice), we propose to enforce the distributions to be similar, instead of employing pixel-wise constraints, by adversarial learning in both the image domain and the latent space. Furthermore, we propose an adversarial structural constraint to regularise the anatomical structures between the two modalities during synthesis. A cross-modal attention scheme is proposed to leverage non-local spatial correlations. The feasibility of the approach to produce realistic-looking MR images is demonstrated quantitatively and with a qualitative evaluation compared to real fetal MR images.

Jianbo Jiao, Ana I. L. Namburete, Aris T. Papageorghiou, J. Alison Noble
End-to-End Boundary Aware Networks for Medical Image Segmentation

Fully convolutional neural networks (CNNs) have proven to be effective at representing and classifying textural information, thus transforming image intensity into output class masks that achieve semantic image segmentation. In medical image analysis, however, expert manual segmentation often relies on the boundaries of anatomical structures of interest. We propose boundary aware CNNs for medical image segmentation. Our networks are designed to account for organ boundary information, both by providing a special network edge branch and edge-aware loss terms, and they are trainable end-to-end. We validate their effectiveness on the task of brain tumor segmentation using the BraTS 2018 dataset. Our experiments reveal that our approach yields more accurate segmentation results, which makes it promising for more extensive application to medical image segmentation.

Ali Hatamizadeh, Demetri Terzopoulos, Andriy Myronenko
Automatic Rodent Brain MRI Lesion Segmentation with Fully Convolutional Networks

Manual segmentation of rodent brain lesions from magnetic resonance images (MRIs) is an arduous, time-consuming and subjective task that is highly important in pre-clinical research. Several automatic methods have been developed for various human brain MRI segmentation tasks, but little research has targeted automatic rodent lesion segmentation. The existing tools for performing automatic lesion segmentation in rodents are constrained by strict assumptions about the data. Deep learning has been successfully used for medical image segmentation; however, no deep learning approach has been specifically designed to tackle rodent brain lesion segmentation. In this work, we propose a novel Fully Convolutional Network (FCN), RatLesNet, for the aforementioned task. Our dataset consists of 131 T2-weighted rat brain scans from 4 different studies in which ischemic stroke was induced by transient middle cerebral artery occlusion. We compare our method with two other 3D FCNs originally developed for anatomical segmentation (VoxResNet and 3D U-Net) with 5-fold cross-validation on a single study and a generalization test, where training was done on a single study and testing on the three remaining studies. The labels generated by our method were quantitatively and qualitatively better than the predictions of the compared methods. The average Dice coefficient achieved in the 5-fold cross-validation experiment with the proposed approach was 0.88, between 3.7% and 38% higher than the compared architectures. The presented architecture also outperformed the other FCNs at generalizing to different studies, achieving an average Dice coefficient of 0.79.

Juan Miguel Valverde, Artem Shatillo, Riccardo De Feo, Olli Gröhn, Alejandra Sierra, Jussi Tohka
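For reference, the Dice coefficient reported here (and in several other papers in this volume) measures the overlap between a predicted mask $$P$$ and a ground-truth mask $$G$$:

$$\mathrm{Dice}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|},$$

so it ranges from 0 (no overlap) to 1 (perfect agreement).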
Morphological Simplification of Brain MR Images by Deep Learning for Facilitating Deformable Registration

Brain MR image registration is challenging due to the large inter-subject anatomical variation. Especially, the highly convoluted brain cortex makes it difficult to accurately align the corresponding structures of the underlying images. In this paper, we propose a novel deep learning strategy to simplify the image registration task. Specifically, we train a morphological simplification network (MS-Net), which can generate a simplified image with fewer anatomical details given a complex input image. With this trained MS-Net, we can reduce the complexity of both the fixed and the moving images and iteratively derive their respective trajectories of gradually simplified images. The generated images at the ends of the two trajectories are so simple that they are very similar in appearance and morphology and thus easy to register. In this way, these two trajectories can act as a bridge to link the fixed and the moving images and guide their registration. Our experiments show that the proposed method can achieve more accurate registration results than state-of-the-art methods. Moreover, the proposed method can be generalized to the unseen dataset without the need for re-training or domain adaptation.

Dongming Wei, Sahar Ahmad, Zhengwang Wu, Xiaohuan Cao, Xuhua Ren, Gang Li, Dinggang Shen, Qian Wang
Joint Shape Representation and Classification for Detecting PDAC

We aim to detect pancreatic ductal adenocarcinoma (PDAC) in abdominal CT scans, which sheds light on the early diagnosis of pancreatic cancer. This is a 3D volume classification task with little training data. We propose a two-stage framework, which first segments the pancreas into a binary mask, then compresses the mask into a shape vector and performs abnormality classification. Shape representation and classification are performed jointly, both to exploit the knowledge that PDAC often changes the shape of the pancreas and to prevent over-fitting. Experiments are performed on 300 normal scans and 136 PDAC cases. We achieve a specificity of $$90.2\%$$ (false alarms occur on less than 1/10 of normal cases) at a sensitivity of $$80.2\%$$ (less than 1/5 of PDAC cases are not detected), results that show promise for clinical applications.

Fengze Liu, Lingxi Xie, Yingda Xia, Elliot Fishman, Alan Yuille
FusionNet: Incorporating Shape and Texture for Abnormality Detection in 3D Abdominal CT Scans

Automatic abnormality detection in abdominal CT scans can help doctors improve the accuracy and efficiency of diagnosis. In this paper we aim at detecting pancreatic ductal adenocarcinoma (PDAC), the most common pancreatic cancer. Given that the existence of a tumor can affect both the shape and the texture of the pancreas, we design a system to extract shape and texture features at the same time for detecting PDAC. We propose a two-stage method for this 3D classification task. First, we segment the pancreas into a binary mask. Second, a FusionNet is proposed to take both the binary mask and the CT image as input and perform a binary classification. The optimal architecture of the FusionNet is obtained by searching a pre-defined functional space. We show that the classification results using either shape or texture information are complementary, and by fusing them with the optimized architecture, the performance improves by a large margin. Our method achieves a specificity of 97% and a sensitivity of 92% on 200 normal scans and 136 scans with PDAC.

Fengze Liu, Yuyin Zhou, Elliot Fishman, Alan Yuille
Ultrasound Liver Fibrosis Diagnosis Using Multi-indicator Guided Deep Neural Networks

Accurate analysis of the fibrosis stage plays a very important role in the follow-up of patients with chronic hepatitis B infection. In this paper, a deep learning framework is presented for automatic liver fibrosis prediction. In contrast to previous works, our approach can make use of the information provided by multiple ultrasound images. An indicator-guided learning mechanism is further proposed to ease the training of the proposed model. This follows the workflow of clinical diagnosis and makes the prediction procedure interpretable. To support the training, a dataset was carefully collected, containing the ultrasound videos/images, indicators, and labels of 229 patients. As demonstrated in the experimental results, our proposed model shows its effectiveness by achieving state-of-the-art performance; specifically, the accuracy is 65.6% (20% higher than the previous best).

Jiali Liu, Wenxuan Wang, Tianyao Guan, Ningbo Zhao, Xiaoguang Han, Zhen Li
Weakly Supervised Segmentation by a Deep Geodesic Prior

The performance of state-of-the-art image segmentation methods heavily relies on high-quality annotations, which are not easily affordable, particularly for medical data. To alleviate this limitation, in this study we propose a weakly supervised image segmentation method based on a deep geodesic prior. We hypothesize that integration of this prior information can reduce the adverse effects of weak labels on segmentation accuracy. Our proposed algorithm is based on prior information extracted from an auto-encoder trained to map objects’ geodesic maps to their corresponding binary maps. The obtained information is then used as an extra term in the loss function of the segmentor. In order to show the efficacy of the proposed strategy, we experimented with segmentation of cardiac substructures with clean labels and two levels of noisy labels (L1, L2). Our experiments showed that the proposed algorithm boosted the performance of baseline deep learning-based segmentation for both clean and noisy labels by $$4.4\%$$, $$4.6\%$$ (L1), and $$6.3\%$$ (L2) in Dice score, respectively. We also showed that the proposed method was more robust in the presence of high-level noise due to the existence of shape priors.

Aliasghar Mortazi, Naji Khosravan, Drew A. Torigian, Sila Kurugol, Ulas Bagci
Correspondence-Steered Volumetric Descriptor Learning Using Deep Functional Maps

In this paper, we consider the dense correspondence of volumetric images and propose a convolutional network-based descriptor learning framework using the functional map representation. Our main observation is that correspondence-steered descriptor learning improves dense volumetric mapping compared with hand-crafted descriptors. We present an unsupervised way to find the optimal network parameters by aligning volumetric probe functions and enforcing invertible coupled maps. The proposed framework takes a one-channel volume as input and outputs multi-channel volumetric descriptors using cascaded convolutional operators, which are faster than conventional descriptor computations. We follow the deep functional map framework and represent the dense correspondence by a low-dimensional spectral mapping, performing functional transfer and dense correspondence using linear algebra. We demonstrate in extensive experiments that correspondence-steered deep descriptor learning improves the quality of both dense correspondence and attribute transfer.

Diya Sun, Yuru Pei, Yungeng Zhang, Yuke Guo, Gengyu Ma, Tianmin Xu, Hongbin Zha
Sturm: Sparse Tubal-Regularized Multilinear Regression for fMRI

While functional magnetic resonance imaging (fMRI) is important for healthcare/neuroscience applications, it is challenging to classify or interpret due to its multi-dimensional structure, high dimensionality, and small number of samples available. Recent sparse multilinear regression methods based on tensor are emerging as promising solutions for fMRI. Particularly, the newly proposed tensor singular value decomposition (t-SVD) sheds light on new directions. In this work, we study t-SVD for sparse multilinear regression and propose a Sparse tubal-regularized multilinear regression (Sturm) method for fMRI. Specifically, the Sturm model performs multilinear regression with two regularization terms: a tubal tensor nuclear norm based on t-SVD and a standard $$\ell _1$$ norm. An optimization algorithm under the alternating direction method of multipliers framework is derived for solving the Sturm model. We then perform experiments on four classification problems, including both resting-state fMRI for disease diagnosis and task-based fMRI for neural decoding. The results show the superior performance of Sturm in classifying fMRI using just a small number of voxels.

Wenwen Li, Jian Lou, Shuo Zhou, Haiping Lu
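From the description above, the Sturm objective can plausibly be written as a multilinear regression loss with the two regularization terms (the loss $$\ell$$, the weights $$\lambda_1, \lambda_2$$, and the inner-product form are assumptions based on the abstract, not the paper's exact formulation):

$$\min_{\mathcal{W}}\ \frac{1}{N}\sum_{i=1}^{N} \ell\big(\langle \mathcal{X}_i, \mathcal{W}\rangle, y_i\big) + \lambda_1 \Vert \mathcal{W}\Vert_{\mathrm{TNN}} + \lambda_2 \Vert \mathcal{W}\Vert_1,$$

where $$\mathcal{X}_i$$ is the $$i$$-th fMRI tensor, $$\mathcal{W}$$ the coefficient tensor, $$\Vert\cdot\Vert_{\mathrm{TNN}}$$ the tubal tensor nuclear norm induced by t-SVD, and $$\Vert\cdot\Vert_1$$ the elementwise $$\ell_1$$ norm.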
Improving Whole-Brain Neural Decoding of fMRI with Domain Adaptation

In neural decoding, there has been a growing interest in machine learning on functional magnetic resonance imaging (fMRI). However, the size discrepancy between the whole-brain feature space and the training set poses serious challenges. Simply increasing the number of training examples is infeasible and costly. In this paper, we propose a domain adaptation framework for whole-brain fMRI (DawfMRI) to improve whole-brain neural decoding on target data leveraging source data. DawfMRI consists of two steps: (1) source and target feature adaptation, and (2) source and target classifier adaptation. We evaluate its four possible variations, using a collection of fMRI datasets from OpenfMRI. The results demonstrated that appropriate choices of source domain can help improve neural decoding accuracy for challenging classification tasks. The best-case improvement is $$10.47\%$$ (from $$77.26\%$$ to $$87.73\%$$ ). Moreover, visualising and interpreting voxel weights revealed that the adaptation can provide additional insights into neural decoding.

Shuo Zhou, Christopher R. Cox, Haiping Lu
Automatic Couinaud Segmentation from CT Volumes on Liver Using GLC-UNet

Automatically generating Couinaud segments of the liver, a prerequisite for modern liver surgery, from computed tomography (CT) volumes is a challenge for computer-aided diagnosis (CAD). In this paper, we propose a novel global and local contexts UNet (GLC-UNet) for Couinaud segmentation. In this framework, intra-slice features and 3D contexts are effectively probed and jointly optimized for accurate liver and Couinaud segmentation using an attention mechanism. We comprehensively evaluate our system’s performance ($$98.51\%$$ in terms of Dice per case on liver segmentation, and $$92.46\%$$ on Couinaud segmentation) on the Medical Segmentation Decathlon dataset (task 8, hepatic vessels and tumor) from MICCAI 2018, with 43,205 CT slices annotated by us for liver and Couinaud segmentation ( https://github.com/GLCUnet/dataset ).

Jiang Tian, Li Liu, Zhongchao Shi, Feiyu Xu
Biomedical Image Segmentation by Retina-Like Sequential Attention Mechanism Using only a Few Training Images

In this paper we propose a novel deep learning-based algorithm for biomedical image segmentation which uses a sequential attention mechanism able to shift the focus of attention across the image in a selective way, allowing subareas which are more difficult to classify to be processed at increased resolution. The spatial distribution of class information in each subarea is learned using a retina-like representation where resolution decreases with distance from the center of attention. The final segmentation is achieved by averaging class predictions over overlapping subareas, utilizing the power of ensemble learning to increase segmentation accuracy. Experimental results for a semantic segmentation task for which only a few training images are available show that a CNN using the proposed method outperforms both a patch-based classification CNN and a fully convolutional-based method.

Shohei Hayashi, Bisser Raytchev, Toru Tamaki, Kazufumi Kaneda
Conv-MCD: A Plug-and-Play Multi-task Module for Medical Image Segmentation

For the task of medical image segmentation, fully convolutional network (FCN) based architectures have been extensively used with various modifications. A rising trend in these architectures is to employ joint learning of the target region with an auxiliary task, a method commonly known as multi-task learning. These approaches help impose smoothness and shape priors, which vanilla FCN approaches do not necessarily incorporate. In this paper, we propose a novel plug-and-play module, which we term Conv-MCD, that exploits structural information in two ways: (i) using the contour map and (ii) using the distance map, both of which can be obtained from ground-truth segmentation maps with no additional annotation costs. The key benefit of our module is the ease of its addition to any state-of-the-art architecture, resulting in a significant improvement in performance with a minimal increase in parameters. To substantiate the above claim, we conduct extensive experiments using 4 state-of-the-art architectures across various evaluation metrics, and report a significant increase in performance in relation to the base networks. In addition to the aforementioned experiments, we also perform ablation studies and visualization of feature maps to further elucidate our approach.

Balamurali Murugesan, Kaushik Sarveswaran, Sharath M. Shankaranarayana, Keerthi Ram, Jayaraj Joseph, Mohanasankar Sivaprakasam
Detecting Abnormalities in Resting-State Dynamics: An Unsupervised Learning Approach

Resting-state functional MRI (rs-fMRI) is a rich imaging modality that captures spontaneous brain activity patterns, revealing clues about the connectomic organization of the human brain. While many rs-fMRI studies have focused on static measures of functional connectivity, there has been a recent surge in examining the temporal patterns in these data. In this paper, we explore two strategies for capturing the normal variability in resting-state activity across a healthy population: (a) an autoencoder approach on the rs-fMRI sequence, and (b) a next frame prediction strategy. We show that both approaches can learn useful representations of rs-fMRI data and demonstrate their novel application for abnormality detection in the context of discriminating autism patients from healthy controls.

Meenakshi Khosla, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu
Distanced LSTM: Time-Distanced Gates in Long Short-Term Memory Models for Lung Cancer Detection

The field of lung nodule detection and cancer prediction has been rapidly developing with the support of large public data archives. Previous studies have largely focused on cross-sectional (single) CT data. Herein, we consider longitudinal data. The Long Short-Term Memory (LSTM) model addresses learning with regularly spaced time points (i.e., equal temporal intervals). However, clinical imaging follows patient needs, with often heterogeneous, irregular acquisitions. To model both regular and irregular longitudinal samples, we generalize the LSTM model with the Distanced LSTM (DLSTM) for temporally varied acquisitions. The DLSTM includes a Temporal Emphasis Model (TEM) that enables learning across regularly and irregularly sampled intervals. Briefly, (1) the temporal intervals between longitudinal scans are modeled explicitly, (2) temporally adjustable forget and input gates are introduced for irregular temporal sampling, and (3) the latest longitudinal scan carries an additional emphasis term. We evaluate the DLSTM framework on three datasets: simulated data, 1794 National Lung Screening Trial (NLST) scans, and 1420 clinically acquired scans with heterogeneous and irregular temporal accession. The experiments on the first two datasets demonstrate that our method achieves competitive performance on both simulated and regularly sampled datasets (e.g., improving the F1 score of LSTM from 0.6785 to 0.7085 on NLST). In external validation on clinically and irregularly acquired data, the benchmarks achieved 0.8350 (CNN feature) and 0.8380 (LSTM) in area under the ROC curve (AUC), while the proposed DLSTM achieves 0.8905.

Riqiang Gao, Yuankai Huo, Shunxing Bao, Yucheng Tang, Sanja L. Antic, Emily S. Epstein, Aneri B. Balar, Steve Deppen, Alexis B. Paulson, Kim L. Sandler, Pierre P. Massion, Bennett A. Landman
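One way to picture the Temporal Emphasis Model described above: gate activations are modulated by a monotone function of the time distance between a scan and the latest scan, so temporally distant scans are de-emphasised. The exponential form and the modulation points below are assumptions for illustration, not the paper's exact gates.

```python
import math

def temporal_emphasis(delta_t, decay=0.1):
    """Weight in (0, 1] that shrinks as the scan gets older."""
    return math.exp(-decay * delta_t)

# Sketch of how an LSTM step could use it (pseudo-gates):
#   f_t = sigmoid(W_f @ x_t + U_f @ h_prev) * temporal_emphasis(dt)
#   i_t = sigmoid(W_i @ x_t + U_i @ h_prev) * temporal_emphasis(dt)
# with dt the interval between the current scan and the latest scan.
```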
Dense-Residual Attention Network for Skin Lesion Segmentation

In this paper, we propose a dense-residual attention network for skin lesion segmentation. The proposed network is end-to-end and does not need any post-processing operations or pretrained weights for fine-tuning. Specifically, we propose a dense-residual block in our network to deal with the problem of a fixed receptive field and meanwhile ease the gradient vanishing problem (as often occurs in convolutional neural networks). Moreover, an attention gate is designed to enhance the network’s discriminative ability and ensure the efficiency of feature learning. During network training, we introduce a novel loss function based on the Jaccard distance to tackle the class imbalance issue in medical datasets. The proposed network achieves state-of-the-art performance on the benchmark ISIC 2017 Challenge dataset without any external training samples. Experimental results show the effectiveness of our dense-residual attention network.

Lei Song, Jianzhe Lin, Z. Jane Wang, Haoqian Wang
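A minimal sketch of a loss based on the Jaccard distance, as named above, computed on soft per-pixel probabilities; the smoothing constant `eps` is an assumption (the paper's exact formulation may differ).

```python
def jaccard_distance_loss(pred, target, eps=1e-6):
    """Soft Jaccard distance: 1 - |P ∩ G| / |P ∪ G| on probabilities.
    Works elementwise on NumPy arrays or PyTorch tensors."""
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return 1 - (intersection + eps) / (union + eps)
```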
Confounder-Aware Visualization of ConvNets

With recent advances in deep learning, neuroimaging studies increasingly rely on convolutional networks (ConvNets) to predict diagnosis based on MR images. To gain a better understanding of how a disease impacts the brain, such studies visualize the saliency maps of the ConvNet, highlighting the voxels within the brain that contribute most to the prediction. However, these saliency maps are generally confounded, i.e., some salient regions are more predictive of confounding variables (such as age) than of the diagnosis. To avoid such misinterpretation, we propose in this paper an approach that aims to visualize confounder-free saliency maps that only highlight voxels predictive of the diagnosis. The approach incorporates univariate statistical tests to identify confounding effects within the intermediate features learned by the ConvNet. The influence of the subset of confounded features is then removed by a novel partial back-propagation procedure. We use this two-step approach to visualize confounder-free saliency maps extracted from synthetic and two real datasets. These experiments reveal the potential of our visualization in producing unbiased model interpretations.

Qingyu Zhao, Ehsan Adeli, Adolf Pfefferbaum, Edith V. Sullivan, Kilian M. Pohl
Detecting Lesion Bounding Ellipses with Gaussian Proposal Networks

Lesions characterized by computed tomography (CT) scans are arguably often elliptical objects. However, current lesion detection systems are predominantly adapted from the popular Region Proposal Networks (RPNs) [9], which only propose bounding boxes without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse, as is common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plane and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground-truth Gaussian for object localization. Experiments on the DeepLesion [13] dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to its lower localization error.

Yi Li
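For two 2D Gaussians (dimension $$d = 2$$), the KL divergence minimized by GPN has the standard closed form:

$$\mathrm{KL}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\Vert\,\mathcal{N}(\mu_1,\Sigma_1)\big) = \frac{1}{2}\Big(\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0) - 2 + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\Big),$$

where one Gaussian encodes the proposed ellipse and the other the ground truth; which Gaussian plays which role in the paper's loss is not specified in the abstract.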
Modelling Airway Geometry as Stock Market Data Using Bayesian Changepoint Detection

Numerous lung diseases, such as idiopathic pulmonary fibrosis (IPF), exhibit dilation of the airways. Accurate measurement of dilatation enables assessment of the progression of disease. Unfortunately, the combination of image noise and airway bifurcations causes high variability in the profiles of cross-sectional areas, rendering the identification of affected regions very difficult. Here we introduce a noise-robust method for automatically detecting the location of progressive airway dilatation given two profiles of the same airway acquired at different time points. We propose a probabilistic model of abrupt relative variations between profiles and perform inference via Reversible Jump Markov Chain Monte Carlo sampling. We demonstrate the efficacy of the proposed method on two datasets: (i) images of healthy airways with simulated dilatation; and (ii) pairs of real images of IPF-affected airways acquired at 1-year intervals. Our model is able to detect the starting location of airway dilatation with an accuracy of 2.5 mm on simulated data. The experiments on the IPF dataset display reasonable agreement with radiologists. We can compute a relative change in airway volume that may be useful for quantifying IPF disease progression.

Kin Quan, Ryutaro Tanno, Michael Duong, Arjun Nair, Rebecca Shipley, Mark Jones, Christopher Brereton, John Hurst, David Hawkes, Joseph Jacob
Unsupervised Lesion Detection with Locally Gaussian Approximation

Generative models have recently been applied to unsupervised lesion detection, where a distribution of normal data, i.e. the normative distribution, is learned and lesions are detected as out-of-distribution regions. However, directly calculating the probability for the lesion region using the normative distribution is intractable. In this work, we address this issue by approximating the normative distribution with local Gaussian approximation and evaluating the probability of test samples in an iterative manner. We show that the local Gaussian approximator can be applied to several auto-encoding models to perform image restoration and unsupervised lesion detection. The proposed method is evaluated on the BraTS Challenge dataset, where the proposed method shows improved detection and achieves state-of-the-art results.

Xiaoran Chen, Nick Pawlowski, Ben Glocker, Ender Konukoglu
A Hybrid Multi-atrous and Multi-scale Network for Liver Lesion Detection

Liver lesion detection in abdominal computed tomography (CT) is a challenging topic because of its large variance. Current detection methods based on a 2D convolutional neural network (CNN) are limited by the inconsistent view of lesions. One obvious observation is that they can easily lead to a discontinuity problem, since they ignore the information between CT slices. To solve this problem, we propose a novel hybrid multi-atrous and multi-scale network (HMMNet). Our network treats liver lesion detection in a 3D setting as finding a 3D cubic bounding box of a liver lesion. In our work, a multi-atrous 3D convolutional network (MA3DNet) is designed as the backbone. It comes with different dilation rates along the z-axis to tackle the varying z-axis resolutions of different CT volumes. In addition, multi-scale features are extracted by a component, called the feature extractor, to cover the volume and appearance diversities of liver lesions in the transversal plane. Finally, the features from our backbone and feature extractor are combined to provide the size and position measures of liver lesions, information that is frequently referenced in diagnostic reports. Compared with other state-of-the-art 2D and 3D convolutional detection models, our HMMNet achieves top-notch detection performance on the public Liver Tumor Segmentation Challenge (LiTS) dataset, where the average F-scores are 54.8% and 34.2% at intersection-over-union (IoU) thresholds of 0.5 and 0.75, respectively. We also note that our HMMNet model can be directly applied to the public Medical Segmentation Decathlon dataset without fine-tuning, which further illustrates the generalization capability of the proposed method.

Yanan Wei, Xuan Jiang, Kun Liu, Cheng Zhong, Zhongchao Shi, Jianjun Leng, Feiyu Xu
BOLD fMRI-Based Brain Perfusion Prediction Using Deep Dilated Wide Activation Networks

Arterial spin labeling (ASL) perfusion MRI and blood-oxygen-level-dependent (BOLD) fMRI provide complementary information for assessing brain functions. ASL is quantitative and insensitive to low-frequency drift, but has lower signal-to-noise-ratio (SNR) and lower temporal resolution than BOLD. However, there is still no established way to fuse the benefits of both. When only one modality is available, it is also desirable to have a technique that can extract the other modality from the one being acquired. The purpose of this study was to develop such a technique that combines the advantages of BOLD fMRI and ASL MRI, i.e., to quantify cerebral blood flow (CBF) like ASL MRI but with the high SNR and temporal resolution of BOLD fMRI. We pursued this goal using a new deep learning-based algorithm to extract CBF directly from BOLD fMRI. Using a relatively large dataset containing dual-echo ASL and BOLD images, we built a wide residual learning based convolutional neural network to predict CBF from BOLD fMRI. We dub this technique BOA-Net (BOLD-to-ASL network). Our testing results demonstrate that ASL CBF can be reliably predicted from BOLD fMRI with comparable image quality and higher SNR. We also evaluated BOA-Net with different deep learning networks.

Danfeng Xie, Yiran Li, HanLu Yang, Donghui Song, Yuanqi Shang, Qiu Ge, Li Bai, Ze Wang
Jointly Discriminative and Generative Recurrent Neural Networks for Learning from fMRI

Recurrent neural networks (RNNs) were designed for dealing with time-series data and have recently been used for creating predictive models from functional magnetic resonance imaging (fMRI) data. However, gathering large fMRI datasets for learning is a difficult task. Furthermore, network interpretability is unclear. To address these issues, we utilize multitask learning and design a novel RNN-based model that learns to discriminate between classes while simultaneously learning to generate the fMRI time-series data. Employing the long short-term memory (LSTM) structure, we develop a discriminative model based on the hidden state and a generative model based on the cell state. The addition of the generative model constrains the network to learn functional communities represented by the LSTM nodes that are both consistent with the data generation as well as useful for the classification task. We apply our approach to the classification of subjects with autism vs. healthy controls using several datasets from the Autism Brain Imaging Data Exchange. Experiments show that our jointly discriminative and generative model improves classification learning while also producing robust and meaningful functional communities for better model understanding.
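A minimal PyTorch sketch of the joint design, assuming a classifier head on the LSTM hidden state and a next-time-point generator head on the cell state; all sizes and heads are illustrative, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class JointLSTM(nn.Module):
        """Discriminative head on the hidden state, generative head on the
        cell state predicting the next fMRI time point."""
        def __init__(self, n_rois=116, hidden=64, n_classes=2):
            super().__init__()
            self.hidden = hidden
            self.cell = nn.LSTMCell(n_rois, hidden)
            self.classify = nn.Linear(hidden, n_classes)
            self.generate = nn.Linear(hidden, n_rois)

        def forward(self, x):  # x: (T, N, n_rois)
            h = x.new_zeros(x.size(1), self.hidden)
            c = x.new_zeros(x.size(1), self.hidden)
            recon = []
            for t in range(x.size(0) - 1):
                h, c = self.cell(x[t], (h, c))
                recon.append(self.generate(c))   # generative head on cell state
            logits = self.classify(h)            # discriminative head on hidden state
            return logits, torch.stack(recon)    # reconstruction aligns with x[1:]

    logits, recon = JointLSTM()(torch.randn(100, 4, 116))
    print(logits.shape, recon.shape)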

Nicha C. Dvornek, Xiaoxiao Li, Juntang Zhuang, James S. Duncan
Unsupervised Conditional Consensus Adversarial Network for Brain Disease Identification with Structural MRI

Effective utilization of multi-domain data for brain disease identification has recently attracted increasing attention, since a large number of subjects from multiple domains could be beneficial for investigating the pathological changes of disease-affected brains. Previous machine learning methods often suffer from inter-domain data heterogeneity caused by different scanning parameters. Although several deep learning methods have been developed, they usually assume that the source classifier can be directly transferred to the target (i.e., to-be-analyzed) domain upon the learned domain-invariant features, thus ignoring the shift in data distributions across different domains. Also, most of them rely on fully-labeled data in both target and source domains for model training, while labeled target data are generally unavailable. To this end, we present an Unsupervised Conditional consensus Adversarial Network (UCAN) for deep domain adaptation, which can learn the disease classifier from the labeled source domain and adapt to a different target domain (without any label information). The UCAN model contains three major components: (1) a feature extraction module for learning discriminative representations from the input MRI; (2) a cycle feature adaptation module to assist feature and classifier adaptation between the source and target domains; and (3) a classification module for disease identification. Experimental results on 1,506 subjects from ADNI1 (with 1.5 T structural MRI) and ADNI2 (with 3.0 T structural MRI) have demonstrated the effectiveness of the proposed UCAN method in brain disease identification, compared with state-of-the-art approaches.

Jing Zhang, Mingxia Liu, Yongsheng Pan, Dinggang Shen
A Maximum Entropy Deep Reinforcement Learning Neural Tracker

Tracking of anatomical structures has multiple applications in the field of biomedical imaging, including screening, diagnosing and monitoring the evolution of pathologies. Semi-automated tracking of elongated structures has been previously formulated as a problem suitable for deep reinforcement learning (DRL), but it remains a challenge. We introduce a maximum entropy continuous-action DRL neural tracker capable of training from scratch in a complex environment in the presence of high noise levels, Gaussian blurring and distractors. The trained model is evaluated on two-photon microscopy images of mouse cortex. At the expense of slightly worse robustness compared to a previously applied DRL tracker, we reach significantly higher accuracy, approaching the performance of the standard hand-engineered algorithm used for neuron tracing. The higher sample efficiency of our maximum entropy DRL tracker indicates its potential for direct application to small biomedical datasets.

Shafa Balaram, Kai Arulkumaran, Tianhong Dai, Anil Anthony Bharath
Weakly Supervised Confidence Learning for Brain MR Image Dense Parcellation

Automatic dense parcellation of brain MR images, which labels hundreds of regions of interest (ROIs), plays an important role in neuroimage analysis. Specifically, brain image parcellation using deep learning technology has been widely recognized for its effective performance, but it remains limited in actual application due to its high demand for sufficient training data and intensive GPU memory allocation. Due to the high cost of manual segmentation, it is usually not feasible to provide a large dataset for training the network. On the other hand, it is relatively easy to transfer labeling information to many new unlabeled datasets and thus augment the training data. However, the augmented data can only be considered as weakly labeled for training. Therefore, in this paper, we propose a cascaded weakly supervised confidence integration network (CINet). The contributions of our method are twofold. First, we propose an image registration-based data augmentation method, and evaluate the confidence of the labeling information for each augmented image. The augmented data, as well as the original yet small training dataset, jointly contribute to the modeling of the CINet for segmentation. Second, we propose a random crop strategy to handle the large number of feature channels in the network, which are needed to label hundreds of neural ROIs. The demanding GPU memory requirement is thus relieved, while better accuracy is also achieved. In experiments, we use 37 manually labeled subjects and augment 96 images with weak labels for training. The test result, an overall Dice score of 75% across 112 brain regions, is higher than that obtained using the original training data only.
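A minimal sketch of training with confidence-weighted augmented labels, assuming a per-voxel confidence map (1.0 for manual labels, lower for registration-propagated ones); all names are hypothetical.

    import torch
    import torch.nn.functional as F

    def confidence_weighted_ce(logits, labels, confidence):
        """Per-voxel cross-entropy weighted by the estimated confidence
        of the (possibly weak, registration-propagated) label."""
        ce = F.cross_entropy(logits, labels, reduction='none')  # (N, D, H, W)
        return (confidence * ce).sum() / confidence.sum().clamp(min=1e-6)

    logits = torch.randn(1, 5, 8, 16, 16)           # 5 ROI classes
    labels = torch.randint(0, 5, (1, 8, 16, 16))
    conf = torch.rand(1, 8, 16, 16)                 # 1.0 where labels are manual
    print(confidence_weighted_ce(logits, labels, conf))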

Bin Xiao, Xiaoqing Cheng, Qingfeng Li, Qian Wang, Lichi Zhang, Dongming Wei, Yiqiang Zhan, Xiang Sean Zhou, Zhong Xue, Guangming Lu, Feng Shi
Select, Attend, and Transfer: Light, Learnable Skip Connections

Skip connections in deep networks have improved both segmentation and classification performance by facilitating the training of deeper network architectures and reducing the risk of vanishing gradients. The skip connections equip encoder-decoder-like networks with richer feature representations, but at the cost of higher memory usage and computation, and possibly the transfer of non-discriminative feature maps. In this paper, we focus on improving the skip connections used in segmentation networks. We propose light, learnable skip connections which learn to first select the most discriminative channels, and then aggregate the selected channels into a single channel attending to the most discriminative regions of the input. We evaluate the proposed method on 3 different 2D and volumetric datasets and demonstrate that the proposed skip connections can outperform the traditional heavy skip connections of 4 different models in terms of segmentation accuracy (2% Dice), memory usage (at least 50%), and the number of network parameters (up to 70%).
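A minimal PyTorch sketch of one possible select-and-aggregate realization of such a skip connection; the gating and top-k selection shown here are illustrative assumptions.

    import torch
    import torch.nn as nn

    class LightSkip(nn.Module):
        """Select the K most discriminative channels via learned gates,
        then collapse them to a single spatial map for the decoder."""
        def __init__(self, channels, k=4):
            super().__init__()
            self.gate = nn.Parameter(torch.zeros(channels))  # channel scores
            self.k = k

        def forward(self, x):  # x: (N, C, H, W)
            weights = torch.sigmoid(self.gate)
            topk = torch.topk(weights, self.k).indices
            selected = x[:, topk] * weights[topk].view(1, -1, 1, 1)
            return selected.mean(dim=1, keepdim=True)  # single-channel map

    skip = LightSkip(channels=32)
    print(skip(torch.randn(2, 32, 64, 64)).shape)  # torch.Size([2, 1, 64, 64])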

Saeid Asgari Taghanaki, Aicha Bentaieb, Anmol Sharma, S. Kevin Zhou, Yefeng Zheng, Bogdan Georgescu, Puneet Sharma, Zhoubing Xu, Dorin Comaniciu, Ghassan Hamarneh
Learning-Based Bone Quality Classification Method for Spinal Metastasis

Spinal metastasis is the most common form of bone metastasis and may cause pain, instability and neurological injuries. Early detection of spinal metastasis is critical for accurate staging and optimal treatment. The diagnosis is usually facilitated with Computed Tomography (CT) scans, which require considerable effort from well-trained radiologists. In this paper, we explore a learning-based automatic bone quality classification method for spinal metastasis based on CT images. We simultaneously take the posterolateral spine involvement classification task into account, and employ a multi-task learning (MTL) technique to improve performance. MTL acts as a form of inductive bias which helps the model generalize better on each task by sharing representations between related tasks. Based on the prior knowledge that the mixed type can be viewed as both blastic and lytic, we model the task of bone quality classification as two binary classification sub-tasks, i.e., whether blastic and whether lytic, and leverage a multi-layer perceptron to combine their predictions. To make the model more robust and generalize better, self-paced learning is adopted to gradually introduce samples, from easy to more complex, into the training process. The proposed learning-based method is evaluated on a proprietary spinal metastasis CT dataset. At the slice level, our method significantly outperforms a 121-layer DenseNet classifier in sensitivity by +12.54%, +7.23% and +29.06% for blastic, mixed and lytic lesions, respectively, and by +12.33%, +23.21% and +34.25% at the vertebra level.

Shiqi Peng, Bolin Lai, Guangyu Yao, Xiaoyun Zhang, Ya Zhang, Yan-Feng Wang, Hui Zhao
Automated Segmentation of Skin Lesion Based on Pyramid Attention Network

Automatic segmentation of skin lesions in dermoscopy images is important for the clinical diagnosis and assessment of melanoma. However, due to the large variations in scale, shape and appearance of the lesion area, accurate automatic segmentation of skin lesions faces great challenges. In this paper, we first introduce a pyramid attention module for global multi-scale feature aggregation. The module selectively integrates the multi-scale features associated with the lesion by optimizing the features of each scale and suppressing irrelevant noise. Based on this module, we propose an automatic framework for skin lesion segmentation. In addition, the widely used loss function based on the Dice coefficient is independent of the relative size of the segmented target, which leads the network to pay insufficient attention to small-scale samples. Therefore, we propose a new loss function based on scale-attention to effectively balance the network's attention to samples of different scales and improve the segmentation accuracy of small-scale samples. The robustness of the proposed method was evaluated on two public databases, the ISIC 2017 and 2018 skin lesion analysis towards melanoma detection challenges; the results show that the proposed method considerably improves the performance of skin lesion segmentation and achieves state-of-the-art results on ISIC 2017.
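An illustrative sketch of a size-aware Dice-style loss in the spirit of the scale-attention idea; the paper's exact formulation is not reproduced here, and the weighting scheme below is an assumption.

    import torch

    def scale_attentive_dice_loss(pred, target, gamma=0.5, eps=1e-6):
        """Per-image Dice terms re-weighted by relative target size so
        that small lesions contribute more to the loss."""
        dims = (1, 2, 3)
        inter = (pred * target).sum(dims)
        union = pred.sum(dims) + target.sum(dims)
        dice = (2 * inter + eps) / (union + eps)          # per-image Dice
        rel_size = target.sum(dims) / target[0].numel()   # foreground fraction
        weight = (rel_size + 1e-3) ** (-gamma)            # small target -> big weight
        return (weight * (1 - dice)).sum() / weight.sum()

    pred = torch.rand(4, 1, 64, 64)                       # sigmoid outputs
    target = (torch.rand(4, 1, 64, 64) > 0.95).float()
    print(scale_attentive_dice_loss(pred, target))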

Huan Wang, Guotai Wang, Ze Sheng, Shaoting Zhang
Relu Cascade of Feature Pyramid Networks for CT Pulmonary Nodule Detection

Screening of pulmonary nodules in computed tomography (CT) is important for early detection and treatment of lung cancer. Many existing works utilize Faster R-CNN (regions with convolutional neural network features, with a region proposal network) for this task. However, their performance is often limited, especially for detecting small pulmonary nodules (<4 mm). In this work, we propose a new cascade paradigm called "Relu cascade" to detect pulmonary nodules. The training of "Relu cascade" is similar to the conventional cascade learning approach. First, a detection network is trained using limited positive annotations (nodules) and randomly sampled negative samples (background). Then, a second detection network is trained with the same amount of positives and the false positives produced by the first network. By repeating this process, multiple detection networks can be trained, with each subsequent network tuned specifically to eliminate the false positives produced by the previous ones. The novelty of "Relu cascade" lies in the way these networks are chained into a cascade. In conventional cascade learning, each level filters out false positive detections independently in the testing phase, which is prone to overfitting as later levels are very specific to a small number of negative samples. In "Relu cascade", nodule likelihoods at all previous levels are aggregated, based on which false positives are identified and filtered out. Experimental results on 606 CT scans of different patients show that "Relu cascade" greatly improves the detection performance of conventional cascade learning.

Guangrui Mu, Yanbo Chen, Dijia Wu, Yiqiang Zhan, Xiang Sean Zhou, Yaozong Gao
Joint Localization of Optic Disc and Fovea in Ultra-widefield Fundus Images

Automated localization of the optic disc and fovea is important for computer-aided retinal disease screening and diagnosis. Compared to previous works, this paper makes two novel contributions. First, we study the localization problem in the new context of ultra-widefield (UWF) fundus images, which has not been considered before. Second, we propose a spatially constrained Faster R-CNN for the task. Extensive experiments on a set of 2,182 UWF fundus images acquired from a local eye center justify the viability of the proposed model. For more than 99% of the test images, the improved Faster R-CNN localizes the fovea within one optic-disc diameter of the ground truth, while detecting the optic disc with a high IoU of 0.82. The new model works reasonably well even in challenging cases where the fovea is occluded due to severe retinopathy or surgical treatments.

Zhuoya Yang, Xirong Li, Xixi He, Dayong Ding, Yanting Wang, Fangfang Dai, Xuemin Jin
Multi-scale Attentional Network for Multi-focal Segmentation of Active Bleed After Pelvic Fractures

Trauma is the leading worldwide cause of death and disability among those younger than 45 years, and pelvic fractures are a major source of morbidity and mortality. Automated segmentation of multiple foci of arterial bleeding from abdominopelvic trauma CT could provide rapid, objective measurements of the total extent of active bleeding, potentially augmenting outcome prediction at the point of care while improving patient triage, allocation of appropriate resources, and time to definitive intervention. In spite of the importance of active bleeding in the quick tempo of trauma care, the task is still quite challenging due to the variable contrast, intensity, location, size, shape, and multiplicity of bleeding foci. Existing work presents a heuristic rule-based segmentation technique which requires multiple stages and cannot be efficiently optimized end-to-end. To this end, we present the Multi-Scale Attentional Network (MSAN), the first reliable end-to-end network for automated segmentation of active hemorrhage from contrast-enhanced trauma CT scans. MSAN consists of the following components: (1) an encoder which fully integrates the global contextual information from holistic 2D slices; (2) a multi-scale strategy applied in both the training and inference stages to handle the challenges induced by variation of target sizes; (3) an attentional module to further refine the deep features, leading to better segmentation quality; and (4) a multi-view mechanism to leverage the 3D information. MSAN achieves a significant improvement of more than $$7\%$$ in DSC compared to prior art.

Yuyin Zhou, David Dreizin, Yingwei Li, Zhishuai Zhang, Yan Wang, Alan Yuille
Lesion Detection by Efficiently Bridging 3D Context

Lesion detection in CT (computed tomography) scan images is an important yet challenging task due to the low contrast of soft tissues and the similar appearance of lesions and background. Exploiting 3D context information has been studied extensively to improve detection accuracy. However, previous methods either use a 3D CNN, which usually requires a sliding-window strategy for inference and only acts on local patches, or simply concatenate feature maps of independent 2D CNNs to obtain 3D context information, which is less effective at capturing 3D knowledge. To address these issues, we design a hybrid detector that combines the benefits of both of the above methods. We propose to build several light-weight 3D CNNs as subnets to bridge the 2D CNNs' intermediate features, so that the 2D CNNs are connected with each other and interchange 3D context information while feed-forwarding. Comprehensive experiments on the DeepLesion dataset show that our method can combine 3D knowledge effectively and provide higher quality backbone features. Our detector surpasses the current state-of-the-art by a large margin with comparable speed and GPU memory consumption.
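A minimal PyTorch sketch of bridging per-slice 2D features with a light 3D subnet; the residual mixing shown is an illustrative assumption, not the paper's exact subnet design.

    import torch
    import torch.nn as nn

    class Bridge3D(nn.Module):
        """Stack 2D features from neighbouring slices along a depth axis,
        mix them with a thin 3D convolution, and return one feature map
        per slice so each 2D stream receives 3D context."""
        def __init__(self, ch):
            super().__init__()
            self.mix = nn.Conv3d(ch, ch, kernel_size=3, padding=1)

        def forward(self, slice_feats):              # list of S tensors (N, C, H, W)
            vol = torch.stack(slice_feats, dim=2)    # (N, C, S, H, W)
            vol = vol + self.mix(vol)                # residual 3D context exchange
            return [vol[:, :, i] for i in range(vol.size(2))]

    bridge = Bridge3D(16)
    outs = bridge([torch.randn(1, 16, 32, 32) for _ in range(3)])
    print(len(outs), outs[0].shape)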

Zhishuai Zhang, Yuyin Zhou, Wei Shen, Elliot Fishman, Alan Yuille
Communal Domain Learning for Registration in Drifted Image Spaces

Designing a registration framework for images that do not share the same probability distribution is a major challenge in modern image analytics, yet a trivial task for the human visual system (HVS). Discrepancies in probability distributions, also known as drifts, can occur for various reasons including, but not limited to, differences in sequences and modalities (e.g., MRI T1-T2 and MRI-CT registration) or acquisition settings (e.g., multisite, inter-subject, or intra-subject registrations). A popular assumption about the working of the HVS is that it exploits a communal feature subspace that exists between the registering images or fields-of-view and encompasses key drift-invariant features. Mimicking the approach potentially adopted by the HVS, herein we present a technique for learning a representation of this invariant communal subspace shared by the registering domains. The proposed communal domain learning (CDL) framework uses a set of hierarchical nonlinear transforms to learn the communal subspace that minimizes the probability differences and maximizes the amount of shared information between the registering domains. Similarity metric and parameter optimization calculations for registration are subsequently performed in the drift-minimized learned communal subspace. This generic registration framework is applied to register multisequence (MR: T1, T2) and multimodal (MR, CT) images. Results demonstrate generic applicability, consistent performance, and statistically significant improvement for both multi-sequence and multi-modal data using the proposed approach (p-value $$<0.001$$, Wilcoxon rank-sum test) over baseline methods.

Awais Mansoor, Marius George Linguraru
Conv2Warp: An Unsupervised Deformable Image Registration with Continuous Convolution and Warping

Recent successes in deep learning based deformable image registration (DIR) methods have demonstrated that complex deformation can be learnt directly from data while reducing computation time compared to traditional methods. However, the reliance on fully linear convolutional layers imposes a uniform sampling of pixel/voxel locations, which ultimately limits their performance. To address this problem, we propose a novel approach that learns a continuous warp of the source image. Here, the required deformation vector fields are obtained from concatenated linear and non-linear convolution layers and a learnable bicubic Catmull-Rom spline resampler. This allows the computation of smooth deformation fields and more accurate alignment compared to using only linear convolutions and linear resampling. In addition, the continuous warping technique penalizes disagreements that are due to topological changes. Our experiments demonstrate that this approach captures large non-linear deformations and minimizes the propagation of interpolation errors. While improving accuracy, the method remains computationally efficient. We present comparative results on a range of public 4D CT lung (POPI) and brain datasets (CUMC12, MGH10).

Sharib Ali, Jens Rittscher
Semantic Filtering Through Deep Source Separation on Microscopy Images

By their very nature, microscopy images of cells and tissues consist of a limited number of object types or components. In contrast to most natural scenes, the composition is known a priori. Decomposing biological images into semantically meaningful objects and layers is the aim of this paper. Building on recent approaches to image de-noising, we present a framework that achieves state-of-the-art segmentation results requiring little or no manual annotation. Here, synthetic images generated by adding cell crops are sufficient to train the model. Extensive experiments on cellular images, a histology data set, and small animal videos demonstrate that our approach generalizes to a broad range of experimental settings. As the proposed methodology does not require densely labelled training images and is capable of resolving partially overlapping objects, it holds promise for use in a number of different applications.
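A minimal sketch of the synthetic-training idea, compositing randomly placed cell crops onto a background while keeping the resulting mask; all names and the blending rule are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)

    def synthesize(background, crops, n=20):
        """Paste randomly placed cell crops onto a background image and
        record a foreground mask to serve as a free training label."""
        img = background.copy()
        mask = np.zeros_like(background, dtype=bool)
        for _ in range(n):
            crop = crops[rng.integers(len(crops))]
            h, w = crop.shape
            y = rng.integers(0, background.shape[0] - h)
            x = rng.integers(0, background.shape[1] - w)
            img[y:y+h, x:x+w] = np.maximum(img[y:y+h, x:x+w], crop)
            mask[y:y+h, x:x+w] |= crop > 0.1
        return img, mask

    bg = rng.normal(0.05, 0.01, (128, 128)).clip(0, 1)
    crops = [rng.random((9, 9)) for _ in range(5)]
    img, mask = synthesize(bg, crops)
    print(img.shape, mask.sum())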

Avelino Javer, Jens Rittscher
Adaptive Functional Connectivity Network Using Parallel Hierarchical BiLSTM for MCI Diagnosis

Most existing dynamic functional connectivity (dFC) analysis methods compute the correlation between pairs of time courses with a sliding window. However, there is no clear indication of standard window characteristics (length and shape) that best suit all analyses, and a sliding window cannot compute the dynamic correlation of brain regions at each individual time point. Besides, most current studies that utilize dFC for MCI identification rely mainly on the local clustering coefficient for extracting dynamic features and on a support vector machine (SVM) as the classifier. In this paper, we propose a novel adaptive dFC inference method and a deep learning classifier for MCI identification. Specifically, a group-constrained structure detection algorithm is first designed to identify the refined topology of the effective connectivity network, in which individual information is preserved via different connectivity values. Second, based on the identified topology structure, the adaptive dFC network is constructed by using the Kalman filter algorithm to estimate the brain region connectivity strength at each time point. Finally, the adaptive dFC network is validated in MCI identification using a new Parallel Hierarchical Bidirectional Long Short-Term Memory (PH-BiLSTM) network, which extracts as much brain status change information as possible from both past and future information. The results show that the proposed method achieves relatively high classification accuracy.

Yiqiao Jiang, Huifang Huang, Jingyu Liu, Chong-Yaw Wee, Yang Li
Multi-template Based Auto-Weighted Adaptive Structural Learning for ASD Diagnosis

Autism spectrum disorder (ASD) is a group of neurodevelopmental disorders, and its diagnosis remains a challenging issue. To address this, we propose a novel multi-template ensemble classification framework for ASD diagnosis. Specifically, based on different templates, we construct multiple functional connectivity brain networks for each subject using resting-state functional magnetic resonance imaging (rs-fMRI) data and extract feature representations from these networks. Then, our auto-weighted adaptive structural learning model learns the shared similarity matrix through an adaptive process while selecting informative features. In addition, our method can automatically allot an optimal weight to each template without extra weights or parameters. Further, an ensemble classification strategy is adopted to obtain the final diagnosis results. Our extensive experiments conducted on the Autism Brain Imaging Data Exchange (ABIDE) database demonstrate that our method improves ASD diagnosis performance. Additionally, our method can detect ASD-related biomarkers for further medical analysis.

Fanglin Huang, Peng Yang, Shan Huang, Le Ou-Yang, Tianfu Wang, Baiying Lei
Learn to Step-wise Focus on Targets for Biomedical Image Segmentation

Current segmentation networks based on the encoder-decoder architecture have tried to recover spatial information by stacking convolution blocks in the decoder. Unconventionally, we consider that iteratively exploiting spatial attention from higher stages to refine lower-stage features can form an attention-driven mechanism that step-wise recovers detailed features. In this paper, we rethink image segmentation from a novel perspective: as a process of step-wise focusing on targets. We develop a lightweight Focus Module (FM) and present a powerful, transplantable Step-wise Focus Network (SFN) for biomedical image segmentation. FM extracts high-level spatial attention and combines it with low-level features via our proposed focus learning to generate revised features. Our SFN extends the U-Net encoder sub-network and employs only FMs to construct a focus path that consistently refines features. We evaluate SFNs in comparison with U-Net and other state-of-the-art methods on multiple biomedical image segmentation benchmarks. While using 30% of the floating-point operations and 60% of the parameters of U-Net, SFNs achieve strong performance without any post-processing.

Siyuan Wei, Li Wang
Renal Cell Carcinoma Staging with Learnable Image Histogram-Based Deep Neural Network

Renal cell carcinoma (RCC) is the seventh most common cancer worldwide, accounting for an estimated 140,000 global deaths annually. An important RCC prognostic predictor is its 'stage', for which the tumor-node-metastasis (TNM) staging system is used. Although TNM staging is performed by radiologists via pre-surgery volumetric medical image analysis, a recent study suggested that such staging may be performed by studying the image features of the RCC from computed tomography (CT) data. Currently, TNM staging mostly relies on laborious manual processes based on visual inspection of 2D CT image slices that are time-consuming and subjective; a recent study reported approximately 25% misclassification in their patient pools. Recently, we proposed a learnable image histogram based deep neural network approach (ImHistNet) for RCC grading, which is capable of learning textural features directly from the CT images. In this paper, using a similar architecture, we perform low- (I/II) versus high-stage (III/IV) classification for RCC in CT scans. Validated on a clinical CT dataset of 159 patients from the TCIA database, our method classified RCC low and high stages with about 83% accuracy.
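A minimal PyTorch sketch of a differentiable (learnable) histogram layer, one common realization of the idea; ImHistNet's exact formulation may differ, and the bin parameters below are assumptions.

    import torch
    import torch.nn as nn

    class SoftHistogram(nn.Module):
        """Soft-assign intensities to bins with learnable centres and
        widths, so the histogram is differentiable end-to-end."""
        def __init__(self, bins=16):
            super().__init__()
            self.centers = nn.Parameter(torch.linspace(0, 1, bins))
            self.width = nn.Parameter(torch.full((bins,), 0.1))

        def forward(self, x):                           # x: (N, P) intensities
            d = x.unsqueeze(-1) - self.centers          # (N, P, B)
            w = torch.exp(-(d / self.width) ** 2)       # soft bin membership
            return w.mean(dim=1)                        # (N, B) normalized counts

    hist = SoftHistogram()
    print(hist(torch.rand(2, 4096)).shape)  # torch.Size([2, 16])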

Mohammad Arafat Hussain, Ghassan Hamarneh, Rafeef Garbi
Weakly Supervised Learning Strategy for Lung Defect Segmentation

Through the development of specific magnetic resonance sequences, it is possible to measure physiological properties of the lung parenchyma, e.g., ventilation. Automatic segmentation of pathologies in such ventilation maps is essential for clinical application. The generation of labeled ground truth data is costly, time-consuming and requires much experience in the field of lung anatomy and physiology. In this paper, we present a weakly supervised learning strategy for the segmentation of defective lung areas in those ventilation maps. As a weak label, we use the Lung Clearance Index (LCI), which is measured by a Multiple Breath Washout test. The LCI is a single global measure of the ventilation inhomogeneities of the whole lung. We designed a network and a training procedure in order to infer a pixel-wise segmentation from the global LCI value. Our network is composed of two autoencoder sub-networks for the extraction of global and local features, respectively. Furthermore, we use self-supervised regularization to prevent the network from learning non-meaningful segmentations. The performance of our method is evaluated by a rating of the created defect segmentations by 5 human experts, where over $$60\%$$ of the segmentation results are rated as very good or perfect.

Robin Sandkühler, Christoph Jud, Grzegorz Bauman, Corin Willers, Orso Pusterla, Sylvia Nyilas, Alan Peters, Lukas Ebner, Enno Stranziger, Oliver Bieri, Philipp Latzin, Philippe C. Cattin
Gated Recurrent Neural Networks for Accelerated Ventilation MRI

Thanks to recent advancements in specific acquisition methods and post-processing, proton Magnetic Resonance Imaging has become an alternative imaging modality for detecting and monitoring chronic pulmonary disorders. Currently, ventilation maps of the lung are calculated from time-resolved image series acquired under free breathing. Each series consists of 140 coronal 2D images containing several breathing cycles. To cover the majority of the lung, such a series is acquired at several coronal slice positions. A reduction of the number of images per slice enables an increase in the number of slice positions per patient and therefore a more detailed analysis of lung function without adding more stress to the patient. In this paper, we present a new method to reduce the number of images for one coronal slice while preserving the quality of the ventilation maps. As the input is a time-dependent signal, we designed our model based on Gated Recurrent Units. The results show that our method is able to compute high-quality ventilation maps using only 40 images. Furthermore, our method shows strong robustness to changes in the breathing cycles during the acquisition.

Robin Sandkühler, Grzegorz Bauman, Sylvia Nyilas, Orso Pusterla, Corin Willers, Oliver Bieri, Philipp Latzin, Christoph Jud, Philippe C. Cattin
A Cascaded Multi-modality Analysis in Mild Cognitive Impairment

Though reversing the pathology of Alzheimer's disease (AD) has so far not been possible, a more tractable goal may be the prevention or slowing of the disease when diagnosed in its earliest stage, such as mild cognitive impairment (MCI). Recent advances in deep modeling approaches trigger a new era for AD/MCI classification. However, it is still difficult to integrate multi-modal imaging data into a single deep model so as to benefit as much as possible from complementary datasets. To address this challenge, we propose a cascaded deep model to capture both brain structural and functional characteristics for MCI classification. With diffusion tensor imaging (DTI) and functional magnetic resonance imaging (fMRI) data, a graph convolutional network (GCN) is constructed based on the brain structural connectome, and it works with a one-layer recurrent neural network (RNN) that is responsible for inferring temporal features from brain functional activities. We name this cascaded deep model the Graph Convolutional Recurrent Neural Network (GCRNN). Using the Alzheimer's Disease Neuroimaging Initiative (ADNI-3) dataset as a test-bed, our method achieves 97.3% accuracy in distinguishing normal controls (NC) from MCI patients.
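A minimal PyTorch sketch of the GCN-then-RNN cascade, with one graph-propagation step over a row-normalized connectome followed by a recurrent readout; all sizes and the single-layer choices are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TinyGCRNN(nn.Module):
        """One graph convolution over the structural connectome per time
        point, then an RNN over the resulting temporal sequence."""
        def __init__(self, n_rois=90, hidden=32):
            super().__init__()
            self.gcn_w = nn.Linear(1, hidden)              # per-ROI feature lift
            self.rnn = nn.RNN(n_rois * hidden, 64, batch_first=True)
            self.head = nn.Linear(64, 2)

        def forward(self, bold, adj):                      # bold: (N, T, R), adj: (R, R)
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            a_hat = adj / deg                              # row-normalized adjacency
            x = self.gcn_w(bold.unsqueeze(-1))             # (N, T, R, H)
            x = torch.einsum('rs,ntsh->ntrh', a_hat, x).relu()
            out, _ = self.rnn(x.flatten(2))                # (N, T, R*H)
            return self.head(out[:, -1])

    model = TinyGCRNN()
    logits = model(torch.randn(2, 50, 90), torch.rand(90, 90))
    print(logits.shape)  # torch.Size([2, 2])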

Lu Zhang, Akib Zaman, Li Wang, Jingwen Yan, Dajiang Zhu
Deep Residual Learning for Instrument Segmentation in Robotic Surgery

Detection, tracking, and pose estimation of surgical instruments provide critical information that can be used to correct inaccuracies in kinematic data in robotic-assisted surgery. Such information can be used for various purposes, including integration of pre- and intra-operative images into the endoscopic view. In some cases, automatic segmentation of surgical instruments is a crucial step towards full instrument pose estimation, but it can also be used on its own to improve user interaction with the robotic system. In our work, we focus on binary instrument segmentation, where the objective is to label every pixel as instrument or background, and on instrument part segmentation, where semantically distinct parts of the instrument are labeled. We improve upon previous work by leveraging recent techniques such as deep residual learning and dilated convolutions, and advance both binary and instrument-part segmentation performance on the EndoVis 2017 Robotic Instruments dataset. The source code for the experiments reported in the paper has been made public ( https://github.com/warmspringwinds/pytorch-segmentation-detection ).

Daniil Pakhomov, Vittal Premachandran, Max Allan, Mahdi Azizian, Nassir Navab
Deep Learning Model Integrating Dilated Convolution and Deep Supervision for Brain Tumor Segmentation in Multi-parametric MRI

Automatic segmentation of brain tumors in magnetic resonance images (MRI) is necessary for diagnosis, monitoring and treatment, as manual segmentation is time-consuming, expensive and subjective. In this paper, we present a robust automatic segmentation algorithm based on the 3D U-Net. We propose a novel residual block with dilated convolution (res_dil block) and incorporate deep supervision to improve the segmentation results. We also compare the effect of different losses on the class imbalance problem. To prove the effectiveness of our method, we analyze each component proposed in the network architecture and demonstrate that segmentation results can be improved by each of these components. Experimental results on the BraTS 2017 and BraTS 2018 datasets show that the proposed method achieves good performance on brain tumor segmentation.
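A minimal PyTorch sketch of a residual dilated block with an auxiliary deep-supervision head; the layer choices are illustrative assumptions rather than the paper's exact res_dil configuration.

    import torch
    import torch.nn as nn

    class ResDilBlock(nn.Module):
        """Residual block with a dilated 3D convolution plus a
        deep-supervision head emitting an auxiliary segmentation."""
        def __init__(self, ch, n_classes, dilation=2):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv3d(ch, ch, 3, padding=dilation, dilation=dilation),
                nn.InstanceNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, ch, 3, padding=1))
            self.aux = nn.Conv3d(ch, n_classes, 1)   # deep-supervision output

        def forward(self, x):
            y = torch.relu(x + self.body(x))
            return y, self.aux(y)

    blk = ResDilBlock(8, n_classes=4)
    y, aux = blk(torch.randn(1, 8, 16, 16, 16))
    print(y.shape, aux.shape)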

Tongxue Zhou, Su Ruan, Haigen Hu, Stéphane Canu
A Joint 3D UNet-Graph Neural Network-Based Method for Airway Segmentation from Chest CTs

We present an end-to-end deep learning segmentation method by combining a 3D UNet architecture with a graph neural network (GNN) model. In this approach, the convolutional layers at the deepest level of the UNet are replaced by a GNN-based module with a series of graph convolutions. The dense feature maps at this level are transformed into a graph input to the GNN module. The incorporation of graph convolutions in the UNet provides nodes in the graph with information that is based on node connectivity, in addition to the local features learnt through the downsampled paths. This information can help improve segmentation decisions. By stacking several graph convolution layers, the nodes can access higher order neighbourhood information without substantial increase in computational expense. We propose two types of node connectivity in the graph adjacency: (i) one predefined and based on a regular node neighbourhood, and (ii) one dynamically computed during training and using the nearest neighbour nodes in the feature space. We have applied this method to the task of segmenting the airway tree from chest CT scans. Experiments have been performed on 32 CTs from the Danish Lung Cancer Screening Trial dataset. We evaluate the performance of the UNet-GNN models with two types of graph adjacency and compare it with the baseline UNet.

Antonio Garcia-Uceda Juarez, Raghavendra Selvan, Zaigham Saghir, Marleen de Bruijne
Automatic Fetal Brain Extraction Using Multi-stage U-Net with Deep Supervision

Fetal brain extraction is one of the most essential steps for prenatal brain MRI reconstruction and analysis. However, due to fetal movement within the womb, it is a challenging task to extract fetal brains from sparsely-acquired imaging stacks that typically contain motion artifacts. To address this problem, we propose an automatic brain extraction method for fetal magnetic resonance imaging (MRI) using a multi-stage 2D U-Net with deep supervision (DS U-net). Specifically, we initially employ a coarse segmentation derived from a DS U-net to define a 3D bounding box that localizes the position of the brain. The DS U-net is trained with a deep supervision loss to acquire more powerful discrimination capability. Then, another DS U-net focuses on the extracted region to produce the final, refined segmentation. We validate the proposed method on 80 stacks of training images and 43 testing stacks. The experimental results demonstrate the precision and robustness of our method, with an average Dice coefficient of 91.69%, outperforming the existing methods.

Jingjiao Lou, Dengwang Li, Toan Duc Bui, Fenqiang Zhao, Liang Sun, Gang Li, Dinggang Shen
Cross-Modal Attention-Guided Convolutional Network for Multi-modal Cardiac Segmentation

To leverage the correlated information between modalities to benefit cross-modal segmentation, we propose a novel cross-modal attention-guided convolutional network for multi-modal cardiac segmentation. In particular, we first employ cycle-consistent generative adversarial networks to perform bidirectional image generation (i.e., MR to CT, CT to MR) to help reduce modal-level inconsistency. Then, with the generated and original MR and CT images, a novel convolutional network is proposed where (1) two encoders learn individual features separately and (2) a common decoder learns shareable features between modalities for a final consistent segmentation. Also, we propose a cross-modal attention module between the encoders and decoder in order to leverage the correlated information between modalities. Our model can be trained in an end-to-end manner. In extensive evaluation on unpaired CT and MR cardiac images, our method outperforms the baselines in terms of segmentation performance.

Ziqi Zhou, Xinna Guo, Wanqi Yang, Yinghuan Shi, Luping Zhou, Lei Wang, Ming Yang
High- and Low-Level Feature Enhancement for Medical Image Segmentation

Fully convolutional networks (FCNs) have achieved state-of-the-art performance in numerous medical image segmentation tasks. Most FCNs typically focus on fusing features at different levels to improve the ability to learn multi-scale features. In this paper, we explore an alternative direction for improving network performance by enhancing the encoding quality of high- and low-level features, and introduce two feature enhancement modules: (i) a high-level feature enhancement module (HFE) and (ii) a low-level feature enhancement module (LFE). HFE utilizes an attention mechanism to selectively aggregate the optimal feature information at high and low levels, enhancing the ability of high-level features to reconstruct accurate details. LFE aims to use the global semantic information of high-level features to adaptively guide the feature learning of bottom networks, enhancing the semantic consistency of high- and low-level features. We integrate HFE and LFE into a typical encoder-decoder network and propose a novel medical image segmentation framework (HLF-Net). On two challenging datasets, skin lesion segmentation and spleen segmentation, we show that the proposed modules and network improve performance considerably.

Huan Wang, Guotai Wang, Zhihan Xu, Wenhui Lei, Shaoting Zhang
Shape-Aware Complementary-Task Learning for Multi-organ Segmentation

Multi-organ segmentation in whole-body computed tomography (CT) is a common pre-processing step with applications in organ-specific image retrieval, radiotherapy planning, and interventional image analysis. We address this problem from an organ-specific shape-prior learning perspective. We introduce the idea of complementary-task learning to enforce a shape prior by leveraging the existing target labels. We propose two complementary tasks, namely (i) distance map regression and (ii) contour map detection, to explicitly encode the geometric properties of each organ. We evaluate the proposed solution on the public VISCERAL dataset containing CT scans of multiple organs. We report a significant improvement of the overall Dice score from 0.8849 to 0.9018 due to the incorporation of complementary-task learning.
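A minimal sketch of deriving the two complementary-task targets from an existing organ label using SciPy distance transforms and morphology; this is a simplified stand-in for the paper's exact definitions.

    import numpy as np
    from scipy import ndimage

    def complementary_targets(mask):
        """From an organ label, derive a signed-style distance map
        (regression target) and a one-voxel-thick contour map
        (detection target)."""
        dist_in = ndimage.distance_transform_edt(mask)
        dist_out = ndimage.distance_transform_edt(1 - mask)
        distance_map = dist_in - dist_out          # positive inside the organ
        eroded = ndimage.binary_erosion(mask)
        contour_map = mask & ~eroded               # boundary voxels only
        return distance_map, contour_map

    mask = np.zeros((64, 64), dtype=bool)
    mask[20:44, 20:44] = True
    dmap, cmap = complementary_targets(mask)
    print(dmap.max(), cmap.sum())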

Fernando Navarro, Suprosanna Shit, Ivan Ezhov, Johannes Paetzold, Andrei Gafita, Jan C. Peeken, Stephanie E. Combs, Bjoern H. Menze
An Active Learning Approach for Reducing Annotation Cost in Skin Lesion Analysis

Automated skin lesion analysis is crucial in clinical practice, as skin cancer is among the most common human malignancies. Existing deep learning approaches have achieved remarkable performance on this challenging task; however, they rely heavily on large-scale labelled datasets. In this paper, we present a novel active learning framework for cost-effective skin lesion analysis. The goal is to effectively select and utilize far fewer labelled samples while the network still achieves state-of-the-art performance. Our sample selection criteria complementarily consider both informativeness and representativeness, derived from the decoupled aspects of measuring model certainty and covering sample diversity. To make wise use of the selected samples, we further design a simple yet effective strategy to aggregate intra-class images in pixel space as a new form of data augmentation. We validate our proposed method on data from the ISIC 2017 Skin Lesion Classification Challenge for two tasks. Using only up to 50% of the samples, our approach achieves state-of-the-art performance on both tasks, comparable to or exceeding the accuracy of full-data training, and outperforms other well-known active learning methods by a large margin.

Xueying Shi, Qi Dou, Cheng Xue, Jing Qin, Hao Chen, Pheng-Ann Heng
Tree-LSTM: Using LSTM to Encode Memory in Anatomical Tree Prediction from 3D Images

Extraction and analysis of anatomical trees, such as vasculatures and airways, is important for many clinical applications. However, most tracking methods so far have intrinsically embedded a first-order Markovian property, where no memory beyond one tracking step is utilized in the tree extraction process. Motivated by the inherent sequential construction of anatomical trees, vis-à-vis the flow of nutrients through branches and bifurcations, we propose Tree-LSTM, the first LSTM neural network to learn to encode such sequential priors into a deep learning based tree extraction method. We also show that, mathematically, using an LSTM allows the variational lower bound of a higher-order Markovian stochastic process to be approximated, which enables the encoding of a long-term memory. Our experiments on a CT airway dataset show that, by adding the LSTM component, the results are improved by at least $$11\%$$ in mean direction prediction accuracy relative to the state of the art, and the correlation between bifurcation classification accuracy and evidence is improved by at least $$15\%$$, which demonstrates the advantage of a unified deep model for sequential tree structure tracking and bifurcation detection.

Mengliu Zhao, Ghassan Hamarneh
FAIM – A ConvNet Method for Unsupervised 3D Medical Image Registration

We present a new unsupervised learning algorithm, “FAIM”, for 3D medical image registration. With a different architecture than the popular “U-net” [10], the network takes a pair of full image volumes and predicts the displacement fields needed to register source to target. Compared with “U-net” based registration networks such as VoxelMorph [2], FAIM has fewer trainable parameters but can achieve higher registration accuracy as judged by Dice score on region labels in the Mindboggle-101 dataset. Moreover, with the proposed penalty loss on negative Jacobian determinants, FAIM produces deformations with many fewer “foldings”, i.e. regions of non-invertibility where the surface folds over itself. We varied the strength of this penalty and found that FAIM is able to maintain both the advantages of higher accuracy and fewer “folding” locations over VoxelMorph, over a range of hyper-parameters. We also evaluated Probabilistic VoxelMorph [3], both in its original form and with its U-net backbone replaced with our FAIM network. We found that the choice of backbone makes little difference. The original version of FAIM outperformed Probabilistic VoxelMorph for registration accuracy, and also for invertibility if FAIM is trained using an anti-folding penalty. Code for this paper is freely available at https://github.com/dykuang/Medical-image-registration .
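A minimal sketch of the anti-folding idea: penalizing negative Jacobian determinants of the predicted displacement field, shown here in 2D with finite differences; FAIM's exact penalty may differ.

    import torch

    def folding_penalty(disp):
        """Penalty on negative Jacobian determinants of a displacement
        field `disp` of shape (N, 2, H, W), channel 0 = u_x, 1 = u_y."""
        # Gradients of the mapping phi(x) = x + u(x), via finite differences.
        du_dy = disp[:, :, 1:, :-1] - disp[:, :, :-1, :-1]
        du_dx = disp[:, :, :-1, 1:] - disp[:, :, :-1, :-1]
        j11 = 1 + du_dx[:, 0]; j12 = du_dy[:, 0]
        j21 = du_dx[:, 1];     j22 = 1 + du_dy[:, 1]
        det = j11 * j22 - j12 * j21
        return torch.relu(-det).mean()   # only negative determinants are penalized

    disp = 0.1 * torch.randn(1, 2, 32, 32)
    print(folding_penalty(disp))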

Dongyang Kuang, Tanya Schmah
Functional Data and Long Short-Term Memory Networks for Diagnosis of Parkinson’s Disease

Computer-aided diagnostic tools for neurodegenerative and psychiatric diseases and disorders have many practical clinical applications. In this work, we propose a two-component neural network based on Long Short-Term Memory (LSTM) for the automatic diagnosis of Parkinson's disease (PD) using whole-brain resting-state functional magnetic resonance data. Given recent findings on the structural and functional asymmetry observed in PD patients, our proposed architecture consists of two LSTM networks designed to facilitate independent mining of patterns that may differ between the left and right hemispheres. Under a cross-validation framework, our proposed model achieved an F1-score of 0.701 ± 0.055, which is competitive against an F1-score of 0.677 ± 0.033 achieved by a single LSTM model.

Saurabh Garg, Martin J. McKeown
Joint Holographic Detection and Reconstruction

Lens-free holographic imaging is important in many biomedical applications, as it offers a wider field of view, more mechanical robustness and lower cost than traditional microscopes. In many cases, it is important to be able to detect biological objects, such as blood cells, in microscopic images. However, state-of-the-art object detection methods are not designed to work on holographic images. Typically, the hologram must first be reconstructed into an image of the specimen, given a priori knowledge of the distance between the specimen and sensor, and standard object detection methods can then be used to detect objects in the reconstructed image. This paper describes a method for detecting objects directly in holograms while jointly reconstructing the image. This is achieved by assuming a sparse convolutional model for the objects being imaged and modeling the diffraction process responsible for generating the recorded hologram. This paper also describes an unsupervised method for training the convolutional templates, shows that the proposed method produces promising results for detecting white blood cells in holographic images, and demonstrates that the proposed object detection method is robust to errors in estimated focal depth.

Florence Yellin, Benjamín Béjar, Benjamin D. Haeffele, Evelien Mathieu, Christian Pick, Stuart C. Ray, René Vidal
Reinforced Transformer for Medical Image Captioning

Computerized medical image report generation is of great significance for automating the workflow of medical diagnosis and treatment and for reducing health disparities. However, this task presents several challenges: the generated medical image report should be precise and coherent and contain heterogeneous information. Current deep learning based medical image captioning models rely on recurrent neural networks and only extract top-down visual features, which makes them slow and prone to generating incoherent, hard-to-comprehend reports. To tackle this challenging problem, this paper proposes a hierarchical Transformer based medical imaging report generation model. Our proposed model consists of two parts: (1) an Image Encoder that identifies regions of interest via a bottom-up attention module and extracts heuristic, top-down visual features; and (2) a non-recurrent Captioning Decoder that improves computational efficiency through parallel computation and generates a coherent paragraph of the medical imaging report. The proposed model is trained using a self-critical reinforcement learning method. We evaluate the proposed model on the publicly available IU X-ray dataset. The experimental results show that our proposed model improves performance in BLEU-1 by more than 50% compared with other state-of-the-art image captioning methods.

Yuxuan Xiong, Bo Du, Pingkun Yan
Multi-Task Convolutional Neural Network for Joint Bone Age Assessment and Ossification Center Detection from Hand Radiograph

Bone age assessment is a common clinical procedure to diagnose endocrine and metabolic disorders in children. Recently, a variety of convolutional neural network based approaches have been developed to automatically estimate bone age from hand radiographs and have achieved accuracy comparable to human experts. However, most of these networks were trained end-to-end, i.e., deriving the bone age directly from the whole input hand image without knowing which regions of the image are most relevant to the task. In this work, we propose a multi-task convolutional neural network to simultaneously estimate bone age and localize ossification centers of different phalangeal, metacarpal and carpal bones. We show that, similar to providing attention maps, the localization of ossification centers helps the network to extract features from more meaningful regions where local appearance is closely related to skeletal maturity. In particular, to address the problem that some ossification centers do not always appear on hand radiographs at certain bone ages, we introduce an image-level landmark presence classification loss, in addition to the conventional pixel-level landmark localization loss, in our multi-task network framework. Experiments on public RSNA data demonstrate the effectiveness of our proposed method in reducing gross errors of ossification center detection, as well as in improving bone age assessment accuracy with the aid of ossification center detection, especially when the training data size is relatively small.
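A minimal PyTorch sketch of the three-part objective, with the pixel-level localization loss masked by landmark presence; the weights, shapes, and landmark count are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def multitask_loss(heatmap_pred, heatmap_gt, presence_logit, present,
                       age_pred, age_gt, w=(1.0, 0.2, 0.2)):
        """Bone-age regression + image-level landmark-presence
        classification + localization applied only where the
        ossification centre is actually visible."""
        age_loss = F.l1_loss(age_pred, age_gt)
        presence_loss = F.binary_cross_entropy_with_logits(presence_logit, present)
        mask = present.view(-1, present.size(1), 1, 1)   # zero out absent landmarks
        loc_loss = (mask * (heatmap_pred - heatmap_gt) ** 2).mean()
        return w[0] * age_loss + w[1] * presence_loss + w[2] * loc_loss

    loss = multitask_loss(torch.rand(2, 21, 64, 64), torch.rand(2, 21, 64, 64),
                          torch.randn(2, 21), torch.randint(0, 2, (2, 21)).float(),
                          torch.rand(2, 1), torch.rand(2, 1))
    print(loss)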

Minqing Zhang, Dijia Wu, Qin Liu, Qingfeng Li, Yiqiang Zhan, Xiang Sean Zhou
Backmatter
Metadata
Title
Machine Learning in Medical Imaging
Edited by
Heung-Il Suk
Mingxia Liu
Dr. Pingkun Yan
Chunfeng Lian
Copyright Year
2019
Electronic ISBN
978-3-030-32692-0
Print ISBN
978-3-030-32691-3
DOI
https://doi.org/10.1007/978-3-030-32692-0