2021 | Book

Machine Learning in Medical Imaging

12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings

Editors: Chunfeng Lian, Xiaohuan Cao, Islem Rekik, Xuanang Xu, Pingkun Yan

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the 12th International Workshop on Machine Learning in Medical Imaging, MLMI 2021, held in conjunction with MICCAI 2021, in Strasbourg, France, in September 2021.*

The 71 papers presented in this volume were carefully reviewed and selected from 92 submissions. They focus on major trends and challenges in the above-mentioned area, aiming to identify new cutting-edge techniques and their uses in medical imaging. Topics covered include: deep learning, generative adversarial learning, ensemble learning, sparse learning, multi-task learning, multi-view learning, manifold learning, and reinforcement learning, with their applications to medical image analysis, computer-aided detection and diagnosis, multi-modality fusion, image reconstruction, image retrieval, cellular image analysis, molecular imaging, digital pathology, etc.

*The workshop was held virtually.

Table of Contents

Contrastive Representations for Continual Learning of Fine-Grained Histology Images

We show how a simple autoencoder-based deep network with a contrastive loss can effectively learn representations in a continual/incremental manner with limited labelling. This is of particular interest to the biomedical imaging research community, for whom the visual task is often a binary decision (healthy vs. diseased) with limited data and costly labelling. For such applications, the proposed method provides a lightweight option of 1) representing patterns with relatively few training samples using a novel collaborative contrastive loss function, and 2) updating the autoencoder-based deep network in an unsupervised fashion for continual learning on new incoming data. We overcome the drawbacks of existing methods through planned technical design, and demonstrate the efficacy of the proposed method on three histology image classification tasks (lung, colon, breast cancer) with SOTA results.
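The collaborative contrastive loss itself is specific to the paper; as background, a generic contrastive (InfoNCE-style) loss over paired embeddings can be sketched in NumPy as follows (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss between two embedding batches.
    Row i of z1 and row i of z2 are two views of the same sample
    (positives); all other rows act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal
```

Well-aligned view pairs yield a lower loss than mismatched pairs, which is the property such losses exploit for representation learning with few labels.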

Tapabrata Chakraborti, Fergus Gleeson, Jens Rittscher
Learning Transferable 3D-CNN for MRI-Based Brain Disorder Classification from Scratch: An Empirical Study

Reliable and efficient transferability of 3D convolutional neural networks (3D-CNNs) is an important but extremely challenging issue in medical image analysis, due to small sample sizes and the domain shift problem (e.g., caused by the use of different scanners, protocols and/or subject populations in different sites/datasets). Although previous studies proposed to pretrain CNNs on ImageNet, the models’ transferability is usually limited due to the semantic gap between natural and medical images. In this work, we try to answer a key question: how to learn transferable 3D-CNNs from scratch based on a small (e.g., tens or hundreds of samples) medical image dataset? We focus on the case of structural MRI-based brain disorder classification using four benchmark datasets (i.e., ADNI-1, ADNI-2, ADNI-3 and AIBL) to address this problem. (1) We explore the influence of different network architectures on model transferability, and find that appropriately deepening or widening a network can increase the transferability (e.g., with improved sensitivity). (2) We analyze the contributions of different parts of 3D-CNNs to the transferability, and verify that fine-tuning CNNs can significantly enhance the transferability. This differs from the previous finding that fine-tuning CNNs (pretrained on ImageNet) cannot improve model transferability in 2D medical image analysis. (3) We also study the between-task transferability when a model is trained on a source task from scratch and applied to a related target task. Experimental results show that, compared to directly training a CNN on related target tasks, a CNN pretrained on a source task can yield significantly better performance.

Hao Guan, Li Wang, Dongren Yao, Andrea Bozoki, Mingxia Liu
Knee Cartilages Segmentation Based on Multi-scale Cascaded Neural Networks

Knee arthritis is one of the most common chronic degenerative joint diseases in the world, affecting the quality of life of a considerable part of the modern population. Therefore, the early detection of knee arthritis is of great significance for diagnosis and treatment. Magnetic resonance imaging (MRI) is one of the most commonly used methods for evaluating joint degeneration in osteoarthritis research. In order to obtain information on knee cartilage degradation from MRI, it is necessary to segment the articular cartilage interface and cartilage surface boundary on the entire joint surface. In this work, we propose a novel cascaded network structure with an effective inception-like multi-scale module for knee joint magnetic resonance image segmentation. Compared with the baseline, a maximum mean Dice score improvement of 1.6% is obtained. The code is publicly available at
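The reported gain is measured with the Dice score. For reference, a minimal NumPy implementation of the Dice coefficient for binary masks might look like this (an illustrative sketch, not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary segmentation masks:
    2|A ∩ B| / (|A| + |B|), with eps guarding against empty masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

A score of 1.0 indicates perfect overlap and 0.0 indicates disjoint masks, so a 1.6% mean gain corresponds to 1.6 Dice points on this 0–100% scale.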

Junrui Liu, Cong Hua, Liang Zhang, Ping Li, Xiaoyuan Lu
Deep PET/CT Fusion with Dempster-Shafer Theory for Lymphoma Segmentation

Lymphoma detection and segmentation from whole-body Positron Emission Tomography/Computed Tomography (PET/CT) volumes are crucial for surgical indication and radiotherapy. Designing automatic segmentation methods capable of effectively exploiting the information from PET and CT, as well as resolving their uncertainty, remains a challenge. In this paper, we propose a lymphoma segmentation model using a UNet with an evidential PET/CT fusion layer. Single-modality volumes are trained separately to obtain initial segmentation maps, and an evidential fusion layer is proposed to fuse the two pieces of evidence using Dempster-Shafer theory (DST). Moreover, a multi-task loss function is proposed: in addition to the use of the Dice loss for PET and CT segmentation, a loss function based on the concordance between the two segmentations is added to constrain the final segmentation. We evaluate our proposal on a database of polycentric PET/CT volumes of patients treated for lymphoma, delineated by experts. Our method achieves accurate segmentation results, with a Dice score of 0.726, without any user interaction. Quantitative results show that our method is superior to state-of-the-art methods.
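Dempster's rule of combination, the core of DST fusion, is standard: focal elements are intersected, products of masses accumulated, and the result renormalised by the mass not assigned to conflicting (empty) intersections. A minimal sketch (the specific mass assignments below are hypothetical, not taken from the paper):

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass) with
    Dempster's rule: intersect focal elements, then renormalise by 1 - K,
    where K is the total mass on empty (conflicting) intersections."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Hypothetical per-voxel evidence from each modality over {lesion, background}:
frame = frozenset({"lesion", "background"})
m_pet = {frozenset({"lesion"}): 0.7, frame: 0.3}  # PET evidence + ignorance
m_ct = {frozenset({"lesion"}): 0.5, frame: 0.5}   # CT evidence + ignorance
fused = dempster_combine(m_pet, m_ct)
```

Here the fused belief in "lesion" (0.85) exceeds either single-modality mass, illustrating how agreeing evidence reinforces while residual ignorance shrinks.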

Ling Huang, Thierry Denœux, David Tonnelet, Pierre Decazes, Su Ruan
Interpretable Histopathology Image Diagnosis via Whole Tissue Slide Level Supervision

Deep learning methods supervised by annotations of different regions of histopathology images (patch-level labels) have achieved promising outcomes in assisting pathologic diagnosis. However, most clinical data contain label information only for the whole slide image (WSI-level labels), so methods supervised by WSI-level labels are more clinically practical than those supervised by patch-level labels. Additionally, various methods supervised by WSI-level labels ignore the contextual relations among patches extracted from a WSI, making incorrect predictions for some patches in a WSI and further misclassifying the WSI. In this paper, we propose to utilize an interpretable dual encoder network with a context-capturing RNN module to capture the contextual relations among all patches extracted from a WSI. Besides, we propose to utilize a feature attention module to weigh the importance of each patch automatically. More importantly, visualization of the weight for each patch in a WSI demonstrates that our approach matches the concerns of pathologists. Furthermore, extensive experiments demonstrate the superiority of the interpretable dual encoder network.

Zhuoyue Wu, Hansheng Li, Lei Cui, Yuxin Kang, Jianye Liu, Haider Ali, Jun Feng, Lin Yang
Variational Encoding and Decoding for Hybrid Supervision of Registration Network

Substantial progress has been made in improving the accuracy of deep-learning-based registration. However, current methods still have limitations because of 1) the difficulty of acquiring supervised data, 2) the challenge of jointly optimizing image similarity and enforcing deformation regularization, and 3) the small number of training samples in such an ill-posed problem. It is believed that prior knowledge about the variability of a population could be incorporated to guide the network training to overcome these limitations. In this paper, we propose a group variational decoding-based training strategy to incorporate statistical priors of deformations for network supervision. Specifically, a variational auto-encoder is employed to learn the manifold for reconstructing deformations from a group of valid samples by projecting deformations into a low-dimensional latent space. Valid transformations can be simulated to serve as the ground-truth for supervised learning of registration. By working alternately with conventional unsupervised training, our registration network can better adapt to shape variability and yield accurate and consistent deformations. Experiments on 3D brain magnetic resonance (MR) images show that our proposed method performs better in terms of registration accuracy, consistency, and topological correctness.

Dongdong Gu, Xiaohuan Cao, Guocai Liu, Zhong Xue, Dinggang Shen
Multiresolution Registration Network (MRN) Hierarchy with Prior Knowledge Learning

Deep learning has been extensively used in unsupervised deformable image registration. U-Net structures are often used to infer deformation fields from concatenated input images, and training is achieved by minimizing losses derived from image similarity and field regularization terms. However, the mechanism of multiresolution encoding and decoding with skip connections tends to mix up the spatial relationship between corresponding voxels or features. This paper proposes a multiresolution registration network (MRN) based on simple convolution layers at each resolution level and forms a framework mimicking the ideas of well-accepted traditional image registration algorithms, wherein deformations are solved at the lowest resolution and further refined level-by-level. Multiresolution image features can be directly fed into the network, and wavelet decomposition is employed to maintain rich features at low resolution. In addition, prior knowledge of deformations at the lowest resolution is modeled by kernel-PCA when the template image is fixed, and such a prior loss is employed for training at that level to better tolerate shape variability. The proposed algorithm can be directly used for group analysis or image labeling and potentially applied for registering any image pairs. We compared the performance of MRN with different settings, i.e., w/wo wavelet features and w/wo kernel-PCA losses, using brain magnetic resonance (MR) images, and the results showed better performance with the multiresolution representation and prior knowledge learning.

Dongdong Gu, Xiaohuan Cao, Guocai Liu, Dinggang Shen, Zhong Xue
Learning to Synthesize 7 T MRI from 3 T MRI with Few Data by Deformable Augmentation

High-quality magnetic resonance imaging (MRI), which is generally acquired by ultra-high-field (7-Tesla, 7 T) MRI scanners, may lead to improved performance in brain disease diagnosis, such as for Alzheimer’s disease (AD). However, 7 T MRI has not been widely used due to its higher cost and longer scanning time. To overcome this, we propose to utilize generative adversarial network (GAN)-based techniques to synthesize 7 T scans from 3 T scans, for which the main challenge is that we do not have enough data to learn a reliable mapping from 3 T to 7 T. To address this, we further propose the Unlimited Data Augmentation (UDA) strategy to increase the number of learning samples via deformable registration, which can produce enough paired 3 T and 7 T MR images to learn this mapping. Based on this mapping, we synthesize a 7 T MR scan for each subject in the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and conduct experiments to evaluate their effect in two AD diagnosis tasks: AD identification and mild cognitive impairment (MCI) conversion prediction. Experimental results demonstrate that our UDA strategy is effective for learning a reliable mapping to high-quality MR images, and that the synthetic 7 T scans can improve the performance of AD diagnosis.

Jie Wei, Yongsheng Pan, Yong Xia, Dinggang Shen
Rethinking Pulmonary Nodule Detection in Multi-view 3D CT Point Cloud Representation

3D CT point clouds reconstructed from the original CT images are naturally represented in real-world coordinates. Compared with CT images, 3D CT point clouds contain invariant geometric features with irregular spatial distributions from multiple viewpoints. This paper rethinks pulmonary nodule detection in CT point cloud representations. We first extract multi-view features from a sparse convolutional (SparseConv) encoder by rotating the point clouds at different angles in world coordinates. Then, to simultaneously learn discriminative and robust spatial features from various viewpoints, a nodule proposal optimization scheme is proposed to obtain coarse nodule regions by aggregating consistent nodule proposal predictions from multi-view features. Last, the multi-level features and semantic segmentation features extracted from a SparseConv decoder are concatenated with multi-view features for final nodule region regression. Experiments on the benchmark dataset (LUNA16) demonstrate the feasibility of applying CT point clouds to the lung nodule detection task. Furthermore, we observe that by combining multi-view predictions, the performance of the proposed framework is greatly improved compared to single-view, while the interior texture features of nodules from images are more suitable for detecting small nodules.

Jingya Liu, Oguz Akin, Yingli Tian
End-to-End Lung Nodule Detection Framework with Model-Based Feature Projection Block

This paper proposes a novel end-to-end framework for detecting suspicious pulmonary nodules in chest CT scans. The method’s core idea is a new nodule segmentation architecture with a model-based feature projection block on three-dimensional convolutions. This block acts as a preliminary feature extractor for a two-dimensional U-Net-like convolutional network. Using the proposed approach along with an axial, coronal, and sagittal projection analysis makes it possible to abandon the widely used false-positive reduction step. The proposed method achieves SOTA on LUNA2016 with 0.959 average sensitivity, and 0.936 sensitivity at a false-positive level of 1/4 per scan. The paper describes the proposed approach and presents the experimental results on LUNA2016 as well as ablation studies. The code of the proposed model is available at .
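The paper's projection block is learned; as a rough intuition for the axial/coronal/sagittal analysis, fixed maximum-intensity projections of a CT volume along each anatomical axis can be sketched as follows (a simplified stand-in, not the authors' model-based block):

```python
import numpy as np

def axis_projections(volume):
    """Maximum-intensity projections of a 3D CT volume (z, y, x) onto the
    three anatomical planes, reducing 3D context to 2D maps that a 2D
    U-Net-like network could consume."""
    return {
        "axial": volume.max(axis=0),     # collapse z -> (y, x)
        "coronal": volume.max(axis=1),   # collapse y -> (z, x)
        "sagittal": volume.max(axis=2),  # collapse x -> (z, y)
    }
```

Each projection preserves the brightest structures (such as dense nodules) along its axis, which is why combining all three views helps disambiguate candidate locations.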

Ivan Drokin, Elena Ericheva
Learning Structure from Visual Semantic Features and Radiology Ontology for Lymph Node Classification on MRI

Medical image classification (for example, of lesions on MRI scans) is a very challenging task due to the complicated relationships between different lesion sub-types and the high cost of collecting high-quality labelled training datasets. Graph models have been used to successfully model such complicated relationships for medical imaging classification in many previous works. However, most existing graph-based models assume the structure is known or pre-defined, and the classification performance severely depends on the pre-defined structure. To address these problems of current graph learning models, we propose to jointly learn the graph structure and use it for the classification task in one framework. Besides imaging features, we also use disease semantic features (learned from clinical reports) and a predefined lymph node ontology graph to construct the graph structure. We evaluated our model on a T2 MRI image dataset with 821 samples and 14 types of lymph nodes. Although this dataset is very unbalanced across the different types of lymph nodes, our model shows promising classification results on this challenging dataset compared to several state-of-the-art methods.

Yingying Zhu, Shuai Wang, Qingyu Chen, Sungwon Lee, Thomas Shen, Daniel C. Elton, Zhiyong Lu, Ronald M. Summers
Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment

Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pre-trained on both the global image-sentence level and the local image region-word level for visual-textual matching. Both are bidirectionally constrained by cross-entropy-based and ranking-based triplet matching losses. The region-word matching is calculated using the attention mechanism without direct supervision of their mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the representation learning quality by cross-modality retrievals and multi-label classifications on two datasets: OpenI-IU and MIMIC-CXR. Our code is available at .
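A standard ranking-based triplet matching loss pulls a matched image-text pair together while pushing a mismatched pair at least a margin further apart; a minimal sketch on embedding batches (illustrative, not the JoImTeRNet implementation) is:

```python
import numpy as np

def triplet_matching_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet ranking loss over embedding batches: the
    anchor-positive distance should undercut the anchor-negative distance
    by at least `margin`; violations contribute linearly to the loss."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```

Applied bidirectionally (image as anchor with text positives/negatives, and vice versa), this is the ranking constraint the abstract refers to.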

Zhanghexuan Ji, Mohammad Abuzar Shaikh, Dana Moukheiber, Sargur N Srihari, Yifan Peng, Mingchen Gao
Cell Counting by a Location-Aware Network

The purpose of cell counting is to estimate the number of cells in microscopy images. Most popular methods obtain cell numbers by integrating density maps generated by deep cell counting networks. However, cell counting networks that rely on estimated cell density maps may leave cell locations in a black box. In this paper, we propose a novel cell counting network leveraging cell location information to obtain accurate cell numbers. Evaluated on four widely used cell counting datasets, our method, which uses cell locations to boost cell density map generation and cell counting, achieves superior performance compared to the state-of-the-art. The source code will be available on our GitHub.

Zuhui Wang, Zhaozheng Yin
Exploring Gyro-Sulcal Functional Connectivity Differences Across Task Domains via Anatomy-Guided Spatio-Temporal Graph Convolutional Networks

One of the most prominent anatomical characteristics of the human brain is its highly folded cortical surface, which forms convex gyri and concave sulci. Previous studies have demonstrated that gyri and sulci exhibit fundamental differences in terms of genetic influences, morphology and structural connectivity as well as function. Recent studies have demonstrated time-frequency differences in neural activity between gyri and sulci. However, the functional connectivity between gyri and sulci is currently unclear. Moreover, the regularity/variability of the gyro-sulcal functional connectivity across different task domains remains unknown. To address these two questions, we developed a novel anatomy-guided spatio-temporal graph convolutional network (AG-STGCN) to classify task-based fMRI (t-fMRI) and resting-state fMRI (rs-fMRI) data, and to further investigate gyro-sulcal functional connectivity differences across different task domains. By performing seven independent classifications based on seven t-fMRI datasets and one rs-fMRI dataset of 800 subjects from the Human Connectome Project, we found that the constructed gyro-sulcal functional connectivity features could satisfactorily differentiate the t-fMRI and rs-fMRI data. For the functional connectivity features contributing to the classifications, gyri played a more crucial role than sulci in both ipsilateral and contralateral neural communications across task domains. Our study provides novel insights into unveiling the functional differentiation between gyri and sulci as well as for understanding anatomo-functional relationships in the brain.

Mingxin Jiang, Shimin Yang, Zhongbo Zhao, Jiadong Yan, Yuzhong Chen, Tuo Zhang, Shu Zhang, Benjamin Becker, Keith M. Kendrick, Xi Jiang
StairwayGraphNet for Inter- and Intra-modality Multi-resolution Brain Graph Alignment and Synthesis

Synthesizing multimodality medical data provides complementary knowledge and helps doctors make precise clinical decisions. Although promising, existing multimodal brain graph synthesis frameworks have several limitations. First, they mainly tackle only one problem (intra- or inter-modality), limiting their generalizability to synthesizing inter- and intra-modality simultaneously. Second, while a few techniques work on super-resolving low-resolution brain graphs within a single modality (i.e., intra), inter-modality graph super-resolution remains unexplored, though this would avoid the need for costly data collection and processing. More importantly, the target and source domains might have different distributions, which causes a domain fracture between them. To fill these gaps, we propose a multi-resolution StairwayGraphNet (SG-Net) framework to jointly infer a target graph modality based on a given modality and super-resolve brain graphs in both inter and intra domains. Our SG-Net is grounded in three main contributions: (i) predicting a target graph from a source one based on a novel graph generative adversarial network in both inter (e.g., morphological-functional) and intra (e.g., functional-functional) domains, (ii) generating high-resolution brain graphs without resorting to time-consuming and expensive MRI processing steps, and (iii) enforcing the source distribution to match that of the ground truth graphs using an inter-modality aligner to relax the loss function to optimize. Moreover, we design a new Ground Truth-Preserving loss function to guide both generators in learning the topological structure of ground truth brain graphs more accurately. Our comprehensive experiments on predicting target brain graphs from source graphs using a multi-resolution stairway showed the outperformance of our method in comparison with its variants and state-of-the-art methods. SG-Net presents the first work for graph alignment and synthesis across varying modalities and resolutions, which handles graph size, distribution, and structure variations. Our Python TIS-Net code is available on BASIRA GitHub at .

Islem Mhiri, Mohamed Ali Mahjoub, Islem Rekik
Multi-Feature Semi-Supervised Learning for COVID-19 Diagnosis from Chest X-Ray Images

Computed tomography (CT) and chest X-ray (CXR) have been the two dominant imaging modalities deployed for improved management of Coronavirus disease 2019 (COVID-19). Due to faster imaging, lower radiation exposure, and cost-effectiveness, CXR is preferred over CT. However, the interpretation of CXR images, compared to CT, is more challenging due to low image resolution and COVID-19 image features being similar to those of regular pneumonia. Computer-aided diagnosis via deep learning has been investigated to help mitigate these problems and assist clinicians during the decision-making process. The requirement for a large amount of labeled data is one of the major problems of deep learning methods when deployed in the medical domain. To address this, we propose a semi-supervised learning (SSL) approach that uses minimal data for training. We integrate local-phase CXR image features into a multi-feature convolutional neural network architecture, where the SSL method is trained with a teacher/student paradigm. Quantitative evaluation is performed on 8,851 normal (healthy), 6,045 pneumonia, and 3,795 COVID-19 CXR scans. By using only 7.06% of the data as labeled and 16.48% as unlabeled for training, and 5.53% for validation, our method achieves 93.61% mean accuracy on a large-scale test set (70.93% of the data). We provide comparison results against fully supervised and SSL methods. The code and dataset will be made available after acceptance.

Xiao Qi, David J. Foran, John L. Nosher, Ilker Hacihaliloglu
Transfer Learning with a Layer Dependent Regularization for Medical Image Segmentation

Transfer learning is a machine learning technique where a model trained on one task is used to initialize the learning procedure of a second, related task that has only a small amount of training data. Transfer learning can also be used as a regularization procedure by penalizing the learned parameters if they deviate too much from their initial values. In this study we show that the learned parameters move apart from the source task progressively along the network layers. To cope with this behaviour, we propose a transfer regularization method based on monotonically decreasing regularization coefficients. We demonstrate the power of the proposed regularized transfer learning scheme on a COVID-19 opacity task. Specifically, we show that it can improve the segmentation of coronavirus lesions in chest CT scans.
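The idea of layer-dependent regularization can be sketched as an L2 penalty tying each layer to its pretrained initialisation with a coefficient that shrinks with depth (a minimal sketch with a hypothetical geometric schedule; the paper's exact coefficient schedule may differ):

```python
import numpy as np

def layer_dependent_penalty(weights, pretrained, lambda0=1.0, decay=0.5):
    """Sum over layers of lambda_l * ||w_l - w0_l||^2, with coefficients
    lambda_l = lambda0 * decay**l decreasing monotonically with depth, so
    early layers stay close to the source task while deeper layers are
    freer to adapt to the target task."""
    penalty = 0.0
    for l, (w, w0) in enumerate(zip(weights, pretrained)):
        penalty += lambda0 * (decay ** l) * np.sum((w - w0) ** 2)
    return penalty
```

Added to the task loss, this penalizes the same parameter drift more heavily in shallow layers than in deep ones, matching the observed depth-wise divergence from the source task.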

Nimrod Sagie, Hayit Greenspan, Jacob Goldberger
Multi-scale Self-supervised Learning for Multi-site Pediatric Brain MR Image Segmentation with Motion/Gibbs Artifacts

Accurate tissue segmentation of large-scale pediatric brain MR images from multiple sites is essential to characterize early brain development. Due to imaging motion/Gibbs artifacts and multi-site issue (or domain shift issue), it remains a challenge to accurately segment brain tissues from multi-site pediatric MR images. In this paper, we present a multi-scale self-supervised learning (M-SSL) framework to accurately segment tissues for multi-site pediatric brain MR images with artifacts. Specifically, we first work on the downsampled images to estimate coarse tissue probabilities and build a global anatomic guidance. We then train another segmentation model based on the original images to estimate fine tissue probabilities, which are further integrated with the global anatomic guidance to refine the segmentation results. In the testing stage, to alleviate the multi-site issue, we propose an iterative self-supervised learning strategy to train a site-specific segmentation model based on a set of reliable training samples automatically generated for a to-be-segmented site. The experimental results on pediatric brain MR images with real artifacts and multi-site subjects from the iSeg-2019 challenge demonstrate that our M-SSL method achieves better performance compared with several state-of-the-art methods.

Yue Sun, Kun Gao, Weili Lin, Gang Li, Sijie Niu, Li Wang
Deep Active Learning for Dual-View Mammogram Analysis

Supervised deep learning on medical imaging requires massive manual annotations, which demand expertise and are time-consuming to perform. Active learning aims at reducing annotation effort by adaptively selecting the most informative samples for labeling. We propose in this paper a novel deep active learning approach for dual-view mammogram analysis, especially for breast mass segmentation and detection, where the necessity of labeling is estimated by exploiting the consistency of predictions arising from craniocaudal (CC) and mediolateral-oblique (MLO) views. Intuitively, if mass segmentation or detection is performed robustly, prediction results achieved on CC and MLO views should be consistent. Exploiting the inter-view consistency is hence a good way to guide the sampling mechanism, which iteratively selects the next image pairs to be labeled by an oracle. Experiments on the public DDSM-CBIS and INbreast datasets demonstrate that performance comparable to fully-supervised models can be reached using only 6.83% (9.56%) of labeled data for segmentation (detection). This suggests that combining dual-view mammogram analysis and active learning can strongly contribute to the development of computer-aided diagnosis systems.
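A consistency-driven sampler of this kind can be sketched as follows, under the simplifying assumption that the two views' predictions have been brought into a comparable space so Dice overlap can serve as the agreement measure (the paper's actual consistency criterion may differ):

```python
import numpy as np

def select_for_labeling(cc_masks, mlo_masks, budget, eps=1e-7):
    """Rank unlabeled image pairs by Dice agreement between CC and MLO
    predictions and return the indices of the `budget` least consistent
    pairs, i.e. those most worth sending to the oracle for labeling."""
    dices = []
    for cc, mlo in zip(cc_masks, mlo_masks):
        inter = np.logical_and(cc, mlo).sum()
        dices.append((2.0 * inter + eps) / (cc.sum() + mlo.sum() + eps))
    return np.argsort(dices)[:budget]
```

Pairs with low inter-view agreement are the ones the current model handles least robustly, so labeling them first yields the largest expected improvement per annotation.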

Yutong Yan, Pierre-Henri Conze, Mathieu Lamard, Heng Zhang, Gwenolé Quellec, Béatrice Cochener, Gouenou Coatrieux
Statistical Dependency Guided Contrastive Learning for Multiple Labeling in Prenatal Ultrasound

Standard plane recognition plays an important role in prenatal ultrasound (US) screening. Automatically recognizing the standard plane along with the corresponding anatomical structures in a US image can not only facilitate US image interpretation but also improve diagnostic efficiency. In this study, we build a novel multi-label learning (MLL) scheme to identify multiple standard planes and the corresponding anatomical structures of the fetus simultaneously. Our contribution is three-fold. First, we represent the class correlation by word embeddings to capture the fine-grained semantic and latent statistical concurrency. Second, we equip the MLL with a graph convolutional network to explore the inner and outer relationships among categories. Third, we propose a novel cluster relabel-based contrastive learning algorithm to encourage the divergence among ambiguous classes. Extensive validation was performed on our large in-house dataset. Our approach reports the highest accuracy of 90.25% for standard plane labeling, 85.59% for plane and structure labeling, and an mAP of 94.63%. The proposed MLL scheme provides a novel perspective for standard plane recognition and can be easily extended to other medical image classification tasks.

Shuangchi He, Zehui Lin, Xin Yang, Chaoyu Chen, Jian Wang, Xue Shuang, Ziwei Deng, Qin Liu, Yan Cao, Xiduo Lu, Ruobing Huang, Nishant Ravikumar, Alejandro Frangi, Yuanji Zhang, Yi Xiong, Dong Ni
Semi-supervised Learning Regularized by Adversarial Perturbation and Diversity Maximization

In many clinical settings, medical image datasets suffer from the class imbalance problem, which biases the predictions of trained models toward majority classes. Semi-supervised learning (SSL) algorithms trained on such imbalanced datasets become even more problematic, since pseudo-labels of unlabeled data are generated from the model’s biased predictions. To address this challenge, we propose an SSL framework which can effectively leverage unlabeled data to improve the performance of deep convolutional neural networks. It is a consistency-based method which exploits the unlabeled data by encouraging prediction consistency for a given input under adversarial perturbation and diversity maximization. We additionally propose to use uncertainty estimation to filter out low-quality consistency targets for the unlabeled data. We conduct comprehensive experiments to evaluate the performance of our method on two publicly available datasets, i.e., the ISIC 2018 challenge dataset for skin lesion classification and the ChestX-ray14 dataset for thorax disease classification. The experimental results demonstrate the efficacy of the proposed method.
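The uncertainty-based filtering step can be illustrated with predictive entropy, one common uncertainty estimate (the paper's exact estimator is not specified here, so this is a generic sketch):

```python
import numpy as np

def filter_consistency_targets(probs, max_entropy=0.5):
    """Keep only the unlabeled samples whose predictive entropy falls
    below a threshold, discarding high-uncertainty (low-quality)
    consistency targets. `probs` is an (N, C) array of class
    probabilities; returns indices of retained samples."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs), axis=1)
    return np.where(entropy < max_entropy)[0]
```

Confident predictions (low entropy) are kept as consistency targets, while near-uniform predictions, which are the most likely to encode the model's majority-class bias, are dropped.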

Peng Liu, Guoyan Zheng
TransforMesh: A Transformer Network for Longitudinal Modeling of Anatomical Meshes

The longitudinal modeling of neuroanatomical changes related to Alzheimer’s disease (AD) is crucial for studying the progression of the disease. To this end, we introduce TransforMesh, a spatio-temporal network based on transformers that models longitudinal shape changes on 3D anatomical meshes. While transformer and mesh networks have recently shown impressive performances in natural language processing and computer vision, their application to medical image analysis has been very limited. To the best of our knowledge, this is the first work that combines transformer and mesh networks. Our results show that TransforMesh can model shape trajectories better than other baseline architectures that do not capture temporal dependencies. Moreover, we also explore the capabilities of TransforMesh in detecting structural anomalies of the hippocampus in patients developing AD.

Ignacio Sarasua, Sebastian Pölsterl, Christian Wachinger, for the Alzheimer’s Disease Neuroimaging Initiative
A Recurrent Two-Stage Anatomy-Guided Network for Registration of Liver DCE-MRI

Registration of hepatic dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) series remains challenging due to variable uptake of the contrast agent across different tissues, or even within the same tissue, in the liver. These differences manifest as intensity variations, which typically cause traditional intensity-based deformable registration methods to fail to align small anatomical structures in the liver such as vessels. Although deep-learning-based registration methods have become popular in recent years because of their superior efficiency, registration of DCE-MRI series with dynamic intensity changes remains largely unsolved. To address this challenge, we present a two-stage registration network, in which the first stage aligns the whole liver and the second stage focuses on the registration of anatomical structures such as vessels and tumors. Furthermore, we adopt a recurrent registration strategy for deformation refinement. To evaluate our proposed method, we used clinical DCE-MRI series of 60 patients, and registered the arterial-phase and portal-venous-phase images onto the pre-contrast phases. Experimental results showed that the proposed method achieved better registration performance than a traditional method (i.e., SyN) and a deep-learning-based method (i.e., VoxelMorph), especially in aligning anatomical structures such as vessel branches in the liver.

Wenjun Shen, Liyun Chen, Dongming Wei, Yuanfang Qiao, Yiqiang Zhan, Dinggang Shen, Qian Wang
Learning Infant Brain Developmental Connectivity for Cognitive Score Prediction

During infancy, the human brain develops rapidly in terms of structure, function and cognition. The tight connection between cognitive skills and brain morphology motivates us to focus on individual-level cognitive score prediction using longitudinal structural MRI data. In the early postnatal stage, the massive brain region connections contain some intrinsic topologies, such as small-worldness and modular organization. Accordingly, graph convolutional networks can be used to incorporate different region combinations to predict the infant cognitive scores. Nevertheless, the definition of the brain region connectivity remains a problem. In this work, we propose a crafted layer, the Inter-region Connectivity Module (ICM), to effectively build brain region connections in a data-driven manner. To further leverage the critical cues hidden in the development patterns, we choose the path signature as the sequential data descriptor to extract the essential dynamic information of the region-wise growth trajectories. With these region-wise developmental features and the inter-region connectivity, a novel Cortical Developmental Connectivity Network (CDC-Net) is built. Experiments on a longitudinal infant dataset with 3 time points and hundreds of subjects show our superior performance, outperforming classical machine-learning-based methods and deep-learning-based algorithms.
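A path signature summarizes a trajectory through iterated integrals of its increments. The sketch below computes the level-1 and level-2 signature terms of a piecewise-linear path; truncating at level 2 is an illustrative choice, not necessarily the depth used in the paper.

```python
def path_signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.
    path: list of d-dimensional points.  S1[i] is the total increment in
    dimension i; S2[i][j] is the iterated integral over dimensions (i, j)."""
    d = len(path[0])
    S1 = [path[-1][i] - path[0][i] for i in range(d)]
    S2 = [[0.0] * d for _ in range(d)]
    run = [0.0] * d  # increment accumulated before the current segment
    for a, b in zip(path, path[1:]):
        dx = [b[i] - a[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                S2[i][j] += run[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for i in range(d):
            run[i] += dx[i]
    return S1, S2

# An L-shaped path in 2-D: right one unit, then up one unit.
S1, S2 = path_signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

The antisymmetric part of the level-2 terms encodes the signed area swept by the path, which is the kind of order-sensitive dynamic information that distinguishes different growth trajectories with identical endpoints.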

Yu Li, Jiale Cheng, Xin Zhang, Ruiyan Fang, Lufan Liao, Xinyao Ding, Hao Ni, Xiangmin Xu, Zhengwang Wu, Dan Hu, Weili Lin, Li Wang, John Gilmore, Gang Li
Hierarchical 3D Feature Learning for Pancreas Segmentation

We propose a novel 3D fully convolutional deep network for automated pancreas segmentation from both MRI and CT scans. More specifically, the proposed model consists of a 3D encoder that learns to extract volume features at different scales; features taken at different points of the encoder hierarchy are then sent to multiple 3D decoders that individually predict intermediate segmentation maps. Finally, all segmentation maps are combined to obtain a unique detailed segmentation mask. We test our model on both CT and MRI imaging data: the publicly available NIH Pancreas-CT dataset (consisting of 82 contrast-enhanced CTs) and a private MRI dataset (consisting of 40 MRI scans). Experimental results show that our model outperforms existing methods on CT pancreas segmentation, obtaining an average Dice score of about 88%, and yields promising segmentation performance on a very challenging MRI dataset (average Dice score of about 77%). Additional control experiments demonstrate that the achieved performance is due to the combination of our 3D fully-convolutional deep network and the hierarchical representation decoding, thus substantiating our architectural design.
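The fusion of the per-decoder intermediate maps might look as follows. Voxel-wise averaging with a fixed threshold is a hypothetical fusion rule used for illustration, since the abstract does not specify the combination operator.

```python
def fuse_segmentations(maps, threshold=0.5):
    """Fuse intermediate probability maps from several decoders into one
    binary mask by voxel-wise averaging, then thresholding."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[sum(m[r][c] for m in maps) / n for c in range(cols)]
             for r in range(rows)]
    return [[1 if v >= threshold else 0 for v in row] for row in fused]

# Two decoders agree on the first pixel and disagree weakly on the second.
mask = fuse_segmentations([[[0.9, 0.2]], [[0.7, 0.4]]])
```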

Federica Proietto Salanitri, Giovanni Bellitto, Ismail Irmakci, Simone Palazzo, Ulas Bagci, Concetto Spampinato
Voxel-Wise Cross-Volume Representation Learning for 3D Neuron Reconstruction

Automatic 3D neuron reconstruction is critical for analysing the morphology and functionality of neurons in brain circuit activities. However, the performance of existing tracing algorithms is hindered by low image quality. Recently, a series of deep learning based segmentation methods have been proposed to improve the quality of raw 3D optical image stacks by removing noise and restoring neuronal structures from low-contrast background. Due to the variety of neuron morphology and the lack of large neuron datasets, most current neuron segmentation models rely on introducing complex and specially designed submodules to a base architecture with the aim of encoding better feature representations. Though successful, this puts an extra burden on computation during inference. Therefore, rather than modifying the base network, we shift our focus to the dataset itself. The encoder-decoder backbone used in most neuron segmentation models attends only to intra-volume voxels to learn structural features of neurons and neglects the shared intrinsic semantic features of voxels belonging to the same category in different volumes, which are also important for expressive representation learning. Hence, to better utilise the scarce dataset, we propose to explicitly exploit such intrinsic voxel features through a novel voxel-level cross-volume representation learning paradigm built on an encoder-decoder segmentation model. Our method introduces no extra cost during inference. Evaluated on 42 3D neuron images from the BigNeuron project, our proposed method is demonstrated to improve the learning ability of the original segmentation model and further enhance the reconstruction performance.
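One way to realize voxel-level cross-volume learning is to pull voxel features in one volume toward class prototypes (mean features) computed from another volume. This prototype-based variant is a simplified stand-in for the paper's paradigm, not its exact loss.

```python
import numpy as np

def cross_volume_prototype_loss(feats_a, labels_a, feats_b, labels_b):
    """Pull voxel features of volume A toward the class prototypes (mean
    features) computed from volume B, so that same-class voxels in
    different volumes share semantics.  feats_*: (n, d); labels_*: (n,)."""
    loss = 0.0
    for c in np.unique(labels_a):
        proto = feats_b[labels_b == c].mean(axis=0)
        diff = feats_a[labels_a == c] - proto
        loss += float((diff ** 2).sum(axis=1).mean())
    return loss

# Identical volumes give zero loss; a shifted foreground voxel does not.
feats_b = np.array([[0.0, 0.0], [1.0, 1.0]])
labels_b = np.array([0, 1])
zero = cross_volume_prototype_loss(feats_b, labels_b, feats_b, labels_b)
```

Because this term is only added during training, the segmentation network itself is unchanged and inference cost stays the same, matching the paper's stated design goal.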

Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang Song, Siqi Liu, Wojciech Chrzanowski, Weidong Cai
Diagnosis of Hippocampal Sclerosis from Clinical Routine Head MR Images Using Structure-constrained Super-Resolution Network

Medical images routinely acquired in clinical facilities are mostly of low resolution (LR), in consideration of acquisition time and efficiency. This makes clinical diagnosis of hippocampal sclerosis challenging, as additional sequences for the hippocampus need to be acquired. In contrast, high-resolution (HR) images provide more detailed information for disease investigation. Recently, image super-resolution (SR) methods were proposed to reconstruct HR images from LR inputs. However, current SR methods generally use simulated LR images and intensity constraints, which limits their applications in clinical practice. To solve this problem, we utilized real paired LR and HR images and trained a Structure-Constrained Super-Resolution (SCSR) network. First, we proposed a single-image super-resolution framework in which mixed loss functions were introduced to enhance the reconstruction of brain tissue boundaries in addition to intensity constraints. Second, since the hippocampus is a relatively small structure, we further proposed a weight map to enhance the reconstruction of subcortical regions. Experimental results on 642 real paired cases showed that the proposed method outperformed state-of-the-art methods in terms of image quality, with a PSNR of 27.0405 and an SSIM of 0.9958. Moreover, experiments using radiomics features extracted from the hippocampus on SR images obtained with the proposed method achieved the best accuracy of 95% for differentiating subjects with left and right hippocampal sclerosis from normal controls. The proposed method shows potential for disease screening using clinical routine images.
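The weight map enters the objective as a voxel-wise weighting. A minimal sketch, assuming a weighted mean-squared form; the paper's mixed losses also include boundary terms not shown here.

```python
def weighted_sr_loss(sr, hr, weight):
    """Weighted mean-squared reconstruction loss over flattened voxels;
    voxels with larger weight (e.g., subcortical regions) count more."""
    num = sum(w * (s - h) ** 2 for s, h, w in zip(sr, hr, weight))
    return num / sum(weight)

# The mismatched voxel carries weight 2, so it dominates the loss.
loss = weighted_sr_loss([1.0, 2.0], [0.0, 2.0], [2.0, 1.0])
```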

Zehong Cao, Feng Shi, Qiang Xu, Gaoping Liu, Tianyang Sun, Xiaodan Xing, Yichu He, Guangming Lu, Zhiqiang Zhang, Dinggang Shen
U-Net Transformer: Self and Cross Attention for Medical Image Segmentation

Medical image segmentation remains particularly challenging for complex and low-contrast anatomical structures. In this paper, we introduce the U-Transformer network, which combines a U-shaped architecture for image segmentation with self- and cross-attention from Transformers. U-Transformer overcomes the inability of U-Nets to model long-range contextual interactions and spatial dependencies, which are arguably crucial for accurate segmentation in challenging contexts. To this end, attention mechanisms are incorporated at two main levels: a self-attention module leverages global interactions between encoder features, while cross-attention in the skip connections allows a fine spatial recovery in the U-Net decoder by filtering out non-semantic features. Experiments on two abdominal CT-image datasets show the large performance gain brought by U-Transformer compared to U-Net and local Attention U-Nets. We also highlight the importance of using both self- and cross-attention, and the appealing interpretability offered by U-Transformer.
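The cross-attention in the skip connections can be sketched as standard single-head scaled dot-product attention, with decoder features as queries and encoder skip features as keys and values; the multi-head and positional-encoding details of U-Transformer are omitted here.

```python
import numpy as np

def cross_attention(decoder_feats, encoder_feats):
    """Single-head scaled dot-product cross-attention: decoder positions
    (queries) attend over encoder skip features (keys and values), so
    skip activations irrelevant to the query receive low weight.
    Shapes: decoder_feats (n_q, d), encoder_feats (n_k, d)."""
    d = decoder_feats.shape[-1]
    scores = decoder_feats @ encoder_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ encoder_feats, weights

# With orthonormal features, each query attends most to its matching key.
out, w = cross_attention(np.eye(2), np.eye(2))
```

The attention weights act as the "filter" on the skip connection: encoder activations that do not match any decoder query contribute little to the recovered features.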

Olivier Petit, Nicolas Thome, Clement Rambour, Loic Themyr, Toby Collins, Luc Soler
Pre-biopsy Multi-class Classification of Breast Lesion Pathology in Mammograms

Characterization of lesions by artificial intelligence (AI) has been the subject of extensive research. In recent years, many studies demonstrated the ability of convolutional neural networks (CNNs) to successfully distinguish between malignant and benign breast lesions in mammography (MG) images. However, to date, no study has assessed the specific sub-type of lesions in MG images, as detailed in histopathology reports. We present a method for finer classification of breast lesions in MG images into multiple pathology sub-types. Our approach fits well into radiologists’ diagnostic workflow and uses data available in radiology reports. The proposed Dual-Radiology Dual-Resolution Network (Du-Rad Du-Res Net) receives dual input from the radiologist and dual image resolutions. The radiologist input includes an annotation of the lesion area and semantic radiology features; the dual image resolutions comprise a low resolution of the entire mammogram and a high resolution of the lesion area. The network estimates the likelihood of malignancy, as well as the associated pathological sub-type. We show that the combined input of the lesion region of interest (ROI) and the entire mammogram is important for optimizing the model’s performance. We tested the AI in a reader study on a dataset of 100 held-out cases. The AI outperformed three breast radiologists in the task of lesion histopathology sub-typing.

Tal Tlusty, Michal Ozery-Flato, Vesna Barros, Ella Barkan, Mika Amit, David Gruen, Michal Guindy, Tal Arazi, Mona Rozin, Michal Rosen-Zvi, Efrat Hexter
Co-segmentation of Multi-modality Spinal Image Using Channel and Spatial Attention

Clinicians usually examine and diagnose patients with multimodality images such as CT and MRI because different modality data of the same anatomical structure are often complementary. This can provide doctors with a variety of information and help them make accurate diagnoses. Inspired by this, the paper proposes a novel method of collaborative spinal segmentation based on spinal CT and MRI images. We use a Siamese network as the architecture and ResNet50 as the backbone to simultaneously extract high-level semantic features and low-level detail features from images of the two modalities. First, the high-level features are enhanced by expanding the receptive field and then fed into channel and spatial attention modules, which achieve an optimal combination of high-level semantic information with the help of average pooling and max pooling and learn the mutual information between images of different modalities. The learned high-level semantic correlations between modalities are then combined with the up-sampled low-level features to maintain the uniqueness of each modality, and finally the spinal segmentation results for both modalities are obtained simultaneously. The experimental results show that multi-modal co-segmentation outperforms both single-modal segmentation and ResNet50 segmentation. All code and data described are available at: .

Yaocong Zou, Yonghong Shi
Hetero-Modal Learning and Expansive Consistency Constraints for Semi-supervised Detection from Multi-sequence Data

Lesion detection serves a critical role in early diagnosis and has been well explored in recent years due to methodological advances and increased data availability. However, the high costs of annotations hinder the collection of large and completely labeled datasets, motivating semi-supervised detection approaches. In this paper, we introduce mean teacher hetero-modal detection (MTHD), which addresses two important gaps in current semi-supervised detection. First, it is not obvious how to enforce unlabeled consistency constraints across the very different outputs of various detectors, which has resulted in various compromises being used in the state of the art. Using an anchor-free framework, MTHD formulates a mean teacher approach without such compromises, enforcing consistency on the soft output of object centers and sizes. Second, multi-sequence data is often critical, e.g., for abdominal lesion detection, but unlabeled data is often missing sequences. To deal with this, MTHD incorporates hetero-modal learning in its framework. Unlike prior art, MTHD is able to incorporate an expansive set of consistency constraints that include geometric transforms and random sequence combinations. We train and evaluate MTHD on liver lesion detection using the largest MR lesion dataset to date (1099 patients with >5000 volumes). MTHD surpasses the best fully-supervised and semi-supervised competitors by 10.1% and 3.5%, respectively, in average sensitivity.
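The mean-teacher component relies on an exponential moving average (EMA) of the student weights; a minimal sketch of the update rule, with the smoothing coefficient alpha chosen illustratively.

```python
def ema_update(teacher, student, alpha=0.99):
    """Exponential moving average update of teacher parameters from the
    student: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]

# The teacher drifts slowly toward the student.
params = ema_update([0.0, 2.0], [1.0, 0.0], alpha=0.9)
```

The slowly updated teacher then produces the soft center and size targets against which the student's predictions on unlabeled, transformed inputs are kept consistent.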

Bolin Lai, Yuhsuan Wu, Xiao-Yun Zhou, Peng Wang, Le Lu, Lingyun Huang, Mei Han, Jing Xiao, Heping Hu, Adam P. Harrison
STRUDEL: Self-training with Uncertainty Dependent Label Refinement Across Domains

We propose an unsupervised domain adaptation (UDA) approach for white matter hyperintensity (WMH) segmentation, which uses Self-TRaining with Uncertainty DEpendent Label refinement (STRUDEL). Self-training has recently been introduced as a highly effective method for UDA, which is based on self-generated pseudo labels. However, pseudo labels can be very noisy and therefore deteriorate model performance. We propose to predict the uncertainty of pseudo labels and integrate it in the training process with an uncertainty-guided loss function to highlight labels with high certainty. STRUDEL is further improved by incorporating the segmentation output of an existing method in the pseudo label generation that showed high robustness for WMH segmentation. In our experiments, we evaluate STRUDEL with a standard U-Net and a modified network with a higher receptive field. Our results on WMH segmentation across datasets demonstrate the significant improvement of STRUDEL with respect to standard self-training.

Fabian Gröger, Anne-Marie Rickmann, Christian Wachinger
Deep Reinforcement Learning for L3 Slice Localization in Sarcopenia Assessment

Sarcopenia is a medical condition characterized by a reduction in muscle mass and function. A quantitative diagnosis technique consists of localizing the CT slice passing through the middle of the third lumbar area (L3) and segmenting muscles at this level. In this paper, we propose a deep reinforcement learning method for accurate localization of the L3 CT slice. Our method trains a reinforcement learning agent by incentivizing it to discover the right position. Specifically, a Deep Q-Network is trained to find the best policy to follow for this problem. Visualizing the training process shows that the agent mimics the scrolling of an experienced radiologist. Extensive experiments against other state-of-the-art deep learning based methods for L3 localization prove the superiority of our technique which performs well even with a limited amount of data and annotations.
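The scrolling behavior can be illustrated with a toy tabular Q-learning agent on a 1-D stack of slices. The paper trains a Deep Q-Network on image observations; the states, reward, and hyperparameters below are simplifications for illustration only.

```python
import random

def train_slice_agent(n_slices, target, episodes=2000, alpha=0.5,
                      gamma=0.9, eps=0.2, seed=0):
    """Toy tabular Q-learning for 1-D slice localization.  States are slice
    indices, actions move down (0) or up (1), and the reward is +1 when a
    move brings the agent closer to the target slice, -1 otherwise."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_slices)]
    for _ in range(episodes):
        s = rng.randrange(n_slices)
        for _ in range(n_slices):
            if rng.random() < eps:
                a = rng.randrange(2)                  # explore
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1    # exploit
            s2 = max(0, min(n_slices - 1, s + (1 if a == 1 else -1)))
            r = 1.0 if abs(s2 - target) < abs(s - target) else -1.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# After training, the greedy policy scrolls toward the target slice.
Q = train_slice_agent(n_slices=10, target=6)
```

Following the greedy action from any starting slice reproduces the radiologist-like scrolling toward the L3 level described in the abstract.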

Othmane Laousy, Guillaume Chassagnon, Edouard Oyallon, Nikos Paragios, Marie-Pierre Revel, Maria Vakalopoulou
MIST GAN: Modality Imputation Using Style Transfer for MRI

MRI entails a great amount of cost, time and effort to generate all the modalities recommended for efficient diagnosis and treatment planning. Recent advances in deep learning research show that generative models have achieved substantial improvement in style transfer and image synthesis. In this work, we formulate generating a missing MR modality from existing MR modalities as an imputation problem using style transfer. With a multiple-to-one mapping, we model a network that accommodates domain-specific styles in generating the target image. We analyse the style diversity both within and across MR modalities. Our model is tested on the BraTS’18 dataset and the results obtained are on par with the state of the art in terms of the visual metrics SSIM and PSNR. After evaluation by two expert radiologists, we show that our model is efficient, extendable, and suitable for clinical applications.

Jaya Chandra Raju, Kompella Subha Gayatri, Keerthi Ram, Rajeswaran Rangasami, Rajoo Ramachandran, Mohanasankar Sivaprakasam
Biased Extrapolation in Latent Space for Imbalanced Deep Learning

Addressing class imbalance to improve generalization on minority classes is critical in medical applications. Traditional approaches, including re-weighting and re-sampling, have shown potential for improving generalization but ignore the statistical characteristics of classes. We study the potential and effectiveness of data extrapolation in the latent space of deep learning networks to address data imbalance. We propose biased normal sample selection and latent space sample extrapolation methods for imbalanced deep learning. The two types of bias in the extrapolations are sample bias and extrapolation bias. Experimental evaluation is performed for ulcer classification in endoscopy images and cardiomegaly detection from chest X-rays (CXR). We show that new abnormal samples, extrapolated asymmetrically from biased normal samples of low probability, improve the separation between normal and abnormal classes.
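The extrapolation step itself is simple; a sketch assuming the synthetic abnormal sample is pushed past the abnormal point, away from a selected normal sample. The biased selection rules, which are the paper's contribution, are not modeled here, and the scale lam is illustrative.

```python
def extrapolate_abnormal(normal, abnormal, lam=0.5):
    """Synthesize a new abnormal latent sample by extrapolating beyond the
    abnormal point, away from a selected normal sample:
    z_new = abnormal + lam * (abnormal - normal)."""
    return [a + lam * (a - n) for n, a in zip(normal, abnormal)]

# The synthetic sample lies past the abnormal one on the normal->abnormal line.
z_new = extrapolate_abnormal([0.0, 2.0], [1.0, 3.0], lam=1.0)
```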

Suhyeon Jeong, Seungkyu Lee
3DMeT: 3D Medical Image Transformer for Knee Cartilage Defect Assessment

While convolutional neural networks (CNNs) dominate the area of computer-aided 3D medical image diagnosis, they are incapable of capturing global information due to the intrinsic locality of convolution. Transformers, another type of neural network empowered with a self-attention mechanism, are good at representing global relations, yet are computationally expensive and do not generalize well on small datasets. Applying Transformers to 3D medical images raises two major problems: 1) medical 3D volumes are larger than natural images, which makes the training process computationally impractical, and 2) 3D medical image datasets are usually smaller than natural image datasets, since medical images are expensive to collect. In this paper, we propose the 3D Medical image Transformer (3DMeT) to address these two issues. 3DMeT introduces 3D convolutional layers to perform block embedding instead of the original linear embedding to cut the computational cost. Additionally, we propose a teacher-student training strategy to address the data-hungry issue by adapting convolutional layers’ weights from a CNN teacher. We conduct experiments on knee images; the results demonstrate that 3DMeT (70.2) confidently outperforms 3D CNNs (65.3) and the Vision Transformer (58.7).
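Block embedding turns a volume into tokens. Since a strided 3D convolution is a learned linear map applied to flattened non-overlapping blocks, the tokenization itself can be sketched with reshapes alone (block size 2 is illustrative):

```python
import numpy as np

def block_embed_3d(volume, block=2):
    """Split a volume into non-overlapping block**3 patches and flatten each
    into a token vector.  A strided 3D convolution, as used for block
    embedding, is a learned linear map applied to exactly these tokens."""
    d, h, w = volume.shape
    b = block
    v = volume.reshape(d // b, b, h // b, b, w // b, b)
    return v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, b ** 3)

# A 4x4x4 volume yields 8 tokens of 8 voxels each.
tokens = block_embed_3d(np.arange(64).reshape(4, 4, 4))
```

Replacing per-voxel linear embedding with per-block embedding divides the token count by block**3, which is where the computational saving for large volumes comes from.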

Sheng Wang, Zixu Zhuang, Kai Xuan, Dahong Qian, Zhong Xue, Jia Xu, Ying Liu, Yiming Chai, Lichi Zhang, Qian Wang, Dinggang Shen
A Gaussian Process Model for Unsupervised Analysis of High Dimensional Shape Data

Applications of medical image analysis are often faced with the challenge of modelling high-dimensional data with relatively few samples. In many settings, normal or healthy samples are prevalent while pathological samples are rarer, highly diverse, and/or difficult to model. In such cases, a robust model of the normal population in the high-dimensional space can be useful for characterizing pathologies. In this context, there is utility in hybrid models, such as probabilistic PCA, which learn a low-dimensional model commensurate with the available data and combine it with a generic, isotropic noise model for the remaining dimensions. However, the isotropic noise model ignores the inherent correlations that are evident in so many high-dimensional data sets associated with images and shapes in medicine. This paper describes a method for estimating a Gaussian model for collections of images or shapes that exhibit underlying correlations, e.g., in the form of smoothness. The proposed method incorporates a Gaussian-process noise model within a generative formulation. For optimization, we derive a novel expectation maximization (EM) algorithm. We demonstrate the efficacy of the method on synthetic examples and on anatomical shape data.
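For context, the closed-form maximum-likelihood solution of standard probabilistic PCA (Tipping & Bishop), whose isotropic noise term the paper replaces with a Gaussian-process noise model, can be sketched as:

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum-likelihood probabilistic PCA (Tipping & Bishop).
    X: (n, d) data; q: latent dimension.  Returns the mean, the loading
    matrix W (d, q), and the isotropic noise variance sigma2 estimated as
    the average of the discarded eigenvalues."""
    mu = X.mean(axis=0)
    C = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(C)            # ascending eigenvalue order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    sigma2 = float(vals[q:].mean())
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return mu, W, sigma2

# All variance lies along the first axis, so sigma2 is (numerically) zero.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
mu, W, sigma2 = fit_ppca(X, q=1)
```

Swapping the sigma2 * I residual covariance for a structured Gaussian-process kernel removes the closed form, which is why the paper derives a dedicated EM algorithm.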

Wenzheng Tao, Riddhish Bhalodia, Ross Whitaker
Standardized Analysis of Kidney Ultrasound Images for the Prediction of Pediatric Hydronephrosis Severity

Congenital hydronephrosis, which is the dilatation of the renal collecting system, is common in children, resolving in most and treatable in the remaining 25%. Sonography is routinely used for hydronephrosis detection and longitudinal evaluation but lacks standardization in acquisition and provides no information about kidney function. These facts make the visual assessment of hydronephrosis from ultrasound subjective and variable. In this paper, we present an automatic method to standardize the analysis of the kidney regions in sonograms for the quantification of hydronephrosis severity as well as the prediction of obstruction. First, the field-of-view in images is standardized by segmenting the kidney regions using convolutional neural networks and reorienting them along their longest axes in the coronal view. Then, the core areas of the kidney containing the pelvis and calyces are identified by correlation analysis. Each standardized kidney image slice is evaluated using a deep learning-based approach to predict the obstruction severity, and the slice-based predictive scores are fused based on a weighted-voting technique to determine the final risk score. The performance of the method was evaluated on 54 hydronephrotic kidneys with known clinical outcome. Results show that our method could automatically predict the obstruction severity with an average accuracy of 0.83, a significant improvement over the common clinical approach (p-value < 0.001). Our method has the potential to predict kidney function from routine ultrasound evaluation.

Pooneh Roshanitabrizi, Jonathan Zember, Bruce Michael Sprague, Steven Hoefer, Ramon Sanchez-Jacob, James Jago, Dorothy Bulas, Hans G. Pohl, Marius George Linguraru
Automated Deep Learning-Based Detection of Osteoporotic Fractures in CT Images

Automating opportunistic screening of osteoporotic fractures in computed tomography (CT) images could reduce the underdiagnosis of vertebral fractures. In this work, we present and evaluate an end-to-end pipeline for the detection of osteoporotic compression fractures of the vertebral body in CT images. The approach works in two steps: First, a hierarchical neural network detects and identifies all vertebrae that are visible in the field of view. Second, a feed-forward convolutional neural network is applied to patches containing single vertebrae to decide whether an osteoporotic fracture is present. The maximum of the classifier’s output scores then allows classifying whether at least one fractured vertebra is present in the image. On a per-patient basis our pipeline classifies 145 CT images—annotated by an experienced musculoskeletal radiologist—with a sensitivity of 0.949 and a specificity of 0.815 regarding the presence of osteoporotic fractures. The fracture classifier even distinguishes grade 1 deformities from grade 1 osteoporotic fractures with an area under the ROC-curve of 0.742, a task potentially challenging even for human experts. Our approach demonstrates robust and accurate diagnostic performance and thus could be applied to opportunistic screening.
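The patient-level decision from per-vertebra classifier scores is a simple maximum; a sketch with an illustrative threshold:

```python
def patient_fracture_score(vertebra_scores, threshold=0.5):
    """Patient-level decision from per-vertebra fracture probabilities:
    the maximum score flags the patient if any vertebra looks fractured."""
    s = max(vertebra_scores)
    return s, s >= threshold

# One suspicious vertebra is enough to flag the patient.
score, fractured = patient_fracture_score([0.1, 0.7, 0.3])
```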

Eren Bora Yilmaz, Christian Buerger, Tobias Fricke, Md Motiur Rahman Sagar, Jaime Peña, Cristian Lorenz, Claus-Christian Glüer, Carsten Meyer
GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation

To achieve an accurate assessment of root canal therapy, a fundamental step is to perform tooth root segmentation on oral X-ray images, because the position of the tooth root boundary is significant anatomical information in root canal therapy evaluation. However, the fuzzy boundary makes tooth root segmentation very challenging. In this paper, we propose a novel end-to-end U-Net like Group Transformer Network (GT U-Net) for tooth root segmentation. The proposed network retains the essential structure of U-Net, but each of the encoders and decoders is replaced by a group Transformer, which significantly reduces the computational cost of traditional Transformer architectures by using a grouping structure and a bottleneck structure. In addition, the proposed GT U-Net is composed of a hybrid structure of convolution and Transformer, which makes it independent of pre-trained weights. For optimization, we also propose a shape-sensitive Fourier Descriptor (FD) loss function to make use of shape prior knowledge. Experimental results show that our proposed network achieves state-of-the-art performance on our collected tooth root segmentation dataset and the public retina dataset DRIVE. Code has been released at .

Yunxiang Li, Shuai Wang, Jun Wang, Guodong Zeng, Wenjun Liu, Qianni Zhang, Qun Jin, Yaqi Wang
Information Bottleneck Attribution for Visual Explanations of Diagnosis and Prognosis

Visual explanation methods play an important role in the prognosis of patients when annotated data are limited or unavailable. There have been several attempts to use gradient-based attribution methods to localize pathology from medical scans without using segmentation labels. This research direction has been impeded by a lack of robustness and reliability: these methods are highly sensitive to the network parameters. In this study, we introduce a robust visual explanation method to address this problem for medical applications. We provide an innovative general-purpose visual explanation algorithm and, as an example application, demonstrate its effectiveness for quantifying lesions in the lungs caused by COVID-19 with high accuracy and robustness, without using dense segmentation labels. This approach overcomes the drawbacks of the commonly used Grad-CAM and its extended versions. The premise behind our proposed strategy is that the information flow is minimized while ensuring that the classifier prediction stays similar. Our findings indicate that the bottleneck condition provides a more stable severity estimation than similar attribution methods. The source code will be publicly available upon publication.

Ugur Demir, Ismail Irmakci, Elif Keles, Ahmet Topcu, Ziyue Xu, Concetto Spampinato, Sachin Jambawalikar, Evrim Turkbey, Baris Turkbey, Ulas Bagci
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

Labeling vertebral discs from MRI scans is important for the proper diagnosis of spine-related diseases, including multiple sclerosis, amyotrophic lateral sclerosis, degenerative cervical myelopathy and cancer. Automatic labeling of the vertebral discs in MRI data is a difficult task because of the similarity between discs and bone area, the variability in the geometry of the spine and surrounding tissues across individuals, and the variability across scans (manufacturers, pulse sequence, image contrast, resolution and artefacts). In previous studies, vertebral disc labeling is often done after a disc detection step and mostly fails when the localization algorithm misses discs or produces false positive detections. In this work, we aim to mitigate this problem by reformulating semantic vertebral disc labeling as a pose estimation problem. To do so, we propose a stacked hourglass network with a multi-level attention mechanism to jointly learn intervertebral disc positions and their skeleton structure. The proposed deep learning model takes into account the strengths of semantic segmentation and pose estimation techniques to handle missing areas and false positive detections. To further improve the performance of the proposed method, we propose a skeleton-based search space to reduce false positive detections. The proposed method was evaluated on the spine generic public multi-center dataset and demonstrated better performance compared to previous work, on both T1w and T2w contrasts. The method is implemented in ivadomed ( ).

Reza Azad, Lucas Rouhier, Julien Cohen-Adad
TED-Net: Convolution-Free T2T Vision Transformer-Based Encoder-Decoder Dilation Network for Low-Dose CT Denoising

Low-dose computed tomography (CT) is mainstream in clinical applications. However, compared to normal-dose CT, low-dose CT (LDCT) images contain stronger noise and more artifacts, which are obstacles for practical applications. In the last few years, convolution-based end-to-end deep learning methods have been widely used for LDCT image denoising. Recently, the transformer has shown superior performance over convolution with more feature interactions, yet its application to LDCT denoising has not been fully explored. Here, we propose a convolution-free T2T vision transformer-based Encoder-decoder Dilation Network (TED-Net) to enrich the family of LDCT denoising algorithms. The model is free of convolution blocks and consists of a symmetric encoder-decoder built solely from transformer blocks. Our model (code is available at ) is evaluated on the AAPM-Mayo Clinic LDCT Grand Challenge dataset, and the results show that it outperforms state-of-the-art denoising methods.

Dayang Wang, Zhan Wu, Hengyong Yu
Self-supervised Mean Teacher for Semi-supervised Chest X-Ray Classification

The training of deep learning models generally requires a large amount of annotated data for effective convergence and generalisation. However, obtaining high-quality annotations is a laborious and expensive process due to the need of expert radiologists for the labelling task. The study of semi-supervised learning in medical image analysis is then of crucial importance given that it is much less expensive to obtain unlabelled images than to acquire images labelled by expert radiologists. Essentially, semi-supervised methods leverage large sets of unlabelled data to enable better training convergence and generalisation than using only the small set of labelled images. In this paper, we propose Self-supervised Mean Teacher for Semi-supervised (S²MTS²) learning that combines self-supervised mean-teacher pre-training with semi-supervised fine-tuning. The main innovation of S²MTS² is the self-supervised mean-teacher pre-training based on joint contrastive learning, which uses an infinite number of pairs of positive query and key features to improve the mean-teacher representation. The model is then fine-tuned using the exponential moving average teacher framework trained with semi-supervised learning. We validate S²MTS² on the multi-label classification problems from Chest X-ray14 and CheXpert, and the multi-class classification from ISIC2018, where we show that it outperforms the previous SOTA semi-supervised learning methods by a large margin. Our code will be available upon paper acceptance.

Fengbei Liu, Yu Tian, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro
VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding based Deep Learning

Recent advances in bioimaging have provided scientists with superior high spatial-temporal resolution to observe the dynamics of living cells as 3D volumetric videos. Unfortunately, 3D biomedical video analysis is lagging, impeded by resource-intensive human curation using off-the-shelf 3D analytic tools. Hence, biologists often need to discard a considerable amount of rich 3D spatial information by compromising on 2D analysis via maximum intensity projection. Recently, pixel-embedding-based cell instance segmentation and tracking provided a neat and generalizable computing paradigm for understanding cellular dynamics. In this work, we propose a novel spatial-temporal voxel-embedding (VoxelEmbed) based learning method to perform simultaneous cell instance segmentation and tracking on 3D volumetric video sequences. Our contribution is four-fold: (1) the proposed voxel embedding generalizes pixel embedding with 3D context information; (2) we present a simple multi-stream learning approach that allows effective spatial-temporal embedding; (3) we accomplish an end-to-end framework for one-stage 3D cell instance segmentation and tracking without heavy parameter tuning; (4) the proposed 3D quantification is memory efficient, requiring only a single GPU with 12 GB memory. We evaluate our VoxelEmbed method on four 3D datasets (with different cell types) from the ISBI Cell Tracking Challenge. The proposed VoxelEmbed method achieved consistently superior overall performance (OP) on two densely annotated datasets. The performance is also competitive on two sparsely annotated cohorts with only 20.6$$\%$$ and 2$$\%$$ of the data having segmentation annotations. The results demonstrate that the VoxelEmbed method is a generalizable and memory-efficient solution.

Mengyang Zhao, Quan Liu, Aadarsh Jha, Ruining Deng, Tianyuan Yao, Anita Mahadevan-Jansen, Matthew J. Tyska, Bryan A. Millis, Yuankai Huo
Using Spatio-Temporal Correlation Based Hybrid Plug-and-Play Priors (SEABUS) for Accelerated Dynamic Cardiac Cine MRI

The plug-and-play prior (P$$^{3}$$) is a denoising prior which has been successfully applied to various imaging problems. In this work, for accelerated dynamic cardiac cine magnetic resonance imaging (Dcc-MRI), we introduce Spatio-tEmporal correlAtion based hyBrid plUg-and-play priorS (SEABUS), integrating a local P$$^{3}$$ and a nonlocal P$$^{3}$$, which together help suppress aliasing artifacts and capture dynamic features. Specifically, the local P$$^{3}$$ enforces pixel-wise edge-orientation consistency by reference-frame-guided multi-scale orientation projection (MSOP) in a subset of a few adjacent frames. The nonlocal P$$^{3}$$ constrains cube-wise anatomic-structure similarity by cube matching and 4D filtering (CM4D) in all frames. Using the composite splitting algorithm (CSA), SEABUS is coupled into the fast iterative shrinkage-thresholding algorithm (FISTA), yielding a new Dcc-MRI approach named SEABUS-FCSA. Experimental results on in-vivo cardiac MR datasets demonstrate the efficiency and potential of the proposed SEABUS-FCSA approach.
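For reference, the standard FISTA iteration into which a plug-and-play prior can be coupled replaces the proximal step with a denoiser. In its usual form (data-fidelity term $f$ with Lipschitz constant $L$, regularizer $g$; the notation here is the textbook one, not necessarily the paper's):

```latex
\begin{align*}
  x_k     &= \mathrm{prox}_{\lambda g}\!\left(y_k - \tfrac{1}{L}\,\nabla f(y_k)\right)\\
  t_{k+1} &= \frac{1 + \sqrt{1 + 4t_k^2}}{2}\\
  y_{k+1} &= x_k + \frac{t_k - 1}{t_{k+1}}\,\bigl(x_k - x_{k-1}\bigr)
\end{align*}
```

In a plug-and-play scheme the $\mathrm{prox}_{\lambda g}$ operator is swapped for an off-the-shelf denoiser; in SEABUS-FCSA that role is played by the hybrid local/nonlocal priors.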

Qingyong Zhu, Dong Liang
Window-Level Is a Strong Denoising Surrogate

CT image quality is heavily reliant on radiation dose, causing a trade-off between radiation dose and image quality that affects the subsequent image-based diagnostic performance. However, high radiation can be harmful to both patients and operators. Several (deep learning-based) approaches have been attempted to denoise low dose images. However, those approaches require access to large training sets, specifically the full dose CT images for reference, which can often be difficult to obtain. Self-supervised learning is an emerging alternative for lowering the reference data requirement by facilitating unsupervised learning. Currently available self-supervised CT denoising works rely either on foreign domains or on pretext tasks that are not very task-relevant. To tackle the aforementioned challenges, we propose a novel self-supervised learning approach, namely Self-Supervised Window-Leveling for Image DeNoising (SSWL-IDN), leveraging an innovative, task-relevant, simple, yet effective surrogate: prediction of the window-leveled equivalent. SSWL-IDN leverages residual learning and a hybrid loss combining perceptual loss and MSE, all incorporated in a VAE framework. Our extensive (in- and cross-domain) experimentation demonstrates the effectiveness of SSWL-IDN in aggressive denoising of CT (abdomen and chest) images acquired at a 5% dose level only (Code available at ).
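The surrogate itself, window-leveling, is a simple intensity transform on CT Hounsfield units: clip to the window [center − width/2, center + width/2] and rescale to [0, 1]. A minimal NumPy sketch, where the abdomen window (center 40 HU, width 400 HU) and toy HU values are illustrative choices, not taken from the paper:

```python
import numpy as np

# Window-leveling of CT Hounsfield units: clip to the chosen window,
# then rescale linearly to [0, 1].
def window_level(hu: np.ndarray, center: float = 40.0, width: float = 400.0) -> np.ndarray:
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

# Toy HU values: air, soft tissue, upper window edge, metal
hu_slice = np.array([-1000.0, 40.0, 240.0, 3000.0])
out = window_level(hu_slice)
# out == [0.0, 0.5, 1.0, 1.0]
```

Predicting this windowed image from the noisy input forces the network to model exactly the intensity range radiologists inspect, which is what makes the pretext task-relevant.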

Ayaan Haque, Adam Wang, Abdullah-Al-Zubaer Imran
Cardiovascular Disease Risk Improves COVID-19 Patient Outcome Prediction

The pandemic of coronavirus disease 2019 (COVID-19) has severely impacted the world. Several studies suggest an increased risk for COVID-19 patients with underlying cardiovascular diseases (CVD). However, it is challenging to quantify such risk factors and integrate them into patient condition evaluation. This paper presents machine learning methods that assess CVD risk scores from chest computed tomography together with laboratory data, demographics, and deep-learning-extracted lung imaging features to increase the outcome prediction accuracy for COVID-19 patients. The experimental results demonstrate an overall increase in prediction performance when the CVD severity score is added to the feature set. The machine learning methods obtained their best performance when all categories of features were used for patient outcome prediction. With a best attained area under the curve of 0.888, the presented research may assist physicians in the clinical decision-making process of managing COVID-19 patients.

Diego Machado Reyes, Hanqing Chao, Fatemeh Homayounieh, Juergen Hahn, Mannudeep K. Kalra, Pingkun Yan
Self-supervision Based Dual-Transformation Learning for Stain Normalization, Classification and Segmentation

Stain color variations across images are common in the medical imaging domain. However, such variations between the training and test datasets may lead to unsatisfactory performance on the latter in any desired task. This paper proposes a novel coupled network composed of two U-Net type architectures that utilize self-supervised learning. The first subnetwork (N1) learns an identity transformation, while the second (N2) learns a transformation to perform stain normalization. We also introduce classification heads in the subnetworks, trained along with the stain normalization task. To the best of our knowledge, the proposed coupling framework, where the information from the encoders of both subnetworks is utilized by the decoders of both subnetworks and trained in a coupled fashion, is introduced in this domain for the first time. Interestingly, the coupling of N1 (for identity transformation) and N2 (for stain normalization) helps N2 learn the stain normalization task while being cognizant of the features essential to reconstruct images. Similarly, N1 learns to extract relevant features for reconstruction that are invariant to stain color variations, due to its coupling with N2. Thus, the two subnetworks help each other, leading to improved performance on the subsequent task of classification. Further, we show that the proposed architecture can also be used for segmentation, making it applicable to all three tasks: stain normalization, classification, and segmentation. Experiments are carried out on four datasets to show the efficacy of the proposed architecture.

Shiv Gehlot, Anubha Gupta
Deep Representation Learning for Image-Based Cell Profiling

High-content, microscopic image-based screening data are widely used in cell profiling to characterize cell phenotype diversity and extract explanatory biomarkers differentiating cell phenotypes induced by experimental perturbations or disease states. In recent years, high-throughput manifold embedding techniques such as t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and generative networks have been increasingly applied to interpret cell profiling data. However, the resulting representations may not exploit the full image information, as these techniques are typically applied to quantitative image features defined by human experts. Here we propose a novel framework to analyze cell profiling data, based on two-stage deep representation learning using variational autoencoders (VAEs). We present quantitative and qualitative evaluations of the learned cell representations on two datasets. The results show that our framework can yield better representations than the currently popular methods. Also, our framework provides researchers with a more flexible tool to analyze underlying cell phenotypes and interpret the automatically defined cell features effectively.

Wenzhao Wei, Sacha Haidinger, John Lock, Erik Meijering
Detecting Extremely Small Lesions in Mouse Brain MRI with Point Annotations via Multi-task Learning

Detection of small lesions in magnetic resonance imaging (MRI) images is one of the most challenging tasks. Compared with detection in natural images, small lesion detection in MRI images faces two major problems. First, small lesions occupy only a small fraction of voxels within an image, yielding insufficient features and information for them to be distinguished from the surrounding tissues. Second, accurately outlining these small lesions manually is time-consuming and inefficient even for medical experts in pathology. Hence, existing methods cannot accurately detect lesions with such a limited amount of information. To solve these problems, we propose a novel multi-task convolutional neural network (CNN), which simultaneously performs regression of the lesion number and detection of lesion locations. Both lesion number and locations can be obtained through point annotations, which are much easier and more efficient to produce than a full manual segmentation of lesions. We use an encoder-decoder structure that outputs a distance map from each pixel to the nearest lesion center. Additionally, a regression branch is added after the encoder to learn to count lesions, thus providing extra regularization. Note that the two tasks share the same encoder weights. We demonstrate that our model enables the counting and locating of extremely small lesions within 3–5 voxels (300 × 300 voxels per image) with a recall of 72.66% on a large mouse brain MRI dataset (more than 1000 images), and outperforms other methods.
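The regression target described above, a map of each pixel's distance to the nearest lesion center, can be built directly from the point annotations. A small brute-force NumPy sketch (the image size and lesion centers are illustrative, and a real pipeline would use a fast distance transform instead):

```python
import numpy as np

# Build the per-pixel Euclidean distance map to the nearest lesion center
# from a list of point annotations.
def distance_map(shape, centers):
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([yy, xx], axis=-1).astype(float)       # (H, W, 2)
    pts = np.asarray(centers, dtype=float)                 # (N, 2)
    # Distance from every pixel to every center, then the minimum per pixel
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None], axis=-1)
    return d.min(axis=-1)                                  # (H, W)

dmap = distance_map((5, 5), centers=[(1, 1), (4, 4)])
# dmap is zero at the annotated centers and grows with distance from them
```

A network trained against such a map localizes lesions as the local minima of its prediction, without ever requiring full segmentation masks.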

Xiaoyang Han, Yuting Zhai, Ziqi Yu, Tingying Peng, Xiao-Yong Zhang
Morphology-Guided Prostate MRI Segmentation with Multi-slice Association

Prostate segmentation from magnetic resonance (MR) images plays an important role in prostate cancer diagnosis and treatment. Previous works typically overlooked the large variations of prostate shapes, especially in the boundary area. Furthermore, the small glandular areas at the ending slices also make the task very challenging. To overcome these problems, this paper presents a two-stage framework that explicitly utilizes prostate morphological representations (e.g., points, boundaries) to accurately localize the prostate region with a coarse volumetric segmentation. Based on the 3D coarse outputs of the first stage, and given the large slice thickness of prostate MR images, a 2D segmentation network with multi-slice association is further introduced to produce more reliable and accurate segmentation. Besides, several novel loss functions are designed to enhance the consistency of prostate boundaries. Extensive experiments on a large prostate MRI dataset show the superior performance of our proposed method compared to several state-of-the-art methods.

Jianping Li, Zhiming Cui, Shuai Wang, Jie Wei, Jun Feng, Shu Liao, Dinggang Shen
Unsupervised Cross-modality Cardiac Image Segmentation via Disentangled Representation Learning and Consistency Regularization

Deep neural network based approaches for medical image segmentation rely heavily on the availability of large amounts of annotated data, which are sometimes difficult to obtain due to time, logistic effort, and the required expert knowledge. Unpaired image translation enables a cross-modality segmentation network to be trained in an annotation-poor target domain by leveraging an annotation-rich source domain, but most existing methods separate the image translation stage from the image segmentation stage and are not trained end-to-end. In this paper, we propose an end-to-end unsupervised cross-modality cardiac image segmentation method, taking advantage of diverse image translation via disentangled representation learning and consistency regularization in one network. Different from learning a one-to-one mapping, our method characterizes the complex relationship between domains as a many-to-many mapping. A novel diverse inter-domain semantic consistency loss is then proposed to regularize the cross-modality segmentation process. We additionally introduce an intra-domain semantic consistency loss to encourage segmentation consistency between the original input and the image after cross-cycle reconstruction. We conduct comprehensive experiments on two publicly available datasets to evaluate the effectiveness of the proposed method. The experimental results demonstrate the efficacy of the presented approach.

Runze Wang, Guoyan Zheng
Landmark-Guided Rigid Registration for Temporomandibular Joint MRI-CBCT Images with Large Field-of-View Difference

Fused MRI-CBCT images provide desirable complementary information on the articular disc and condyle surface for optimum diagnosis, and have been shown to be accurate and reliable in Temporomandibular Disorders (TMD) assessment. However, the field-of-view difference between multi-modality images poses challenges to conventional registration algorithms. In this paper, we propose a landmark-guided learning method for Temporomandibular Joint (TMJ) MRI-CBCT image registration. First, an end-to-end landmark localization network detects corresponding landmark pairs in the images of different modalities to generate the landmark guidance information. Then, taking image patches centered at these landmarks as input, an unsupervised learning network regresses the rigid transformation matrix using mutual information as the measure of similarity between image patches. Finally, by combining the landmark coordinates with the rigid transformation matrix, whole-image registration is realized. Experimental results demonstrate that our approach achieves better overall performance on registration of images from different patients and modalities, with a 100x speed-up in execution time.
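As background to the landmark-guided rigid step: given corresponding landmark pairs, a rigid transform can also be recovered in closed form with the classical Kabsch algorithm (SVD of the cross-covariance). The NumPy sketch below illustrates that classical baseline with toy landmarks; it is not the paper's learned regression network.

```python
import numpy as np

# Kabsch algorithm: recover rotation R and translation t mapping src -> dst
# from N corresponding landmark pairs, with a reflection guard.
def rigid_from_landmarks(src, dst):
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Toy example: 90-degree rotation about z plus a translation
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
src = np.random.default_rng(0).normal(size=(6, 3))
dst = src @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_from_landmarks(src, dst)
# R recovers R_true and t recovers [1, 2, 3] exactly for noise-free pairs
```

The learned approach in the paper can be seen as replacing this closed-form step with a network that is robust to landmark localization noise and modality differences.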

Jupeng Li, Yinghui Wang, Shuai Wang, Kai Zhang, Gang Li
Spine-Rib Segmentation and Labeling via Hierarchical Matching and Rib-Guided Registration

Accurate segmentation and labeling of the spine and ribs are of great importance for clinical spine and rib diagnosis and treatment. In clinical applications, spine-rib segmentation and labeling are often challenging, as the shape and appearance of vertebrae are complicated. Previous segmentation and labeling methods usually face considerable difficulties when coping with spine CT images with abnormally curved spines or implanted metal. In this paper, we propose a multi-stage spine-rib segmentation and labeling method that can be applied to various spine-rib CT images. Our proposed method consists of three steps. First, a 3D U-Net is used to obtain an initial segmentation mask of the spine and ribs. Then, the subject information, including gender, age, and the shape of the spine and ribs, is used to hierarchically select templates with similar physiological structures from a pre-constructed template library. Finally, the segmentation mask and labels from the templates are transferred to the subject via rib-guided registration to correct the initial results. We evaluated the proposed method on a clinical dataset and obtained significantly better and more robust performance than the state-of-the-art method.

Caiwen Jiang, Zhiming Cui, Dongming Wei, Yuhang Sun, Jiameng Liu, Jie Wei, Qun Chen, Dijia Wu, Dinggang Shen
Multi-scale Segmentation Network for Rib Fracture Classification from CT Images

As rib fracture is the most common thoracic trauma, its classification is essential for clinical evaluation and treatment planning. However, manual identification and classification are challenging, due to the tiny size and blurriness of rib fractures in CT images. For automatic classification of rib fractures, conventional methods using hand-crafted features are low in robustness and generalizability. Although previous deep learning-based methods show improved performance, they empirically normalized all fractures to one size, which alters fracture patterns. Moreover, these methods mainly employed macro-scale features with little attention to details, which degrades classification accuracy, as the rib fracture type is essentially determined by tiny fracture details. To address all these issues, we propose a novel framework to classify rib fractures, where we first introduce a multi-scale network that integrates multiple sizes of fractures to minimize size alteration, and further formulate the fracture classification problem as a segmentation problem to enforce network attention on tiny fracture details, so as to increase classification accuracy. Our method has been evaluated on a large dataset (with 53,045 cases) with four types of fractures, including acute displaced fracture, acute non-displaced fracture, acute buckle fracture, and chronic fracture. The results are compared with state-of-the-art methods and suggest that our proposed method achieves the best performance. The capability of our multi-scale segmentation strategy is also verified by the experimental results, especially in handling the huge size variation of rib fractures during fracture classification.

Jiameng Liu, Zhiming Cui, Yuhang Sun, Caiwen Jiang, Zirong Chen, Hao Yang, Yuyao Zhang, Dijia Wu, Dinggang Shen
Knowledge-Guided Multiview Deep Curriculum Learning for Elbow Fracture Classification

Elbow fracture diagnosis often requires patients to take both frontal and lateral views of elbow X-ray radiographs. In this paper, we propose a multiview deep learning method for an elbow fracture subtype classification task. Our strategy leverages transfer learning by first training two single-view models, one for the frontal view and the other for the lateral view, and then transferring the weights to the corresponding layers in the proposed multiview network architecture. Meanwhile, quantitative medical knowledge is integrated into the training process through a curriculum learning framework, which enables the model to first learn from “easier” samples and then transition to “harder” samples to reach better performance. In addition, our multiview network can work both in a dual-view setting and with a single view as input. We evaluate our method through extensive experiments on an elbow fracture classification task with a dataset of 1,964 images. Results show that our method outperforms two related methods on bone fracture classification in multiple settings, and our technique is able to boost the performance of the compared methods. The code is available at .

Jun Luo, Gene Kitamura, Dooman Arefan, Emine Doganay, Ashok Panigrahy, Shandong Wu
Contrastive Learning of Single-Cell Phenotypic Representations for Treatment Classification

Learning robust representations to discriminate cell phenotypes based on microscopy images is important for drug discovery. Drug development efforts typically analyse thousands of cell images to screen for potential treatments. Early works focus on creating hand-engineered features from these images or learn such features with deep neural networks in a fully or weakly supervised framework. Both require prior knowledge or labelled datasets. Therefore, subsequent works propose unsupervised approaches based on generative models to learn these representations. Recently, representations learned with self-supervised contrastive loss-based methods have yielded state-of-the-art results on various imaging tasks compared to earlier unsupervised approaches. In this work, we leverage a contrastive learning framework to learn appropriate representations from single-cell fluorescent microscopy images for the task of Mechanism-of-Action classification. The proposed work is evaluated on the annotated BBBC021 dataset, and we obtain state-of-the-art results in the NSC, NSCB, and drop metrics for an unsupervised approach. We observe an improvement of 10% in NSCB accuracy and 11% in NSC-NSCB drop over the previously best unsupervised method. Moreover, the performance of our unsupervised approach ties with that of the best supervised approach. Additionally, we observe that our framework performs well even without post-processing, unlike earlier methods. With this, we conclude that one can learn robust cell representations with contrastive learning. We make the code available on GitHub ( ).
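The generic InfoNCE-style loss that underlies such contrastive frameworks compares a query embedding against one positive and many negatives via temperature-scaled cosine similarities. A NumPy sketch with illustrative 2-D embeddings and temperature (the paper's actual loss and hyperparameters may differ):

```python
import numpy as np

# InfoNCE-style contrastive loss: cross-entropy over cosine-similarity
# logits, where index 0 is the positive pair.
def info_nce(query, positive, negatives, temperature=0.1):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query, positive)] +
                      [cos(query, n) for n in negatives]) / temperature
    logits -= logits.max()                 # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

q = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])                 # similar to the query
negs = [np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
loss = info_nce(q, pos, negs)
# loss is near zero because the positive dominates the negatives
```

Minimizing this loss pulls augmented views of the same cell together and pushes different cells apart, which is what yields the discriminative phenotype representations.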

Alexis Perakis, Ali Gorji, Samriddhi Jain, Krishna Chaitanya, Simone Rizza, Ender Konukoglu
CorLab-Net: Anatomical Dependency-Aware Point-Cloud Learning for Automatic Labeling of Coronary Arteries

Automatic coronary artery labeling is an essential yet challenging step in coronary artery disease diagnosis for clinicians. Previous methods typically overlooked the rich relationships with the heart chambers as well as the morphological features of the coronary arteries. In this paper, we propose a novel point-cloud learning method (called CorLab-Net), which comprehensively captures both inter-organ and intra-artery spatial dependencies as explicit guidance to assist the labeling of these challenging coronary vessels. Specifically, given a 3D point cloud extracted from the segmented coronary artery, our CorLab-Net improves artery labeling in three aspects: First, it encodes the inter-organ anatomical dependency between vessels and heart chambers (in terms of a spatial distance field) to effectively locate the blood vessels. Second, it extracts the intra-artery anatomical dependency between vessel points and key joint points (in terms of a morphological distance field) to precisely identify different vessel branches at the junctions. Third, it enhances the intra-artery local dependency between neighboring points (by using graph convolutional modules) to correct labeling outliers and improve consistency, especially at the vascular endings. We evaluated our method on a real clinical dataset. Extensive experiments show that CorLab-Net significantly outperforms the state-of-the-art methods in labeling coronary arteries with large appearance variance.

Xiao Zhang, Zhiming Cui, Jun Feng, Yanli Song, Dijia Wu, Dinggang Shen
A Hybrid Deep Registration of MR Scans to Interventional Ultrasound for Neurosurgical Guidance

Despite the recent advances in image-guided neurosurgery, reliable and accurate estimation of the brain shift still remains one of the key challenges. In this paper, we propose an automated multimodal deformable registration method using hybrid learning-based and classical approaches to improve neurosurgical procedures. Initially, the moving and fixed images are aligned using a classical affine transformation (MINC toolkit), and then the result is provided to a convolutional neural network, which predicts the deformation field. Subsequently, the moving image is warped using the resultant deformation field. Our model was evaluated on two publicly available datasets: the retrospective evaluation of cerebral tumors (RESECT) and brain images of tumors for evaluation (BITE). The mean target registration errors have been reduced from 5.35 ± 4.29 to 0.99 ± 0.22 mm on RESECT and from 4.18 ± 1.91 to 1.68 ± 0.65 mm on BITE. Experimental results show that our method improves upon the state-of-the-art in terms of both accuracy and runtime speed (170 ms on average). Hence, the proposed method provides a fast runtime for 3D MRI to intra-operative US pairs in a GPU-based implementation, which shows promise for its applicability in assisting neurosurgical procedures by compensating for brain shift.

Ramy A. Zeineldin, Mohamed E. Karar, Franziska Mathis-Ullrich, Oliver Burgert
Segmentation of Peripancreatic Arteries in Multispectral Computed Tomography Imaging

Pancreatic ductal adenocarcinoma is an aggressive form of cancer with a poor prognosis, where operability, and hence the chance of survival, is strongly affected by tumor infiltration of the arteries. In an effort to enable an automated analysis of the relationship between the local arteries and the tumor, we propose a method for segmenting the peripancreatic arteries in multispectral CT images in the arterial phase. A clinical dataset was collected, and we designed a fast semi-manual annotation procedure, which requires around 20 min of annotation time per case. Next, we trained a U-Net based model to perform binary segmentation of the peripancreatic arteries, obtaining a near perfect segmentation with a Dice score of $$95.05\%$$ in our best performing model. Furthermore, we designed a clinical evaluation procedure for our models, performed by two radiologists, yielding a complete segmentation of $$85.31\%$$ of the clinically relevant arteries, thereby confirming the clinical relevance of our method.
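The Dice score used to report the segmentation quality has a one-line definition: twice the overlap divided by the total foreground of prediction and ground truth. A minimal NumPy sketch with illustrative toy masks:

```python
import numpy as np

# Dice score for binary masks: 2|A ∩ B| / (|A| + |B|),
# with the convention that two empty masks score 1.
def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
score = dice(pred, gt)
# score == 2 * 2 / (3 + 3) ≈ 0.667
```

The same formula applies unchanged to 3D volumetric masks, as in the arterial-phase CT volumes here.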

Alina Dima, Johannes C. Paetzold, Friederike Jungmann, Tristan Lemke, Philipp Raffler, Georgios Kaissis, Daniel Rueckert, Rickmer Braren
SkullEngine: A Multi-stage CNN Framework for Collaborative CBCT Image Segmentation and Landmark Detection

Accurate bone segmentation and landmark detection are two essential preparation tasks in computer-aided surgical planning for patients with craniomaxillofacial (CMF) deformities. Surgeons typically have to complete the two tasks manually, spending $$\sim$$12 h for each set of CBCT or $$\sim$$5 h for CT. To tackle these problems, we propose a multi-stage coarse-to-fine CNN-based framework, called SkullEngine, for high-resolution segmentation and large-scale landmark detection through a collaborative, integrated, and scalable JSD model and three segmentation and landmark detection refinement models. We evaluated our framework on a clinical dataset consisting of 170 CBCT/CT images for the task of segmenting 2 bones (midface and mandible) and detecting 175 clinically common landmarks on bones, teeth, and soft tissues. Experimental results show that SkullEngine significantly improves segmentation quality, especially in regions where the bone is thin. In addition, SkullEngine also efficiently and accurately detects all of the 175 landmarks. Both tasks were completed simultaneously within 3 min, for either CBCT or CT, with high segmentation quality. Currently, SkullEngine has been integrated into a clinical workflow to further evaluate its clinical efficiency.

Qin Liu, Han Deng, Chunfeng Lian, Xiaoyang Chen, Deqiang Xiao, Lei Ma, Xu Chen, Tianshu Kuang, Jaime Gateno, Pew-Thian Yap, James J. Xia
Skull Segmentation from CBCT Images via Voxel-Based Rendering

Skull segmentation from three-dimensional (3D) cone-beam computed tomography (CBCT) images is critical for the diagnosis and treatment planning of patients with craniomaxillofacial (CMF) deformities. Convolutional neural network (CNN)-based methods currently dominate volumetric image segmentation, but these methods suffer from limited GPU memory and large image sizes (e.g., 512 $$\times$$ 512 $$\times$$ 448). Typical ad-hoc strategies, such as down-sampling or patch cropping, degrade segmentation accuracy due to insufficient capturing of local fine details or global contextual information. Other methods such as Global-Local Networks (GLNet) focus on improving the neural networks, aiming to combine local details and global contextual information in a GPU memory-efficient manner. However, all these methods operate on regular grids, which is computationally inefficient for volumetric image segmentation. In this work, we propose a novel VoxelRend-based network (VR-U-Net) by combining a memory-efficient variant of 3D U-Net with a voxel-based rendering (VoxelRend) module that refines local details via voxel-based predictions on non-regular grids. Built on relatively coarse feature maps, the VoxelRend module achieves a significant improvement in segmentation accuracy with a fraction of the GPU memory consumption. We evaluate our proposed VR-U-Net on the skull segmentation task using a high-resolution CBCT dataset collected from local hospitals. Experimental results show that the proposed VR-U-Net yields high-quality segmentation results in a memory-efficient manner, highlighting the practical value of our method.

Qin Liu, Chunfeng Lian, Deqiang Xiao, Lei Ma, Han Deng, Xu Chen, Dinggang Shen, Pew-Thian Yap, James J. Xia
Alzheimer’s Disease Diagnosis via Deep Factorization Machine Models

The current state-of-the-art deep neural networks (DNNs) for Alzheimer’s Disease diagnosis use different biomarker combinations to classify patients, but do not allow extracting knowledge about the interactions of biomarkers. However, to improve our understanding of the disease, it is paramount to extract such knowledge from the learned model. In this paper, we propose a Deep Factorization Machine model that combines the ability of DNNs to learn complex relationships and the ease of interpretability of a linear model. The proposed model has three parts: (i) an embedding layer to deal with sparse categorical data, (ii) a Factorization Machine to efficiently learn pairwise interactions, and (iii) a DNN to implicitly model higher-order interactions. In our experiments on data from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrate that our proposed model classifies cognitively normal, mildly cognitively impaired, and demented patients more accurately than competing models. In addition, we show that valuable knowledge about the interactions among biomarkers can be obtained.
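The "efficiently learn pairwise interactions" part of a Factorization Machine rests on a classical algebraic trick: the sum over all feature pairs of latent-vector inner products collapses to per-factor sums, reducing the cost from quadratic to linear in the number of features. A NumPy sketch, where the feature values and embedding size are illustrative:

```python
import numpy as np

# FM pairwise-interaction term, computed with the O(k·n) identity:
#   sum_{i<j} <v_i, v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{if} x_i)^2 - sum_i v_{if}^2 x_i^2 ]
def fm_pairwise(x, V):
    s = V.T @ x                      # (k,) per-factor weighted sums
    s_sq = (V ** 2).T @ (x ** 2)     # (k,) per-factor sums of squares
    return 0.5 * float(np.sum(s ** 2 - s_sq))

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # 4 toy feature values
V = rng.normal(size=(4, 3))          # one 3-d latent vector per feature

fast = fm_pairwise(x, V)
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(4) for j in range(i + 1, 4))
# fast and naive agree
```

Because each interaction weight is the inner product of two learned latent vectors, the fitted model exposes which biomarker pairs interact, which is exactly the interpretability the abstract highlights.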

Raphael Ronge, Kwangsik Nho, Christian Wachinger, Sebastian Pölsterl
3D Temporomandibular Joint CBCT Image Segmentation via Multi-directional Resampling Ensemble Learning Network

Accurate segmentation of the temporomandibular joint (TMJ) from dental cone beam CT (CBCT) images is the basis for early diagnosis of TMJ-related diseases such as temporomandibular disorders (TMD). Fully convolutional networks (FCN) have achieved state-of-the-art performance in the medical image segmentation field. Both sufficient contextual information and rich spatial semantic information are required to obtain accurate segmentation; however, due to limited GPU memory, high-resolution 3D volumes cannot be directly input to these models. In this paper, we propose a Multi-directional Resampling Ensemble Learning Network for 3D TMJ-CBCT image segmentation. This model extracts four semantic features from multi-directional resampled volumes, and then integrates the features via an ensemble learning network to achieve accurate segmentation. We perform extensive evaluations of the proposed method on a clinical dataset of images acquired from 89 patients. Our method achieves a mean DSC of 0.9814 ± 0.0054, a mean Hausdorff Distance of 1.5711 ± 1.0252 mm, and a mean Average Surface Distance of 0.0555 ± 0.0198 mm.

Kai Zhang, Jupeng Li, Ruohan Ma, Gang Li
Vox2Surf: Implicit Surface Reconstruction from Volumetric Data

Surface reconstruction from volumetric T1-weighted and T2-weighted images is a time-consuming multi-step process that often involves careful parameter fine-tuning, hindering a more widespread utilization of surface-based analysis, particularly in large-scale studies. In this work, we propose a fast surface reconstruction method that is based on directly learning a continuous-valued signed distance function (SDF) as the implicit surface representation. This continuous representation implicitly encodes the boundary of the surface as the zero isosurface. Given the predicted SDF, the target 3D surface is reconstructed by applying the marching cubes algorithm. Our implicit reconstruction method concurrently predicts the surfaces of the brain parenchyma, the white matter and pial surfaces, the subcortical structures, and the ventricles. Evaluation based on data from the Human Connectome Project indicates that surface reconstruction of a total of 22 cortical and subcortical structures can be completed in less than 20 min.
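The SDF convention underlying this representation is simple: negative inside the shape, positive outside, and exactly zero on the surface that marching cubes then extracts. A minimal NumPy sketch using an analytic sphere SDF as a stand-in for the learned network (the paper predicts the SDF with a model; the sphere and query points here are purely illustrative):

```python
import numpy as np

# Analytic signed distance function of a sphere: distance to the center
# minus the radius, so the zero level set is the sphere's surface.
def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    return np.linalg.norm(np.asarray(points) - np.asarray(center), axis=-1) - radius

# Query one point inside, one on the surface, one outside
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
sdf = sphere_sdf(pts)
# sdf == [-1.0, 0.0, 1.0]: inside, on the surface, outside
```

In the learned setting, the network plays the role of `sphere_sdf`, evaluated on a dense grid whose zero isosurface marching cubes turns into a triangle mesh.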

Yoonmi Hong, Sahar Ahmad, Ye Wu, Siyuan Liu, Pew-Thian Yap
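The zero-isosurface extraction step described above can be sketched with a toy signed distance function; the grid size, center, and radius below are illustrative choices, not the paper's settings:

```python
import numpy as np
from skimage.measure import marching_cubes

# Toy signed distance function: a sphere of radius 10 centered in a
# 32^3 grid (negative inside the surface, positive outside).
grid = np.arange(32)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
center, radius = 15.5, 10.0
sdf = np.sqrt((x - center)**2 + (y - center)**2 + (z - center)**2) - radius

# The zero isosurface of the SDF is the target surface.
verts, faces, normals, values = marching_cubes(sdf, level=0.0)
dist = np.linalg.norm(verts - center, axis=1)
print(dist.min(), dist.max())  # all vertices lie close to radius 10
```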
Clinically Correct Report Generation from Chest X-Rays Using Templates

We address the task of automatically generating a medical report from chest X-rays. Many authors have proposed deep learning models to solve this task, but they focus mainly on improving NLP metrics, such as BLEU and CIDEr, which are not suitable for measuring clinical correctness in clinical reports. In this work, we propose CNN-TRG, a Template-based Report Generation model that detects a set of abnormalities and verbalizes them via fixed sentences, which is much simpler than other state-of-the-art NLG methods and achieves better results in medical correctness metrics. We benchmark our model on the IU X-ray and MIMIC-CXR datasets against naive baselines as well as deep learning-based models, employing the CheXpert labeler and MIRQI as clinical correctness evaluations, and NLP metrics as secondary evaluation. We also provide further evidence that traditional NLP metrics are not suitable for this task by demonstrating their lack of robustness in multiple cases. We show that slightly altering a template-based model can increase NLP metrics considerably while maintaining high clinical performance. Our work contributes a simple but effective approach for chest X-ray report generation, and supports a model evaluation focused primarily on clinical correctness metrics and secondarily on NLP metrics.

Pablo Pino, Denis Parra, Cecilia Besa, Claudio Lagos
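A template-based generation step of this kind can be sketched as a mapping from detected abnormalities to fixed sentences; the abnormality names and template sentences below are illustrative placeholders, not the CNN-TRG vocabulary:

```python
# Each abnormality maps to a (positive, negative) pair of fixed sentences.
TEMPLATES = {
    "cardiomegaly": ("The heart is enlarged.", "Heart size is normal."),
    "pleural_effusion": ("Pleural effusion is present.", "No pleural effusion."),
    "pneumothorax": ("Pneumothorax is seen.", "No pneumothorax."),
}

def generate_report(detections):
    """Map abnormality detections (name -> bool) to fixed sentences."""
    return " ".join(
        TEMPLATES[name][0] if present else TEMPLATES[name][1]
        for name, present in detections.items()
    )

print(generate_report({"cardiomegaly": True,
                       "pleural_effusion": False,
                       "pneumothorax": False}))
# -> The heart is enlarged. No pleural effusion. No pneumothorax.
```

Because the output vocabulary is fixed, clinical correctness reduces to the accuracy of the upstream abnormality detector.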
Extracting Sequential Features from Dynamic Connectivity Network with rs-fMRI Data for AD Classification

Dynamic functional connectivity (dFC) networks based on resting-state functional magnetic resonance imaging (rs-fMRI) can help us better understand brain function, and have been applied to the identification of brain diseases such as Alzheimer’s disease (AD) and its early stage (i.e., mild cognitive impairment, MCI). Deep learning methods (e.g., convolutional neural networks, CNNs) have recently been applied to dynamic FC network analysis and achieve good performance compared to traditional machine learning methods. However, existing studies usually ignore the sequential information of temporal features extracted from dynamic FC networks. To this end, in this paper we propose a recurrent neural network-based learning framework to extract sequential features from dynamic FC networks with rs-fMRI data for brain disease classification. Experimental results on 174 subjects with baseline rs-fMRI data from ADNI demonstrate the effectiveness of our proposed method in binary and multi-category classification tasks.

Kai Lin, Biao Jie, Peng Dong, Xintao Ding, Weixin Bian, Mingxia Liu
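The dynamic FC networks that feed such a recurrent model are commonly built with sliding-window Pearson correlation; a minimal sketch (window and stride values below are illustrative, not the paper's settings):

```python
import numpy as np

def dynamic_fc(timeseries, window, stride):
    """Sliding-window dynamic functional connectivity.

    timeseries: (T, R) array of T time points for R regions of interest.
    Returns (num_windows, R, R): one Pearson correlation matrix per
    window, which can then be fed to an RNN as a feature sequence.
    """
    T, _ = timeseries.shape
    mats = []
    for start in range(0, T - window + 1, stride):
        seg = timeseries[start:start + window]
        mats.append(np.corrcoef(seg, rowvar=False))
    return np.stack(mats)

rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 10))   # 120 volumes, 10 ROIs
fc_seq = dynamic_fc(ts, window=30, stride=10)
print(fc_seq.shape)  # -> (10, 10, 10)
```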
Integration of Handcrafted and Embedded Features from Functional Connectivity Network with rs-fMRI for Brain Disease Classification

Functional connectivity networks (FCNs) based on resting-state functional magnetic resonance imaging (rs-fMRI) can help enhance our knowledge and understanding of brain function, and have been applied to the diagnosis of brain diseases, such as Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). Traditional methods usually extract meaningful measures (e.g., local clustering coefficients) from FCNs as (handcrafted) features for training the model. Recently, deep neural networks (DNNs) have been used to learn (embedded) features from FCNs for classification. However, few works have explored integrating both kinds of features (i.e., handcrafted features from traditional methods and embedded features from DNN methods), although these features may convey complementary information for further improving classification performance. Accordingly, in this paper, we propose a novel learning framework to integrate handcrafted features from traditional methods and embedded features from DNN methods for brain disease classification with rs-fMRI data. Experimental results on 174 subjects with baseline rs-fMRI data from ADNI demonstrate the superiority of the proposed method against several existing methods.

Peng Dong, Biao Jie, Lin Kai, Xintao Ding, Weixin Bian, Mingxia Liu
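The integration of the two feature types can be sketched as concatenating a graph-theoretic measure (here, local clustering coefficients on a thresholded FCN) with a learned embedding; the random vector below is only a stand-in for DNN-derived features, and the threshold value is illustrative:

```python
import numpy as np

def local_clustering(adj):
    """Local clustering coefficient of each node in a binary graph."""
    n = adj.shape[0]
    coeffs = np.zeros(n)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        k = len(nbrs)
        if k < 2:
            continue
        links = adj[np.ix_(nbrs, nbrs)].sum() / 2  # edges among neighbors
        coeffs[i] = 2 * links / (k * (k - 1))
    return coeffs

# Threshold an FCN into a binary graph, compute handcrafted features,
# and concatenate them with an embedding vector.
rng = np.random.default_rng(1)
fc = np.abs(np.corrcoef(rng.standard_normal((50, 8)), rowvar=False))
adj = (fc > 0.3).astype(int); np.fill_diagonal(adj, 0)
handcrafted = local_clustering(adj)   # (8,) graph measures
embedded = rng.standard_normal(16)    # stand-in for DNN features
features = np.concatenate([handcrafted, embedded])
print(features.shape)  # -> (24,)
```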
Detection of Lymph Nodes in T2 MRI Using Neural Network Ensembles

Reliable localization of abnormal lymph nodes in T2 Magnetic Resonance Imaging (MRI) scans is needed for the staging and treatment of lymphoproliferative diseases. Radiologists need to accurately characterize the size and shape of lymph nodes and may require an additional contrast sequence, such as diffusion weighted imaging (DWI), for staging confirmation. The varied appearance of lymph nodes in T2 MRI makes staging for metastasis challenging. Moreover, over the course of a busy clinical day, radiologists often miss smaller lymph nodes that could be malignant. To address these imaging and workflow issues, in this pilot work we aim to localize potentially suspicious lymph nodes for staging. We use state-of-the-art detection neural networks to localize lymph nodes in T2 MRI scans acquired through a variety of scanners and exam protocols, and employ bounding box fusion techniques to reduce false positives (FP) and boost detection accuracy. We construct an ensemble of the best detection models to identify potential lymph node candidates for staging, obtaining 71.75% precision and 91.96% sensitivity at 4 FP per image. To the best of our knowledge, our results improve upon the current state-of-the-art techniques for lymph node detection in T2 MRI scans.

Tejas Sudharshan Mathai, Sungwon Lee, Daniel C. Elton, Thomas C. Shen, Yifan Peng, Zhiyong Lu, Ronald M. Summers
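A bounding-box fusion step of the general kind mentioned above can be sketched as a greedy score-weighted merge of overlapping detections; this is a simplified variant for illustration, not necessarily the authors' exact fusion method:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(boxes, scores, thr=0.5):
    """Greedily merge boxes (e.g., from different ensemble members) that
    overlap above `thr` into one score-weighted average box."""
    fused = []
    for i in np.argsort(scores)[::-1]:       # highest score first
        for f in fused:
            if iou(boxes[i], f["box"]) > thr:
                w, s = f["weight"], scores[i]
                f["box"] = (f["box"] * w + boxes[i] * s) / (w + s)
                f["weight"] = w + s
                break
        else:
            fused.append({"box": boxes[i].astype(float),
                          "weight": float(scores[i])})
    return [f["box"] for f in fused]

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(len(fuse_boxes(boxes, scores)))  # -> 2 (first two boxes merged)
```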
Seeking an Optimal Approach for Computer-Aided Pulmonary Embolism Detection

Pulmonary embolism (PE) represents a thrombus (“blood clot”), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and, in some patients, death. This disorder is commonly diagnosed using CT pulmonary angiography (CTPA). Deep learning holds great promise for the computer-aided CTPA diagnosis (CAD) of PE. However, the deep learning literature contains numerous competing methods for any given task, causing great confusion regarding the development of a CAD PE system. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis using CTPA at both the image and exam levels. At the image level, we compare convolutional neural networks (CNNs) with vision transformers, and contrast self-supervised learning (SSL) with supervised learning, followed by an evaluation of transfer learning compared with training from scratch. At the exam level, we focus on comparing conventional classification (CC) with multiple instance learning (MIL). Our extensive experiments consistently show that: (1) transfer learning boosts performance despite differences between natural images and CT scans; (2) transfer learning with SSL surpasses its supervised counterparts; (3) CNNs outperform vision transformers, which otherwise show satisfactory performance; and (4) CC is, surprisingly, superior to MIL. Compared with the state of the art, our optimal approach provides an AUC gain of 0.2% and 1.05% at the image and exam levels, respectively.

Nahid Ul Islam, Shiv Gehlot, Zongwei Zhou, Michael B. Gotway, Jianming Liang
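The contrast between exam-level aggregation strategies can be illustrated with a toy sketch: MIL treats an exam as a bag of image instances and pools their scores, here with a max; the mean-based CC stand-in and the example scores are illustrative only, not the paper's exact setup:

```python
import numpy as np

def exam_score_mil(image_scores):
    """MIL-style aggregation: an exam (bag) is positive if any of its
    images (instances) is positive, so pool with a max."""
    return float(np.max(image_scores))

def exam_score_cc(image_scores):
    """Conventional-classification stand-in: here sketched as the mean
    of per-image scores (the paper's CC setup may differ)."""
    return float(np.mean(image_scores))

slices = np.array([0.05, 0.10, 0.92, 0.07])  # per-image PE probabilities
print(exam_score_mil(slices), round(exam_score_cc(slices), 3))  # -> 0.92 0.285
```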
Correction to: Machine Learning in Medical Imaging

In an earlier version of papers 68 and 69, the CERNET Innovation Project (NGII20190621) had been omitted from the Acknowledgment section. This has been corrected.

Chunfeng Lian, Xiaohuan Cao, Islem Rekik, Xuanang Xu, Pingkun Yan
Correction to: A Gaussian Process Model for Unsupervised Analysis of High Dimensional Shape Data

In an earlier version of this chapter, the acknowledgement was incomplete. This has been corrected.

Wenzheng Tao, Riddhish Bhalodia, Ross Whitaker