
2020 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2020

23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VI

Editors: Anne L. Martel, Purang Abolmaesumi, Danail Stoyanov, Diana Mateus, Maria A. Zuluaga, S. Kevin Zhou, Daniel Racoceanu, Leo Joskowicz

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The seven-volume set LNCS 12261, 12262, 12263, 12264, 12265, 12266, and 12267 constitutes the refereed proceedings of the 23rd International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2020, held in Lima, Peru, in October 2020. The conference was held virtually due to the COVID-19 pandemic.

The 542 revised full papers presented were carefully reviewed and selected from 1809 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: machine learning methodologies

Part II: image reconstruction; prediction and diagnosis; cross-domain methods and reconstruction; domain adaptation; machine learning applications; generative adversarial networks

Part III: CAI applications; image registration; instrumentation and surgical phase detection; navigation and visualization; ultrasound imaging; video image analysis

Part IV: segmentation; shape models and landmark detection

Part V: biological, optical, microscopic imaging; cell segmentation and stain normalization; histopathology image analysis; ophthalmology

Part VI: angiography and vessel analysis; breast imaging; colonoscopy; dermatology; fetal imaging; heart and lung imaging; musculoskeletal imaging

Part VII: brain development and atlases; DWI and tractography; functional brain networks; neuroimaging; positron emission tomography

Table of Contents

Frontmatter
Correction to: The Case of Missed Cancers: Applying AI as a Radiologist’s Safety Net

The original version of this chapter was revised. Dr. Ayelet Akselrod-Ballin contributed to the development of the conference paper and was therefore added to the list of coauthors.

Michal Chorev, Yoel Shoshan, Ayelet Akselrod-Ballin, Adam Spiro, Shaked Naor, Alon Hazan, Vesna Barros, Iuliana Weinstein, Esma Herzel, Varda Shalev, Michal Guindy, Michal Rosen-Zvi

Angiography and Vessel Analysis

Frontmatter
Lightweight Double Attention-Fused Networks for Intraoperative Stent Segmentation

In endovascular interventional therapy, the fusion of preoperative data with intraoperative X-ray fluoroscopy has demonstrated the potential to reduce radiation dose, contrast agent and processing time. Real-time intraoperative stent segmentation is an important prerequisite for accurate fusion. Nevertheless, this task is challenging because stent wires are thin and have low contrast in noisy X-ray fluoroscopy. In this paper, a novel and efficient network, termed Lightweight Double Attention-fused Network (LDA-Net), is proposed for end-to-end stent segmentation in intraoperative X-ray fluoroscopy. The proposed LDA-Net consists of three major components, namely a feature attention module, a relevance attention module and a pre-trained MobileNetV2 encoder. In addition, a hybrid loss function combining a reinforced focal loss and a Dice loss is designed to better address class imbalance and misclassified examples. Quantitative and qualitative evaluations on 175 intraoperative X-ray sequences demonstrate that the proposed LDA-Net significantly outperforms simpler baselines as well as the best previously published result for this task, achieving state-of-the-art performance.

Yan-Jie Zhou, Xiao-Liang Xie, Zeng-Guang Hou, Xiao-Hu Zhou, Gui-Bin Bian, Shi-Qi Liu
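
A minimal sketch of such a hybrid focal + Dice loss for binary stent masks. The paper's "reinforced" focal variant is not spelled out here, so a standard focal term stands in; all weights and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, gamma=2.0, alpha=0.75, smooth=1.0, w_focal=0.5):
    # Assumed stand-in for the paper's loss, not the authors' exact formulation.
    prob = torch.sigmoid(logits)
    # Focal term: down-weights easy examples to counter class imbalance.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    a_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (a_t * (1 - p_t) ** gamma * bce).mean()
    # Dice term: overlap-based, insensitive to background dominance.
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + smooth) / (prob.sum() + target.sum() + smooth)
    return w_focal * focal + (1 - w_focal) * dice
```
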
TopNet: Topology Preserving Metric Learning for Vessel Tree Reconstruction and Labelling

Reconstructing portal vein and hepatic vein trees from contrast-enhanced abdominal CT scans is a prerequisite for preoperative liver surgery simulation. Existing deep learning based methods treat vascular tree reconstruction as a semantic segmentation problem. However, vessels such as the hepatic and portal veins look very similar locally and need to be traced to their source for robust label assignment. Therefore, semantic segmentation based on local 3D patches results in noisy misclassifications. To tackle this, we propose a novel multi-task deep learning architecture for vessel tree reconstruction. The network simultaneously detects voxels on vascular centerlines (i.e. nodes) and estimates the connectivity between center-voxels (edges) in the tree structure to be reconstructed. Further, we propose a novel connectivity metric which considers both inter-class distance and intra-class topological distance between center-voxel pairs. Vascular trees are reconstructed starting from the vessel source by applying the shortest-path-tree algorithm to the learned connectivity metric. A thorough evaluation on the public IRCAD dataset shows that the proposed method considerably outperforms existing semantic segmentation based methods. To the best of our knowledge, this is the first deep learning based approach which learns multi-label tree structure connectivity from images.

Deepak Keshwani, Yoshiro Kitamura, Satoshi Ihara, Satoshi Iizuka, Edgar Simo-Serra
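
A minimal sketch of the reconstruction step: given detected centerline nodes and a pairwise connectivity cost, grow a shortest-path tree from the vessel source with Dijkstra's algorithm. The cost matrix and function names are assumptions; in the paper the learned connectivity metric would supply `cost`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def shortest_path_tree(cost, source=0):
    """cost: (N, N) array of edge costs between center-voxels (inf = no edge)."""
    finite = np.isfinite(cost)
    graph = csr_matrix(np.where(finite, cost, 0.0))  # zeros are dropped as non-edges
    dist, pred = dijkstra(graph, directed=False, indices=source,
                          return_predecessors=True)
    # pred[i] is the parent of node i in the tree rooted at `source`.
    edges = [(pred[i], i) for i in range(len(dist)) if pred[i] >= 0]
    return dist, edges
```
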
Learning Hybrid Representations for Automatic 3D Vessel Centerline Extraction

Automatic blood vessel extraction from 3D medical images is crucial for vascular disease diagnosis. Existing methods based on convolutional neural networks (CNNs) may suffer from discontinuities in the extracted vessels when segmenting such thin tubular structures from 3D images. We argue that preserving the continuity of extracted vessels requires taking the global geometry into account. However, 3D convolutions are computationally inefficient, which prevents 3D CNNs from having receptive fields large enough to capture global cues in the entire image. In this work, we propose a hybrid representation learning approach to address this challenge. The main idea is to use CNNs to learn local appearances of vessels in image crops while using another point-cloud network to learn the global geometry of vessels in the entire image. At inference time, the proposed approach extracts local segments of vessels using CNNs, classifies each segment based on global geometry using the point-cloud network, and finally connects all the segments that belong to the same vessel using the shortest-path algorithm. This combination results in an efficient, fully automatic and template-free approach to centerline extraction from 3D images. We validate the proposed approach on CTA datasets and demonstrate its superior performance compared to both traditional and CNN-based baselines.

Jiafa He, Chengwei Pan, Can Yang, Ming Zhang, Yang Wang, Xiaowei Zhou, Yizhou Yu
Branch-Aware Double DQN for Centerline Extraction in Coronary CT Angiography

An accurate coronary artery centerline is essential for coronary stenosis analysis and atherosclerotic plaque analysis. However, the existence of many branches makes accurate centerline extraction a challenging task in coronary CT angiography (CCTA). In this paper, we propose a branch-aware coronary centerline extraction approach (BACCE) based on a Double Deep Q-Network (DDQN) and a 3D dilated CNN. It consists of two parts: a DDQN-based tracker and a branch-aware detector. The tracker accurately predicts the agent's next action for tracing the centerline, while the detector detects branch points and the radius of the coronary artery, enabling BACCE to trace branches automatically. Benefiting from the detector, BACCE needs only one seed point to extract the entire coronary tree. Moreover, we propose a new reward calculation based on the dot product of two vectors and a new agent movement strategy based on twenty-six adjacent voxels, which are shown to improve tracing speed and accuracy. We evaluated the BACCE model on the public dataset of the CAT08 challenge, and experimental results demonstrate that our method achieves state-of-the-art results in terms of time cost (7 s), OV (96.2%), OF (88.3%), OT (96.5%) and AI (0.21 mm) metrics. We also present qualitative results. Source code and pre-trained models are publicly available: https://github.com/514sz/Branch-aware-centerline-extraction .

Yuyang Zhang, Gongning Luo, Wei Wang, Kuanquan Wang
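
A sketch of the two ingredients named above, under assumed conventions: a 26-neighbour action space on the voxel grid, and a reward based on the dot product between the agent's step and the local centerline direction. The reward scale is an assumption.

```python
import numpy as np
from itertools import product

# All 26 unit moves to adjacent voxels (excluding staying in place).
ACTIONS = np.array([d for d in product((-1, 0, 1), repeat=3) if any(d)])

def step_reward(action_idx, centerline_dir):
    """centerline_dir: unit tangent of the ground-truth centerline."""
    step = ACTIONS[action_idx].astype(float)
    step /= np.linalg.norm(step)
    # Cosine-style reward: +1 when moving along the vessel, -1 against it.
    return float(step @ centerline_dir)
```
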
Automatic CAD-RADS Scoring Using Deep Learning

Coronary CT angiography (CCTA) has established its role as a non-invasive modality for the diagnosis of coronary artery disease (CAD). The CAD-Reporting and Data System (CAD-RADS) has been developed to standardize communication and aid decision making based on CCTA findings. The CAD-RADS score is determined by manual assessment of all coronary vessels and the grading of lesions within the coronary artery tree. We propose a bottom-up approach for fully automated prediction of this score using deep learning operating on a segment-wise representation of the coronary arteries. The method relies solely on prior fully automated centerline extraction and segment labeling, and predicts the segment-wise stenosis degree and the overall calcification grade as auxiliary tasks in a multi-task learning setup. We evaluate our approach on a data collection consisting of 2,867 patients. On the task of identifying patients with a CAD-RADS score indicating the need for further invasive investigation, our approach reaches an area under the curve (AUC) of 0.923, and an AUC of 0.914 for determining whether the patient suffers from CAD. This level of performance enables our approach to be used in a fully automated screening setup or to assist diagnostic CCTA reading, especially due to its neural architecture design, which allows comprehensive predictions.

Felix Denzinger, Michael Wels, Katharina Breininger, Mehmet A. Gülsün, Max Schöbinger, Florian André, Sebastian Buß, Johannes Görich, Michael Sühling, Andreas Maier
Higher-Order Flux with Spherical Harmonics Transform for Vascular Analysis

In this paper, we present a novel flux-based method to robustly identify vascular structures in angiography, where the curvilinear geometry is delineated by a higher-order tensor computed in the spherical frequency domain. We first modify the vesselness measurement derived from the oriented flux and introduce an antisymmetry measurement to generate the curvilinear responses. We then extend the responses to the cylindrical model and fit them with the spherical harmonics transform to perform higher-order tensor analysis, in which the fiber orientation distribution function is utilized. A graphical framework based on the random walker is applied for vascular segmentation. Experiments demonstrate that the proposed method achieves accurate and stable segmentation performance at various noise levels, delivering reliable curvilinear structure responses.

Jierong Wang, Albert C. S. Chung
Cerebrovascular Segmentation in MRA via Reverse Edge Attention Network

Automated extraction of the cerebrovascular network is of great importance in understanding the mechanism, diagnosis, and treatment of many cerebrovascular pathologies. However, segmentation of cerebrovascular networks from magnetic resonance angiography (MRA) imagery continues to be challenging because of relatively poor contrast and inhomogeneous backgrounds, as well as the anatomical variations and the complex geometry and topology of the networks themselves. In this paper, we present a novel cerebrovascular segmentation framework that consists of image enhancement and segmentation phases. We aim to remove redundant features, while retaining edge information in shallow features when combining these with deep features. We first employ a Retinex model, which models noise explicitly to aid removal of imaging noise, while reducing redundancy within an image and emphasizing the vessel regions, thereby simplifying the subsequent segmentation problem. Subsequently, a reverse edge attention module is employed to discover edge information by paying particular attention to the regions that are not salient in high-level semantic features. The experimental results show that the proposed framework enables the reverse edge attention network to deliver a reliable cerebrovascular segmentation.

Hao Zhang, Likun Xia, Ran Song, Jianlong Yang, Huaying Hao, Jiang Liu, Yitian Zhao
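
A minimal sketch of a reverse attention gate of the kind described above: regions that the high-level prediction considers non-salient receive higher weight, steering shallow features toward vessel edges. The module name and convolution choice are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ReverseAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, shallow_feat, coarse_pred):
        # coarse_pred: 1-channel logit map from a deeper decoder stage.
        reverse = 1.0 - torch.sigmoid(coarse_pred)   # emphasize non-salient regions
        return self.conv(shallow_feat * reverse)     # refine edge-bearing features
```
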
Automated Intracranial Artery Labeling Using a Graph Neural Network and Hierarchical Refinement

Automatically labeling intracranial arteries (ICA) with their anatomical names is beneficial for feature extraction and detailed analysis of intracranial vascular structures. There are significant variations in the ICA due to natural and pathological causes, making automated labeling challenging. However, existing public datasets for the evaluation of anatomical labeling are limited. We construct a comprehensive dataset with 729 Magnetic Resonance Angiography scans and propose a Graph Neural Network (GNN) method to label arteries by classifying the types of nodes and edges in an attributed relational graph. In addition, a hierarchical refinement framework is developed to further improve the GNN outputs by incorporating structural and relational knowledge about the ICA. Our method achieved a node labeling accuracy of 97.5%, and 63.8% of scans were correctly labeled for all Circle of Willis nodes, on a testing set of 105 scans with both healthy and diseased subjects. This is a significant improvement over available state-of-the-art methods. Automatic artery labeling is promising for minimizing the manual effort of characterizing complicated ICA networks and provides valuable information for the identification of geometric risk factors of vascular disease. Our code and dataset are available at https://github.com/clatfd/GNN-ART-LABEL .

Li Chen, Thomas Hatsukami, Jenq-Neng Hwang, Chun Yuan
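
A toy message-passing layer for node classification on an artery graph, sketching the idea of labeling nodes of an attributed relational graph. This is a generic GNN layer under assumed shapes, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) row-normalized adjacency.
        agg = adj @ x                        # mean message from neighbours
        return torch.relu(self.lin_self(x) + self.lin_neigh(agg))
```
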
Time Matters: Handling Spatio-Temporal Perfusion Information for Automated TICI Scoring

X-ray digital subtraction angiography (DSA) imaging is the backbone of diagnosis and therapy response assessment in cerebral ischemic stroke. To evaluate and document the success of endovascular interventions, the spatio-temporal DSA image information and perfusion dynamics are visually assessed by a clinical expert and reperfusion is rated using the so-called TICI (treatment in cerebral ischemia) score. Although it is the clinical standard, TICI scoring is well known to be time-consuming, observer-dependent and impracticable, especially in larger clinical studies. Automated TICI scoring has, however, been considered beyond the scope of machine learning capabilities, due to the complexity of the classification task (e.g., the heterogeneity of clinical DSA data and the complex dependence between TICI score and perfusion dynamics). The present work describes the first study that tackles automated TICI scoring using deep spatio-temporal learning, thereby defining the first corresponding benchmark. Methodically, we build on gated recurrent unit networks (GRUs) and integrate knowledge about the perfusion and TICI scoring process into loss functions and network training to increase prediction robustness. Differences between GRU-predicted mTICI scores and routine mTICI scores are on the order of the literature-reported interrater variability of human expert-based TICI scoring.

Maximilian Nielsen, Moritz Waldmann, Thilo Sentker, Andreas Frölich, Jens Fiehler, René Werner
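
A minimal sketch of a GRU-based scorer for a DSA series: per-frame features pass through a GRU and the final hidden state is mapped to mTICI classes. Feature extraction, dimensions and the knowledge-based loss terms are assumptions omitted here.

```python
import torch
import torch.nn as nn

class TICIScorer(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_grades=5):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_grades)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) per-frame DSA embeddings.
        _, h_n = self.gru(frame_feats)
        return self.head(h_n[-1])            # logits over mTICI grades
```
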
ID-Fit: Intra-Saccular Device Adjustment for Personalized Cerebral Aneurysm Treatment

Intrasaccular devices, like the Woven EndoBridge (WEB), are novel braided devices employed for the treatment of aneurysms with a complex shape and location, mostly terminal aneurysms. Such aneurysms are often challenging or impossible to treat with other endovascular techniques such as coils, stents and flow-diverter stents. The selection of an appropriate endosaccular device size is crucial for a successful treatment and strongly depends on the final configuration that the device adopts when it adapts to the aneurysm sac morphology. This is frequently a problem during the intervention, leading to replacement of the device, reopening of the aneurysm or a need for re-treatment. A technique that predicts the released WEB configuration before the intervention would provide a powerful computational tool to aid the interventionist during device selection. We propose a technique based on device design and aneurysm morphology that, by virtually deploying a WEB, enables the assessment of different device sizes before implantation. This technique was tested on 6 MCA aneurysm cases, and the simulation results were compared to the size of the device deployed in the patient, using post-treatment images.

Romina Muñoz, Ana Paula Narata, Ignacio Larrabide
JointVesselNet: Joint Volume-Projection Convolutional Embedding Networks for 3D Cerebrovascular Segmentation

In this paper, we present an end-to-end deep learning method, JointVesselNet, for robust extraction of sparse 3D vascular structure, which embeds the image composition generated by maximum intensity projection (MIP) into the 3D magnetic resonance angiography (MRA) volumetric image learning process to enhance overall performance. The MIP embedding features can strengthen the local vessel signal and adapt to the geometric variability and scalability of vessels. Therefore, the proposed framework can better capture small vessels and improve vessel connectivity. To our knowledge, this is the first time that a deep learning framework has been proposed to construct a joint convolutional embedding space, where the computed joint vessel probabilities from the 2D projection and the 3D volume can be integrated synergistically. Experimental results are evaluated and compared with traditional 3D vessel segmentation methods and the deep learning state of the art, using both public and real patient cerebrovascular image datasets.

Yifan Wang, Guoli Yan, Haikuan Zhu, Sagar Buch, Ying Wang, Ewart Mark Haacke, Jing Hua, Zichun Zhong
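
Maximum intensity projection itself is simple: project a 3D volume along one axis by taking the per-ray maximum. A minimal NumPy sketch; how the projection is embedded into the joint learning space is the paper's contribution and is not reproduced here.

```python
import numpy as np

def mip(volume, axis=0):
    """volume: (D, H, W) MRA intensities -> 2D projection."""
    return volume.max(axis=axis)

proj = mip(np.random.rand(64, 128, 128))   # (128, 128) axial MIP
```
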
Classification of Retinal Vessels into Artery-Vein in OCT Angiography Guided by Fundus Images

Automated classification of retinal arteries (A) and veins (V) is of great importance for the management of eye diseases and systemic diseases. Traditional colour fundus images usually provide a large field of view of the retina in color, but often fail to capture the finer vessels and capillaries. In contrast, the new Optical Coherence Tomography Angiography (OCT-A) images provide a clear grayscale view of the retinal microvascular structure down to the capillary level, but cannot provide A/V information alone. For the first time, this study presents a new approach for the classification of A/V in OCT-A images, guided by the corresponding fundus images, so that the strengths of both modalities can be integrated. To this end, we first estimate the vascular topologies of paired color fundus and OCT-A images respectively, then we propose a topological message passing algorithm to register the OCT-A onto the color fundus images, and finally the integrated vascular topology map is categorized into arteries and veins by a clustering approach. The proposed method has been applied to a local dataset containing both fundus images and OCT-A, and it reliably identified individual arteries and veins in OCT-A. The experimental results show that, despite the lack of color and intensity information, it produces promising results. In addition, we will release our database to the public.

Jianyang Xie, Yonghuai Liu, Yalin Zheng, Pan Su, Yan Hu, Jianlong Yang, Jiang Liu, Yitian Zhao
Vascular Surface Segmentation for Intracranial Aneurysm Isolation and Quantification

Predicting rupture risk and deciding on an optimal treatment plan for intracranial aneurysms (IAs) is possible through quantification of their size and shape. For this purpose, the IA has to be isolated from a 3D angiogram. State-of-the-art methods perform IA isolation by encoding the neurosurgeon's intuition about the former non-dilated vessel anatomy through principled approaches such as fitting a cutting plane to the vasculature surface, using Gaussian curvature and vessel centerline distance constraints, or by deformable contours or graph cuts guided by the curvature or restricted by Voronoi surface decomposition. However, the large variability of IAs and their parent vasculature configurations often leads to failure or non-intuitive isolation. Manual corrections are thus required, but suffer from poor reproducibility. In this paper, we aim to increase the accuracy, robustness and reproducibility of IA isolation through a two-stage deep learning based segmentation of the vascular surface. The surface is represented by local patches in the form of point clouds, which are fed into a first-stage multilayer neural network (MNN) to obtain descriptors invariant to point ordering, rotation and scale. A binary classifier as second-stage MNN is used to isolate the surface belonging to the IA. Method validation was based on 57 3D-DSA, 28 CTA and 5 MRA images, where cross-modality validation showed a high segmentation sensitivity of 0.985, a substantial improvement over the 0.830 obtained for the state-of-the-art method on the same datasets. Visual analysis of IA isolation and its high accuracy and reliability, consistent across CTA, MRA and 3D-DSA scans, confirm the clinical applicability of the proposed method.

Žiga Bizjak, Boštjan Likar, Franjo Pernuš, Žiga Špiclin

Breast Imaging

Frontmatter
Deep Doubly Supervised Transfer Network for Diagnosis of Breast Cancer with Imbalanced Ultrasound Imaging Modalities

Elastography ultrasound (EUS) provides additional bio-mechanical information about lesions, complementing B-mode ultrasound (BUS) in the diagnosis of breast cancers. However, joint utilization of both BUS and EUS is not common due to the lack of EUS devices in rural hospitals, which gives rise to a novel modality imbalance problem in computer-aided diagnosis (CAD) for breast cancers. Current transfer learning (TL) methods pay little attention to this clinical modality imbalance, in which the source domain (EUS modality) has fewer labeled samples than the target domain (BUS modality). Moreover, these TL methods cannot fully use the label information to explore the intrinsic relation between the two modalities and then guide the knowledge transfer. To this end, we propose a novel deep doubly supervised TL network (DDSTN) that integrates the Learning Using Privileged Information (LUPI) paradigm and the Maximum Mean Discrepancy (MMD) criterion into a unified deep TL framework. The proposed algorithm not only makes full use of the shared labels to effectively guide knowledge transfer via the LUPI paradigm, but also performs additional supervised transfer between unpaired data. We further introduce the MMD criterion to enhance the knowledge transfer. The experimental results on a breast ultrasound dataset indicate that the proposed DDSTN outperforms all compared state-of-the-art algorithms for BUS-based CAD.

Xiangmin Han, Jun Wang, Weijun Zhou, Cai Chang, Shihui Ying, Jun Shi
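
A minimal sketch of the MMD criterion with an RBF kernel, used to measure (and, as a loss, reduce) the discrepancy between feature distributions of two modalities. The bandwidth and feature shapes are illustrative assumptions.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """x: (n, d) source features; y: (m, d) target features."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    # Biased MMD^2 estimate: small when the two distributions match.
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```
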
2D X-Ray Mammogram and 3D Breast MRI Registration

X-ray mammography and breast Magnetic Resonance Imaging (MRI) are two principal imaging modalities currently used for the detection and diagnosis of breast disease in women. Since these imaging modalities exploit different contrast mechanisms, establishing spatial correspondence between mammograms and volumetric breast MRI scans is expected to aid the assessment and quantification of different types of breast malignancies. Finding such correspondence is, unfortunately, far from trivial: not only do the images have different contrasts and dimensionality, they are also acquired under vastly different physical conditions. As opposed to many complex standard methods relying on patient-specific bio-mechanical modelling, we developed a new, simple approach to find the correspondences. This paper introduces a two-stage computational scheme which first estimates the global (compression dependent) part of the spatial transformation, followed by the residual (tissue dependent) part of the transformation of much smaller magnitude. Experimental results on a clinical dataset containing 10 subjects validate the efficiency of the proposed approach. The average Target Registration Error (TRE) on the dataset is 5.44 mm with a standard deviation of 3.61 mm.

Hossein Soleimani, Oleg V. Michailovich
A Second-Order Subregion Pooling Network for Breast Lesion Segmentation in Ultrasound

Breast lesion segmentation in ultrasound images is a fundamental task for clinical diagnosis of the disease. Unfortunately, existing methods mainly rely on the entire image to learn the global context information, which neglects the spatial relation and results in ambiguity in the segmentation results. In this paper, we propose a novel second-order subregion pooling network (S²P-Net) for boosting breast lesion segmentation in ultrasound images. In our S²P-Net, an attention-weighted subregion pooling (ASP) module is introduced in each encoder block of the segmentation network to refine features by aggregating global features from the whole image and local information of subregions. Moreover, in each subregion, a guided multi-dimension second-order pooling (GMP) block is designed to leverage additional guidance information and multiple feature dimensions to learn powerful second-order covariance representations. Experimental results on two datasets demonstrate that our proposed S²P-Net outperforms state-of-the-art methods.

Lei Zhu, Rongzhen Chen, Huazhu Fu, Cong Xie, Liansheng Wang, Liang Wan, Pheng-Ann Heng
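
A minimal sketch of plain second-order (covariance) pooling, the representation the GMP block builds on: spatial features are pooled into a channel-covariance matrix. Guidance information and the multi-dimension variants are omitted; shapes are assumptions.

```python
import torch

def covariance_pool(feat):
    """feat: (B, C, H, W) -> (B, C, C) second-order covariance representations."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)      # center each channel
    return (x @ x.transpose(1, 2)) / (h * w - 1)
```
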
Multi-scale Gradational-Order Fusion Framework for Breast Lesions Classification Using Ultrasound Images

Predicting the malignant potential of breast lesions based on breast ultrasound (BUS) images is crucial for computer-aided diagnosis (CAD) systems for breast cancer. However, since breast lesions in BUS images have various shapes with relatively low contrast and often complex textures, predicting the malignant potential of breast lesions remains challenging. In this paper, a novel multi-scale gradational-order fusion (MsGoF) framework is proposed to take full advantage of features from images at different scales for predicting the malignant potential of breast lesions. Specifically, multi-scale patches are first extracted from the annotated lesions in BUS images as multi-channel inputs. Multi-scale features are then automatically learned and fused in several fusion blocks that are armed with different fusion strategies to comprehensively capture the morphological characteristics of breast lesions. To better characterize complex textures and enhance non-linear modeling capability, we further propose an isotropous gradational-order feature module in each block to learn and combine features of different orders. Finally, these multi-scale gradational-order features are utilized to predict the malignant potential of breast lesions. The major advantage of our framework is embedding the gradational-order feature module into a fusion block, which is used to deeply integrate multi-scale features. The proposed model was evaluated on an open dataset using 5-fold cross-validation. The experimental results demonstrate that the proposed MsGoF framework obtains promising performance compared with other deep learning-based methods.

Zhenyuan Ning, Chao Tu, Qing Xiao, Jiaxiu Luo, Yu Zhang
Computer-Aided Tumor Diagnosis in Automated Breast Ultrasound Using 3D Detection Network

Automated breast ultrasound (ABUS) is a new and promising imaging modality for breast cancer detection and diagnosis, which provides intuitive 3D information and coronal-plane information of great diagnostic value. However, manually screening and diagnosing tumors from ABUS images is very time-consuming, and abnormalities may be overlooked. In this study, we propose a novel two-stage 3D detection network for locating suspected lesion areas and further classifying lesions as benign or malignant tumors. Specifically, we propose a 3D detection network rather than the frequently used segmentation networks to locate lesions in ABUS images, so our network can make full use of the spatial context information in ABUS images. A novel similarity loss is designed to effectively distinguish lesions from the background. A classification network is then employed to identify the located lesions as benign or malignant. An IoU-balanced classification loss is adopted to improve the correlation between the classification and localization tasks. The efficacy of our network is verified on a collected dataset of 418 patients with 145 benign tumors and 273 malignant tumors. Experiments show our network attains a sensitivity of 97.66% with 1.23 false positives (FPs), and has an area under the curve (AUC) value of 0.8720.

Junxiong Yu, Chaoyu Chen, Xin Yang, Yi Wang, Dan Yan, Jianxing Zhang, Dong Ni
Auto-weighting for Breast Cancer Classification in Multimodal Ultrasound

Breast cancer is the most common invasive cancer in women. Besides primary B-mode ultrasound screening, sonographers have explored the inclusion of Doppler, strain and shear-wave elasticity imaging to improve diagnosis. However, recognizing useful patterns in all types of images and weighing up the significance of each modality can elude less-experienced clinicians. In this paper, we explore, for the first time, an automatic way to combine the four types of ultrasonography to discriminate between benign and malignant breast nodules. A novel multimodal network is proposed, with promising learnability and simplicity, to improve classification accuracy. The key is using a weight-sharing strategy to encourage interactions between modalities and adopting an additional cross-modality objective to integrate global information. In contrast to hardcoding the weights of each modality in the model, we embed the weighting in a Reinforcement Learning framework and learn it in an end-to-end manner. The model is thus trained to seek the optimal multimodal combination without handcrafted heuristics. The proposed framework is evaluated on a dataset containing 1616 sets of multimodal images. Results show that the model achieves a high classification accuracy of 95.4%, which indicates the effectiveness of the proposed method.

Jian Wang, Juzheng Miao, Xin Yang, Rui Li, Guangquan Zhou, Yuhao Huang, Zehui Lin, Wufeng Xue, Xiaohong Jia, Jianqiao Zhou, Ruobing Huang, Dong Ni
MommiNet: Mammographic Multi-view Mass Identification Networks

Most Deep Neural Network (DNN) based approaches for mammogram analysis are based on a single view. Some recent DNN-based multi-view approaches perform either bilateral or ipsilateral analysis, while in practice radiologists use both to achieve the best clinical outcome. In this paper, we present the first DNN-based tri-view mass identification approach (MommiNet), which can simultaneously perform end-to-end bilateral and ipsilateral analysis of mammogram images, and in turn can fully emulate radiologists' reading practice. Novel network architectures are proposed to learn the symmetry and geometry constraints and to fully aggregate the information from all views. Extensive experiments have been conducted on the public DDSM dataset and our in-house dataset, and state-of-the-art (SOTA) results have been obtained in terms of mammogram mass detection accuracy.

Zhicheng Yang, Zhenjie Cao, Yanbo Zhang, Mei Han, Jing Xiao, Lingyun Huang, Shibin Wu, Jie Ma, Peng Chang
Multi-site Evaluation of a Study-Level Classifier for Mammography Using Deep Learning

We present a computer-aided diagnosis algorithm for mammography trained and validated on studies acquired from six clinical sites. We hold out the full dataset from a seventh hospital for testing to assess the algorithm’s ability to generalize to new sites. Our classifiers are convolutional neural networks that take multiple input images from a mammography study and produce classifications for the study. The studies are globally labeled as normal, biopsy benign, high risk or biopsy malignant. We report on experimental results from several network variants, including study-level and breast-level models, single- and multiple-output models, and a novel model architecture that incorporates prior studies. Each model variation includes an image-level classifier that is pre-trained with per-image labels and is used as a feature extractor in our study-level models. Our best study-level model achieves 0.85 area under the ROC curve for normal vs malignant classification on the held-out test site. In comparison with other recent work, we achieve a similar level of classification sensitivity and specificity on a dataset with greater site and vendor variation. Additionally, our test performance is demonstrated on a held-out site to more accurately assess how the model would perform when deployed in the field.

Dustin Sargent, Sun Young Park, Amod Jog, Aly Mohamed, David Richmond
The Case of Missed Cancers: Applying AI as a Radiologist’s Safety Net

We investigate the potential contribution of an AI system as a safety net application for radiologists in breast cancer screening. As a safety net, the AI alerts on cases suspected to be malignant which the radiologist did not recommend for a recall. We analyzed held-out data of 2,638 exams enriched with 90 missed cancers. In screening mammography settings, we show that a system alerting on 11 out of every 1,000 cases could detect up to 10.7% of the radiologists' missed cancers, significantly increasing radiologists' sensitivity to 80.3% while only slightly decreasing their specificity to 95.3%. Importantly, the safety net demonstrated a significant contribution to their performance even when radiologists utilized both mammography and ultrasound images. In those settings, it would have alerted 8.5 times per 1,000 cases and detected 11.7% of the radiologists' missed cancers. In an analysis of the missed cancers by an expert, we found that most of the cancers detected by the AI were visible post hoc. Finally, we performed a reader study with five radiologists over 120 exams, 10 of which were originally missed cancers. The AI safety net was able to assist 3 out of the 5 radiologists in detecting missed cancers without raising any false alerts.

Michal Chorev, Yoel Shoshan, Ayelet Akselrod-Ballin, Adam Spiro, Shaked Naor, Alon Hazan, Vesna Barros, Iuliana Weinstein, Esma Herzel, Varda Shalev, Michal Guindy, Michal Rosen-Zvi
Decoupling Inherent Risk and Early Cancer Signs in Image-Based Breast Cancer Risk Models

The ability to accurately estimate risk of developing breast cancer would be invaluable for clinical decision-making. One promising new approach is to integrate image-based risk models based on deep neural networks. However, one must take care when using such models, as selection of training data influences the patterns the network will learn to identify. With this in mind, we trained networks using three different criteria to select the positive training data (i.e. images from patients that will develop cancer): an inherent risk model trained on images with no visible signs of cancer, a cancer signs model trained on images containing cancer or early signs of cancer, and a conflated model trained on all images from patients with a cancer diagnosis. We find that these three models learn distinctive features that focus on different patterns, which translates to contrasts in performance. Short-term risk is best estimated by the cancer signs model, whilst long-term risk is best estimated by the inherent risk model. Carelessly training with all images conflates inherent risk with early cancer signs, and yields sub-optimal estimates in both regimes. As a consequence, conflated models may lead physicians to recommend preventative action when early cancer signs are already visible.

Yue Liu, Hossein Azizpour, Fredrik Strand, Kevin Smith
Multi-task Learning for Detection and Classification of Cancer in Screening Mammography

Breast screening is an effective method to identify breast cancer in asymptomatic women; however, not all exams are read by radiologists specialized in breast imaging, and missed cancers are a reality. Deep learning provides a valuable tool to support this critical decision point. Algorithmically, accurate assessment of breast mammography requires both the detection of abnormal findings (object detection) and a correct decision on whether to recall a patient for additional imaging (image classification). In this paper, we present a multi-task learning approach that we argue is ideally suited to this problem. We train a network for both object detection and image classification, based on state-of-the-art models, and demonstrate significant improvement in the recall vs. no-recall decision on a multi-site, multi-vendor data set, measured by concordance with biopsy-proven malignancy. We also observe improved detection of microcalcifications, and detection of cancer cases that were missed by radiologists, demonstrating that this approach could provide meaningful support for radiologists in breast screening (especially non-specialists). Moreover, we argue that this multi-task framework is broadly applicable to a wide range of medical imaging problems that require a patient-level recommendation based on specific imaging findings.

Maria V. Sainz de Cea, Karl Diedrich, Ran Bakalo, Lior Ness, David Richmond

Colonoscopy

Frontmatter
Adaptive Context Selection for Polyp Segmentation

Accurate polyp segmentation is of great significance for the diagnosis and treatment of colorectal cancer. However, it has always been very challenging due to the diverse shapes and sizes of polyps. In recent years, state-of-the-art methods have achieved significant breakthroughs in this task with the help of deep convolutional neural networks. However, few algorithms explicitly consider the impact of the size and shape of the polyp and the complex spatial context on segmentation performance, which leaves them ineffective on complex samples. In fact, segmentation of polyps of different sizes relies on different local and global contextual information for regional contrast reasoning. To tackle these issues, we propose an adaptive context selection based encoder-decoder framework which is composed of a Local Context Attention (LCA) module, a Global Context Module (GCM) and an Adaptive Selection Module (ASM). Specifically, LCA modules deliver local context features from encoder layers to decoder layers, enhancing the attention to the hard region determined by the prediction map of the previous layer. GCM aims to further explore global context features and send them to the decoder layers. ASM is used for adaptive selection and aggregation of context features through channel-wise attention. Our proposed approach is evaluated on the EndoScene and Kvasir-SEG datasets, and shows outstanding performance compared with other state-of-the-art methods. The code is available at https://github.com/ReaFly/ACSNet .

Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, Yizhou Yu
PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) polyps of the same type vary in size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine boundary cues using the reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating some misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability and real-time segmentation efficiency (~50 fps).

Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao
Few-Shot Anomaly Detection for Polyp Frames from Colonoscopy

Anomaly detection methods generally target the learning of a normal image distribution (i.e., inliers showing healthy cases); during testing, samples relatively far from the learned distribution are classified as anomalies (i.e., outliers showing disease cases). These approaches tend to be sensitive to outliers that lie relatively close to inliers (e.g., a colonoscopy image with a small polyp). In this paper, we address this inappropriate sensitivity to outliers by also learning from inliers. We propose a new few-shot anomaly detection method based on an encoder trained to maximise the mutual information between feature embeddings and normal images, followed by a few-shot score inference network trained with a large set of inliers and a substantially smaller set of outliers. We evaluate our proposed method on the clinical problem of detecting frames containing polyps from colonoscopy video sequences, where the training set has 13,350 normal images (i.e., without polyps) and fewer than 100 abnormal images (i.e., with polyps). The results of our proposed model on this data set reveal a state-of-the-art detection result, while performance across different numbers of anomaly samples is relatively stable after approximately 40 abnormal training images. Code is available at https://github.com/tianyu0207/FSAD-Net .

Yu Tian, Gabriel Maicas, Leonardo Zorron Cheng Tao Pu, Rajvinder Singh, Johan W. Verjans, Gustavo Carneiro
PolypSeg: An Efficient Context-Aware Network for Polyp Segmentation from Colonoscopy Videos

Polyp segmentation from colonoscopy videos is of great importance for improving the quantitative analysis of colon cancer. However, it remains a challenging task due to (1) the large size and shape variation of polyps, (2) the low contrast between polyps and background, and (3) the inherent real-time requirement of this application, where segmentation results should be presented to doctors immediately during the colonoscopy procedure for their prompt decision and action. It is difficult to develop a model with powerful representation capability that yields satisfactory segmentation results in real time. We propose a novel and efficient context-aware network, named PolypSeg, to comprehensively address these challenges. The proposed PolypSeg consists of two key components: an adaptive scale context module (ASCM) and a semantic global context module (SGCM). The ASCM aggregates multi-scale context information and takes advantage of an improved attention mechanism to make the network focus on the target regions and hence improve the feature representation. The SGCM enriches the semantic information and excludes background noise in the low-level features, which enhances the feature fusion between high-level and low-level features. In addition, we introduce depthwise separable convolutions into PolypSeg to replace the traditional convolution operations, reducing parameters and computational cost so that PolypSeg runs in real time. We conducted extensive experiments on a well-known publicly available dataset for the polyp segmentation task. Experimental results demonstrate that the proposed PolypSeg achieves much better segmentation results than state-of-the-art methods at a much faster speed.

Jiafu Zhong, Wei Wang, Huisi Wu, Zhenkun Wen, Jing Qin
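
A standard sketch of a depthwise separable convolution: a per-channel spatial convolution followed by a 1x1 pointwise convolution, which cuts parameters and compute relative to a full convolution. The class name and kernel size are illustrative.

```python
import torch.nn as nn

class SeparableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # groups=in_ch makes each filter see a single input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```
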
Endoscopic Polyp Segmentation Using a Hybrid 2D/3D CNN

Colonoscopy is the gold standard for early diagnosis and pre-emptive treatment of colorectal cancer by detecting and removing colonic polyps. Deep learning approaches to polyp detection have shown potential for enhancing polyp detection rates. However, the majority of these systems are developed and evaluated on static images from colonoscopies, whilst applied treatment is performed on a real-time video feed. Non-curated video data includes a high proportion of low-quality frames in comparison to selected images but also embeds temporal information that can be used for more stable predictions. To exploit this, a hybrid 2D/3D convolutional neural network architecture is presented. The network is used to improve polyp detection by encompassing spatial and temporal correlation of the predictions while preserving real-time detections. Extensive experiments show that the hybrid method outperforms a 2D baseline. The proposed architecture is validated on videos from 46 patients. The results show that real-world clinical implementations of automated polyp detection can benefit from the hybrid algorithm.

Juana González-Bueno Puyal, Kanwal K. Bhatia, Patrick Brandao, Omer F. Ahmad, Daniel Toth, Rawen Kader, Laurence Lovat, Peter Mountney, Danail Stoyanov

Dermatology

Frontmatter
A Distance-Based Loss for Smooth and Continuous Skin Layer Segmentation in Optoacoustic Images

Raster-scan optoacoustic mesoscopy (RSOM) is a powerful, non-invasive optical imaging technique for functional, anatomical, and molecular skin and tissue analysis. However, both the manual and the automated analysis of such images are challenging, because RSOM images have very low contrast, poor signal-to-noise ratio, and systematic overlaps between the absorption spectra of melanin and hemoglobin. Nonetheless, the segmentation of the epidermis layer is a crucial step for many downstream medical and diagnostic tasks, such as vessel segmentation or monitoring of cancer progression. We propose a novel, shape-specific loss function that overcomes discontinuous segmentations and achieves smooth segmentation surfaces while preserving the same volumetric Dice and IoU. Further, we validate our epidermis segmentation through the sensitivity of vessel segmentation. We found a 20% improvement in Dice for vessel segmentation tasks when the epidermis mask is provided as additional information to the vessel segmentation network.

Stefan Gerl, Johannes C. Paetzold, Hailong He, Ivan Ezhov, Suprosanna Shit, Florian Kofler, Amirhossein Bayat, Giles Tetteh, Vasilis Ntziachristos, Bjoern Menze
Fairness of Classifiers Across Skin Tones in Dermatology

Recent advances in computer vision have led to breakthroughs in the development of automated skin image analysis. However, no attempt has been made to evaluate the consistency in performance across populations with varying skin tones. In this paper, we present an approach to estimate skin tone in skin disease benchmark datasets and investigate whether model performance is dependent on this measure. Specifically, we use the individual typology angle (ITA) to approximate skin tone in dermatology datasets. We look at the distribution of ITA values to better understand skin color representation in two benchmark datasets: 1) the ISIC 2018 Challenge dataset, a collection of dermoscopic images of skin lesions for the detection of skin cancer, and 2) the SD-198 dataset, a collection of clinical images capturing a wide variety of skin diseases. To estimate ITA, we first develop segmentation models to isolate non-diseased areas of skin. We find that the majority of the data in the two datasets have ITA values between 34.5° and 48°, which are associated with lighter skin, consistent with an under-representation of darker-skinned populations in these datasets. We also find no measurable correlation between the accuracy of machine learning models and ITA values, though more comprehensive data are needed for further validation.

Newton M. Kinyanjui, Timothy Odonga, Celia Cintas, Noel C. F. Codella, Rameswar Panda, Prasanna Sattigeri, Kush R. Varshney
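
The individual typology angle has a standard closed form over CIELAB values, ITA = arctan((L* - 50) / b*) in degrees, with larger angles corresponding to lighter skin; a small sketch, with the example values chosen arbitrarily.

```python
import numpy as np

def ita_degrees(L, b):
    """L: CIELAB lightness, b: CIELAB yellow-blue component of non-diseased skin."""
    return np.degrees(np.arctan2(L - 50.0, b))

print(ita_degrees(70.0, 15.0))   # ~53 degrees, in the lighter-skin range
```
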
Alleviating the Incompatibility Between Cross Entropy Loss and Episode Training for Few-Shot Skin Disease Classification

Skin disease classification from images is crucial to dermatological diagnosis. However, identifying skin lesions involves a variety of aspects in terms of size, color, shape, and texture. To make matters worse, many categories contain only very few samples, posing great challenges to conventional machine learning algorithms and even human experts. Inspired by the recent success of Few-Shot Learning (FSL) in natural image classification, we propose to apply FSL to skin disease identification to address the extreme scarcity of training samples. However, directly applying FSL to this task does not work well in practice, and we find that the problem should be largely attributed to the incompatibility between Cross Entropy (CE) loss and episode training, which are both commonly used in FSL. Based on a detailed analysis, we propose the Query-Relative (QR) loss, which proves superior to CE under episode training and is closely related to recently proposed mutual information estimation. Moreover, we further strengthen the proposed QR loss with a novel adaptive hard margin strategy. Comprehensive experiments validate the effectiveness of the proposed FSL scheme and the possibility of diagnosing rare skin diseases with only a few labeled samples.

Wei Zhu, Haofu Liao, Wenbin Li, Weijian Li, Jiebo Luo
Clinical-Inspired Network for Skin Lesion Recognition

Automated skin lesion recognition methods are useful for improving diagnostic accuracy in dermoscopy images. However, several challenges have slowed the development of these methods, including the limited amount of data, a lack of ability to focus on the lesion area, poor performance in distinguishing between visually similar disease categories, and an imbalance between different classes of training data. During practical learning and diagnosis, doctors adopt certain strategies to tackle these challenges, so it is appealing to incorporate such strategies into automated skin lesion recognition methods, which promises better performance. Inspired by this, we propose a new Clinical-Inspired Network (CIN) to simulate the subjective learning and diagnostic process of doctors. To mimic the diagnostic process, we design three modules: a lesion area attention module to crop the images, a feature extraction module to extract image features, and a lesion feature attention module to focus on the important lesion parts and mine the correlation between different lesion parts. To simulate the learning process, we introduce a distinguish module. The CIN is extensively tested on the ISBI 2016 and 2017 challenge datasets and achieves state-of-the-art performance, which demonstrates its advantages.

Zihao Liu, Ruiqin Xiong, Tingting Jiang
Multi-class Skin Lesion Segmentation for Cutaneous T-cell Lymphomas on High-Resolution Clinical Images

Automated skin lesion segmentation is essential to assist doctors in diagnosis. Most methods focus on lesion segmentation of dermoscopy images, while a few focus on clinical images. Nearly all existing methods tackle the binary segmentation problem of distinguishing lesion parts from normal skin parts, and are designed for diseases with a localized solitary skin lesion. The characteristics of both the dermoscopy images and the clinical images are four-fold: (1) only one skin lesion exists in the image; (2) the skin lesion mostly appears in the center of the image; (3) the backgrounds are similar between different images of the same modality; (4) the resolution of the images is not high, averaging about 1500 × 1200 in several popular datasets. In contrast, this paper focuses on a four-class segmentation task for Cutaneous T-cell lymphoma (CTCL), an extremely aggressive skin disease with three visually similar kinds of lesions. For the first time, we collect a new dataset, which contains only clinical images captured from different areas of the human body. The main characteristics of these images differ from all existing images in four aspects: (1) multiple skin lesion parts exist in each image; (2) the skin lesion parts are widely scattered over different areas of the image; (3) the background of the images varies greatly; (4) all the images have high resolutions, averaging 3255 × 2535. According to the characteristics and difficulties of CTCL, we design a new Multi Knowledge Learning Network (MKLN). The experimental results demonstrate the superiority of our method, which meets clinical needs.

Zihao Liu, Haihao Pan, Chen Gong, Zejia Fan, Yujie Wen, Tingting Jiang, Ruiqin Xiong, Hang Li, Yang Wang

Fetal Imaging

Frontmatter
Deep Learning Automatic Fetal Structures Segmentation in MRI Scans with Few Annotated Datasets

We present a new method for end-to-end automatic volumetric segmentation of fetal structures in MRI scans with deep learning networks trained with very few annotated scans. It consists of three main stages: 1) two-step automatic structure segmentation with custom 3D U-Nets; 2) segmentation error estimation, and; 3) segmentation error correction. The automatic structure segmentation stage first computes a region of interest (ROI) on a downscaled scan and then computes a final segmentation on the cropped ROI. The segmentation error estimation stage uses prediction-time augmentations of the input scan to compute multiple segmentations and estimate the segmentation uncertainty for individual slices and for the entire scan. The segmentation error correction stage then uses these estimations to locate the most error-prone slices and to correct the segmentations in those slices based on validated adjacent slices. Experimental results of our methods on fetal body (63 cases, 9 for training, 55 for testing) and fetal brain MRI scans (35 cases, 6 for training, 29 for testing) yield a mean Dice coefficient of 0.96 for both, and a mean Average Symmetric Surface Distance of 0.74 mm and 0.19 mm, respectively, below the observer delineation variability.

Gal Dudovitch, Daphna Link-Sourani, Liat Ben Sira, Elka Miller, Dafna Ben Bashat, Leo Joskowicz
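
A minimal sketch of the error-estimation step: run the network on several augmented copies of the scan and use the voxelwise disagreement of the resulting segmentations as an uncertainty signal, summarized per slice. `model`, flip-only augmentation and the shapes are assumptions, not the authors' exact pipeline.

```python
import numpy as np

def tta_uncertainty(model, scan, n_aug=8, seed=0):
    """scan: (D, H, W) volume; model: callable returning a binary segmentation."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_aug):
        axis = int(rng.integers(0, 3))
        aug = np.flip(scan, axis=axis)          # simple flip augmentation
        seg = model(aug)
        preds.append(np.flip(seg, axis=axis))   # undo the flip
    preds = np.stack(preds)
    var = preds.var(axis=0)                     # voxelwise disagreement
    return var, var.mean(axis=(1, 2))           # (D,H,W) map, per-slice score
```
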
Data-Driven Multi-contrast Spectral Microstructure Imaging with InSpect

We introduce and demonstrate an unsupervised machine learning method for spectroscopic analysis of quantitative MRI (qMRI) experiments. qMRI data can support estimation of multidimensional correlation (or single-dimensional) spectra, which allow model-free investigation of tissue properties, but this requires an ill-posed calculation. Moreover, in the vast majority of applications ground truth knowledge is unobtainable, preventing the application of supervised machine learning. Here we present a new method that addresses these limitations in a data-driven way. The algorithm simultaneously estimates a canonical basis of spectral components and voxelwise maps of their weightings, thereby pooling information across whole images to regularise the ill-posed problem. We show that our algorithm substantially outperforms current voxelwise spectral approaches. We demonstrate the method on combined diffusion-relaxometry placental MRI scans, revealing anatomically-relevant substructures, and identifying dysfunctional placentas. Our algorithm vastly reduces the data required to reliably estimate multidimensional correlation (or single-dimensional) spectra, opening up the possibility of spectroscopic imaging in a wide range of new applications.

Paddy J. Slator, Jana Hutter, Razvan V. Marinescu, Marco Palombo, Laurence H. Jackson, Alison Ho, Lucy C. Chappell, Mary Rutherford, Joseph V. Hajnal, Daniel C. Alexander
Semi-supervised Learning for Fetal Brain MRI Quality Assessment with ROI Consistency

Fetal brain MRI is useful for diagnosing brain abnormalities but is challenged by fetal motion. The current protocol for T2-weighted fetal brain MRI is not robust to motion, so image volumes are degraded by inter- and intra-slice motion artifacts. Moreover, manual annotation for fetal MR image quality assessment is usually time-consuming. Therefore, in this work, a semi-supervised deep learning method that detects slices with artifacts during the brain volume scan is proposed. Our method is based on the mean teacher model, where we not only enforce consistency between the student and teacher models on the whole image, but also adopt an ROI consistency loss to guide the network to focus on the brain region. The proposed method is evaluated on a fetal brain MR dataset with 11,223 labeled images and more than 200,000 unlabeled images. Results show that, compared with supervised learning, the proposed method improves model accuracy by about 6% and outperforms other state-of-the-art semi-supervised learning methods. The proposed method is also implemented and evaluated on an MR scanner, which demonstrates the feasibility of online image quality assessment and image reacquisition during fetal MR scans.

Junshen Xu, Sayeri Lala, Borjan Gagoski, Esra Abaci Turk, P. Ellen Grant, Polina Golland, Elfar Adalsteinsson
Enhanced Detection of Fetal Pose in 3D MRI by Deep Reinforcement Learning with Physical Structure Priors on Anatomy

Fetal MRI is heavily constrained by unpredictable and substantial fetal motion that causes image artifacts and limits the set of viable diagnostic image contrasts. Current mitigation of motion artifacts is predominantly performed by fast, single-shot MRI and retrospective motion correction. Estimation of fetal pose in real time during MRI stands to benefit prospective methods that detect and mitigate fetal motion artifacts, where inferred fetal motion is combined with online slice prescription and low-latency decision making. Current developments in deep reinforcement learning (DRL) offer a novel approach for fetal landmark detection. In this task, 15 agents are deployed to detect 15 landmarks simultaneously by DRL. The optimization is challenging, and here we propose an improved DRL approach that incorporates priors on the physical structure of the fetal body. First, we use graph communication layers to improve the communication among agents based on a graph where each node represents a fetal-body landmark. Further, an additional reward based on the distance between agents and physical structures such as the fetal limbs is used to fully exploit physical structure. Evaluation of this method on a repository of 3-mm resolution in vivo data demonstrates a mean accuracy of landmark estimation within 10 mm of ground truth of 87.3%, and a mean error of 6.9 mm. The proposed DRL for fetal pose landmark search demonstrates potential clinical utility for online detection of fetal motion that guides real-time mitigation of motion artifacts as well as health diagnosis during MRI of the pregnant mother.

Molin Zhang, Junshen Xu, Esra Abaci Turk, P. Ellen Grant, Polina Golland, Elfar Adalsteinsson
Automatic Angle of Progress Measurement of Intrapartum Transperineal Ultrasound Image with Deep Learning

Angle of progress (AOP) is an important indicator used in assessing the progress of labor during delivery. However, manually measuring AOP is time-consuming and subjective. In this study, we address the challenge of automatic AOP measurement in transperineal ultrasound (TPU) to achieve accurate monitoring of maternal and infant status. We propose a multitask framework for simultaneously locating the landmarks of the pubic symphysis endpoints and segmenting the regions of the fetal head and pubic symphysis. We then exploit the localized landmarks to obtain the central axis of the pubic symphysis. Afterward, we calculate the tangent of the fetal head as it passes through the lower endpoint of the pubic symphysis. Finally, we compute the AOP from the central axis and the tangent. Our framework is evaluated on a TPU dataset acquired at The First Affiliated Hospital of Jinan University and annotated by an ultrasound physician with over 10 years of experience. Our method achieves a mean difference of 7.6° and displays promising prospects for real-time monitoring of labor progress in clinical practice. To the best of our knowledge, this study is the first to apply deep learning methods to AOP measurement.
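
The final geometric step reduces to an angle between two lines. A small sketch, with illustrative names for the detected landmarks:

```python
import numpy as np

def angle_of_progress(ps_top, ps_bottom, head_tangent_pt):
    """Compute the AOP (degrees) from three 2D points: the two pubic
    symphysis endpoints defining the central axis, and the point where the
    tangent line from the lower endpoint touches the fetal head contour.
    Point names are illustrative; the framework detects them automatically."""
    axis = np.asarray(ps_bottom) - np.asarray(ps_top)          # central axis
    tangent = np.asarray(head_tangent_pt) - np.asarray(ps_bottom)
    cos = axis @ tangent / (np.linalg.norm(axis) * np.linalg.norm(tangent))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: tangent direction 60 degrees off the symphysis axis -> AOP = 60.0
print(angle_of_progress((0, 0), (0, 10), (8.66, 15)))
```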

Minghong Zhou, Chao Yuan, Zhaoshi Chen, Chuan Wang, Yaosheng Lu
Joint Image Quality Assessment and Brain Extraction of Fetal MRI Using Deep Learning

Quality assessment (QA) and brain extraction (BE) are two fundamental steps in 3D fetal brain MRI reconstruction and quantification. Conventionally, QA and BE are performed independently, ignoring the inherent relation between the two closely-related tasks. However, both focus on the representation of the brain region, so they can be jointly optimized so that the network learns shared features and avoids overfitting. To this end, we propose a novel multi-stage deep learning model for joint QA and BE of fetal MRI. The locations and orientations of fetal brains vary randomly, and the shapes and appearances of fetal brains change remarkably across gestational ages, posing great challenges to extracting shared features for QA and BE. To address these problems, we first design a brain detector to locate the brain region. Then we introduce deformable convolution to adaptively adjust the receptive field for dealing with variable brain shapes. Finally, a task-specific module performs image QA and BE simultaneously. To obtain a well-trained model, we further propose a multi-step training strategy. We cross-validate our method on two independent fetal MRI datasets acquired from different scanners with different imaging protocols, and achieve promising performance.

Lufan Liao, Xin Zhang, Fenqiang Zhao, Tao Zhong, Yuchen Pei, Xiangmin Xu, Li Wang, He Zhang, Dinggang Shen, Gang Li

Heart and Lung Imaging

Frontmatter
Accelerated 4D Respiratory Motion-Resolved Cardiac MRI with a Model-Based Variational Network

Respiratory motion and long scan times remain major challenges in free-breathing 3D cardiac MRI. Respiratory motion-resolved approaches have been proposed that bin the acquired data into different respiratory motion states. After inter-bin motion estimation, a motion-compensated reconstruction can be obtained. However, respiratory bins from accelerated acquisitions are highly undersampled and have different undersampling patterns depending on the subject-specific respiratory motion. Remaining undersampling artifacts in the bin images can affect the accuracy of the motion estimation. We propose a model-based variational network (VN) which reconstructs motion-resolved images jointly by exploiting shared information between respiratory bins. In each stage of the VN, conjugate gradient is adopted to enforce data consistency (CG-VN), achieving better enforcement of data consistency per stage than the classic VN with a proximal gradient descent step (GD-VN), which translates to faster convergence and better reconstruction performance. We compare the performance of CG-VN and GD-VN for reconstruction of respiratory motion-resolved images for two different cardiac MR sequences. Our results show that CG-VN with fewer stages outperforms GD-VN by achieving higher PSNR and better generalization on prospectively undersampled data. The proposed motion-resolved CG-VN provides consistently good reconstruction quality for all motion states with varying undersampling patterns by taking advantage of redundancies among motion bins.
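
The CG data-consistency step in each stage typically solves a regularised normal equation of the form (AᴴA + λI)x = Aᴴy + λz. A generic numpy sketch under that assumption (the operators, λ, and iteration count are placeholders, not the paper's settings):

```python
import numpy as np

def dc_conjugate_gradient(A, AH, y, z, lam, n_iter=10):
    """Solve (A^H A + lam*I) x = A^H y + lam*z with conjugate gradient.
    A / AH are the forward MR operator and its adjoint, y the undersampled
    k-space, z the current network estimate."""
    b = AH(y) + lam * z
    x = np.zeros_like(b)
    r = b - (AH(A(x)) + lam * x)
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        Ap = AH(A(p)) + lam * p
        alpha = rs / np.vdot(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy example: A = undersampled 1D FFT with a fixed sampling mask.
n = 64
mask = np.random.default_rng(0).random(n) < 0.4
A = lambda x: mask * np.fft.fft(x, norm="ortho")
AH = lambda k: np.fft.ifft(mask * k, norm="ortho")
x_true = np.sin(np.linspace(0, 8 * np.pi, n))
x_rec = dc_conjugate_gradient(A, AH, A(x_true), np.zeros(n, complex), 0.01)
```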

Haikun Qi, Niccolo Fuin, Thomas Kuestner, René Botnar, Claudia Prieto
Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation

Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardial strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion field. We then use a novel cyclic teacher-student training strategy to make the inference end-to-end and further improve the tracking performance. Our teacher model provides more accurate motion estimation as supervision through progressive motion compensations. Our student model learns from the teacher model to estimate motion in a single step while maintaining accuracy. The teacher-student knowledge distillation is performed in a cyclic way for a further performance boost. Our proposed method significantly outperforms a strong baseline model on two publicly available clinical datasets, evaluated by a variety of metrics and by inference time. New evaluation metrics are also proposed to represent errors in a clinically meaningful manner.

Hanchao Yu, Xiao Chen, Humphrey Shi, Terrence Chen, Thomas S. Huang, Shanhui Sun
ICA-UNet: ICA Inspired Statistical UNet for Real-Time 3D Cardiac Cine MRI Segmentation

Real-time cine magnetic resonance imaging (MRI) plays an increasingly important role in various cardiac interventions. In order to enable fast and accurate visual assistance, the temporal frames need to be segmented on-the-fly. However, state-of-the-art MRI segmentation methods are used either offline because of their high computational complexity, or in real time but with significant accuracy loss and latency increase (causing visually noticeable lag). As such, they can hardly be adopted to assist visual guidance. In this work, inspired by a new interpretation of Independent Component Analysis (ICA) [11] for learning, we propose a novel ICA-UNet for real-time 3D cardiac cine MRI segmentation. Experiments using the MICCAI ACDC 2017 dataset show that, compared with state-of-the-art methods, ICA-UNet not only achieves higher Dice scores, but also meets the real-time requirements for both throughput and latency (up to a 12.6× latency reduction), enabling real-time guidance for cardiac interventions without visual lag.

Tianchen Wang, Xiaowei Xu, Jinjun Xiong, Qianjun Jia, Haiyun Yuan, Meiping Huang, Jian Zhuang, Yiyu Shi
A Bottom-Up Approach for Real-Time Mitral Valve Annulus Modeling on 3D Echo Images

3D+t Transesophageal Echocardiography (TEE) performs 4D scans of mitral valve (MV) morphology at frame rate, providing real-time guidance for catheter-based interventions for MV repair and replacement. A key anatomical structure is the MV annulus, and live quantification of the dynamic annulus at acquisition rates of 15 fps or higher has proven to be technically challenging. In this paper, we propose a bottom-up approach inspired by clinicians' manual workflow for MV annulus modeling on 3D+t TEE images in real time. Specifically, we first detect annulus landmarks with clear 3D anatomical features via agents trained using Deep Reinforcement Learning. Leveraging the circular structure of the annulus, cross-annular planes are extracted and additional landmarks are then detected through 2D image-to-image networks on the 2D cutting planes. The complete 3D annulus is finally fitted through all detected landmarks using splines. We validate the proposed approach on 795 3D+t TEE sequences with 1906 annotated frames, and achieve a speed of 20 fps with a median curve-to-curve error of 2.74 mm. Furthermore, device simulation is utilized to augment the training data, which results in promising accuracy improvements on challenging echoes with visible devices and warrants further investigation.
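
The last stage, fitting a closed curve through the detected landmarks, can be sketched with SciPy's periodic smoothing splines; the synthetic landmarks and smoothing factor below are illustrative, not the paper's fitting procedure:

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Fit a closed 3D spline through detected annulus landmarks (here, noisy
# points on a tilted, saddle-shaped ring standing in for real detections).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 12, endpoint=False)
pts = np.stack([30 * np.cos(t),                 # synthetic landmarks (mm)
                30 * np.sin(t),
                5 * np.sin(2 * t)]) + rng.normal(0, 0.5, (3, 12))

# per=1 -> periodic (closed) spline; s > 0 -> mild smoothing of detections.
tck, _ = splprep(pts, per=1, s=2.0)
u = np.linspace(0, 1, 200)
annulus = np.stack(splev(u, tck))               # dense 3D annulus curve
print(annulus.shape)                            # (3, 200)
```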

Yue Zhang, Abdoul-aziz Amadou, Ingmar Voigt, Viorel Mihalef, Helene Houle, Matthias John, Tommaso Mansi, Rui Liao
A Semi-supervised Joint Network for Simultaneous Left Ventricular Motion Tracking and Segmentation in 4D Echocardiography

This work presents a novel deep learning method to combine segmentation and motion tracking in 4D echocardiography. The network iteratively trains a motion branch and a segmentation branch. The motion branch is initially trained entirely unsupervised and learns to roughly map the displacements between a source and a target frame. The estimated displacement maps are then used to generate pseudo-ground truth labels to train the segmentation branch. The labels predicted by the trained segmentation branch are fed back into the motion branch and act as landmarks to help retrain the branch to produce smoother displacement estimations. These smoothed out displacements are then used to obtain smoother pseudo-labels to retrain the segmentation branch. Additionally, a biomechanically-inspired incompressibility constraint is implemented in order to encourage more realistic cardiac motion. The proposed method is evaluated against other approaches using synthetic and in-vivo canine studies. Both the segmentation and motion tracking results of our model perform favorably against competing methods.
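
One common way to impose such an incompressibility constraint is to penalise the divergence of the displacement field, which vanishes for locally volume-preserving motion; whether this matches the authors' exact formulation is an assumption. A minimal PyTorch sketch:

```python
import torch

def divergence_loss(disp):
    """Incompressibility penalty sketch for a dense displacement field of
    shape (B, 3, D, H, W). Finite differences and an L2 penalty are
    illustrative choices."""
    dz = disp[:, 0, 1:, :, :] - disp[:, 0, :-1, :, :]
    dy = disp[:, 1, :, 1:, :] - disp[:, 1, :, :-1, :]
    dx = disp[:, 2, :, :, 1:] - disp[:, 2, :, :, :-1]
    # Crop the three difference volumes to a common shape, then sum.
    div = dz[:, :, 1:, 1:] + dy[:, 1:, :, 1:] + dx[:, 1:, 1:, :]
    return (div ** 2).mean()

loss = divergence_loss(torch.randn(2, 3, 16, 32, 32) * 0.01)
```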

Kevinminh Ta, Shawn S. Ahn, John C. Stendahl, Albert J. Sinusas, James S. Duncan
Joint Data Imputation and Mechanistic Modelling for Simulating Heart-Brain Interactions in Incomplete Datasets

The use of mechanistic models in clinical studies is limited by the lack of multi-modal patient data representing different anatomical and physiological processes. For example, neuroimaging datasets do not provide a sufficient representation of heart features for the modeling of cardiovascular factors in brain disorders. To tackle this problem we introduce a probabilistic framework for joint cardiac data imputation and personalisation of cardiovascular mechanistic models, with application to brain studies with incomplete heart data. Our approach is based on a variational framework for the joint inference of an imputation model of cardiac information from the available features, along with a Gaussian Process emulator that can faithfully reproduce personalised cardiovascular dynamics. Experimental results on UK Biobank show that our model allows accurate imputation of missing cardiac features in datasets containing minimal heart information, e.g. systolic and diastolic blood pressures only, while jointly estimating the emulated parameters of the lumped model. This allows a novel exploration of the heart-brain relationship through simulation of realistic cardiac dynamics corresponding to different conditions of brain anatomy.

Jaume Banus, Maxime Sermesant, Oscar Camara, Marco Lorenzi
Learning Geometry-Dependent and Physics-Based Inverse Image Reconstruction

Deep neural networks have shown great potential in image reconstruction problems in Euclidean space. However, many reconstruction problems involve imaging physics that depend on the underlying non-Euclidean geometry. In this paper, we present a new approach to learning inverse imaging that exploits the underlying geometry and physics. We first introduce a non-Euclidean encoding-decoding network that allows us to describe the unknown and measurement variables over their respective geometrical domains. We then learn the geometry-dependent physics between the two domains by explicitly modeling it via a bipartite graph over the graphical embeddings of the two geometries. We applied the presented network to reconstructing electrical activity on the heart surface from body-surface potentials. In a series of generalization tasks with increasing difficulty, we demonstrated the improved ability of the presented network to generalize across geometrical changes underlying the data in comparison to its Euclidean alternatives.

Xiajun Jiang, Sandesh Ghimire, Jwala Dhamala, Zhiyuan Li, Prashnna Kumar Gyawali, Linwei Wang
Hierarchical Classification of Pulmonary Lesions: A Large-Scale Radio-Pathomics Study

Diagnosis of pulmonary lesions from computed tomography (CT) is important but challenging for clinical decision making in lung cancer related diseases. Deep learning has achieved great success in the computer aided diagnosis (CADx) area for lung cancer, whereas it suffers from label ambiguity due to the difficulty of radiological diagnosis. Considering that invasive pathological analysis serves as the clinical gold standard of lung cancer diagnosis, in this study we solve the label ambiguity issue via a large-scale radio-pathomics dataset containing 5,134 radiological CT images with pathologically confirmed labels, including cancers (e.g., invasive/non-invasive adenocarcinoma, squamous carcinoma) and non-cancer diseases (e.g., tuberculosis, hamartoma). This retrospective dataset, named Pulmonary-RadPath, enables development and validation of accurate deep learning systems that predict invasive pathological labels with a non-invasive procedure, i.e., radiological CT scans. A three-level hierarchical classification system for pulmonary lesions is developed, which covers most diseases in cancer-related diagnosis. We explore several techniques for hierarchical classification on this dataset, and propose a Leaky Dense Hierarchy approach with proven effectiveness in experiments. Our study significantly surpasses prior arts in terms of data scale (6× larger), disease comprehensiveness and hierarchies. The promising results suggest the potential to facilitate precision medicine.

Jiancheng Yang, Mingze Gao, Kaiming Kuang, Bingbing Ni, Yunlang She, Dong Xie, Chang Chen
Learning Tumor Growth via Follow-Up Volume Prediction for Lung Nodules

Follow-up plays an important role in the management of pulmonary nodules for lung cancer. Imaging diagnostic guidelines with expert consensus have been established to help radiologists make clinical decisions for each patient. However, tumor growth is such a complicated process that it is difficult to stratify high-risk nodules from low-risk ones based on morphologic characteristics alone. On the other hand, recent deep learning studies using convolutional neural networks (CNNs) to predict the malignancy score of nodules only provide clinicians with black-box predictions. To this end, we propose a unified framework, named Nodule Follow-Up Prediction Network (NoFoNet), which predicts the growth of pulmonary nodules with high-quality visual appearances and accurate quantitative results, given any time interval from baseline observations. This is achieved by predicting the future displacement field of each voxel with a WarpNet. A TextureNet is further developed to refine the textural details of the WarpNet outputs. We also introduce techniques including a Temporal Encoding Module and a Warp Segmentation Loss to encourage time-aware and shape-aware representation learning. We build an in-house follow-up dataset from two medical centers to validate the effectiveness of the proposed method. NoFoNet significantly outperforms direct prediction by a U-Net in terms of visual quality; more importantly, it demonstrates accurate differentiation between high- and low-risk nodules. Our promising results suggest its potential in computer-aided intervention for lung nodule management.

Yamin Li, Jiancheng Yang, Yi Xu, Jingwei Xu, Xiaodan Ye, Guangyu Tao, Xueqian Xie, Guixue Liu
Multi-stream Progressive Up-Sampling Network for Dense CT Image Reconstruction

Pulmonary computed tomography (CT) images with small slice thickness (thin) are very helpful in clinical practice due to their high resolution for precise diagnosis. However, many CT images are still acquired with large slice thickness (thick) because of the benefits of reduced storage and short acquisition time. Therefore, it is necessary to build a pipeline that leverages the advantages of both thin and thick slices. In this paper, we generate thin slices from thick ones in order to obtain high-quality images with a low storage requirement. Our method is implemented in an encoder-decoder manner with a proposed progressive up-sampling module to exploit sufficient information for reconstruction. To further lower the difficulty of the task, a multi-stream architecture is established to separately learn the inner- and outer-lung regions. During training, a contrast-aware loss and a feature matching loss are designed to capture the appearance of lung markings and reduce the influence of noise. To verify the performance of the proposed method, a total of 880 pairs of CT images with both thin and thick slices were collected. An ablation study demonstrates the effectiveness of each component of our method, and higher performance is obtained compared with previous work. Furthermore, three radiologists were asked to detect pulmonary nodules in the raw thick slices and the generated thin slices independently; the improvement in both sensitivity and precision shows the potential value of the proposed method in clinical applications.

Qiuyue Liu, Zhen Zhou, Feng Liu, Xiangming Fang, Yizhou Yu, Yizhou Wang
Abnormality Detection in Chest X-Ray Images Using Uncertainty Prediction Autoencoders

Chest radiography is widely used in annual medical screening to check whether the lungs are healthy. It would therefore be desirable to develop an intelligent system to help clinicians automatically detect potential abnormalities in chest X-ray images. Here, using only healthy X-ray images, we propose a new abnormality detection approach based on an autoencoder which outputs not only the reconstructed normal version of the input image but also a pixel-wise uncertainty prediction. Higher uncertainty often appears at normal region boundaries with relatively larger reconstruction errors, but not at potentially abnormal regions in the lung area. Therefore, the reconstruction error normalized by the uncertainty provides a natural measure for abnormality detection in images. Experiments on two chest X-ray datasets show state-of-the-art performance by the proposed approach.
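
The normalised-error idea can be sketched in a few lines: train a heteroscedastic autoencoder on healthy images with the usual uncertainty-weighted loss, then score images by the variance-normalised reconstruction error. The per-image maximum aggregation below is an illustrative choice, not necessarily the paper's:

```python
import torch

def abnormality_score(x, recon, log_var):
    """Sketch of the uncertainty-normalised score. Training minimises the
    heteroscedastic negative log-likelihood (nll) on healthy images only;
    at test time, the variance-normalised error flags abnormal regions."""
    sq_err = (x - recon) ** 2
    nll = sq_err / log_var.exp() + log_var        # per-pixel training loss
    score = sq_err / log_var.exp()                # normalised error map
    return nll.mean(), score.amax(dim=(-2, -1))   # image-level score

x = torch.rand(4, 1, 64, 64)
recon, log_var = x + 0.05 * torch.randn_like(x), torch.zeros_like(x)
train_loss, scores = abnormality_score(x, recon, log_var)
```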

Yifan Mao, Fei-Fei Xue, Ruixuan Wang, Jianguo Zhang, Wei-Shi Zheng, Hongmei Liu
Region Proposals for Saliency Map Refinement for Weakly-Supervised Disease Localisation and Classification

The deployment of automated systems to diagnose diseases from medical images is challenged by the requirement to localise the diagnosed diseases in order to justify or explain the classification decision. This requirement is hard to fulfil because most of the training sets available to develop these systems only contain global annotations, making disease localisation a weakly supervised problem. The main methods designed for weakly supervised disease classification and localisation rely on saliency or attention maps that are not specifically trained for localisation, or on region proposals that cannot be refined to produce accurate detections. In this paper, we introduce a new model that combines region proposal and saliency detection to overcome both limitations for weakly supervised disease classification and localisation. Using the ChestX-ray14 dataset, we show that our proposed model establishes the new state of the art for weakly-supervised disease diagnosis and localisation. We make our code available at https://github.com/renato145/RpSalWeaklyDet.

Renato Hermoza, Gabriel Maicas, Jacinto C. Nascimento, Gustavo Carneiro
CPM-Net: A 3D Center-Points Matching Network for Pulmonary Nodule Detection in CT Scans

Automatic and accurate lung nodule detection from Computed Tomography (CT) scans plays a vital role in efficient lung cancer screening. Despite the state-of-the-art performance obtained by recent anchor-based detectors using Convolutional Neural Networks (CNNs) for this task, they require pre-determined anchor parameters such as the size, number and aspect ratio of anchors, and have limited robustness when dealing with lung nodules with a massive variety of sizes. To overcome this problem, we propose a 3D center-points matching detection network (CPM-Net) that is anchor-free and automatically predicts the position, size and aspect ratio of nodules without manual design of nodule/anchor parameters. CPM-Net uses a center-points matching strategy to find center points, and then uses the features of these points to regress the size of the nodule's bounding box and the local offset of the center points. To better capture spatial information and 3D context for the detection, we propose to fuse multi-level spatial coordinate maps with the feature extractor and combine them with 3D squeeze-and-excitation attention modules. To deal with the enormous imbalance between the numbers of positive and negative samples during center-points matching, we propose a hybrid method of adaptive points mining and re-focal loss. Experimental results on the LUNA16 dataset showed that our proposed CPM-Net achieved superior performance for lung nodule detection compared with state-of-the-art anchor-based methods.

Tao Song, Jieneng Chen, Xiangde Luo, Yechong Huang, Xinglong Liu, Ning Huang, Yinan Chen, Zhaoxiang Ye, Huaqiang Sheng, Shaoting Zhang, Guotai Wang
Interpretable Identification of Interstitial Lung Disease (ILD) Associated Findings from CT

In this study, we present a method to identify radiologic findings associated with interstitial lung diseases (ILD), a heterogeneous collection of progressive lung diseases, from thoracic CT scans. Prior studies have relied on densely supervised methods using 2D slices or small 3D patches as input, requiring significant manual labor to create dense labels. This limits the amount of data available for algorithm development and thus hinders generalization performance. To harness available large but sparsely labeled datasets, we present a weakly supervised method to identify imaging findings associated with ILD. We test this framework to classify and roughly localize 14 radiologic findings on the LTRC dataset of 3380 thoracic CT scans. We conduct 5-fold cross-validation and achieve mean AUC scores of 0.8 on the classification of 5 out of 14 findings. We visualize attention energy maps which demonstrate that our classifier is able to learn representative features with meaningful differences between radiologic findings, and is capable of approximately localizing the findings of interest, thereby adding interpretability to our model (this work was supported by the USPHS under NIH grant R01-HL133889).

Yifan Wu, Jiancong Wang, William D. Lindsay, Tarmily Wen, Jianbo Shi, James C. Gee
Learning with Sure Data for Nodule-Level Lung Cancer Prediction

Recent advances in deep learning-based disease prediction from images have significantly extended the clinical capabilities of these systems. However, in certain cases (e.g. lung nodule prediction), ground truth labels manually annotated by radiologists (unsure data) are often based on subjective assessment and lack pathologically-proven benchmarks (sure data) at the nodule level. To address this issue, we build a small yet definite CT dataset (171 patients) called SCH-LND focusing on solid lung nodules (90 benign/90 malignant cases). Under the supervision of the SCH-LND dataset, many hidden drawbacks of unsure data (484 solid nodules selected from the LIDC-IDRI dataset) used for malignancy prediction are objectively revealed. Explanations for this phenomenon are inferred in this paper from the perspectives of model training and data annotation bias. Although learning from scratch over sure data with a commonly used model can surpass the performance of large-scale unsure data, we additionally propose two frameworks to make the best use of these cross-domain resources, among which transfer learning is verified as an effective approach for LIDC-IDRI knowledge adaptation. Results show that the proposed method can achieve good performance for nodule-level malignancy prediction with the small SCH-LND dataset.

Hanxiao Zhang, Yun Gu, Yulei Qin, Feng Yao, Guang-Zhong Yang
Cascaded Robust Learning at Imperfect Labels for Chest X-ray Segmentation

The superior performance of CNNs on medical image analysis heavily depends on annotation quality, such as the number of labeled images, the source of images, and the expert experience. Annotation requires great expertise and labor. To deal with high inter-rater variability, the study of imperfect labels has great significance in medical image segmentation tasks. In this paper, we present a novel cascaded robust learning framework for chest X-ray segmentation with imperfect annotations at the boundary. Our model consists of three independent networks, which can effectively learn useful information from their peers. The framework includes two stages. In the first stage, we select clean annotated samples via a model committee setting; the networks are trained by minimizing a segmentation loss using the selected clean samples. In the second stage, we design a joint optimization framework with label correction to gradually correct wrong annotations and improve network performance. We conduct experiments on the public chest X-ray dataset collected by Shenzhen Hospital. The results show that our method achieves a significant improvement in segmentation accuracy compared to previous methods.

Cheng Xue, Qiao Deng, Xiaomeng Li, Qi Dou, Pheng-Ann Heng
Class-Aware Multi-window Adversarial Lung Nodule Synthesis Conditioned on Semantic Features

Nodule CT image synthesis is effective as a data augmentation method for deep learning tasks involving lung nodules. To advance realistic malignant/benign lung nodule synthesis, conditional Generative Adversarial Networks have been widely adopted. In this paper, we identify an issue in existing techniques for class-aware nodule synthesis: the class-aware controllability of semantic features. To address this issue, we propose an adversarial lung nodule synthesis framework based on conditional Generative Adversarial Networks and class-aware multi-window semantic feature learning. By learning semantic features from multi-window CT images, our framework can generate realistic nodule CT images and has better controllability of class-aware nodule features. Our framework provides a new perspective on nodule CT image synthesis that has not been explored before. We train our framework on the public LIDC-IDRI dataset. Our framework improves the malignancy prediction F1 score by more than 3% and shows promising results as a solution for lung nodule augmentation. The source code can be found at https://github.com/qiuliwang/CA-MW-Adversarial-Synthesis.
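
Multi-window CT inputs are built by re-mapping the same Hounsfield-unit volume under several display windows. A short sketch with two common clinical windows (the paper's exact window settings are not given here):

```python
import numpy as np

def apply_window(hu, center, width):
    """Map raw CT Hounsfield units to [0, 1] under a display window."""
    lo, hi = center - width / 2, center + width / 2
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)

hu = np.random.default_rng(0).integers(-1024, 400, (64, 64)).astype(float)
multi_window = np.stack([
    apply_window(hu, center=-600, width=1500),  # lung window
    apply_window(hu, center=40, width=400),     # mediastinal window
])                                               # (2, 64, 64) network input
```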

Qiuli Wang, Xingpeng Zhang, Wei Chen, Kun Wang, Xiaohong Zhang
Nodule2vec: A 3D Deep Learning System for Pulmonary Nodule Retrieval Using Semantic Representation

Content-based retrieval supports a radiologist's decision-making process by presenting the most similar cases from a database containing both historical diagnoses and subsequent disease development history. We present a deep learning system that transforms a 3D image of a pulmonary nodule from a CT scan into a low-dimensional embedding vector. We demonstrate that such a vector representation preserves semantic information about the nodule and offers a viable approach for content-based image retrieval (CBIR). We discuss the theoretical limitations of the available datasets and overcome them by applying transfer learning from a state-of-the-art lung nodule detection model. We evaluate the system using the LIDC-IDRI dataset of thoracic CT scans. We devise a similarity score and show that it can be utilized to measure similarity 1) between annotations of the same nodule by different radiologists and 2) between the query nodule and the top four CBIR results. A comparison between doctors' and algorithm scores suggests that the benefit provided by the system to the radiologist end-user is comparable to obtaining a second radiologist's opinion.
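
Retrieval over such embeddings is typically a nearest-neighbour search; the cosine-similarity sketch below is a generic stand-in rather than the paper's exact similarity score:

```python
import numpy as np

def retrieve(query_vec, db_vecs, k=4):
    """Return indices of the k nearest database nodules by cosine
    similarity between embedding vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

db = np.random.default_rng(0).normal(size=(1000, 128))  # embedded nodules
idx, sims = retrieve(db[42] + 0.1, db)                  # query near item 42
```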

Ilia Kravets, Tal Heletz, Hayit Greenspan
Deep Active Learning for Effective Pulmonary Nodule Detection

Expensive and time-consuming medical imaging annotation is one of the big challenges for deep learning-based computer-aided diagnosis (CAD) on low-dose computed tomography (CT). To address this problem, we propose a novel active learning approach that improves the training efficiency of a deep network-based lung nodule detection framework and reduces the annotation cost. Informative CT scans, such as samples that are inconspicuous or likely to produce high false positives, are selected and further annotated for training the nodule detector network. A simple yet effective schema suggests samples by ranking the uncertainty loss predicted from multi-layer feature maps and the Regions of Interest (RoIs). The proposed framework is evaluated on the public DeepLesion dataset and achieves results that surpass the active learning baseline schema at all training cycles.

Jingya Liu, Liangliang Cao, Yingli Tian

Musculoskeletal Imaging

Frontmatter
Towards Robust Bone Age Assessment: Rethinking Label Noise and Ambiguity

The effects of label noise and ambiguity are widespread, especially for subjective tasks such as bone age assessment (BAA). However, most existing BAA algorithms ignore these issues. We propose a robust framework for BAA supporting the Tanner & Whitehouse 3 (TW3) method, which is clinically more objective and reproducible than the Greulich & Pyle (GP) method but has received less attention from the research community. Since the publicly available RSNA BAA dataset was annotated using the GP method, we contribute additional TW3 annotations. We formulate TW3 BAA as an ordinal regression problem, and address both label noise and ambiguity with a two-stage deep learning framework. The first stage focuses on correcting erroneous labels while tolerating ambiguity, while the latter stage introduces a module called Residual Context Graph (RCG) to conquer label ambiguity. Inspired by the way human experts handle ambiguity, we combine fine-grained local features with a graph-based context. Experiments show the proposed framework outperforms previously reported TW3-based BAA systems by large margins. TW3 annotations of bone maturity levels for a portion of the RSNA BAA dataset will be made publicly available.

Ping Gong, Zihao Yin, Yizhou Wang, Yizhou Yu
Improve Bone Age Assessment by Learning from Anatomical Local Regions

Skeletal bone age assessment (BAA), an essential imaging examination, aims at evaluating the biological and structural maturation of human bones. In clinical practice, the Tanner and Whitehouse (TW2) method is widely used by radiologists to perform BAA. The TW2 method splits the hand into Regions Of Interest (ROIs) and analyzes each anatomical ROI separately to estimate the bone age. Because it analyzes local information, the TW2 method produces accurate results in practice. Following the spirit of TW2, we propose a novel model called Anatomical Local-Aware Network (ALA-Net) for automatic bone age assessment. In ALA-Net, an anatomical local extraction module is introduced to learn the hand structure and extract local information. Moreover, we design an anatomical patch training strategy to provide extra regularization during the training process. Our model can detect the anatomical ROIs and estimate bone age jointly in an end-to-end manner. The experimental results show that ALA-Net achieves a new state-of-the-art single-model performance of 3.91 mean absolute error (MAE) on the publicly available RSNA dataset. Since the design of our model is consistent with the well-recognized TW2 method, it is interpretable and reliable for clinical usage.

Dong Wang, Kexin Zhang, Jia Ding, Liwei Wang
An Analysis by Synthesis Method that Allows Accurate Spatial Modeling of Thickness of Cortical Bone from Clinical QCT

Osteoporosis is a skeletal disorder that leads to increased fracture risk due to decreased strength of cortical and trabecular bone. Even with state-of-the-art non-invasive assessment methods there is still a high underdiagnosis rate. Quantitative computed tomography (QCT) permits the selective analysis of cortical bone; however, the low spatial resolution of clinical QCT leads to an overestimation of the thickness of cortical bone (Ct.Th) and bone strength. We propose a novel, model-based, fully automatic image analysis method that allows accurate spatial modeling of the thickness distribution of cortical bone from clinical QCT. In an analysis-by-synthesis (AbS) fashion, a stochastic scan is synthesized from a probabilistic bone model and the optimal model parameters are estimated using a maximum a-posteriori approach. By exploiting the different characteristics of the in-plane and out-of-plane point spread functions of CT scanners, the proposed method is able to assess the spatial distribution of cortical thickness. The method was evaluated on eleven cadaveric human vertebrae, scanned by clinical QCT and analyzed using both standard methods and AbS, with high resolution peripheral QCT (HR-pQCT) as the gold standard. While standard QCT based measurements overestimated Ct.Th by 560% and did not show significant correlation with the gold standard (r² = 0.20, p = 0.169), the proposed method eliminated the overestimation and showed a significant tight correlation with the gold standard (r² = 0.98, p < 0.0001) with a root mean square error below 10%.

Stefan Reinhold, Timo Damm, Sebastian Büsse, Stanislav Gorb, Claus-C. Glüer, Reinhard Koch
Segmentation of Paraspinal Muscles at Varied Lumbar Spinal Levels by Explicit Saliency-Aware Learning

Automated segmentation of paraspinal muscles on axial lumbar MRIs of varied spinal levels is clinically demanded. However, it is challenging, and there is no reported success due to the large inter- and intra-organ variations, unclear muscle boundaries and unpredictable muscle degeneration patterns. In this paper, we propose a novel explicit saliency-aware learning framework (BS-ESNet) for fine segmentation of multiple paraspinal muscles and other major components at varied spinal levels across the full lumbar spine. BS-ESNet is designed to first detect the location of each organ in the form of a bounding box (b-box), and then perform accurate segmentation that utilizes the detected b-boxes to enable spatial saliency awareness. BS-ESNet creatively conducts detection upon a preliminary segmentation mask instead of the input MRI, which eliminates the influence of inter-organ variations and is robust against unclear muscle boundaries. Such a segment-then-detect workflow also provides a paradigm for formulating multi-organ detection in an end-to-end trainable process. Our framework also embeds an elaborate spatial attention gate which adopts the detection b-boxes to obtain a saliency activation map in an explicitly supervised manner. The acquired salient attention map automatically corrects and enhances segmentation features, and further guides adaptation to variable anatomical structures. The method is validated on a challenging dataset of 320 MRIs. Evaluation results demonstrate that BS-ESNet achieves high segmentation performance with a mean Dice score of 0.94 and outperforms other state-of-the-art frameworks.

Jiawei Huang, Haotian Shen, Bo Chen, Yue Wang, Shuo Li
Manifold Ordinal-Mixup for Ordered Classes in TW3-Based Bone Age Assessment

Bone age assessment (BAA) is vital to detecting abnormal growth in children and can be used to investigate its cause. Automating assessments could benefit radiologists by reducing reader variability and reading time. Recently, deep learning (DL) algorithms have been devised to automate BAA using hand X-ray images, mostly based on GP-based methods. In contrast to GP-based methods, where radiologists compare the whole hand's X-ray image with standard images in the GP atlas, TW3 methods analyze the major bones in the hand image to estimate the subject's bone age. It is thus more attractive to automate TW3 methods for their lower reader variability and higher accuracy; however, the inaccessibility of bone maturity stages has inhibited widespread application of DL in automating TW3 systems. In this work, we propose a DL-based TW3 system that trains deep neural networks (DNNs) to extract region of interest (RoI) patches in hand images for all 13 major bones and to estimate each bone's maturity stage, which in turn can be used to estimate the bone age. For this purpose, we designed a novel loss function which considers the ordinal relations among classes corresponding to maturity stages, and show that DNNs trained using our loss not only attain lower mean absolute error, but also learn a path-connected latent space illuminating the inherent ordinal relations among classes. Our experiments show that DNNs trained using the proposed loss outperform other DL algorithms, known to excel at other tasks, in estimating maturity stage and bone age.
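
One standard way to encode ordinal relations, shown here only as an illustrative stand-in for the paper's loss, is to decompose stage k into cumulative binary targets "stage > j" and apply binary cross-entropy per threshold:

```python
import torch
import torch.nn.functional as F

def ordinal_bce_loss(logits, stage, n_stages=9):
    """Cumulative-link ordinal loss sketch. logits: (B, n_stages - 1)
    threshold scores; stage: (B,) integer maturity stages in [0, n_stages)."""
    thresholds = torch.arange(n_stages - 1, device=stage.device)
    targets = (stage[:, None] > thresholds[None, :]).float()
    return F.binary_cross_entropy_with_logits(logits, targets)

loss = ordinal_bce_loss(torch.randn(8, 8), torch.randint(0, 9, (8,)))
```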

Byeonguk Bae, Jaewon Lee, Seo Taek Kong, Jinkyeong Sung, Kyu-Hwan Jung
Contour-Based Bone Axis Detection for X-Ray Guided Surgery on the Knee

The anatomical axis of long bones is an important reference line for guiding fracture reduction and assisting in the correct placement of guide pins, screws, and implants in orthopedics and trauma surgery. This study investigates an automatic approach for detection of such axes on X-ray images based on the segmentation contour of the bone. For this purpose, we use the medically established two-line method and translate it into a learning-based approach. The proposed method is evaluated on 38 clinical test images of the femoral and tibial bone and achieves median angulation errors of 0.19° and 0.33°, respectively. An inter-rater study with three trauma surgery experts confirms reliability of the method and recommends further clinical application.

Florian Kordon, Andreas Maier, Benedict Swartman, Maxim Privalov, Jan Siad El Barbari, Holger Kunze
Automatic Segmentation, Localization, and Identification of Vertebrae in 3D CT Images Using Cascaded Convolutional Neural Networks

This paper presents a method for automatic segmentation, localization, and identification of vertebrae in arbitrary 3D CT images. Many previous works do not perform the three tasks simultaneously, and they require a priori knowledge of which part of the anatomy is visible in the 3D CT images. Our method tackles all three tasks in a single multi-stage framework without any such assumptions. In the first stage, we train a 3D Fully Convolutional Network to find the bounding boxes of the cervical, thoracic, and lumbar vertebrae. In the second stage, we train an iterative 3D Fully Convolutional Network to segment individual vertebrae in the bounding box. The input to the second network has an auxiliary channel in addition to the 3D CT images. Given the segmented vertebra regions in the auxiliary channel, the network outputs the next vertebra. The proposed method is evaluated in terms of segmentation, localization, and identification accuracy on two public datasets: 15 3D CT images from the MICCAI CSI 2014 workshop challenge and 302 3D CT images with various pathologies introduced in [1]. Our method achieved a mean Dice score of 96%, a mean localization error of 8.3 mm, and a mean identification rate of 84%. In summary, our method achieved better performance than all existing works on all three metrics.

Naoto Masuzawa, Yoshiro Kitamura, Keigo Nakamura, Satoshi Iizuka, Edgar Simo-Serra
Discriminative Dictionary-Embedded Network for Comprehensive Vertebrae Tumor Diagnosis

Comprehensive vertebrae tumor diagnosis (vertebrae recognition and vertebrae tumor diagnosis from MRI images) is crucial for tumor screening and preventing further metastasis. However, this task has not yet been attempted due to challenges caused by varied tumor appearances, non-tumor diseases with similar appearance, irrelevant interference information, as well as diverse MRI image fields of view (FOV) and/or characteristics. We propose a discriminative dictionary-embedded network (DECIDE) that contains an elaborated enhanced-supervision recognition network (ERN) and a discerning diagnosis network (DDN). Our ERN creatively designs projection-guided dictionary learning to leverage projections of angular point coordinates onto multiple observation axes for enhanced supervision and discriminability of different vertebrae. DDN integrates a novel label-consistent dictionary learning layer into a classification network to obtain more discerning sparse codes for improved diagnostic performance. DECIDE is trained and evaluated on a very challenging dataset consisting of 600 MRI images; the evaluation results show that DECIDE achieves high performance in both recognition (accuracy: 0.928) and diagnosis (AUC: 0.96) tasks.

Shen Zhao, Bin Chen, Heyou Chang, Xi Wu, Shuo Li
Multi-vertebrae Segmentation from Arbitrary Spine MR Images Under Global View

Multi-vertebrae segmentation plays an important role in spine disease diagnosis and treatment planning. Global spatial dependencies between vertebrae are essential prior information for automatic multi-vertebrae segmentation. However, due to the lack of global information, previous methods have to localize specific vertebrae regions first, then segment and recognize the vertebrae in those regions, resulting in reduced feature reuse and increased computation. In this paper, we propose to leverage both global spatial and label information for multi-vertebrae segmentation from arbitrary MR images in one go. Specifically, a spatial graph convolutional network (GCN) is designed to first automatically learn an adjacency matrix and construct a graph on local feature maps, and then adopt stacked GCNs to capture the global spatial relationships between vertebrae. A label attention network is built to predict the appearance probabilities of all vertebrae, using an attention mechanism to reduce the ambiguity caused by varying FOVs or similar appearances of adjacent vertebrae. The proposed method is trained in an end-to-end manner and evaluated on a challenging dataset of 292 MRI scans with various fields of view, image characteristics and vertebra deformations. The experimental results show that our method achieves high performance (89.28 ± 5.21 IDR and 85.37 ± 4.09% mIoU) on arbitrary input images.

Heyou Chang, Shen Zhao, Hao Zheng, Yang Chen, Shuo Li
A Convolutional Approach to Vertebrae Detection and Labelling in Whole Spine MRI

We propose a novel convolutional method for the detection and identification of vertebrae in whole spine MRIs. This involves using a learnt vector field to group detected vertebra corners into individual vertebral bodies, and convolutional image-to-image translation followed by beam search to label vertebral levels in a self-consistent manner. The method can be applied without modification to lumbar, cervical and thoracic-only scans across a range of different MR sequences. The resulting system achieves a 98.1% detection rate and a 96.5% identification rate on a challenging clinical dataset of whole spine scans, and matches or exceeds the performance of previous systems for detecting and labelling vertebrae in lumbar-only scans. Finally, we demonstrate the clinical applicability of this method, using it for automated scoliosis detection in both lumbar and whole spine MR scans.

Rhydian Windsor, Amir Jamaludin, Timor Kadir, Andrew Zisserman
Keypoints Localization for Joint Vertebra Detection and Fracture Severity Quantification

Vertebral body compression fractures are reliable early signs of osteoporosis. Though these fractures are visible on Computed Tomography (CT) images, they are frequently missed by radiologists in clinical settings. Prior research on automatic vertebral fracture classification has demonstrated reliable quality; however, existing methods provide hard-to-interpret outputs and sometimes fail to process cases with severe abnormalities such as highly pathological vertebrae or scoliosis. We propose a new two-step algorithm that localizes the vertebral column in 3D CT images and then simultaneously detects individual vertebrae and quantifies fractures in 2D. We train neural networks for both steps using a simple 6-keypoint annotation scheme, which corresponds precisely to current medical recommendations. Our algorithm has no exclusion criteria, processes a 3D CT scan in 2 s on a single GPU, and provides an intuitive and verifiable output. The method approaches expert-level performance and demonstrates state-of-the-art results in vertebrae 3D localization (average error of 1 mm), vertebrae 2D detection (precision of 0.99, recall of 1), and fracture identification (patient-level ROC AUC of 0.93).
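
To make the 6-keypoint scheme concrete, here is a toy reading of it: three (top, bottom) point pairs give anterior, middle and posterior vertebral heights, and the worst relative height loss maps to a Genant grade. The keypoint ordering and this exact mapping are assumptions for illustration:

```python
import numpy as np

def genant_grade(kp):
    """Toy grading from 6 keypoints, ordered as
    [a_top, a_bot, m_top, m_bot, p_top, p_bot] (assumed order).
    Genant grades by relative height loss: 0: <20%, 1: 20-25%,
    2: 25-40%, 3: >40%."""
    kp = np.asarray(kp, float)
    heights = [np.linalg.norm(kp[i] - kp[i + 1]) for i in (0, 2, 4)]
    loss = 1.0 - min(heights) / max(heights)     # worst relative height loss
    return int(np.digitize(loss, [0.20, 0.25, 0.40]))

# ~33% anterior height loss relative to posterior -> grade 2
print(genant_grade([(0, 0), (0, 20), (10, 0), (10, 27), (20, 0), (20, 30)]))
```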

Maxim Pisov, Vladimir Kondratenko, Alexey Zakharov, Alexey Petraikin, Victor Gombolevskiy, Sergey Morozov, Mikhail Belyaev
Grading Loss: A Fracture Grade-Based Metric Loss for Vertebral Fracture Detection

Osteoporotic vertebral fractures have a severe impact on patients' overall well-being but are severely under-diagnosed. These fractures present themselves at various levels of severity measured using Genant's grading scale. Insufficient annotated datasets, severe data imbalance, and minor differences in appearance between fractured and healthy vertebrae cause naive classification approaches to yield poor discriminatory performance. Addressing this, we propose a representation learning-inspired approach for automated vertebral fracture detection, aimed at learning latent representations efficient for fracture detection. Building on state-of-the-art metric losses, we present a novel Grading Loss for learning representations that respect Genant's fracture grading scheme. On a publicly available spine dataset, the proposed loss function achieves a fracture detection F1 score of 81.5%, a 10% increase over a naive classification baseline.

Malek Husseini, Anjany Sekuboyina, Maximilian Loeffler, Fernando Navarro, Bjoern H. Menze, Jan S. Kirschke
3D Convolutional Sequence to Sequence Model for Vertebral Compression Fractures Identification in CT

An osteoporosis-related fracture occurs every three seconds worldwide, affecting one in three women and one in five men aged over 50. The early detection of at-risk patients facilitates effective and well-evidenced preventative interventions, reducing the incidence of major osteoporotic fractures. In this study we present an automatic system for the identification of vertebral compression fractures on Computed Tomography images, which are often an undiagnosed precursor to major osteoporosis-related fractures. The system integrates a compact 3D representation of the spine, utilizing a Convolutional Neural Network (CNN) for spinal cord detection and a novel end-to-end sequence-to-sequence 3D architecture. We evaluate several model variants that exploit different representation and classification approaches, and present a framework combining an ensemble of models that achieves state-of-the-art results, validated on a large dataset, with patient-level fracture identification of 0.955 Area Under the Curve (AUC). The proposed system has the potential to support osteoporosis clinical management, improve treatment pathways, and change the course of one of the most burdensome diseases of our generation.

David Chettrit, Tomer Meir, Hila Lebel, Mila Orlovsky, Ronen Gordon, Ayelet Akselrod-Ballin, Amir Bar
SIMBA: Specific Identity Markers for Bone Age Assessment

Bone Age Assessment (BAA) is a task performed by radiologists to diagnose abnormal growth in a child. In manual approaches, radiologists take into account different identity markers when calculating bone age, i.e., chronological age and gender. However, the current automated Bone Age Assessment methods do not completely exploit the information present in the patient’s metadata. With this lack of available methods as motivation, we present SIMBA: Specific Identity Markers for Bone Age Assessment. SIMBA is a novel approach for the task of BAA based on the use of identity markers. For this purpose, we build upon the state-of-the-art model, fusing the information present in the identity markers with the visual features created from the original hand radiograph. We then use this robust representation to estimate the patient’s relative bone age: the difference between chronological age and bone age. We validate SIMBA on the Radiological Hand Pose Estimation dataset and find that it outperforms previous state-of-the-art methods. SIMBA sets a trend of a new wave of Computer-aided Diagnosis methods that incorporate all of the data that is available regarding a patient. To promote further research in this area and ensure reproducibility we will provide the source code as well as the pre-trained models of SIMBA.

Cristina González, María Escobar, Laura Daza, Felipe Torres, Gustavo Triana, Pablo Arbeláez
Doctor Imitator: A Graph-Based Bone Age Assessment Framework Using Hand Radiographs

Bone age assessment is challenging in clinical practice due to the complicated assessment process. Current automatic bone age assessment methods were designed with little consideration of diagnostic logistics and thus may yield uninterpretable hidden states and outputs. Consequently, doctors can find it hard to cooperate with such models harmoniously because it is difficult to check the correctness of the model predictions. In this work, we propose a new graph-based deep learning framework for bone age assessment with hand radiographs, called Doctor Imitator (DI). The architecture of DI is designed to learn the diagnostic logistics of doctors using scoring methods (e.g., the Tanner-Whitehouse method) for bone age assessment. Specifically, the convolutions of DI capture the local features of the anatomical regions of interest (ROIs) on hand radiographs and predict the ROI scores with our proposed Anatomy-based Group Convolution, which are summed up for bone age prediction. Besides, we develop a novel Dual Graph-based Attention module to compute patient-specific attention for ROI features and context attention for ROI scores. To our knowledge, DI is the first automatic bone age assessment framework to follow the scoring methods without fully supervised hand radiographs. Experiments on hand radiographs with only bone age supervision verify that DI achieves excellent performance with sparse parameters and provides better interpretability.

Jintai Chen, Bohan Yu, Biwen Lei, Ruiwei Feng, Danny Z. Chen, Jian Wu
Inferring the 3D Standing Spine Posture from 2D Radiographs

The treatment of degenerative spinal disorders requires an understanding of the individual spinal anatomy and curvature in 3D. An upright spinal pose (i.e. standing) under natural weight bearing is crucial for such bio-mechanical analysis. 3D volumetric imaging modalities (e.g. CT and MRI) are performed on patients lying down. On the other hand, radiographs are captured in an upright pose, but result in 2D projections. This work aims to integrate the two realms, i.e. it combines the upright spinal curvature from radiographs with the 3D vertebral shape from CT imaging to synthesize an upright 3D model of the spine, loaded naturally. Specifically, we propose a novel neural network architecture working vertebra-wise, termed TransVert, which takes orthogonal 2D radiographs and infers the spine's 3D posture. We validate our architecture on digitally reconstructed radiographs, achieving a 3D reconstruction Dice of 95.52%, indicating an almost perfect 2D-to-3D domain translation. Deploying our model on clinical radiographs, we successfully synthesise full-3D, upright, patient-specific spine models for the first time.

Amirhossein Bayat, Anjany Sekuboyina, Johannes C. Paetzold, Christian Payer, Darko Stern, Martin Urschler, Jan S. Kirschke, Bjoern H. Menze
Generative Modelling of 3D In-Silico Spongiosa with Controllable Micro-structural Parameters

Research in vertebral bone micro-structure generally requires costly procedures to obtain physical scans of real bone with the specific pathology under study, since no methods are yet available to generate realistic bone structures in-silico. Here we propose to apply recent advances in generative adversarial networks (GANs) to develop such a method. We adapted style-transfer techniques, which have been largely used in other contexts, in order to transfer style between image pairs while preserving informational content. In a first step, we trained a volumetric generative model in a progressive manner using a Wasserstein objective and gradient penalty (PWGAN-GP) to create patches of realistic bone structure in-silico. The training set contained 7660 purely spongeous bone samples from twelve human vertebrae (T12 or L1) with an isotropic resolution of 164 μm, scanned with a high resolution peripheral quantitative CT (Scanco XCT). After training, we generated new samples with tailored micro-structure properties by optimizing a vector z in the learned latent space. To solve this optimization problem, we formulated a differentiable goal function that leads to valid samples while compromising between the appearance (content) and target 3D properties (style). Properties of the learned latent space effectively matched the data distribution. Furthermore, we were able to simulate the resulting bone structure after deterioration or treatment effects of osteoporosis therapies based only on expected changes of micro-structural parameters. Our method allows the generation of a virtually infinite number of patches of realistic bone micro-structure, and thereby likely serves the development of bone biomarkers and the advance simulation of bone therapies.
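
The latent-space optimisation step can be sketched as gradient descent on z, with a style term matching target micro-structural parameters and a content/prior term keeping z plausible. Everything below (the tiny generator, the descriptor, the weights) is a toy stand-in, not the trained PWGAN-GP:

```python
import torch

# Toy generator and a differentiable micro-structure descriptor (here,
# mean occupancy standing in for bone volume fraction).
generator = torch.nn.Sequential(torch.nn.Linear(64, 32**3), torch.nn.Sigmoid())
measure = lambda patch: patch.mean()

z = torch.randn(1, 64, requires_grad=True)
target = torch.tensor(0.25)                 # desired bone volume fraction
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    patch = generator(z)
    style = (measure(patch) - target) ** 2  # match target 3D properties
    content = 1e-3 * z.pow(2).mean()        # stay close to the latent prior
    (style + content).backward()
    opt.step()
```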

Emmanuel Iarussi, Felix Thomsen, Claudio Delrieux
GAN-Based Realistic Bone Ultrasound Image and Label Synthesis for Improved Segmentation

Ultrasound (US) has been investigated as a safe alternative imaging modality to intra-operative fluoroscopy for various computer assisted orthopedic surgery (CAOS) procedures. However, low signal-to-noise ratio, imaging artifacts and bone surfaces appearing several millimeters (mm) thick have hindered the widespread application of US in CAOS. To address these problems, research has focused on the development of accurate, robust and real-time bone segmentation methods. Most recently, methods based on deep learning have shown very promising results. However, the scarcity of bone US data introduces significant challenges when training deep learning models. In this work, we propose a computational method, based on a novel generative adversarial network (GAN) architecture, to (1) produce synthetic B-mode US images and (2) their corresponding segmented bone surface masks in real time. We show how a duality concept can be implemented for such tasks. Armed with two convolutional blocks, referred to as self-projection and self-attention blocks, our proposed GAN model synthesizes realistic B-mode bone US images and segmented bone masks. Quantitative and qualitative evaluation studies are performed on 1235 scans collected from 27 subjects using two different US machines to compare our model against state-of-the-art GANs for the task of bone surface segmentation using U-Net.

Ahmed Z. Alsinan, Charles Rule, Michael Vives, Vishal M. Patel, Ilker Hacihaliloglu
Robust Bone Shadow Segmentation from 2D Ultrasound Through Task Decomposition

Acoustic bone shadow information in ultrasound (US) is important when imaging bones in US-guided orthopedic procedures. In this work, an end-to-end deep learning-based method is proposed to segment the bone shadow region from US data. In particular, we decompose the bone shadow segmentation task into two subtasks: coarse bone shadow enhancement (BSE) and horizontal bone interval mask (HBIM) estimation. Outputs from the two subtasks are combined by a masking operation to generate the final bone shadow segmentation. To better leverage the mutual information between the tasks, our model features a shared encoder as a deep feature extractor for both subtasks and two multi-scale pyramid pooling decoders. Additionally, we propose a conditional shape discriminator to regularize the shape of the output segmentation map. The proposed method is validated on 814 in vivo US scans obtained from knee, femur, distal radius and tibia bones. Validation against expert annotation achieved statistically significant improvements in segmentation of bone shadow regions compared to the state-of-the-art method.

Puyang Wang, Michael Vives, Vishal M. Patel, Ilker Hacihaliloglu
Backmatter
Metadata
Title
Medical Image Computing and Computer Assisted Intervention – MICCAI 2020
Editors
Prof. Anne L. Martel
Purang Abolmaesumi
Danail Stoyanov
Diana Mateus
Maria A. Zuluaga
S. Kevin Zhou
Daniel Racoceanu
Prof. Leo Joskowicz
Copyright Year
2020
Electronic ISBN
978-3-030-59725-2
Print ISBN
978-3-030-59724-5
DOI
https://doi.org/10.1007/978-3-030-59725-2
