About this Book

The four-volume set LNCS 11070, 11071, 11072, and 11073 constitutes the refereed proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2018, held in Granada, Spain, in September 2018.

The 373 revised full papers presented were carefully reviewed and selected from 1068 submissions in a double-blind review process. The papers have been organized in the following topical sections:
Part I: Image Quality and Artefacts; Image Reconstruction Methods; Machine Learning in Medical Imaging; Statistical Analysis for Medical Imaging; Image Registration Methods.
Part II: Optical and Histology Applications: Optical Imaging Applications; Histology Applications; Microscopy Applications; Optical Coherence Tomography and Other Optical Imaging Applications. Cardiac, Chest and Abdominal Applications: Cardiac Imaging Applications; Colorectal, Kidney and Liver Imaging Applications; Lung Imaging Applications; Breast Imaging Applications; Other Abdominal Applications.
Part III: Diffusion Tensor Imaging and Functional MRI: Diffusion Tensor Imaging; Diffusion Weighted Imaging; Functional MRI; Human Connectome. Neuroimaging and Brain Segmentation Methods: Neuroimaging; Brain Segmentation Methods.
Part IV: Computer Assisted Intervention: Image Guided Interventions and Surgery; Surgical Planning, Simulation and Work Flow Analysis; Visualization and Augmented Reality. Image Segmentation Methods: General Image Segmentation Methods, Measures and Applications; Multi-Organ Segmentation; Abdominal Segmentation Methods; Cardiac Segmentation Methods; Chest, Lung and Spine Segmentation; Other Segmentation Applications.

Table of Contents

Frontmatter

Image Quality and Artefacts

Frontmatter

Conditional Generative Adversarial Networks for Metal Artifact Reduction in CT Images of the Ear

We propose an approach based on a conditional generative adversarial network (cGAN) for the reduction of metal artifacts (RMA) in computed tomography (CT) ear images of cochlear implant (CI) recipients. Our training set contains paired pre-implantation and post-implantation CTs of 90 ears. In the training phase, the cGAN learns a mapping from the artifact-affected CTs to the artifact-free CTs. In the inference phase, given new metal-artifact-affected CTs, the cGAN produces CTs in which the artifacts are removed. As a pre-processing step, we also propose a band-wise normalization method, which splits a CT image into three channels according to the intensity value of each voxel, and we show that this method improves the performance of the cGAN. We test our cGAN on post-implantation CTs of 74 ears, and the quality of the artifact-corrected images is evaluated quantitatively by comparing the segmentations of intra-cochlear anatomical structures, obtained with a previously published method, in the real pre-implantation and the artifact-corrected CTs. We show that the proposed method leads to an average surface error of 0.18 mm, which is about half of what could be achieved with a previously proposed technique.

Jianing Wang, Yiyuan Zhao, Jack H. Noble, Benoit M. Dawant
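The band-wise normalization pre-processing described above is straightforward to prototype. Below is a minimal NumPy sketch that splits a CT volume into three intensity-band channels and rescales each band to [0, 1]; the thresholds `t_low` and `t_high` are illustrative placeholders, not the values used by the authors.

```python
import numpy as np

def bandwise_normalize(ct, t_low=-500.0, t_high=500.0):
    """Split a CT volume into three intensity bands and rescale each band to
    [0, 1]; voxels below a band saturate to 0 and voxels above it to 1.
    The thresholds are illustrative placeholders, not the paper's values."""
    lo, hi = float(ct.min()), float(ct.max())
    edges = [(lo, t_low), (t_low, t_high), (t_high, hi)]
    channels = [(np.clip(ct, a, b) - a) / max(b - a, 1e-6) for a, b in edges]
    return np.stack(channels, axis=0).astype(np.float32)  # (3, D, H, W)

volume = np.random.uniform(-1000.0, 2000.0, size=(32, 64, 64))
print(bandwise_normalize(volume).shape)  # (3, 32, 64, 64)
```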

Neural Network Evolution Using Expedited Genetic Algorithm for Medical Image Denoising

Convolutional neural networks offer state-of-the-art performance for medical image denoising. However, their architectures are manually designed for different noise types. The realistic noise in medical images is usually mixed and complicated, and sometimes unknown, leading to challenges in creating effective denoising neural networks. In this paper, we present a Genetic Algorithm (GA)-based network evolution approach to search for the fittest genes to optimize network structures. We expedite the evolutionary process through an experience-based greedy exploration strategy and transfer learning. The experimental results on computed tomography perfusion (CTP) image denoising demonstrate the capability of the method to select the fittest genes for building high-performance networks, named EvoNets, and our results compare favorably with state-of-the-art methods.

Peng Liu, Yangjunyi Li, Mohammad D. El Basha, Ruogu Fang

Deep Convolutional Filtering for Spatio-Temporal Denoising and Artifact Removal in Arterial Spin Labelling MRI

Arterial spin labelling (ASL) is a noninvasive imaging modality, used in the clinic and in research, which can give quantitative measurements of perfusion in the brain and other organs. However, because the signal-to-noise ratio is inherently low and the ASL acquisition is particularly prone to corruption by artifact, image processing methods such as denoising and artifact filtering are vital for generating accurate measurements of perfusion. In this work, we present a new simultaneous approach to denoising and artifact removal, using a novel deep convolutional joint filter architecture to learn and exploit spatio-temporal properties of the ASL signal. We proceed to show, using data from 15 healthy subjects, that our approach achieves state-of-the-art performance in both denoising and artifact removal, improving peak signal-to-noise ratio by up to 50%. By allowing more accurate estimation of perfusion, even in challenging datasets, this technique offers an exciting new approach for ASL pipelines, and might be used both to improve individual images and to increase the power of research studies using ASL.

David Owen, Andrew Melbourne, Zach Eaton-Rosen, David L. Thomas, Neil Marlow, Jonathan Rohrer, Sébastien Ourselin

DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning

Arterial spin labeling (ASL) allows quantification of cerebral blood flow (CBF) by magnetic labeling of arterial blood water. ASL is increasingly used in clinical studies due to its noninvasiveness, repeatability and benefits in quantification. However, ASL suffers from an inherently low signal-to-noise ratio (SNR), requiring repeated measurements of control/spin-labeled (C/L) pairs to achieve a reasonable image quality, which in turn increases motion sensitivity. This leads to clinically prolonged scanning times, increasing the risk of motion artifacts. Thus, there is an immense need for advanced imaging and processing techniques in ASL. In this paper, we propose a novel deep learning based approach to improve the perfusion-weighted image quality obtained from a subset of all available pairwise C/L subtractions. Specifically, we train a deep fully convolutional network (FCN) to learn a mapping from a noisy perfusion-weighted image to its residual (subtraction) from the clean image. Additionally, we incorporate the CBF estimation model in the loss function during training, which enables the network to produce high quality images while simultaneously enforcing the CBF estimates to be as close as possible to the reference CBF values. Extensive experiments on synthetic and clinical ASL datasets demonstrate the effectiveness of our method in terms of improved ASL image quality, accurate CBF parameter estimation and considerably shorter computation time during testing.

Cagdas Ulas, Giles Tetteh, Stephan Kaczmarz, Christine Preibisch, Bjoern H. Menze
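The idea of incorporating the CBF estimation model into the training loss can be sketched as follows. The snippet uses the standard single-compartment (p)CASL quantification formula with typical literature constants and a hypothetical weighting `w`; it is a hedged illustration of a kinetic-model-aware loss, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def cbf_from_pwi(pwi, m0, pld=1.8, tau=1.8, t1b=1.65, alpha=0.85, lam=0.9):
    """Single-compartment (p)CASL quantification; constants are typical
    literature values, not necessarily those used in the paper."""
    scale = 6000.0 * lam * torch.exp(torch.tensor(pld / t1b))
    denom = 2.0 * alpha * t1b * (1.0 - torch.exp(torch.tensor(-tau / t1b)))
    return scale * pwi / (denom * m0.clamp(min=1e-6))

def kinetic_model_loss(pwi_pred, pwi_ref, m0, w=0.1):
    """Image-domain L1 term plus a CBF-consistency term; the weighting w is
    a hypothetical choice."""
    image_term = F.l1_loss(pwi_pred, pwi_ref)
    cbf_term = F.l1_loss(cbf_from_pwi(pwi_pred, m0), cbf_from_pwi(pwi_ref, m0))
    return image_term + w * cbf_term

pwi_pred = torch.rand(2, 1, 64, 64)
pwi_ref = torch.rand(2, 1, 64, 64)
m0 = torch.rand(2, 1, 64, 64) + 0.5
print(kinetic_model_loss(pwi_pred, pwi_ref, m0))
```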

Direct Estimation of Pharmacokinetic Parameters from DCE-MRI Using Deep CNN with Forward Physical Model Loss

Dynamic contrast-enhanced (DCE) MRI is an evolving imaging technique that provides a quantitative measure of pharmacokinetic (PK) parameters in body tissues, in which a series of $$T_1$$-weighted images are collected following the administration of a paramagnetic contrast agent. Unfortunately, in many applications, conventional clinical DCE-MRI suffers from low spatiotemporal resolution and insufficient volume coverage. In this paper, we propose a novel deep learning based approach to directly estimate the PK parameters from undersampled DCE-MRI data. Specifically, we design a custom loss function where we incorporate a forward physical model that relates the PK parameters to corrupted image-time series obtained due to subsampling in k-space. This allows the network to directly exploit the knowledge of true contrast agent kinetics in the training phase, and hence provide more accurate restoration of PK parameters. Experiments on clinical brain DCE datasets demonstrate the efficacy of our approach in terms of fidelity of PK parameter reconstruction and significantly faster parameter inference compared to a model-based iterative reconstruction method.

Cagdas Ulas, Giles Tetteh, Michael J. Thrippleton, Paul A. Armitage, Stephen D. Makin, Joanna M. Wardlaw, Mike E. Davies, Bjoern H. Menze

Short Acquisition Time PET/MR Pharmacokinetic Modelling Using CNNs

Standard quantification of Positron Emission Tomography (PET) data requires a long acquisition time to enable pharmacokinetic (PK) model fitting; however, blood flow information from Arterial Spin Labelling (ASL) Magnetic Resonance Imaging (MRI) can be combined with simultaneous dynamic PET data to reduce the acquisition time. Due to the difficulty of fitting a PK model to noisy PET data with limited time points, such ‘fixed-$$R_1$$’ techniques are constrained to a 30 min minimum acquisition, which is intolerable for many patients. In this work we apply a deep convolutional neural network (CNN) approach to combine the PET and MRI data. This permits shorter acquisition times as it avoids the noise-sensitive voxelwise PK modelling and facilitates the full modelling of the relationship between blood flow and the dynamic PET data. This method is compared to three fixed-$$R_1$$ PK methods, and the clinically used standardised uptake value ratio (SUVR), using 60 min dynamic PET PK modelling as the gold standard. Testing on 11 subjects participating in a study of pre-clinical Alzheimer’s Disease showed that, for 30 min acquisitions, all methods which combine the PET and MRI data have comparable performance; however, at shorter acquisition times the CNN approach has a significantly lower mean square error (MSE) compared to fixed-$$R_1$$ PK modelling ($$p=0.001$$). For both acquisition windows, SUVR had a significantly higher MSE than the CNN method ($$p\le 0.003$$). This demonstrates that combining simultaneous PET and MRI data using a CNN can result in robust PET quantification within a scan time which is tolerable to patients with dementia.

Catherine J. Scott, Jieqing Jiao, M. Jorge Cardoso, Kerstin Kläser, Andrew Melbourne, Pawel J. Markiewicz, Jonathan M. Schott, Brian F. Hutton, Sébastien Ourselin

Can Deep Learning Relax Endomicroscopy Hardware Miniaturization Requirements?

Confocal laser endomicroscopy (CLE) is a novel imaging modality that provides in vivo histological cross-sections of examined tissue. Recently, attempts have been made to develop miniaturized in vivo imaging devices, specifically confocal laser microscopes, for both clinical and research applications. However, current implementations of miniature CLE components such as confocal lenses compromise image resolution, signal-to-noise ratio, or both, which negatively impacts the utility of in vivo imaging. In this work, we demonstrate that software-based techniques can be used to recover lost information due to endomicroscopy hardware miniaturization and reconstruct images of higher resolution. Particularly, a densely connected convolutional neural network is used to reconstruct a high-resolution CLE image, given a low-resolution input. In the proposed network, each layer is directly connected to all subsequent layers, which results in an effective combination of low-level and high-level features and efficient information flow throughout the network. To train and evaluate our network, we use a dataset of 181 high-resolution CLE images. Both quantitative and qualitative results indicate superiority of the proposed network compared to traditional interpolation techniques and competing learning-based methods. This work demonstrates that software-based super-resolution is a viable approach to compensate for loss of resolution due to endoscopic hardware miniaturization.

Saeed Izadi, Kathleen P. Moriarty, Ghassan Hamarneh

A Framework to Objectively Identify Reference Regions for Normalizing Quantitative Imaging

The quantitative use of medical images often requires intensity scaling with respect to the signal from a well-characterized anatomical region of interest. The choice of such a region often varies between studies, which can substantially influence the quantification, resulting in study bias that hampers objective findings and is detrimental to open science. This study outlines a list of criteria and a statistical ranking approach for identifying a normalization region of interest. The proposed criteria include (i) associations between the reference region and demographics such as age, (ii) diagnostic group differences in the reference region, (iii) correlation between reference and primary areas of interest, (iv) local variance in the reference region, and (v) longitudinal reproducibility of the target regions when normalized. The proposed approach has been used to establish an optimal normalization region of interest for the analysis of Quantitative Susceptibility Mapping (QSM) of Magnetic Resonance Imaging (MRI). This was achieved by using cross-sectional data from 119 subjects with normal cognition, mild cognitive impairment, and Alzheimer’s disease, as well as 19 healthy elderly individuals with longitudinal data. For the QSM application, we found that normalizing by the white matter regions not only satisfies the criteria but also provides the best separation between clinical groups for deep brain nuclei target regions.

Amir Fazlollahi, Scott Ayton, Pierrick Bourgeat, Ibrahima Diouf, Parnesh Raniga, Jurgen Fripp, James Doecke, David Ames, Colin L. Masters, Christopher C. Rowe, Victor L. Villemagne, Ashley I. Bush, Olivier Salvado

Evaluation of Adjoint Methods in Photoacoustic Tomography with Under-Sampled Sensors

Photo-Acoustic Tomography (PAT) can reconstruct a distribution of optical absorbers acting as instantaneous sound sources in the subcutaneous microvasculature of a human breast. Adjoint methods for PAT, typically Time-Reversal (TR) and Back-Projection (BP), are ways to refocus time-reversed acoustic signals on sources by wave propagation from the position of sensors. TR and BP treat the received signals differently, but they are equivalent under continuous sampling on a closed circular sensor array in two dimensions. Here, we analyze image quality with discrete under-sampled sensors in the sense of the Shannon sampling theorem. We investigate the resolution and contrast of TR and BP, respectively, in a single source-sensor pair configuration and in the frequency domain. Using Hankel’s asymptotic expansion of the integrands of the imaging functions, our main contribution is to demonstrate that TR and BP have better performance on contrast and resolution, respectively. We also show that the integrand of TR includes additional side lobes which degrade axial resolution, whereas that of BP conversely has relatively small amplitudes. Moreover, omnidirectional resolution is improved if more sensors are employed to collect the received signals. Nevertheless, for under-sampled sensors, we propose the Truncated Back-Projection (TBP) method to enhance the contrast of BP by removing higher-frequency components in the received signals. We conduct numerical experiments on a two-dimensional projected phantom model extracted from the OA-Breast Database. The experiments verify our theories and show that the proposed TBP possesses better omnidirectional resolution as well as contrast compared with TR and BP with under-sampled sensors.

Hongxiang Lin, Takashi Azuma, Mehmet Burcin Unlu, Shu Takagi

A No-Reference Quality Metric for Retinal Vessel Tree Segmentation

Due to inevitable differences between the data used for training modern CAD systems and the data encountered when they are deployed in clinical scenarios, the ability to automatically assess the quality of predictions when no expert annotation is available can be critical. In this paper, we propose a new method for quality assessment of retinal vessel tree segmentations in the absence of a reference ground-truth. For this, we artificially degrade expert-annotated vessel map segmentations and then train a CNN to predict the similarity between the degraded images and their corresponding ground-truths. This similarity can be interpreted as a proxy to the quality of a segmentation. The proposed model can produce a visually meaningful quality score, effectively predicting the quality of a vessel tree segmentation in the absence of a manually segmented reference. We further demonstrate the usefulness of our approach by applying it to automatically find a threshold for soft probabilistic segmentations on a per-image basis. For an independent state-of-the-art unsupervised vessel segmentation technique, the thresholds selected by our approach lead to statistically significant improvements in F1-score ($$+2.67\%$$) and Matthews Correlation Coefficient ($$+3.11\%$$) over the thresholds derived from ROC analysis on the training set. The score is also shown to correlate strongly with F1 and MCC when a reference is available.

Adrian Galdran, Pedro Costa, Alessandro Bria, Teresa Araújo, Ana Maria Mendonça, Aurélio Campilho
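The training-data construction (degrade expert vessel maps, then regress the similarity to the original) can be mimicked with a few lines of NumPy/SciPy. The degradations and the Dice target below are illustrative assumptions; the paper's exact corruption model and similarity measure may differ.

```python
import numpy as np
from scipy import ndimage

def degrade_segmentation(seg, rng, max_iter=2, drop_prob=0.3):
    """Randomly erode or dilate a binary vessel map and drop random connected
    components, mimicking a corrupted segmentation (illustrative degradations)."""
    out = seg.astype(bool)
    n = int(rng.integers(0, max_iter + 1))
    op = ndimage.binary_erosion if rng.random() < 0.5 else ndimage.binary_dilation
    if n:
        out = op(out, iterations=n)
    labels, k = ndimage.label(out)
    for i in range(1, k + 1):                 # randomly remove vessel pieces
        if rng.random() < drop_prob:
            out[labels == i] = False
    return out.astype(np.uint8)

def dice(a, b, eps=1e-6):
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)

rng = np.random.default_rng(0)
gt = np.zeros((64, 64), np.uint8)
gt[20:44, 30:33] = 1                          # toy "vessel"
degraded = degrade_segmentation(gt, rng)
quality_target = dice(degraded, gt)           # regression target for the CNN
print(round(float(quality_target), 3))
```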

Efficient and Accurate MRI Super-Resolution Using a Generative Adversarial Network and 3D Multi-level Densely Connected Network

High-resolution (HR) magnetic resonance images (MRI) provide detailed anatomical information important for clinical application and quantitative image analysis. However, HR MRI conventionally comes at the cost of longer scan time, smaller spatial coverage, and lower signal-to-noise ratio (SNR). Recent studies have shown that single image super-resolution (SISR), a technique to recover HR details from one single low-resolution (LR) input image, can provide high quality image details with the help of advanced deep convolutional neural networks (CNN). However, deep neural networks consume memory heavily and run slowly, especially in 3D settings. In this paper, we propose a novel 3D neural network design, namely a multi-level densely connected super-resolution network (mDCSRN) with generative adversarial network (GAN)–guided training. The mDCSRN trains and infers quickly, and the GAN promotes realistic output that is hardly distinguishable from original HR images. Our results from experiments on a dataset with 1,113 subjects show that our new architecture outperforms other popular deep learning methods in recovering 4x resolution-downgraded images and runs 6x faster.

Yuhua Chen, Feng Shi, Anthony G. Christodoulou, Yibin Xie, Zhengwei Zhou, Debiao Li

A Deep Learning Based Anti-aliasing Self Super-Resolution Algorithm for MRI

High resolution magnetic resonance (MR) images are desired in many clinical applications, yet acquiring such data with an adequate signal-to-noise ratio requires a long time, making them costly and susceptible to motion artifacts. A common way to partly achieve this goal is to acquire MR images with good in-plane resolution and poor through-plane resolution (i.e., large slice thickness). For such 2D imaging protocols, aliasing is also introduced in the through-plane direction, and these high-frequency artifacts cannot be removed by conventional interpolation. Super-resolution (SR) algorithms which can reduce aliasing artifacts and improve spatial resolution have previously been reported. State-of-the-art SR methods are mostly learning-based and require external training data consisting of paired low resolution (LR) and high resolution (HR) MR images. However, due to scanner limitations, such training data are often unavailable. This paper presents an anti-aliasing (AA) and self super-resolution (SSR) algorithm that needs no external training data. It takes advantage of the fact that the in-plane slices of those MR images contain high frequency information. Our algorithm consists of three steps: (1) we build a self AA (SAA) deep network followed by (2) an SSR deep network, both of which can be applied along different orientations within the original images, and (3) we recombine the outputs of Steps 1 and 2 from the multiple orientations using Fourier burst accumulation. We perform our SAA+SSR algorithm on a diverse collection of MR data without modification or preprocessing other than N4 inhomogeneity correction, and demonstrate significant improvement compared to competing SSR methods.

Can Zhao, Aaron Carass, Blake E. Dewey, Jonghye Woo, Jiwon Oh, Peter A. Calabresi, Daniel S. Reich, Pascal Sati, Dzung L. Pham, Jerry L. Prince
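Step (3), Fourier burst accumulation, combines the per-orientation outputs by weighting each Fourier coefficient by its relative magnitude. A small NumPy sketch is given below; the exponent `p` is a tunable assumption rather than the paper's setting.

```python
import numpy as np

def fourier_burst_accumulation(volumes, p=2.0):
    """Fuse several reconstructions of the same volume by weighting each
    Fourier coefficient by its relative magnitude; p controls how strongly
    the strongest coefficient dominates."""
    specs = [np.fft.fftn(v) for v in volumes]
    mags = np.stack([np.abs(s) ** p for s in specs], axis=0)
    weights = mags / (mags.sum(axis=0, keepdims=True) + 1e-12)
    combined = sum(w * s for w, s in zip(weights, specs))
    return np.real(np.fft.ifftn(combined))

# e.g. SAA+SSR outputs obtained along two different slice orientations
v1 = np.random.rand(16, 32, 32)
v2 = np.random.rand(16, 32, 32)
print(fourier_burst_accumulation([v1, v2]).shape)  # (16, 32, 32)
```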

Gradient Profile Based Super Resolution of MR Images with Induced Sparsity

The trade-off between resolution and signal-to-noise ratio (SNR) of magnetic resonance (MR) images can be improved by post-processing algorithms that provide the high-quality MR images required for medical diagnosis. This paper proposes a constraint to sharpen the gradient profile (GP), which typically characterizes image quality, of super-resolved MR images in a sparse-representation-based super-resolution framework, without any external LR (low-resolution)-HR (high-resolution) image pairs. This is achieved by establishing a piecewise linear relation between the GP of the LR image up-scaled by bicubic interpolation (UR) and that of the corresponding LR image. The resulting relationship is used to approximate the ground-truth HR image such that the GP of the upsampled LR image is improved. Further, to preserve details and their consistency across the coronal, sagittal and axial planes, we learn multiple dictionaries by extracting patches from the same and adjacent slices. The experimental results demonstrate that the proposed approach qualitatively and quantitatively outperforms existing algorithms for increasing the resolution of MR images.

Prabhjot Kaur, Anil Kumar Sao

Deeper Image Quality Transfer: Training Low-Memory Neural Networks for 3D Images

In this paper we address the memory demands that come with the processing of 3-dimensional, high-resolution, multi-channeled medical images in deep learning. We exploit memory-efficient backpropagation techniques, to reduce the memory complexity of network training from being linear in the network’s depth, to being roughly constant – permitting us to elongate deep architectures with negligible memory increase. We evaluate our methodology in the paradigm of Image Quality Transfer, whilst noting its potential application to various tasks that use deep learning. We study the impact of depth on accuracy and show that deeper models have more predictive power, which may exploit larger training sets. We obtain substantially better results than the previous state-of-the-art model with a slight memory increase, reducing the root-mean-squared-error by 13%. Our code is publicly available.

Stefano B. Blumberg, Ryutaro Tanno, Iasonas Kokkinos, Daniel C. Alexander
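The memory-efficient backpropagation exploited here is closely related to activation checkpointing, which recomputes intermediate activations during the backward pass instead of storing them. A minimal PyTorch illustration using `torch.utils.checkpoint.checkpoint_sequential` is given below; it is not the authors' implementation, only a sketch of the underlying memory/compute trade-off.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deliberately deep 3D CNN: with checkpointing, only segment-boundary
# activations are stored in the forward pass and the rest are recomputed
# during the backward pass, trading extra compute for lower memory.
blocks = [nn.Sequential(nn.Conv3d(16, 16, 3, padding=1), nn.ReLU())
          for _ in range(40)]
net = nn.Sequential(*blocks)

x = torch.randn(1, 16, 24, 24, 24, requires_grad=True)
segments = 8                      # number of checkpointed segments (a tuning choice)
y = checkpoint_sequential(net, segments, x)
y.mean().backward()
print(x.grad.shape)               # gradients flow as in ordinary backprop
```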

High Frame-Rate Cardiac Ultrasound Imaging with Deep Learning

Cardiac ultrasound imaging requires a high frame rate in order to capture rapid motion. This can be achieved by multi-line acquisition (MLA), where several narrow-focused received lines are obtained from each wide-focused transmitted line. This shortens the acquisition time at the expense of introducing block artifacts. In this paper, we propose a data-driven learning-based approach to improve the MLA image quality. We train an end-to-end convolutional neural network on pairs of real ultrasound cardiac data, acquired through MLA and the corresponding single-line acquisition (SLA). The network achieves a significant improvement in image quality for both 5- and 7-line MLA resulting in a decorrelation measure similar to that of SLA while having the frame rate of MLA.

Ortal Senouf, Sanketh Vedula, Grigoriy Zurakhov, Alex Bronstein, Michael Zibulevsky, Oleg Michailovich, Dan Adam, David Blondheim

Image Reconstruction Methods

Frontmatter

Phase-Sensitive Region-of-Interest Computed Tomography

X-Ray Phase-Contrast Imaging (PCI) yields absorption, differential phase, and dark-field images. Computed Tomography (CT) of grating-based PCI can in principle provide high-resolution soft-tissue contrast. Recently, grating-based PCI took several hurdles towards clinical implementation by addressing, for example, acquisition speed, high X-ray energies, and system vibrations. However, a critical impediment in all grating-based systems lies in limits that constrain the grating diameter to a few centimeters. In this work, we propose a system and a reconstruction algorithm to circumvent this constraint in a clinically compatible way. We propose to perform a phase-sensitive Region-of-Interest (ROI) CT within a full-field absorption CT. The biggest advantage of this approach is that it allows correction of phase truncation artifacts and yields quantitative phase values. Our method is robust, and shows high-quality results on simulated data and on a biological mouse sample. This work is a proof of concept showing the potential to use PCI in CT on large specimens, such as humans, in clinical applications.

Lina Felsner, Martin Berger, Sebastian Kaeppler, Johannes Bopp, Veronika Ludwig, Thomas Weber, Georg Pelzer, Thilo Michel, Andreas Maier, Gisela Anton, Christian Riess

Some Investigations on Robustness of Deep Learning in Limited Angle Tomography

In computed tomography, image reconstruction from an insufficient angular range of projection data is called limited angle tomography. Due to missing data, reconstructed images suffer from artifacts, which cause boundary distortion, edge blurring, and intensity biases. Recently, deep learning methods have been applied very successfully to this problem in simulation studies. However, the robustness of neural networks for clinical applications is still a concern. It is reported that most neural networks are vulnerable to adversarial examples. In this paper, we aim to investigate whether some perturbations or noise will mislead a neural network to fail to detect an existing lesion. Our experiments demonstrate that the trained neural network, specifically the U-Net, is sensitive to Poisson noise. While the observed images appear artifact-free, anatomical structures may be located at wrong positions, e.g. the skin shifted by up to 1 cm. This kind of behavior can be reduced by retraining on data with simulated Poisson noise. However, we demonstrate that the retrained U-Net model is still susceptible to adversarial examples. We conclude the paper with suggestions towards robust deep-learning-based reconstruction.

Yixing Huang, Tobias Würfl, Katharina Breininger, Ling Liu, Günter Lauritsch, Andreas Maier
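Retraining on data with simulated Poisson noise, as suggested above, only requires a standard photon-count noise model on the line-integral projections. A short NumPy sketch follows; the incident photon count `i0` is an assumed dose level, not the paper's setting.

```python
import numpy as np

def add_poisson_noise(sinogram, i0=1e5, rng=None):
    """Simulate photon-count (Poisson) noise on line-integral projection data.
    i0 is the unattenuated photon count per detector bin (an assumption)."""
    rng = rng or np.random.default_rng()
    counts = rng.poisson(i0 * np.exp(-sinogram)).astype(np.float64)
    counts = np.clip(counts, 1.0, None)        # avoid log(0)
    return -np.log(counts / i0)

clean = np.random.uniform(0.0, 4.0, size=(360, 512))   # toy sinogram of line integrals
noisy = add_poisson_noise(clean, i0=5e4)
print(float(np.abs(noisy - clean).mean()))
```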

Adversarial Sparse-View CBCT Artifact Reduction

We present an effective post-processing method to reduce the artifacts from sparsely reconstructed cone-beam CT (CBCT) images. The proposed method is based on state-of-the-art image-to-image generative models with a perceptual loss as regularization. Unlike the traditional CT artifact-reduction approaches, our method is trained in an adversarial fashion that yields more perceptually realistic outputs while preserving the anatomical structures. To address the streak artifacts that are inherently local and appear across various scales, we further propose a novel discriminator architecture based on feature pyramid networks and a differentially modulated focus map to induce the adversarial training. Our experimental results show that the proposed method can greatly correct the cone-beam artifacts from clinical CBCT images reconstructed using 1/3 projections, and outperforms strong baseline methods both quantitatively and qualitatively.

Haofu Liao, Zhimin Huo, William J. Sehnert, Shaohua Kevin Zhou, Jiebo Luo

Nasal Mesh Unfolding – An Approach to Obtaining 2-D Skin Templates from 3-D Nose Models

Nasal reconstruction requires a 2-D template representing the skin area to be taken from the donor site of the patient. We propose a new framework for template design, called nasal mesh unfolding, to obtain 2-D skin templates from 3-D nose models. The proposed nasal mesh unfolding framework takes as input a target digital nose model represented by a 3-D triangle mesh and unfolds the nasal mesh under structure constraints using semidefinite programming. The solution of the unfolding problem is in the form of a Gram matrix from which the 2-D representation of the 3-D model, or embedding, is extracted. The embedding defines a digital template representing the skin requirement for nasal reconstruction, which can in turn be used to produce a physical 2-D template to apply on the donor site for guiding skin incision. Experiments on synthetic data demonstrate the effectiveness of the proposed unfolding approach, and results on real data show the feasibility of generating physical 2-D skin templates from 3-D nose meshes. The proposed approach efficiently converts 3-D nose models to digital 2-D skin templates for fast, easy, and accurate preparation of physical templates and can be useful for other plastic surgery tasks.

Hongying Li, Marc Robini, Zhongwei Zhou, Wei Tang, Yuemin Zhu

Towards Generating Personalized Volumetric Phantom from Patient’s Surface Geometry

This paper presents a method to generate a volumetric phantom with internal anatomical structures from the patient’s skin surface geometry, and studies the potential impact of this technology on planning medical scans and procedures such as patient positioning. Existing scan planning for imaging is either done by visual inspection of the patient or based on an ionizing scan obtained prior to the full scan. These methods are either limited in accuracy or result in additional radiation dose to the patient. Our approach generates a “CT”-like phantom, with lungs and bone structures, from the patient’s skin surface. The skin surface can be estimated from a 2.5D depth sensor and thus, the proposed method offers a novel solution to reduce the radiation dose. We present quantitative experiments on a dataset of 2045 whole body CT scans and report measurements relevant to the potential clinical use of such phantoms. (This feature is based on research, and is not commercially available. Due to regulatory reasons its future availability cannot be guaranteed.)

Yifan Wu, Vivek Singh, Brian Teixeira, Kai Ma, Birgi Tamersoy, Andreas Krauss, Terrence Chen

Multi-channel Generative Adversarial Network for Parallel Magnetic Resonance Image Reconstruction in K-space

Magnetic Resonance Imaging (MRI) typically collects data below the Nyquist sampling rate for imaging acceleration. To remove aliasing artifacts, we propose a multi-channel deep generative adversarial network (GAN) model for MRI reconstruction. Because multi-channel GAN matches the parallel data acquisition system architecture on a modern MRI scanner, this model can effectively learn intrinsic data correlation associated with MRI hardware from originally-collected multi-channel complex data. By estimating missing data directly with the trained network, images may be generated from undersampled multi-channel raw data, providing an “end-to-end” approach to parallel MRI reconstruction. By experimentally comparing with other methods, it is demonstrated that multi-channel GAN can perform image reconstruction with an affordable computation cost and an imaging acceleration factor higher than the current clinical standard.

Pengyue Zhang, Fusheng Wang, Wei Xu, Yu Li

A Learning-Based Metal Artifacts Correction Method for MRI Using Dual-Polarity Readout Gradients and Simulated Data

In MRI, metallic implants can generate magnetic field distortions and interfere with the spatial encoding of gradient magnetic fields. This results in image distortions, such as bulk shifts, pile-up and signal-loss artifacts. Three-dimensional spectral imaging methods can reduce the bulk shifts to a single-voxel level, but they still suffer from residual artifacts such as pile-up and signal-loss artifacts. Fully phase encoding methods suppress metal-induced artifacts, but they require impractically long imaging times. In this paper, we applied a deep learning method to correct metal artifacts. A neural network is proposed to map two distorted images obtained by dual-polarity readout gradients into a distortion-free image obtained by fully phase encoding. Simulated data were utilized to supplement and substitute for real MR data for training the proposed network. Phantom experiments were performed to compare the quality of reconstructed images from several methods at high and low readout bandwidths.

Kinam Kwon, Dongchan Kim, HyunWook Park

Motion Aware MR Imaging via Spatial Core Correspondence

Motion awareness in MR imaging is essential when it comes to long acquisition times. For volumetric high-resolution or temporally resolved images, sporadic subject movements or respiration-induced organ motion have to be considered in order to reduce motion artifacts. We present a novel MR imaging sequence and an associated retrospective reconstruction method incorporating motion via spatial correspondence of the k-space center. The sequence alternatingly samples k-space patches located in the center and in peripheral higher-frequency regions. Each patch is transformed into the spatial domain in order to normalize for spatial transformations, both rigid and non-rigid. The k-space is reconstructed from the spatially aligned patches, where the alignment is derived using image registration of the center patches. Our proposed method neither assumes periodic motion nor requires any binning of motion states to properly compensate for movements during acquisition. As we directly acquire volumes, 2D slice stacking is avoided. We tested our method for brain imaging with sporadic head motion and for chest imaging where a volunteer was scanned under free breathing. In both cases, we demonstrate high-quality 3D reconstructions.

Christoph Jud, Damien Nguyen, Robin Sandkühler, Alina Giger, Oliver Bieri, Philippe C. Cattin

Nonparametric Density Flows for MRI Intensity Normalisation

With the adoption of powerful machine learning methods in medical image analysis, it is becoming increasingly desirable to aggregate data that is acquired across multiple sites. However, the underlying assumption of many analysis techniques that corresponding tissues have consistent intensities in all images is often violated in multi-centre databases. We introduce a novel intensity normalisation scheme based on density matching, wherein the histograms are modelled as Dirichlet process Gaussian mixtures. The source mixture model is transformed to minimise its $$L^2$$ divergence towards a target model, then the voxel intensities are transported through a mass-conserving flow to maintain agreement with the moving density. In a multi-centre study with brain MRI data, we show that the proposed technique produces excellent correspondence between the matched densities and histograms. We further demonstrate that our method makes tissue intensity statistics substantially more compatible between images than a baseline affine transformation and is comparable to state-of-the-art while providing considerably smoother transformations. Finally, we validate that nonlinear intensity normalisation is a step toward effective imaging data harmonisation.

Daniel C. Castro, Ben Glocker

Ultra-Fast T2-Weighted MR Reconstruction Using Complementary T1-Weighted Information

T1-weighted image (T1WI) and T2-weighted image (T2WI) are the two routinely acquired Magnetic Resonance Imaging (MRI) protocols that provide complementary information for diagnosis. However, the total acquisition time of ~10 min leaves the image quality vulnerable to artifacts such as motion. To speed up the MRI process, various algorithms have been proposed to reconstruct high quality images from under-sampled k-space data. These algorithms only employ the information of an individual protocol (e.g., T2WI). In this paper, we propose to combine complementary MRI protocols (i.e., T1WI and under-sampled T2WI particularly) to reconstruct the high-quality image (i.e., fully-sampled T2WI). To the best of our knowledge, this is the first work to utilize data from different MRI protocols to speed up the reconstruction of a target sequence. Specifically, we present a novel deep learning approach, namely Dense-Unet, to accomplish the reconstruction task. The Dense-Unet requires fewer parameters and less computation, but achieves better performance. Our results show that Dense-Unet can reconstruct a 3D T2WI volume in less than 10 s, i.e., with an acceleration rate as high as 8 or more but with negligible aliasing artefacts and signal-to-noise ratio (SNR) loss.

Lei Xiang, Yong Chen, Weitang Chang, Yiqiang Zhan, Weili Lin, Qian Wang, Dinggang Shen

Image Reconstruction by Splitting Deep Learning Regularization from Iterative Inversion

Image reconstruction from downsampled and corrupted measurements, such as fast MRI and low dose CT, is a mathematically ill-posed inverse problem. In this work, we propose a general and easy-to-use reconstruction method based on deep learning techniques. In order to address the intractable inversion of general inverse problems, we propose to train a network to refine intermediate images from a classical reconstruction procedure towards the ground truth, i.e. the intermediate images that satisfy the data consistency are fed into chosen denoising networks or generative networks for denoising and artifact removal in each iterative stage. The proposed approach involves only techniques of conventional image reconstruction and usual image representation/denoising deep network learning, without specifically designed and complicated network structures for a particular physical forward operator. Extensive experiments on MRI reconstruction with both stacked auto-encoder networks and generative adversarial nets demonstrate the efficiency and accuracy of the proposed method compared with other image reconstruction algorithms.

Jiulong Liu, Tao Kuang, Xiaoqun Zhang

Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction

Deep learning approaches have shown promising performance for compressed sensing-based Magnetic Resonance Imaging. While deep neural networks trained with mean squared error (MSE) loss functions can achieve high peak signal-to-noise ratio, the reconstructed images are often blurry and lack sharp details, especially for higher undersampling rates. Recently, adversarial and perceptual loss functions have been shown to achieve more visually appealing results. However, it remains an open question how to (1) optimally combine these loss functions with the MSE loss function and (2) evaluate such a perceptual enhancement. In this work, we propose a hybrid method, in which a visual refinement component is learnt on top of an MSE loss-based reconstruction network. In addition, we introduce a semantic interpretability score, measuring the visibility of the region of interest in both ground truth and reconstructed images, which allows us to objectively quantify the usefulness of the image quality for image post-processing and analysis. Applied on a large cardiac MRI dataset simulated with 8-fold undersampling, we demonstrate significant improvements ($$p<0.01$$) over the state-of-the-art in both a human observer study and the semantic interpretability score.

Maximilian Seitzer, Guang Yang, Jo Schlemper, Ozan Oktay, Tobias Würfl, Vincent Christlein, Tom Wong, Raad Mohiaddin, David Firmin, Jennifer Keegan, Daniel Rueckert, Andreas Maier

Translation of 1D Inverse Fourier Transform of K-space to an Image Based on Deep Learning for Accelerating Magnetic Resonance Imaging

To reconstruct magnetic resonance (MR) images from undersampled Cartesian k-space data, we propose an algorithm based on two deep-learning architectures: (1) a multi-layer perceptron (MLP) that estimates a target image from the 1D inverse Fourier transform (IFT) of k-space; and (2) a convolutional neural network (CNN) that estimates the target image from the estimated image of the MLP. The MLP learns the relationship between the 1D IFT of undersampled k-space, transformed along the frequency-encoding direction, and the target fully-sampled image. The MLP is trained line by line rather than on a whole image, because the frequency-encoding lines of the 1D IFT of k-space are not correlated with each other. This dramatically decreases the number of parameters to be learned, because the number of input/output pixels decreases from $$N^2$$ to $$N$$. The subsequent CNN learns the relationship between the estimated image of the MLP and the target fully-sampled image to reduce remaining artifacts in the image domain. The proposed deep-learning algorithm (i.e., the combination of the MLP and the CNN) exhibited superior performance over a single MLP and a single CNN. It also outperformed the comparison algorithms including CS-MRI, DL-MRI, a CNN-based algorithm (denoted as Wang’s algorithm), PANO, and FDLCP in both qualitative and quantitative evaluation. Consequently, the proposed algorithm is applicable up to a sampling ratio of 25% in Cartesian k-space.

Taejoon Eo, Hyungseob Shin, Taeseong Kim, Yohan Jun, Dosik Hwang
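The first stage of this pipeline, taking the 1D IFT of k-space along the frequency-encoding direction and mapping each resulting line with an MLP, can be sketched in a few lines of PyTorch. The matrix size, hidden width, and shift convention below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

N = 256                                       # matrix size (illustrative)

def hybrid_space(kspace):
    """1D inverse FFT along the frequency-encoding axis only (last dim),
    leaving the phase-encoding axis untouched."""
    return torch.fft.ifft(torch.fft.ifftshift(kspace, dim=-1), dim=-1)

# Line-by-line MLP: input is one frequency-encoding line (real and imaginary
# parts concatenated), output is the corresponding image-domain line, so the
# layer sizes scale with N rather than N^2.
mlp = nn.Sequential(nn.Linear(2 * N, 4 * N), nn.ReLU(), nn.Linear(4 * N, N))

kspace = torch.randn(N, N, dtype=torch.complex64)       # undersampled k-space
lines = hybrid_space(kspace)                             # (N_pe, N_fe)
feats = torch.cat([lines.real, lines.imag], dim=-1)      # (N_pe, 2N)
image_lines = mlp(feats)                                  # each line mapped independently
print(image_lines.shape)                                  # torch.Size([256, 256])
```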

Deep Learning Using K-Space Based Data Augmentation for Automated Cardiac MR Motion Artefact Detection

Quality assessment of medical images is essential for complete automation of image processing pipelines. For large population studies such as the UK Biobank, artefacts such as those caused by heart motion are problematic, and manual identification is tedious and time-consuming. Therefore, there is an urgent need for automatic image quality assessment techniques. In this paper, we propose a method to automatically detect the presence of motion-related artefacts in cardiac magnetic resonance (CMR) images. As this is a highly imbalanced classification problem (due to the large number of good quality images compared to the small number of images with motion artefacts), we propose a novel k-space based training data augmentation approach to address this problem. Our method is based on 3D spatio-temporal Convolutional Neural Networks and is able to detect 2D+time short-axis images with motion artefacts in less than 1 ms. We test our algorithm on a subset of the UK Biobank dataset consisting of 3465 CMR images and achieve not only high accuracy in detection of motion artefacts, but also high precision and recall. We compare our approach to a range of state-of-the-art quality assessment methods.

Ilkay Oksuz, Bram Ruijsink, Esther Puyol-Antón, Aurelien Bustin, Gastao Cruz, Claudia Prieto, Daniel Rueckert, Julia A. Schnabel, Andrew P. King
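The k-space based augmentation can be approximated by corrupting artefact-free cines in k-space, e.g. replacing a subset of phase-encoding lines in each frame with lines from other time frames to mimic mistriggering. The fraction of corrupted lines and the corruption model in the sketch below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def simulate_mistriggering(cine, corrupt_frac=0.2, rng=None):
    """Create a motion-artefact sample from an artefact-free 2D+time cine by
    replacing a random subset of phase-encoding k-space lines in each frame
    with the corresponding lines from another time frame."""
    rng = rng or np.random.default_rng()
    k = np.fft.fft2(cine, axes=(-2, -1))                  # per-frame 2D FFT
    n_frames, n_pe, _ = k.shape
    corrupted = k.copy()
    for t in range(n_frames):
        lines = rng.choice(n_pe, size=int(corrupt_frac * n_pe), replace=False)
        src = int(rng.integers(0, n_frames))
        corrupted[t, lines, :] = k[src, lines, :]
    return np.abs(np.fft.ifft2(corrupted, axes=(-2, -1)))

cine = np.random.rand(30, 128, 128)                       # toy 2D+time stack
print(simulate_mistriggering(cine).shape)                 # (30, 128, 128)
```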

Cardiac MR Segmentation from Undersampled k-space Using Deep Latent Representation Learning

Reconstructing magnetic resonance imaging (MRI) from undersampled k-space enables the accelerated acquisition of MRI but is a challenging problem. However, in many diagnostic scenarios, perfect reconstructions are not necessary as long as the images allow clinical practitioners to extract clinically relevant parameters. In this work, we present a novel deep learning framework for reconstructing such clinical parameters directly from undersampled data, expanding on the idea of application-driven MRI. We propose two deep architectures, an end-to-end synthesis network and a latent feature interpolation network, to predict cardiac segmentation maps from extremely undersampled dynamic MRI data, bypassing the usual image reconstruction stage altogether. We perform a large-scale simulation study using UK Biobank data containing nearly 1000 test subjects and show that with the proposed approaches, an accurate estimate of clinical parameters such as ejection fraction can be obtained from fewer than 10 k-space lines per time-frame.

Jo Schlemper, Ozan Oktay, Wenjia Bai, Daniel C. Castro, Jinming Duan, Chen Qin, Jo V. Hajnal, Daniel Rueckert

A Comprehensive Approach for Learning-Based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks

In the clinical routine, short axis (SA) cine cardiac MR (CMR) image stacks are acquired during multiple subsequent breath-holds. If the patient cannot consistently hold the breath at the same position, the acquired image stack will be affected by inter-slice respiratory motion and will not correctly represent the cardiac volume, introducing potential errors in the following analyses and visualisations. We propose an approach to automatically correct inter-slice respiratory motion in SA CMR image stacks. Our approach makes use of probabilistic segmentation maps (PSMs) of the left ventricular (LV) cavity generated with decision forests. PSMs are generated for each slice of the SA stack and rigidly registered in-plane to a target PSM. If long axis (LA) images are available, PSMs are generated for them and combined to create the target PSM; if not, the target PSM is produced from the same stack using a 3D model trained from motion-free stacks. The proposed approach was tested on a dataset of SA stacks acquired from 24 healthy subjects (for which anatomical 3D cardiac images were also available as reference) and compared to two techniques which use LA intensity images and LA segmentations as targets, respectively. The results show the accuracy and robustness of the proposed approach in motion compensation.

Giacomo Tarroni, Ozan Oktay, Matthew Sinclair, Wenjia Bai, Andreas Schuh, Hideaki Suzuki, Antonio de Marvao, Declan O’Regan, Stuart Cook, Daniel Rueckert

Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents

We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53 mm, 1.98 mm and 4.84 mm, respectively.

Amir Alansary, Loic Le Folgoc, Ghislain Vaillant, Ozan Oktay, Yuanwei Li, Wenjia Bai, Jonathan Passerat-Palmbach, Ricardo Guerrero, Konstantinos Kamnitsas, Benjamin Hou, Steven McDonagh, Ben Glocker, Bernhard Kainz, Daniel Rueckert

Towards MR-Only Radiotherapy Treatment Planning: Synthetic CT Generation Using Multi-view Deep Convolutional Neural Networks

Recently, Magnetic Resonance imaging-only (MR-only) radiotherapy treatment planning (RTP) has received growing interest since it is radiation-free and time/cost efficient. A key step in MR-only RTP is the generation of a synthetic CT from MR for dose calculation. Although deep learning approaches have achieved promising results on this topic, they still face two major challenges. First, it is very difficult to get perfectly registered CT-MR pairs to learn the intensity mapping, especially for abdomen and pelvic scans. Slight registration errors may mislead the deep network to converge at a sub-optimal CT-MR intensity matching. Second, training of a standard 3D deep network is very memory-consuming. In practice, one has to either shrink the size of the training network (sacrificing the accuracy) or use a patch-based sliding-window scheme (sacrificing the speed). In this paper, we propose a novel method to address these two challenges. First, we design a max-pooled cost function to accommodate imperfectly registered CT-MR training pairs. Second, we propose a network that consists of multiple 2D sub-networks (from different 3D views) followed by a combination sub-network. It reduces the memory consumption without losing the 3D context for high quality CT synthesis. We demonstrate that our method can generate high quality synthetic CTs with much higher runtime efficiency compared to the state-of-the-art as well as our own benchmark methods. The proposed solution can potentially enable more effective and efficient MR-only RTP in clinical settings.

Yu Zhao, Shu Liao, Yimo Guo, Liang Zhao, Zhennan Yan, Sungmin Hong, Gerardo Hermosillo, Tianming Liu, Xiang Sean Zhou, Yiqiang Zhan

Stochastic Deep Compressive Sensing for the Reconstruction of Diffusion Tensor Cardiac MRI

Understanding the structure of the heart at the microscopic scale of cardiomyocytes and their aggregates provides new insights into the mechanisms of heart disease and enables the investigation of effective therapeutics. Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) is a unique non-invasive technique that can resolve the microscopic structure, organisation, and integrity of the myocardium without the need for exogenous contrast agents. However, this technique suffers from relatively low signal-to-noise ratio (SNR) and frequent signal loss due to respiratory and cardiac motion. Current DT-CMR techniques rely on acquiring and averaging multiple signal acquisitions to improve the SNR. Moreover, in order to mitigate the influence of respiratory movement, patients are required to perform many breath holds which results in prolonged acquisition durations (e.g., $$\sim$$30 min using the existing technology). In this study, we propose a novel cascaded Convolutional Neural Networks (CNN) based compressive sensing (CS) technique and explore its applicability to improve DT-CMR acquisitions. Our simulation based studies have achieved high reconstruction fidelity and good agreement between DT-CMR parameters obtained with the proposed reconstruction and fully sampled ground truth. When compared to other state-of-the-art methods, our proposed deep cascaded CNN method and its stochastic variation demonstrated significant improvements. To the best of our knowledge, this is the first study using deep CNN based CS for the DT-CMR reconstruction. In addition, with relatively straightforward modifications to the acquisition scheme, our method can easily be translated into a method for online, at-the-scanner reconstruction enabling the deployment of accelerated DT-CMR in various clinical applications.

Jo Schlemper, Guang Yang, Pedro Ferreira, Andrew Scott, Laura-Ann McGill, Zohya Khalique, Margarita Gorodezky, Malte Roehl, Jennifer Keegan, Dudley Pennell, David Firmin, Daniel Rueckert

Automatic, Fast and Robust Characterization of Noise Distributions for Diffusion MRI

Knowledge of the noise distribution in magnitude diffusion MRI images is the centerpiece for quantifying uncertainties arising from the acquisition process. The use of parallel imaging methods, the number of receiver coils and imaging filters applied by the scanner, amongst other factors, dictate the resulting signal distribution. Accurate estimation beyond textbook Rician or noncentral chi distributions often requires information about the acquisition process (e.g., coil sensitivity maps or reconstruction coefficients), which is not usually available. We introduce a new method where a change of variable naturally gives rise to a particular form of the gamma distribution for background signals. The first moments and maximum likelihood estimators of this gamma distribution explicitly depend on the number of coils, making it possible to estimate all unknown parameters using only the magnitude data. A rejection step is used to make the method automatic and robust to artifacts. Experiments on synthetic datasets show that the proposed method can reliably estimate both the degrees of freedom and the standard deviation. The worst case errors range from below 2% (spatially uniform noise) to approximately 10% (spatially variable noise). Repeated acquisitions of in vivo datasets show that the estimated parameters are stable and have lower variances than the compared methods.

Samuel St-Jean, Alberto De Luca, Max A. Viergever, Alexander Leemans
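To make the parameter estimation concrete: for noise-only sum-of-squares magnitude data from N coils, the squared magnitude follows a gamma distribution with shape N and scale 2σ², so a simple method-of-moments fit recovers both parameters from background voxels. The sketch below uses this textbook identity; the paper's actual estimator relies on a different change of variable, maximum likelihood, and an automatic rejection step.

```python
import numpy as np

def estimate_noise_from_background(mag, mask):
    """Method-of-moments estimate of the number of coils N and noise sigma
    from background (signal-free) magnitude voxels, assuming
    m^2 ~ Gamma(shape=N, scale=2*sigma^2) for a sum-of-squares reconstruction."""
    t = mag[mask].astype(np.float64) ** 2
    mean, var = t.mean(), t.var()
    n_coils = mean ** 2 / var            # shape parameter
    sigma = np.sqrt(var / (2.0 * mean))  # scale = 2*sigma^2 = var/mean
    return n_coils, sigma

# toy check: 8-coil sum-of-squares background noise with sigma = 5
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 5.0, size=(8, 2, 100_000))        # (coils, re/im, voxels)
mag = np.sqrt((noise ** 2).sum(axis=(0, 1)))
print(estimate_noise_from_background(mag, np.ones_like(mag, dtype=bool)))
```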

An Automated Localization, Segmentation and Reconstruction Framework for Fetal Brain MRI

Reconstructing a high-resolution (HR) volume from motion-corrupted and sparsely acquired stacks plays an increasing role in fetal brain Magnetic Resonance Imaging (MRI) studies. Existing reconstruction methods are time-consuming and often require user interaction to localize and extract the brain from several stacks of 2D slices. In this paper, we propose a fully automatic framework for fetal brain reconstruction that consists of three stages: (1) brain localization based on a coarse segmentation of a down-sampled input image by a Convolutional Neural Network (CNN), (2) fine segmentation by a second CNN trained with a multi-scale loss function, and (3) novel, single-parameter outlier-robust super-resolution reconstruction (SRR) for HR visualization in the standard anatomical space. We validate our framework with images from fetuses with variable degrees of ventriculomegaly associated with spina bifida. Experiments show that each step of our proposed pipeline outperforms state-of-the-art methods in both segmentation and reconstruction comparisons. Overall, we report automatic SRR reconstructions that compare favorably with those obtained by manual, labor-intensive brain segmentations. This potentially unlocks the use of automatic fetal brain reconstruction studies in clinical practice.

Michael Ebner, Guotai Wang, Wenqi Li, Michael Aertsen, Premal A. Patel, Rosalind Aughwane, Andrew Melbourne, Tom Doel, Anna L. David, Jan Deprest, Sébastien Ourselin, Tom Vercauteren

Retinal Image Understanding Emerges from Self-Supervised Multimodal Reconstruction

The successful application of deep learning-based methodologies is conditioned by the availability of sufficient annotated data, which is usually critical in medical applications. This has motivated the proposal of several approaches aiming to complement the training with reconstruction tasks over unlabeled input data, complementary broad labels, augmented datasets or data from other domains. In this work, we explore the use of reconstruction tasks over multiple medical imaging modalities as a more informative self-supervised approach. Experiments are conducted on multimodal reconstruction of retinal angiography from retinography. The results demonstrate that the detection of relevant domain-specific patterns emerges from this self-supervised setting.

Álvaro S. Hervella, José Rouco, Jorge Novo, Marcos Ortega

Locality Adaptive Multi-modality GANs for High-Quality PET Image Synthesis

Positron emission tomography (PET) has been widely used in recent years. To minimize the potential health risks caused by the tracer radiation inherent to PET scans, it is of great interest to synthesize a high-quality full-dose PET image from the low-dose one, reducing the radiation exposure while maintaining the image quality. In this paper, we propose a locality adaptive multi-modality generative adversarial networks model (LA-GANs) to synthesize the full-dose PET image from both the low-dose one and the accompanying T1-weighted MRI, incorporating anatomical information for better PET image synthesis. This paper has the following contributions. First, we propose a new mechanism to fuse multi-modality information in deep neural networks. Different from the traditional methods that treat each image modality as an input channel and apply the same kernel to convolve the whole image, we argue that the contributions of different modalities can vary at different image locations, and therefore a unified kernel for a whole image is not appropriate. To address this issue, we propose a method that is locality adaptive for multi-modality fusion. Second, to learn this locality adaptive fusion, we utilize 1 × 1 × 1 kernels so that the number of additional parameters incurred by our method is kept to a minimum. This also naturally produces a fused image which acts as a pseudo input for the subsequent learning stages. Third, the proposed locality adaptive fusion mechanism is learned jointly with the PET image synthesis in an end-to-end trained 3D conditional GANs model developed by us. Our 3D GANs model generates high quality PET images by employing large-sized image patches and hierarchical features. Experimental results show that our method outperforms the traditional multi-modality fusion methods used in deep networks, as well as the state-of-the-art PET estimation approaches.

Yan Wang, Luping Zhou, Lei Wang, Biting Yu, Chen Zu, David S. Lalush, Weili Lin, Xi Wu, Jiliu Zhou, Dinggang Shen

Joint PET+MRI Patch-Based Dictionary for Bayesian Random Field PET Reconstruction

Multimodal imaging combining positron emission tomography (PET) and magnetic resonance imaging (MRI) provides complementary information about metabolism and anatomy. While the appearances of MRI and PET images are distinctive, there are fundamental inter-image dependencies relating structure and function. In PET-MRI imaging, typical PET reconstruction methods use priors to enforce PET-MRI dependencies at the very fine scale of image gradients and, so, cannot capture larger-scale inter-image correlations and intra-image texture patterns. Some recent methods enforce statistical models of MRI-image patches on PET-image patches, risking infusing anatomical features into PET images. In contrast, we propose a novel patch-based joint dictionary model for PET and MRI, learning regularity in individual patches and correlations in spatially-corresponding patches, for Bayesian PET reconstruction using expectation maximization. Reconstructions on simulated and in vivo PET-MRI data show that our method gives better-regularized images with smaller errors, compared to the state of the art.

Viswanath P. Sudarshan, Zhaolin Chen, Suyash P. Awate

Analysis of 3D Facial Dysmorphology in Genetic Syndromes from Unconstrained 2D Photographs

The quantification of facial dysmorphology is essential for the detection and diagnosis of genetic conditions. Facial analysis benefits from 3D image data, but 2D photography is more widely available at clinics. The aim of this paper is to analyze 3D facial dysmorphology using unconstrained (uncalibrated) 2D pictures at three orientations: frontal, left and right profiles. We estimate a unified 3D face shape by fitting a 3D morphable model (3DMM) to all the images, minimizing the differences between the 2D projected positions of selected 3D vertices in the 3DMM and their corresponding positions in the 2D pictures. Using the estimated 3D face shape, we compute a set of facial dysmorphology measurements and train a classifier to identify genetic syndromes. Evaluated on a set of 48 subjects with and without genetic conditions, our method reduced the landmark detection errors obtained by using a single photograph by 44%, 48%, and 49% on the frontal photograph, left profile, and right profile, respectively. We achieved a point-to-point projection error of 1.98 ± 0.38% normalized to the size of the face, significantly improving (p ≤ 0.01) the error of 4.17 ± 2.83% obtained with state-of-the-art methods. In addition, the geometric features calculated from the 3D reconstructed face achieved an accuracy of 73% in the detection of facial dysmorphology associated with genetic syndromes, compared with 58% using state-of-the-art methods on 2D pictures. That accuracy increased to 96% when we included local texture information. Our results demonstrate the potential of this framework to assist in the earlier and remote detection of genetic syndromes throughout the world.

Liyun Tu, Antonio R. Porras, Alec Boyle, Marius George Linguraru

Double Your Views – Exploiting Symmetry in Transmission Imaging

For a plane symmetric object we can find two views—mirrored at the plane of symmetry—that will yield the exact same image of that object. In consequence, having one image of a plane symmetric object and a calibrated camera, we can automatically have a second, virtual image of that object if the 3-D location of the symmetry plane is known. In this work, we show for the first time that the above concept naturally extends to transmission imaging and present an algorithm to estimate the 3-D symmetry plane from a set of projection domain images based on Grangeat’s theorem. We then exploit symmetry to generate a virtual trajectory by mirroring views at the plane of symmetry. If the plane is not perpendicular to the acquired trajectory plane, the virtual and real trajectory will be oblique. The resulting X-shaped trajectory will be data-complete, allowing for the compensation of in-plane motion using epipolar consistency. We evaluate the proposed method on a synthetic symmetric phantom and, in a proof-of-concept study, apply it to a real scan of an anthropomorphic human head phantom.

Alexander Preuhs, Andreas Maier, Michael Manhart, Javad Fotouhi, Nassir Navab, Mathias Unberath

Real Time RNN Based 3D Ultrasound Scan Adequacy for Developmental Dysplasia of the Hip

Acquiring adequate ultrasound (US) image data is crucial for accurate diagnosis of developmental dysplasia of the hip (DDH), the most common pediatric hip disorder affecting on average one in every one thousand births. Presently, the acquisition of high quality US deemed adequate for diagnostic measurements requires thorough knowledge of infant hip anatomy as well as extensive experience in interpreting such scans. This work aims to provide rapid assurance to the operator, automatically at the time of acquisition, that the data acquired are suitable for accurate diagnosis. To this end, we propose a deep learning model for a fully automatic scan adequacy assessment of 3D US volumes. Our contributions include developing an effective criterion that defines the features required for DDH diagnosis in an adequate 3D US volume, proposing an efficient neural network architecture composed of convolutional layers and recurrent layers for robust classification, and validating our model’s agreement with classification labels from an expert radiologist on real pediatric clinical data. To the best of our knowledge, our work is the first to make use of inter-slice information within a 3D US volume for DDH scan adequacy. Using 200 3D US volumes from 25 pediatric patients, we demonstrate an accuracy of 82% with an area under the receiver operating characteristic curve of 0.83 and a clinically suitable runtime of one second.

Olivia Paserin, Kishore Mulpuri, Anthony Cooper, Antony J. Hodgson, Rafeef Garbi

Direct Reconstruction of Ultrasound Elastography Using an End-to-End Deep Neural Network

In this work, we developed an end-to-end convolutional neural network (CNN) to reconstruct ultrasound elastography directly from radio frequency (RF) data. The novelty of this network is its ability to infer the elastography distribution from real RF data while using only computational simulations as training data. Moreover, this framework can generate both the displacement field and the strain field directly from ultrasound RF data. We evaluated the performance of this network on 50 simulated RF datasets, 42 phantom datasets, and 4 human datasets. The best signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) results in simulated data, phantom data, and human data are 39.5 dB and 69.64 dB, 32.64 dB and 48.76 dB, and 23.24 dB and 46.22 dB, respectively. Furthermore, we compare the performance of our method to state-of-the-art ultrasound elastography based on the normalized cross-correlation (NCC) technique. This comparison shows that our method can compute the strain field robustly and accurately. These results suggest great potential for this deep learning method in ultrasound elastography applications.

Sitong Wu, Zhifan Gao, Zhi Liu, Jianwen Luo, Heye Zhang, Shuo Li

3D Fetal Skull Reconstruction from 2DUS via Deep Conditional Generative Networks

2D ultrasound (US) is the primary imaging modality in antenatal healthcare. Despite the limitations of traditional 2D biometrics to characterize the true 3D anatomy of the fetus, the adoption of 3DUS is still very limited. This is particularly significant in developing countries and remote areas, due to the lack of experienced sonographers and the limited access to 3D technology. In this paper, we present a new deep conditional generative network for the 3D reconstruction of the fetal skull from 2DUS standard planes of the head routinely acquired during the fetal screening process. Based on the generative properties of conditional variational autoencoders (CVAE), our reconstruction architecture (REC-CVAE) directly integrates the three US standard planes as conditional variables to generate a unified latent space of the skull. Additionally, we propose HiREC-CVAE, a hierarchical generative network based on the different clinical relevance of each predictive view. The hierarchical structure of HiREC-CVAE allows the network to learn a sequence of nested latent spaces, providing superior predictive capabilities even in the absence of some of the 2DUS scans. The performance of the proposed architectures was evaluated on a dataset of 72 cases, showing accurate reconstruction capabilities from standard non-registered 2DUS images.

Juan J. Cerrolaza, Yuanwei Li, Carlo Biffi, Alberto Gomez, Matthew Sinclair, Jacqueline Matthew, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Standard Plane Detection in 3D Fetal Ultrasound Using an Iterative Transformation Network

Standard scan plane detection in fetal brain ultrasound (US) forms a crucial step in the assessment of fetal development. In clinical settings, this is done by manually manoeuvring a 2D probe to the desired scan plane. With the advent of 3D US, the entire fetal brain volume containing these standard planes can be easily acquired. However, manual standard plane identification in 3D volume is labour-intensive and requires expert knowledge of fetal anatomy. We propose a new Iterative Transformation Network (ITN) for the automatic detection of standard planes in 3D volumes. ITN uses a convolutional neural network to learn the relationship between a 2D plane image and the transformation parameters required to move that plane towards the location/orientation of the standard plane in the 3D volume. During inference, the current plane image is passed iteratively to the network until it converges to the standard plane location. We explore the effect of using different transformation representations as regression outputs of ITN. Under a multi-task learning framework, we introduce additional classification probability outputs to the network to act as confidence measures for the regressed transformation parameters in order to further improve the localisation accuracy. When evaluated on 72 US volumes of fetal brain, our method achieves an error of 3.83 mm/12.7° and 3.80 mm/12.6° for the transventricular and transcerebellar planes respectively and takes 0.46 s per plane.

Yuanwei Li, Bishesh Khanal, Benjamin Hou, Amir Alansary, Juan J. Cerrolaza, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Towards Radiotherapy Enhancement and Real Time Tumor Radiation Dosimetry Through 3D Imaging of Gold Nanoparticles Using XFCT

To enhance the efficiency of radiotherapy, a promising strategy consists in exposing the tumor simultaneously to ionizing radiation (IR) and gold nanoparticles (GNPs). Indeed, when exposed to the radiation beam, these GNPs exhibit a photoelectric effect that generates reactive oxygen species (ROS) within the tumor and enhances the direct IR-related deleterious effects. The measurement of this photoelectric effect with an additional detector could give new insight into the in vivo quantification and distribution of the GNPs in the tumor and, more importantly, into measuring the precise dose deposition. As a first step towards this challenge, we present here materials and methods designed for detecting and measuring very low concentrations of GNPs in solution and for performing 3D reconstruction of small gold objects whose size is representative of the considered application. A matrix image detector, whose sensitivity is first validated through the detection of a few hundred micrograms of GNPs, is combined with a pinhole element and moved along a limited circular trajectory to acquire 2D fluorescence images of a motionless object. We implement a direct back-projection algorithm that provides a 3D image of these objects from this sparse set of data.

Caroline Vienne, Adrien Stolidi, Hermine Lemaire, Daniel Maier, Diana Renaud, Romain Grall, Sylvie Chevillard, Emilie Brun, Cécile Sicard, Olivier Limousin

Dual-Domain Cascaded Regression for Synthesizing 7T from 3T MRI

Due to the high cost and low accessibility of 7T magnetic resonance imaging (MRI) scanners, we propose a novel dual-domain cascaded regression framework to synthesize 7T images from the routine 3T images. Our framework is composed of two parallel and interactive multi-stage regression streams, where one stream regresses on spatial domain and the other regresses on frequency domain. These two streams complement each other and enable the learning of complex mappings between 3T and 7T images. We evaluated the proposed framework on a set of 3T and 7T images by leave-one-out cross-validation. Experimental results demonstrate that the proposed framework generates realistic 7T images and achieves better results than state-of-the-art methods.

Yongqin Zhang, Jie-Zhi Cheng, Lei Xiang, Pew-Thian Yap, Dinggang Shen

Machine Learning in Medical Imaging

Frontmatter

Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks

Fully convolutional neural networks (F-CNNs) have set the state-of-the-art in image segmentation for a plethora of applications. Architectural innovations within F-CNNs have mainly focused on improving spatial encoding or network connectivity to aid gradient flow. In this paper, we explore an alternate direction of recalibrating the feature maps adaptively, to boost meaningful features, while suppressing weak ones. We draw inspiration from the recently proposed squeeze & excitation (SE) module for channel recalibration of feature maps for image classification. Towards this end, we introduce three variants of SE modules for image segmentation: (i) squeezing spatially and exciting channel-wise (cSE), (ii) squeezing channel-wise and exciting spatially (sSE) and (iii) concurrent spatial and channel squeeze & excitation (scSE). We effectively incorporate these SE modules within three different state-of-the-art F-CNNs (DenseNet, SD-Net, U-Net) and observe consistent improvement of performance across all architectures, while minimally affecting model complexity. Evaluations are performed on two challenging applications: whole brain segmentation on MRI scans and organ segmentation on whole body contrast enhanced CT scans.
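A compact PyTorch sketch of the three recalibration blocks (cSE, sSE, scSE) for 2D feature maps. Layer sizes, the reduction ratio and the element-wise addition used to combine the two branches in scSE are illustrative assumptions; the paper discusses how the two recalibrations are aggregated.

```python
import torch
import torch.nn as nn


class ChannelSE(nn.Module):           # squeeze spatially, excite channel-wise (cSE)
    def __init__(self, channels, reduction=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)          # rescale each channel by its learned importance


class SpatialSE(nn.Module):           # squeeze channel-wise, excite spatially (sSE)
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.conv(x)        # rescale each spatial location


class ConcurrentSCSE(nn.Module):      # scSE: combine both recalibrations
    def __init__(self, channels, reduction=2):
        super().__init__()
        self.cse = ChannelSE(channels, reduction)
        self.sse = SpatialSE(channels)

    def forward(self, x):
        return self.cse(x) + self.sse(x)
```

A block like `ConcurrentSCSE(channels)` can be dropped in after any encoder or decoder stage of an F-CNN without changing the surrounding layer shapes, which is why the added model complexity stays small.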

Abhijit Guha Roy, Nassir Navab, Christian Wachinger

SPNet: Shape Prediction Using a Fully Convolutional Neural Network

Shape has widely been used in medical image segmentation algorithms to constrain a segmented region to a class of learned shapes. Recent methods for object segmentation mostly use deep learning algorithms. The state-of-the-art deep segmentation networks are trained with loss functions defined in a pixel-wise manner, which is not suitable for learning topological shape information and constraining segmentation results. In this paper, we propose a novel shape predictor network for object segmentation. The proposed deep fully convolutional neural network learns to predict shapes instead of learning pixel-wise classification. We apply the novel shape predictor network to X-ray images of cervical vertebra where shape is of utmost importance. The proposed network is trained with a novel loss function that computes the error in the shape domain. Experimental results demonstrate the effectiveness of the proposed method to achieve state-of-the-art segmentation, with correct topology and accurate fitting that matches expert segmentation.

S. M. Masudur Rahman Al Arif, Karen Knapp, Greg Slabaugh

Roto-Translation Covariant Convolutional Networks for Medical Image Analysis

We propose a framework for rotation and translation covariant deep learning using SE(2) group convolutions. The group product of the special Euclidean motion group SE(2) describes how a concatenation of two roto-translations results in a net roto-translation. We encode this geometric structure into convolutional neural networks (CNNs) via SE(2) group convolutional layers, which fit into the standard 2D CNN framework and allow us to deal generically with rotated input samples without the need for data augmentation. We introduce three layers: a lifting layer which lifts a 2D (vector valued) image to an SE(2)-image, i.e., 3D (vector valued) data whose domain is SE(2); a group convolution layer from and to an SE(2)-image; and a projection layer from an SE(2)-image to a 2D image. The lifting and group convolution layers are SE(2) covariant (the output roto-translates with the input). The final projection layer, a maximum intensity projection over rotations, makes the full CNN rotation invariant. We show with three different problems in histopathology, retinal imaging, and electron microscopy that with the proposed group CNNs, state-of-the-art performance can be achieved, without the need for data augmentation by rotation and with increased performance compared to standard CNNs that do rely on augmentation.
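A deliberately simplified PyTorch sketch of the "lifting" idea, restricted to the four 90-degree rotations (a discrete subgroup of SE(2)); the paper uses finer rotation sampling and full SE(2) group convolutions. The class name and initialization are illustrative, and the final max over orientations stands in for the projection layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiftingConv2d(nn.Module):
    """Lift a 2D image to a stack of orientation channels by applying
    rotated copies of the same kernel."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.1)

    def forward(self, x):
        responses = []
        for k in range(4):                               # 0, 90, 180, 270 degrees
            w = torch.rot90(self.weight, k, dims=(2, 3))
            responses.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        # (B, out_channels, 4, H, W): an image over positions and orientations
        return torch.stack(responses, dim=2)


if __name__ == "__main__":
    x = torch.randn(1, 1, 32, 32)
    lifted = LiftingConv2d(1, 8)(x)
    invariant = lifted.max(dim=2).values                 # projection: max over orientations
    print(lifted.shape, invariant.shape)
```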

Erik J. Bekkers, Maxime W. Lafarge, Mitko Veta, Koen A. J. Eppenhof, Josien P. W. Pluim, Remco Duits

Bimodal Network Architectures for Automatic Generation of Image Annotation from Text

Medical image analysis practitioners have embraced big data methodologies. This has created a need for large annotated datasets. The source of big data is typically large image collections and clinical reports recorded for these images. In many cases, however, building algorithms aimed at segmentation and detection of disease requires a training dataset with markings of the areas of interest on the image that match with the described anomalies. This process of annotation is expensive and needs the involvement of clinicians. In this work we propose two separate deep neural network architectures for automatic marking of a region of interest (ROI) on the image best representing a finding location, given a textual report or a set of keywords. One architecture consists of LSTM and CNN components and is trained end to end with images, matching text, and markings of ROIs for those images. The output layer estimates the coordinates of the vertices of a polygonal region. The second architecture uses a network pre-trained on a large dataset of the same image types for learning feature representations of the findings of interest. We show that for a variety of findings from chest X-ray images, both proposed architectures learn to estimate the ROI, as validated by clinical annotations. There is a clear advantage obtained from the architecture with pre-trained imaging network. The centroids of the ROIs marked by this network were on average at a distance equivalent to 5.1% of the image width from the centroids of the ground truth ROIs.

Mehdi Moradi, Ali Madani, Yaniv Gur, Yufan Guo, Tanveer Syeda-Mahmood

Multimodal Recurrent Model with Attention for Automated Radiology Report Generation

Radiologists routinely examine medical images such as X-Ray, CT, or MRI and write reports summarizing their descriptive findings and conclusive impressions. A computer-aided radiology report generation system can lighten the workload for radiologists considerably and assist them in decision making. Although the rapid development of deep learning technology makes the generation of a single conclusive sentence possible, results produced by existing methods are not sufficiently reliable due to the complexity of medical images. Furthermore, generating detailed paragraph descriptions for medical images remains a challenging problem. To tackle this problem, we propose a novel generative model which generates a complete radiology report automatically. The proposed model incorporates Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) in a recurrent way. It is capable of not only generating high-level conclusive impressions, but also generating detailed descriptive findings sentence by sentence to support the conclusion. Furthermore, our multimodal model combines the encoding of the image and one generated sentence to construct an attention input to guide the generation of the next sentence, thereby maintaining coherence among generated sentences. Experimental results on the publicly available Indiana U. Chest X-rays from the Open-i image collection show that our proposed recurrent attention model achieves significant improvements over baseline models according to multiple evaluation metrics.

Yuan Xue, Tao Xu, L. Rodney Long, Zhiyun Xue, Sameer Antani, George R. Thoma, Xiaolei Huang

Magnetic Resonance Spectroscopy Quantification Using Deep Learning

Magnetic resonance spectroscopy (MRS) is an important technique in biomedical research, with the unique capability to give non-invasive access to the biochemical content (metabolites) of scanned organs. In the literature, quantification (the extraction of potential biomarkers from MRS signals) involves the resolution of an inverse problem based on a parametric model of the metabolite signal. However, a poor signal-to-noise ratio (SNR), the presence of the macromolecule signal, or high correlation between metabolite spectral patterns can cause high uncertainties for most of the metabolites, which is one of the main reasons that prevents the use of MRS in clinical routine. In this paper, quantification of metabolites in MR spectroscopic imaging using deep learning is proposed. A regression framework based on Convolutional Neural Networks (CNN) is introduced for an accurate estimation of spectral parameters. The proposed model learns the spectral features from a large-scale simulated data set with different variations of human brain spectra and SNRs. Experimental results demonstrate the accuracy of the proposed method, compared to the state-of-the-art quantification method QUEST, on the concentrations of 20 metabolites and the macromolecule.

Nima Hatami, Michaël Sdika, Hélène Ratiney

A Lifelong Learning Approach to Brain MR Segmentation Across Scanners and Protocols

Convolutional neural networks (CNNs) have shown promising results on several segmentation tasks in magnetic resonance (MR) images. However, the accuracy of CNNs may degrade severely when segmenting images acquired with different scanners and/or protocols as compared to the training data, thus limiting their practical utility. We address this shortcoming in a lifelong multi-domain learning setting by treating images acquired with different scanners or protocols as samples from different, but related domains. Our solution is a single CNN with shared convolutional filters and domain-specific batch normalization layers, which can be tuned to new domains with only a few ( $${\approx }$$ 4) labelled images. Importantly, this is achieved while retaining performance on the older domains whose training data may no longer be available. We evaluate the method for brain structure segmentation in MR images. Results demonstrate that the proposed method largely closes the gap to the benchmark, which is training a dedicated CNN for each scanner.
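A minimal PyTorch sketch of the core design: convolutional filters shared across scanners and protocols, with one batch normalization layer per domain, so only the domain-specific normalization needs tuning for a new scanner. Class name, layer sizes and the single-block body are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DomainAdaptiveBlock(nn.Module):
    def __init__(self, channels, num_domains):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)    # shared across domains
        self.bns = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(num_domains))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, domain):
        # The same convolution is used everywhere; only the normalization
        # statistics and affine parameters are domain-specific.
        return self.act(self.bns[domain](self.conv(x)))


if __name__ == "__main__":
    block = DomainAdaptiveBlock(channels=16, num_domains=3)
    x = torch.randn(4, 16, 64, 64)
    y_old = block(x, domain=0)     # scanner seen during initial training
    y_new = block(x, domain=2)     # new scanner: only bns[2] is tuned with a few images
    print(y_old.shape, y_new.shape)
```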

Neerav Karani, Krishna Chaitanya, Christian Baumgartner, Ender Konukoglu

Respond-CAM: Analyzing Deep Models for 3D Imaging Data by Visualizations

The convolutional neural network (CNN) has become a powerful tool for various biomedical image analysis tasks, but there is a lack of visual explanation for the machinery of CNNs. In this paper, we present a novel algorithm, Respond-weighted Class Activation Mapping (Respond-CAM), for making CNN-based models interpretable by visualizing input regions that are important for predictions, especially for biomedical 3D imaging data inputs. Our method uses the gradients of any target concept (e.g. the score of target class) that flow into a convolutional layer. The weighted feature maps are combined to produce a heatmap that highlights the important regions in the image for predicting the target concept. We prove a preferable sum-to-score property of the Respond-CAM and verify its significant improvement on 3D images from the current state-of-the-art approach. Our tests on Cellular Electron Cryo-Tomography 3D images show that Respond-CAM achieves superior performance on visualizing the CNNs with 3D biomedical image inputs, and is able to get reasonably good results on visualizing the CNNs with natural image inputs. The Respond-CAM is an efficient and reliable approach for visualizing the CNN machinery, and is applicable to a wide variety of CNN model families and image analysis tasks. Our code is available at: https://github.com/xulabs/projects/tree/master/respond_cam .
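A hedged 3D sketch in the gradient-weighted class activation mapping family. The weighting below follows one reading of Respond-CAM, where the per-channel weight is a feature-response-weighted average of the gradients rather than a plain spatial average; treat the exact formula as an assumption and consult the authors' repository linked above for the reference implementation.

```python
import torch


def respond_cam_heatmap(feature_maps, gradients, eps=1e-8):
    """feature_maps, gradients: (C, D, H, W) tensors for one input volume,
    obtained e.g. from forward/backward hooks on a chosen convolutional layer."""
    # Per-channel weight: response-weighted average of the gradient.
    num = (feature_maps * gradients).sum(dim=(1, 2, 3))
    den = feature_maps.sum(dim=(1, 2, 3)) + eps
    weights = num / den                                    # (C,)
    # Weighted combination of feature maps gives the 3D heatmap; under the
    # paper's sum-to-score property the heatmap roughly sums to the class score.
    cam = (weights[:, None, None, None] * feature_maps).sum(dim=0)
    return cam
```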

Guannan Zhao, Bo Zhou, Kaiwen Wang, Rui Jiang, Min Xu

Generalizability vs. Robustness: Investigating Medical Imaging Networks Using Adversarial Examples

In this paper, for the first time, we propose an evaluation method for deep learning models that assesses the performance of a model not only in an unseen test scenario, but also in extreme cases of noise, outliers and ambiguous input data. To this end, we utilize adversarial examples, images that fool machine learning models, while looking imperceptibly different from original data, as a measure to evaluate the robustness of a variety of medical imaging models. Through extensive experiments on skin lesion classification and whole brain segmentation with state-of-the-art networks such as Inception and UNet, we show that models that achieve comparable performance regarding generalizability may have significant variations in their perception of the underlying data manifold, leading to an extensive performance gap in their robustness.
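A minimal fast-gradient-sign sketch for probing robustness in the spirit described above; the paper evaluates richer attacks, and the model, loss function and epsilon below are placeholders.

```python
import torch


def fgsm_example(model, x, y, loss_fn, epsilon=0.01):
    """Craft an adversarial copy of x that looks imperceptibly different for small epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # One signed gradient step in the direction that increases the loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```

Comparing a model's accuracy on such perturbed inputs against its clean test accuracy gives the kind of generalizability-versus-robustness gap the paper investigates.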

Magdalini Paschali, Sailesh Conjeti, Fernando Navarro, Nassir Navab

Subject2Vec: Generative-Discriminative Approach from a Set of Image Patches to a Vector

We propose an attention-based method that aggregates local image features to a subject-level representation for predicting disease severity. In contrast to classical deep learning that requires a fixed dimensional input, our method operates on a set of image patches; hence it can accommodate variable length input image without image resizing. The model learns a clinically interpretable subject-level representation that is reflective of the disease severity. Our model consists of three mutually dependent modules which regulate each other: (1) a discriminative network that learns a fixed-length representation from local features and maps them to disease severity; (2) an attention mechanism that provides interpretability by focusing on the areas of the anatomy that contribute the most to the prediction task; and (3) a generative network that encourages the diversity of the local latent features. The generative term ensures that the attention weights are non-degenerate while maintaining the relevance of the local regions to the disease severity. We train our model end-to-end in the context of a large-scale lung CT study of Chronic Obstructive Pulmonary Disease (COPD). Our model gives state-of-the art performance in predicting clinical measures of severity for COPD. The distribution of the attention provides the regional relevance of lung tissue to the clinical measurements.
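A PyTorch sketch of the attention-weighted aggregation of a variable number of patch features into a fixed-length subject-level vector (the discriminative and attention modules only; the generative term is omitted). Layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AttentionAggregator(nn.Module):
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh(),
                                   nn.Linear(hidden_dim, 1))
        self.severity = nn.Linear(feat_dim, 1)

    def forward(self, patch_features):                    # (num_patches, feat_dim), any length
        alpha = torch.softmax(self.score(patch_features), dim=0)   # patch relevance weights
        subject_vec = (alpha * patch_features).sum(dim=0)          # fixed-length representation
        return self.severity(subject_vec), alpha


if __name__ == "__main__":
    feats = torch.randn(137, 128)                          # any number of patches per subject
    pred, attn = AttentionAggregator(128)(feats)
    print(pred.shape, attn.shape)
```

Because the softmax is taken over patches, the attention map `attn` directly indicates which anatomical regions drive the predicted severity.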

Sumedha Singla, Mingming Gong, Siamak Ravanbakhsh, Frank Sciurba, Barnabas Poczos, Kayhan N. Batmanghelich

3D Context Enhanced Region-Based Convolutional Neural Network for End-to-End Lesion Detection

Detecting lesions from computed tomography (CT) scans is an important but difficult problem because non-lesions and true lesions can appear similar. 3D context is known to be helpful in this differentiation task. However, existing end-to-end detection frameworks of convolutional neural networks (CNNs) are mostly designed for 2D images. In this paper, we propose 3D context enhanced region-based CNN (3DCE) to incorporate 3D context information efficiently by aggregating feature maps of 2D images. 3DCE is easy to train and end-to-end in training and inference. A universal lesion detector is developed to detect all kinds of lesions in one algorithm using the DeepLesion dataset. Experimental results on this challenging task prove the effectiveness of 3DCE.

Ke Yan, Mohammadhadi Bagheri, Ronald M. Summers

Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks

Data is one of the most important factors in machine learning. However, even if we have high-quality data, there are situations in which access to the data is restricted. For example, access to medical data from outside is strictly limited due to privacy issues. In this case, we have to learn a model sequentially, only with the data accessible in the corresponding stage. In this work, we propose a new method for preserving learned knowledge by modeling the high-level feature space and the output space to be mutually informative, and constraining feature vectors to lie in the modeled space during training. The proposed method is easy to implement as it can be applied by simply adding a reconstruction loss to an objective function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray dataset, and show benefits in terms of knowledge preservation compared to previous approaches.
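A generic sketch of the "add a reconstruction loss" recipe: here the output logits are asked to reconstruct the high-level feature, so later training stages are discouraged from overwriting the learned feature space. The network shapes, the choice of reconstruction target and the 0.1 weight are assumptions for illustration; the paper specifies how the feature and output spaces are modeled to be mutually informative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks for a 28x28 grayscale classification task.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU())
classifier = nn.Linear(64, 10)
decoder = nn.Linear(10, 64)            # maps the output space back to the feature space

x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
feat = encoder(x)
logits = classifier(feat)

task_loss = F.cross_entropy(logits, y)
recon_loss = F.mse_loss(decoder(logits), feat.detach())   # the single extra term
loss = task_loss + 0.1 * recon_loss
```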

Hyo-Eun Kim, Seungwook Kim, Jaehwan Lee

Distribution Matching Losses Can Hallucinate Features in Medical Image Translation

This paper discusses how distribution matching losses, such as those used in CycleGAN, can lead to the mis-diagnosis of medical conditions when used to synthesize medical images. It seems appealing to use these new image synthesis methods for translating images from a source to a target domain because they can produce high-quality images, and some do not even require paired data. However, these image translation models work by matching the translation output to the distribution of the target domain. This can cause an issue when the data provided in the target domain has an over- or under-representation of some classes (e.g. healthy or sick). When the output of an algorithm is a transformed image, there is uncertainty as to whether all known and unknown class labels have been preserved or changed. Therefore, we recommend that these translated images should not be used for direct interpretation (e.g. by doctors) because they may lead to misdiagnosis of patients based on image features hallucinated by an algorithm that matches a distribution. However, many recent papers appear to treat exactly this as the goal.

Joseph Paul Cohen, Margaux Luck, Sina Honari

Generative Invertible Networks (GIN): Pathophysiology-Interpretable Feature Mapping and Virtual Patient Generation

Machine learning methods play increasingly important roles in pre-procedural planning for complex surgeries and interventions. Very often, however, researchers find that the historical records of emerging surgical techniques, such as the transcatheter aortic valve replacement (TAVR), are scarce. In this paper, we address this challenge by proposing novel generative invertible networks (GIN) to select features and generate high-quality virtual patients that may potentially serve as an additional data source for machine learning. Combining a convolutional neural network (CNN) and generative adversarial networks (GAN), GIN discovers the pathophysiologic meaning of the feature space. Moreover, a test of predicting the surgical outcome directly from the selected features results in a high accuracy of 81.55%, which suggests that little pathophysiologic information has been lost while conducting the feature selection. This demonstrates that GIN can generate virtual patients that are not only visually authentic but also pathophysiologically interpretable.

Jialei Chen, Yujia Xie, Kan Wang, Zih Huei Wang, Geet Lahoti, Chuck Zhang, Mani A. Vannan, Ben Wang, Zhen Qian

Training Medical Image Analysis Systems like Radiologists

The training of medical image analysis systems using machine learning approaches follows a common script: collect and annotate a large dataset, train the classifier on the training set, and test it on a hold-out test set. This process bears no direct resemblance to radiologist training, which is based on solving a series of tasks of increasing difficulty, where each task involves the use of significantly smaller datasets than those used in machine learning. In this paper, we propose a novel training approach inspired by how radiologists are trained. In particular, we explore the use of meta-training that models a classifier based on a series of tasks. Tasks are selected using teacher-student curriculum learning, where each task consists of simple classification problems containing small training sets. We hypothesize that our proposed meta-training approach can be used to pre-train medical image analysis models. This hypothesis is tested on automatic breast screening classification from DCE-MRI trained with weakly labeled datasets. The classification performance achieved by our approach is shown to be the best in the field for that application, compared to state-of-the-art baseline approaches: DenseNet, multiple instance learning and multi-task learning.

Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro

Joint High-Order Multi-Task Feature Learning to Predict the Progression of Alzheimer’s Disease

Alzheimer’s disease (AD) is a degenerative brain disease that affects millions of people around the world. As populations in the United States and worldwide age, the prevalence of Alzheimer’s disease will only increase. In turn, the social and financial costs of AD will create a difficult environment for many families and caregivers across the globe. By combining genetic information, brain scans, and clinical data, gathered over time through the Alzheimer’s Disease Neuroimaging Initiative (ADNI), we propose a new Joint High-Order Multi-Modal Multi-Task Feature Learning method to predict the cognitive performance and diagnosis of patients with and without AD.

Lodewijk Brand, Hua Wang, Heng Huang, Shannon Risacher, Andrew Saykin, Li Shen

Fast Multiple Landmark Localisation Using a Patch-Based Iterative Network

We propose a new Patch-based Iterative Network (PIN) for fast and accurate landmark localisation in 3D medical volumes. PIN utilises a Convolutional Neural Network (CNN) to learn the spatial relationship between an image patch and anatomical landmark positions. During inference, patches are repeatedly passed to the CNN until the estimated landmark position converges to the true landmark location. PIN is computationally efficient since the inference stage only selectively samples a small number of patches in an iterative fashion rather than a dense sampling at every location in the volume. Our approach adopts a multi-task learning framework that combines regression and classification to improve localisation accuracy. We extend PIN to localise multiple landmarks by using principal component analysis, which models the global anatomical relationships between landmarks. We have evaluated PIN using 72 3D ultrasound images from fetal screening examinations. PIN achieves quantitatively an average landmark localisation error of 5.59 mm and a runtime of 0.44 s to predict 10 landmarks per volume. Qualitatively, anatomical 2D standard scan planes derived from the predicted landmark locations are visually similar to the clinical ground truth.
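A schematic of the iterative inference loop described above: a CNN regresses a displacement from the current patch towards the landmark, the estimate is moved accordingly, and the process repeats until convergence. `extract_patch` and `net` are placeholders, not the authors' code, and the tolerance and iteration cap are illustrative.

```python
import numpy as np


def iterative_localisation(volume, net, start, extract_patch,
                           max_iters=50, tol=0.5):
    """Return the estimated landmark position (in mm) for one volume."""
    pos = np.asarray(start, dtype=float)
    for _ in range(max_iters):
        patch = extract_patch(volume, pos)   # small patch around the current estimate
        step = net(patch)                    # predicted displacement (dx, dy, dz)
        pos = pos + step
        if np.linalg.norm(step) < tol:       # converged near the landmark
            break
    return pos
```

Because only a handful of patches are evaluated per landmark, the cost per volume stays far below a dense scan of every location, which is what makes the sub-second runtime plausible.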

Yuanwei Li, Amir Alansary, Juan J. Cerrolaza, Bishesh Khanal, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Omni-Supervised Learning: Scaling Up to Large Unlabelled Medical Datasets

Two major bottlenecks in increasing algorithmic performance in the field of medical imaging analysis are the typically limited size of datasets and the shortage of expert labels for large datasets. This paper investigates approaches to overcome the latter via omni-supervised learning: a special case of semi-supervised learning. Our approach seeks to exploit a small annotated dataset and iteratively increase model performance by scaling up to refine the model using a large set of unlabelled data. By fusing predictions of perturbed inputs, the method generates new training annotations without human intervention. We demonstrate the effectiveness of the proposed framework to localize multiple structures in a 3D US dataset of 4044 fetal brain volumes with an initial expert annotation of just 200 volumes (5% in total) in training. Results show that structure localization error was reduced from 2.07 ± 1.65 mm to 1.76 ± 1.35 mm on the hold-out validation set.
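A sketch of the self-annotation step: predictions on several perturbed copies of an unlabelled volume are mapped back to the original frame and fused into a new training target, without human intervention. `perturb`, `invert`, `model` and the mean-based fusion are placeholders for the paper's transformations and fusion rule.

```python
import numpy as np


def fuse_pseudo_label(volume, model, perturbations, invert):
    """Generate one fused annotation for an unlabelled volume."""
    preds = []
    for perturb in perturbations:
        p_vol, params = perturb(volume)      # e.g. a small rotation or flip
        pred = model(p_vol)                  # localisation output on the perturbed copy
        preds.append(invert(pred, params))   # undo the perturbation on the prediction
    return np.mean(preds, axis=0)            # fused target used to refine the model
```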

Ruobing Huang, J. Alison Noble, Ana I. L. Namburete

Recurrent Neural Networks for Classifying Human Embryonic Stem Cell-Derived Cardiomyocytes

Classification of human embryonic stem cell-derived cardiomyocytes (hESC-CMs) is important for many applications in cardiac regenerative medicine. However, a key challenge is the lack of ground truth labels for hESC-CMs: Whereas adult phenotypes are well-characterized in terms of their action potentials (APs), the understanding of how the shape of the AP of immature CMs relates to that of adult CMs remains limited. Recently, a new metamorphosis distance has been proposed to determine if a query immature AP is closer to a particular adult AP phenotype. However, the metamorphosis distance is difficult to compute making it unsuitable for classifying a large number of CMs. In this paper we propose a semi-supervised learning framework for the classification of hESC-CM APs. The proposed framework is based on a recurrent neural network with LSTM units whose parameters are learned by minimizing a loss consisting of two parts. The supervised part uses labeled data obtained from computational models of adult CMs, while the unsupervised part uses the metamorphosis distance in an efficient way. Experiments confirm the benefit of integrating information from both adult and stem cell-derived domains in the learning scheme, and also show that the proposed method generates results similar to the state-of-the-art (94.73%) with clear computational advantages when applied to new samples.

Carolina Pacheco, René Vidal

Group-Driven Reinforcement Learning for Personalized mHealth Intervention

Due to the popularity of smartphones and wearable devices, mobile health (mHealth) technologies promise to have a positive and wide impact on people’s health. State-of-the-art decision-making methods for mHealth rely on some ideal assumptions: they assume that the users are either completely homogeneous or completely heterogeneous. However, in reality, a user might be similar to some, but not all, users. In this paper, we propose a novel group-driven reinforcement learning method for mHealth. We aim to understand how to share information among similar users to better convert the limited user information into sharper learned RL policies. Specifically, we employ the K-means clustering method to group users based on the similarity of their trajectory information and learn a shared RL policy for each group. Extensive experimental results show that our method achieves clear gains over state-of-the-art RL methods for mHealth.
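A small sketch of the grouping step: users are clustered on trajectory summaries and one policy is fitted per cluster, so data is shared only within a group. `fit_policy` stands in for whichever RL learner is used per group, and the summary features and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def group_driven_policies(user_trajectories, n_groups, fit_policy):
    """user_trajectories: (num_users, feature_dim) summary of each user's data."""
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(user_trajectories)
    policies = {}
    for g in range(n_groups):
        members = np.where(labels == g)[0]
        policies[g] = fit_policy(members)    # pool data within this group only
    return labels, policies
```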

Feiyun Zhu, Jun Guo, Zheng Xu, Peng Liao, Liu Yang, Junzhou Huang

Joint Correlational and Discriminative Ensemble Classifier Learning for Dementia Stratification Using Shallow Brain Multiplexes

The demented brain wiring undergoes several changes with dementia progression. However, in early dementia stages, particularly early mild cognitive impairment (eMCI), these remain challenging to spot. Hence, developing accurate diagnostic techniques for eMCI identification is critical for early intervention to prevent the onset of Alzheimer’s Disease (AD). There is a large body of machine-learning based research developed for classifying different brain states (e.g., AD vs MCI). These works can be fundamentally grouped into two categories. The first uses correlational methods, such as canonical correlation analysis (CCA) and its variants, with the aim of identifying the most correlated features for diagnosis. The second includes discriminative methods, such as feature selection methods and linear discriminative analysis (LDA) and its variants, to identify brain features that distinguish between two brain states. However, existing methods examine correlational and discriminative brain features independently, which overlooks the complementary information provided by both techniques and could prove useful in the classification of patients with dementia. On the other hand, how early dementia affects cortical brain connections in morphology remains largely unexplored. To address these limitations, we propose a joint correlational and discriminative ensemble learning framework for eMCI diagnosis that leverages a novel brain network representation derived from the cortex. Specifically, we devise ‘the shallow convolutional brain multiplex’ (SCBM), which not only measures the similarity in morphology between pairs of brain regions, but also encodes the relationship between two morphological brain networks. Then, we represent each individual brain using a set of SCBMs, which are used to train a joint ensemble of CCA-SVM and LDA-based classifiers. Our framework outperformed several state-of-the-art methods by 3–7%, including independent correlational and discriminative methods.

Rory Raeper, Anna Lisowska, Islem Rekik

Statistical Analysis for Medical Imaging

Frontmatter

FDR-HS: An Empirical Bayesian Identification of Heterogenous Features in Neuroimage Analysis

Recent studies found that in voxel-based neuroimage analysis, detecting and differentiating the “procedural bias” introduced during preprocessing steps from lesion features can not only boost accuracy but also improve interpretability. To the best of our knowledge, GSplit LBI is the first model proposed in the literature to simultaneously capture both procedural bias and lesion features. Despite the fact that it can improve prediction power by leveraging the procedural bias, it may select spurious features due to the multicollinearity in high-dimensional space. Moreover, it does not take into account the heterogeneity of these two types of features. In fact, the procedural bias and lesion features differ in terms of volumetric change and spatial correlation pattern. To address these issues, we propose a “two-groups” Empirical-Bayes method called “FDR-HS” (False-Discovery-Rate Heterogenous Smoothing). This method not only avoids multicollinearity, but also exploits the heterogenous spatial patterns of features. In addition, it is simple to implement: by introducing hidden variables, the problem turns into a convex optimization scheme that can be solved efficiently by the expectation-maximization (EM) algorithm. Empirical experiments were conducted on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The advantage of the proposed model is verified by the improved interpretability and prediction power of the features selected by FDR-HS.

Xinwei Sun, Lingjing Hu, Fandong Zhang, Yuan Yao, Yizhou Wang

Order-Sensitive Deep Hashing for Multimorbidity Medical Image Retrieval

In this paper, we propose an order-sensitive deep hashing method for scalable medical image retrieval in the scenario of coexisting multiple medical conditions. The pairwise similarity preservation in existing hashing methods is not suitable for this multimorbidity medical image retrieval problem. To capture the multilevel semantic similarity, we formulate it as a multi-label hashing learning problem. We design a deep hash model for powerful feature extraction and preserve the ranking list with a triplet-based ranking loss for better assessment assistance. We further introduce a cross-entropy based multi-label classification loss to exploit multi-label information. We solve the optimization problem by continuation to reduce the quantization loss. We conduct extensive experiments on a large database constructed from the NIH Chest X-ray database to validate the efficacy of the proposed algorithm. Experimental results demonstrate that our order-sensitive deep hashing leads to superior performance compared with several state-of-the-art hashing methods.

Zhixiang Chen, Ruojin Cai, Jiwen Lu, Jianjiang Feng, Jie Zhou

Exact Combinatorial Inference for Brain Images

The permutation test is known as the exact test procedure in statistics. However, it is often not exact in practice and only an approximate method, since only a small fraction of all possible permutations is generated. Even for a small sample size, it often requires generating tens of thousands of permutations, which can be a serious computational bottleneck. In this paper, we propose a novel combinatorial inference procedure that enumerates all possible permutations combinatorially without any resampling. The proposed method is validated against the standard permutation test in simulation studies with the ground truth. The method is further applied in a twin DTI study to determine the genetic contribution of the minimum spanning tree of the structural brain connectivity.
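For contrast with the resampling-based test, here is a small illustration of an exact two-sample test by brute-force enumeration of all group relabellings, which is feasible only for very small samples; the paper's contribution is a combinatorial procedure that achieves exactness without this exhaustive enumeration.

```python
from itertools import combinations

import numpy as np


def exact_two_sample_pvalue(group_a, group_b):
    """Exact p-value for a difference-in-means statistic by full enumeration."""
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    count, total = 0, 0
    for idx_a in combinations(range(len(pooled)), n_a):   # every possible relabelling
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(idx_a)] = True
        stat = abs(pooled[mask].mean() - pooled[~mask].mean())
        count += stat >= observed
        total += 1
    return count / total


if __name__ == "__main__":
    print(exact_two_sample_pvalue(np.array([1.2, 0.8, 1.5]), np.array([0.2, 0.4, 0.1])))
```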

Moo K. Chung, Zhan Luo, Alex D. Leow, Andrew L. Alexander, Richard J. Davidson, H. Hill Goldsmith

Statistical Inference with Ensemble of Clustered Desparsified Lasso

Medical imaging involves high-dimensional data, yet acquisitions are available only for limited numbers of samples. Multivariate predictive models have become popular in the last decades to fit external variables from imaging data, and standard algorithms yield point estimates of the model parameters. It is however challenging to attribute confidence to these parameter estimates, which makes the solutions hard to trust. In this paper we present a new algorithm that assesses the statistical significance of parameters and that can scale even when the number of predictors $$p \ge 10^{5}$$ is much higher than the number of samples $$n \le 10^{3}$$, by leveraging structure among features. Our algorithm combines three main ingredients: a powerful inference procedure for linear models (the so-called Desparsified Lasso), feature clustering, and an ensembling step. We first establish that the Desparsified Lasso alone cannot handle $$n \ll p$$ regimes; then we demonstrate that the combination of clustering and ensembling provides an accurate solution, whose specificity is controlled. We also demonstrate stability improvements on two neuroimaging datasets.

Jérôme-Alexis Chevalier, Joseph Salmon, Bertrand Thirion

Low-Rank Representation for Multi-center Autism Spectrum Disorder Identification

Effective utilization of multi-center data for autism spectrum disorder (ASD) diagnosis recently has attracted increasing attention, since a large number of subjects from multiple centers are beneficial for investigating the pathological changes of ASD. To better utilize the multi-center data, various machine learning methods have been proposed. However, most previous studies do not consider the problem of data heterogeneity (e.g., caused by different scanning parameters and subject populations) among multi-center datasets, which may degrade the diagnosis performance based on multi-center data. To address this issue, we propose a multi-center low-rank representation learning (MCLRR) method for ASD diagnosis, to seek a good representation of subjects from different centers. Specifically, we first choose one center as the target domain and the remaining centers as source domains. We then learn a domain-specific projection for each source domain to transform them into an intermediate representation space. To further suppress the heterogeneity among multiple centers, we disassemble the learned projection matrices into a shared part and a sparse unique part. With the shared matrix, we can project target domain to the common latent space, and linearly represent the source domain datasets using data in the transformed target domain. Based on the learned low-rank representation, we employ the k-nearest neighbor (KNN) algorithm to perform disease classification. Our method has been evaluated on the ABIDE database, and the superior classification results demonstrate the effectiveness of our proposed method as compared to other methods.

Mingliang Wang, Daoqiang Zhang, Jiashuang Huang, Dinggang Shen, Mingxia Liu

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge datasets [3, 11, 16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that uncertainty measures consistently allow us to choose superior operating points compared to only using the network’s sigmoid output as a probability.
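A minimal PyTorch sketch of test-time Monte Carlo dropout: dropout layers are kept active at inference, several stochastic forward passes are collected, and a voxel-wise mean and variance are derived. The segmentation network, number of samples and the variance-based measure are placeholders; the paper studies four different voxel-based measures.

```python
import torch


def mc_dropout_uncertainty(model, volume, num_samples=20):
    """Return a voxel-wise mean probability map and one uncertainty map."""
    model.eval()
    for m in model.modules():                            # re-enable dropout only
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        samples = torch.stack([torch.sigmoid(model(volume)) for _ in range(num_samples)])
    mean = samples.mean(dim=0)                           # voxel-wise lesion probability
    variance = samples.var(dim=0)                        # one possible uncertainty measure
    return mean, variance
```

Lesion-wise uncertainty can then be obtained by aggregating the voxel-wise values inside each detected lesion, as the abstract describes.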

Tanya Nair, Doina Precup, Douglas L. Arnold, Tal Arbel

Inherent Brain Segmentation Quality Control from Fully ConvNet Monte Carlo Sampling

We introduce inherent measures for effective quality control of brain segmentation based on a Bayesian fully convolutional neural network, using model uncertainty. Monte Carlo samples from the posterior distribution are efficiently generated using dropout at test time. Based on these samples, we introduce, in addition to a voxel-wise uncertainty map, three metrics for structure-wise uncertainty. We then incorporate these structure-wise uncertainties in group analyses as a measure of confidence in the observation. Our results show that the metrics are highly correlated with segmentation accuracy and therefore present an inherent measure of segmentation quality. Furthermore, group analysis with uncertainty results in effect sizes closer to those of manual annotations. The introduced uncertainty metrics can not only be very useful in translation to clinical practice but also provide automated quality control and group analyses in processing large data repositories.

Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, Christian Wachinger

Perfect MCMC Sampling in Bayesian MRFs for Uncertainty Estimation in Segmentation

Typical segmentation methods produce a single optimal solution and fail to inform about (i) the confidence/uncertainty in the object boundaries or (ii) alternate close-to-optimal solutions. To estimate uncertainty, some methods intend to sample segmentations from an associated posterior model using Markov chain Monte Carlo (MCMC) sampling or perturbation models. However, they cannot guarantee sampling from the true posterior, deviating significantly in practice. We propose a novel method that guarantees exact MCMC sampling, in finite time, of multi-label segmentations from generic Bayesian Markov random field (MRF) models. For exact sampling, we propose Fill’s strategy and extend it to generic MRF models via a novel bounding chain algorithm. Results on simulated data and clinical brain images from 4 classic problems show that our uncertainty estimates gain accuracy over the state of the art.

Saurabh Garg, Suyash P. Awate

On the Effect of Inter-observer Variability for a Reliable Estimation of Uncertainty of Medical Image Segmentation

Uncertainty estimation methods are expected to improve the understanding and quality of computer-assisted methods used in medical applications (e.g., neurosurgical interventions, radiotherapy planning), where automated medical image segmentation is crucial. In supervised machine learning, a common practice to generate ground truth label data is to merge observer annotations. However, as many medical image tasks show a high inter-observer variability resulting from factors such as image quality, different levels of user expertise and domain knowledge, little is known as to how inter-observer variability and commonly used fusion methods affect the estimation of uncertainty of automated image segmentation. In this paper we analyze the effect of common image label fusion techniques on uncertainty estimation, and propose to learn the uncertainty among observers. The results highlight the negative effect of fusion methods applied in deep learning, to obtain reliable estimates of segmentation uncertainty. Additionally, we show that the learned observers’ uncertainty can be combined with current standard Monte Carlo dropout Bayesian neural networks to characterize uncertainty of model’s parameters.

Alain Jungo, Raphael Meier, Ekin Ermis, Marcela Blatti-Moreno, Evelyn Herrmann, Roland Wiest, Mauricio Reyes

Towards Safe Deep Learning: Accurately Quantifying Biomarker Uncertainty in Neural Network Predictions

Automated medical image segmentation, specifically using deep learning, has shown outstanding performance in semantic segmentation tasks. However, these methods rarely quantify their uncertainty, which may lead to errors in downstream analysis. In this work we propose to use Bayesian neural networks to quantify uncertainty within the domain of semantic segmentation. We also propose a method to convert voxel-wise segmentation uncertainty into volumetric uncertainty, and calibrate the accuracy and reliability of confidence intervals of derived measurements. When applied to a tumour volume estimation application, we demonstrate that by using such modelling of uncertainty, deep learning systems can be made to report volume estimates with well-calibrated error-bars, making them safer for clinical use. We also show that the uncertainty estimates extrapolate to unseen data, and that the confidence intervals are robust in the presence of artificial noise. This could be used to provide a form of quality control and quality assurance, and may permit further adoption of deep learning tools in the clinic.
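A sketch of turning per-sample segmentations into a volume estimate with an error bar: each Monte Carlo sample yields one volume, and percentiles of the empirical distribution give a confidence interval. The voxel size, the 95% level and the percentile-based interval are illustrative assumptions; the paper additionally calibrates the intervals against observed accuracy.

```python
import numpy as np


def volume_confidence_interval(mc_segmentations, voxel_volume_ml, level=95):
    """mc_segmentations: (num_samples, D, H, W) binary or probabilistic masks."""
    volumes = mc_segmentations.reshape(len(mc_segmentations), -1).sum(axis=1) * voxel_volume_ml
    half = (100 - level) / 2
    low, high = np.percentile(volumes, [half, 100 - half])
    return volumes.mean(), (low, high)                   # point estimate and error bar
```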

Zach Eaton-Rosen, Felix Bragman, Sotirios Bisdas, Sébastien Ourselin, M. Jorge Cardoso

Image Registration Methods

Frontmatter

Registration-Based Patient-Specific Musculoskeletal Modeling Using High Fidelity Cadaveric Template Model

We propose a method to construct a patient-specific musculoskeletal model using a template obtained from high-fidelity cadaveric images. Musculoskeletal simulation has traditionally been performed using a string-type muscle model that represents the lines of force of a muscle with strings, while recent studies found that a more detailed model representing the muscle’s 3D shape and internal fiber arrangement provides better simulation accuracy when sufficient computational resources are available. Thus, we aim at reconstructing patient-specific muscle fiber arrangement from clinically available modalities such as CT or (non-diffusion) MRI. Our approach follows a conventional biomedical modeling approach which first constructs a highly accurate generic template model that is then registered using the patient-specific measurement. Our template is created from a high-resolution cryosectioned volume, and the newly proposed registration method aligns the surfaces of bones and muscles as well as the local orientation inside the muscle (i.e., muscle fiber direction). The evaluation was performed using cryosectioned volumes of two cadavers, one of which is accompanied by images obtained from clinical CT and MRI. Quantitative evaluation demonstrated that the mean fiber distance error between the arrangement estimated from CT and the ground truth was 4.16, 3.76, and 2.45 mm for the gluteus maximus, medius, and minimus muscles, respectively. The qualitative visual assessment on 20 clinical CT images suggested plausible fiber arrangements that could be translated to biomechanical simulation.

Yoshito Otake, Masaki Takao, Norio Fukuda, Shu Takagi, Naoto Yamamura, Nobuhiko Sugano, Yoshinobu Sato

Atlas Propagation Through Template Selection

Template-based atlas propagation can reduce registration cost in multi-atlas segmentation. In this method, atlases and testing images are registered to a common template. We show that using a common template may be suboptimal for reducing atlas propagation errors. Instead, we propose to apply a custom-selected template for each testing image by employing a large template library and a fast template selection technique. The proposed method significantly outperforms common-template-based atlas propagation. Using a template library with 50 images, our method produced results comparable to standard direct-registration-based multi-atlas segmentation at a small fraction of the registration cost.

Hongzhi Wang, Rui Zhang
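
A minimal sketch of the template-selection idea, assuming the templates and the test image are already coarsely resampled to a common grid; the normalised cross-correlation score and the function name are illustrative stand-ins for whatever fast selection technique is actually used.

```python
import numpy as np

def select_template(test_img, template_library):
    """Return the index of the library template most similar to the test image,
    scored with global normalised cross-correlation on coarse volumes."""
    def ncc(a, b):
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float((a * b).mean())
    scores = [ncc(test_img, t) for t in template_library]
    return int(np.argmax(scores))
```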

Spatio-Temporal Atlas of Bone Mineral Density Ageing

Osteoporosis is an age-associated bone disease characterised by low bone mass. An improved understanding of the underlying mechanism of age-related bone loss could lead to enhanced preventive and therapeutic strategies for osteoporosis. In this work, we propose a fully automatic pipeline for developing a spatio-temporal atlas of ageing bone. Bone maps are collected using a dual-energy X-ray absorptiometry (DXA) scanner. Each scan is then warped into a reference template to eliminate morphological variation and establish a correspondence between pixel coordinates. Pixel-wise bone density evolution with ageing is modelled using smooth quantile curves. To construct the atlas, we amalgamated a cohort of 1714 Caucasian women (20–87 years) from five different centres in North Western Europe. As systematic differences exist between DXA manufacturers, we propose a novel calibration technique to homogenise bone density measurements across the centres. This calibration uses an alternating minimisation scheme to map the observed bone density measurements into a latent standardised space. To the best of our knowledge, this is the first spatio-temporal atlas of ageing bone.

Mohsen Farzi, Jose M. Pozo, Eugene McCloskey, Richard Eastell, J. Mark Wilkinson, Alejandro F. Frangi

Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration

Traditional deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervised labels, or do not guarantee a diffeomorphic (topology-preserving) registration. Furthermore, learning-based registration tools have not been derived from a probabilistic framework that can offer uncertainty estimates. In this paper, we present a probabilistic generative model and derive an unsupervised learning-based inference algorithm that makes use of recent developments in convolutional neural networks (CNNs). We demonstrate our method on a 3D brain registration task, and provide an empirical analysis of the algorithm. Our approach results in state-of-the-art accuracy and very fast runtimes, while providing diffeomorphic guarantees and uncertainty estimates. Our implementation is available online at http://voxelmorph.csail.mit.edu.

Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu
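
The diffeomorphic guarantee typically comes from integrating a stationary velocity field by scaling and squaring; below is a small NumPy/SciPy sketch of that integration for a 2D field. The released VoxelMorph code performs this step inside the network, so treat this only as an offline illustration.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocity(vel, n_steps=7):
    """Scaling and squaring: turn a stationary velocity field into a
    (near-)diffeomorphic displacement field. vel has shape (2, H, W)."""
    disp = vel / (2 ** n_steps)                        # initial small displacement
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in vel.shape[1:]],
                                indexing="ij"))
    for _ in range(n_steps):                           # repeatedly compose with itself
        coords = grid + disp
        disp = disp + np.stack([map_coordinates(disp[d], coords, order=1,
                                                mode="nearest")
                                for d in range(disp.shape[0])])
    return disp
```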

Adversarial Similarity Network for Evaluating Image Alignment in Deep Learning Based Registration

This paper introduces an unsupervised adversarial similarity network for image registration. Unlike existing deep learning registration frameworks, our approach does not require ground-truth deformations and specific similarity metrics. We connect a registration network and a discrimination network with a deformable transformation layer. The registration network is trained with feedback from the discrimination network, which is designed to judge whether a pair of registered images are sufficiently similar. Using adversarial training, the registration network is trained to predict deformations that are accurate enough to fool the discrimination network. Experiments on four brain MRI datasets indicate that our method yields registration performance that is promising in both accuracy and efficiency compared with state-of-the-art registration methods, including those based on deep learning.

Jingfan Fan, Xiaohuan Cao, Zhong Xue, Pew-Thian Yap, Dinggang Shen
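
One alternating update of such an adversarial registration scheme might look like the PyTorch sketch below; `reg_net`, `disc`, and `warp` are hypothetical modules (the discriminator is assumed to output a probability), and using an identical well-aligned pair as the "real" example is our simplification of the paper's setup.

```python
import torch
import torch.nn.functional as F

def adversarial_step(reg_net, disc, warp, fixed, moving, opt_r, opt_d):
    """One alternating update: the discriminator learns to spot poorly aligned
    pairs, while the registration network learns deformations that fool it."""
    flow = reg_net(moving, fixed)
    warped = warp(moving, flow)

    # discriminator update: real = well-aligned pair, fake = (warped, fixed)
    d_real = disc(fixed, fixed)
    d_fake = disc(warped.detach(), fixed)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # registration update: make the warped/fixed pair look "real"
    loss_r = F.binary_cross_entropy(disc(warped, fixed), torch.ones_like(d_fake))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    return loss_d.item(), loss_r.item()
```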

Improving Surgical Training Phantoms by Hyperrealism: Deep Unpaired Image-to-Image Translation from Real Surgeries

Current ‘dry lab’ surgical phantom simulators are a valuable tool that allows surgeons to improve their dexterity and skill with surgical instruments. These phantoms mimic the haptics and shape of organs of interest, but lack a realistic visual appearance. In this work, we present an innovative application in which representations learned from real intraoperative endoscopic sequences are transferred to a surgical phantom scenario. The term hyperrealism is introduced in this field, which we regard as a novel subform of surgical augmented reality for approaches that involve real-time object transfigurations. For related tasks in the computer vision community, unpaired cycle-consistent Generative Adversarial Networks (GANs) have shown excellent results on still RGB images. However, applying this approach to continuous video frames can result in flickering, which turned out to be especially prominent in this application. We therefore propose an extension of cycle-consistent GANs, named tempCycleGAN, to improve temporal consistency. The novel method is evaluated on captures of a silicone phantom for training endoscopic reconstructive mitral valve procedures. Synthesized videos show highly realistic results with regard to (1) replacement of the silicone appearance of the phantom valve with intraoperative tissue texture, while (2) explicitly keeping crucial features in the scene, such as instruments, sutures and prostheses. Compared to the original CycleGAN approach, tempCycleGAN efficiently removes flickering between frames. The overall approach is expected to change the future design of surgical training simulators, since the generated sequences clearly demonstrate the feasibility of a considerably more realistic training experience for minimally-invasive procedures.

Sandy Engelhardt, Raffaele De Simone, Peter M. Full, Matthias Karck, Ivo Wolf

Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Pose estimation, i.e. predicting a 3D rigid transformation in SE(3) with respect to a fixed coordinate frame, is an omnipresent problem in medical image analysis. Deep learning methods often parameterise poses with a representation that separates rotation and translation. As commonly available frameworks do not provide means to calculate loss on a manifold, regression is usually performed using the L2-norm independently on the rotation’s and the translation’s parameterisations. This is a metric for linear spaces that does not take into account the Lie group structure of SE(3). In this paper, we propose a general Riemannian formulation of the pose estimation problem, and train CNNs directly on SE(3) equipped with a left-invariant Riemannian metric. The loss between the ground truth and predicted pose (elements of the manifold) is calculated as the Riemannian geodesic distance, which couples together the translation and rotation components. Network weights are updated by back-propagating the gradient with respect to the predicted pose on the tangent space of the manifold SE(3). We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular, commonly used existing methods on tasks such as image-based localisation and intensity-based 2D/3D registration. We also show that the hyper-parameters used in our loss function to weight the contribution of rotations and translations can be intrinsically calculated from the dataset to achieve greater performance margins.

Benjamin Hou, Nina Miolane, Bishesh Khanal, Matthew C. H. Lee, Amir Alansary, Steven McDonagh, Jo V. Hajnal, Daniel Rueckert, Ben Glocker, Bernhard Kainz
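
For intuition, the geodesic distance under a left-invariant metric can be written as the norm of the Lie-algebra element log(T_pred^{-1} T_true); the NumPy/SciPy sketch below evaluates it with a dense matrix logarithm. The paper derives the loss and its gradient analytically on the tangent space, so this is only a numerical reference, and the weighting scheme shown here is a generic choice.

```python
import numpy as np
from scipy.linalg import logm

def se3_geodesic_distance(T_pred, T_true, w_rot=1.0, w_trans=1.0):
    """Left-invariant geodesic distance between two 4x4 rigid transforms:
    d(T1, T2) = || log(T1^{-1} T2) ||, with separate rotation/translation weights."""
    xi = np.real(logm(np.linalg.inv(T_pred) @ T_true))   # element of se(3)
    omega = xi[:3, :3]                                    # skew-symmetric rotation part
    v = xi[:3, 3]                                         # translational part
    return np.sqrt(w_rot * (omega ** 2).sum() / 2 + w_trans * (v ** 2).sum())
```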

GDL-FIRE: Deep Learning-Based Fast 4D CT Image Registration

Deformable image registration (DIR) in thoracic 4D CT image data is integral for, e.g., radiotherapy treatment planning, but time consuming. Deep learning (DL)-based DIR promises a speed-up, but present solutions are limited to small image sizes. In this paper, we propose a General Deep Learning-based Fast Image Registration framework suitable for application to clinical 4D CT data (GDL-FIRE^4D). Open source DIR frameworks are selected to build GDL-FIRE^4D variants. In-house-acquired 4D CT images serve as training data and open 4D CT data repositories as external evaluation cohorts. Taking up current attempts at DIR uncertainty estimation, dropout-based uncertainty maps for the GDL-FIRE^4D variants are analyzed. We show that (1) the registration accuracy of GDL-FIRE^4D and standard DIR is of the same order; (2) computation time is reduced to a few seconds (here: a 60-fold speed-up); and (3) dropout-based uncertainty maps do not correlate with across-DIR vector field differences, raising doubts about their applicability in the given context.

Thilo Sentker, Frederic Madesta, René Werner

Adversarial Deformation Regularization for Training Image Registration Neural Networks

We describe an adversarial learning approach to constrain convolutional neural network training for image registration, replacing the heuristic smoothness measures of displacement fields often used in these tasks. Using minimally-invasive prostate cancer intervention as an example application, we demonstrate the feasibility of utilizing biomechanical simulations to regularize a weakly-supervised anatomical-label-driven registration network for aligning pre-procedural magnetic resonance (MR) and 3D intra-procedural transrectal ultrasound (TRUS) images. A discriminator network is optimized to distinguish the registration-predicted displacement fields from the motion data simulated by finite element analysis. During training, the registration network simultaneously aims to maximize the similarity between anatomical labels that drives image alignment and to minimize an adversarial generator loss that measures the divergence between the predicted and simulated deformations. The end-to-end trained network enables efficient and fully-automated registration that only requires an MR and TRUS image pair as input, without anatomical labels or simulated data during inference. 108 pairs of labelled MR and TRUS images from 76 prostate cancer patients and 71,500 nonlinear finite-element simulations from 143 different patients were used for this study. We show that, with only gland segmentation as training labels, the proposed method can help predict physically plausible deformations without any other smoothness penalty. Based on cross-validation experiments using 834 pairs of independent validation landmarks, the proposed adversarial-regularized registration achieved a target registration error of 6.3 mm, significantly lower than those from several other regularization methods.

Yipeng Hu, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, Tom Vercauteren, J. Alison Noble, Dean C. Barratt

Fast Registration by Boundary Sampling and Linear Programming

We address the problem of image registration when speed is more important than accuracy. We present a series of simplifications and approximations applicable to almost any pixel-based image similarity criterion. We first sample the image at a set of sparse keypoints in a direction normal to image edges and then create a piecewise linear convex approximation of the individual contributions. We obtain a linear program for which a global optimum can be found very quickly by standard algorithms. The linear program formulation also allows for an easy addition of regularization and trust-region bounds. We have tested the approach for affine and B-spline transformation representations, but any linear model can be used. Larger deformations can be handled by multiresolution. We show that our method is much faster than pixel-based registration, with only a small loss of accuracy. In comparison to standard keypoint-based registration, our method is applicable even if individual keypoints cannot be reliably identified and matched.

Jan Kybic, Jiří Borovec
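
The first stage of such a scheme, sampling sparse keypoints on image edges together with their normal directions, could look like the 2D sketch below; the Sobel gradients, the fixed number of points, and the function name are illustrative choices, and the subsequent piecewise-linear convexification and linear program are not shown.

```python
import numpy as np
from scipy import ndimage

def sample_boundary_keypoints(img, n_points=200):
    """Pick the strongest-edge pixels and return their coordinates plus the
    unit edge-normal (gradient) direction at each keypoint."""
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    mag = np.hypot(gx, gy)
    idx = np.argsort(mag.ravel())[-n_points:]          # strongest edge responses
    ys, xs = np.unravel_index(idx, img.shape)
    normals = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-8
    return np.stack([xs, ys], axis=1), normals
```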

Learning an Infant Body Model from RGB-D Data for Accurate Full Body Motion Analysis

Infant motion analysis enables early detection of neurodevelopmental disorders like cerebral palsy (CP). Diagnosis, however, is challenging, requiring expert human judgement. An automated solution would be beneficial but requires the accurate capture of 3D full-body movements. To that end, we develop a non-intrusive, low-cost, lightweight acquisition system that captures the shape and motion of infants. Going beyond work on modeling adult body shape, we learn a 3D Skinned Multi-Infant Linear body model (SMIL) from noisy, low-quality, and incomplete RGB-D data. SMIL is publicly available for research purposes at http://s.fhg.de/smil. We demonstrate the capture of shape and motion with 37 infants in a clinical environment. Quantitative experiments show that SMIL faithfully represents the data and properly factorizes the shape and pose of the infants. With a case study based on general movement assessment (GMA), we demonstrate that SMIL captures enough information to allow medical assessment. SMIL provides a new tool and a step towards a fully automatic system for GMA.

Nikolas Hesse, Sergi Pujades, Javier Romero, Michael J. Black, Christoph Bodensteiner, Michael Arens, Ulrich G. Hofmann, Uta Tacke, Mijna Hadders-Algra, Raphael Weinberger, Wolfgang Müller-Felber, A. Sebastian Schroeder

Consistent Correspondence of Cone-Beam CT Images Using Volume Functional Maps

Dense correspondence between Cone-Beam CT (CBCT) images is desirable in clinical orthodontics for both intra-patient treatment evaluation and inter-patient statistical shape modeling and attribute transfer. Conventional 3D deformable image registration relies on time-consuming iterative optimization to obtain correspondences. Recent forest-based correspondence methods often incur large offline training costs and require a separate regularization step in post-processing. In this work, we propose an efficient volume functional map for dense and consistent correspondence between CBCT images. We design a group of volume functions specifically for CBCT images and construct a reduced functional space on supervoxels. The low-dimensional map between the limited spectral bases determines the dense supervoxel-wise correspondence in an unsupervised way. Further, we perform consistent functional mapping in a collection of volume images to handle ambiguous correspondences of craniofacial structures, e.g., those due to the intercuspation. A subset of orthonormal volume functional maps is optimized on a Stiefel manifold simultaneously, which determines the cycle-consistent pairwise functional maps in the volume collection. The benefits of the proposed volume functional maps are illustrated in label propagation and segmentation transfer, with improved performance over conventional methods.

Yungeng Zhang, Yuru Pei, Yuke Guo, Gengyu Ma, Tianmin Xu, Hongbin Zha
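
The core of a functional-map correspondence is a small matrix mapping spectral coefficients of descriptor functions from one volume to the other; a generic least-squares version is sketched below. The descriptor matrices, basis matrices, and plain lstsq solve are our assumptions and stand in for the paper's consistent optimisation on the Stiefel manifold.

```python
import numpy as np

def functional_map(F_src, F_tgt, Phi_src, Phi_tgt, k=30):
    """Least-squares functional map C with C @ A ~= B, where A and B are the
    spectral coefficients of corresponding descriptor functions.

    F_*:   descriptors on supervoxels, shape (n_supervoxels, n_descriptors)
    Phi_*: first k basis vectors on each volume, shape (n_supervoxels, k)"""
    A = np.linalg.pinv(Phi_src[:, :k]) @ F_src        # source coefficients (k x d)
    B = np.linalg.pinv(Phi_tgt[:, :k]) @ F_tgt        # target coefficients (k x d)
    X, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)     # solves A.T X = B.T
    return X.T                                        # C, shape (k, k)
```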

Elastic Registration of Geodesic Vascular Graphs

Vascular graphs can embed a number of high-level features, from morphological parameters to functional biomarkers, and represent an invaluable tool for longitudinal and cross-sectional clinical inference. This, however, is only feasible when graphs are co-registered, allowing coherent multiple comparisons. The robust registration of vascular topologies therefore stands as a key enabling technology for group-wise analyses. In this work, we present an end-to-end vascular graph registration approach that aligns networks with non-linear geometries and topological deformations by introducing a novel over-connected geodesic vascular graph formulation, without enforcing any anatomical prior constraint. The 3D elastic graph registration is then performed with state-of-the-art graph matching methods used in computer vision. Promising vascular matching results are obtained for graphs from synthetic and real angiographies. Observations and future designs are discussed towards potential clinical applications.

Stefano Moriconi, Maria A. Zuluaga, H. Rolf Jäger, Parashkev Nachev, Sébastien Ourselin, M. Jorge Cardoso

Efficient Groupwise Registration of MR Brain Images via Hierarchical Graph Set Shrinkage

Accurate and efficient groupwise registration is important for population analysis. Current groupwise registration methods suffer from high computational cost, which hinders their application to large image datasets. To alleviate the computational burden while delivering accurate groupwise registration results, we propose to use a hierarchical graph set to model the complex image distribution with possibly large anatomical variations, and then turn the groupwise registration problem into a series of simple-to-solve graph shrinkage problems. Specifically, we first divide the input images into a set of image clusters hierarchically, where images within each image cluster have similar anatomical appearances whereas images falling into different image clusters have varying anatomical appearances. After clustering, two types of graphs, i.e., intra-graph and inter-graph, are employed to hierarchically model the image distribution both within and across the image clusters. The constructed hierarchical graph set divides the registration problem of the whole image set into a series of simple-to-solve registration problems, so that the entire registration process can be solved accurately and efficiently. The final deformation pathway of each image to the estimated population center is obtained by composing each part of the deformation pathway along the hierarchical graph set. To evaluate the proposed method, we registered a hundred brain images with large anatomical variations. The results indicate that our method yields significant improvement in registration performance over state-of-the-art groupwise registration methods.

Pei Dong, Xiaohuan Cao, Pew-Thian Yap, Dinggang Shen

Initialize Globally Before Acting Locally: Enabling Landmark-Free 3D US to MRI Registration

Registration of partial-view 3D US volumes with MRI data is influenced by initialization. The standard of practice is to use extrinsic or intrinsic landmarks, which can be very tedious to obtain. To overcome the limitations of registration initialization, we present a novel approach based on Euclidean distance maps derived from easily obtainable coarse segmentations. We evaluate our approach on a publicly available brain tumor dataset (RESECT) and show that it is robust to minimal or no overlap of the target area and to varying initial positions. We demonstrate that our method provides initializations that greatly increase the capture range of state-of-the-art nonlinear registration algorithms.

Julia Rackerseder, Maximilian Baust, Rüdiger Göbl, Nassir Navab, Christoph Hennersperger
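
A minimal sketch of the distance-map construction, assuming an isotropic voxel grid and a binary coarse mask; registering two such maps with any standard rigid or affine optimiser would then provide the kind of initialisation described above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(coarse_mask):
    """Signed Euclidean distance map of a coarse binary segmentation
    (negative inside, positive outside, zero on the boundary)."""
    mask = np.asarray(coarse_mask, dtype=bool)
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(~mask)
    return outside - inside
```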

Solving the Cross-Subject Parcel Matching Problem Using Optimal Transport

Matching structural parcels across different subjects is an open problem in neuroscience. Even when produced by the same technique, parcellations tend to differ in the number, shape, and spatial localization of parcels across subjects. In this work, we propose a parcel matching method based on Optimal Transport. We test its performance by matching parcels of the Desikan atlas, parcels based on functional criteria, and structural parcels. We compare our technique against three other parcel-matching approaches based on the Euclidean distance, the cosine similarity, and the Kullback-Leibler divergence. Our results show that our method achieves the highest number of correct matches.

Guillermo Gallardo, Nathalie T. H. Gayraud, Rachid Deriche, Maureen Clerc, Samuel Deslauriers-Gauthier, Demian Wassermann
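
To illustrate the optimal-transport matching step, here is a plain-NumPy Sinkhorn solver for an entropy-regularised transport plan between two parcellations; the cost matrix, mass vectors, regularisation strength, and the final argmax assignment are generic choices that may differ from the exact formulation used in the paper.

```python
import numpy as np

def sinkhorn_parcel_match(cost, a, b, eps=0.05, n_iter=500):
    """Entropy-regularised OT: cost[i, j] is the dissimilarity between parcel i
    of subject A and parcel j of subject B; a and b are their mass vectors."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):                            # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]                 # transport plan
    return plan.argmax(axis=1)                         # hard assignment A -> B
```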

GlymphVIS: Visualizing Glymphatic Transport Pathways Using Regularized Optimal Transport

The glymphatic system (GS) is a transit passage that facilitates brain metabolic waste removal and its dysfunction has been associated with neurodegenerative diseases such as Alzheimer’s disease. The GS has been studied by acquiring temporal contrast enhanced magnetic resonance imaging (MRI) sequences of a rodent brain, and tracking the cerebrospinal fluid injected contrast agent as it flows through the GS. We present here a novel visualization framework, GlymphVIS, which uses regularized optimal transport (OT) to study the flow behavior between time points at which the images are taken. Using this regularized OT approach, we can incorporate diffusion, handle noise, and accurately capture and visualize the time varying dynamics in GS transport. Moreover, we are able to reduce the registration mean-squared and infinity-norm error across time points by up to a factor of 5 as compared to the current state-of-the-art method. Our visualization pipeline yields flow patterns that align well with experts’ current findings of the glymphatic system.

Rena Elkin, Saad Nadeem, Eldad Haber, Klara Steklova, Hedok Lee, Helene Benveniste, Allen Tannenbaum

Hierarchical Spherical Deformation for Shape Correspondence

We present a novel spherical deformation method for landmark-free shape correspondence in a group-wise manner. In this work, we aim at both addressing template selection bias and minimizing registration distortion in a single framework. The proposed spherical deformation yields a non-rigid deformation field without referring to any particular spherical coordinate system. Specifically, we extend a rigid rotation represented by the well-known Euler angles to a general non-rigid local deformation via spatially varying Euler angles. The proposed method employs spherical harmonics interpolation of the local displacements to simultaneously solve for rigid and non-rigid local deformation during the optimization. This leads to a continuous, smooth, and hierarchical representation of the deformation field that minimizes registration distortion. In addition, the proposed method is a group-wise registration that requires no specific template to establish a shape correspondence. In the experiments, we show improved shape correspondence with high accuracy in cortical surface parcellation, as well as significantly lower registration distortion in surface area and edge length compared to existing registration methods, while achieving fast registration in 3 min per subject.

Ilwoo Lyu, Martin A. Styner, Bennett A. Landman

Diffeomorphic Brain Shape Modelling Using Gauss-Newton Optimisation

Shape modelling describes methods aimed at capturing the natural variability of shapes and commonly relies on probabilistic interpretations of dimensionality reduction techniques such as principal component analysis. Due to their computational complexity when dealing with dense deformation models such as diffeomorphisms, previous attempts have focused on explicitly reducing their dimension, de facto diminishing their flexibility and ability to model complex shapes such as brains. In this paper, we present a generative model of shape that allows the covariance structure of deformations to be captured without squashing their domain, resulting in better normalisation. An efficient inference scheme based on Gauss-Newton optimisation is used, which enables processing of 3D neuroimaging data. We trained this algorithm on segmented brains from the OASIS database, generating physiologically meaningful deformation trajectories. To prove the model’s robustness, we applied it to unseen data, which resulted in equivalent fitting scores.

Yaël Balbastre, Mikael Brudfors, Kevin Bronik, John Ashburner
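
For readers less familiar with Gauss-Newton fitting, a generic damped update for a latent shape code is sketched below; the callables `residual` and `jacobian` and the simple lam*I damping are placeholders, since the paper works with full diffeomorphic deformations and a structured covariance rather than this toy setup.

```python
import numpy as np

def gauss_newton_step(residual, jacobian, z, lam=1e-3):
    """One damped Gauss-Newton update for min_z ||residual(z)||^2."""
    r = residual(z)                          # data-fit residual vector
    J = jacobian(z)                          # Jacobian d r / d z at the current z
    H = J.T @ J + lam * np.eye(z.size)       # Gauss-Newton Hessian approximation
    g = J.T @ r                              # gradient direction of the objective
    return z - np.linalg.solve(H, g)
```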

Multi-task SonoEyeNet: Detection of Fetal Standardized Planes Assisted by Generated Sonographer Attention Maps

We present a novel multi-task convolutional neural network called Multi-task SonoEyeNet (M-SEN) that learns to generate clinically relevant visual attention maps using sonographer gaze tracking data on input ultrasound (US) video frames, so as to assist standardized abdominal circumference (AC) plane detection. Our architecture consists of a generator and a discriminator, which are trained in an adversarial scheme. The generator learns sonographer attention on a given US video frame to predict the frame label (standardized AC plane/background). The discriminator further fine-tunes the predicted attention map by encouraging it to mimic the ground-truth sonographer attention map. The novel model expands the potential clinical usefulness of a previous model by eliminating the requirement for input gaze tracking data during inference without compromising its plane detection performance (Precision: 96.8, Recall: 96.2, F1 score: 96.5).

Yifan Cai, Harshita Sharma, Pierre Chatelain, J. Alison Noble

Efficient Laplace Approximation for Bayesian Registration Uncertainty Quantification

This paper presents a novel approach to modeling the posterior distribution in image registration that is computationally efficient for large deformation diffeomorphic metric mapping (LDDMM). We develop a Laplace approximation of Bayesian registration models entirely in a bandlimited space that fully describes the properties of diffeomorphic transformations. In contrast to current methods, we compute the inverse Hessian at the mode of the posterior distribution of diffeomorphisms directly in the low-dimensional frequency domain. This dramatically reduces the computational complexity of approximating posterior marginals in the high-dimensional imaging space. Experimental results show that our method is significantly faster than state-of-the-art diffeomorphic image registration uncertainty quantification algorithms, while producing comparable results. The efficiency of our method strengthens its feasibility for prospective clinical applications, e.g., real-time image-guided navigation for brain surgery.

Jian Wang, William M. Wells, Polina Golland, Miaomiao Zhang
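
In generic form, the Laplace approximation replaces the registration posterior with a Gaussian centred at its mode; the notation below is ours, not the paper's, whose contribution is evaluating the Hessian in a low-dimensional bandlimited space rather than over the full image grid.

```latex
% Generic Laplace approximation of the registration posterior over the
% initial velocity field v (notation ours, not the paper's):
p(v \mid I_0, I_1) \;\approx\; \mathcal{N}\!\left(\hat{v},\; H^{-1}\right),
\qquad
H = -\left.\nabla_v^2 \log p(v \mid I_0, I_1)\right|_{v = \hat{v}},
% where \hat{v} is the MAP estimate from the LDDMM optimisation and the
% posterior marginals follow from the diagonal blocks of H^{-1}.
```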

Backmatter
