
2019 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2019

22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part IV

Editors: Dinggang Shen, Tianming Liu, Terry M. Peters, Lawrence H. Staib, Caroline Essert, Sean Zhou, Pew-Thian Yap, Ali Khan

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 11764, 11765, 11766, 11767, 11768, and 11769 constitutes the refereed proceedings of the 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2019, held in Shenzhen, China, in October 2019.

The 539 revised full papers presented were carefully reviewed and selected from 1730 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: optical imaging; endoscopy; microscopy.

Part II: image segmentation; image registration; cardiovascular imaging; growth, development, atrophy and progression.

Part III: neuroimage reconstruction and synthesis; neuroimage segmentation; diffusion weighted magnetic resonance imaging; functional neuroimaging (fMRI); miscellaneous neuroimaging.

Part IV: shape; prediction; detection and localization; machine learning; computer-aided diagnosis; image reconstruction and synthesis.

Part V: computer assisted interventions; MIC meets CAI.

Part VI: computed tomography; X-ray imaging.

Table of Contents

Frontmatter

Shape (Including Neuroimage Shape)

Frontmatter
A CNN-Based Framework for Statistical Assessment of Spinal Shape and Curvature in Whole-Body MRI Images of Large Populations

The extraction of spines from medical records in a fast yet accurate way is a challenging task, especially for large data sets. Addressing this issue, we present a framework based on convolutional neural networks for the reconstruction of the spinal shape and curvature, making statistical assessments feasible on an epidemiological scale. Our method uses a two-step strategy. First, anchor vertebrae and the spinal centerline between them are extracted. Second, the centerlines are transformed into a common coordinate system to enable comparisons and statistical assessments across subjects. Our networks were trained on 103 subjects, achieving an average accuracy of 3.3 mm while taking at most 1 s per record, which eases the handling of even very large cohorts. Without any further training, we validated our model on study data of about 3400 subjects with only 10 cases of failure, which demonstrates the robustness of our method with respect to the natural variability in spinal shape and curvature. A thorough statistical analysis of the results underpins the importance of our work. Specifically, we show that the spinal curvature is significantly influenced by the body mass index of a subject. Moreover, we show that the same findings arise when Cobb angles are considered instead of direct curvature measures. To this end, we propose a generalization of classical Cobb angles that can be evaluated algorithmically and can also serve as a useful (visual) tool for physicians in everyday clinical practice.

Philipp Ernst, Georg Hille, Christian Hansen, Klaus Tönnies, Marko Rak
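
Editor's illustration (not the authors' code): one natural algorithmic generalization of the Cobb angle, in the spirit described above, is the maximum angle between tangent directions along the extracted spinal centerline. A minimal sketch, assuming the centerline is available as an (N, 3) point array; the function name is hypothetical:

```python
import numpy as np

def generalized_cobb_angle(centerline: np.ndarray) -> float:
    """Maximum angle (degrees) between tangent directions of a sampled centerline."""
    tangents = np.diff(centerline, axis=0)                       # segment directions
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)  # unit tangents
    cosines = np.clip(tangents @ tangents.T, -1.0, 1.0)          # pairwise cos(angle)
    return float(np.degrees(np.arccos(cosines).max()))

# Toy example: a gently C-shaped centerline.
t = np.linspace(0.0, 1.0, 50)
curve = np.stack([0.1 * np.sin(np.pi * t), np.zeros_like(t), t], axis=1)
print(f"generalized Cobb angle: {generalized_cobb_angle(curve):.1f} deg")
```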
Exploiting Reliability-Guided Aggregation for the Assessment of Curvilinear Structure Tortuosity

The study of the tortuosity of curvilinear structures in medical images has been significant in support of the examination and diagnosis of a number of diseases. To avoid the bias that may arise from using one particular tortuosity measurement, the simultaneous use of multiple measurements may offer a promising approach to produce a more robust overall assessment. As such, this paper proposes a data-driven approach for the automated grading of curvilinear structures’ tortuosity, where multiple morphological measurements are aggregated on the basis of reliability to form a robust overall assessment. The proposed pipeline starts by dealing with the imprecision and uncertainty inherently embedded in empirical tortuosity grades, whereby a fuzzy clustering method is applied to each available measurement. The reliability of each measurement is then assessed following a nearest-neighbour-guided approach before the final aggregation is made. Experimental results on two corneal nerve and one retinal vessel data sets demonstrate the superior performance of the proposed method over those where measurements are used independently or aggregated using conventional averaging operators.

Pan Su, Yitian Zhao, Tianhua Chen, Jianyang Xie, Yifan Zhao, Hong Qi, Yalin Zheng, Jiang Liu
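
Editor's illustration of the aggregation step only: once a reliability score has been derived for each tortuosity measurement, the overall assessment can be a reliability-weighted combination. The fuzzy clustering and nearest-neighbour reliability estimation of the paper are omitted; names and values below are illustrative:

```python
import numpy as np

def aggregate_tortuosity(measures: np.ndarray, reliabilities: np.ndarray) -> np.ndarray:
    """measures: (n_samples, n_measures) z-scored tortuosity values;
    reliabilities: (n_measures,) non-negative reliability scores."""
    weights = reliabilities / reliabilities.sum()
    return measures @ weights  # reliability-weighted overall assessment per sample

measures = np.random.default_rng(0).normal(size=(5, 3))
print(aggregate_tortuosity(measures, np.array([0.9, 0.6, 0.3])))
```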
A Surface-Theoretic Approach for Statistical Shape Modeling

We present a novel approach for nonlinear statistical shape modeling that is invariant under Euclidean motion and thus alignment-free. By analyzing metric distortion and curvature of shapes as elements of Lie groups in a consistent Riemannian setting, we construct a framework that reliably handles large deformations. Due to the explicit character of Lie group operations, our non-Euclidean method is very efficient, allowing for fast and numerically robust processing. This facilitates Riemannian analysis of large shape populations accessible through longitudinal and multi-site imaging studies, providing increased statistical power. We evaluate the performance of our model w.r.t. shape-based classification of pathological malformations of the human knee and show that it outperforms the standard Euclidean as well as a recent nonlinear approach, especially in the presence of sparse training data. To provide insight into the model’s ability to capture natural biological shape variability, we carry out an analysis of specificity and generalization ability.

Felix Ambellan, Stefan Zachow, Christoph von Tycowicz
One-Stage Shape Instantiation from a Single 2D Image to 3D Point Cloud

Shape instantiation, which predicts the 3D shape of a dynamic target from one or more 2D images, is important for real-time intra-operative navigation. Previously, a general shape instantiation framework was proposed with manual image segmentation to generate a 2D Statistical Shape Model (SSM) and with Kernel Partial Least Square Regression (KPLSR) to learn the relationship between the 2D and 3D SSM for 3D shape prediction. In this paper, the two-stage shape instantiation is improved to be one-stage. PointOutNet, with 19 convolutional layers and three fully-connected layers, is used as the network structure and the Chamfer distance is used as the loss function to predict the 3D target point cloud from a single 2D image. With the proposed one-stage shape instantiation algorithm, direct image-to-point-cloud training and inference can be achieved. A dataset from 27 Right Ventricle (RV) subjects, comprising 609 experiments, was used to validate the proposed one-stage shape instantiation algorithm. An average point cloud-to-point cloud (PC-to-PC) error of 1.72 mm has been achieved, which is comparable to the PLSR-based (1.42 mm) and KPLSR-based (1.31 mm) two-stage shape instantiation algorithms.

Xiao-Yun Zhou, Zhao-Yang Wang, Peichao Li, Jian-Qing Zheng, Guang-Zhong Yang
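
Editor's illustration of the Chamfer distance named in the abstract (whether the authors use squared distances is not stated; plain Euclidean distances are assumed here):

```python
import numpy as np
from scipy.spatial.distance import cdist

def chamfer_distance(pc_a: np.ndarray, pc_b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    d = cdist(pc_a, pc_b)  # (N, M) pairwise Euclidean distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

rng = np.random.default_rng(0)
a, b = rng.normal(size=(100, 3)), rng.normal(size=(120, 3))
print(f"Chamfer distance: {chamfer_distance(a, b):.3f}")
```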
Placental Flattening via Volumetric Parameterization

We present a volumetric mesh-based algorithm for flattening the placenta to a canonical template to enable effective visualization of local anatomy and function. Monitoring placental function in vivo promises to support pregnancy assessment and to improve care outcomes. We aim to alleviate visualization and interpretation challenges presented by the shape of the placenta when it is attached to the curved uterine wall. To do so, we flatten the volumetric mesh that captures placental shape to resemble the well-studied ex vivo shape. We formulate our method as a map from the in vivo shape to a flattened template that minimizes the symmetric Dirichlet energy to control distortion throughout the volume. Local injectivity is enforced via constrained line search during gradient descent. We evaluate the proposed method on 28 placenta shapes extracted from MRI images in a clinical study of placental function. We achieve sub-voxel accuracy in mapping the boundary of the placenta to the template while successfully controlling distortion throughout the volume. We illustrate how the resulting mapping of the placenta enhances visualization of placental anatomy and function. Our implementation is freely available at https://github.com/mabulnaga/placenta-flattening .

S. Mazdak Abulnaga, Esra Abaci Turk, Mikhail Bessmeltsev, P. Ellen Grant, Justin Solomon, Polina Golland
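
Editor's illustration of the distortion measure named above: the symmetric Dirichlet energy, evaluated from the per-element Jacobian J of the map, penalizes both stretch (the squared Frobenius norm of J) and compression (that of its inverse), and blows up as an element degenerates, which is what makes it suitable for controlling volumetric distortion. A minimal sketch, assuming per-tetrahedron Jacobians are already assembled:

```python
import numpy as np

def symmetric_dirichlet(jacobians: np.ndarray) -> float:
    """jacobians: (n_elements, 3, 3) per-tetrahedron Jacobians of the flattening map."""
    inv = np.linalg.inv(jacobians)
    stretch = np.sum(jacobians**2, axis=(1, 2))   # ||J||_F^2
    compress = np.sum(inv**2, axis=(1, 2))        # ||J^{-1}||_F^2
    return float((stretch + compress).mean())

identity_maps = np.tile(np.eye(3), (10, 1, 1))
print(symmetric_dirichlet(identity_maps))  # 6.0: the minimum, attained by rigid maps
```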
Fast Polynomial Approximation to Heat Diffusion in Manifolds

Heat diffusion has been widely used in image processing for surface fairing, mesh regularization and surface data smoothing. We present a new fast and accurate numerical method to solve heat diffusion on curved surfaces. This is achieved by approximating the heat kernel using high degree orthogonal polynomials in the spectral domain. The proposed polynomial expansion method avoids solving for the eigenfunctions of the Laplace-Beltrami operator, which is computationally costly for large-scale surface meshes, and the numerical instability associated with the finite element method based diffusion solvers. We apply the proposed method to localize the sex differences in cortical brain sulcal and gyral curve patterns.

Shih-Gu Huang, Ilwoo Lyu, Anqi Qiu, Moo K. Chung
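
Editor's illustration of the eigendecomposition-free recipe: the action of the heat kernel exp(-tL) on a signal f can be approximated by a polynomial expansion in the Laplacian L. The paper's specific orthogonal polynomial family is not reproduced here; a Chebyshev expansion, assuming the spectrum of L lies in [0, lmax], shows the general idea:

```python
import numpy as np

def heat_filter(L: np.ndarray, f: np.ndarray, t: float,
                degree: int = 30, lmax: float = 2.0) -> np.ndarray:
    """Approximate exp(-t L) f with a degree-`degree` Chebyshev expansion."""
    n = L.shape[0]
    Ls = (2.0 / lmax) * L - np.eye(n)                 # rescale spectrum into [-1, 1]
    k = np.arange(degree + 1)
    theta = np.pi * (k + 0.5) / (degree + 1)          # Chebyshev nodes
    h = np.exp(-t * lmax * (np.cos(theta) + 1.0) / 2.0)
    c = 2.0 / (degree + 1) * np.cos(np.outer(k, theta)) @ h  # expansion coefficients
    T_prev, T_curr = f, Ls @ f                        # T_0 f and T_1 f
    out = 0.5 * c[0] * T_prev + c[1] * T_curr
    for j in range(2, degree + 1):                    # three-term recurrence
        T_prev, T_curr = T_curr, 2.0 * Ls @ T_curr - T_prev
        out += c[j] * T_curr
    return out

# Usage on a 4-node path graph (normalized Laplacian spectrum lies in [0, 2]).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
Ds = np.diag(1.0 / np.sqrt(A.sum(1)))
L = np.eye(4) - Ds @ A @ Ds
print(heat_filter(L, np.array([1.0, 0.0, 0.0, 0.0]), t=1.0))
```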
Hierarchical Multi-geodesic Model for Longitudinal Analysis of Temporal Trajectories of Anatomical Shape and Covariates

Longitudinal regression analysis for clinical imaging studies is essential to investigate unknown relationships between subject-wise changes over time and subject-specific characteristics, represented by covariates such as disease severity or a level of genetic risk. Image-derived data in medical image analysis, e.g. diffusion tensors or geometric shapes, are often represented on nonlinear Riemannian manifolds. Hierarchical geodesic models were suggested to characterize subject-specific changes of nonlinear data on Riemannian manifolds as extensions of a linear mixed effects model. We propose a new hierarchical multi-geodesic model to enable analysis of the relationship between subject-wise anatomical shape changes on a Riemannian manifold and multiple subject-specific characteristics. Each individual subject-wise shape change is represented by a univariate geodesic model. The effects of subject-specific covariates on the estimated subject-wise trajectories are then modeled by multivariate intercept and slope models which together form a multi-geodesic model. Validation was performed with a synthetic example on the S² manifold. The proposed method was applied to a longitudinal set of 72 corpus callosum shapes from 24 autism spectrum disorder subjects to study the relationship between anatomical shape changes and the autism severity score, resulting in statistics not only for the population but also for each subject. To our knowledge, this is the first longitudinal framework to model anatomical developments over time as functions of both continuous and categorical covariates on a nonlinear shape space.

Sungmin Hong, James Fishbaugh, Jason J. Wolff, Martin A. Styner, Guido Gerig, the IBIS Network
Clustering of Longitudinal Shape Data Sets Using Mixture of Separate or Branching Trajectories

Several methods have been proposed recently to learn spatiotemporal models of shape progression from repeated observations of several subjects over time, i.e. a longitudinal data set. These methods summarize the population by a single common trajectory in a supervised manner. In this paper, we propose to extend such approaches to an unsupervised setting where a longitudinal data set is automatically clustered into different classes without labels. Our method learns for each cluster an average shape trajectory (or representative curve) and its variance in space and time. Representative trajectories are built as the combination of pieces of curves. This mixture model is flexible enough to handle independent trajectories for each cluster as well as fork and merge scenarios. The estimation of such nonlinear mixture models in high dimensions is known to be difficult because of the trapping states effect that hampers the optimisation of cluster assignments during training. We address this issue by using a tempered version of the stochastic EM algorithm. Finally, we apply our algorithm to synthetic data to validate that the tempered scheme achieves better convergence. We then show how the method can be used to test different scenarios of hippocampus atrophy in ageing, using a heterogeneous population of normally ageing individuals and subjects with mild cognitive impairment.

Vianney Debavelaere, Alexandre Bône, Stanley Durrleman, Stéphanie Allassonnière, for the Alzheimer’s Disease Neuroimaging Initiative
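
Editor's illustration of the tempering idea only: in the E-step, cluster responsibilities are flattened with a temperature T > 1 early in training so samples can still switch clusters, and T is annealed toward 1. The full stochastic EM over shape trajectories is far richer than this sketch:

```python
import numpy as np

def tempered_responsibilities(log_lik: np.ndarray, temperature: float) -> np.ndarray:
    """log_lik: (n_samples, n_clusters) per-cluster log-likelihoods."""
    z = log_lik / temperature              # T > 1 flattens the posterior
    z -= z.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

log_lik = np.array([[-1.0, -5.0]])
print(tempered_responsibilities(log_lik, 1.0))   # sharp assignment
print(tempered_responsibilities(log_lik, 10.0))  # nearly uniform, early in training
```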
Group-Wise Graph Matching of Cortical Gyral Hinges

Human brain image alignment has long been an intriguing research topic. The difficulty lies in the huge inter-individual variation. Also, it is not fully understood how structural similarity across subjects is related to functional correspondence. Recently, a gyral folding pattern, which is the conjunction of gyri from multiple directions and termed gyral hinge, was characterized. Gyral hinges have been demonstrated to have structural and functional importance and some of them were found to have cross-subject correspondences by manual labeling. However, there is no automatic method to estimate the cross-subject correspondences for whole-brain gyral hinges yet. To this end, we propose a novel group-wise graph matching framework, to which we feed structural connective matrices among gyral hinges from all subjects. The correspondence estimated by this framework is demonstrated by cross-subject consistency of both structural connective and functional profiles. Also, our results outperform the correspondences identified by pairwise graph matching and image-based registration methods.

Tuo Zhang, Xiao Li, Lin Zhao, Ying Huang, Zhibin He, Lei Guo, Tianming Liu
Multi-view Graph Matching of Cortical Landmarks

Human brain image alignment based on cortical folding patterns has long been an intriguing yet challenging research topic. Recently, a new gyral folding pattern, termed the gyral hinge, was proposed and characterized by the conjunction of gyri from multiple directions. The value of gyral hinges lies in their structural and functional importance and potential cross-subject correspondence, making it possible to use them as cortical landmarks. However, such an anatomical correspondence based on these new cortical landmarks has not been fully studied or related to structural connective similarity and functional coherence. Thus, we investigate whether the single use of structural connective or functional interactive diagrams, or their joint use, could improve the alignment of these gyral hinges. Based on the pairwise graph matching method, we propose a multi-view framework in which all gyral hinges within a subject are taken as a system and its structural and functional connective networks are used as inputs. The results demonstrate that the joint use of structural and functional profiles outperforms the use of either alone as well as image-registration-based methods.

Ying Huang, Zhibin He, Lei Guo, Tianming Liu, Tuo Zhang
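
Editor's illustration for the two graph-matching entries above, using a deliberately simpler baseline than either paper: given connectivity profiles of gyral hinges in two subjects, a one-to-one correspondence can be obtained with the Hungarian algorithm on a profile-dissimilarity cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_hinges(profiles_a: np.ndarray, profiles_b: np.ndarray) -> np.ndarray:
    """profiles_a, profiles_b: (n, d) connectivity feature vectors per gyral hinge."""
    cost = cdist(profiles_a, profiles_b, metric="correlation")  # 1 - Pearson r
    _, cols = linear_sum_assignment(cost)
    return cols  # cols[i] is the hinge in subject B matched to hinge i in subject A

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
b = a[::-1] + 0.01 * rng.normal(size=(8, 16))  # reversed, noisy copy of a
print(match_hinges(a, b))                      # recovers the reversal [7 6 ... 0]
```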
Patient-Specific Conditional Joint Models of Shape, Image Features and Clinical Indicators

We propose and demonstrate a joint model of anatomical shapes, image features and clinical indicators for statistical shape modeling and medical image analysis. The key idea is to employ a copula model to separate the joint dependency structure from the marginal distributions of variables of interest. This separation provides flexibility on the assumptions made during the modeling process. The proposed method can handle binary, discrete, ordinal and continuous variables. We demonstrate a simple and efficient way to include binary, discrete and ordinal variables into the modeling. We build Bayesian conditional models from partially observed clinical indicators, features or shape, using Gaussian processes to capture the dependency structure. We apply the proposed method on a stroke dataset to jointly model the shape of the lateral ventricles, the spatial distribution of the white matter hyperintensity associated with periventricular white matter disease, and clinical indicators. The proposed method yields interpretable joint models for data exploration and patient-specific statistical shape models for medical image analysis.

Bernhard Egger, Markus D. Schirmer, Florian Dubost, Marco J. Nardin, Natalia S. Rost, Polina Golland
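
Editor's illustration of the copula idea in its simplest (Gaussian) form: each variable is mapped to a normal score through its empirical marginal CDF, the joint dependency is summarized by one correlation matrix in that latent space, and conditioning on observed variables reduces to Gaussian conditioning. The paper's handling of binary/ordinal variables and its Gaussian-process structure are not shown:

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(x: np.ndarray) -> np.ndarray:
    """Map each column of x to N(0, 1) scores via its empirical CDF."""
    ranks = rankdata(x, axis=0) / (x.shape[0] + 1)
    return norm.ppf(ranks)

def conditional_gaussian(R, obs_idx, obs_vals, query_idx):
    """Mean and covariance of z[query_idx] given z[obs_idx] = obs_vals under N(0, R)."""
    Roo = R[np.ix_(obs_idx, obs_idx)]
    Rqo = R[np.ix_(query_idx, obs_idx)]
    Rqq = R[np.ix_(query_idx, query_idx)]
    K = Rqo @ np.linalg.inv(Roo)
    return K @ obs_vals, Rqq - K @ Rqo.T

cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
x = np.random.default_rng(0).multivariate_normal(np.zeros(3), cov, size=200)
z = normal_scores(x)
R = np.corrcoef(z, rowvar=False)
mu, cv = conditional_gaussian(R, [0, 1], z[0, :2], [2])
print(mu, cv)  # prediction for the third variable given the first two
```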
Surface-Based Spatial Pyramid Matching of Cortical Regions for Analysis of Cognitive Performance

We propose a method to analyze the relationship between the shape of functional regions of the cortex and cognitive measures, such as reading ability and vocabulary knowledge. Functional regions on the cortical surface can vary not only in size and shape but also in topology and position relative to neighboring regions. Standard diffeomorphism-based shape analysis tools do not work well here because diffeomorphisms are unable to capture these topological differences, which include region splitting and merging across subjects. State-of-the-art cortical surface shape analyses compute derived regional properties (scalars), such as regional volume, cortical thickness, curvature, and gyrification index. However, these methods cannot compare the full extent of topological or shape differences in cortical regions. We propose icosahedral spatial pyramid matching (ISPM) of region borders computed on the surface of a sphere to capture this variation in regional topology, position, and shape. We then analyze how this variation corresponds to measures of cognitive performance. We compare our method to other approaches and find that it is indeed informative to consider aspects of shape beyond the standard approaches. Analysis is performed using a subset of 27 test/retest subjects from the Human Connectome Project in order to understand both the effectiveness and reproducibility of this method.

Kristen M. Campbell, Jeffrey S. Anderson, P. Thomas Fletcher

Prediction

Frontmatter
Diagnosis-Guided Multi-modal Feature Selection for Prognosis Prediction of Lung Squamous Cell Carcinoma

Existing studies have demonstrated that the integrative analysis of histopathological images and genomic data can hold great promise for survival analysis of cancers. However, direct combination of multi-modal data may bring irrelevant or redundant features that will harm the prognosis performance. Therefore, it has become a challenge to select informative features from the derived heterogeneous data for survival analysis. Most existing feature selection methods only utilize the collected multi-modal data and survival information to identify a subset of relevant features, neglecting to use the diagnosis information to guide the feature selection process. In fact, the diagnosis information (e.g., TNM stage) indicates the extent of disease severity, which is highly correlated with the patients’ survival. Accordingly, we propose a diagnosis-guided multi-modal feature selection method (DGM2FS) for prognosis prediction. Specifically, we make use of the task relationship learning framework to automatically discover the relations between the diagnosis and prognosis tasks, through which we can identify important survival-associated image and eigengene features with the help of diagnosis information. In addition, we also consider the association between the multi-modal data and use a regularization term to capture the correlation between the image and eigengene data. Experimental results on a lung squamous cell carcinoma dataset imply that incorporating diagnosis information can help identify meaningful survival-associated features, by which we can achieve better prognosis prediction performance than conventional methods.

Wei Shao, Tongxin Wang, Zhi Huang, Jun Cheng, Zhi Han, Daoqiang Zhang, Kun Huang
Graph Convolution Based Attention Model for Personalized Disease Prediction

Clinicians implicitly incorporate the complementarity of multi-modal data for disease diagnosis. Often a varied order of importance for this heterogeneous data is considered for personalized decisions. Current learning-based methods have achieved better performance with uniform attention to individual information, but very few have focused on patient-specific attention learning schemes for each modality. Towards this, we introduce a model which not only improves the disease prediction but also focuses on learning a patient-specific order of importance for multi-modal data elements. To achieve this, we take advantage of an LSTM-based attention mechanism and graph convolutional networks (GCNs) to design our model. GCNs learn multi-modal but class-specific features from the entire population of patients, whereas the attention mechanism optimally fuses these multi-modal features into a final decision, separately for each patient. In this paper, we apply the proposed approach to the disease prediction task for Parkinson’s and Alzheimer’s disease using two public medical datasets.

Anees Kazi, Shayan Shekarforoush, S. Arvind Krishna, Hendrik Burwinkel, Gerome Vivar, Benedict Wiestler, Karsten Kortüm, Seyed-Ahmad Ahmadi, Shadi Albarqouni, Nassir Navab
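
Editor's illustration of a single graph-convolution layer of the kind the model builds on (the LSTM-based attention fusion is not shown): features are propagated over the symmetrically normalized adjacency with self-loops, then passed through a ReLU:

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """A: (n, n) adjacency; H: (n, d_in) node features; W: (d_in, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # propagate, project, ReLU

rng = np.random.default_rng(0)
A = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1)
A += A.T                                      # symmetric patient-similarity graph
H = rng.normal(size=(5, 8))
W = rng.normal(size=(8, 4))
print(gcn_layer(A, H, W).shape)               # (5, 4)
```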
Predicting Early Stages of Neurodegenerative Diseases via Multi-task Low-Rank Feature Learning

Early stages of neurodegenerative diseases draw increasing recognition as obscure symptoms may appear before classical clinical diagnosis. For this reason, we propose a novel multi-task low-rank feature learning method, which takes advantage of the sparsity and low-rankness of neuroimaging data for Parkinson’s Disease (PD) and Alzheimer’s Disease (AD) multi-classification. First, low-rank learning is proposed to unveil the underlying relationships between input data and output targets by preserving the most class-discriminative features. Multi-task learning is simultaneously performed to capture intrinsic feature relatedness. A sparse linear regression framework is designed to find the low-dimensional structure of high-dimensional data. Experimental results on the Parkinson’s Progression Markers Initiative (PPMI) and Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets show that our proposed model not only enhances the performance of multi-classification tasks but also outperforms conventional algorithms.

Haijun Lei, Yujia Zhao, Baiying Lei
Improved Prediction of Cognitive Outcomes via Globally Aligned Imaging Biomarker Enrichments over Progressions

Incomplete or inconsistent temporal neuroimaging records of patients over time pose a major challenge to accurately predict clinical scores for diagnosing Alzheimer’s Disease (AD). In this paper, we present an unsupervised method to learn enriched imaging biomarker representations that can simultaneously capture the information conveyed by all the baseline neuroimaging measures and the progressive variations of the available follow-up measurements of every participant. Our experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset show improved performance in predicting cognitive outcomes thereby demonstrating the effectiveness of our proposed method.

Lyujian Lu, Saad Elbeleidy, Lauren Baker, Hua Wang, Heng Huang, Li Shen, for the ADNI
Deep Granular Feature-Label Distribution Learning for Neuroimaging-Based Infant Age Prediction

Neuroimaging-based infant age prediction is important for brain development analysis but often suffers from insufficient data. To address this challenge, we introduce label distribution learning (LDL), a popular machine learning paradigm focusing on the small-sample problem, for infant age prediction. As directly applying LDL yields a dramatically increased number of day-to-day age labels and extremely scarce data describing each label, we propose a new strategy, called granular label distribution (GLD). Particularly, by assembling adjacent labels into granules and designing granular distributions, GLD makes each brain MRI contribute not only to its own age but also to its neighboring ages at a granule scale, which effectively keeps the information augmentation superiority of LDL and reduces the number of labels. Furthermore, to maximally augment the information supplied by the small data, we propose a novel method named granular feature distribution (GFD). GFD leverages the variability of the brain images at the same age, thus significantly increasing the learning effectiveness. Moreover, a deep neural network is exploited to approximate the GLD. These strategies constitute a new model: deep granular feature-label distribution learning (DGFLDL). By taking 8 types of cortical morphometric features from structural MRI as predictors, the proposed DGFLDL is validated on infant age prediction using 384 brain MRI scans from 35 to 848 days after birth. Our proposed method, achieving a mean absolute error of 36.1 days, significantly outperforms the baseline methods. Besides, the permutation importance analysis of features based on our method reveals important biomarkers of infant brain development.

Dan Hu, Han Zhang, Zhengwang Wu, Weili Lin, Gang Li, Dinggang Shen, for UNC/UMN Baby Connectome Project Consortium
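
Editor's illustration of the granular-label idea: day-scale ages are grouped into coarser granules and each scan receives a soft distribution over granules rather than a one-hot day label, so neighbouring ages also get probability mass. The granule width, age range and Gaussian spread below are illustrative, not the paper's settings:

```python
import numpy as np

def granular_label(age_days: float, granule_width: int = 30,
                   max_age: int = 900, sigma_granules: float = 1.0) -> np.ndarray:
    """Soft label distribution over age granules for one scan."""
    centers = np.arange(granule_width / 2, max_age, granule_width)
    d = (centers - age_days) / (sigma_granules * granule_width)
    p = np.exp(-0.5 * d**2)
    return p / p.sum()

label = granular_label(100.0)
print(label.argmax(), round(float(label.max()), 3))  # peaks at the granule holding day 100
```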
End-to-End Dementia Status Prediction from Brain MRI Using Multi-task Weakly-Supervised Attention Network

Computer-aided prediction of dementia status (e.g., clinical scores of cognitive tests) from brain MRI is of great clinical value, as it can help assess pathological stage and predict disease progression. Existing learning-based approaches typically preselect dementia-sensitive regions from the whole-brain MRI for feature extraction and prediction model construction, which might be sub-optimal due to potential heterogeneities between different steps. Also, based on anatomical prior knowledge (e.g., brain atlas) and time-consuming nonlinear registration, these preselected brain regions are usually the same across all subjects, ignoring their individual specificities in dementia progression. In this paper, we propose a multi-task weakly-supervised attention network (MWAN) to jointly predict multiple clinical scores from the baseline MRI data, by explicitly considering individual specificities of different subjects. Leveraging a fully-trainable dementia attention block, our MWAN method can automatically identify subject-specific discriminative locations from the whole-brain MRI for end-to-end feature learning and multi-task regression. We evaluated our MWAN method by cross-validation on two public datasets (i.e., ADNI-1 and ADNI-2). Experimental results demonstrate that the proposed method performs well in both the tasks of clinical score prediction and weakly-supervised discriminative localization in brain MR images.

Chunfeng Lian, Mingxia Liu, Li Wang, Dinggang Shen
Unified Modeling of Imputation, Forecasting, and Prediction for AD Progression

In this paper, we propose a novel deep recurrent neural network as an Alzheimer’s Disease (AD) progression model, capable of jointly conducting the tasks of missing-value imputation, phenotypic measurement forecasting, and clinical state prediction of a subject based on his/her longitudinal imaging biomarkers. Unlike existing methods that mostly ignore missing values or impute them by means of an independent statistical model before training a disease progression model, we devise a unified recurrent network architecture for jointly performing missing-value imputation, biomarker value forecasting, and clinical state prediction from longitudinal data. For these tasks to be handled in a unified framework, we also define an objective function that can be efficiently optimized by means of stochastic gradient descent in an end-to-end manner. We validated the effectiveness of our proposed method by comparing it with competing methods on the TADPOLE challenge cohort.

Wonsik Jung, Ahmad Wisnu Mulyadi, Heung-Il Suk
LSTM Network for Prediction of Hemorrhagic Transformation in Acute Stroke

Hemorrhagic transformation (HT) is one of the most devastating complications of reperfusion therapy in acute ischemic stroke. Prediction of an upcoming HT remains beyond current techniques in routine clinical practice. If made available, such information would benefit the management of acute ischemic stroke patients and help to tailor therapeutic strategies. This study aims at providing a machine learning framework for predicting the occurrence and extent of HT from source perfusion-weighted magnetic resonance imaging (PWI) combined with diffusion-weighted imaging (DWI). The model relies on an LSTM network that combines PWI and DWI imaging features through a fully connected neural network. A retrospective comparative analysis performed on 155 acute stroke patients demonstrates the efficacy of the LSTM model (AUC-ROC: 89.4%) against state-of-the-art machine learning models. The predicted likelihood of HT at the voxel level was evaluated against HT annotations by stroke neurologists obtained from follow-up gradient recalled echo (GRE) imaging.

Yannan Yu, Bhargav Parsi, William Speier, Corey Arnold, Min Lou, Fabien Scalzo
Inter-modality Dependence Induced Data Recovery for MCI Conversion Prediction

Learning complementary information from multi-modality data often improves the diagnostic performance for brain disorders. However, it is challenging to obtain this complementary information when the data are incomplete. Existing methods, such as low-rank matrix completion (which imputes the missing data) and multi-task learning (which restructures the problem into the joint learning of multiple tasks, with each task associated with a subset of complete data), simply concatenate features from different modalities without considering their underlying correlations. Furthermore, most methods conduct multi-modality fusion and prediction model learning in separate steps, which may lead to a sub-optimal solution. To address these issues, we propose a novel diagnostic model that integrates missing data recovery, latent space learning and prediction model learning into a unified framework. Specifically, we first recover the missing modality by maximizing the dependency among different modalities. Then, we further exploit the modality correlation by projecting different modalities into a common latent space. Besides, we employ an ℓ1-norm in our loss function to mitigate the influence of sample outliers. Finally, we map the learned latent representation into the label space. All these tasks are learned iteratively in a unified framework, where the label information (from the training samples) can also inherently guide the missing modality recovery. Experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset show the effectiveness of our method.

Tao Zhou, Kim-Han Thung, Yu Zhang, Huazhu Fu, Jianbing Shen, Dinggang Shen, Ling Shao
Preprocessing, Prediction and Significance: Framework and Application to Brain Imaging

Brain imaging studies have set the stage for measuring brain function in psychiatric disorders, such as depression, with the goal of developing effective treatment strategies. However, data arising from such studies are often hampered by noise confounds such as motion-related artifacts, affecting both the spatial and temporal correlation structure of the data. Failure to adequately control for these types of noise can have a significant impact on subsequent statistical analyses. In this paper, we demonstrate a framework for extending the non-parametric testing of statistical significance in predictive modeling by including a plausible set of preprocessing strategies to measure the predictive power. Our approach adopts permutation tests to estimate how likely we are to obtain a given predictive performance in an independent sample, depending on the preprocessing strategy used to generate the data. We demonstrate and apply the framework on examples of longitudinal Positron Emission Tomography (PET) data following a pharmacological intervention.

Martin Nørgaard, Brice Ozenne, Claus Svarer, Vibe G. Frokjaer, Martin Schain, Stephen C. Strother, Melanie Ganz
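
Editor's illustration of the core permutation test: the null distribution of a predictive score is estimated by re-evaluating after randomly permuting the labels, and the p-value is the fraction of permuted scores at least as good as the observed one. The paper additionally repeats this across preprocessing strategies; the score function here is a stand-in:

```python
import numpy as np

def permutation_pvalue(score_fn, X, y, n_perm: int = 1000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    null = np.array([score_fn(X, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Stand-in score: |correlation| between the first feature and the outcome.
score = lambda X, y: abs(np.corrcoef(X[:, 0], y)[0, 1])
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X[:, 0] + 0.5 * rng.normal(size=50)
print(permutation_pvalue(score, X, y, n_perm=200))  # small p: real association
```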
Early Prediction of Alzheimer’s Disease Progression Using Variational Autoencoders

Prediction of Alzheimer’s disease before the onset of symptoms is an important clinical challenge, as it offers the potential for earlier intervention to interrupt disease progression before the development of dementia symptoms, as well as spur new prevention and treatment avenues. In this work, we propose a model that learns how to predict Alzheimer’s disease ahead of time from structural Magnetic Resonance Imaging (sMRI) data. The contributions of this work are two-fold: (i) We use the latent variables learned by our model to visualize areas of the brain, which contribute to confident decisions. Our model appears to be focusing on specific areas of the neocortex, cerebellum, and brainstem, which are known to be clinically relevant. (ii) There are various ways in which disease might evolve from a patient’s current physiological state. We can leverage the latent variables in our model to capture the uncertainty over possible future patient outcomes. It can help identify and closely monitor people who are at a higher risk of disease, despite the current lack of clinical indications.

Sumana Basu, Konrad Wagstyl, Azar Zandifar, Louis Collins, Adriana Romero, Doina Precup
Integrating Heterogeneous Brain Networks for Predicting Brain Disease Conditions

Human brain networks convey important insights for understanding the mechanisms of many mental disorders. However, it is difficult to determine a universally optimal choice among the various tractography methods for general diagnosis tasks. To address this issue, tentative studies, aiming at the identification of some mental disorders, make an effective concession by exploiting multi-modal brain networks. In this paper, we propose to predict clinical measures as a more comprehensive and stable assessment of brain abnormalities. We develop a graph convolutional network (GCN) framework to integrate heterogeneous brain networks. Particularly, an adaptive pooling scheme is designed, catering to the modal structural diversity and sharing the advantages of locality, loyalty and likely as in standard convolutional networks. The experimental results demonstrate that our method achieves state-of-the-art prediction results, and validates the advantages of utilizing multi-modal brain networks, in that more modalities are always at least as good as the best single modality, if not better.

Yanfu Zhang, Liang Zhan, Weidong Cai, Paul Thompson, Heng Huang

Detection and Localization

Frontmatter
Uncertainty-Informed Detection of Epileptogenic Brain Malformations Using Bayesian Neural Networks

Focal cortical dysplasia (FCD) is a prevalent surgically-amenable epileptogenic malformation of cortical development. On MRI, FCD typically presents with cortical thickening, hyperintensity, and blurring of the gray-white matter interface. These changes may be visible to the naked eye, or subtle and easily overlooked. Despite advances in MRI analytics, current machine learning algorithms fail to detect FCD in up to 50% of cases. Moreover, the deterministic nature of current algorithms does not allow conducting risk assessments of such predictions, an essential step in clinical decision-making. Here, we propose an algorithm based on Bayesian convolutional neural networks (CNNs) that provides information on prediction uncertainty, while leveraging this information to improve classification performance. Our classifier was trained on a patch-based augmented dataset derived from 56 patients with histologically-validated FCD to distinguish the lesion from healthy tissue. The algorithm was trained and cross-validated on multimodal 3T MRI data. Compared to a non-Bayesian learner with the same network architecture and complexity, the uncertainty-informed Bayesian CNN classifier showed a significant improvement in sensitivity (89% vs 82%; p < 0.05), while specificity was high for both classifiers. We demonstrate empirically the effectiveness of our uncertainty-informed CNN algorithm, making it ideal for large-scale clinical diagnostics of FCD.

Ravnoor S. Gill, Benoit Caldairou, Neda Bernasconi, Andrea Bernasconi
Automated Lesion Detection by Regressing Intensity-Based Distance with a Neural Network

Localization of focal vascular lesions on brain MRI is an important component of research on the etiology of neurological disorders. However, manual annotation of lesions can be challenging, time-consuming and subject to observer bias. Automated detection methods often need voxel-wise annotations for training. We propose a novel approach for automated lesion detection that can be trained on scans only annotated with a dot per lesion instead of a full segmentation. From the dot annotations and their corresponding intensity images we compute various distance maps (DMs), indicating the distance to a lesion based on spatial distance, intensity distance, or both. We train a fully convolutional neural network (FCN) to predict these DMs for unseen intensity images. The local optima in the predicted DMs are expected to correspond to lesion locations. We show the potential of this approach to detect enlarged perivascular spaces in white matter on a large brain MRI dataset with an independent test set of 1000 scans. Our method matches the intra-rater performance of the expert rater that was computed on an independent set. We compare the different types of distance maps, showing that incorporating intensity information in the distance maps used to train an FCN greatly improves performance.

Kimberlin M. H. van Wijnen, Florian Dubost, Pinar Yilmaz, M. Arfan Ikram, Wiro J. Niessen, Hieab Adams, Meike W. Vernooij, Marleen de Bruijne
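
Editor's illustration of building a regression target from dot annotations: the Euclidean distance transform gives every voxel its distance to the nearest annotated lesion dot. The paper's intensity-based and hybrid distance maps are not reproduced here:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def spatial_distance_map(shape, dot_coords) -> np.ndarray:
    """shape: image shape; dot_coords: iterable of voxel indices of lesion dots."""
    dots = np.zeros(shape, dtype=bool)
    for c in dot_coords:
        dots[tuple(c)] = True
    # distance_transform_edt measures distance to the nearest zero entry,
    # so the mask is inverted to make the dots the zeros.
    return distance_transform_edt(~dots)

dm = spatial_distance_map((64, 64, 32), [(10, 20, 5), (40, 50, 16)])
print(dm[10, 20, 5], dm.max())  # 0.0 at an annotated dot; grows away from lesions
```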
Intracranial Aneurysm Detection from 3D Vascular Mesh Models with Ensemble Deep Learning

Intracranial aneurysm rupture can cause a serious stroke, which is related to declining ability of the elderly in daily life. Although deep learning is now the most successful solution for organ detection, it requires myriads of training data, consistency of image format, and a balanced sample distribution. This work presents an innovative representation of intracranial aneurysm detection as a shape analysis problem rather than a computer vision problem. We detect intracranial aneurysms in 3D cerebrovascular mesh models after segmentation of the brain vessels from the medical images, which can overcome the barriers of data format and data distribution, serving both clinical and screening purposes. Additionally, we propose a transferable multi-model ensemble (MMEN) architecture to detect intracranial aneurysms from cerebrovascular mesh models with limited data. To obtain a well-defined convolution operator, we use a global seamless parameterization converting a 3D cerebrovascular mesh model to a planar flat-torus. In the architecture, we transfer the planar flat-torus representation abilities of three GoogleNet Inception V3 models, which were pre-trained on the ImageNet database, to characterize the intracranial aneurysms with local and global geometric features such as Gaussian curvature (GC), shape diameter function (SDF) and wave kernel signature (WKS), respectively. We jointly utilize all three models to detect aneurysms with adaptive weights learned via back propagation. The experimental results on the 121 models show that our proposed method can achieve a detection accuracy of 95.1% with a 94.7% F1-score and 94.8% sensitivity, which is as good as the state-of-the-art work but is applicable to inhomogeneous image modalities and smaller datasets.

Mingsong Zhou, Xingce Wang, Zhongke Wu, Jose M. Pozo, Alejandro F. Frangi
Automated Noninvasive Seizure Detection and Localization Using Switching Markov Models and Convolutional Neural Networks

We introduce a novel switching Markov model for combined epileptic seizure detection and localization from scalp electroencephalography (EEG). Using a hierarchy of Markov chains to fuse multichannel information, our model detects seizure onset, localizes the seizure focus, and tracks seizure activity as it spreads across the cortex. This model-based seizure tracking and localization is complemented by a nonparametric EEG likelihood using convolutional neural networks. We learn our model with an expectation-maximization algorithm that uses loopy belief propagation for approximate inference. We validate our model using leave-one-patient-out cross-validation on EEG acquired from two hospitals. Detection is evaluated on the publicly available Children’s Hospital Boston dataset. We validate both the detection and localization performance on a focal epilepsy dataset collected at Johns Hopkins Hospital. To the best of our knowledge, our model is the first to perform automated localization from scalp EEG across a heterogeneous patient cohort.

Jeff Craley, Emily Johnson, Christophe Jouny, Archana Venkataraman
Multiple Landmark Detection Using Multi-agent Reinforcement Learning

The detection of anatomical landmarks is a vital step for medical image analysis and applications for diagnosis, interpretation and guidance. Manual annotation of landmarks is a tedious process that requires domain-specific expertise and introduces inter-observer variability. This paper proposes a new detection approach for multiple landmarks based on multi-agent reinforcement learning. Our hypothesis is that the position of all anatomical landmarks is interdependent and non-random within the human anatomy, thus finding one landmark can help to deduce the location of others. Using a Deep Q-Network (DQN) architecture, we construct an environment and agent with implicit inter-communication such that we can accommodate K agents acting and learning simultaneously, while they attempt to detect K different landmarks. During training the agents collaborate by sharing their accumulated knowledge for a collective gain. We compare our approach with state-of-the-art architectures and achieve significantly better accuracy by reducing the detection error by 50%, while requiring fewer computational resources and less time to train compared to the naïve approach of training K agents separately. Code and visualizations are available at: https://github.com/thanosvlo/MARL-for-Anatomical-Landmark-Detection

Athanasios Vlontzos, Amir Alansary, Konstantinos Kamnitsas, Daniel Rueckert, Bernhard Kainz
Spatiotemporal Breast Mass Detection Network (MD-Net) in 4D DCE-MRI Images

Automatic mass detection in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) helps to reduce the workload of radiologists and improves diagnostic accuracy. However, most of the existing methods rely on hand-crafted features followed by rule-based or shallow machine learning based detection methods. Due to the limited expressive power of hand-crafted features, the diagnostic performance of existing methods is usually unsatisfactory. In this work, we aim to leverage recent deep learning techniques for breast lesion detection and propose the Spatiotemporal Breast Mass Detection Networks (MD-Nets) to detect masses in 4D DCE-MRI images automatically. Simulating the clinical diagnosis process, we initially generate image-based candidates from all individual images and then construct spatiotemporal 4D data to classify masses using a convolutional long short-term memory network (ConvLSTM) to incorporate kinetic and spatial characteristics. Moreover, we collect a DCE-MRI dataset containing 21,294 annotated images from 172 studies. In experiments, we achieve an AUC of 0.9163 with a sensitivity of 0.8655 and a specificity of 0.8452, which verifies the effectiveness of our method.

Lixi Deng, Sheng Tang, Huazhu Fu, Bin Wang, Yongdong Zhang
Automated Pulmonary Embolism Detection from CTPA Images Using an End-to-End Convolutional Neural Network

Automated methods for detecting pulmonary embolisms (PEs) on CT pulmonary angiography (CTPA) images are of high demand. Existing methods typically employ separate steps for PE candidate detection and false positive removal, without considering the ability of the other step. As a result, most existing methods usually suffer from a high false positive rate in order to achieve an acceptable sensitivity. This study presents an end-to-end trainable convolutional neural network (CNN) where the two steps are optimized jointly. The proposed CNN consists of three concatenated subnets: (1) a novel 3D candidate proposal network for detecting cubes containing suspected PEs, (2) a 3D spatial transformation subnet for generating fixed-sized vessel-aligned image representation for candidates, and (3) a 2D classification network which takes the three cross-sections of the transformed cubes as input and eliminates false positives. We have evaluated our approach using the 20 CTPA test dataset from the PE challenge, achieving a sensitivity of 78.9%, 80.7% and 80.7% at 2 false positives per volume at 0 mm, 2 mm and 5 mm localization error, which is superior to the state-of-the-art methods. We have further evaluated our system on our own dataset consisting of 129 CTPA data with a total of 269 emboli. Our system achieves a sensitivity of 63.2%, 78.9% and 86.8% at 2 false positives per volume at 0 mm, 2 mm and 5 mm localization error.

Yi Lin, Jianchao Su, Xiang Wang, Xiang Li, Jingen Liu, Kwang-Ting Cheng, Xin Yang
Unsupervised Anomaly Localization Using Variational Auto-Encoders

An assumption-free automatic check of medical images for potentially overlooked anomalies would be a valuable assistance for a radiologist. Deep learning and especially Variational Auto-Encoders (VAEs) have shown great potential in the unsupervised learning of data distributions. In principle, this allows for such a check and even the localization of the parts of the image that are most suspicious. Currently, however, the reconstruction-based localization by design requires adjusting the model architecture to the specific problem looked at during evaluation. This contradicts the principle of building assumption-free models. We propose complementing the localization part with a term derived from the Kullback-Leibler (KL) divergence. For validation, we perform a series of experiments on FashionMNIST as well as on a medical task including >1000 healthy subjects and >250 brain tumor patients. Results show that the proposed formalism outperforms state-of-the-art VAE-based anomaly localization across many hyperparameter settings and also shows competitive maximum performance.

David Zimmerer, Fabian Isensee, Jens Petersen, Simon Kohl, Klaus Maier-Hein
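
Editor's illustration of the KL-derived term: for a VAE with diagonal Gaussian posterior q(z|x) = N(mu, sigma²), the KL divergence to the N(0, I) prior decomposes per latent dimension; combined with the reconstruction error it gives an anomaly score. How the paper maps this back to pixel space is not shown:

```python
import numpy as np

def kl_terms(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Per-dimension KL(N(mu, sigma^2) || N(0, 1)), with log_var = log(sigma^2)."""
    return 0.5 * (mu**2 + np.exp(log_var) - log_var - 1.0)

def anomaly_score(recon_err: float, mu: np.ndarray, log_var: np.ndarray) -> float:
    return float(recon_err + kl_terms(mu, log_var).sum())

print(kl_terms(np.zeros(4), np.zeros(4)).sum())      # 0.0: posterior equals prior
print(anomaly_score(2.5, np.ones(4), np.zeros(4)))   # 4.5: KL adds 0.5 per unit mean
```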
HR-CAM: Precise Localization of Pathology Using Multi-level Learning in CNNs

We propose a CNN-based technique that aggregates feature maps from multiple layers to localize abnormalities in greater detail as well as predict the pathology under consideration. Existing class activation mapping (CAM) techniques extract feature maps from either the final layer or a single intermediate layer to create the discriminative maps and then interpolate to upsample to the original image resolution. In this case, the subject-specific localization is coarse and unable to capture subtle abnormalities. To mitigate this, our method builds a novel CNN-based discriminative localization model that we call high resolution CAM (HR-CAM), which accounts for layers at each resolution, therefore facilitating a comprehensive map that can delineate the pathology for each subject by combining low-level, intermediate as well as high-level features from the CNN. Moreover, our model directly provides the discriminative map at the resolution of the original image, facilitating finer delineation of abnormalities. We demonstrate the working of our model on simulated abnormality data, where we illustrate how the model captures finer details in the final discriminative maps as compared to current techniques. We then apply this technique: (1) to classify ependymomas from grade IV glioblastoma on T1-weighted contrast-enhanced (T1-CE) MRI and (2) to predict Parkinson’s disease from neuromelanin-sensitive MRI. In all these cases, we demonstrate that our model not only predicts pathologies with high accuracy, but also creates clinically interpretable, subject-specific, high-resolution discriminative localizations. Overall, the technique can be generalized to any CNN and carries high relevance in a clinical setting.

Sumeet Shinde, Tanay Chougule, Jitender Saini, Madhura Ingalhalikar
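
Editor's illustration of multi-level CAM aggregation: class activation maps from several layers are upsampled to a common resolution and summed, so low-level detail and high-level semantics both contribute. A real HR-CAM uses learned, class-tied channel weights per layer; the weights here are placeholders:

```python
import numpy as np
from scipy.ndimage import zoom

def hr_cam(feature_maps, channel_weights, out_size):
    """feature_maps: list of (C_i, H_i, W_i) arrays from different layers;
    channel_weights: list of (C_i,) weights tying channels to the target class."""
    cam = np.zeros(out_size)
    for fm, w in zip(feature_maps, channel_weights):
        m = np.tensordot(w, fm, axes=1)  # (H_i, W_i) class-weighted map
        m = zoom(m, (out_size[0] / m.shape[0], out_size[1] / m.shape[1]), order=1)
        cam += m
    return np.maximum(cam, 0.0)          # ReLU, as in standard CAM

rng = np.random.default_rng(0)
fms = [rng.normal(size=(8, 16, 16)), rng.normal(size=(4, 64, 64))]
ws = [rng.normal(size=8), rng.normal(size=4)]
print(hr_cam(fms, ws, (128, 128)).shape)  # (128, 128) full-resolution map
```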
Novel Iterative Attention Focusing Strategy for Joint Pathology Localization and Prediction of MCI Progression

Mild Cognitive Impairment (MCI) is the prodromal stage of Alzheimer’s disease (AD), with a high rate of conversion to AD. Hence, it is critical to identify MCI patients who will convert to AD for early and effective treatment. Recently, many machine learning or deep learning based methods have been proposed to first localize the pathology-related brain regions and then extract the respective features for MCI progression diagnosis. However, the intrinsic relationship between pathological region localization and the respective feature extraction is usually neglected. To address this issue, in this paper we propose a novel iterative attention focusing strategy for joint pathological region localization and identification of progressive MCI (pMCI) versus stable MCI (sMCI). Moreover, by connecting the diagnosis network and the attention map generator, the pathological regions can be iteratively localized, and the respective diagnosis performance is in turn improved. Experiments on 393 training subjects from the ADNI-1 dataset and 277 testing subjects from the ADNI-2 dataset show that our method can achieve 81.59% accuracy for pMCI vs. sMCI diagnosis. Our results outperform those of state-of-the-art methods, while additionally providing a focused attention map on specific pathological locations related to MCI progression, i.e., the left temporal lobe, entorhinal cortex and hippocampus. This allows for more insight and a better understanding of the progression of MCI to AD.

Qingfeng Li, Xiaodan Xing, Ying Sun, Bin Xiao, Hao Wei, Quan Huo, Minqing Zhang, Xiang Sean Zhou, Yiqiang Zhan, Zhong Xue, Feng Shi
Automatic Vertebrae Recognition from Arbitrary Spine MRI Images by a Hierarchical Self-calibration Detection Framework

Automatic vertebrae recognition is crucial in spine disease diagnosis, treatment planning, and response assessment. Although vertebrae detection has been studied for years, reliably recognizing vertebrae from arbitrary spine MRI images remains a challenge due to varying image characteristics, field of view (FOV) and vertebrae appearance. In this paper, we propose a Hierarchical Self-calibration Detection Framework (Hi-scene) to precisely recognize the labels and bounding boxes of all vertebrae in an arbitrary spine MRI image. Hi-scene is designed to first coarsely localize regions where vertebrae exist, without the need of a priori knowledge about the scale, image characteristics and FOV; it then accurately recognizes vertebrae and automatically corrects wrong recognitions with an elaborate self-calibration recognition network that embeds message passing into the deep learning network. The method is trained and evaluated on a large and challenging dataset of 450 MRI scans, and the evaluation results show that our Hi-scene achieves high performance (testing accuracy of 0.933) on arbitrary input spine MRI and outperforms other state-of-the-art methods.

Shen Zhao, Xi Wu, Bo Chen, Shuo Li

Machine Learning

Frontmatter
Image Data Validation for Medical Systems

Data validation is the process of ensuring that the input to a data processing pipeline is correct and useful. It is a critical part of software systems running in production. Image processing systems are no different: problems with data acquisition, file corruption or data transmission may lead to a wide range of unexpected issues in the acquired images. Until now, most image processing systems of this type involved a human in the loop who could detect these errors before further processing. With the advent of powerful deep learning methods, tools for medical image processing are becoming increasingly autonomous and can go from data acquisition to final medical diagnosis without any human interaction. However, deep networks are known for their inability to detect corruption or errors in the input data. To overcome this, we present a validation method that learns the appearance of images in the training dataset that was used to train the deep network, and is able to identify when an input image deviates from the training distribution and therefore cannot be safely analyzed. We experimentally assess the validity of our method and compare it with different baselines, reaching an improvement of more than 10 percentage points on all considered datasets.

Pablo Márquez-Neila, Raphael Sznitman
Captioning Ultrasound Images Automatically

We describe an automatic natural language processing (NLP)-based image captioning method to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.

Mohammad Alsharid, Harshita Sharma, Lior Drukker, Pierre Chatelain, Aris T. Papageorghiou, J. Alison Noble
Feature Transformers: Privacy Preserving Lifelong Learners for Medical Imaging

Deep learning algorithms have achieved tremendous success in many medical imaging problems, leading to multiple commercial healthcare applications. To sustain the performance of these algorithms post-deployment, it is necessary to overcome catastrophic forgetting and continually evolve with data. While catastrophic forgetting could be managed using historical data, a fundamental challenge in healthcare is data privacy, where regulations restrict data sharing. In this paper, we present a single, unified mathematical framework - feature transformers - for handling the myriad variants of lifelong learning to overcome catastrophic forgetting without compromising data privacy. We report state-of-the-art results for lifelong learning on the iCIFAR100 dataset and also demonstrate lifelong learning on medical imaging applications - X-ray pneumothorax classification and ultrasound cardiac view classification.

Hariharan Ravishankar, Rahul Venkataramani, Saihareesh Anamandra, Prasad Sudhakar, Pavan Annangi
As Easy as 1, 2...4? Uncertainty in Counting Tasks for Medical Imaging

Counting is a fundamental task in biomedical imaging and count is an important biomarker in a number of conditions. Estimating the uncertainty in the measurement is thus vital to making definite, informed conclusions. In this paper, we first compare a range of existing methods to perform counting in medical imaging and suggest ways of deriving predictive intervals from these. We then propose and test a method for calculating intervals as an output of a multi-task network. These predictive intervals are optimised to be as narrow as possible, while also enclosing a desired percentage of the data. We demonstrate the effectiveness of this technique on histopathological cell counting and white matter hyperintensity counting. Finally, we offer insight into other areas where this technique may apply.

Zach Eaton-Rosen, Thomas Varsavsky, Sebastien Ourselin, M. Jorge Cardoso
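
Editor's illustration of one standard route to predictive intervals from a network: train two extra outputs with the pinball (quantile) loss, e.g. for the 5th and 95th percentiles, which together enclose roughly 90% of the data. The paper instead optimizes interval width directly subject to coverage; this shows the simpler quantile-regression baseline:

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Asymmetric absolute error; minimized when y_pred is the q-th quantile."""
    e = y_true - y_pred
    return float(np.mean(np.maximum(q * e, (q - 1.0) * e)))

y = np.random.default_rng(0).poisson(12, size=1000).astype(float)
lo, hi = np.quantile(y, [0.05, 0.95])
print(pinball_loss(y, np.full_like(y, lo), 0.05),   # low at the true 5th percentile
      pinball_loss(y, np.full_like(y, hi), 0.95))   # low at the true 95th percentile
```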
Generalizable Feature Learning in the Presence of Data Bias and Domain Class Imbalance with Application to Skin Lesion Classification

Training generalizable data-driven models for medical imaging applications is especially challenging as acquiring and accessing sufficiently large medical datasets is often infeasible. When trained on limited datasets, a high-capacity model, as most leading neural network architectures are, is likely to overfit and thus generalize poorly to unseen data. Further aggravating the problem, data used to train models in medicine are typically collected in silos and from narrow data distributions that are determined by specific acquisition hardware, imaging protocols, and patient demographics. In addition, class imbalance within and across datasets is a common complication as disease conditions or sub-types have varying degrees of prevalence. In this paper, we motivate the need for generalizable training in the context of skin lesion classification by evaluating the performance of ResNet across 7 public datasets with dataset bias and class imbalance. To mitigate dataset bias, we extend the classification and contrastive semantic alignment (CCSA) loss that aims to learn domain-invariant features. As the CCSA loss requires labelled data from two domains, we propose a strategy to dynamically sample paired data in a setting where the set of available classes varies across domains. To encourage learning from underrepresented classes, the sampled class probabilities are used to weight the classification and alignment losses. Experimental results demonstrate improved generalizability as measured by the mean macro-average recall across the 7 datasets when training using the weighted CCSA loss and dynamic sampler.

Chris Yoon, Ghassan Hamarneh, Rafeef Garbi
Learning Task-Specific and Shared Representations in Medical Imaging

The performance of multi-task learning hinges on the design of feature sharing between tasks, a process that is combinatorial in the network depth and task count. Hand-crafting an architecture based on human intuitions of task relationships is therefore suboptimal. In this paper, we present a probabilistic approach to learning task-specific and shared representations in Convolutional Neural Networks (CNNs) for multi-task learning of semantic tasks. We introduce Stochastic Filter Groups, a mechanism that partitions convolutional kernels into task-specific and shared groups in order to learn an optimal kernel allocation, and thereby optimal shared and task-specific representations. We employ variational inference to learn the posterior distribution over the possible groupings of kernels and the CNN weights. Experiments on MRI-based prostate radiotherapy organ segmentation and CT synthesis demonstrate that the proposed method learns task allocations that are in line with human-optimised networks whilst improving performance over competing baselines.

Felix J. S. Bragman, Ryutaro Tanno, Sebastien Ourselin, Daniel C. Alexander, M. Jorge Cardoso
Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis

Transfer learning from natural images to medical images is established as one of the most practical paradigms in deep learning for medical image analysis. However, to fit this paradigm, 3D imaging tasks in the most prominent modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information and inevitably compromising performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learned by self-supervision), and generic (serving as source models for generating application-specific target models). Our extensive experiments demonstrate that Models Genesis significantly outperform learning from scratch in all five target 3D applications, covering both segmentation and classification. More importantly, while learning a model from scratch in 3D does not necessarily yield better performance than transfer learning from ImageNet in 2D, our Models Genesis consistently outperform any 2D approach, including fine-tuning models pre-trained on ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and the significance of Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated yet recurrent anatomy in medical images can serve as a strong supervision signal for deep models to learn common anatomical representations automatically via self-supervision. As open science, all pre-trained Models Genesis are available at https://github.com/MrGiovanni/ModelsGenesis .
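Models Genesis is trained by corrupting sub-volumes and learning to restore them with a voxel-wise reconstruction loss. The snippet below sketches one such corruption, local voxel shuffling; block count and size are illustrative, and the released framework combines several other transforms, so treat this as a minimal example rather than the official pipeline.

```python
import numpy as np

def local_pixel_shuffle(volume, num_blocks=50, block=8, rng=None):
    """Shuffle voxels inside small random blocks; a network trained to
    restore the original volume must learn local anatomy. Assumes each
    side of `volume` is larger than `block`."""
    rng = rng or np.random.default_rng()
    out = volume.copy()
    z, y, x = volume.shape
    for _ in range(num_blocks):
        zi = rng.integers(0, z - block)
        yi = rng.integers(0, y - block)
        xi = rng.integers(0, x - block)
        patch = out[zi:zi+block, yi:yi+block, xi:xi+block].ravel()
        rng.shuffle(patch)
        out[zi:zi+block, yi:yi+block, xi:xi+block] = \
            patch.reshape(block, block, block)
    return out

# self-supervision pair: train a 3D network to map corrupted -> vol
vol = np.random.rand(64, 64, 64).astype(np.float32)
corrupted = local_pixel_shuffle(vol)
```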

Zongwei Zhou, Vatsal Sodha, Md Mahfuzur Rahman Siddiquee, Ruibin Feng, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang
Efficient Ultrasound Image Analysis Models with Sonographer Gaze Assisted Distillation

Recent automated medical image analysis methods have attained state-of-the-art performance but rely on memory- and compute-intensive deep learning models. Reducing model size without significant loss in performance is crucial for time- and memory-efficient automated image-based decision-making. Traditional deep learning based image analysis uses expert knowledge only in the form of manual annotations. Recently, there has been interest in introducing other forms of expert knowledge into deep learning architecture design. This is the approach considered in this paper, where we propose to combine ultrasound video with the point of gaze of expert sonographers, tracked as they scan, to train memory-efficient ultrasound image analysis models. Specifically, we develop teacher-student knowledge transfer models for the exemplar task of frame classification for the fetal abdomen, head, and femur. The best performing memory-efficient models attain performance within 5% of conventional models that are $$1000{\times }$$ larger in size.
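The teacher-student transfer underlying this work is, at its core, knowledge distillation. A generic Hinton-style distillation loss is sketched below; the gaze-assisted component that distinguishes the paper is not reproduced here, and the temperature and weighting values are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """The small student matches the teacher's softened class
    distribution (KL term) while also fitting ground truth (CE term)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```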

Arijit Patra, Yifan Cai, Pierre Chatelain, Harshita Sharma, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble
Fetal Pose Estimation in Volumetric MRI Using a 3D Convolution Neural Network

The performance and diagnostic utility of magnetic resonance imaging (MRI) in pregnancy is fundamentally constrained by fetal motion. Motion of the fetus, which is unpredictable and rapid on the scale of conventional imaging times, limits the set of viable acquisition techniques to single-shot imaging, with severe compromises in signal-to-noise ratio and diagnostic contrast, and frequently results in unacceptable image quality. Surprisingly little is known about the characteristics of fetal motion during MRI, and here we propose and demonstrate methods that exploit a growing repository of MRI observations of the gravid abdomen that are acquired at low spatial resolution but relatively high temporal resolution and over long durations (10–30 min). We estimate fetal pose per frame in MRI volumes of the pregnant abdomen via deep learning algorithms that detect key fetal landmarks. Evaluation shows that our framework achieves an average error of 4.47 mm and 96.4% accuracy (error less than 10 mm). Fetal pose estimation in MRI time series yields novel means of quantifying fetal movements in health and disease, and enables the learning of kinematic models that may enhance prospective mitigation of fetal motion artifacts during MRI acquisition.

Junshen Xu, Molin Zhang, Esra Abaci Turk, Larry Zhang, P. Ellen Grant, Kui Ying, Polina Golland, Elfar Adalsteinsson
Multi-stage Prediction Networks for Data Harmonization

In this paper, we introduce multi-task learning (MTL) to data harmonization (DH), where we aim to harmonize images across different acquisition platforms and sites. This allows us to integrate information from multiple acquisitions and improve the predictive performance and learning efficiency of the harmonization model. Specifically, we introduce the Multi-Stage Prediction (MSP) Network, a MTL framework that incorporates neural networks of potentially disparate architectures, trained for different individual acquisition platforms, into a larger architecture that is refined in unison. The MSP uses high-level features of the single-task networks as inputs to additional neural networks that inform the final prediction, thereby exploiting redundancy across tasks to make the most of limited training data. We validate our methods on a dMRI harmonization challenge dataset, where we predict data from three modern platform types given data obtained on an older scanner. We show that MTL architectures such as the MSP produce around a 20% improvement in patch-based mean-squared error over current state-of-the-art methods, and that our MSP outperforms off-the-shelf MTL networks. Our code is available [1].

Stefano B. Blumberg, Marco Palombo, Can Son Khoo, Chantal M. W. Tax, Ryutaro Tanno, Daniel C. Alexander
Self-supervised Feature Learning for 3D Medical Images by Playing a Rubik’s Cube

With the development of deep learning, an increasing number of studies seek to build computer-aided diagnosis systems for 3D volumetric medical data. However, as annotations of 3D medical data are difficult to acquire, the number of annotated 3D medical images is often insufficient to train deep networks well. Self-supervised learning, which deeply exploits the information in raw data, is one potential solution for loosening this requirement on training data. In this paper, we propose a self-supervised learning framework for volumetric medical images. A novel proxy task, i.e., Rubik's cube recovery, is formulated to pre-train 3D neural networks. The proxy task involves two operations, i.e., cube rearrangement and cube rotation, which force networks to learn translation- and rotation-invariant features from raw 3D data. Compared to the train-from-scratch strategy, fine-tuning from the pre-trained network leads to better accuracy on various tasks, e.g., brain hemorrhage classification and brain tumor segmentation. We show that our self-supervised learning approach can substantially boost the accuracy of 3D deep networks on volumetric medical datasets without using extra data. To the best of our knowledge, this is the first work focusing on self-supervised learning of 3D neural networks.
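A minimal sketch of how a Rubik's-cube pretext sample could be generated: the volume is split into a grid of sub-cubes, which are permuted and individually rotated; the permutation and rotation indices then serve as self-supervised labels. The grid size and the set of allowed rotations here are assumptions, not the authors' exact configuration.

```python
import numpy as np

def rubik_sample(volume, grid=2, rng=None):
    """Split a volume into grid**3 sub-cubes, permute them, and rotate
    each by a random multiple of 90 degrees in-plane; a network is
    pre-trained to predict (perm, rot). Assumes each side of `volume`
    is divisible by `grid`."""
    rng = rng or np.random.default_rng()
    z, y, x = (s // grid for s in volume.shape)
    cubes = [volume[i*z:(i+1)*z, j*y:(j+1)*y, k*x:(k+1)*x]
             for i in range(grid) for j in range(grid) for k in range(grid)]
    perm = rng.permutation(len(cubes))
    rot = rng.integers(0, 4, size=len(cubes))
    shuffled = [np.rot90(cubes[p], k=int(r), axes=(1, 2))
                for p, r in zip(perm, rot)]
    return shuffled, perm, rot
```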

Xinrui Zhuang, Yuexiang Li, Yifan Hu, Kai Ma, Yujiu Yang, Yefeng Zheng
Bayesian Volumetric Autoregressive Generative Models for Better Semisupervised Learning

Deep generative models are rapidly gaining traction in medical imaging. Nonetheless, most generative architectures struggle to capture the underlying probability distributions of volumetric data, exhibit convergence problems, and offer no robust indices of model uncertainty. By comparison, the autoregressive generative model PixelCNN can be extended to volumetric data with relative ease: it readily attempts to learn the true underlying probability distribution, and it admits a Bayesian reformulation that provides a principled framework for reasoning about model uncertainty. Our contributions in this paper are twofold: first, we extend PixelCNN to work with volumetric brain magnetic resonance imaging data. Second, we show that reformulating this model to approximate a deep Gaussian process yields a measure of uncertainty that improves the performance of semi-supervised learning, in particular classification performance in settings where the proportion of labelled data is low. We quantify this improvement across classification, regression, and semantic segmentation tasks, training and testing on clinical magnetic resonance brain imaging data comprising T1-weighted and diffusion-weighted sequences.

Guilherme Pombo, Robert Gray, Thomas Varsavsky, John Ashburner, Parashkev Nachev
Hydranet: Data Augmentation for Regression Neural Networks

Deep learning techniques are often criticized for depending heavily on large quantities of labeled data. This problem is even more challenging in medical image analysis, where annotator expertise is often scarce. We propose a novel data-augmentation method to regularize neural network regressors that learn from a single global label per image. The principle of the method is to create new samples by recombining existing ones. We demonstrate the performance of our algorithm on two tasks: estimation of the number of enlarged perivascular spaces in the basal ganglia, and estimation of white matter hyperintensity volume. We show that the proposed method improves performance over more basic data augmentation. The proposed method reached an intraclass correlation coefficient between ground truth and network predictions of 0.73 on the first task and 0.84 on the second, using only between 25 and 30 scans with a single global label per scan for training. With the same number of training scans, more conventional data augmentation methods reached intraclass correlation coefficients of only 0.68 on the first task and 0.79 on the second.
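The abstract does not spell out the recombination recipe, so the sketch below substitutes a generic mixup-style recombination for regression with a single global label, purely to illustrate the idea of synthesizing new (image, label) pairs from existing ones; it is not the paper's method.

```python
import numpy as np

def recombine(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two scans and their global labels with one coefficient,
    producing a plausible new training sample for a regressor."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```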

Florian Dubost, Gerda Bortsova, Hieab Adams, M. Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen de Bruijne
A Dirty Multi-task Learning Method for Multi-modal Brain Imaging Genetics

Brain imaging genetics is an important research topic in brain science which combines genetic variations with brain structure or function to uncover the genetic basis of brain disorders. Imaging data collected by different technologies, each measuring the same brain in a distinct way, may carry complementary information. Unfortunately, we do not know the extent to which phenotypic variance is shared among multiple imaging modalities, which might trace back to complex genetic mechanisms. In this study, we propose a novel dirty multi-task SCCA to analyze imaging genetics problems involving multiple modalities of brain imaging quantitative traits (QTs). The proposed method can not only identify the shared SNPs and QTs across multiple modalities, but also identify modality-specific SNPs and QTs, showing a flexible capability for discovering complex multi-SNP-multi-QT associations. Compared with multi-view SCCA and multi-task SCCA, our method yields better canonical correlation coefficients and canonical weights on both synthetic and real neuroimaging genetic data. This demonstrates that the proposed dirty multi-task SCCA could be a meaningful and powerful alternative method in multi-modal brain imaging genetics.

Lei Du, Fang Liu, Kefei Liu, Xiaohui Yao, Shannon L. Risacher, Junwei Han, Lei Guo, Andrew J. Saykin, Li Shen, for the Alzheimer’s Disease Neuroimaging Initiative
Robust and Discriminative Brain Genome Association Study

Brain Genome Association (BGA) studies, which investigate the associations between brain structure/function (characterized by neuroimaging phenotypes) and genetic variations (characterized by Single Nucleotide Polymorphisms (SNPs)), are important in the pathological analysis of neurological disease. However, current BGA studies are limited in that they do not explicitly consider disease labels, source importance, or sample importance in their formulations. We address these issues by proposing a robust and discriminative BGA formulation. Specifically, we learn two transformation matrices for mapping two heterogeneous data sources (i.e., neuroimaging data and genetic data) into a common space, so that samples from the same subject (but different sources) are close to each other, and samples with different labels are separable. In addition, we add a sparsity constraint on the transformation matrices to enable feature selection on both data sources. Furthermore, both sample importance and source importance are considered in the formulation via adaptive, parameter-free sample and source weightings. We have conducted various experiments, using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, to test how well the neuroimaging phenotypes and SNPs can represent each other in the common space.

Xiaofeng Zhu, Dinggang Shen
Symmetric Dual Adversarial Connectomic Domain Alignment for Predicting Isomorphic Brain Graph from a Baseline Graph

Medical image synthesis techniques can circumvent the need for costly clinical scan acquisitions using different modalities such as functional Magnetic Resonance Imaging (MRI). Recently, deep learning frameworks were designed to predict a target medical modality from a source one (e.g., MRI from Computed Tomography (CT)). However, such methods, which work well on images, might fail when handling geometric brain data such as graphs (or connectomes). To the best of our knowledge, learning how to predict a brain graph from a source graph based on geometric deep learning remains unexplored [1]. Given a set of isomorphic source and target brain graphs (i.e., derived from the same brain parcellation template, so their topology is similar), learning how to predict the target brain graph from a source graph poses two major challenges. The first is that the source and target domains might have different distributions, which causes a domain fracture. The second can be viewed as a limitation of existing image synthesis methods, which address the domain fracture and multimodal data prediction independently. To address both limitations, we propose a Symmetric Dual Adversarial Domain Alignment (SymDADA) framework for predicting a target brain graph from a source graph. SymDADA aligns source and target domains by learning their shared embedding while alternating two regularization constraints: (i) adversarial regularization matching the distribution of the learned shared embedding with that of the source graphs using training and testing data, and (ii) adversarial regularization enforcing the embedded source distribution to match the distribution of the predicted target graphs using only the training samples. In this way, we optimally adapt the source to the target space, jointly predicting the target graph while learning the graph embedding. Our proposed SymDADA framework outperformed its variants in predicting a target brain graph from a source graph in healthy and autistic subjects.

Alaa Bessadok, Mohamed Ali Mahjoub, Islem Rekik
Harmonization of Infant Cortical Thickness Using Surface-to-Surface Cycle-Consistent Adversarial Networks

The increasing number of multi-site infant neuroimaging datasets is facilitating research on early brain development with larger sample sizes and greater statistical power. However, a joint analysis of cortical properties (e.g., cortical thickness) unavoidably faces the problem of non-biological variance introduced by differences in MRI scanners. To address this issue, we propose cycle-consistent adversarial networks on the spherical cortical surface to harmonize cortical thickness maps between different scanners. We combine the spherical U-Net and CycleGAN to construct a surface-to-surface CycleGAN (S2SGAN). Specifically, we model harmonization from scanner X to scanner Y as a surface-to-surface translation task. The first goal of harmonization is to learn a mapping $$G_X: X\rightarrow Y$$ such that the distribution of surface thickness maps from $$G_X(X)$$ is indistinguishable from that of Y. Since this mapping is highly under-constrained, and the second goal of harmonization is to preserve individual differences, we utilize the inverse mapping $$G_Y: Y\rightarrow X$$ and the cycle consistency loss to enforce $$G_Y(G_X(X))\approx X$$ (and vice versa). Furthermore, we incorporate a correlation coefficient loss to guarantee structural consistency between the original and generated surface thickness maps. Quantitative evaluation on both synthesized and real infant cortical data demonstrates the superior ability of our method to remove unwanted scanner effects while preserving individual differences, compared to state-of-the-art methods.
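Two of the loss terms named above have direct, compact forms. The sketch below gives a plain reading of the cycle-consistency term enforcing $$G_Y(G_X(X))\approx X$$ and of a Pearson-correlation structure loss on thickness maps; the spherical U-Net generators and adversarial terms are omitted.

```python
import torch

def cycle_loss(x, x_reconstructed):
    """L1 cycle-consistency: G_Y(G_X(x)) should return to x."""
    return torch.mean(torch.abs(x - x_reconstructed))

def correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson correlation between generated and original thickness
    maps, encouraging structural consistency during harmonization."""
    p = pred - pred.mean()
    t = target - target.mean()
    return 1.0 - (p * t).sum() / (p.norm() * t.norm() + eps)
```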

Fenqiang Zhao, Zhengwang Wu, Li Wang, Weili Lin, Shunren Xia, Dinggang Shen, Gang Li, the UNC/UMN Baby Connectome Project Consortium
Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference

Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised due to introducing various types of biases in the training data. First, we systematically define these biases. Second, we detect bias by experimentally showing that scans can be correctly assigned to their respective dataset with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by quantifying the extent of confounding and causality in a single dataset using causal inference. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. As Kolmogorov complexity is not directly computable, we employ the minimum description length to approximate it. We empirically show that our approach is able to estimate plausible causal relationships from real neuroimaging data.

Christian Wachinger, Benjamin Gutierrez Becker, Anna Rieckmann, Sebastian Pölsterl

Computer-Aided Diagnosis

Frontmatter
Multi Scale Curriculum CNN for Context-Aware Breast MRI Malignancy Classification

Classification of malignancy for breast cancer and other cancer types is usually tackled as an object detection problem: individual lesions are first localized and then classified with respect to malignancy. However, the drawback of this approach is that abstract features incorporating several lesions, as well as areas that are not labelled as lesions but contain globally relevant medical information, are disregarded: especially for dynamic contrast-enhanced breast MRI, criteria such as background parenchymal enhancement and location within the breast are important for diagnosis and cannot be captured properly by object detection approaches. In this work, we propose a 3D CNN and a multi-scale curriculum learning strategy to classify malignancy globally, based on an MRI of the whole breast. Thus, the global context of the whole breast, rather than individual lesions, is taken into account. Our proposed approach does not rely on lesion segmentations, which renders the annotation of training data much more efficient than in current object detection approaches. Achieving an AUROC of 0.89, we compare the performance of our approach to Mask R-CNN and Retina U-Net, as well as to a radiologist. Our performance is on par with approaches that, in contrast to our method, rely on pixelwise segmentations of lesions.

Christoph Haarburger, Michael Baumgartner, Daniel Truhn, Mirjam Broeckmann, Hannah Schneider, Simone Schrading, Christiane Kuhl, Dorit Merhof
Deep Angular Embedding and Feature Correlation Attention for Breast MRI Cancer Analysis

Accurate and automatic analysis of breast MRI plays a vital role in early diagnosis and successful treatment planning for breast cancer. Due to the heterogeneous nature of tumors, precise diagnosis remains a challenging task. In this paper, we propose to identify breast tumors in MRI with a Cosine Margin Sigmoid Loss (CMSL) learned by a deep model, and to localize possible cancer lesions with a COrrelation Attention Map (COAM) built on the learned features. The CMSL embeds tumor features onto a hypersphere and imposes a decision margin through cosine constraints. In this way, the deep model learns more separable inter-class features and more compact intra-class features in the angular space. Furthermore, we utilize the correlations among feature vectors to generate attention maps that accurately localize cancer candidates using only image-level labels. We build the largest breast cancer dataset, involving 10,290 DCE-MRI scan volumes, for developing and evaluating the proposed methods. The model driven by CMSL achieved a classification accuracy of 0.855 and an AUC of 0.902 on the testing set, with sensitivity and specificity of 0.857 and 0.852, respectively, outperforming competing methods overall. In addition, the proposed COAM accomplished more accurate localization of cancer centers than other state-of-the-art weakly supervised localization methods.
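One plausible minimal reading of a cosine-margin sigmoid loss for binary tumor classification is sketched below: features and the class weight are L2-normalised so the logit is a cosine, a margin is applied in the direction of the true class, and binary cross-entropy follows. The scale, the margin value, and the exact margin placement are assumptions rather than the paper's published parametrisation.

```python
import torch
import torch.nn.functional as F

def cosine_margin_sigmoid_loss(features, class_weight, labels,
                               s=30.0, m=0.35):
    """features: (N, d); class_weight: (1, d); labels: (N,) float
    tensor in {0., 1.}. Normalising both sides makes the logit a cosine
    on the hypersphere; the margin pushes positives above +m and
    negatives below -m before the sigmoid is applied."""
    cos = F.linear(F.normalize(features),
                   F.normalize(class_weight)).squeeze(1)
    logits = s * (cos - m * (2.0 * labels - 1.0))
    return F.binary_cross_entropy_with_logits(logits, labels)
```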

Luyang Luo, Hao Chen, Xi Wang, Qi Dou, Huangjing Lin, Juan Zhou, Gongjie Li, Pheng-Ann Heng
Fully Deep Learning for Slit-Lamp Photo Based Nuclear Cataract Grading

Age-related cataract is a priority eye disease, with nuclear cataract as its most common type. This paper targets automated nuclear cataract grading based on slit-lamp photos. Different from previous efforts, which rely on traditional feature extraction and grade modeling techniques, we propose a fully deep learning based solution. Given a slit-lamp photo, we localize its nuclear region with Faster R-CNN, followed by a ResNet-101 based grading model. In order to alleviate the issue of imbalanced data, a simple batch balancing strategy is introduced to improve the training of the grading network. Tested on a clinical dataset of 157 slit-lamp photos from 39 female and 31 male patients, the proposed solution outperforms the state-of-the-art, reducing the mean absolute error from 0.357 to 0.313. In addition, our solution processes a slit-lamp photo in approximately 0.1 s, two orders of magnitude faster than the state-of-the-art. With its effectiveness and efficiency, the new solution is promising for automated nuclear cataract grading.

Chaoxi Xu, Xiangjia Zhu, Wenwen He, Yi Lu, Xixi He, Zongjiang Shang, Jun Wu, Keke Zhang, Yinglei Zhang, Xianfang Rong, Zhennan Zhao, Lei Cai, Dayong Ding, Xirong Li
Overcoming Data Limitation in Medical Visual Question Answering

Traditional approaches to Visual Question Answering (VQA) require large amounts of labeled data for training. Unfortunately, such large-scale data is usually not available in the medical domain. In this paper, we propose a novel medical VQA framework that overcomes this labeled-data limitation. The proposed framework combines the unsupervised Denoising Auto-Encoder (DAE) with supervised Meta-Learning. The advantage of the DAE is that it can leverage large amounts of unlabeled images, while the advantage of Meta-Learning is that it learns meta-weights that quickly adapt to the VQA problem with limited labeled data. Leveraging both techniques allows the proposed framework to be trained efficiently using a small labeled training set. Experimental results show that our proposed method significantly outperforms state-of-the-art medical VQA methods. The source code is available at https://github.com/aioz-ai/MICCAI19-MedVQA .

Binh D. Nguyen, Thanh-Toan Do, Binh X. Nguyen, Tuong Do, Erman Tjiputra, Quang D. Tran
Multi-Instance Multi-Scale CNN for Medical Image Classification

Deep learning for medical image classification faces three major challenges: (1) the number of annotated medical images available for training is usually small; (2) regions of interest (ROIs) are relatively small with unclear boundaries, and may appear in arbitrary positions across the x, y (and, in 3D images, z) dimensions, yet often only whole-image labels are annotated and localized ROIs are unavailable; and (3) ROIs in medical images often appear at varying sizes (scales). We approach these three challenges with a Multi-Instance Multi-Scale (MIMS) CNN: (1) we propose a multi-scale convolutional layer, which extracts patterns at different receptive fields with a shared set of convolutional kernels, so that scale-invariant patterns are captured by this compact set of kernels; as this layer contains only a small number of parameters, training on small datasets becomes feasible; (2) we propose a "top-k pooling" to aggregate feature maps at varying scales from multiple spatial dimensions, allowing the model to be trained using weak annotations within the multiple instance learning (MIL) framework. Our method is shown to perform well on three classification tasks involving two 3D and two 2D medical image datasets.
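The "top-k pooling" named above admits a very small implementation: flatten the spatial (and scale) dimensions of each channel's response and average the k largest activations, a middle ground between max-pooling and average-pooling that suits weak, image-level labels. Here k is an illustrative value.

```python
import torch

def top_k_pooling(feature_maps, k=5):
    """feature_maps: (batch, channels, *spatial). Returns the mean of
    the k highest activations per channel, shape (batch, channels)."""
    b, c = feature_maps.shape[:2]
    flat = feature_maps.reshape(b, c, -1)
    k = min(k, flat.shape[-1])
    return flat.topk(k, dim=-1).values.mean(dim=-1)
```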

Shaohua Li, Yong Liu, Xiuchao Sui, Cheng Chen, Gabriel Tjio, Daniel Shu Wei Ting, Rick Siow Mong Goh
Improving Uncertainty Estimation in Convolutional Neural Networks Using Inter-rater Agreement

Modern neural networks are pushing the boundaries of medical image classification. For some tasks in dermatology, state-of-the-art models are able to beat human experts in terms of accuracy and type I/II errors. However, in the domain of medical applications, models should also be evaluated on how well they capture uncertainty in samples and labels. This aspect is key to building trust in computer-assisted systems, otherwise largely considered black boxes by their users. A common practice in supervised learning is to collect multiple evaluations per sample, which is particularly useful when inter-rater agreement is expected to be low. At the same time, model training traditionally uses label fusion, such as majority voting, to produce a single label for each sample. In this paper, we show experimentally that models trained to predict skin conditions become overconfident when this approach is used; i.e., the probability estimates of the model exceed the true correctness likelihood. Additionally, we show that a better calibrated model is obtained when training with a label sampling scheme that takes advantage of inter-rater variability during training. The calibration improvements come at no cost in terms of model accuracy. Our approach is combined and contrasted with other recent techniques in uncertainty estimation. All experiments are evaluated on a proprietary dataset consisting of 31,017 images of skin, where up to 12 experts have diagnosed each image.
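The label-sampling scheme contrasted here with majority voting takes only a few lines: at each visit of a training sample, its label is drawn from the empirical distribution of rater opinions, so the network is exposed to the disagreement directly. A minimal sketch:

```python
import numpy as np

def sample_label(rater_labels, rng=None):
    """Draw a training label from the empirical rater distribution,
    e.g. sample_label(['eczema', 'eczema', 'psoriasis']) returns
    'eczema' with probability 2/3."""
    rng = rng or np.random.default_rng()
    labels, counts = np.unique(rater_labels, return_counts=True)
    return rng.choice(labels, p=counts / counts.sum())
```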

Martin Holm Jensen, Dan Richter Jørgensen, Raluca Jalaboi, Mads Eiler Hansen, Martin Aastrup Olsen
Improving Skin Condition Classification with a Visual Symptom Checker Trained Using Reinforcement Learning

We present a visual symptom checker that combines a pre-trained Convolutional Neural Network (CNN) with a Reinforcement Learning (RL) agent as a Question Answering (QA) model. This method increases the classification confidence and accuracy of the visual symptom checker and decreases the average number of questions asked to narrow down the differential diagnosis. A Deep Q-Network (DQN)-based RL agent learns how to ask the patient about the presence of symptoms in order to maximize the probability of correctly identifying the underlying condition. The RL agent uses the visual information provided by the CNN, in addition to the answers to the asked questions, to guide the QA system. We demonstrate that the RL-based approach increases accuracy by more than 20% compared to the CNN-only approach, which uses only the visual information to predict the condition. Moreover, the gain in accuracy is up to 10% compared to an approach that uses the visual information provided by the CNN along with a conventional decision-tree-based QA system. We finally show that the RL-based approach not only outperforms the decision-tree-based approach, but also narrows down the diagnosis faster in terms of the average number of asked questions.

Mohamed Akrout, Amir-massoud Farahmand, Tory Jarmain, Latif Abid
DScGANS: Integrate Domain Knowledge in Training Dual-Path Semi-supervised Conditional Generative Adversarial Networks and S3VM for Ultrasonography Thyroid Nodules Classification

Semi-supervised learning can reduce the burden of manually labeling data and improve classification performance by learning from unlabelled data. However, due to the absence of label constraints, unlabelled data is usually ambiguous, which typically means large datasets are needed to learn the correct feature-space distribution. The inherently small-sample characteristics of medical image datasets may make semi-supervised learning unstable, which may lead to mixed results and even degrade performance. The domain knowledge (DK) of the physician is of great value for disease diagnosis. In this paper, we propose to promote semi-supervised learning with DK and develop a DScGANS model (DScGAN, a dual-path semi-supervised conditional generative adversarial network, combined with S3VM, a semi-supervised support vector machine) to diagnose ultrasound thyroid nodules. DScGAN uses DK as a condition and multimodal ultrasound data for training. We concatenate the image representations learned by DScGAN and use them as the input of the S3VM, with DK used as a condition to constrain the S3VM for thyroid nodule classification. The experimental results show that our proposed model can effectively avoid the mixed results that may occur in semi-supervised learning on a small medical dataset with insufficient labels. Additionally, our model provides stable and advanced diagnostic performance and could potentially be integrated into thyroid ultrasound systems.

Wenkai Yang, Juanjuan Zhao, Yan Qiang, Xiaotang Yang, Yunyun Dong, Qianqian Du, Guohua Shi, Muhammad Bilal Zia
Similarity Steered Generative Adversarial Network and Adaptive Transfer Learning for Malignancy Characterization of Hepatocellualr Carcinoma

Deep learning with Convolutional Neural Networks (CNNs) has exhibited high diagnostic performance for lesion characterization. However, it is still challenging to train powerful deep learning systems for lesion characterization, because there are often limited samples for the different malignancy types and considerable variability across images from multiple scanners in clinical practice. In this work, we propose a similarity-steered generative adversarial network (SSGAN) coupled with pre-training and adaptive fine-tuning of data from multiple scanners for lesion characterization. Specifically, SSGAN adds a similarity discriminative measure to the conventional generative adversarial network to effectively generate more discrepant samples, while the adaptive fine-tuning strategy optimally decides whether to use the pre-trained layers or the fine-tuned layers. Experimental results on pathologically confirmed malignancy of clinical hepatocellular carcinomas (HCCs) with MR images acquired by different scanners (GE, Philips and Siemens) demonstrate several intriguing characteristics of the proposed end-to-end framework for malignancy characterization of HCC: (1) the proposed SSGAN remarkably improves the performance of lesion characterization and outperforms several recently proposed methods; (2) adaptive fine-tuning combined with the proposed SSGAN further improves performance in the context of limited data; (3) clinical images acquired by one MR scanner for pre-training can be used to improve the characterization performance on images acquired by another MR scanner, outperforming pre-training on ImageNet.

Hanqiu Ju, Wanwei Jian, Xiaoping Cen, Guangyi Wang, Wu Zhou
Unsupervised Clustering of Quantitative Imaging Phenotypes Using Autoencoder and Gaussian Mixture Model

Quantitative medical image computing (radiomics) has been widely applied to build prediction models from medical images. However, overfitting is a significant issue in conventional radiomics, where a large number of radiomic features are directly used to train and test models that predict genotypes or clinical outcomes. In order to tackle this problem, we propose an unsupervised learning pipeline composed of an autoencoder for representation learning of radiomic features and a Gaussian mixture model based on the minimum message length criterion for clustering. By incorporating probabilistic modeling, disease heterogeneity is taken into account. The performance of the proposed pipeline was evaluated on an institutional MRI cohort of 108 patients with colorectal cancer liver metastases. Our approach is capable of automatically selecting the optimal number of clusters and assigning patients to clusters (imaging subtypes) with significantly different survival rates. Our method outperforms other unsupervised clustering methods that have been used for radiomics analysis and has comparable performance to a state-of-the-art imaging biomarker.
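A compact sketch of the two-stage pipeline using off-the-shelf pieces: PCA stands in for the paper's autoencoder and BIC-based model selection stands in for the minimum message length criterion, since neither exact component ships with scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def cluster_radiomics(features, latent_dim=8, max_k=6, seed=0):
    """features: (n_patients, n_radiomic_features). Embed into a low-
    dimensional space, then fit GMMs with 2..max_k components and keep
    the model with the lowest BIC, returning cluster assignments and
    the selected number of clusters (imaging subtypes)."""
    z = PCA(n_components=latent_dim, random_state=seed).fit_transform(features)
    models = [GaussianMixture(n_components=k, random_state=seed).fit(z)
              for k in range(2, max_k + 1)]
    best = min(models, key=lambda m: m.bic(z))
    return best.predict(z), best.n_components
```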

Jianan Chen, Laurent Milot, Helen M. C. Cheung, Anne L. Martel
Adaptive Sparsity Regularization Based Collaborative Clustering for Cancer Prognosis

Radiomic approaches have achieved promising performance in the prediction of clinical outcomes of cancer patients. In particular, feature dimensionality reduction plays an important role in radiomic studies. However, conventional dimensionality reduction techniques are not equipped to suppress data noise or to utilize latent supervision information in the patient data under study (e.g., differences between patients) for learning discriminative low-dimensional representations. To achieve feature dimensionality reduction with improved discriminative power and robustness to noisy radiomic features, we develop an adaptive sparsity regularization based collaborative clustering method that simultaneously clusters patients and radiomic features into distinct groups. Our method is built on adaptive sparsity regularized matrix tri-factorization for simultaneous feature denoising and dimension reduction, so that noise is adaptively isolated from the features, and grouping information of patients with distinctive features provides latent supervision to guide feature dimension reduction. The sparsity regularization is grounded on distribution modeling of transform-domain coefficients in a Bayesian framework. Experiments on synthetic data demonstrate the effectiveness of the proposed approach in data clustering, and empirical results on an FDG-PET/CT dataset of rectal cancer patients demonstrate that the proposed method outperforms alternative methods in terms of both patient stratification and prediction of patient clinical outcomes.

Hangfan Liu, Hongming Li, Yuemeng Li, Shi Yin, Pamela Boimel, James Janopaul-Naylor, Haoyu Zhong, Ying Xiao, Edgar Ben-Josef, Yong Fan
Coronary Artery Plaque Characterization from CCTA Scans Using Deep Learning and Radiomics

Assessing coronary artery plaque segments in coronary CT angiography scans is an important task to improve patient management and clinical outcomes, as it can help to decide whether invasive investigation and treatment are necessary. In this work, we present three machine learning approaches capable of performing this task. The first approach is based on radiomics, where a plaque segmentation is used to calculate various shape-, intensity- and texture-based features under different image transformations. The second approach is based on deep learning and relies on centerline extraction as its sole prerequisite. In the third approach, we fuse the deep learning approach with radiomic features. On our data, the methods reached scores similar to simulated fractional flow reserve (FFR) measurements, which, in contrast to our methods, require an exact segmentation of the whole coronary tree and often time-consuming manual interaction. In the literature, the performance of simulated FFR reaches an AUC between 0.79 and 0.93 for predicting an abnormal invasive FFR that demands revascularization. The radiomics approach achieves an AUC of 0.84, the deep learning approach 0.86, and the combined method 0.88 for predicting the revascularization decision directly. While all three proposed methods can be computed within seconds, the FFR simulation typically takes several minutes. Provided representative training data in sufficient quantities, we believe that the presented methods can be used to create systems for fully automatic, non-invasive risk assessment for a variety of adverse cardiac events.

Felix Denzinger, Michael Wels, Nishant Ravikumar, Katharina Breininger, Anika Reidelshöfer, Joachim Eckert, Michael Sühling, Axel Schmermund, Andreas Maier
Response Estimation Through Spatially Oriented Neural Network and Texture Ensemble (RESONATE)

Neoadjuvant chemotherapy (NAC) is considered to be the standard treatment for locally advanced breast cancer, but less than half of all recipients achieve pathological complete response (pCR), necessitating a way to predict pCR prior to NAC. Previous work has shown that pCR prediction is viable via either radiomic or deep learning classification methods when applied to the tumoral region on breast MRI. Others have shown that analysis within the peritumoral region directly outside of the tumor can contribute unique value to pCR prediction. In this work we present Response Estimation through Spatially Oriented Neural Network and Texture Ensemble (RESONATE): an approach to spatially invoke different types of analytic representations in different tumor compartments to create a multi-representation based prediction of response to NAC in breast cancer. A total of 114 NAC recipients with pre-treatment MRI were retrospectively analyzed, with 80 of the patients used for training and 34 held out as an independent testing set. Deep learning and radiomic classifiers were trained separately within the tumor and the peritumoral region, with separate classifier predictions then being fused together via a logistic regression classifier. In the testing set, individual radiomics and deep learning classifiers achieved area under the curve (AUC) values of 0.69 and 0.75 within the tumor, respectively, and 0.69 and 0.66 within the peritumoral region. A weighted fusion of these four classifiers, however, best predicted pCR with an AUC of 0.79. This approach also outperformed fusions incorporating radiomic (AUC = 0.77) or deep learning (AUC = 0.75) only, as well as combinations of representations only within (AUC = 0.78) or outside (AUC = 0.70) the tumor.
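The final fusion step, combining the four compartment-specific classifier outputs with learned weights, is itself compact when written with a logistic regression; the column layout and variable names below are hypothetical.

```python
from sklearn.linear_model import LogisticRegression

def fuse_classifiers(train_probs, train_labels, test_probs):
    """train_probs/test_probs: (n_patients, 4) arrays holding the pCR
    probabilities of the {radiomic, deep} x {tumoral, peritumoral}
    classifiers; a logistic regression learns the fusion weights."""
    fuser = LogisticRegression().fit(train_probs, train_labels)
    return fuser.predict_proba(test_probs)[:, 1]
```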

Jeffrey E. Eben, Nathaniel Braman, Anant Madabhushi
STructural Rectal Atlas Deformation (StRAD) Features for Characterizing Intra- and Peri-wall Chemoradiation Response on MRI

Radiomic features, which quantify the morphologic texture and shape of tumor regions on imaging, have found wide success in characterizing treatment response in vivo. A more detailed interrogation of intra- and peri-tumoral regions for response-related cues could be achieved by capturing the subtle structural deformations that occur due to tumor shrinkage or growth. In this work, we present a novel suite of STructural Rectal Atlas Deformation (StRAD) features to quantify tumor-related deformations in rectal cancers via a cohort of 139 patient MRIs. In flexible non-rigid organs such as the rectum, inter-patient differences complicate evaluation of tumor-related deformations that may occur within the rectal wall or in the peri-rectal environment, necessitating construction of a canonical rectal imaging atlas. Using 63 pelvic MRIs in which healthy rectums could be clearly visualized, we built the first structural atlas of the healthy rectal wall. This atlas was used to compute structural deformations within and around locations in the rectal wall of patients where tumor was present, resulting in intra- and peri-wall StRAD descriptors. We evaluated the efficacy of our StRAD features in two different tasks: (a) predicting which rectal tumors will or will not respond to therapy via baseline MRIs (n = 42), and (b) identifying which rectal tumors were exhibiting regression on post-chemoradiation MRIs (n = 34). Using a linear discriminant analysis classifier in a three-fold cross-validation scheme, we found that intra-wall deformations were significantly lower for responders to chemoradiation, both on baseline MRIs (AUC = $$0.73\pm 0.05$$ ) and on post-therapy MRIs (AUC = $$0.87\pm 0.03$$ ). By comparison, radiomic texture features for both intra- and peri-wall locations yielded significantly worse classification performance in both tasks.

Jacob Antunes, Zhouping Wei, Charlems Alvarez-Jimenez, Eduardo Romero, Marwa Ismail, Anant Madabhushi, Pallavi Tiwari, Satish E. Viswanath
Dynamic Routing Capsule Networks for Mild Cognitive Impairment Diagnosis

Alzheimer’s disease (AD) is a chronic neurodegenerative disease that causes severe cognitive damage to patients. Diagnosis of AD at its preclinical stage, i.e., mild cognitive impairment (MCI), could help to prevent or slow down AD progression. With machine learning, automatic MCI diagnosis can be achieved. Most previous studies share a similar framework: building a classifier based on features extracted from static or dynamic functional connectivity. Recently, inspired by the great successes achieved by deep learning in other areas of medical image analysis, researchers have introduced neural network models for MCI diagnosis. In this paper, we propose dynamic-routing capsule networks for MCI diagnosis. Our proposed methods are based on the capsule net architecture. Two variants of capsule net are designed and discussed, which use intra-ROI and inter-ROI dynamic routing, respectively, to obtain functional representations. More importantly, we design a learnable dynamic functional connectivity metric in our inter-ROI dynamic model, in which the functional connectivity is learned dynamically during network training. To the best of our knowledge, this is the first work to propose dynamic-routing capsule networks for MCI diagnosis. Compared with other machine learning methods and deep learning models, our method achieves superior performance across various evaluations.

Zhicheng Jiao, Pu Huang, Tae-Eui Kam, Li-Ming Hsu, Ye Wu, Han Zhang, Dinggang Shen
Deep Multi-modal Latent Representation Learning for Automated Dementia Diagnosis

Effective fusion of multi-modality neuroimaging data, such as structural magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (PET), has attracted increasing interest in computer-aided brain disease diagnosis, by providing complementary structural and functional information of the brain to improve diagnostic performance. Although considerable progress has been made, there remain several significant challenges in traditional methods for fusing multi-modality data. First, the fusion of multi-modality data is usually independent of the training of diagnostic models, leading to sub-optimal performance. Second, it is challenging to effectively exploit the complementary information among multiple modalities based on low-level imaging features (e.g., image intensity or tissue volume). To this end, in this paper, we propose a novel Deep Latent Multi-modality Dementia Diagnosis (DLMD $$^2$$ ) framework based on a deep non-negative matrix factorization (NMF) model. Specifically, we integrate the feature fusion/learning process into the classifier construction step for eliminating the gap between neuroimaging features and disease labels. To exploit the correlations among multi-modality data, we learn latent representations for multi-modality data by sharing the common high-level representations in the last layer of each modality in the deep NMF model. Extensive experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset validate that our proposed method outperforms several state-of-the-art methods.

Tao Zhou, Mingxia Liu, Huazhu Fu, Jun Wang, Jianbing Shen, Ling Shao, Dinggang Shen
Dynamic Spectral Graph Convolution Networks with Assistant Task Training for Early MCI Diagnosis

The functional brain connectome, also known as the inter-regional functional connectivity (FC) matrix, is recently considered to provide decisive markers for early mild cognitive impairment (eMCI). However, most existing methods use vectorized static FC matrices with "off-the-shelf" classifiers, which may discard both spatial and temporal information and thus compromise diagnostic performance. In this paper, we propose dynamic spectral graph convolution networks (DS-GCNs) for early MCI diagnosis using functional MRI (fMRI). First, a dynamic brain graph is constructed so that the connectivity strengths (edges) are derived from time-varying correlations of fMRI signals, while the node signals are computed from T1 MR images. Then, a spectral graph convolution (GC) based long short-term memory (LSTM) network is employed to process long-range temporal information from the dynamic graphs. Finally, instead of directly using demographic information as additional inputs as in conventional methods, we propose to predict the gender and age of each subject as assistant tasks, which in turn captures useful network features and facilitates the main task of eMCI classification; we refer to this strategy as assistant task training. Experiments on 294 training and 74 testing subjects show that eMCI classification achieved $$79.7\%$$ accuracy (with $$86.5\%$$ sensitivity and $$73.0\%$$ specificity), outperforming state-of-the-art methods. Notably, the proposed method could be further extended to other connectomics studies, where graphs are computed from white matter fiber connections or gray matter characteristics.
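The dynamic graphs consumed by the DS-GCN are built from time-varying correlations of fMRI signals. The standard sliding-window construction is sketched below; the window and stride lengths are illustrative choices.

```python
import numpy as np

def dynamic_fc(timeseries, window=30, stride=5):
    """timeseries: (T, n_rois) array of ROI-averaged fMRI signals.
    Returns (n_windows, n_rois, n_rois) Pearson-correlation graphs,
    one per sliding-window position."""
    T = timeseries.shape[0]
    graphs = [np.corrcoef(timeseries[s:s + window].T)
              for s in range(0, T - window + 1, stride)]
    return np.stack(graphs)
```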

Xiaodan Xing, Qingfeng Li, Hao Wei, Minqing Zhang, Yiqiang Zhan, Xiang Sean Zhou, Zhong Xue, Feng Shi
Bridging Imaging, Genetics, and Diagnosis in a Coupled Low-Dimensional Framework

We propose a joint dictionary learning framework that couples imaging and genetics data in a low-dimensional subspace as guided by clinical diagnosis. We use a graph regularization penalty to simultaneously capture inter-regional brain interactions and identify the representative set of anatomical basis vectors that span the low-dimensional space. We further employ group sparsity to find the representative set of genetic basis vectors that span the same latent space. Finally, the latent projection is used to classify patients versus controls. We have evaluated our model on two task-fMRI paradigms and single nucleotide polymorphism (SNP) data from schizophrenic patients and matched neurotypical controls. We employ a ten-fold cross-validation technique to show the predictive power of our model. We compare our model with canonical correlation analysis of imaging and genetics data and with random forest classification. Our approach shows better prediction accuracy on both task datasets. Moreover, the implicated brain regions and genetic variants underlie the well-documented deficits in schizophrenia.

Sayan Ghosal, Qiang Chen, Aaron L. Goldman, William Ulrich, Karen F. Berman, Daniel R. Weinberger, Venkata S. Mattay, Archana Venkataraman
Global and Local Interpretability for Cardiac MRI Classification

Deep learning methods for classifying medical images have demonstrated impressive accuracy in a wide range of tasks, but these models are often hard to interpret, limiting their applicability in clinical practice. In this work, we introduce a convolutional neural network model for identifying disease in temporal sequences of cardiac MR segmentations which is interpretable in terms of clinically familiar measurements. The model is based around a variational autoencoder, reducing the input into a low-dimensional latent space in which classification occurs. We then use the recently developed 'concept activation vector' technique to associate diagnostically meaningful concepts (e.g., clinical biomarkers such as 'low left-ventricular ejection fraction') with certain vectors in the latent space. These concepts are then qualitatively inspected by observing the change in the image domain resulting from interpolations in the latent space in the direction of these vectors. As a result, when the model classifies images it is also capable of providing naturally interpretable concepts relevant to that classification and demonstrating the meaning of those concepts in the image domain. Our approach is demonstrated on the UK Biobank cardiac MRI dataset, where we detect the presence of coronary artery disease.
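The concept activation vector technique referenced here has a simple core: fit a linear classifier in latent space between codes of examples exhibiting a concept and codes of examples that do not, and take the normalised weight vector as the concept direction, along which latent interpolations can then be decoded. A sketch with hypothetical inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_latents, other_latents):
    """Each input: (n_examples, latent_dim). The returned unit vector
    points from 'concept absent' towards 'concept present' (e.g. low
    left-ventricular ejection fraction) in the latent space."""
    X = np.vstack([concept_latents, other_latents])
    y = np.r_[np.ones(len(concept_latents)), np.zeros(len(other_latents))]
    w = LogisticRegression().fit(X, y).coef_[0]
    return w / np.linalg.norm(w)
```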

James R. Clough, Ilkay Oksuz, Esther Puyol-Antón, Bram Ruijsink, Andrew P. King, Julia A. Schnabel
Let’s Agree to Disagree: Learning Highly Debatable Multirater Labelling

Classification and differentiation of small pathological objects may vary greatly among human raters due to differences in training, expertise and consistency over time. In a radiological setting, objects commonly have high within-class appearance variability whilst sharing certain characteristics across different classes, making their distinction even more difficult. As an example, markers of cerebral small vessel disease, such as enlarged perivascular spaces (EPVS) and lacunes, can be very varied in their appearance while exhibiting high inter-class similarity, making this task highly challenging for human raters. In this work, we investigate joint models of individual rater behaviour and multi-rater consensus in a deep learning setting, and apply them to a brain lesion object-detection task. Results show that jointly modelling both individual and consensus estimates leads to significant improvements in performance when compared to directly predicting consensus labels, while also allowing the characterization of human-rater consistency.

Carole H. Sudre, Beatriz Gomez Anson, Silvia Ingala, Chris D. Lane, Daniel Jimenez, Lukas Haider, Thomas Varsavsky, Ryutaro Tanno, Lorna Smith, Sébastien Ourselin, Rolf H. Jäger, M. Jorge Cardoso
Coidentification of Group-Level Hole Structures in Brain Networks via Hodge Laplacian

One of the outstanding issues in brain network analysis is extracting the common topological substructure shared by a group of individuals. Recently, methods to detect group-wise modular structure on graph Laplacians have been introduced. From the perspective of algebraic topology, the modules or clusters constitute the zeroth-order topological information of a topological space; higher-order topological information can be found in holes. In this study, we extend the concept of the graph Laplacian to the higher-order Hodge Laplacian of weighted networks, and develop a group-level hole identification method via Stiefel optimization. In experiments, we applied the proposed method to three synthetic datasets and the Alzheimer's disease neuroimaging initiative (ADNI) database. Experimental results showed that the coidentification of group-level hole structures helps to find underlying topological information of brain networks that discriminates groups well.
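The first Hodge Laplacian that underlies hole detection is built from boundary matrices, and its kernel dimension counts the 1-dimensional holes (loops). A minimal sketch on a 3-node cycle, which has exactly one hole; the group-level Stiefel optimization of the paper is not reproduced here.

```python
import numpy as np

def hodge_laplacian_1(B1, B2):
    """First Hodge Laplacian L1 = B1^T B1 + B2 B2^T, where B1 is the
    vertex-edge incidence matrix and B2 the edge-triangle incidence
    matrix; dim ker(L1) equals the number of 1-dimensional holes."""
    return B1.T @ B1 + B2 @ B2.T

# 3-node cycle: 3 oriented edges, no filled triangle -> one hole
B1 = np.array([[-1.0, 0.0, 1.0],
               [1.0, -1.0, 0.0],
               [0.0, 1.0, -1.0]])
B2 = np.zeros((3, 0))  # no triangles
L1 = hodge_laplacian_1(B1, B2)
betti_1 = B1.shape[1] - np.linalg.matrix_rank(L1)  # == 1
```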

Hyekyoung Lee, Moo K. Chung, Hyejin Kang, Hongyoon Choi, Seunggyun Ha, Youngmin Huh, Eunkyung Kim, Dong Soo Lee
Confident Head Circumference Measurement from Ultrasound with Real-Time Feedback for Sonographers

Manual estimation of fetal Head Circumference (HC) from Ultrasound (US) is a key biometric for monitoring the healthy development of fetuses. Unfortunately, such measurements are subject to large inter-observer variability, resulting in low early-detection rates of fetal abnormalities. To address this issue, we propose a novel probabilistic Deep Learning approach for real-time automated estimation of fetal HC. This system feeds back statistics on measurement robustness to inform users how confident a deep neural network is in evaluating suitable views acquired during free-hand ultrasound examination. In real-time scenarios, this approach may be exploited to guide operators to scan planes that are as close as possible to the underlying distribution of training images, for the purpose of improving inter-operator consistency. We train on freehand ultrasound data from over 2000 subjects (2848 training/540 test) and show that our method is able to predict HC measurements within $$1.81 \pm 1.65$$ mm deviation from the ground truth, with 50% of the test images fully contained within the predicted confidence margins, and an average of $$1.82 \pm 1.78$$ mm deviation from the margin for the remaining cases that are not fully contained.

Samuel Budd, Matthew Sinclair, Bishesh Khanal, Jacqueline Matthew, David Lloyd, Alberto Gomez, Nicolas Toussaint, Emma C. Robinson, Bernhard Kainz

Image Reconstruction and Synthesis

Frontmatter
Detection and Correction of Cardiac MRI Motion Artefacts During Reconstruction from k-space

In fully sampled cardiac MR (CMR) acquisitions, motion can lead to corruption of k-space lines, which can result in artefacts in the reconstructed images. In this paper, we propose a method to automatically detect and correct motion-related artefacts in CMR acquisitions during reconstruction from k-space data. Our correction method is inspired by work on undersampled CMR reconstruction and uses deep learning to optimize a data-consistency term for under-sampled k-space reconstruction. Our main methodological contribution is the addition of a detection network that classifies motion-corrupted k-space lines, converting the problem of artefact correction into one of reconstruction via the data-consistency term. We train our network to automatically correct motion-related artefacts using synthetically corrupted cine CMR k-space data as well as uncorrupted CMR images. Using a test set of 50 2D+time cine CMR datasets from the UK Biobank, we achieve good image quality in the presence of synthetic motion artefacts. We quantitatively compare our method with a variety of techniques for recovering good image quality and demonstrate better performance than state-of-the-art denoising techniques, with a PSNR of 37.1. Moreover, we show that our method preserves the quality of uncorrupted images and can therefore also be utilized as a general image reconstruction algorithm.
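The data-consistency view that turns correction into reconstruction can be written in one line: trusted k-space lines are kept, and lines flagged as motion-corrupted by the detection network are replaced by the reconstruction network's estimate. A schematic version:

```python
import torch

def data_consistency(pred_kspace, acquired_kspace, trusted_mask):
    """trusted_mask is 1 on k-space lines the detection network deems
    uncorrupted and 0 on corrupted lines, which are filled in from the
    reconstruction network's prediction."""
    return trusted_mask * acquired_kspace + (1 - trusted_mask) * pred_kspace
```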

Ilkay Oksuz, James Clough, Bram Ruijsink, Esther Puyol-Antón, Aurelien Bustin, Gastao Cruz, Claudia Prieto, Daniel Rueckert, Andrew P. King, Julia A. Schnabel
Exploiting Motion for Deep Learning Reconstruction of Extremely-Undersampled Dynamic MRI

The problem of accelerated acquisition for dynamic MRI has been recently tackled with deep learning techniques. However, current state-of-the-art approaches do not incorporate a strategy to exploit the full temporal information of the k-space acquisition which would aid in producing higher quality reconstructions. In this paper, we propose a novel method for exploiting the full temporal dynamics for dynamic MRI reconstructions. Specifically, motion estimates are derived from undersampled MRI sequences. These are used to fuse data along the entire temporal axis to produce a novel data-consistent motion-augmented cine (DC-MAC). This is generated and utilised within an end-to-end trainable deep learning framework for MRI reconstruction. In particular, we find that for aggressive acceleration rates of $$\times 51.2 $$ on our cardiac dataset, our method with 3-fold cross-validation, ME-CNN, outperforms the current widely-accepted state-of-the-art, DC-CNN, with an improvement of 12% and 16% in PSNR and SSIM respectively. We report an average PSNR of $$27.3\pm 2.5$$ and SSIM of $$0.776\pm 0.054$$ . We also explore the robustness of using ME-CNN for unseen, out-of-domain examples.

Gavin Seegoolam, Jo Schlemper, Chen Qin, Anthony Price, Jo Hajnal, Daniel Rueckert
VS-Net: Variable Splitting Network for Accelerated Parallel MRI Reconstruction

In this work, we propose a deep learning approach for parallel magnetic resonance imaging (MRI) reconstruction, termed the variable splitting network (VS-Net), for efficient, high-quality reconstruction of undersampled multi-coil MR data. We formulate the generalized parallel compressed sensing reconstruction as an energy minimization problem, for which a variable splitting optimization method is derived. Based on this formulation, we propose a novel, end-to-end trainable deep neural network architecture by unrolling the resulting iterative process of the variable splitting scheme. VS-Net is evaluated on complex-valued multi-coil knee images at 4-fold and 6-fold acceleration factors. We show that VS-Net outperforms state-of-the-art deep learning reconstruction algorithms in terms of reconstruction accuracy and perceptual quality. Our code is publicly available at https://github.com/j-duan/VS-Net.
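
For readers unfamiliar with variable splitting, the following generic formulation sketches the kind of energy the abstract refers to (notation mine, not necessarily the paper's): with sampling mask $$D$$, Fourier transform $$F$$, coil sensitivities $$S_i$$ and acquired k-space data $$y_i$$,

```latex
\begin{aligned}
&\min_{m}\ \tfrac{\lambda}{2}\sum_{i=1}^{n_c}\bigl\| D F S_i m - y_i \bigr\|_2^2 + \mathcal{R}(m)\\
&\;\Longrightarrow\;
\min_{m,\,u}\ \tfrac{\lambda}{2}\sum_{i=1}^{n_c}\bigl\| D F S_i m - y_i \bigr\|_2^2
 + \mathcal{R}(u) + \tfrac{\beta}{2}\,\| u - m \|_2^2 .
\end{aligned}
```

Alternating minimization then separates a denoising step in $$u$$ (whose proximal operator can be learned by a CNN) from a data-consistency step in $$m$$ (closed form in k-space); unrolling a fixed number of these alternations yields the trainable architecture.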

Jinming Duan, Jo Schlemper, Chen Qin, Cheng Ouyang, Wenjia Bai, Carlo Biffi, Ghalib Bello, Ben Statton, Declan P. O’Regan, Daniel Rueckert
A Novel Loss Function Incorporating Imaging Acquisition Physics for PET Attenuation Map Generation Using Deep Learning

In PET/CT imaging, CT is used for PET attenuation correction (AC). Mismatch between CT and PET due to patient body motion results in AC artifacts. In addition, artifacts caused by metal, beam hardening and photon starvation in the CT itself also introduce inaccurate AC for PET. Maximum likelihood reconstruction of activity and attenuation (MLAA) was proposed to solve these issues by simultaneously reconstructing tracer activity (λ-MLAA) and attenuation map (μ-MLAA) from the PET raw data only. However, μ-MLAA suffers from high noise and λ-MLAA from large bias compared to reconstruction using the CT-based attenuation map (μ-CT). Recently, a convolutional neural network (CNN) was applied to predict the CT attenuation map (μ-CNN) from λ-MLAA and μ-MLAA, using an image-domain loss (IM-loss) function between μ-CNN and the ground-truth μ-CT. However, the IM-loss does not directly measure AC errors according to PET attenuation physics, where the line-integral projection of the attenuation map (μ) along the path of the two annihilation photons, rather than μ itself, is used for AC. A network trained with the IM-loss may therefore yield suboptimal performance in μ generation. Here, we propose a novel line-integral projection loss (LIP-loss) function that incorporates PET attenuation physics into μ generation. Eighty training and twenty testing datasets of whole-body 18F-FDG PET and paired ground-truth μ-CT were used. Quantitative evaluations showed that the model trained with the additional LIP-loss significantly outperformed the model trained solely on the IM-loss function.
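
A minimal numpy/scikit-image illustration of the line-integral idea (a sketch only; in actual training a differentiable forward projector would be needed, and the authors' loss may differ in detail):

```python
import numpy as np
from skimage.transform import radon

def lip_loss(mu_pred, mu_ct, angles=np.arange(0.0, 180.0, 1.0)):
    """Line-integral projection loss: compare forward projections
    (Radon transforms) of the predicted and ground-truth attenuation
    maps, since PET attenuation correction uses line integrals of mu
    along each line of response rather than mu itself."""
    p_pred = radon(mu_pred, theta=angles, circle=False)
    p_ct = radon(mu_ct, theta=angles, circle=False)
    return np.mean((p_pred - p_ct) ** 2)
```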

Luyao Shi, John A. Onofrey, Enette Mae Revilla, Takuya Toyonaga, David Menard, Joseph Ankrah, Richard E. Carson, Chi Liu, Yihuan Lu
A Prior Learning Network for Joint Image and Sensitivity Estimation in Parallel MR Imaging

Parallel imaging is a fast magnetic resonance imaging technique based on spatial sensitivity encoding with multiple coils. To reconstruct a high-quality MR image from undersampled k-space data, we propose a novel deep network, dubbed Blind-PMRI-Net, that simultaneously reconstructs the MR image and the sensitivity maps in a blind setting for parallel imaging. Blind-PMRI-Net is a novel deep architecture inspired by an iterative algorithm that optimizes a novel energy model for joint image and sensitivity estimation based on image and sensitivity priors. The network is designed to automatically learn these two priors by learning their corresponding proximal operators using convolutional neural networks. Blind-PMRI-Net naturally combines the physical constraints of parallel imaging with prior learning in a single deep architecture. Experiments on a knee MRI dataset show that our network reconstructs MR images with improved accuracy over previous methods, at fast computational speed. For example, Blind-PMRI-Net takes 0.72 s on a GPU to reconstruct 15-channel sensitivity maps and a complex-valued MR image of size $$320\times 320$$.
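
A generic joint energy of the kind described might read as follows (notation mine; the paper's exact model may differ): with image $$x$$, unknown coil sensitivities $$s_i$$ and pointwise multiplication $$\odot$$,

```latex
\min_{x,\,\{s_i\}}\ \tfrac{1}{2}\sum_{i=1}^{n_c}\bigl\| D F (s_i \odot x) - y_i \bigr\|_2^2
 \;+\; \mathcal{R}_x(x) \;+\; \sum_{i=1}^{n_c}\mathcal{R}_s(s_i) .
```

Each stage of the unrolled network would then alternate an image update and a sensitivity update, with the proximal operators of $$\mathcal{R}_x$$ and $$\mathcal{R}_s$$ each replaced by a small CNN.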

Nan Meng, Yan Yang, Zongben Xu, Jian Sun
Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples

Deep neural networks have proved efficient for medical image denoising. Current training methods require both noisy and clean images. However, clean images cannot be acquired for many practical medical applications due to naturally noisy signals, such as dynamic imaging, spectral computed tomography, arterial spin labeling magnetic resonance imaging, etc. In this paper we propose a training method that learns denoising neural networks from noisy training samples only. Training data in the acquisition domain are split into two subsets, and the network is trained to map one noisy subset to the other. A consensus loss function is further proposed to efficiently combine the outputs from both subsets. A mathematical proof is provided that the proposed training scheme is equivalent to training with noisy and clean samples when the noise in the two subsets is uncorrelated and zero-mean. The method was validated on the Low-dose CT Challenge dataset and an NYU MRI dataset, and achieved improved performance compared to existing unsupervised methods.
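
The split-and-swap training scheme can be sketched compactly in PyTorch (illustrative; the exact form of the paper's consensus term is an assumption here):

```python
import torch
import torch.nn.functional as F

def consensus_loss(net, y1, y2, gamma=0.5):
    """Train from noisy samples only: y1 and y2 are two acquisition-domain
    splits of the same object whose noise is uncorrelated and zero-mean,
    so mapping each split to the other is, in expectation, equivalent to
    supervised training; a consensus term pulls the two outputs together."""
    d1, d2 = net(y1), net(y2)
    cross = F.mse_loss(d1, y2) + F.mse_loss(d2, y1)
    consensus = F.mse_loss(d1, d2)
    return cross + gamma * consensus
```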

Dufan Wu, Kuang Gong, Kyungsang Kim, Xiang Li, Quanzheng Li
Consistent Brain Ageing Synthesis

Brain ageing is associated with morphological change and cognitive degeneration, and can be affected by neurodegenerative diseases that accelerate the ageing process. The ability to separate accelerated from healthy ageing is useful from a diagnostic perspective and for developing subject-specific models of progression. In this paper we start with the ‘simpler’ problem of synthesising age-progressed 2D slices. We adopt adversarial training to learn the joint distribution of brain images and ages, and simulate aged images with a network conditioned on age (a continuous variable) encoded as an ordinal embedding vector. We introduce a loss that helps preserve subject identity even though we train with cross-sectional (unpaired) data. To evaluate the quality of aged images, a pre-trained age predictor is used to estimate apparent age. We show qualitatively and quantitatively that our method can progressively synthesise realistic brain images of different target ages.
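
One common way to realize the ordinal age encoding mentioned above is sketched below (the paper's exact embedding may differ; this version is illustrative):

```python
import numpy as np

def ordinal_age_embedding(age, max_age=100):
    """Encode a continuous age as an ordinal vector: entry k is 1 if the
    subject is older than k years, with the fractional year stored in
    the last active entry so that nearby ages map to nearby vectors."""
    v = np.zeros(max_age, dtype=np.float32)
    full = int(np.floor(age))
    v[:full] = 1.0
    if full < max_age:
        v[full] = age - full  # fractional part keeps the code continuous
    return v

# ordinal_age_embedding(62.5): first 62 entries are 1, entry 62 is 0.5.
```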

Tian Xia, Agisilaos Chartsias, Sotirios A. Tsaftaris, for the Alzheimer’s Disease Neuroimaging Initiative
Hybrid Generative Adversarial Networks for Deep MR to CT Synthesis Using Unpaired Data

Many different methods have been proposed for generating synthetic CT from MR images. Most of them depend on pairwise aligned MR and CT training images of the same patient, which are difficult to obtain. 2D cycle-consistent Generative Adversarial Networks (2D-cGAN) have previously been explored for generating synthetic CT from MR images, but the results are unsatisfactory due to spatial inconsistency. Attempts have been made to develop a 3D cycle GAN (3D-cGAN) for image translation, but its training requires a large amount of data, which may not always be available. In this paper, we introduce two novel mechanisms to address the above problems. First, we introduce a hybrid GAN (hGAN) consisting of a 3D generator network and a 2D discriminator network for deep MR-to-CT synthesis using unpaired data. We use 3D fully convolutional networks to form the generator, which can better model 3D spatial information and thus resolve the discontinuity problem across slices. Second, we take the results generated by the 2D-cGAN as weak labels, used together with an adversarial training strategy to encourage the generator's 3D output to look as much as possible like a stack of real CT slices. Experimental results demonstrate that our approach achieves better results than the state of the art when only a limited number of unpaired data are available.
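
The bridge between a 3D generator and a 2D discriminator amounts to treating each slice of the generated volume as an independent discriminator sample, as in this minimal PyTorch sketch (shapes and names are illustrative):

```python
import torch

def slices_for_2d_discriminator(volume):
    """Reshape a 3D generator output so that each axial slice becomes a
    separate sample for a 2D discriminator, which then judges whether
    the stack looks like a pile of real CT slices.

    volume: (B, C, D, H, W) synthetic CT volume
    returns: (B * D, C, H, W)
    """
    b, c, d, h, w = volume.shape
    return volume.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w)
```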

Guodong Zeng, Guoyan Zheng
Arterial Spin Labeling Images Synthesis via Locally-Constrained WGAN-GP Ensemble

Arterial spin labeling (ASL) images have recently gained popularity in the diagnosis of dementia, yet they are still not commonly included in well-established image datasets for investigating dementia. Hence, synthesizing ASL images from available data is worthy of investigation. In this study, a novel locally-constrained WGAN-GP model ensemble is proposed to synthesize ASL images from structural MRI for the first time. Technically, this WGAN-GP ensemble is unique in its constrained optimization task, into which diverse local constraints are incorporated; in this way, more detail can be recovered in the synthesized ASL images. The effectiveness of the ensemble for synthesizing ASL images is substantiated both qualitatively and quantitatively through rigorous experiments. Comprehensive analyses reveal that the proposed WGAN-GP ensemble is superior to several state-of-the-art GAN-based models in synthesizing ASL images from structural MRI.
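
The local constraints are the paper's novelty and are not reproduced here, but the WGAN-GP backbone the ensemble builds on uses the standard gradient penalty, sketched below in PyTorch:

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: penalize the critic's gradient norm on random
    interpolates between real and synthetic images, enforcing the
    1-Lipschitz constraint of the Wasserstein formulation."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    score = critic(x_hat)
    grads, = torch.autograd.grad(score, x_hat,
                                 grad_outputs=torch.ones_like(score),
                                 create_graph=True)
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()
```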

Wei Huang, Mingyuan Luo, Xi Liu, Peng Zhang, Huijun Ding, Dong Ni
SkrGAN: Sketching-Rendering Unconditional Generative Adversarial Networks for Medical Image Synthesis

Generative Adversarial Networks (GANs) are capable of synthesizing images and have been successfully applied to medical image synthesis tasks. However, most existing methods consider only global contextual information and ignore fine foreground structures, e.g., vessels and skeletons, which may contain diagnostic indicators for medical image analysis. Inspired by the human painting procedure, which is composed of stroking and color rendering steps, we propose a Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) that introduces a sketch prior constraint to guide medical image generation. In our SkrGAN, a sketch guidance module generates a high-quality structural sketch from random noise, after which a color render mapping embeds the sketch-based representations and resembles the background appearances. Experimental results show that the proposed SkrGAN achieves state-of-the-art results in synthesizing images across various modalities, including retinal color fundus, X-ray, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). In addition, we show that the performance of a medical image segmentation method is improved by using our synthesized images for data augmentation.
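
How the sketch supervision is obtained is not stated in the abstract; one plausible, and purely illustrative, choice is to derive binary structural sketches from training images with an edge filter:

```python
import numpy as np
from skimage.filters import sobel

def extract_sketch(image, thresh=0.1):
    """Derive a binary structural sketch (an edge/skeleton proxy) from a
    training image; such sketches could supervise a sketch-guidance
    module before color rendering fills in appearance."""
    edges = sobel(image.astype(float))
    edges = edges / (edges.max() + 1e-8)
    return (edges > thresh).astype(np.float32)
```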

Tianyang Zhang, Huazhu Fu, Yitian Zhao, Jun Cheng, Mengjie Guo, Zaiwang Gu, Bing Yang, Yuting Xiao, Shenghua Gao, Jiang Liu
Wavelet-based Semi-supervised Adversarial Learning for Synthesizing Realistic 7T from 3T MRI

Ultra-high-field 7T magnetic resonance imaging (MRI) scanners produce images with exceptional anatomical detail, which can facilitate diagnosis and prognosis. However, 7T MRI scanners are often cost-prohibitive and hence inaccessible. In this paper, we propose a novel wavelet-based semi-supervised adversarial learning framework to synthesize 7T MR images from their 3T counterparts. Unlike most learning methods that rely on supervision with a significant amount of paired 3T-7T data, our method applies a semi-supervised learning mechanism that leverages unpaired 3T and 7T MR images to learn the 3T-to-7T mapping when paired 3T-7T data are scarce. This is achieved via a cycle generative adversarial network that operates in the joint spatial-wavelet domain to synthesize multi-frequency details. Extensive experimental results show that our method achieves better performance than state-of-the-art methods trained using fully paired data.
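
The joint spatial-wavelet representation can be illustrated with a single-level 2D discrete wavelet transform (the wavelet family is an assumption; 'haar' is used here only for concreteness):

```python
import pywt

def spatial_wavelet_bands(image):
    """Split an MR slice into a low-frequency approximation and three
    high-frequency detail bands; operating on both the image and its
    wavelet coefficients lets a network attend explicitly to the
    multi-frequency detail that distinguishes 7T from 3T contrast."""
    cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')
    return cA, (cH, cV, cD)
```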

Liangqiong Qu, Shuai Wang, Pew-Thian Yap, Dinggang Shen
DiamondGAN: Unified Multi-modal Generative Adversarial Networks for MRI Sequences Synthesis

Synthesizing MR imaging sequences is highly relevant in clinical practice, as single sequences are often missing or of poor quality (e.g. due to motion). Naturally, the idea arises that a target modality would benefit from multi-modal input, as the complementary information of individual modalities can be synergistic. However, existing methods fail to scale up to multiple non-aligned imaging modalities and face the common drawbacks of complex imaging sequences. We propose a novel, scalable, multi-modal approach called DiamondGAN. Our model performs flexible non-aligned cross-modality synthesis and data infill when given multiple modalities or any of their arbitrary subsets, learning structured information in an end-to-end fashion. We synthesize two MRI sequences with clinical relevance (i.e., double inversion recovery (DIR) and contrast-enhanced T1 (T1-c)), reconstructed from three common sequences. In addition, we perform a multi-rater visual evaluation experiment and find that trained radiologists are unable to distinguish synthetic DIR images from real ones.

Hongwei Li, Johannes C. Paetzold, Anjany Sekuboyina, Florian Kofler, Jianguo Zhang, Jan S. Kirschke, Benedikt Wiestler, Bjoern Menze
Backmatter
Metadata
Title
Medical Image Computing and Computer Assisted Intervention – MICCAI 2019
Editors
Dinggang Shen
Prof. Tianming Liu
Dr. Terry M. Peters
Lawrence H. Staib
Caroline Essert
Sean Zhou
Prof. Dr. Pew-Thian Yap
Ali Khan
Copyright Year
2019
Electronic ISBN
978-3-030-32251-9
Print ISBN
978-3-030-32250-2
DOI
https://doi.org/10.1007/978-3-030-32251-9
