
The four-volume set LNCS 11070, 11071, 11072, and 11073 constitutes the refereed proceedings of the 21st International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2018, held in Granada, Spain, in September 2018.

The 373 revised full papers presented were carefully reviewed and selected from 1068 submissions in a double-blind review process. The papers have been organized in the following topical sections:
Part I: Image Quality and Artefacts; Image Reconstruction Methods; Machine Learning in Medical Imaging; Statistical Analysis for Medical Imaging; Image Registration Methods.
Part II: Optical and Histology Applications: Optical Imaging Applications; Histology Applications; Microscopy Applications; Optical Coherence Tomography and Other Optical Imaging Applications. Cardiac, Chest and Abdominal Applications: Cardiac Imaging Applications; Colorectal, Kidney and Liver Imaging Applications; Lung Imaging Applications; Breast Imaging Applications; Other Abdominal Applications.
Part III: Diffusion Tensor Imaging and Functional MRI: Diffusion Tensor Imaging; Diffusion Weighted Imaging; Functional MRI; Human Connectome. Neuroimaging and Brain Segmentation Methods: Neuroimaging; Brain Segmentation Methods.
Part IV: Computer Assisted Intervention: Image Guided Interventions and Surgery; Surgical Planning, Simulation and Work Flow Analysis; Visualization and Augmented Reality. Image Segmentation Methods: General Image Segmentation Methods, Measures and Applications; Multi-Organ Segmentation; Abdominal Segmentation Methods; Cardiac Segmentation Methods; Chest, Lung and Spine Segmentation; Other Segmentation Applications.

### Uncertainty in Multitask Learning: Joint Representations for Probabilistic MR-only Radiotherapy Planning

Multi-task neural network architectures provide a mechanism that jointly integrates information from distinct sources. They are ideal in the context of MR-only radiotherapy planning, as they can jointly regress a synthetic CT (synCT) scan and segment organs-at-risk (OAR) from MRI. We propose a probabilistic multi-task network that estimates: (1) intrinsic uncertainty through a heteroscedastic noise model for spatially-adaptive task loss weighting and (2) parameter uncertainty through approximate Bayesian inference. This allows sampling of multiple segmentations and synCTs that share their network representation. We test our model on prostate cancer scans and show that it produces more accurate and consistent synCTs with a better estimation of the variance of the errors, state-of-the-art results in OAR segmentation, and a methodology for quality assurance in radiotherapy treatment planning.

Felix J. S. Bragman, Ryutaro Tanno, Zach Eaton-Rosen, Wenqi Li, David J. Hawkes, Sebastien Ourselin, Daniel C. Alexander, Jamie R. McClelland, M. Jorge Cardoso
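
The heteroscedastic noise model mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common Gaussian formulation in which the network predicts a per-voxel log-variance alongside the regression output; the function name and the NumPy setting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def heteroscedastic_regression_loss(y_true, y_pred, log_var):
    """Per-voxel Gaussian negative log-likelihood, up to an additive constant.

    Assumption: the model outputs log_var = log(sigma^2) for every voxel.
    A large predicted variance down-weights the squared residual but is
    penalised by the 0.5 * log_var term, giving spatially adaptive weighting.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    squared_residual = (y_true - y_pred) ** 2
    return 0.5 * np.exp(-log_var) * squared_residual + 0.5 * log_var
```

With `log_var` fixed at zero the expression reduces to half the squared error, so the ordinary L2 loss is a special case of this sketch.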

### A Combined Simulation and Machine Learning Approach for Image-Based Force Classification During Robotized Intravitreal Injections

Intravitreal injection is one of the most common treatment strategies for chronic ophthalmic diseases. The last decade has seen the number of intravitreal injections dramatically increase, and with it, adverse effects and limitations. To overcome these issues, medical assistive devices for robotized injections have been proposed and are projected to improve delivery mechanisms for a new generation of pharmacological solutions. In our work, we propose a method aimed at improving the safety features of such envisioned robotic systems. Our vision-based method uses a combination of 2D OCT data, numerical simulation and machine learning to estimate the range of the force applied by an injection needle on the sclera. We build a Neural Network (NN) to predict force ranges from Optical Coherence Tomography (OCT) images of the sclera directly. To avoid the need for large training data sets, the NN is trained on images of simulated deformed sclera. We validate our approach on real OCT images collected on five ex vivo porcine eyes using a robotically-controlled needle. Results show that the applied force range can be predicted with 94% accuracy. Being real-time, this solution can be integrated in the control loop of the system, allowing for in-time withdrawal of the needle.

Andrea Mendizabal, Tatiana Fountoukidou, Jan Hermann, Raphael Sznitman, Stephane Cotin

### Learning from Noisy Label Statistics: Detecting High Grade Prostate Cancer in Ultrasound Guided Biopsy

The ubiquity of noise is an important issue for building computer-aided diagnosis models for prostate cancer biopsy guidance where the histopathology data is sparse and not finely annotated. We propose a solution to alleviate this challenge as part of a Temporal Enhanced Ultrasound (TeUS)-based prostate cancer biopsy guidance method. Specifically, we embed the prior knowledge from the histopathology as soft labels in a two-stage model, to address the problem of diverse label noise in the ground truth. We then use this information to accurately detect the grade of cancer and also to estimate the length of cancer in the target. Additionally, we create a Bayesian probabilistic version of our network, which allows evaluation of model uncertainty that can lead to any possible misguidance during the biopsy procedure. In an in vivo study with 155 patients, we analyze data from 250 suspicious cancer foci obtained during fusion biopsy. We achieve an average area under the curve of 0.84 for cancer grading and a mean squared error of 0.12 in the estimation of tumor in biopsy core length.

Shekoofeh Azizi, Pingkun Yan, Amir Tahmasebi, Peter Pinto, Bradford Wood, Jin Tae Kwak, Sheng Xu, Baris Turkbey, Peter Choyke, Parvin Mousavi, Purang Abolmaesumi

### A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation

A reliable Ultrasound (US)-to-US registration method to compensate for brain shift would substantially improve Image-Guided Neurological Surgery. Developing such a registration method is very challenging, due to factors such as the tumor resection, the complexity of brain pathology and the demand for fast computation. We propose a novel feature-driven active registration framework. Here, landmarks and their displacement are first estimated from a pair of US images using corresponding local image features. Subsequently, a Gaussian Process (GP) model is used to interpolate a dense deformation field from the sparse landmarks. Kernels of the GP are estimated by using variograms and a discrete grid search method. If necessary, the user can actively add new landmarks based on the image context and visualization of the uncertainty measure provided by the GP to further improve the result. We retrospectively demonstrate our registration framework as a robust and accurate brain shift compensation solution on clinical data.

Jie Luo, Matthew Toews, Ines Machado, Sarah Frisken, Miaomiao Zhang, Frank Preiswerk, Alireza Sedghi, Hongyi Ding, Steve Pieper, Polina Golland, Alexandra Golby, Masashi Sugiyama, William M. Wells III
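
The Gaussian Process step described in the abstract above, interpolating a dense deformation field from sparse landmark displacements and exposing an uncertainty measure, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a fixed squared-exponential kernel with hand-picked hyperparameters, whereas the paper estimates its kernels from variograms and a grid search. All function and parameter names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential kernel between two point sets of shape (n, 3) and (m, 3)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_interpolate(landmarks, displacements, query,
                   length_scale=10.0, signal_var=1.0, noise_var=1e-6):
    """GP posterior mean displacement and variance at the query points.

    Assumption: each displacement component is modelled independently
    with the same zero-mean GP prior.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    query = np.asarray(query, dtype=float)
    k_ll = rbf_kernel(landmarks, landmarks, length_scale, signal_var)
    k_ll += noise_var * np.eye(len(landmarks))
    k_ql = rbf_kernel(query, landmarks, length_scale, signal_var)
    mean = k_ql @ np.linalg.solve(k_ll, displacements)
    # Posterior variance: prior variance minus the part explained by landmarks.
    explained = np.sum(k_ql * np.linalg.solve(k_ll, k_ql.T).T, axis=1)
    variance = signal_var - explained
    return mean, variance
```

In this sketch the variance is near zero at the landmarks and approaches `signal_var` far away from them, which is the kind of uncertainty map a user could inspect when deciding where to add a new landmark.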

### Soft-Body Registration of Pre-operative 3D Models to Intra-operative RGBD Partial Body Scans

We present a novel solution to soft-body registration between a pre-operative 3D patient model and an intra-operative surface mesh of the patient lying on the operating table, acquired using an inexpensive and portable depth (RGBD) camera. The solution has several clinical applications, including skin dose mapping in interventional radiology and intra-operative image guidance. We propose to solve this with a robust non-rigid registration algorithm that handles partial surface data, significant posture modification and patient-table collisions. We investigate several unstudied and important aspects of this registration problem. These are the benefits of heterogeneous versus homogeneous biomechanical models and the benefits of modeling patient/table interaction as collision constraints. We also study how abdominal registration accuracy varies as a function of scan length in the caudal-cranial axis.

Richard Modrzejewski, Toby Collins, Adrien Bartoli, Alexandre Hostettler, Jacques Marescaux

### Automatic Classification of Cochlear Implant Electrode Cavity Positioning

Cochlear Implants (CIs) restore hearing using an electrode array that is surgically implanted into the intra-cochlear cavities. Research has indicated that each electrode can lie in one of several cavities and that location is significantly associated with hearing outcomes. However, comprehensive analysis of this phenomenon has not been possible because the cavities are not directly visible in clinical CT images and because existing methods to estimate cavity location are not accurate enough, labor intensive, or their accuracy has not been validated. In this work, a novel graph-based search is presented to automatically identify the cavity in which each electrode is located. We test our approach on CT scans from a set of 34 implanted temporal bone specimens. High resolution µCT scans of the specimens, where cavities are visible, show our method to have 98% cavity classification accuracy. These results indicate that our methods could be used on a large scale to study the link between electrode placement and outcome, which could lead to advances that improve hearing outcomes for CI users.

Jack H. Noble, Robert F. Labadie, Benoit M. Dawant

### X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery

X-ray image guidance enables percutaneous alternatives to complex procedures. Unfortunately, the indirect view onto the anatomy in addition to projective simplification substantially increase the task-load for the surgeon. Additional 3D information such as knowledge of anatomical landmarks can benefit surgical decision making in complicated scenarios. Automatic detection of these landmarks in transmission imaging is challenging since image-domain features characteristic to a certain landmark change substantially depending on the viewing direction. Consequently and to the best of our knowledge, the above problem has not yet been addressed. In this work, we present a method to automatically detect anatomical landmarks in X-ray images independent of the viewing direction. To this end, a sequential prediction framework based on convolutional layers is trained on synthetically generated data of the pelvic anatomy to predict 23 landmarks in single X-ray images. View independence is contingent on training conditions and, here, is achieved on a spherical segment covering $$120^{\circ} \times 90^{\circ}$$ in LAO/RAO and CRAN/CAUD, respectively, centered around AP. On synthetic data, the proposed approach achieves a mean prediction error of $$5.6 \pm 4.5$$ mm. We demonstrate that the proposed network is immediately applicable to clinically acquired data of the pelvis. In particular, we show that our intra-operative landmark detection together with pre-operative CT enables X-ray pose estimation which, ultimately, benefits initialization of image-based 2D/3D registration.

Bastian Bier, Mathias Unberath, Jan-Nico Zaech, Javad Fotouhi, Mehran Armand, Greg Osgood, Nassir Navab, Andreas Maier

### Endoscopic Navigation in the Absence of CT Imaging

Clinical examinations that involve endoscopic exploration of the nasal cavity and sinuses often do not have a reference image to provide structural context to the clinician. In this paper, we present a system for navigation during clinical endoscopic exploration in the absence of computed tomography (CT) scans by making use of shape statistics from past CT scans. Using a deformable registration algorithm along with dense reconstructions from video, we show that we are able to achieve submillimeter registrations in in-vivo clinical data and are able to assign confidence to these registrations using confidence criteria established using simulated data.

Ayushi Sinha, Xingtong Liu, Austin Reiter, Masaru Ishii, Gregory D. Hager, Russell H. Taylor

### A Novel Mixed Reality Navigation System for Laparoscopy Surgery

Jagadeesan Jayender, Brian Xavier, Franklin King, Ahmed Hosny, David Black, Steve Pieper, Ali Tavakkoli

### Respiratory Motion Modelling Using cGANs

Respiratory motion models in radiotherapy are considered as one possible approach for tracking mobile tumours in the thorax and abdomen with the goal to ensure target coverage and dose conformation. We present a patient-specific motion modelling approach which combines navigator-based 4D MRI with recent developments in deformable image registration and deep neural networks. The proposed regression model based on conditional generative adversarial nets (cGANs) is trained to learn the relation between temporally related US and MR navigator images. Prior to treatment, simultaneous ultrasound (US) and 4D MRI data is acquired. During dose delivery, online US imaging is used as surrogate to predict complete 3D MR volumes of different respiration states ahead of time. Experimental validations on three volunteer lung datasets demonstrate the potential of the proposed model both in terms of qualitative and quantitative results, and computational time required.

Alina Giger, Robin Sandkühler, Christoph Jud, Grzegorz Bauman, Oliver Bieri, Rares Salomir, Philippe C. Cattin

### Physics-Based Simulation to Enable Ultrasound Monitoring of HIFU Ablation: An MRI Validation

High intensity focused ultrasound (HIFU) is used to ablate pathological tissue non-invasively, but reliable and real-time thermal monitoring is crucial to ensure a safe and effective procedure. It can be provided by MRI, which is an expensive and cumbersome modality. We propose a monitoring method that enables real-time assessment of temperature distribution by combining intra-operative ultrasound (US) with physics-based simulation. During the ablation, changes in acoustic properties due to rising temperature are monitored using an external US sensor. A physics-based HIFU simulation model is then used to generate 3D temperature maps at high temporal and spatial resolutions. Our method leverages current HIFU systems with external low-cost and MR-compatible US sensors, thus allowing its validation against MR thermometry, the gold-standard clinical temperature monitoring method. We demonstrated in silico the method feasibility, performed sensitivity analysis and showed experimentally its applicability on phantom data using a clinical HIFU system. Promising results were obtained: a mean temperature error smaller than $$1.5\,^{\circ}\text{C}$$ was found in four experiments.

Chloé Audigier, Younsu Kim, Nicholas Ellens, Emad M. Boctor

### DeepDRR – A Catalyst for Machine Learning in Fluoroscopy-Guided Procedures

Machine learning-based approaches outperform competing methods in most disciplines relevant to diagnostic radiology. Interventional radiology, however, has not yet benefited substantially from the advent of deep learning, in particular because of two reasons: (1) Most images acquired during the procedure are never archived and are thus not available for learning, and (2) even if they were available, annotations would be a severe challenge due to the vast amounts of data. When considering fluoroscopy-guided procedures, an interesting alternative to true interventional fluoroscopy is in silico simulation of the procedure from 3D diagnostic CT. In this case, labeling is comparably easy and potentially readily available, yet, the appropriateness of resulting synthetic data is dependent on the forward model. In this work, we propose DeepDRR, a framework for fast and realistic simulation of fluoroscopy and digital radiography from CT scans, tightly integrated with the software platforms native to deep learning. We use machine learning for material decomposition and scatter estimation in 3D and 2D, respectively, combined with analytic forward projection and noise injection to achieve the required performance. On the example of anatomical landmark detection in X-ray images of the pelvis, we demonstrate that machine learning models trained on DeepDRRs generalize to unseen clinically acquired data without the need for re-training or domain adaptation. Our results are promising and promote the establishment of machine learning in fluoroscopy-guided procedures.

Mathias Unberath, Jan-Nico Zaech, Sing Chun Lee, Bastian Bier, Javad Fotouhi, Mehran Armand, Nassir Navab

### Exploiting Partial Structural Symmetry for Patient-Specific Image Augmentation in Trauma Interventions

In unilateral pelvic fracture reductions, surgeons attempt to reconstruct the bone fragments such that bilateral symmetry in the bony anatomy is restored. We propose to exploit this “structurally symmetric” nature of the pelvic bone, and provide intra-operative image augmentation to assist the surgeon in repairing dislocated fragments. The main challenge is to automatically estimate the desired plane of symmetry within the patient’s pre-operative CT. We propose to estimate this plane using a non-linear optimization strategy, by minimizing Tukey’s biweight robust estimator, relying on the partial symmetry of the anatomy. Moreover, a regularization term is designed to enforce the similarity of bone density histograms on both sides of this plane, relying on the biological fact that, even if injured, the dislocated bone segments remain within the body. The experimental results demonstrate the performance of the proposed method in estimating this “plane of partial symmetry” using CT images of both healthy and injured anatomy. Examples of unilateral pelvic fractures are used to show how intra-operative X-ray images could be augmented with the forward-projections of the mirrored anatomy, acting as objective road-map for fracture reduction procedures.

Javad Fotouhi, Mathias Unberath, Giacomo Taylor, Arash Ghaani Farashahi, Bastian Bier, Russell H. Taylor, Greg M. Osgood, Mehran Armand, Nassir Navab
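
Tukey's biweight, the robust estimator named in the abstract above, can be written down compactly. The sketch below shows the standard loss function only; the tuning constant `c = 4.685` is the conventional default and is an assumption here, as is the function name. How the residuals are formed from mirrored CT intensities is not specified in the abstract and is not modelled.

```python
import numpy as np

def tukey_biweight(residuals, c=4.685):
    """Tukey's biweight loss.

    rho(r) = (c^2 / 6) * (1 - (1 - (r / c)^2)^3)  for |r| <= c
    rho(r) = c^2 / 6                              for |r| >  c
    Residuals beyond c contribute a constant, so gross outliers (for
    example, dislocated fragments) stop influencing the optimum.
    """
    r = np.asarray(residuals, dtype=float)
    # Clipping |r| / c to 1 reproduces the constant branch for |r| > c.
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)
```

A symmetry-plane cost could then be formed by summing `tukey_biweight` over the intensity differences between the volume and its reflection, which is one plausible reading of how partial symmetry is tolerated.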

### Intraoperative Brain Shift Compensation Using a Hybrid Mixture Model

Brain deformation (or brain shift) during neurosurgical procedures such as tumor resection has a significant impact on the accuracy of neuronavigation systems. Compensating for this deformation during surgery is essential for effective guidance. In this paper, we propose a method for brain shift compensation based on registration of vessel centerlines derived from preoperative C-Arm cone beam CT (CBCT) images to intraoperative ones. A hybrid mixture model (HdMM)-based non-rigid registration approach was formulated wherein Student’s t and Watson distributions were combined to model positions and centerline orientations of cerebral vasculature, respectively. Following registration of the preoperative vessel centerlines to their intraoperative counterparts, B-spline interpolation was used to generate a dense deformation field and warp the preoperative image to each intraoperative image acquired. Registration accuracy was evaluated using both synthetic and clinical data. The former comprised CBCT images acquired using a deformable anthropomorphic brain phantom. The latter, meanwhile, consisted of four 3D digital subtraction angiography (DSA) images of one patient, acquired before, during and after surgical tumor resection. HdMM consistently outperformed a state-of-the-art point matching method, coherent point drift (CPD), resulting in significantly lower registration errors. For clinical data, the registration error was reduced from 3.73 mm using CPD to 1.55 mm using the proposed method.

Siming Bayer, Nishant Ravikumar, Maddalena Strumia, Xiaoguang Tong, Ying Gao, Martin Ostermeier, Rebecca Fahrig, Andreas Maier

### Video-Based Computer Aided Arthroscopy for Patient Specific Reconstruction of the Anterior Cruciate Ligament

The Anterior Cruciate Ligament tear is a common medical condition that is treated using arthroscopy by pulling a tissue graft through a tunnel opened with a drill. The correct anatomical position and orientation of this tunnel is crucial for knee stability, and drilling an adequate bone tunnel is the most technically challenging part of the procedure. This paper presents, for the first time, a guidance system based solely on intra-operative video for guiding the drilling of the tunnel. Our solution uses small, easily recognizable visual markers that are attached to the bone and tools for estimating their relative pose. A recent registration algorithm is employed for aligning a pre-operative image of the patient’s anatomy with a set of contours reconstructed by touching the bone surface with an instrumented tool. Experimental validation using ex-vivo data shows that the method enables the accurate registration of the pre-operative model with the bone, providing useful information for guiding the surgeon during the medical procedure.

Carolina Raposo, Cristóvão Sousa, Luis Ribeiro, Rui Melo, João P. Barreto, João Oliveira, Pedro Marques, Fernando Fonseca

### Simultaneous Segmentation and Classification of Bone Surfaces from Ultrasound Using a Multi-feature Guided CNN

Various imaging artifacts, low signal-to-noise ratio, and bone surfaces appearing several millimeters in thickness have hindered the success of ultrasound (US) guided computer assisted orthopedic surgery procedures. In this work, a multi-feature guided convolutional neural network (CNN) architecture is proposed for simultaneous enhancement, segmentation, and classification of bone surfaces from US data. The proposed CNN consists of two main parts: a pre-enhancing net, that takes the concatenation of B-mode US scan and three filtered image features for the enhancement of bone surfaces, and a modified U-net with a classification layer. The proposed method was validated on 650 in vivo US scans collected using two US machines, by scanning knee, femur, distal radius and tibia bones. Validation, against expert annotation, achieved statistically significant improvements in segmentation of bone surfaces compared to state-of-the-art.

Puyang Wang, Vishal M. Patel, Ilker Hacihaliloglu

### Endoscopic Laser Surface Scanner for Minimally Invasive Abdominal Surgeries

Minimally invasive surgery performed under endoscopic video is a viable alternative to several types of open abdominal surgeries. Advanced visualization techniques require accurate patient registration, often facilitated by reconstruction of the organ surface in situ. We present an active system for intraoperative surface reconstruction of internal organs, comprising a single-plane laser as the structured light source and a surgical endoscope camera as the imaging system. Both surgical instruments are spatially calibrated and tracked, after which the surface reconstruction is formulated as the intersection problem between line-of-sight rays (from the surgical camera) and the laser beam. Surface target registration error after a rigid-body surface registration between the scanned 3D points and the ground truth obtained via CT is reported. When tested on an ex vivo porcine liver and kidney, root-mean-squared surface target registration error of 1.28 mm was achieved. Accurate endoscopic surface reconstruction is possible by using two separately calibrated and tracked surgical instruments, where the trigonometry between the structured light, imaging system, and organ surface can be optimized. Our novelty is the accurate calibration technique for the tracked laser beam, and the design and construction of a laser apparatus designed for robotic-assisted surgery.

Jordan Geurten, Wenyao Xia, Uditha Jayarathne, Terry M. Peters, Elvis C. S. Chen
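
The intersection problem described in the abstract above reduces, for a single-plane laser, to intersecting each camera line-of-sight ray with the laser plane. The following is a minimal geometric sketch, assuming both the ray and the plane are already expressed in one common tracker coordinate frame; calibration and tracking are outside its scope and the names are hypothetical.

```python
import numpy as np

def intersect_ray_with_plane(ray_origin, ray_direction,
                             plane_point, plane_normal, eps=1e-9):
    """Return the 3D point where a camera ray meets the laser plane.

    Returns None when the ray is (nearly) parallel to the plane or when
    the intersection lies behind the camera.
    """
    ray_origin = np.asarray(ray_origin, dtype=float)
    ray_direction = np.asarray(ray_direction, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)
    denom = np.dot(plane_normal, ray_direction)
    if abs(denom) < eps:
        return None  # ray runs parallel to the laser plane
    t = np.dot(plane_normal, plane_point - ray_origin) / denom
    if t < 0:
        return None  # intersection is behind the camera centre
    return ray_origin + t * ray_direction
```

Applying this to every pixel that images the laser line yields one 3D surface profile per frame; sweeping the laser would accumulate the profiles into a surface point cloud.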

### Deep Adversarial Context-Aware Landmark Detection for Ultrasound Imaging

Real-time prostate gland localization in trans-rectal ultrasound images is required for automated ultrasound guided prostate biopsy procedures. We propose a new deep learning based approach aimed at localizing several prostate landmarks efficiently and robustly. Our multitask learning approach primarily makes the overall algorithm more contextually aware. In this approach, we not only consider the explicit learning of landmark locations, but also build in a mechanism to learn the contour of the prostate. This multitask learning is further coupled with an adversarial arm to promote the generation of feasible structures. We have trained this network using approximately 4000 labeled trans-rectal ultrasound images and tested on an independent set of images with ground truth landmark locations. We have achieved an overall Dice score of 92.6% for the adversarially trained multitask approach, which is significantly better than the Dice score of 88.3% obtained by only learning of landmark locations. The overall mean distance error using the adversarial multitask approach has also improved by 20% while reducing the standard deviation of the error compared to learning landmark locations only. In terms of computational complexity, both approaches can process the images in real time using a standard computer with a CUDA-enabled GPU.

Ahmet Tuysuzoglu, Jeremy Tan, Kareem Eissa, Atilla P. Kiraly, Mamadou Diallo, Ali Kamen
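
The Dice score reported in the abstract above is the standard overlap measure between two binary masks. A minimal sketch of its definition follows; the function name and the convention of returning 1.0 for two empty masks are assumptions made for illustration, and how the authors derive masks from landmarks is not described in the abstract.

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total
```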

### Towards a Fast and Safe LED-Based Photoacoustic Imaging Using Deep Convolutional Neural Network

The current standard photoacoustic (PA) technology is based on a heavy, expensive and hazardous laser system for excitation of a tissue sample. As an alternative, a light emitting diode (LED) offers a safe, compact and inexpensive light source. However, the PA images of an LED-based system suffer significantly from low signal-to-noise ratio due to limited LED power. With the aim of improving the quality of PA images, in this work we propose to use a deep convolutional neural network that is built upon a previous state-of-the-art image enhancement approach. The key contribution is to improve the optimization of the network by guiding its feature extraction at different layers of the architecture. In addition to using a high quality target image at the output of the network, multiple target images with intermediate qualities are employed at in-between layers of the architecture to guide the feature extraction. We perform an end-to-end training of the network using a set of 4,536 low quality PA images from 24 experiments. On the test set from 15 experiments, we achieve a mean peak signal-to-noise ratio of 34.5 dB and a mean structural similarity index of 0.86 with a gain in the frame rate of 6 times compared to the conventional approach.

Emran Mohammad Abu Anas, Haichong K. Zhang, Jin Kang, Emad M. Boctor
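
The peak signal-to-noise ratio quoted in the abstract above follows the usual definition, $$\text{PSNR} = 10 \log_{10}(\text{MAX}^2 / \text{MSE})$$. The sketch below implements that textbook formula; the assumption that images are scaled so the peak value is 1.0, and the function name, are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def peak_signal_to_noise_ratio(reference, estimate, max_value=1.0):
    """PSNR in dB between a reference image and an estimate.

    Assumption: both images share the same intensity scale, with
    max_value as the largest possible intensity.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value**2 / mse)
```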

### An Open Framework Enabling Electromagnetic Tracking in Image-Guided Interventions

Electromagnetic tracking (EMT) is a core platform technology in the navigation and visualisation of image-guided procedures. The technology provides high tracking accuracy in non-line-of-sight environments, allowing instrument navigation in locations where optical tracking is not feasible. Integration of EMT in complex procedures, often coupled with multi-modal imaging, is on the rise, yet the lack of flexibility in the available hardware platforms has been noted by many researchers and system designers. Advances in the field of EMT include novel methods of improving tracking system accuracy, precision and error compensation capabilities, though such system-level improvements cannot be readily incorporated in current therapy applications due to the ‘blackbox’ nature of commercial tracking solving algorithms. This paper defines a software framework to allow novel EMT designs and improvements to become part of the global design process for image-guided interventions. In an effort to standardise EMT development, we define a generalised cross-platform software framework in terms of the four system functions common to all EMT systems: acquisition, filtering, modelling and solving. The interfaces between each software component are defined in terms of their input and output data structures. An exemplary framework is implemented in the Python programming language and demonstrated with the open-source Anser EMT system. Performance metrics are gathered from both Matlab and Python implementations of Anser EMT considering the host operating system, hardware configuration and acquisition settings used. Results show indicative system latencies of 5 ms can be achieved using the framework on a Windows operating system, with decreased system performance observed on UNIX-like platforms.

Herman Alexander Jaeger, Stephen Hinds, Pádraig Cantillon-Murphy

### Colon Shape Estimation Method for Colonoscope Tracking Using Recurrent Neural Networks

We propose a method that uses a recurrent neural network (RNN) to estimate the shape of the colon as it is deformed by colonoscope insertion. A colonoscope tracking or navigation system that guides the physician to polyp positions is needed to reduce complications such as colon perforation. Previous tracking methods produced large tracking errors at the transverse and sigmoid colons because these areas deform substantially during colonoscope insertion. Colon deformation should be taken into account in tracking processes. We propose a colon deformation estimation method using an RNN and obtain the colonoscope shape from electromagnetic sensors during its insertion into the colon. This method obtains the position, direction, and insertion length from the colonoscope shape. From its shape, we also calculate the relative features that represent the positional and directional relationships between two points on a colonoscope. Long short-term memory is used to estimate the current colon shape from the past transition of the features of the colonoscope shape. We performed colon shape estimation in a phantom study and correctly estimated the colon shapes during colonoscope insertion with an estimation error of 12.39 mm.

Masahiro Oda, Holger R. Roth, Takayuki Kitasaka, Kasuhiro Furukawa, Ryoji Miyahara, Yoshiki Hirooka, Hidemi Goto, Nassir Navab, Kensaku Mori

### Towards Automatic Report Generation in Spine Radiology Using Weakly Supervised Framework

The objective of this work is to automatically generate unified reports of lumbar spinal MRIs in the field of radiology, i.e., given an MRI of a lumbar spine, directly generate a radiologist-level report to support clinical decision making. We show that this can be achieved via a weakly supervised framework that combines deep learning and symbolic program synthesis theory to overcome four inevitable tasks: semantic segmentation, radiological classification, positional labeling, and structural captioning. The weakly supervised framework uses object-level annotations, without requiring radiologist-level report annotations, to generate unified reports. Each generated report covers almost all types of lumbar structures, comprising six intervertebral discs, six neural foramina, and five lumbar vertebrae. Each report contains the exact locations and pathological correlations of these lumbar structures as well as their normalities in terms of three types of relevant spinal diseases: intervertebral disc degeneration, neural foraminal stenosis, and lumbar vertebrae deformities. This framework is applied to a large corpus of T1/T2-weighted sagittal MRIs of 253 subjects acquired from multiple vendors. Extensive experiments demonstrate that the framework is able to generate unified radiological reports, which reveals its effectiveness and potential as a clinical tool to relieve spinal radiologists from laborious workloads to a certain extent, which in turn saves time and expedites the initiation of many specific therapies.

Zhongyi Han, Benzheng Wei, Stephanie Leung, Jonathan Chung, Shuo Li

### A Natural Language Interface for Dissemination of Reproducible Biomedical Data Science

Computational tools in the form of software packages are burgeoning in the field of medical imaging and biomedical research. These tools enable biomedical researchers to analyze a variety of data using modern machine learning and statistical analysis techniques. While these publicly available software packages are a great step towards a multiplicative increase in biomedical research productivity, there are still many open issues related to validation and reproducibility of the results. A key gap is that while scientists can validate domain insights that are implicit in the analysis, the analysis itself is coded in a programming language and the domain scientist may not be a programmer. Thus, there is no or only limited direct validation of the program that carries out the desired analysis. We propose a novel solution, building upon recent successes in natural language understanding, to address this problem. Our platform allows researchers to perform, share, reproduce and interpret the analysis pipelines and results via natural language. While this approach still requires users to have a conceptual understanding of the techniques, it removes the burden of programming syntax and thus lowers the barriers to advanced and reproducible neuroimaging and biomedical research.

Rogers Jeffrey Leo John, Jignesh M. Patel, Andrew L. Alexander, Vikas Singh, Nagesh Adluru

### Spatiotemporal Manifold Prediction Model for Anterior Vertebral Body Growth Modulation Surgery in Idiopathic Scoliosis

Anterior Vertebral Body Growth Modulation (AVBGM) is a minimally invasive surgical technique that gradually corrects spine deformities while preserving lumbar motion. However, the selection of potential surgical patients is currently based on clinical judgment and would be facilitated by the identification of patients responding to AVBGM prior to surgery. We introduce a statistical framework for predicting the surgical outcomes following AVBGM in adolescents with idiopathic scoliosis. A discriminant manifold is first constructed to maximize the separation between responsive and non-responsive groups of patients treated with AVBGM for scoliosis. The model then uses subject-specific correction trajectories based on articulated transformations in order to map spine correction profiles to a group-average piecewise-geodesic path. Spine correction trajectories are described in a piecewise-geodesic fashion to account for varying times at follow-up exams, regressing the curve via a quadratic optimization process. To predict the evolution of correction, a baseline reconstruction is projected onto the manifold, from which a spatiotemporal regression model is built from parallel transport curves inferred from neighboring exemplars. The model was trained on 438 reconstructions and tested on 56 subjects using 3D spine reconstructions from follow-up exams, with the probabilistic framework yielding accurate results with differences of $$2.1\pm 0.6^{\circ }$$ in main curve angulation, and generating models similar to biomechanical simulations.

William Mandel, Olivier Turcot, Dejan Knez, Stefan Parent, Samuel Kadoury

### Evaluating Surgical Skills from Kinematic Data Using Convolutional Neural Networks

The need for automatic surgical skills assessment is increasing, especially because manual feedback from senior surgeons observing junior surgeons is prone to subjectivity and is time consuming. Thus, automating surgical skills evaluation is a very important step towards improving surgical practice. In this paper, we designed a Convolutional Neural Network (CNN) to evaluate surgeon skills by extracting patterns in the surgeon motions performed in robotic surgery. The proposed method is validated on the JIGSAWS dataset and achieved very competitive results with 100% accuracy on the suturing and needle passing tasks. While we leveraged the CNN's efficiency, we also managed to mitigate its black-box effect using class activation maps. This feature allows our method to automatically highlight which parts of the surgical task influenced the skill prediction and can be used to explain the classification and to provide personalized feedback to the trainee.

Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller
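
To make the class activation map idea concrete, here is a minimal sketch for a one-dimensional (time series) input. It assumes the common formulation in which the last convolutional layer is followed by global average pooling and a linear classifier, so the map for a class is the weighted sum of the feature maps. The function name, the toy feature maps, and the weights are all hypothetical; the abstract does not describe the authors' network at this level of detail.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each of length T) for one class.

    feature_maps: list of K lists, activations of the last conv layer over time.
    class_weights: list of K floats, weights linking each pooled map to the class.
    Both inputs are illustrative assumptions, not values from the paper.
    """
    n_steps = len(feature_maps[0])
    cam = [0.0] * n_steps
    for weight, fmap in zip(class_weights, feature_maps):
        for t, activation in enumerate(fmap):
            cam[t] += weight * activation
    return cam


# Toy example: two feature maps over three time steps.
toy_maps = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0]]
toy_weights = [0.5, -1.0]
toy_cam = class_activation_map(toy_maps, toy_weights)
print(toy_cam)
```

Time steps with a large positive value are the ones that pushed the prediction towards the class, which is the kind of information that could be turned into feedback for a trainee.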

### Needle Tip Force Estimation Using an OCT Fiber and a Fused convGRU-CNN Architecture

Needle insertion is common during minimally invasive interventions such as biopsy or brachytherapy. During soft tissue needle insertion, forces acting at the needle tip cause tissue deformation and needle deflection. Accurate needle tip force measurement provides information on needle-tissue interaction and helps detect and compensate for potential misplacement. For this purpose, we introduce an image-based needle tip force estimation method using an optical fiber imaging the deformation of an epoxy layer below the needle tip over time. For calibration and force estimation, we introduce a novel deep learning-based fused convolutional GRU-CNN model which effectively exploits the spatio-temporal data structure. The needle is easy to manufacture and our model achieves a mean absolute error of $$1.76\,\pm \,1.5$$ mN with a cross-correlation coefficient of 0.9996, clearly outperforming other methods. We test needles with different materials to demonstrate that the approach can be adapted for different sensitivities and force ranges. Furthermore, we validate our approach in an ex-vivo prostate needle insertion scenario.

Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer

### Fast GPU Computation of 3D Isothermal Volumes in the Vicinity of Major Blood Vessels for Multiprobe Cryoablation Simulation

Percutaneous cryoablation is a minimally invasive procedure of hypothermia for the treatment of tumors. Several needles are inserted in the tumor through the skin, to create an iceball and kill the malignant cells. The procedure consists of several cycles alternating extreme freezing and thawing. This procedure is very complex to plan, as the iceball is formed from multiple needles and influenced by major blood vessels nearby, making its final shape very difficult to anticipate. For computer assistance to cryoablation planning, it is essential to accurately predict the final volume of necrosis. In this paper, a fast GPU implementation of 3D thermal propagation is presented based on the heat transfer equation. Our approach accounts for the presence of major blood vessels in the vicinity of the iceball. The method is validated first in gel conditions, then on an actual retrospective patient case of renal cryoablation with complex vascular structure close to the tumor. The results show that the accuracy of our simulated iceball can help surgeons in surgical planning.

Ehsan Golkar, Pramod P. Rao, Leo Joskowicz, Afshin Gangi, Caroline Essert
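
As a rough illustration of what one time step of thermal propagation looks like numerically, the sketch below applies an explicit finite-difference update of the plain heat (diffusion) equation on a one-dimensional grid. This is a deliberate simplification and an assumption on our part: the paper works in 3D on the GPU and accounts for blood vessels, none of which is modelled here, and the function name and parameters are hypothetical.

```python
def diffuse_step(temperature, diffusivity, dx, dt):
    """One explicit finite-difference step of u_t = diffusivity * u_xx in 1D.

    The two end points are held fixed (Dirichlet boundaries).
    The scheme is only stable when r = diffusivity * dt / dx**2 <= 0.5.
    """
    r = diffusivity * dt / (dx * dx)
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for r > 0.5 in 1D")
    updated = list(temperature)
    for i in range(1, len(temperature) - 1):
        laplacian = temperature[i - 1] - 2.0 * temperature[i] + temperature[i + 1]
        updated[i] = temperature[i] + r * laplacian
    return updated


# Toy example: a single warm cell spreads to its neighbours.
field = [0.0, 0.0, 1.0, 0.0, 0.0]
field = diffuse_step(field, diffusivity=1.0, dx=1.0, dt=0.25)
print(field)
```

Each interior cell depends only on its immediate neighbours from the previous step, which is why this kind of update maps naturally onto one GPU thread per voxel.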

### A Machine Learning Approach to Predict Instrument Bending in Stereotactic Neurosurgery

The accurate implantation of stereo-electroencephalography (SEEG) electrodes is crucial for localising the seizure onset zone in patients with refractory epilepsy. Electrode placement may differ from planning due to instrument deflection during surgical insertion. We present a regression-based model to predict instrument bending using image features extracted from structural and diffusion images. We compare three machine learning approaches, Random Forest (RF), Feed-Forward Neural Network (FFNN) and Long Short-Term Memory (LSTM), on accuracy in predicting global instrument bending in the context of SEEG implantation. We segment electrodes from post-implantation CT scans and interpolate position at 1 mm intervals along the trajectory. Electrodes are modelled as elastic rods to quantify 3 degree-of-freedom (DOF) bending using Darboux vectors. We train our models to predict instrument bending from image features. We then iteratively infer instrument positions from the predicted bending. In 32 SEEG post-implantation cases we were able to predict trajectory position with an MAE of 0.49 mm using RF. By comparison, an FFNN had an MAE of 0.71 mm and an LSTM had an MAE of 0.93 mm.

Alejandro Granados, Matteo Mancini, Sjoerd B. Vos, Oeslle Lucena, Vejay Vakharia, Roman Rodionov, Anna Miserocchi, Andrew W. McEvoy, John S. Duncan, Rachel Sparks, Sébastien Ourselin

### Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification

Recognition of surgical gestures is crucial for surgical skill assessment and efficient surgery training. Prior works on this task are based on either variant graphical models such as HMMs and CRFs, or deep learning models such as Recurrent Neural Networks and Temporal Convolutional Networks. Most current approaches suffer from over-segmentation and therefore low segment-level edit scores. In contrast, we present an essentially different methodology by modeling the task as a sequential decision-making process. An intelligent agent is trained using reinforcement learning with hierarchical features from a deep model. Temporal consistency is integrated into our action design and reward mechanism to reduce over-segmentation errors. Experiments on the JIGSAWS dataset demonstrate that the proposed method performs better than state-of-the-art methods in terms of the edit score and on par in frame-wise accuracy. Our code will be released later.

Daochang Liu, Tingting Jiang
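
The link between over-segmentation and a low edit score can be illustrated with a small sketch. It assumes a definition commonly used in the gesture segmentation literature: frame labels are collapsed into a sequence of segments, and the score is one minus the Levenshtein distance between predicted and reference segment sequences, normalized by the longer sequence and scaled to 100. The exact normalization used in the paper is not given in the abstract, so treat the formula and the function names as assumptions.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse consecutive identical frame labels into one segment label each."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def edit_score(predicted_frames, reference_frames):
    """Segment-level edit score in [0, 100]; assumed normalization, see lead-in."""
    pred, ref = to_segments(predicted_frames), to_segments(reference_frames)
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 100.0
    return (1.0 - levenshtein(pred, ref) / longest) * 100.0


def frame_accuracy(predicted_frames, reference_frames):
    """Fraction of frames whose predicted label matches the reference."""
    hits = sum(p == r for p, r in zip(predicted_frames, reference_frames))
    return hits / len(reference_frames)


reference = ["A", "A", "A", "B", "B", "B"]
predicted = ["A", "B", "A", "B", "B", "B"]  # one wrong frame splits segment A
print(frame_accuracy(predicted, reference), edit_score(predicted, reference))
```

In this toy case a single mislabeled frame leaves frame-wise accuracy at 5/6 but halves the edit score, because the prediction now contains four segments instead of two.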

### Automated Performance Assessment in Transoesophageal Echocardiography with Convolutional Neural Networks

Transoesophageal echocardiography (TEE) is a valuable diagnostic and monitoring imaging modality. Proper image acquisition is essential for diagnosis, yet current assessment techniques are solely based on manual expert review. This paper presents a supervised deep learning framework for automatically evaluating and grading the quality of TEE images. To obtain the necessary dataset, 38 participants of varied experience performed TEE exams with a high-fidelity virtual reality (VR) platform. Two Convolutional Neural Network (CNN) architectures, AlexNet and VGG, structured to perform regression, were finetuned and validated on manually graded images from three evaluators. Two different scoring strategies, a criteria-based percentage and an overall general impression, were used. The developed CNN models estimate the average score with a root mean square accuracy ranging from 84% to 93%, indicating the ability to replicate expert evaluation. Proposed strategies for automated TEE assessment can have a significant impact on the training process of new TEE operators, providing direct feedback and facilitating the development of the necessary dexterous skills.

Evangelos B. Mazomenos, Kamakshi Bansal, Bruce Martin, Andrew Smith, Susan Wright, Danail Stoyanov

### DeepPhase: Surgical Phase Recognition in CATARACTS Videos

Automated surgical workflow analysis and understanding can assist surgeons to standardize procedures and enhance post-surgical assessment and indexing, as well as interventional monitoring. Computer-assisted interventional (CAI) systems based on video can perform workflow estimation through surgical instruments’ recognition while linking them to an ontology of procedural phases. In this work, we adopt a deep learning paradigm to detect surgical instruments in cataract surgery videos, which in turn feed a surgical phase inference recurrent network that encodes temporal aspects of phase steps within the phase classification. Our models present results comparable to the state of the art for surgical tool detection and phase recognition, with accuracies of 99% and 78% respectively.

Odysseas Zisimopoulos, Evangello Flouty, Imanol Luengo, Petros Giataganas, Jean Nehme, Andre Chow, Danail Stoyanov

### Surgical Activity Recognition in Robot-Assisted Radical Prostatectomy Using Deep Learning

Adverse surgical outcomes are costly to patients and hospitals. Approaches to benchmark surgical care are often limited to gross measures across the entire procedure despite the performance of particular tasks being largely responsible for undesirable outcomes. In order to produce metrics from tasks as opposed to the whole procedure, methods to automatically recognize individual surgical tasks are needed. In this paper, we propose several approaches to recognize surgical activities in robot-assisted minimally invasive surgery using deep learning. We collected a clinical dataset of 100 robot-assisted radical prostatectomies (RARP) with 12 tasks each and propose ‘RP-Net’, a modified version of the InceptionV3 model, for image-based surgical activity recognition. We achieve an average precision of 80.9% and average recall of 76.7% across all tasks using RP-Net, which outperforms all other RNN and CNN based models explored in this paper. Our results suggest that automatic surgical activity recognition during RARP is feasible and can be the foundation for advanced analytics.

Aneeq Zia, Andrew Hung, Irfan Essa, Anthony Jarc

### Unsupervised Learning for Surgical Motion by Learning to Predict the Future

We show that it is possible to learn meaningful representations of surgical motion, without supervision, by learning to predict the future. An architecture that combines an RNN encoder-decoder and mixture density networks (MDNs) is developed to model the conditional distribution over future motion given past motion. We show that the learned encodings naturally cluster according to high-level activities, and we demonstrate the usefulness of these learned encodings in the context of information retrieval, where a database of surgical motion is searched for suturing activity using a motion-based query. Future prediction with MDNs is found to significantly outperform simpler baselines as well as the best previously-published result for this task, advancing state-of-the-art performance from an F1 score of $$0.60 \pm 0.14$$ to $$0.77 \pm 0.05$$.

Robert DiPietro, Gregory D. Hager
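
A mixture density network outputs the parameters of a mixture distribution rather than a single point prediction, and it is typically trained by minimizing the negative log-likelihood of the observed target under that mixture. The sketch below shows this loss for a one-dimensional Gaussian mixture. The choice of Gaussian components, the scalar target, and all names are assumptions for illustration; the abstract does not specify these details.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_nll(target, weights, means, stds):
    """Negative log-likelihood of `target` under a 1D Gaussian mixture.

    weights are the mixing coefficients and are expected to sum to 1.
    In an MDN these three lists would be produced by the network.
    """
    likelihood = sum(w * gaussian_pdf(target, m, s)
                     for w, m, s in zip(weights, means, stds))
    return -math.log(likelihood)


# Toy example: a bimodal prediction about the next motion value.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [0.2, 0.2]
print(mixture_nll(1.0, weights, means, stds))  # near a mode: low loss
print(mixture_nll(0.0, weights, means, stds))  # between modes: high loss
```

The point of the mixture is visible in the toy example: a target between the two modes is penalized heavily, whereas a single Gaussian fitted to the same data would place its mean exactly there.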

### Volumetric Clipping Surface: Un-occluded Visualization of Structures Preserving Depth Cues into Surrounding Organs

Anatomies of interest are often hidden within data. In this paper, we address the limitations of visualizing them with a novel dynamic non-planar clipping of volumetric data, while preserving depth cues at adjacent structures to provide a visually consistent anatomical context, with no user interaction. An un-occluded and un-modified display of the anatomies of interest is made possible. Given a semantic segmentation of the data, our technique computes a continuous clipping surface through the depth buffer of the structures of interest and extrapolates this depth onto surrounding contextual regions in real-time. We illustrate the benefit of this technique using Monte Carlo Ray Tracing (MCRT), in the visualization of deep seated anatomies with complex geometry across two modalities: (a) knee cartilage from MRI and (b) bones of the feet in CT. Our novel technique furthers the state of the art by enabling turnkey immediate appreciation of the pathologies in these structures with an unmodified rendering, while still providing a consistent anatomical context. We envisage our technique changing the way clinical applications present 3D data, by incorporating organ viewing presets, similar to transfer function presets for volume visualization.

### Closing the Calibration Loop: An Inside-Out-Tracking Paradigm for Augmented Reality in Orthopedic Surgery

In percutaneous orthopedic interventions the surgeon attempts to reduce and fixate fractures in bony structures. The complexity of these interventions arises when the surgeon performs the challenging task of navigating surgical tools percutaneously only under the guidance of 2D interventional X-ray imaging. Moreover, the intra-operatively acquired data is only visualized indirectly on external displays. In this work, we propose a flexible Augmented Reality (AR) paradigm using optical see-through head mounted displays. The key technical contribution of this work includes the marker-less and dynamic tracking concept which closes the calibration loop between patient, C-arm and the surgeon. This calibration is enabled using Simultaneous Localization and Mapping of the environment, i.e. the operating theater. In return, the proposed solution provides in situ visualization of pre- and intra-operative 3D medical data directly at the surgical site. We demonstrate pre-clinical evaluation of a prototype system, and report errors for calibration and target registration. Finally, we demonstrate the usefulness of the proposed inside-out tracking system in achieving “bull’s eye” view for C-arm-guided punctures. This AR solution provides an intuitive visualization of the anatomy and can simplify the hand-eye coordination for the orthopedic surgeon.

Jonas Hajek, Mathias Unberath, Javad Fotouhi, Bastian Bier, Sing Chun Lee, Greg Osgood, Andreas Maier, Mehran Armand, Nassir Navab

### Higher Order of Motion Magnification for Vessel Localisation in Surgical Video

Locating vessels during surgery is critical for avoiding inadvertent damage, yet vasculature can be difficult to identify. Video motion magnification can potentially highlight vessels by exaggerating subtle motion embedded within the video to become perceivable to the surgeon. In this paper, we explore a physiological model of artery distension to extend motion magnification to incorporate higher orders of motion, leveraging the difference in acceleration over time (jerk) in pulsatile motion to highlight the vascular pulse wave. Our method is compared to first and second order motion based Eulerian video magnification algorithms. Using data from a surgical video retrieved during a robotic prostatectomy, we show that our method can accentuate cardio-physiological features and produce a more succinct and clearer video for motion magnification, with more similarities in areas without motion to the source video at large magnifications. We validate the approach with a Structural Similarity (SSIM) and Peak Signal to Noise Ratio (PSNR) assessment of three videos at an increasing working distance, using three different levels of optical magnification. Spatio-temporal cross sections are presented to show the effectiveness of our proposal, and video samples are provided to demonstrate our results qualitatively.

Mirek Janatka, Ashwin Sridhar, John Kelly, Danail Stoyanov
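
Jerk is the third time derivative of position, i.e. the rate of change of acceleration. As a minimal sketch of what "higher order of motion" means numerically, the code below estimates jerk from a sampled signal by applying a forward finite difference three times. This is only an illustration of the quantity itself: the paper operates on video within an Eulerian magnification framework, and the plain finite-difference approach and function names here are assumptions.

```python
def finite_difference(signal, dt):
    """Forward difference of a sampled signal; output is one sample shorter."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def jerk(signal, dt):
    """Third-order temporal derivative via three successive differences."""
    velocity = finite_difference(signal, dt)
    acceleration = finite_difference(velocity, dt)
    return finite_difference(acceleration, dt)


# Toy example: x(t) = t**3 has a constant third derivative of 6.
samples = [float(t ** 3) for t in range(6)]
print(jerk(samples, dt=1.0))
```

Slow, smooth motion (for example from respiration or camera drift) has a small third derivative, while the sharp upstroke of a pulse wave has a large one, which is the intuition behind using jerk to pick out vessels.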

### Simultaneous Surgical Visibility Assessment, Restoration, and Augmented Stereo Surface Reconstruction for Robotic Prostatectomy

Endoscopic vision plays a significant role in minimally invasive surgical procedures. The maintenance and augmentation of such direct in-situ vision is paramount not only for safety by preventing inadvertent injury, but also to improve precision and reduce operating time. This work aims to quantitatively and objectively evaluate endoscopic visualization on surgical videos without employing any reference images, and simultaneously to restore such degenerated visualization and improve the performance of surgical 3-D reconstruction. An objective no-reference color image quality measure is defined in terms of sharpness, naturalness, and contrast. A retinex-driven fusion framework was proposed not only to recover the deteriorated visibility but also to augment the surface reconstruction. The approaches of surgical visibility assessment, restoration, and reconstruction were validated on clinical data. The experimental results demonstrate that the average visibility was significantly enhanced from 0.66 to 1.27. Moreover, the average density ratio of surgical 3-D reconstruction was improved from 94.8% to 99.6%.

Xiongbiao Luo, Ying Wan, Hui-Qing Zeng, Yingying Guo, Henry Chidozie Ewurum, Xiao-Bin Zhang, A. Jonathan McLeod, Terry M. Peters

### Real-Time Augmented Reality for Ear Surgery

Transtympanic procedures aim at accessing the middle ear structures through a puncture in the tympanic membrane. They require visualization of middle ear structures behind the eardrum. Up to now, this is provided by an oto endoscope. This work focused on implementing a real-time augmented reality based system for robotic-assisted transtympanic surgery. A preoperative computed tomography scan is combined with the surgical video of the tympanic membrane in order to visualize the ossicles and labyrinthine windows, which are concealed behind the opaque tympanic membrane. The study was conducted on 5 artificial and 4 cadaveric temporal bones. Initially, a homography framework based on fiducials (6 stainless steel markers on the periphery of the tympanic membrane) was used to register a 3D reconstructed computed tomography image to the video images. Micro/endoscope movements were then tracked using Speeded-Up Robust Features. Simultaneously, a micro-surgical instrument (needle) in the frame was identified and tracked using a Kalman filter. Its 3D pose was also computed using a 3-collinear-point framework. An average initial registration accuracy of 0.21 mm was achieved with a slow propagation error during the 2-minute tracking. Similarly, a mean surgical instrument tip 3D pose estimation error of 0.33 mm was observed. This system is a crucial first step towards a keyhole surgical approach to the middle and inner ears.

Raabid Hussain, Alain Lalande, Roberto Marroquin, Kibrom Berihu Girum, Caroline Guigou, Alexis Bozorg Grayeli
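
For readers unfamiliar with Kalman filtering, the following sketch shows the predict and update cycle in its simplest scalar form, tracking one coordinate of a slowly moving point from noisy detections. It assumes a random-walk motion model and hand-picked noise variances; the state model, dimensionality, and parameters used by the authors for needle tracking are not given in the abstract, and every name below is hypothetical.

```python
def kalman_step(estimate, variance, measurement, process_var, measurement_var):
    """One predict-update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the state is assumed unchanged, uncertainty grows.
    predicted_estimate = estimate
    predicted_variance = variance + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = predicted_variance / (predicted_variance + measurement_var)
    new_estimate = predicted_estimate + gain * (measurement - predicted_estimate)
    new_variance = (1.0 - gain) * predicted_variance
    return new_estimate, new_variance


def track(measurements, initial_estimate, initial_variance,
          process_var, measurement_var):
    """Filter a sequence of noisy measurements and return all estimates."""
    estimate, variance = initial_estimate, initial_variance
    estimates = []
    for z in measurements:
        estimate, variance = kalman_step(estimate, variance, z,
                                         process_var, measurement_var)
        estimates.append(estimate)
    return estimates


# Toy example: noisy detections of a tip coordinate that is actually near 2.0.
detections = [2.3, 1.8, 2.1, 1.9, 2.2]
print(track(detections, initial_estimate=0.0, initial_variance=1.0,
            process_var=0.01, measurement_var=0.25))
```

A practical tracker would usually carry position and velocity in 2D or 3D with matrix-valued covariances, but the blend between prediction and measurement works the same way.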

### Framework for Fusion of Data- and Model-Based Approaches for Ultrasound Simulation

Navigation, acquisition and interpretation of ultrasound (US) images rely on the skills and expertise of the performing physician. Virtual-reality based simulations offer a safe, flexible and standardized environment to train these skills. Simulations can be data-based by displaying a-priori acquired US volumes, or ray-tracing based by simulating the complex US interactions of a geometric model. Here we combine these two approaches as it is relatively easy to gather US images of normal background anatomy and attractive to cover the range of rare findings or particular clinical tasks with known ground truth geometric models. For seamless adaptation and change of US content we further require stitching, texture synthesis and tissue deformation simulations. We test the proposed hybrid simulation method by replacing embryos within gestational sacs by ray-traced embryos, and by simulating an ectopic pregnancy.

Christine Tanner, Rastislav Starkov, Michael Bajka, Orcun Goksel

### Esophageal Gross Tumor Volume Segmentation Using a 3D Convolutional Neural Network

Accurate gross tumor volume (GTV) segmentation in esophagus CT images is a critical task in computer aided diagnosis (CAD) systems. However, because of the difficulties raised by the contrast similarity between esophageal GTV and its neighboring tissues in CT scans, this problem has been addressed weakly. In this paper, we present a 3D end-to-end method based on a convolutional neural network (CNN) for this purpose. We leverage design elements from DenseNet in a typical U-shape. The proposed architecture consists of a contracting path and an expanding path that include dense blocks for extracting contextual features and retrieving the lost resolution, respectively. Using dense blocks leads to deep supervision, feature re-usability, and parameter reduction while aiding the network to be more accurate. The proposed architecture was trained and tested on a dataset containing 553 scans from 49 distinct patients. The proposed network achieved a Dice value of $$0.73\pm 0.20$$, and a 95% mean surface distance of $$3.07 \pm 1.86$$ mm for 85 test scans. The experimental results indicate the effectiveness of the proposed method for clinical diagnosis and treatment systems.

Sahar Yousefi, Hessam Sokooti, Mohamed S. Elmahdy, Femke P. Peters, Mohammad T. Manzuri Shalmani, Roel T. Zinkstok, Marius Staring
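
The Dice value reported above measures the overlap between a predicted mask and a reference mask as twice the size of their intersection divided by the sum of their sizes. A minimal sketch on flattened binary masks follows; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap of two equally long binary masks given as 0/1 sequences."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * r for p, r in zip(predicted, reference))
    total = sum(predicted) + sum(reference)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy example: each mask marks two voxels, one of them shared.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
print(dice_coefficient(predicted_mask, reference_mask))  # 0.5
```

Because the denominator counts foreground voxels only, Dice is insensitive to the large background typical of CT volumes, which is one reason it is the usual headline metric for tumor segmentation.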

### Deep Learning Based Instance Segmentation in 3D Biomedical Images Using Weak Annotation

Instance segmentation in 3D images is a fundamental task in biomedical image analysis. While deep learning models often work well for 2D instance segmentation, 3D instance segmentation still faces critical challenges, such as insufficient training data due to various annotation difficulties in 3D biomedical images. Common 3D annotation methods (e.g., full voxel annotation) incur high workloads and costs for labeling enough instances for training deep learning 3D instance segmentation models. In this paper, we propose a new weak annotation approach for training a fast deep learning 3D instance segmentation model without using full voxel mask annotation. Our approach needs only 3D bounding boxes for all instances and full voxel annotation for a small fraction of the instances, and uses a novel two-stage 3D instance segmentation model utilizing these two kinds of annotation, respectively. We evaluate our approach on several biomedical image datasets, and the experimental results show that (1) with fully annotated boxes and a small number of masks, our approach can achieve performance similar to that of the best known methods using full annotation, and (2) with similar annotation time, our approach outperforms the best known methods that use full annotation.

Zhuo Zhao, Lin Yang, Hao Zheng, Ian H. Guldner, Siyuan Zhang, Danny Z. Chen

### Learn the New, Keep the Old: Extending Pretrained Models with New Anatomy and Images

Deep learning has been widely accepted as a promising solution for medical image segmentation, given a sufficiently large representative dataset of images with corresponding annotations. With ever increasing amounts of annotated medical datasets, it is infeasible to always train a learning method from scratch with all data. This is also doomed to hit computational limits, e.g., memory or runtime feasible for training. Incremental learning can be a potential solution, where new information (images or anatomy) is introduced iteratively. Nevertheless, for the preservation of the collective information, it is essential to keep some “important” (i.e., representative) images and annotations from the past, while adding new information. In this paper, we introduce a framework for applying incremental learning for segmentation and propose novel methods for selecting representative data therein. We comparatively evaluate our methods in different scenarios using MR images and validate the increased learning capacity when using our methods.

Firat Ozdemir, Philipp Fuernstahl, Orcun Goksel

### ASDNet: Attention Based Semi-supervised Deep Networks for Medical Image Segmentation

Segmentation is a key step for various medical image analysis tasks. Recently, deep neural networks have provided promising solutions for automatic image segmentation. The network training usually involves a large amount of training data with corresponding ground truth label maps. However, it is very challenging to obtain the ground-truth label maps because doing so requires expert knowledge and intensive labor. To address such challenges, we propose a novel semi-supervised deep learning framework, called “Attention based Semi-supervised Deep Networks” (ASDNet), to fulfill the segmentation tasks in an end-to-end fashion. Specifically, we propose a fully convolutional confidence network to adversarially train the segmentation network. Based on the confidence map from the confidence network, we then propose a region-attention based semi-supervised learning strategy to include the unlabeled data for training. In addition, a sample attention mechanism is explored to improve the network training. Experimental results on real clinical datasets show that our ASDNet can achieve state-of-the-art segmentation accuracy. Further analysis also indicates that our proposed network components contribute most to the improvement of performance.

Dong Nie, Yaozong Gao, Li Wang, Dinggang Shen

### MS-Net: Mixed-Supervision Fully-Convolutional Networks for Full-Resolution Segmentation

For image segmentation, typical fully convolutional networks (FCNs) need strong supervision through a large sample of high-quality dense segmentations, entailing high costs in expert-raters’ time and effort. We propose MS-Net, a new FCN to significantly reduce supervision cost, and improve performance, by coupling strong supervision with weak supervision through low-cost input in the form of bounding boxes and landmarks. Our MS-Net enables instance-level segmentation at high spatial resolution, with feature extraction using dilated convolutions. We propose a new loss function using bootstrapped Dice overlap for precise segmentation. Results on large datasets show that MS-Net segments more accurately at reduced supervision costs, compared to the state of the art.

Meet P. Shah, S. N. Merchant, Suyash P. Awate

### How to Exploit Weaknesses in Biomedical Challenge Design and Organization

Since the first MICCAI grand challenge organized in 2007 in Brisbane, challenges have become an integral part of MICCAI conferences. In the meantime, challenge datasets have become widely recognized as international benchmarking datasets and thus have a great influence on the research community and individual careers. In this paper, we show several ways in which weaknesses related to current challenge design and organization can potentially be exploited. Our experimental analysis, based on MICCAI segmentation challenges organized in 2015, demonstrates that both challenge organizers and participants can potentially undertake measures to substantially tune rankings. To overcome these problems we present best practice recommendations for improving challenge design and organization.

Annika Reinke, Matthias Eisenmann, Sinan Onogur, Marko Stankovic, Patrick Scholz, Peter M. Full, Hrvoje Bogunovic, Bennett A. Landman, Oskar Maier, Bjoern Menze, Gregory C. Sharp, Korsuk Sirinukunwattana, Stefanie Speidel, Fons van der Sommen, Guoyan Zheng, Henning Müller, Michal Kozubek, Tal Arbel, Andrew P. Bradley, Pierre Jannin, Annette Kopp-Schneider, Lena Maier-Hein

### Accurate Weakly-Supervised Deep Lesion Segmentation Using Large-Scale Clinical Annotations: Slice-Propagated 3D Mask Generation from 2D RECIST

Volumetric lesion segmentation from computed tomography (CT) images is a powerful means to precisely assess multiple time-point lesion/tumor changes. However, because manual 3D segmentation is prohibitively time consuming, current practices rely on an imprecise surrogate called response evaluation criteria in solid tumors (RECIST). Despite their coarseness, RECIST markers are commonly found in current hospital picture archiving and communication systems (PACS), meaning they can provide a potentially powerful, yet extraordinarily challenging, source of weak supervision for full 3D segmentation. Toward this end, we introduce a convolutional neural network (CNN) based weakly supervised slice-propagated segmentation (WSSS) method to (1) generate the initial lesion segmentation on the axial RECIST-slice; (2) learn the data distribution on RECIST-slices; (3) extrapolate to segment the whole lesion slice by slice to finally obtain a volumetric segmentation. To validate the proposed method, we first test its performance on a fully annotated lymph node dataset, where WSSS performs comparably to its fully supervised counterparts. We then test on a comprehensive lesion dataset with 32,735 RECIST marks, where we report a mean Dice score of 92% on RECIST-marked slices and 76% on the entire 3D volumes.

Jinzheng Cai, Youbao Tang, Le Lu, Adam P. Harrison, Ke Yan, Jing Xiao, Lin Yang, Ronald M. Summers

### Semi-automatic RECIST Labeling on CT Scans with Cascaded Convolutional Neural Networks

Response evaluation criteria in solid tumors (RECIST) is the standard measurement for tumor extent to evaluate treatment responses in cancer patients. As such, RECIST annotations must be accurate. However, RECIST annotations manually labeled by radiologists require professional knowledge and are time-consuming, subjective, and prone to inconsistency among different observers. To alleviate these problems, we propose a cascaded convolutional neural network based method to semi-automatically label RECIST annotations and drastically reduce annotation time. The proposed method consists of two stages: lesion region normalization and RECIST estimation. We employ the spatial transformer network (STN) for lesion region normalization, where a localization network is designed to predict the lesion region and the transformation parameters with a multi-task learning strategy. For RECIST estimation, we adapt the stacked hourglass network (SHN), introducing a relationship constraint loss to improve the estimation precision. STN and SHN can both be learned in an end-to-end fashion. We train our system on the DeepLesion dataset, obtaining a consensus model trained on RECIST annotations performed by multiple radiologists over a multi-year period. Importantly, when judged against the inter-reader variability of two additional radiologist raters, our system performs more stably and with less variability, suggesting that RECIST annotations can be reliably obtained with reduced labor and time.

### A Multi-scale Pyramid of 3D Fully Convolutional Networks for Abdominal Multi-organ Segmentation

Recent advances in deep learning, like 3D fully convolutional networks (FCNs), have improved the state-of-the-art in dense semantic segmentation of medical images. However, most network architectures require severely downsampling or cropping the images to meet the memory limitations of today’s GPU cards while still considering enough context in the images for accurate segmentation. In this work, we propose a novel approach that utilizes auto-context to perform semantic segmentation at higher resolutions in a multi-scale pyramid of stacked 3D FCNs. We train and validate our models on a dataset of manually annotated abdominal organs and vessels from 377 clinical CT images used in gastric surgery, and achieve promising results with close to 90% Dice score on average. For additional evaluation, we perform separate testing on datasets from different sources and achieve competitive results, illustrating the robustness of the model and approach.

Holger R. Roth, Chen Shen, Hirohisa Oda, Takaaki Sugino, Masahiro Oda, Yuichiro Hayashi, Kazunari Misawa, Kensaku Mori

### 3D U-JAPA-Net: Mixture of Convolutional Networks for Abdominal Multi-organ CT Segmentation

This paper introduces a new type of deep learning scheme for fully-automated abdominal multi-organ CT segmentation using transfer learning. A convolutional neural network with the 3D U-net architecture is a strong tool for volumetric image segmentation. The drawback of 3D U-net is that its judgement is based only on local volumetric data, which leads to errors in categorization. To overcome this problem, we propose 3D U-JAPA-net, which uses not only the raw CT data but also a probabilistic atlas of organs to reflect information on organ locations. In the first phase of training, a 3D U-net is trained with the conventional method. In the second phase, expert 3D U-nets for each organ are trained intensively around the locations of the organs, with initial weights transferred from the 3D U-net obtained in the first phase. Segmentation in the proposed method consists of three phases. First, rough locations of organs are estimated with the probabilistic atlas. Second, the trained expert 3D U-nets are applied in the focused locations. Finally, post-processing is applied to remove debris. We test the proposed method on 47 CT data sets, and it achieves higher Dice scores than the conventional 2D U-net and 3D U-net. A positive effect of transfer learning is also confirmed by comparing the proposed method with a variant without transfer learning.

Hideki Kakeya, Toshiyuki Okada, Yukio Oshiro

### Training Multi-organ Segmentation Networks with Sample Selection by Relaxed Upper Confident Bound

Convolutional neural networks (CNNs), especially fully convolutional networks, have been widely applied to automatic medical image segmentation problems, e.g., multi-organ segmentation. Existing CNN-based segmentation methods mainly focus on looking for increasingly powerful network architectures, but pay less attention to data sampling strategies for training networks more effectively. In this paper, we present a simple but effective sample selection method for training multi-organ segmentation networks. Sample selection exhibits an exploitation-exploration strategy, i.e., exploiting hard samples and exploring less frequently visited samples. Based on the fact that very hard samples might have annotation errors, we propose a new sample selection policy, named Relaxed Upper Confident Bound (RUCB). Compared with other sample selection policies, e.g., Upper Confident Bound (UCB), it exploits a range of hard samples rather than being stuck with a small set of very hard ones, which mitigates the influence of annotation errors during training. We apply this new sample selection policy to training a multi-organ segmentation network on a dataset containing 120 abdominal CT scans and show that it boosts segmentation performance significantly.

Yan Wang, Yuyin Zhou, Peng Tang, Wei Shen, Elliot K. Fishman, Alan L. Yuille
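The exploitation-exploration idea in the abstract above can be illustrated with a small sketch. It assumes a standard UCB score (running mean loss plus an exploration bonus for rarely visited samples) and models the "relaxed" policy as drawing uniformly from the `top_k` highest-scoring samples instead of always taking the single maximum. The function names, the bonus constant `c`, and the `top_k` relaxation are assumptions made for illustration; the paper's exact RUCB formulation may differ.

```python
import math
import random


def ucb_scores(mean_loss, visit_counts, total_visits, c=1.0):
    """Standard UCB score per sample: mean loss (hardness) + exploration bonus."""
    scores = []
    for loss, n in zip(mean_loss, visit_counts):
        if n == 0:
            # Never-visited samples get top priority.
            scores.append(float("inf"))
        else:
            scores.append(loss + c * math.sqrt(2.0 * math.log(total_visits) / n))
    return scores


def select_relaxed(scores, top_k, rng):
    """Pick uniformly among the top_k highest scores (assumed relaxation).

    With top_k == 1 this reduces to plain UCB, which keeps returning the
    single hardest sample and is therefore sensitive to annotation errors.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return rng.choice(ranked[:top_k])
```

In a training loop, the selected index would determine the next sample fed to the network, after which its mean loss and visit count are updated.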

### Bridging the Gap Between 2D and 3D Organ Segmentation with Volumetric Fusion Net

There has been a debate on whether to use 2D or 3D deep neural networks for volumetric organ segmentation. Both 2D and 3D models have their advantages and disadvantages. In this paper, we present an alternative framework, which trains 2D networks on different viewpoints for segmentation, and builds a 3D Volumetric Fusion Net (VFN) to fuse the 2D segmentation results. VFN is relatively shallow and contains far fewer parameters than most 3D networks, making our framework more efficient at integrating 3D information for segmentation. We train and test the segmentation and fusion modules individually, and propose a novel strategy, named cross-cross-augmentation, to make full use of the limited training data. We evaluate our framework on several challenging abdominal organs, and verify its superiority in segmentation accuracy and stability over existing 2D and 3D approaches.

Yingda Xia, Lingxi Xie, Fengze Liu, Zhuotun Zhu, Elliot K. Fishman, Alan L. Yuille

### Segmentation of Renal Structures for Image-Guided Surgery

Anatomic models of kidneys may help surgeons make plans or guide surgical procedures, in which segmentation is a prerequisite. We develop a convolutional neural network to segment multiple renal structures from arterial-phase CT images, including parenchyma, arteries, veins, collecting systems, and abnormal structures. To the best of our knowledge, this is the first work dedicated to jointly segment these five renal structures. We introduce two novel techniques. First, we generalize the sequential residual architecture to residual graphs. With this generalization, we convert a popular multi-scale architecture (U-Net) to a residual U-Net. Second, we solve the unbalanced data problem which commonly exists in medical image segmentation by weighting pixels with multi-scale entropy. Our multi-scale entropy map combines information theory and scale analysis to capture spatial complexity of a multi-class label map. The two techniques significantly improve segmentation accuracy. Trained on 400 CT scans and tested on another 100, our algorithm achieves median Dice indices 0.96, 0.86, 0.8, 0.62, and 0.29 respectively for renal parenchyma, arteries, veins, collecting systems and abnormal structures.

Junning Li, Pechin Lo, Ahmed Taha, Hang Wu, Tao Zhao
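To make the entropy-based pixel weighting in the abstract above more concrete, here is a minimal 2D sketch. It assumes the weight of a pixel is the Shannon entropy of the label histogram in a square window around it, averaged over several window radii, so that pixels near class boundaries or thin structures receive larger weights. The function names and the choice of radii are hypothetical, and the paper's actual multi-scale entropy definition may differ.

```python
import math
from collections import Counter


def local_entropy(labels, row, col, radius):
    """Shannon entropy (bits) of labels in a (2*radius+1)^2 window, clipped at borders."""
    counts = Counter()
    for r in range(max(0, row - radius), min(len(labels), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(labels[0]), col + radius + 1)):
            counts[labels[r][c]] += 1
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def multiscale_entropy_map(labels, radii=(1, 2)):
    """Average the local entropy over several window radii (assumed combination rule)."""
    return [
        [
            sum(local_entropy(labels, r, c, rad) for rad in radii) / len(radii)
            for c in range(len(labels[0]))
        ]
        for r in range(len(labels))
    ]
```

The resulting map could be used to scale a per-pixel loss, giving homogeneous regions a weight near zero and complex regions a larger one.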

### Kid-Net: Convolution Networks for Kidney Vessels Segmentation from CT-Volumes

Semantic image segmentation plays an important role in modeling patient-specific anatomy. We propose a convolutional neural network, called Kid-Net, along with a training schema to segment kidney vessels: artery, vein and collecting system. Such segmentation is vital during the surgical planning phase, in which medical decisions are made before surgical incision. Our main contribution is a training schema that handles unbalanced data, reduces false positives and enables high-resolution segmentation with a limited memory budget. These objectives are attained using dynamic weighting, random sampling and 3D patch segmentation. Manual medical image annotation is both time-consuming and expensive. Kid-Net reduces kidney vessel segmentation time from a matter of hours to minutes. It is trained end-to-end using 3D patches from volumetric CT images. A complete segmentation of a $$512\times 512\times 512$$ CT volume is obtained within a few minutes (1–2 mins) by stitching the output 3D patches together. Feature down-sampling and up-sampling are utilized to achieve higher classification and localization accuracies. Quantitative and qualitative evaluation results on a challenging testing dataset demonstrate Kid-Net's competence.

Ahmed Taha, Pechin Lo, Junning Li, Tao Zhao
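The patch-based inference described in the abstract above requires a grid of 3D patch positions that covers the whole volume before the outputs are stitched together. The sketch below shows one way to generate such a grid. The patch size and stride in the usage note are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
import itertools


def patch_starts(length, patch, stride):
    """Start offsets along one axis so that patches of size `patch` cover [0, length)."""
    if patch > length:
        raise ValueError("patch is larger than the volume along this axis")
    starts = list(range(0, length - patch + 1, stride))
    # Add a final patch flush with the border if the stride left a gap.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def patch_grid(shape, patch, stride):
    """Iterate over (z, y, x) start corners of all patches covering a 3D volume."""
    per_axis = [patch_starts(l, p, s) for l, p, s in zip(shape, patch, stride)]
    return itertools.product(*per_axis)
```

For example, a 512×512×512 volume tiled with non-overlapping 128×128×128 patches (an assumed size) yields 4 positions per axis and 64 patches in total. Overlapping strides would need an averaging rule where patches overlap.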

### Local and Non-local Deep Feature Fusion for Malignancy Characterization of Hepatocellular Carcinoma

Deep features derived from convolutional neural networks (CNNs) have demonstrated a superior ability to characterize the biological aggressiveness of tumors; they are typically based on convolutional operations repeatedly processed within a local neighborhood. Due to the heterogeneity of lesions, such local deep features may be insufficient to represent the aggressiveness of a neoplasm. As suggested by non-local neural networks in computer vision, non-local deep features may be remarkably complementary for lesion characterization. In this work, we propose a local and non-local deep feature fusion model based on common and individual feature analysis, which extracts the common and individual components of local and non-local deep features to characterize the biological aggressiveness of lesions. Specifically, we first design a non-local subnetwork for non-local deep feature extraction of the neoplasm, and subsequently combine local and non-local deep features with a specifically designed fusion subnetwork based on common and individual feature analysis. Experimental results on malignancy characterization of clinical hepatocellular carcinoma (HCC) with contrast-enhanced MR images demonstrate several intriguing properties of the proposed fusion model: (1) non-local deep features outperform local deep features for lesion characterization; (2) the fusion of local and non-local deep features further improves lesion characterization; (3) fusion by common and individual feature analysis outperforms both simple concatenation and the deep correlation model.

Tianyou Dou, Lijuan Zhang, Hairong Zheng, Wu Zhou

### A Novel Bayesian Model Incorporating Deep Neural Network and Statistical Shape Model for Pancreas Segmentation

Deep neural networks have achieved significant success in medical image segmentation in recent years. However, poor contrast with surrounding tissues and the high anatomical flexibility of the object of interest remain challenges. On the other hand, statistical shape model based approaches have demonstrated promising performance in exploiting complex shape variabilities, but they are sensitive to localization and initialization. This motivates us to leverage the rich shape priors learned from statistical shape models to improve the segmentation of deep neural networks. In this work, we propose a novel Bayesian model that incorporates the segmentation results from both a deep neural network and a statistical shape model. For evaluation, experiments are performed on 82 CT datasets of the challenging public NIH pancreas dataset. We report a mean DSC of 85.32%, which outperforms the state-of-the-art and is an improvement of approximately 12% over the segmentation predicted by the deep neural network alone.

Jingting Ma, Feng Lin, Stefan Wesarg, Marius Erdt

### Fine-Grained Segmentation Using Hierarchical Dilated Neural Networks

Image segmentation is a crucial step in many computer-aided medical image analysis tasks, e.g., automated radiation therapy. However, low tissue contrast and large amounts of artifacts in medical images, e.g., CT or MR images, corrupt the true boundaries of the target tissues and adversely influence the precision of boundary localization in segmentation. To precisely locate blurry and missing boundaries, human observers often use high-resolution context information from neighboring regions. To extract such information and achieve fine-grained segmentation (high accuracy on the boundary regions and small-scale targets), we propose a novel hierarchical dilated network. In the hierarchy, to maintain precise location information, we adopt dilated residual convolutional blocks as basic building blocks to reduce the dependency of the network on downsampling for receptive field enlargement and semantic information extraction. Then, by concatenating the intermediate feature maps of the serially-connected dilated residual convolutional blocks, the resultant hierarchical dilated module (HD-module) encourages smoother information flow and better utilization of both high-level semantic information and low-level textural information. Finally, we integrate several HD-modules at different resolutions in parallel to finely collect information from multiple (more than 12) scales for the network. The integration is defined by a novel late fusion module proposed in this paper. Experimental results on pelvic organ CT image segmentation demonstrate the superior performance of our proposed algorithm over state-of-the-art deep learning segmentation algorithms, especially in localizing the organ boundaries.

Sihang Zhou, Dong Nie, Ehsan Adeli, Yaozong Gao, Li Wang, Jianping Yin, Dinggang Shen

### Generalizing Deep Models for Ultrasound Image Segmentation

Deep models are subject to a performance drop when encountering appearance discrepancy, even on congeneric corpora in which objects share a similar structure and differ only slightly in appearance. This performance drop can be observed in automated ultrasound image segmentation. In this paper, we try to address this general problem with a novel online adversarial appearance conversion solution. Our contribution is three-fold. First, unlike previous methods, which use corpus-level training to model a fixed source-target appearance conversion in advance, we only need to model the source corpus and can then efficiently convert each single testing image in the target corpus on-the-fly. Second, we propose a self-play training strategy to effectively pre-train all the adversarial modules in our framework to capture the appearance and structure distributions of the source corpus. Third, we propose composite appearance and structure constraints distilled from the source corpus to stabilize the online adversarial appearance conversion, so that the pre-trained models can iteratively remove appearance discrepancy in the testing image in a weakly-supervised fashion. We demonstrate our method on segmenting congeneric prenatal ultrasound images. Based on the appearance conversion, we can generalize the deep models at hand well and achieve significant improvement in segmentation without re-training on massive, expensive new annotations.

Xin Yang, Haoran Dou, Ran Li, Xu Wang, Cheng Bian, Shengli Li, Dong Ni, Pheng-Ann Heng

### Inter-site Variability in Prostate Segmentation Accuracy Using Deep Learning

Deep-learning-based segmentation tools have yielded higher reported segmentation accuracies for many medical imaging applications. However, inter-site variability in image properties can challenge the translation of these tools to data from ‘unseen’ sites not included in the training data. This study quantifies the impact of inter-site variability on the accuracy of deep-learning-based segmentations of the prostate from magnetic resonance (MR) images, and evaluates two strategies for mitigating the reduced accuracy for data from unseen sites: training on multi-site data and training with limited additional data from the unseen site. Using 376 T2-weighted prostate MR images from six sites, we compare the segmentation accuracy (Dice score and boundary distance) of three deep-learning-based networks trained on data from a single site and on various configurations of data from multiple sites. We found that the segmentation accuracy of a single-site network was substantially worse on data from unseen sites than on data from the training site. Training on multi-site data yielded marginally improved accuracy and robustness. However, including as few as 8 subjects from the unseen site, e.g. during commissioning of a new clinical system, yielded substantial improvement (regaining 75% of the difference in Dice score).

Eli Gibson, Yipeng Hu, Nooshin Ghavami, Hashim U. Ahmed, Caroline Moore, Mark Emberton, Henkjan J. Huisman, Dean C. Barratt

### Deep Learning-Based Boundary Detection for Model-Based Segmentation with Application to MR Prostate Segmentation

Model-based segmentation (MBS) has been successfully used for the fully automatic segmentation of anatomical structures in medical images with well-defined gray values due to its ability to incorporate prior knowledge about the organ shape. However, the robust and accurate detection of boundary points required for MBS is still a challenge for organs with inhomogeneous appearance, such as the prostate in magnetic resonance (MR) images, where the image contrast can vary greatly due to the use of different acquisition protocols and scanners at different clinical sites. In this paper, we propose a novel boundary detection approach and apply it to the segmentation of the whole prostate in MR images. We formulate boundary detection as a regression task, where a convolutional neural network is trained to predict the distances between a surface mesh and the corresponding boundary points. We have evaluated our method on the Prostate MR Image Segmentation 2012 challenge data set, with the results showing that the new boundary detection approach can detect boundaries more robustly with respect to contrast and appearance variations and more accurately than previously used features. With an average boundary distance of 1.71 mm and a Dice similarity coefficient of 90.5%, our method was able to segment the prostate more accurately on average than a second human observer and placed first out of 40 entries submitted to the challenge at the time of writing.

Tom Brosch, Jochen Peters, Alexandra Groth, Thomas Stehle, Jürgen Weese

### Deep Attentional Features for Prostate Segmentation in Ultrasound

Automatic prostate segmentation in transrectal ultrasound (TRUS) is of essential importance for image-guided prostate biopsy and treatment planning. However, developing such automatic solutions remains very challenging due to the ambiguous boundary and inhomogeneous intensity distribution of the prostate in TRUS. This paper develops a novel deep neural network equipped with deep attentional feature (DAF) modules for better prostate segmentation in TRUS by fully exploiting the complementary information encoded in different layers of the convolutional neural network (CNN). Our DAF utilizes the attention mechanism to selectively leverage the multi-level features integrated from different layers to refine the features at each individual layer, suppressing the non-prostate noise at shallow layers of the CNN and adding more prostate detail to the features at deep layers. We evaluate the efficacy of the proposed network on challenging prostate TRUS images, and the experimental results demonstrate that our network outperforms state-of-the-art methods by a large margin.

Yi Wang, Zijun Deng, Xiaowei Hu, Lei Zhu, Xin Yang, Xuemiao Xu, Pheng-Ann Heng, Dong Ni

### Accurate and Robust Segmentation of the Clinical Target Volume for Prostate Brachytherapy

We propose a method for automatic segmentation of the prostate clinical target volume for brachytherapy in transrectal ultrasound (TRUS) images. Because of the large variability in the strength of image landmarks and characteristics of artifacts in TRUS images, existing methods achieve a poor worst-case performance, especially at the prostate base and apex. We aim at devising a method that produces accurate segmentations on easy and difficult images alike. Our method is based on a novel convolutional neural network (CNN) architecture. We propose two strategies for improving the segmentation accuracy on difficult images. First, we cluster the training images using a sparse subspace clustering method based on features learned with a convolutional autoencoder. Using this clustering, we suggest an adaptive sampling strategy that drives the training process to give more attention to images that are difficult to segment. Second, we train multiple CNN models using subsets of the training data. The disagreement within this CNN ensemble is used to estimate the segmentation uncertainty due to a lack of reliable landmarks. We employ a statistical shape model to improve the uncertain segmentations produced by the CNN ensemble. On test images from 225 subjects, our method achieves a Hausdorff distance of $$2.7 \pm 2.1$$ mm and a Dice score of $$93.9 \pm 3.5$$, and it significantly reduces the likelihood of committing large segmentation errors.

Davood Karimi, Qi Zeng, Prateek Mathur, Apeksha Avinash, Sara Mahdavi, Ingrid Spadinger, Purang Abolmaesumi, Septimiu Salcudean
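The use of ensemble disagreement as an uncertainty estimate, mentioned in the abstract above, can be sketched as follows. The example assumes each model outputs a per-pixel foreground probability map (flattened to a list) and uses the per-pixel variance across models as the disagreement measure. The function name and the choice of variance are illustrative assumptions; the paper may quantify disagreement differently.

```python
from statistics import mean, pvariance


def ensemble_uncertainty(prob_maps):
    """Per-pixel mean and variance across an ensemble of probability maps.

    prob_maps: list with one flat list of foreground probabilities per model.
    Returns (mean_map, variance_map); high variance marks pixels where the
    models disagree.
    """
    if not prob_maps:
        raise ValueError("at least one probability map is required")
    per_pixel = list(zip(*prob_maps))  # regroup values pixel by pixel
    mean_map = [mean(values) for values in per_pixel]
    variance_map = [pvariance(values) for values in per_pixel]
    return mean_map, variance_map
```

Pixels with high variance would be candidates for correction by a shape prior, while low-variance pixels can be trusted as predicted.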

### Hashing-Based Atlas Ranking and Selection for Multiple-Atlas Segmentation

In this paper, we present a learning-based, registration-free atlas ranking technique for selecting the best-performing atlases prior to image registration and multi-atlas segmentation (MAS). To this end, we introduce ensemble hashing, where each data item (image volume) is represented by an ensemble of hash codes, and a learnt distance metric is used to obviate the need for pairwise registration between atlases and the target image. We then pose the ranking process as an assignment problem and solve it through two different combinatorial optimization (CO) techniques. We use 43 unregistered cardiac CT angiography (CTA) scans and perform thorough validations to show the effectiveness and superiority of the presented technique over existing atlas ranking and selection methods.

Amin Katouzian, Hongzhi Wang, Sailesh Conjeti, Hui Tang, Ehsan Dehghan, Alexandros Karargyris, Anup Pillai, Kenneth Clarkson, Nassir Navab

### Corners Detection for Bioresorbable Vascular Scaffolds Segmentation in IVOCT Images

The bioresorbable vascular scaffold (BVS) is a promising type of stent in percutaneous coronary intervention. Strut apposition assessment is important to ensure the safety of an implanted BVS. Currently, BVS strut apposition analysis in 2D IVOCT images still depends on manual delineation of struts, which is labor intensive and time consuming. Automatic strut segmentation is highly desirable to simplify and speed up quantitative analysis. However, it is difficult to segment struts accurately based on the contour, due to the influence of fractures inside struts and blood artifacts around them. In this paper, we introduce a novel framework for automatic strut segmentation based on four corners, which exploits the prior knowledge that struts have an obvious box shape. First, a cascaded AdaBoost classifier based on enriched Haar-like features is trained to detect strut corners. Then, the segmentation result is obtained from the four detected corners of each strut. Tested on the same five pullbacks consisting of 480 images with struts, our method achieved an average Dice coefficient of 0.85 for strut segmentation areas, an increase of about 0.01 over the state-of-the-art. We conclude that our method can segment struts accurately and robustly and performs better than the state-of-the-art. Furthermore, automatic strut malapposition analysis in clinical practice is feasible based on the segmentation results.

Linlin Yao, Yihui Cao, Qinhua Jin, Jing Jing, Yundai Chen, Jianan Li, Rui Zhu

### The Deep Poincaré Map: A Novel Approach for Left Ventricle Segmentation

Precise segmentation of the left ventricle (LV) within cardiac MRI images is a prerequisite for the quantitative measurement of heart function. However, this task is challenging due to the limited availability of labeled data and motion artifacts from cardiac imaging. In this work, we present an iterative segmentation algorithm for LV delineation. By coupling deep learning with a novel dynamic-based labeling scheme, we present a new methodology where a policy model is learned to guide an agent to travel over the image, tracing out a boundary of the ROI – using the magnitude difference of the Poincaré map as a stopping criterion. Our method is evaluated on two datasets, namely the Sunnybrook Cardiac Dataset (SCD) and data from the STACOM 2011 LV segmentation challenge. Our method outperforms the previous research over many metrics. In order to demonstrate the transferability of our method we present encouraging results over the STACOM 2011 data, when using a model trained on the SCD dataset.

Yuanhan Mo, Fangde Liu, Douglas McIlwraith, Guang Yang, Jingqing Zhang, Taigang He, Yike Guo

### Bayesian VoxDRN: A Probabilistic Deep Voxelwise Dilated Residual Network for Whole Heart Segmentation from 3D MR Images

In this paper, we propose a probabilistic deep voxelwise dilated residual network, referred to as Bayesian VoxDRN, to segment the whole heart from 3D MR images. Bayesian VoxDRN can predict voxelwise class labels with a measure of model uncertainty, which is achieved by dropout-based Monte Carlo sampling during testing to generate a posterior distribution of the voxel class labels. Our method has three compelling advantages. First, the dropout mechanism encourages the model to learn a distribution of weights with better data-explanation ability and prevents over-fitting. Second, focal loss and Dice loss are combined into a complementary learning objective to segment both hard and easy classes. Third, an iterative switch training strategy is introduced to alternately optimize a binary segmentation task and a multi-class segmentation task for a further accuracy improvement. Experiments on the MICCAI 2017 multi-modality whole heart segmentation challenge data corroborate the effectiveness of the proposed method.

Zenglin Shi, Guodong Zeng, Le Zhang, Xiahai Zhuang, Lei Li, Guang Yang, Guoyan Zheng
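The dropout-based Monte Carlo sampling described in the abstract above keeps dropout active at test time, runs several stochastic forward passes, and summarizes them by their mean (the prediction) and spread (a measure of model uncertainty). The sketch below illustrates this with a toy single-unit logistic model standing in for the network. The model, the function names, the dropout rate and the number of samples are all assumptions made for illustration and bear no relation to the VoxDRN architecture.

```python
import math
import random


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy logistic unit with dropout applied to its inputs."""
    keep = 1.0 - drop_prob
    z = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            z += w * x / keep  # inverted-dropout scaling keeps the expected input
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(features, weights, drop_prob=0.5, n_samples=100, seed=0):
    """Mean and variance of the predicted probability over stochastic passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, weights, drop_prob, rng)
        for _ in range(n_samples)
    ]
    mean_p = sum(samples) / n_samples
    var_p = sum((s - mean_p) ** 2 for s in samples) / n_samples
    return mean_p, var_p
```

With `drop_prob=0.0` every pass is identical and the variance is zero; with dropout enabled, the variance grows for inputs whose prediction depends strongly on which units are kept.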

### Real-Time Prediction of Segmentation Quality

Recent advances in deep learning based image segmentation methods have enabled real-time performance with human-level accuracy. However, occasionally even the best method fails due to low image quality, artifacts or unexpected behaviour of black box algorithms. Being able to predict segmentation quality in the absence of ground truth is of paramount importance in clinical practice, but also in large-scale studies to avoid the inclusion of invalid data in subsequent analysis. In this work, we propose two approaches to real-time automated quality control for cardiovascular MR segmentations using deep learning. First, we train a neural network on 12,880 samples to predict Dice Similarity Coefficients (DSC) on a per-case basis. We report a mean absolute error (MAE) of 0.03 on 1,610 test samples and 97% binary classification accuracy for separating low and high quality segmentations. Second, in the scenario where no manually annotated data is available, we train a network to predict DSC scores from estimated quality obtained via a reverse testing strategy. We report an MAE of 0.14 and 91% binary classification accuracy for this case. Predictions are obtained in real-time which, when combined with real-time segmentation methods, enables instant feedback on whether an acquired scan is analysable while the patient is still in the scanner. This further enables new applications of optimising image acquisition towards best possible analysis results.

Robert Robinson, Ozan Oktay, Wenjia Bai, Vanya V. Valindria, Mihir M. Sanghvi, Nay Aung, José M. Paiva, Filip Zemrak, Kenneth Fung, Elena Lukaschuk, Aaron M. Lee, Valentina Carapella, Young Jin Kim, Bernhard Kainz, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Chris Page, Daniel Rueckert, Ben Glocker
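The two evaluation figures in the abstract above, MAE of the predicted DSC and binary accuracy for separating low from high quality segmentations, can be sketched as below. MAE is computed here as the mean absolute difference between predicted and true DSC. The quality threshold of 0.7 is an assumed value for illustration, as the abstract does not state the cut-off, and the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and true DSC values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def binary_accuracy(predicted, actual, threshold=0.7):
    """Fraction of cases where predicted and true DSC fall on the same side
    of the quality threshold (assumed cut-off between low and high quality)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(
        1 for p, a in zip(predicted, actual) if (p >= threshold) == (a >= threshold)
    )
    return agree / len(predicted)
```

A case counts as correctly classified when the predictor and the ground truth agree on whether the segmentation is usable, even if the predicted DSC itself is slightly off.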

### Recurrent Neural Networks for Aortic Image Sequence Segmentation with Sparse Annotations

Segmentation of image sequences is an important task in medical image analysis, which enables clinicians to assess the anatomy and function of moving organs. However, direct application of a segmentation algorithm to each time frame of a sequence may ignore the temporal continuity inherent in the sequence. In this work, we propose an image sequence segmentation algorithm by combining a fully convolutional network with a recurrent neural network, which incorporates both spatial and temporal information into the segmentation task. A key challenge in training this network is that the available manual annotations are temporally sparse, which forbids end-to-end training. We address this challenge by performing non-rigid label propagation on the annotations and introducing an exponentially weighted loss function for training. Experiments on aortic MR image sequences demonstrate that the proposed method significantly improves both accuracy and temporal smoothness of segmentation, compared to a baseline method that utilises spatial information only. It achieves an average Dice metric of 0.960 for the ascending aorta and 0.953 for the descending aorta.

Wenjia Bai, Hideaki Suzuki, Chen Qin, Giacomo Tarroni, Ozan Oktay, Paul M. Matthews, Daniel Rueckert
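The exponentially weighted loss mentioned in the abstract above can be illustrated with a small sketch. It assumes that labels propagated to a frame far from a manually annotated frame are less reliable, so each frame's loss is weighted by `decay ** distance`, where the distance is measured to the nearest annotated frame. The decay factor, the function names and the normalization are assumptions; the paper's exact weighting scheme may differ.

```python
def frame_weights(num_frames, annotated_frames, decay=0.5):
    """Weight per frame: decay ** (distance to the nearest annotated frame)."""
    if not annotated_frames:
        raise ValueError("at least one annotated frame is required")
    return [
        max(decay ** abs(t - a) for a in annotated_frames)
        for t in range(num_frames)
    ]


def weighted_sequence_loss(frame_losses, weights):
    """Weighted average of per-frame losses over the sequence."""
    if len(frame_losses) != len(weights):
        raise ValueError("one weight per frame loss is required")
    return sum(w * l for w, l in zip(weights, frame_losses)) / sum(weights)
```

With five frames and a manual annotation on the middle frame only, the annotated frame contributes with weight 1.0 and its neighbours with progressively smaller weights.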

### Deep Nested Level Sets: Fully Automated Segmentation of Cardiac MR Images in Patients with Pulmonary Hypertension

In this paper we introduce a novel and accurate optimisation method for segmentation of cardiac MR (CMR) images in patients with pulmonary hypertension (PH). The proposed method explicitly takes into account the image features learned from a deep neural network. To this end, we estimate simultaneous probability maps over region and edge locations in CMR images using a fully convolutional network. Due to the distinct morphology of the heart in patients with PH, these probability maps can then be incorporated in a single nested level set optimisation framework to achieve multi-region segmentation with high efficiency. The proposed method uses an automatic way for level set initialisation and thus the whole optimisation is fully automated. We demonstrate that the proposed deep nested level set (DNLS) method outperforms existing state-of-the-art methods for CMR segmentation in PH patients.

Jinming Duan, Jo Schlemper, Wenjia Bai, Timothy J. W. Dawes, Ghalib Bello, Georgia Doumou, Antonio De Marvao, Declan P. O’Regan, Daniel Rueckert

### Atrial Fibrosis Quantification Based on Maximum Likelihood Estimator of Multivariate Images

We present a fully-automated method for segmentation and quantification of left atrial (LA) fibrosis and scars that combines two cardiac MRIs: the target late gadolinium-enhanced (LGE) image and an anatomical MRI from the same acquisition session. We formulate the joint distribution of the images using a multivariate mixture model (MvMM), and employ the maximum likelihood estimator (MLE) for texture classification of the images simultaneously. The MvMM can also embed transformations assigned to the images to correct misregistration. The iterated conditional mode algorithm is adopted for optimization. The method first extracts the anatomical shape of the LA, and then estimates a prior probability map. It projects the resulting segmentation onto the LA surface for quantification and analysis of scarring. We applied the proposed method to 36 clinical data sets and obtained promising results (Accuracy: $$0.809 \pm 0.150$$, Dice: $$0.556 \pm 0.187$$). We compared the method with conventional algorithms and showed an evident and statistically significant improvement in performance ($$p<0.03$$).

Fuping Wu, Lei Li, Guang Yang, Tom Wong, Raad Mohiaddin, David Firmin, Jennifer Keegan, Lingchao Xu, Xiahai Zhuang

### Left Ventricle Segmentation via Optical-Flow-Net from Short-Axis Cine MRI: Preserving the Temporal Coherence of Cardiac Motion

Quantitative assessment of left ventricle (LV) function from cine MRI has significant diagnostic and prognostic value for cardiovascular disease patients. The temporal movement of the LV provides essential information on the contracting/relaxing pattern of the heart, which is keenly evaluated by experts in clinical practice. Inspired by the way experts view cine MRI, we propose a new CNN module that is able to incorporate temporal information into LV segmentation from cine MRI. In the proposed CNN, the optical flow (OF) between neighboring frames is integrated and aggregated at the feature level, such that temporal coherence in cardiac motion can be taken into account during segmentation. The proposed module is integrated into the U-net architecture without the need for additional training. Furthermore, dilated convolution is introduced to improve the spatial accuracy of segmentation. Trained and tested on the Cardiac Atlas database, the proposed network achieved a Dice index of 95% and an average perpendicular distance of 0.9 pixels for the middle LV contour, significantly outperforming the original U-net that processes each frame individually. Notably, the proposed method improved the temporal coherence of LV segmentation results, especially at the LV apex and base, where the cardiac motion is difficult to follow.

Wenjun Yan, Yuanyuan Wang, Zeju Li, Rob J. van der Geest, Qian Tao
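
The abstract mentions dilated convolution without describing it. As a rough one-dimensional illustration (the paper works on 2D feature maps, and the function name `dilated_conv1d` is hypothetical), a dilated kernel samples its inputs `dilation` positions apart, which widens the receptive field without adding weights:

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with taps spaced `dilation` apart."""
    # Distance between the first and last input sample touched by the kernel.
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # each output sees 3 samples
print(dilated_conv1d(signal, kernel, dilation=2))  # each output sees a span of 5
```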

### VoxelAtlasGAN: 3D Left Ventricle Segmentation on Echocardiography with Atlas Guided Generation and Voxel-to-Voxel Discrimination

3D left ventricle (LV) segmentation on echocardiography is very important for the diagnosis and treatment of cardiac disease. This is not only because echocardiography is a real-time imaging technology that is widespread in clinical application, but also because LV segmentation on 3D echocardiography can provide more complete volume information about the heart than LV segmentation on 2D echocardiography. However, 3D LV segmentation on echocardiography is still an open and challenging task owing to the lower contrast, higher noise, higher data dimensionality, and limited annotation of 3D echocardiography. In this paper, we propose a novel real-time framework, VoxelAtlasGAN, for 3D LV segmentation on 3D echocardiography. This framework makes three contributions: (1) It is based on voxel-to-voxel conditional generative adversarial nets (cGAN). For the first time, cGAN is used for 3D LV segmentation on echocardiography, and it advantageously fuses substantial 3D spatial context information from 3D echocardiography through a self-learned structured loss; (2) For the first time, it embeds the atlas into an end-to-end optimization framework, which uses a 3D LV atlas as powerful prior knowledge to improve the inference speed and to address the lower contrast and limited annotation problems of 3D echocardiography; (3) It combines the traditional discrimination loss with the newly proposed consistent constraint, which further improves the generalization of the proposed framework. VoxelAtlasGAN was validated on 3D echocardiography from 60 subjects and achieved satisfactory segmentation results and high inference speed. The mean surface distance is 1.85 mm, the mean Hausdorff surface distance is 7.26 mm, the mean Dice is 0.953, the correlation of EF is 0.918, and the mean inference speed is 0.1 s. These results demonstrate that the proposed method has great potential for clinical application.

Suyu Dong, Gongning Luo, Kuanquan Wang, Shaodong Cao, Ashley Mercado, Olga Shmuilovich, Henggui Zhang, Shuo Li
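
The surface-distance figures reported above can be illustrated with a brute-force computation on two small point sets. This is a sketch under stated assumptions: surfaces are given as lists of 3D points in millimetres, the mean is taken symmetrically over both directions, and the name `surface_distances` is hypothetical. The paper does not say which exact variant it uses.

```python
import math


def _nearest(point, points):
    """Distance from `point` to the closest element of `points`."""
    return min(math.dist(point, q) for q in points)


def surface_distances(a, b):
    """Return (mean symmetric surface distance, Hausdorff distance).

    `a` and `b` are non-empty lists of coordinate tuples of equal dimension.
    """
    d_ab = [_nearest(p, b) for p in a]  # every point of a to its nearest in b
    d_ba = [_nearest(q, a) for q in b]  # every point of b to its nearest in a
    mean_sd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    hausdorff = max(max(d_ab), max(d_ba))
    return mean_sd, hausdorff


a = [(0, 0, 0), (1, 0, 0)]
b = [(0, 0, 0), (1, 0, 2)]
print(surface_distances(a, b))
```

The mean reflects typical boundary error, while the Hausdorff distance is driven by the single worst-matched point, which is why it is usually several times larger.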

### Domain and Geometry Agnostic CNNs for Left Atrium Segmentation in 3D Ultrasound

Segmentation of the left atrium and deriving its size can help to predict and detect various cardiovascular conditions. Automation of this process in 3D ultrasound image data is desirable, since manual delineations are time-consuming, challenging and observer-dependent. Convolutional neural networks have made improvements in computer vision and in medical image analysis. They have successfully been applied to segmentation tasks and were extended to work on volumetric data. In this paper we introduce a combined deep-learning-based approach to volumetric segmentation in ultrasound acquisitions that incorporates prior knowledge about left atrial shape and the imaging device. The results show that including a shape prior helps the domain adaptation, and that the accuracy of segmentation is further increased with adversarial learning.

Markus A. Degel, Nassir Navab, Shadi Albarqouni

### Densely Deep Supervised Networks with Threshold Loss for Cancer Detection in Automated Breast Ultrasound

Automated breast ultrasound (ABUS) is a new and promising tool for diagnosing breast cancer. However, reviewing ABUS images is extremely time-consuming and oversight errors can occur. We propose a novel 3D convolutional network for automatic cancer detection in ABUS. Our contribution is twofold. First, we propose a threshold loss function that provides a voxel-level adaptive threshold for discriminating cancer from non-cancer, thus achieving high sensitivity with few false positives (FPs). Second, we propose a densely deep supervision (DDS) mechanism to improve the sensitivity significantly by utilizing multi-scale discriminative features of all layers. Both class-balanced cross entropy loss and overlap loss are employed to enhance DDS performance. The efficacy of the proposed network is validated on a dataset of 196 patients with 661 cancer regions. The 4-fold cross-validation experiments show that our network obtains a sensitivity of 93% with 2.2 FPs per ABUS volume. The proposed network can provide an accurate and automatic cancer detection tool for breast cancer screening by maintaining high sensitivity with few FPs.

Na Wang, Cheng Bian, Yi Wang, Min Xu, Chenchen Qin, Xin Yang, Tianfu Wang, Anhua Li, Dinggang Shen, Dong Ni
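
The abstract names a class-balanced cross entropy loss but does not give its form. One common formulation, assumed here purely for illustration, weights each class by the frequency of the other so that rare cancer voxels are not swamped by background. The function name `class_balanced_bce` and the clipping constant `eps` are hypothetical.

```python
import math


def class_balanced_bce(probs, labels, eps=1e-7):
    """Mean binary cross entropy with inverse-frequency class weights.

    Assumed weighting: positives are weighted by the fraction of negatives
    (beta), and negatives by the fraction of positives (1 - beta).
    """
    n = len(labels)
    if n == 0 or n != len(probs):
        raise ValueError("probs and labels must be non-empty and equal in length")
    beta = (n - sum(labels)) / n
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        if y:
            loss -= beta * math.log(p)
        else:
            loss -= (1 - beta) * math.log(1 - p)
    return loss / n


# One positive voxel among four: it carries weight 0.75, each negative 0.25.
print(class_balanced_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0]))
```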

### Btrfly Net: Vertebrae Labelling with Energy-Based Adversarial Learning of Local Spine Prior

Robust localisation and identification of vertebrae is essential for automated spine analysis. The contribution of this work to the task is two-fold: (1) Inspired by the human expert, we hypothesise that sagittal and coronal reformations of the spine contain sufficient information for labelling the vertebrae. We therefore propose a butterfly-shaped network architecture (termed Btrfly Net) that efficiently combines the information across reformations. (2) Underpinning the Btrfly Net, we present an energy-based adversarial training regime that encodes local spine structure as an anatomical prior into the network, thereby enabling it to achieve state-of-the-art performance in all standard metrics on a benchmark dataset of 302 scans without any post-processing during inference.

Anjany Sekuboyina, Markus Rempfler, Jan Kukačka, Giles Tetteh, Alexander Valentinitsch, Jan S. Kirschke, Bjoern H. Menze

### AtlasNet: Multi-atlas Non-linear Deep Networks for Medical Image Segmentation

Deep learning methods have gained increasing attention in addressing segmentation problems for medical image analysis despite the challenges inherent to the medical domain, such as limited data availability, lack of consistent textural or salient patterns, and high dimensionality of the data. In this paper, we introduce a novel multi-network architecture that exploits domain knowledge to address those challenges. The proposed architecture consists of multiple deep neural networks that are trained after co-aligning multiple anatomies through multi-metric deformable registration. This multi-network architecture can be trained with fewer examples and leads to better performance, robustness and generalization through consensus. Comparable to human accuracy, highly promising results on the challenging task of interstitial lung disease segmentation demonstrate the potential of our approach.

M. Vakalopoulou, G. Chassagnon, N. Bus, R. Marini, E. I. Zacharaki, M.-P. Revel, N. Paragios

### CFCM: Segmentation via Coarse to Fine Context Memory

Recent neural-network-based architectures for image segmentation make extensive use of feature forwarding mechanisms to integrate information from multiple scales. Although these yield good results, even deeper architectures and alternative methods for feature fusion at different resolutions have scarcely been investigated for medical applications. In this work we propose to implement segmentation via an encoder-decoder architecture that differs from previously published methods in that (i) it employs a very deep architecture based on residual learning and (ii) it combines features via a convolutional Long Short Term Memory (LSTM), instead of concatenation or summation. The intuition is that the memory mechanism implemented by LSTMs can better integrate features from different scales through a coarse-to-fine strategy; hence the name Coarse-to-Fine Context Memory (CFCM). We demonstrate the remarkable advantages of this approach on two datasets: the Montgomery county lung segmentation dataset, and the EndoVis 2015 challenge dataset for surgical instrument segmentation.

Fausto Milletari, Nicola Rieke, Maximilian Baust, Marco Esposito, Nassir Navab

### Pyramid-Based Fully Convolutional Networks for Cell Segmentation

The low contrast and irregular cell shapes in microscopy images make accurate cell segmentation difficult. We propose pyramid-based fully convolutional networks (FCN) to segment cells in a cascaded refinement manner. The higher-level FCNs generate coarse cell segmentation masks, attacking the challenge of low contrast between cell inner regions and the background. The lower-level FCNs generate segmentation masks focusing more on cell details, attacking the challenge of irregular cell shapes. The FCNs in the pyramid are trained in a cascaded way such that the residual error between the ground truth and the upper-level segmentation is propagated to the lower level and draws the attention of the lower-level FCNs to the cell details missed by the upper levels. The fine cell details from lower-level FCNs are gradually fused into the coarse segmentation from upper-level FCNs so as to obtain a final precise cell segmentation mask. On the ISBI cell segmentation challenge dataset and a newly collected dataset with high-quality ground truth, our method outperforms the state-of-the-art methods.

Tianyi Zhao, Zhaozheng Yin

### Automated Object Tracing for Biomedical Image Segmentation Using a Deep Convolutional Neural Network

Convolutional neural networks (CNNs) have been used for fast and accurate segmentation of medical images. In this paper, we present a novel methodology that uses CNNs for segmentation by mimicking the human task of tracing object boundaries. The architecture takes as input a patch of an image with an overlay of previously traced pixels and the output predicts the coordinates of the next m pixels to be traced. We also consider a CNN architecture that leverages the output from another semantic segmentation CNN, e.g., U-net, as an auxiliary image channel. To initialize the trace path in an image, we use either locations identified as object boundaries with high confidence from a semantic segmentation CNN or a short manually traced path. By iterating the CNN output, our method continues the trace until it intersects with the beginning of the path. We show that our network is more accurate than the state-of-the-art semantic segmentation CNN on microscopy images from the ISBI cell tracking challenge. Moreover, our methodology provides a natural platform for performing human-in-the-loop segmentation that is more accurate than CNNs alone and orders of magnitude faster than manual segmentation.

Erica M. Rutter, John H. Lagergren, Kevin B. Flores
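
The tracing procedure described in this abstract is an iterate-until-closed loop. The sketch below shows only that control flow. The CNN is replaced by a hypothetical stand-in, `circle_stepper`, which walks around a circle one point at a time, whereas the paper's network predicts the next m pixels from an image patch. The closing radius and step limits are assumed values.

```python
import math


def trace_contour(start, predict_next, close_radius=1.0, min_steps=3, max_steps=1000):
    """Extend a path with `predict_next` until it returns close to `start`.

    Returns (path, closed), where `closed` is False if max_steps was reached.
    """
    path = [start]
    for _ in range(max_steps):
        nxt = predict_next(path)  # stand-in for one forward pass of the tracer
        path.append(nxt)
        # min_steps keeps the very first points from closing the contour at once.
        if len(path) > min_steps and math.dist(nxt, start) <= close_radius:
            return path, True
    return path, False


def circle_stepper(center, radius, step_deg):
    """Hypothetical predictor: advance `step_deg` degrees along a circle."""
    def predict_next(path):
        x, y = path[-1]
        angle = math.atan2(y - center[1], x - center[0]) + math.radians(step_deg)
        return (center[0] + radius * math.cos(angle),
                center[1] + radius * math.sin(angle))
    return predict_next


path, closed = trace_contour((10.0, 0.0), circle_stepper((0.0, 0.0), 10.0, 10.0))
print(closed, len(path))
```

Because the loop only depends on the path so far, a human correction can be spliced in by editing `path` between iterations, which is one way to read the human-in-the-loop use the abstract mentions.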

### RBC Semantic Segmentation for Sickle Cell Disease Based on Deformable U-Net

Reliable cell segmentation and classification from biomedical images is a crucial step for both scientific research and clinical practice. A major challenge for more robust segmentation and classification methods is the large variation in the size, shape and viewpoint of the cells, combined with the low image quality caused by noise and artifacts. To address this issue, in this work we propose a learning-based, simultaneous cell segmentation and classification method based on the U-Net structure with deformable convolution layers. The U-Net architecture has been shown to offer precise localization for image semantic segmentation. Moreover, deformable convolution enables free-form deformation of the feature learning process, thus making the whole network more robust to various cell morphologies and image settings. The proposed method is tested on microscopic red blood cell images from patients with sickle cell disease. The results show that U-Net with deformable convolution achieves the highest accuracy for both segmentation and classification tasks, compared with the original U-Net structure and unsupervised methods.

Mo Zhang, Xiang Li, Mengjia Xu, Quanzheng Li

### Accurate Detection of Inner Ears in Head CTs Using a Deep Volume-to-Volume Regression Network with False Positive Suppression and a Shape-Based Constraint

Cochlear implants (CIs) are neural prosthetics which are used to treat patients with hearing loss. CIs use an array of electrodes which are surgically inserted into the cochlea to stimulate the auditory nerve endings. After surgery, CIs need to be programmed. Studies have shown that the spatial relationship between the intra-cochlear anatomy and electrodes derived from medical images can guide CI programming and lead to significant improvement in hearing outcomes. However, clinical head CT images are usually obtained from scanners of different brands with different protocols. The field of view thus varies greatly and visual inspection is needed to document their content prior to applying algorithms for electrode localization and intra-cochlear anatomy segmentation. In this work, to determine the presence/absence of inner ears and to accurately localize them in head CTs, we use a volume-to-volume convolutional neural network which can be trained end-to-end to map a raw CT volume to probability maps which indicate inner ear positions. We incorporate a false positive suppression strategy in training and apply a shape-based constraint. We achieve a labeling accuracy of 98.59% and a localization error of 2.45 mm. The localization error is significantly smaller than a random forest-based approach that has been proposed recently to perform the same task.

Dongqing Zhang, Jianing Wang, Jack H. Noble, Benoit M. Dawant

### Automatic Teeth Segmentation in Panoramic X-Ray Images Using a Coupled Shape Model in Combination with a Neural Network

Dental panoramic radiographs depict the full set of teeth in a single image and are used by dentists as a popular first tool for diagnosis. In order to provide the dentist with automatic diagnostic support, a robust and accurate segmentation of the individual teeth is required. However, the poor quality of panoramic x-ray images, such as low contrast or noise, as well as variations in teeth between patients make this task difficult. In this paper, a fully automatic approach is presented that uses a coupled shape model in conjunction with a neural network to overcome these challenges. The network provides a preliminary segmentation of the teeth region which is used to initialize the coupled shape model in terms of position and scale. Then the 28 individual teeth (excluding wisdom teeth) are segmented and labeled using gradient image features in combination with the model’s statistical knowledge about their shape variation and spatial relation. The segmentation quality of the approach is assessed by comparing the generated results to manually created gold-standard segmentations of the individual teeth. Experimental results on a set of 14 test images show average precision and recall values of 0.790 and 0.827, respectively, and a Dice overlap of 0.744.

Andreas Wirtz, Sudesh Ganapati Mirashi, Stefan Wesarg

### Craniomaxillofacial Bony Structures Segmentation from MRI with Deep-Supervision Adversarial Learning

Automatic segmentation of medical images finds abundant applications in clinical studies. Computed Tomography (CT) imaging plays a critical role in diagnosis and surgical planning for craniomaxillofacial (CMF) surgeries as it shows clear bony structures. However, CT imaging poses radiation risks for the subjects being scanned. Alternatively, Magnetic Resonance Imaging (MRI) is considered to be safe and provides good visualization of the soft tissues, but the bony structures are nearly invisible in MRI. Therefore, the segmentation of bony structures from MRI is quite challenging. In this paper, we propose a cascaded generative adversarial network with deep-supervision discriminator (Deep-supGAN) for automatic bony structures segmentation. The first block in this architecture is used to generate a high-quality CT image from an MRI, and the second block is used to segment bony structures from MRI and the generated CT image. Different from traditional discriminators, the deep-supervision discriminator distinguishes the generated CT from the ground-truth at different levels of feature maps. For segmentation, the loss is not only concentrated on the voxel level but also on the higher abstract perceptual levels. Experimental results show that the proposed method generates CT images with clearer structural details and also segments the bony structures more accurately compared with the state-of-the-art methods.

Miaoyun Zhao, Li Wang, Jiawei Chen, Dong Nie, Yulai Cong, Sahar Ahmad, Angela Ho, Peng Yuan, Steve H. Fung, Hannah H. Deng, James Xia, Dinggang Shen

### Automatic Skin Lesion Segmentation on Dermoscopic Images by the Means of Superpixel Merging

We present a superpixel-based strategy for segmenting skin lesions on dermoscopic images. The segmentation is carried out by over-segmenting the original image using the SLIC algorithm, and then merging the resulting superpixels into two regions: healthy skin and lesion. The mean RGB color of each superpixel is used as the merging criterion. The presented method is capable of dealing with problems commonly found in dermoscopic images, such as hair, oil bubbles, changes in illumination, and reflections, without any additional steps. The method was evaluated on the PH2 and ISIC 2017 datasets with results comparable to the state of the art.

Diego Patiño, Jonathan Avendaño, John W. Branch
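
The merge step can be sketched once superpixels are available. The code below assumes the over-segmentation has already been done (for example by SLIC) and is supplied as a mapping from superpixel id to its RGB pixels. The two-centroid clustering and the rule that the darker group is the lesion are assumptions chosen for illustration, not the authors' merging procedure, and `merge_superpixels` is a hypothetical name.

```python
import math


def mean_color(colors):
    """Component-wise mean of a non-empty list of (r, g, b) tuples."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def merge_superpixels(superpixels, iterations=20):
    """Label each superpixel 'lesion' or 'skin' from its mean RGB color.

    superpixels: dict mapping superpixel id -> list of (r, g, b) pixels.
    """
    means = {key: mean_color(pixels) for key, pixels in superpixels.items()}
    # Start the two centroids at the darkest and brightest superpixel means.
    dark = min(means.values(), key=sum)
    light = max(means.values(), key=sum)

    def label(color):
        return "lesion" if math.dist(color, dark) <= math.dist(color, light) else "skin"

    for _ in range(iterations):
        groups = {"lesion": [], "skin": []}
        for color in means.values():
            groups[label(color)].append(color)
        new_dark = mean_color(groups["lesion"]) if groups["lesion"] else dark
        new_light = mean_color(groups["skin"]) if groups["skin"] else light
        if (new_dark, new_light) == (dark, light):
            break  # centroids stopped moving
        dark, light = new_dark, new_light
    return {key: label(color) for key, color in means.items()}


superpixels = {
    0: [(200, 170, 150), (210, 180, 160)],
    1: [(60, 40, 30), (70, 50, 40)],
    2: [(190, 160, 140)],
    3: [(80, 55, 45)],
}
print(merge_superpixels(superpixels))
```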

### Star Shape Prior in Fully Convolutional Networks for Skin Lesion Segmentation

Semantic segmentation is an important preliminary step towards automatic medical image interpretation. Recently deep convolutional neural networks have become the first choice for the task of pixel-wise class prediction. While incorporating prior knowledge about the structure of target objects has proven effective in traditional energy-based segmentation approaches, there has not been a clear way for encoding prior knowledge into deep learning frameworks. In this work, we propose a new loss term that encodes the star shape prior into the loss function of an end-to-end trainable fully convolutional network (FCN) framework. We penalize non-star shape segments in FCN prediction maps to guarantee a global structure in segmentation results. Our experiments demonstrate the advantage of regularizing FCN parameters by the star shape prior and our results on the ISBI 2017 skin segmentation challenge data set achieve the first rank in the segmentation task among 21 participating teams.

Zahra Mirikharaji, Ghassan Hamarneh
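
A region is star-shaped with respect to a center if every straight segment from the center to a point of the region stays inside the region. The sketch below counts foreground pixels that break this rule on a small binary mask. It is a plain check written for illustration, not the differentiable loss term proposed in the paper, and the function name and the rounding-based line sampling are assumptions.

```python
def star_shape_violations(mask, center):
    """Count foreground pixels whose segment to `center` leaves the foreground.

    mask: list of rows of 0/1 values; center: (row, col) inside the mask.
    """
    cr, cc = center
    violations = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if not value:
                continue
            # Sample the segment from the center (i = 0) up to, but excluding, (r, c).
            steps = max(abs(r - cr), abs(c - cc))
            for i in range(steps):
                t = i / steps
                rr = round(cr + t * (r - cr))
                qq = round(cc + t * (c - cc))
                if not mask[rr][qq]:
                    violations += 1
                    break
    return violations


plus_shape = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
gapped_row = [[1, 1, 0, 1]]
print(star_shape_violations(plus_shape, (1, 1)))
print(star_shape_violations(gapped_row, (0, 0)))
```

The plus shape has no violations, while the isolated pixel at the end of the gapped row is cut off from the center by a background pixel and counts as one.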

### Fast Vessel Segmentation and Tracking in Ultra High-Frequency Ultrasound Images

Ultra High Frequency Ultrasound (UHFUS) enables the visualization of highly deformable small and medium vessels in the hand. Intricate vessel-based measurements, such as intimal wall thickness and vessel wall compliance, require sub-millimeter vessel tracking between B-scans. Our fast GPU-based approach combines the advantages of local phase analysis, a distance-regularized level set, and an Extended Kalman Filter (EKF), to rapidly segment and track the deforming vessel contour. We validated on 35 UHFUS sequences of vessels in the hand, and we show the transferability of the approach to 5 more diverse datasets acquired by a traditional High Frequency Ultrasound (HFUS) machine. To the best of our knowledge, this is the first algorithm capable of rapidly segmenting and tracking deformable vessel contours in 2D UHFUS images. It is also the fastest and most accurate system for 2D HFUS images.

Tejas Sudharshan Mathai, Lingbo Jin, Vijay Gorantla, John Galeotti

### Deep Reinforcement Learning for Vessel Centerline Tracing in Multi-modality 3D Volumes

Accurate vessel centerline tracing greatly benefits vessel centerline geometry assessment and facilitates precise measurements of vessel diameters and lengths. However, cursive and longitudinal geometries of vessels make centerline tracing a challenging task in volumetric images. Treating the problem with traditional feature handcrafting is often ad-hoc and time-consuming, resulting in suboptimal solutions. In this work, we propose a unified end-to-end deep reinforcement learning approach for robust vessel centerline tracing in multi-modality 3D medical volumes. Instead of a time-consuming exhaustive search in 3D space, we propose to train an artificial agent to interact with the surrounding environment and collect rewards from the interaction. A deep neural network is integrated into the system to predict a stepwise action value for every possible action. With this mechanism, the agent is able to probe through an optimal navigation path to trace the vessel centerline. Our proposed approach is evaluated on a dataset of over 2,000 3D volumes with diverse imaging modalities, including contrasted CT, non-contrasted CT, C-arm CT and MR images. The experimental results show that the proposed approach can handle large variations from vessel shape to imaging characteristics, with a tracing error as low as 3.28 mm and detection time as fast as 1.71 s per volume.

Pengyue Zhang, Fusheng Wang, Yefeng Zheng
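
At inference time, an agent of this kind repeatedly takes the action with the highest predicted value. The sketch below shows that greedy loop on an integer 3D grid. The six axis-aligned unit moves, the stopping test, and the toy value function (negative Manhattan distance to a known goal) are all assumptions standing in for the trained network, whose action space and inputs the abstract does not specify.

```python
ACTIONS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def greedy_trace(start, action_values, is_done, max_steps=500):
    """Follow the highest-valued action from `start` until `is_done` or max_steps."""
    pos = tuple(start)
    path = [pos]
    for _ in range(max_steps):
        if is_done(pos):
            break
        values = action_values(pos)          # stand-in for the network's output
        best = max(values, key=values.get)   # ties go to the first action listed
        pos = tuple(p + m for p, m in zip(pos, ACTIONS[best]))
        path.append(pos)
    return path


def toy_action_values(goal):
    """Hypothetical value function: prefer moves that end nearer to `goal`."""
    def action_values(pos):
        return {
            name: -sum(abs(p + m - g) for p, m, g in zip(pos, move, goal))
            for name, move in ACTIONS.items()
        }
    return action_values


goal = (2, 1, 0)
path = greedy_trace((0, 0, 0), toy_action_values(goal), lambda p: p == goal)
print(path)
```

Each step needs only one evaluation of the value function at the current position, which is the contrast with exhaustive 3D search that the abstract draws.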