
2021 | Book

Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV

Editors: Prof. Dr. Marleen de Bruijne, Prof. Dr. Philippe C. Cattin, Stéphane Cotin, Nicolas Padoy, Prof. Stefanie Speidel, Yefeng Zheng, Caroline Essert

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The eight-volume set LNCS 12901, 12902, 12903, 12904, 12905, 12906, 12907, and 12908 constitutes the refereed proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2021, held in Strasbourg, France, in September/October 2021.*

The 542 revised full papers presented were carefully reviewed and selected from 1809 submissions in a double-blind review process. The papers are organized in the following topical sections:

Part I: image segmentation

Part II: machine learning - self-supervised learning; machine learning - semi-supervised learning; and machine learning - weakly supervised learning

Part III: machine learning - advances in machine learning theory; machine learning - domain adaptation; machine learning - federated learning; machine learning - interpretability / explainability; and machine learning - uncertainty

Part IV: image registration; image-guided interventions and surgery; surgical data science; surgical planning and simulation; surgical skill and work flow analysis; and surgical visualization and mixed, augmented and virtual reality

Part V: computer aided diagnosis; integration of imaging with non-imaging biomarkers; and outcome/disease prediction

Part VI: image reconstruction; clinical applications - cardiac; and clinical applications - vascular

Part VII: clinical applications - abdomen; clinical applications - breast; clinical applications - dermatology; clinical applications - fetal imaging; clinical applications - lung; clinical applications - neuroimaging - brain development; clinical applications - neuroimaging - DWI and tractography; clinical applications - neuroimaging - functional brain networks; clinical applications - neuroimaging – others; and clinical applications - oncology

Part VIII: clinical applications - ophthalmology; computational (integrative) pathology; modalities - microscopy; modalities - histopathology; and modalities - ultrasound

*The conference was held virtually.

Table of Contents

Frontmatter

Image Registration

Frontmatter
Medical Image Registration Based on Uncoupled Learning and Accumulative Enhancement

As a basic building block of medical image analysis, image registration has developed greatly since the emergence of modern deep neural networks. Compared to non-learning-based methods, the latest approaches learn task-specific features spontaneously and thus generate registration results in a single round of inference. However, when large inter-image distortion occurs, the stability of existing methods can be strongly affected. To alleviate this problem, recent works have introduced iterative frameworks based on coarse-to-fine strategies, but their networks at each iteration step are relatively independent, which is not an optimal solution for the reinforcement of image features. What is more, the moving and the fixed images are often concatenated or fed to identical network layers, so the iterative learning and warping of the moving image can become entangled with the fixed image. To address these issues, we present a novel medical image registration framework, ULAE-net, which continuously enhances the spatial transformation and establishes more profound contextual dependencies under a compact network layout. Extensive experiments on 3D brain MRI data sets demonstrate that our method greatly improves registration performance, thereby outperforming state-of-the-art methods under large-scale deformations ( https://github.com/wanghaostu/ULAE-net ).

Yucheng Shu, Hao Wang, Bin Xiao, Xiuli Bi, Weisheng Li
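Learning-based registration of this kind rests on a differentiable warping layer that resamples the moving image under a predicted displacement field. The following is a minimal PyTorch sketch of such a layer, our illustration rather than the authors' code (see the linked repository for ULAE-net itself):

```python
import torch
import torch.nn.functional as F

def warp_image(moving, flow):
    """Warp a 3D moving image with a dense displacement field.

    moving: (B, C, D, H, W) image tensor
    flow:   (B, 3, D, H, W) displacement in voxel units
    """
    B, _, D, H, W = moving.shape
    # Identity sampling grid in voxel coordinates, ordered (z, y, x)
    zz, yy, xx = torch.meshgrid(
        torch.arange(D, device=moving.device),
        torch.arange(H, device=moving.device),
        torch.arange(W, device=moving.device), indexing="ij")
    grid = torch.stack((zz, yy, xx)).float().unsqueeze(0)    # (1, 3, D, H, W)
    # Normalize displaced coordinates to [-1, 1], as grid_sample expects
    norm = torch.tensor([D - 1, H - 1, W - 1], dtype=torch.float32,
                        device=moving.device).view(1, 3, 1, 1, 1)
    coords = 2.0 * (grid + flow) / norm - 1.0
    # grid_sample wants shape (B, D, H, W, 3) with channels ordered (x, y, z)
    coords = coords.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]
    return F.grid_sample(moving, coords, align_corners=True)
```

Because the resampling is differentiable, an image-similarity loss on the warped output can train the displacement-predicting network end to end.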
Atlas-based Segmentation of Intracochlear Anatomy in Metal Artifact Affected CT Images of the Ear with Co-trained Deep Neural Networks

We propose an atlas-based method to segment the intracochlear anatomy (ICA) in the post-implantation CT (Post-CT) images of cochlear implant (CI) recipients that preserves the point-to-point correspondence between the meshes in the atlas and the segmented volumes. To solve this problem, which is challenging because of the strong artifacts produced by the implant, we use a pair of co-trained deep networks that generate dense deformation fields (DDFs) in opposite directions. One network is tasked with registering an atlas image to the Post-CT images and the other with registering the Post-CT images to the atlas image. The networks are trained using loss functions based on voxel-wise labels, image content, fiducial registration error, and a cycle-consistency constraint. The segmentation of the ICA in the Post-CT images is subsequently obtained by transferring the predefined segmentation meshes of the ICA in the atlas image to the Post-CT images using the corresponding DDFs generated by the trained registration networks. Our model can learn the underlying geometric features of the ICA even though they are obscured by the metal artifacts. We show that our end-to-end network produces results comparable to the current state of the art (SOTA), which relies on a two-step approach that first uses conditional generative adversarial networks to synthesize artifact-free images from the Post-CT images and then uses an active shape model-based method to segment the ICA in the synthetic images. Our method requires a fraction of the time needed by the SOTA, which is important for end-user acceptance.

Jianing Wang, Dingjie Su, Yubo Fan, Srijata Chakravorti, Jack H. Noble, Benoit M. Dawant
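One ingredient of the training objective, the cycle-consistency constraint, can be sketched compactly: composing the forward and backward dense deformation fields should approximate the identity map. A hedged PyTorch illustration (function names are ours, not the paper's; `warp` is any differentiable resampler such as the one sketched above):

```python
def cycle_consistency_loss(flow_ab, flow_ba, warp):
    """Penalize the composition of forward (a->b) and backward (b->a)
    displacement fields for deviating from the identity (zero flow).

    flow_ab, flow_ba: (B, 3, D, H, W) displacement fields
    warp: differentiable resampler, warp(field, flow) -> warped field
    """
    # phi_ba o phi_ab: resample the backward field by the forward one, then add
    residual = flow_ab + warp(flow_ba, flow_ab)
    return (residual ** 2).mean()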
Learning Unsupervised Parameter-Specific Affine Transformation for Medical Images Registration

Affine registration has recently been formulated using deep learning frameworks to establish spatial correspondences between different images. In this work, we propose a new unsupervised model that investigates two new strategies to tackle fundamental problems in affine registration. Specifically, the new model 1) explicitly learns specific geometric transformation parameters (e.g. translations, rotation, scaling and shearing); and 2) effectively understands the context between the images via cross-stitch units allowing feature exchange. The proposed model is evaluated on two two-dimensional X-ray datasets and a three-dimensional CT dataset. Our experimental results show that our model not only outperforms state-of-the-art approaches but also predicts specific transformation parameters. Our core source code is made available online ( https://github.com/xuuuuuuchen/PASTA ).

Xu Chen, Yanda Meng, Yitian Zhao, Rachel Williams, Srinivasa R. Vallabhaneni, Yalin Zheng
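The first strategy, predicting interpretable transformation parameters rather than a raw matrix, amounts to composing the affine from its factors. A minimal 2D sketch under our own naming (an assumption-laden illustration, not the PASTA code):

```python
import torch

def affine_matrix_2d(params):
    """Compose a 2x3 affine matrix from six explicit, interpretable
    parameters predicted by a network: [tx, ty, theta, sx, sy, shear]."""
    tx, ty, theta, sx, sy, sh = params
    R = torch.stack([torch.stack([torch.cos(theta), -torch.sin(theta)]),
                     torch.stack([torch.sin(theta),  torch.cos(theta)])])
    S = torch.diag(torch.stack([sx, sy]))                    # anisotropic scaling
    H = torch.stack([torch.stack([torch.ones_like(sh), sh]),
                     torch.stack([torch.zeros_like(sh), torch.ones_like(sh)])])
    A = R @ H @ S                                            # rotation o shear o scale
    t = torch.stack([tx, ty]).unsqueeze(1)                   # translation column
    return torch.cat([A, t], dim=1)                          # (2, 3)
```

The resulting 2 × 3 matrix (batched) can be passed to `torch.nn.functional.affine_grid` to resample the moving image, keeping every factor individually learnable and inspectable.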
Conditional Deformable Image Registration with Convolutional Neural Network

Recent deep learning-based methods have shown promising results and runtime advantages in deformable image registration. However, analyzing the effects of hyperparameters and searching for optimal regularization parameters is prohibitive in deep learning-based methods, because it involves training a substantial number of separate models with distinct hyperparameter values. In this paper, we propose a conditional image registration method and a new self-supervised learning paradigm for deep deformable image registration. By learning conditional features that are correlated with the regularization hyperparameter, we demonstrate that optimal solutions with arbitrary hyperparameters can be captured by a single deep convolutional neural network. In addition, the smoothness of the resulting deformation field can be manipulated with arbitrary strength of smoothness regularization during inference. Extensive experiments on a large-scale brain MRI dataset show that our proposed method enables precise control of the smoothness of the deformation field without sacrificing the runtime advantage or registration accuracy.

Tony C. W. Mok, Albert C. S. Chung
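The core idea, conditioning a single network on the regularization weight so that one model covers the whole hyperparameter range, can be sketched as a schematic training step. All interfaces below (`model`, `warp`, `grad_penalty`) are assumed placeholders, not the authors' implementation:

```python
import torch

def conditional_training_step(model, warp, grad_penalty, moving, fixed):
    """Sample a smoothness weight per step, condition the network on it,
    and use the same weight in the loss; at inference, any desired
    regularization strength can simply be fed to the model."""
    lam = torch.rand(()).item()                    # regularization weight in [0, 1)
    flow = model(moving, fixed, lam)               # network conditioned on lam
    warped = warp(moving, flow)
    similarity = ((warped - fixed) ** 2).mean()    # MSE here; NCC is common too
    loss = similarity + lam * grad_penalty(flow)   # smoothness term scaled by lam
    return loss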
A Deep Discontinuity-Preserving Image Registration Network

Image registration aims to establish spatial correspondence across pairs or groups of images, and is a cornerstone of medical image computing and computer-assisted interventions. Currently, most deep learning-based registration methods assume that the desired deformation fields are globally smooth and continuous, which is not always valid in real-world scenarios, especially in medical image registration (e.g. cardiac and abdominal imaging). Such a global constraint can lead to artefacts and increased errors at discontinuous tissue interfaces. To tackle this issue, we propose a weakly-supervised Deep Discontinuity-preserving Image Registration network (DDIR) to obtain better registration performance and realistic deformation fields. In registration experiments on cardiac magnetic resonance (MR) images from the UK Biobank Imaging Study (UKBB), we demonstrate that our method achieves significant improvements in registration accuracy and predicts more realistic deformations than state-of-the-art approaches.

Xiang Chen, Yan Xia, Nishant Ravikumar, Alejandro F. Frangi
End-to-end Ultrasound Frame to Volume Registration

Fusing an intra-operative 2D transrectal ultrasound (TRUS) image with a pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such a multimodal 2D/3D registration problem is very challenging due to several significant obstacles, such as dimensional mismatch, large differences in modal appearance, and heavy computational load. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking. The proposed FVR-Net utilizes a dual-branch feature extraction module to extract information from the TRUS frame and volume to estimate transformation parameters. To achieve efficient training and inference, we introduce a differentiable 2D slice sampling module which allows gradients to backpropagate from an unsupervised image similarity loss for content correspondence learning. Our experiments demonstrate the proposed method's superior efficiency for real-time interventional guidance with highly competitive registration accuracy. Source code of this work is publicly available at https://github.com/DIAL-RPI/FVR-Net .

Hengtao Guo, Xuanang Xu, Sheng Xu, Bradford J. Wood, Pingkun Yan
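The differentiable 2D slice sampling module at the heart of this design can be approximated in a few lines: extract one slice of a 3D volume at a pose given by a rigid transform, keeping the operation differentiable so an image-similarity loss can drive the transform estimate. A sketch with assumed shapes (not the FVR-Net source, which is linked above):

```python
import torch
import torch.nn.functional as F

def sample_slice(volume, theta):
    """Differentiably extract a 2D slice from a 3D volume at a given pose.

    volume: (B, C, D, H, W) 3D volume
    theta:  (B, 3, 4) affine/rigid transform mapping normalized slice-plane
            coordinates into normalized volume coordinates
    Returns a (B, C, 1, H, W) slice; gradients flow back into `theta`.
    """
    B, C, D, H, W = volume.shape
    grid = F.affine_grid(theta, size=(B, C, 1, H, W), align_corners=True)
    return F.grid_sample(volume, grid, align_corners=True)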
Cross-Modal Attention for MRI and Ultrasound Volume Registration

Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In the past few years, convolutional neural networks (CNNs) have proved powerful in extracting image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand the spatial correspondence between features, a task in which the self-attention mechanism excels. This paper aims to develop a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block effectively maps each feature in one volume to all features in the corresponding volume. Our experimental results demonstrate that a CNN designed with the cross-modal attention block embedded outperforms an advanced CNN ten times its size. We also incorporated visualization techniques to improve the interpretability of our network. The source code of our work is available at https://github.com/DIAL-RPI/Attention-Reg .

Xinrui Song, Hengtao Guo, Xuanang Xu, Hanqing Chao, Sheng Xu, Baris Turkbey, Bradford J. Wood, Ge Wang, Pingkun Yan
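A cross-modal attention block of the kind described, where every feature in one volume attends to all features of the other, might look as follows in PyTorch. This is our simplified sketch; the authors' actual block is in the linked repository. Note the quadratic cost in the number of spatial positions, which is why such blocks are applied to downsampled feature maps:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Each position of feat_a queries all positions of feat_b."""

    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv3d(channels, channels, 1)   # query from modality A
        self.k = nn.Conv3d(channels, channels, 1)   # key from modality B
        self.v = nn.Conv3d(channels, channels, 1)   # value from modality B

    def forward(self, feat_a, feat_b):
        B, C = feat_a.shape[:2]
        q = self.q(feat_a).flatten(2).transpose(1, 2)   # (B, N, C)
        k = self.k(feat_b).flatten(2)                   # (B, C, N)
        v = self.v(feat_b).flatten(2).transpose(1, 2)   # (B, N, C)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)  # (B, N, N)
        out = (attn @ v).transpose(1, 2).reshape(feat_a.shape)
        return out + feat_a                             # residual connection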
Bayesian Atlas Building with Hierarchical Priors for Subject-Specific Regularization

This paper presents a novel hierarchical Bayesian model for unbiased atlas building with subject-specific regularization of image registration. We develop an atlas construction process that automatically selects parameters to control the smoothness of the diffeomorphic transformation according to individual image data. To achieve this, we introduce a hierarchical prior distribution on regularization parameters that allows multiple penalties on images with various degrees of geometric transformation. We then treat the regularization parameters as latent variables and integrate them out of the model using the Monte Carlo Expectation Maximization (MCEM) algorithm. Another advantage of our algorithm is that it eliminates the need for manual parameter tuning, which can be tedious and often infeasible. We demonstrate the effectiveness of our model on 3D brain MR images. Experimental results show that our model provides a sharper atlas than current atlas building algorithms with single-penalty regularization. Our code is publicly available at https://github.com/jw4hv/HierarchicalBayesianAtlasBuild .

Jian Wang, Miaomiao Zhang
SAME: Deformable Image Registration Based on Self-supervised Anatomical Embeddings

In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. It builds on a recent algorithm, self-supervised anatomical embedding (SAM), which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method, named SAM-enhanced registration (SAME), breaks image registration down into three steps: affine transformation, coarse deformation, and deep deformable registration. Using SAM embeddings, we enhance these steps by finding more coherent correspondences and providing features and a loss function with better semantic guidance. We collect a multi-phase chest computed tomography dataset with 35 annotated organs for each patient and conduct inter-subject registration for quantitative evaluation. Results show that SAME outperforms widely-used traditional registration techniques (Elastix FFD, ANTs SyN) and the learning-based VoxelMorph method by at least 4.7% and 2.7% in Dice scores for the two separate tasks of within-contrast-phase and across-contrast-phase registration, respectively. SAME achieves performance comparable to the best traditional registration method, DEEDS (from our evaluation), while being orders of magnitude faster (from 45 s to 1.2 s).

Fengze Liu, Ke Yan, Adam P. Harrison, Dazhou Guo, Le Lu, Alan L. Yuille, Lingyun Huang, Guotong Xie, Jing Xiao, Xianghua Ye, Dakai Jin
Weakly Supervised Registration of Prostate MRI and Histopathology Images

The interpretation of prostate MRI suffers from low agreement across radiologists due to the subtle differences between cancer and normal tissue. Image registration addresses this issue by accurately mapping the ground-truth cancer labels from surgical histopathology images onto MRI. Cancer labels obtained by image registration can be used to improve radiologists' interpretation of MRI by training deep learning models for early detection of prostate cancer. A major limitation of current automated registration approaches is that they require manual prostate segmentations, a time-consuming and error-prone task. This paper presents a weakly supervised approach for affine and deformable registration of MRI and histopathology images without requiring prostate segmentations. We used manual prostate segmentations and mono-modal synthetic image pairs to train our registration networks to align prostate boundaries and local prostate features. Although prostate segmentations were used during training, they were not needed when registering unseen images at inference time. We trained and validated our registration network with 135 and 10 patients from an internal cohort, respectively. We tested the performance of our method using 16 patients from the internal cohort and 22 patients from an external cohort. The results show that our weakly supervised method achieves significantly higher registration accuracy than a state-of-the-art method run without prostate segmentations. Our deep learning framework will ease the registration of MRI and histopathology images by obviating the need for prostate segmentations.

Wei Shao, Indrani Bhattacharya, Simon J. C. Soerensen, Christian A. Kunder, Jeffrey B. Wang, Richard E. Fan, Pejman Ghanouni, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu
4D-CBCT Registration with a FBCT-derived Plug-and-Play Feasibility Regularizer

Deformable registration of phase-resolved lung images is an important procedure for appreciating respiratory motion and enhancing image quality. Compared to high-resolution fan-beam CTs (FBCTs), cone-beam CTs (CBCTs) are more readily available for on-table acquisition in companion with treatment. However, CBCT registration is challenging because the classic regularization energies in conventional methods usually cannot overcome the strong artifacts and the lack of structural detail. In this study, we propose to learn an implicit feasibility prior of respiratory motion and incorporate it in a plug-and-play (PnP) fashion into the training of an unsupervised image registration network, to improve registration accuracy and robustness to noise and artifacts. In particular, we propose a novel approach to develop a feasibility descriptor from a set of deformation vector fields (DVFs) generated from FBCTs. This FBCT-derived feasibility descriptor is then used as a spatially variant regularizer on the DVF Jacobian during unsupervised training for 4D-CBCT registration. In doing so, the higher-quality, higher-confidence information from FBCT is transferred to the much more challenging problem of CBCT registration, without explicit FB-CB synthesis. The method was evaluated using manually identified landmarks on real CBCTs and automatically detected landmarks on simulated CBCTs. It showed good robustness to noise and artifacts and generated physically more feasible DVFs. The target registration errors on the real and simulated data were (1.63 ± 0.98) and (2.16 ± 1.91) mm, respectively, significantly better than the classic bending energy regularization in both the conventional method in SimpleElastix and the unsupervised network. The average registration time was 0.04 s.

Yudi Sang, Dan Ruan
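The regularizer acts on the Jacobian of the deformation vector field; the determinant of that Jacobian is the standard local feasibility signal (values ≤ 0 indicate folding). A generic sketch of its computation, separate from the paper's learned, spatially variant weighting:

```python
import torch

def jacobian_determinant(dvf):
    """Jacobian determinant of phi(x) = x + u(x) for a dense 3D DVF.

    dvf: (3, D, H, W) float displacement u in voxel units.
    Returns (D, H, W); values <= 0 mark folding (infeasible motion).
    """
    grads = torch.gradient(dvf, dim=(1, 2, 3))      # (du/dz, du/dy, du/dx)
    J = torch.stack(grads, dim=-1)                  # [i, z, y, x, j] = du_i/dx_j
    J = J.permute(1, 2, 3, 0, 4) + torch.eye(3)     # add identity: I + grad(u)
    return torch.linalg.det(J)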
Unsupervised Diffeomorphic Surface Registration and Non-linear Modelling

Registration is an essential tool in image analysis. Deep learning based alternatives have recently become popular, achieving competitive performance at a faster speed. However, many contemporary techniques are limited to volumetric representations, despite increased popularity of 3D surface and shape data in medical image analysis. We propose a one-step registration model for 3D surfaces that internalises a lower dimensional probabilistic deformation model (PDM) using conditional variational autoencoders (CVAE). The deformations are constrained to be diffeomorphic using an exponentiation layer. The one-step registration model is benchmarked against iterative techniques, trading in a slightly lower performance in terms of shape fit for a higher compactness. We experiment with two distance metrics, Chamfer distance (CD) and Sinkhorn divergence (SD), as specific distance functions for surface data in real-world registration scenarios. The internalised deformation model is benchmarked against linear principal component analysis (PCA) achieving competitive results and improved generalisability from lower dimensions.

Balder Croquet, Daan Christiaens, Seth M. Weinberg, Michael Bronstein, Dirk Vandermeulen, Peter Claes
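Of the two distance metrics compared, the Chamfer distance is the simpler and reduces to a few lines (a generic textbook definition, not the authors' exact formulation):

```python
import torch

def chamfer_distance(x, y):
    """Symmetric Chamfer distance between two point sets.

    x: (N, 3) and y: (M, 3) surface sample points."""
    d = torch.cdist(x, y)                         # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```

The Sinkhorn divergence, by contrast, requires entropic-regularized optimal transport and is typically computed with a dedicated library such as GeomLoss.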
Learning Dual Transformer Network for Diffeomorphic Registration

Diffeomorphic registration, with its invertible and one-to-one mapping between images, is widely used in medical image processing. Recent progress in diffeomorphic registration utilizes convolutional neural networks for efficient, end-to-end inference of registration fields from an image pair. However, existing deep learning-based registration models neglect to employ attention mechanisms to handle long-range cross-image relevance in embedding learning, limiting the ability of such approaches to identify semantically meaningful correspondence of anatomical structures. In this paper, we propose a novel dual transformer network (DTN) for diffeomorphic registration, consisting of a learnable volumetric embedding module, a dual cross-image relevance learning module for feature enhancement, and a registration field inference module. The self-attention mechanisms of DTN explicitly model both inter- and intra-image relevance in the embedding from both the separate and the concatenated volumetric images, facilitating semantic correspondence of anatomical structures in diffeomorphic registration. Extensive quantitative and qualitative evaluations demonstrate that the DTN performs favorably against state-of-the-art methods.

Yungeng Zhang, Yuru Pei, Hongbin Zha
Construction of Longitudinally Consistent 4D Infant Cerebellum Atlases Based on Deep Learning

Longitudinal infant-dedicated cerebellum atlases play a fundamental role in characterizing and understanding the dynamic development of the cerebellum during infancy. However, due to the limited spatial resolution, low tissue contrast, tiny folding structures, and rapid growth of the cerebellum during this stage, it is challenging to build such atlases while preserving clear folding details. Furthermore, existing atlas construction methods typically build discrete atlases independently from samples for each age group, without considering within-subject temporal consistency, which is critical for large-scale longitudinal studies. To fill this gap, we propose an age-conditional multi-stage learning framework to construct longitudinally consistent 4D infant cerebellum atlases. Specifically, 1) a joint affine and deformable atlas construction framework is proposed to accurately build temporally continuous atlases based on the entire cohort and to rapidly warp new images to the atlas space; 2) a longitudinal constraint is employed to enforce within-subject temporal consistency during atlas building; and 3) a Correntropy-based regularization loss is further exploited to enhance the robustness of our framework. Our atlases are constructed from 405 longitudinal scans of 187 healthy infants aged from 6 to 27 months, and are compared to atlases built by state-of-the-art algorithms. Results demonstrate that our atlases preserve more structural details and fine-grained cerebellum folding patterns, ensuring higher accuracy in subsequent atlas-based registration and segmentation tasks.

Liangjun Chen, Zhengwang Wu, Dan Hu, Yuchen Pei, Fenqiang Zhao, Yue Sun, Ya Wang, Weili Lin, Li Wang, Gang Li, the UNC/UMN Baby Connectome Project Consortium
Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration

Deterministic approaches using iterative optimisation have been historically successful in diffeomorphic image registration (DiffIR). Although these approaches are highly accurate, they typically carry a significant computational burden. Recent developments in stochastic approaches based on deep learning have achieved sub-second runtimes for DiffIR with competitive registration accuracy, offering a fast alternative to conventional iterative methods. In this paper, we attempt to reduce this difference in speed whilst retaining the performance advantage of iterative approaches to DiffIR. We first propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields to handle large deformations in images whilst guaranteeing diffeomorphisms in the resultant deformation. We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields, and solve this model with a fast algorithm that combines Nesterov gradient descent and the alternating direction method of multipliers (ADMM). Finally, we leverage the computational power of GPUs to implement this accelerated ADMM solver on a 3D cardiac MRI dataset, further reducing runtime to less than 2 s. In addition to producing strictly diffeomorphic deformations, our methods outperform both state-of-the-art deep learning-based and iterative DiffIR approaches in terms of Dice and Hausdorff scores, with speed approaching the inference time of deep learning-based methods.

Alexander Thorley, Xi Jia, Hyung Jin Chang, Boyang Liu, Karina Bunting, Victoria Stoll, Antonio de Marvao, Declan P. O’Regan, Georgios Gkoutos, Dipak Kotecha, Jinming Duan
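The Nesterov half of the solver is plain accelerated gradient descent with a look-ahead point; schematically (a textbook form, leaving out the ADMM splitting and the smoothness term that the paper adds):

```python
def nesterov_step(v, v_prev, grad_fn, step_size, t):
    """One Nesterov-accelerated gradient update of a velocity field.

    v, v_prev: current and previous iterates (any tensor shape)
    grad_fn:   callable returning the data-term gradient at a point
    step_size: gradient step length; t: iteration counter (1, 2, ...)
    """
    momentum = (t - 1.0) / (t + 2.0)
    y = v + momentum * (v - v_prev)      # extrapolated look-ahead point
    v_next = y - step_size * grad_fn(y)  # gradient step taken at the look-ahead
    return v_next, v                     # new iterate and the one it replaces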
Spectral Embedding Approximation and Descriptor Learning for Craniofacial Volumetric Image Correspondence

Deformable image correspondence is crucial in various areas of medical image research. Existing deep learning-based registration and correspondence models mostly learn a nonlinear voxel-wise mapping function between volumetric images by metric space alignment in the spatial domain, without addressing intrinsic structure correspondence. Thus, the registration requires prior affine transformation or landmark annotations to handle high-frequency perturbations due to pose and structural variations. This paper presents a novel and efficient correspondence framework that uses low-dimensional spectral mapping to handle the intrinsic correspondence of anatomical structures. We devise a novel multi-path graph convolutional network (GCN)-based embedding approximation module, relieving the time complexity of eigendecomposition-based spectral embedding of volumetric images. We also present a descriptor learning module that removes the need for descriptor selection or hand-crafted descriptors. Experimental results demonstrate the efficacy of the core modules, i.e., image embedding approximation and descriptor learning, for volumetric image correspondence and the atlas-based registration of craniofacial anatomical structures. The proposed approach achieves correspondence accuracy comparable to state-of-the-art deep registration models while being resilient to pose and shape perturbations.

Diya Sun, Yungeng Zhang, Yuru Pei, Tianmin Xu, Hongbin Zha
A Deep Network for Joint Registration and Parcellation of Cortical Surfaces

Cortical surface registration and parcellation are two essential steps in neuroimaging analysis. Conventionally, they are performed independently as two tasks, ignoring the inherent connections between these two closely-related tasks. Essentially, both tasks rely on meaningful cortical feature representations, so they can be jointly optimized by learning shared useful cortical features. To this end, we propose a deep learning framework for joint cortical surface registration and parcellation. Specifically, our approach leverages the spherical topology of cortical surfaces and uses a spherical network as a shared encoder to first learn shared features for both tasks. We then train two task-specific decoders for registration and parcellation, respectively. We further exploit the more explicit connection between them by incorporating a novel parcellation map similarity loss to enforce boundary consistency of regions, thereby providing extra supervision for the registration task. Conversely, parcellation network training also benefits from the registration, which provides a large amount of augmented data by warping one surface with a manual parcellation map to another surface, especially when only a few manually-labeled surfaces are available. Experiments on a dataset with more than 600 cortical surfaces show that our approach achieves large improvements in both parcellation and registration accuracy (over separately trained networks) and enables training high-quality parcellation and registration models using far fewer labeled surfaces.

Fenqiang Zhao, Zhengwang Wu, Li Wang, Weili Lin, Shunren Xia, Gang Li, the UNC/UMN Baby Connectome Project Consortium
4D-Foot: A Fully Automated Pipeline of Four-Dimensional Analysis of the Foot Bones Using Bi-plane X-Ray Video and CT

We aim to elucidate the mechanism of the foot by automated measurement of the movement of its multiple bones using 2D-3D registration of bi-plane x-ray video and a stationary 3D CT. Conventional analyses allowed tracking of only the 3 large proximal tarsal bones due to the requirement of manual segmentation and manual initialization of the 2D-3D registration. Learning-based 2D-3D registration, on the other hand, has been actively studied and demonstrates a large capture range, but its accuracy is inferior to conventional optimization-based methods. We propose a fully automated pipeline using a cost function that seamlessly incorporates the reprojection error at landmarks detected in CT and x-ray by off-the-shelf CNNs into the conventional image similarity cost, combined with automated bone segmentation. We experimentally demonstrate that the pipeline allows a robust and accurate 2D-3D registration that tracks all 12 tarsal bones, including the metatarsals at the foot arch, which are especially important in foot biomechanics but have been unmeasurable with previous methods. We evaluated the proposed fully automated pipeline in studies using a bone phantom and real x-ray images of human subjects. The real image study showed a registration error of 0.38 ± 1.95 mm in translation and 0.38 ± 1.20° in rotation for the proximal tarsal bones.

Shuntaro Mizoe, Yoshito Otake, Takuma Miyamoto, Mazen Soufi, Satoko Nakao, Yasuhito Tanaka, Yoshinobu Sato
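The cost function combines a conventional intensity term with a landmark reprojection term; in schematic form (every callable and weight here is our placeholder, not the authors' implementation):

```python
def registration_cost(pose, render_drr, image_similarity, project,
                      landmarks_3d, landmarks_2d, weight):
    """Combined 2D-3D registration cost: image similarity between the
    measured x-ray and a DRR rendered at the current pose, plus the
    reprojection error at CNN-detected landmarks (names illustrative)."""
    drr = render_drr(pose)                        # simulated x-ray at this pose
    sim = image_similarity(drr)                   # intensity term, lower is better
    projected = project(landmarks_3d, pose)       # CT landmarks -> image plane
    reproj = ((projected - landmarks_2d) ** 2).sum(-1).mean()
    return sim + weight * reproj                  # seamless combination of both cues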
Equivariant Filters for Efficient Tracking in 3D Imaging

We demonstrate an object tracking method for 3D images with fixed computational cost and state-of-the-art performance. Previous methods predicted transformation parameters from convolutional layers. We instead propose an architecture that neither flattens convolutional features nor uses fully connected layers, but instead relies on equivariant filters to preserve transformations between inputs and outputs (e.g., rotations/translations of inputs rotate/translate outputs). The transformation is then derived in closed form from the outputs of the filters. This method is useful for applications requiring low latency, such as real-time tracking. We demonstrate our model on synthetically augmented adult brain MRI, as well as fetal brain MRI, which is the intended use-case.

Daniel Moyer, Esra Abaci Turk, P. Ellen Grant, William M. Wells, Polina Golland
Revisiting Iterative Highly Efficient Optimisation Schemes in Medical Image Registration

3D registration remains one of the big challenges in medical imaging, especially when dealing with highly deformed anatomical structures such as those encountered in inter- or intra-patient registration of abdominal scans. In a recent MICCAI registration challenge (Learn2Reg), deep learning-based network architectures with inference times of <2 s showed great success for supervised alignment tasks. However, in unsupervised settings deep learning methods have not yet outperformed their conventional algorithmic counterparts based on continuous iterative optimisation (and probably won't, as they share the same objective function (image metric)). This finding has brought us to revisit conventional optimisation schemes and investigate an iterative message passing approach that enables fast runtimes (using iterative optimisation with only a few displacement candidates) and high registration accuracy. We conduct experiments on three challenging abdominal datasets ((pre-aligned) inter-patient CT, intra-patient MR-CT) and carry out an in-depth evaluation with a set of selected comparison methods. Our results clearly indicate that optimisation based methods are highly competitive both in accuracy and runtime when compared to deep learning methods. Moreover, we show that semantic label information (when available) can be efficiently exploited by our approach (cf. weakly supervised learning). Data and code will be made publicly available to ensure reproducibility and accelerate research in the field of 3D medical registration ( https://github.com/lasseha/iter_lbp ).

Lasse Hansen, Mattias P. Heinrich
Multi-scale Neural ODEs for 3D Medical Image Registration

Image registration plays an important role in medical image analysis. Conventional optimization-based methods provide an accurate estimation thanks to their iterative process, at the cost of expensive computation. Deep learning methods such as learn-to-map are much faster, but an iterative or coarse-to-fine approach is still required to improve accuracy when handling large motions. In this work, we propose to learn a registration optimizer via a multi-scale neural ODE model. Inference consists of iterative gradient updates similar to those of a conventional gradient descent optimizer, but much faster, because the neural ODE learns from the training data to adapt the gradient efficiently at each iteration. Furthermore, we propose to learn a modal-independent similarity metric to address image appearance variations across different image contrasts. We performed extensive evaluations on multi-contrast 3D MR images from both public and private data sources and demonstrate the superior performance of our proposed methods.

Junshen Xu, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun

Image-Guided Interventions and Surgery

Frontmatter
Self-supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images

Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer-assisted surgery. Recent work has shown that depth estimation from a stereo image pair can be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to resolve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework's generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.

Baoru Huang, Jian-Qing Zheng, Anh Nguyen, David Tuch, Kunal Vyas, Stamatia Giannarou, Daniel S. Elson
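The photometric reprojection loss mentioned here is the standard self-supervision signal for stereo depth: warp one view into the other using the predicted disparity and compare intensities. A minimal sketch (L1 term only; SADepth additionally uses adversarial and multi-scale terms):

```python
import torch
import torch.nn.functional as F

def photometric_loss(left, right, disparity):
    """Reconstruct the left image by horizontally shifting the right image
    with the predicted disparity, then compare photometrically.

    left, right: (B, C, H, W); disparity: (B, 1, H, W) in pixels."""
    B, _, H, W = left.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=left.device),
                            torch.arange(W, device=left.device), indexing="ij")
    x_src = xs.unsqueeze(0) - disparity[:, 0]           # shift along the baseline
    grid = torch.stack([2 * x_src / (W - 1) - 1,        # normalized x coordinate
                        (2 * ys / (H - 1) - 1).expand_as(x_src)], dim=-1)
    reconstructed = F.grid_sample(right, grid, align_corners=True)
    return (reconstructed - left).abs().mean()          # L1; SSIM is often added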
Personalized Respiratory Motion Model Using Conditional Generative Networks for MR-Guided Radiotherapy

MRI-guided radiotherapy systems enable real-time 2D cine acquisitions for target monitoring, but cannot provide volumetric information due to spatio-temporal constraints. Hence, respiratory motion models coupled with a temporal predictive mechanism are a suitable solution to enable ahead-of-time 3D tumor and anatomy tracking in combination with real-time online plan adaptation. We propose a novel subject-specific probabilistic model to enable 3D+t predictions from image-based surrogates during radiotherapy treatments. The model is trained end-to-end to simultaneously capture and learn a distribution of realistic motion fields over a population dataset. Furthermore, the distribution is conditioned on a sequence of partial observations, which can be extrapolated in time using a seq2seq-inspired mechanism allowing for a scalable predictive horizon. Based on the generative properties of conditional variational autoencoders, it integrates anatomical features and temporal information to construct an interpretable latent space with respiratory phase discrimination. The choice of a probabilistic framework improves uncertainty estimation during the volume generation phase. Experimental validation on 25 subjects demonstrates the potential of the proposed model, which achieves a mean landmark error of 1.4 ± 1.1 mm, yielding statistically significant improvements over state-of-the-art methods.

Liset Vázquez Romaguera, Tal Mezheritsky, Samuel Kadoury
Multimodal Sensing Guidewire for C-Arm Navigation with Random UV Enhanced Optical Sensors Using Spatio-Temporal Networks

Percutaneous transluminal angioplasty (PTA) revascularization is a common minimally invasive treatment for occlusions in peripheral arteries, but its success in long occlusions is limited by technical challenges associated with crossing occluded vessels and lumen re-entry. Revascularization needs to be guided closely using ionizing imaging such as fluoroscopy, while intravascular guidewires lack the capability of characterizing physiological conditions near occlusions, such as blood flow. We propose a multimodal sensing framework to infer both three-dimensional shape and vascular flow from an optical fiber device using random optical gratings enhanced with ultraviolet exposure, yielding a fully-distributed strain sensor. A two-branch spatio-temporal neural network is proposed to process a generated optical signal trajectory from scattered wavelength distributions. A shape network is first used in combination with the pre-procedural 3D angiography image to track the 3D shape related to the backscattered wavelength shift, while a flow velocity network trained on 4D-MRI measurements extracts vascular flow. A final refinement adjusts the 3D-2D projection onto C-arm images, correcting for slight deviations of the sensed shape. Synthetic and porcine experiments were performed in a controlled environment, measuring the accuracy of the 3D shape tracking and flow measurements, with shape errors of 2.4 ± 0.9 mm and flow differences below 2 cm/s, demonstrating the ability to provide anatomical and physiological properties during vascular procedures.

Andrei Svecic, Gilles Soulez, Frédéric Monet, Raman Kashyap, Samuel Kadoury
Image-to-Graph Convolutional Network for Deformable Shape Reconstruction from a Single Projection Image

Shape reconstruction of deformable organs from two-dimensional X-ray images is a key technology for image-guided intervention. In this paper, we propose an image-to-graph convolutional network (IGCN) for deformable shape reconstruction from a single-viewpoint projection image. The IGCN learns the relationship between shape/deformation variability and deep image features based on a deformation mapping scheme. In experiments targeting the respiratory motion of abdominal organs, we confirmed that the proposed framework with a regularized loss function can reconstruct liver shapes from a single digitally reconstructed radiograph with a mean distance error of 3.6 mm.

Megumi Nakao, Fei Tong, Mitsuhiro Nakamura, Tetsuya Matsuda
Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation

Generating surgical reports, aimed at surgical scene understanding in robot-assisted surgery, can contribute to documenting entry tasks and post-operative analysis. Despite impressive outcomes, deep learning models degrade in performance when applied to different domains that exhibit domain shift. In addition, new instruments and variations in surgical tissue appear in robotic surgery. In this work, we propose class-incremental domain adaptation (CIDA) with a multi-layer transformer-based model to tackle the new classes and domain shift in the target domain when generating surgical reports during robotic surgery. To adapt to incremental classes and extract domain-invariant features, a class-incremental (CI) learning method with a supervised contrastive (SupCon) loss is incorporated with the feature extractor. To generate captions from the extracted features, curriculum by one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer transformer-based caption prediction model. CBS smooths the feature embedding using anti-aliasing and helps the model learn domain-invariant features. We also adopt label smoothing (LS) to calibrate prediction probability and obtain better feature representations with both the feature extractor and the captioning model. The proposed techniques are empirically evaluated using datasets from two surgical domains: nephrectomy operations and transoral robotic surgery. We observe that domain-invariant feature learning and a well-calibrated network improve surgical report generation performance in both the source and target domain, under domain shift and unseen classes, in one-shot and few-shot learning settings. The code is publicly available at https://github.com/XuMengyaAmy/CIDACaptioning .

Mengya Xu, Mobarakol Islam, Chwee Ming Lim, Hongliang Ren
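Label smoothing (LS) as used here is a small calibration trick: soften the one-hot targets so the model is not pushed toward overconfident predictions. An illustrative implementation:

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """Cross-entropy against smoothed targets: the true class gets
    probability 1 - eps plus its share of eps/K; the rest share eps/K.

    logits: (B, K); targets: (B,) integer class indices."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    smooth = -log_probs.mean(dim=-1)          # uniform-target cross-entropy
    return ((1 - eps) * nll + eps * smooth).mean()
```

Recent PyTorch versions expose the same behaviour directly via F.cross_entropy(logits, targets, label_smoothing=eps).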
Real-Time Rotated Convolutional Descriptor for Surgical Environments

Many descriptors usable in real time exist, tailored for indoor and outdoor tracking and mapping, with a small subset of these being learned descriptors. To enable the same in deformable surgical environments without ground truth data, we propose a Real-Time Rotated descriptor, ReTRo, that can be trained in a weakly-supervised manner using stereo images. We propose a novel network that creates these fast, high-quality descriptors, which can optionally be binary-valued. ReTRo is the first convolutional feature descriptor to learn a sampling pattern as part of the network, in addition to being the first real-time learned descriptor for surgery. ReTRo runs at multiple scales and has a large receptive field while only requiring small patches as input, affording it great speed. We evaluate ReTRo by using it for pose estimation and tissue tracking, demonstrating its efficacy and real-time speed. ReTRo outperforms classical descriptors used in surgery and will enable surgical tracking and mapping frameworks.

Adam Schmidt, Septimiu E. Salcudean
Surgical Instruction Generation with Transformers

Automatic surgical instruction generation is a prerequisite for intra-operative context-aware surgical assistance. However, generating instructions from surgical scenes is challenging, as it requires jointly understanding the surgical activity of the current view and modelling relationships between visual information and textual description. Inspired by neural machine translation and image captioning tasks in the open domain, we introduce a transformer-backboned encoder-decoder network with self-critical reinforcement learning to generate instructions from surgical images. We evaluate the effectiveness of our method on the DAISI dataset, which includes 290 procedures from various medical disciplines. Our approach outperforms the existing baseline on all caption evaluation metrics. The results demonstrate the benefits of the transformer-backboned encoder-decoder structure in handling multimodal context.

Jinglu Zhang, Yinyu Nie, Jian Chang, Jian Jun Zhang
Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation

Depth estimation from monocular images is an important task in localization and 3D reconstruction pipelines for bronchoscopic navigation. Various supervised and self-supervised deep learning-based approaches have proven themselves on this task for natural images. However, the lack of labeled data and the bronchial tissue’s feature-scarce texture make the utilization of these methods ineffective on bronchoscopic scenes. In this work, we propose an alternative domain-adaptive approach. Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner; then adopts an unsupervised adversarial domain feature adaptation scheme to improve the performance on real images. The results of our experiments show that the proposed method improves the network’s performance on real images by a considerable margin and can be employed in 3D reconstruction pipelines.

Mert Asim Karaoglu, Nikolas Brasch, Marijn Stollenga, Wolfgang Wein, Nassir Navab, Federico Tombari, Alexander Ladikos
2.5D Thermometry Maps for MRI-Guided Tumor Ablation

Fast and reliable monitoring of volumetric heat distribution during MRI-guided tumor ablation is an urgent clinical need. In this work, we introduce a method for generating 2.5D thermometry maps from uniformly distributed 2D MRI phase images rotated around the applicator's main axis. The images can be fetched directly from the MR device, reducing the delay between image acquisition and visualization. For reconstruction, we use a weighted interpolation on a cylindrical coordinate representation to calculate the heat value of voxels in a region of interest. A pilot study on 13 ex vivo bio protein phantoms with flexible tubes to simulate a heat sink effect was conducted to evaluate our method. After thermal ablation, we compared the measured coagulation zone extracted from the post-treatment MR data set with the output of the 2.5D thermometry map. The results show a mean Dice score of 0.75 ± 0.07, a sensitivity of 0.77 ± 0.03, and a reconstruction time of 18.02 ± 5.91 ms. Future steps should address improving temporal resolution and accuracy, e.g., by incorporating advanced bioheat transfer simulations.

Julian Alpers, Daniel L. Reimert, Maximilian Rötzer, Thomas Gerlach, Marcel Gutberlet, Frank Wacker, Bennet Hensen, Christian Hansen
Detection of Critical Structures in Laparoscopic Cholecystectomy Using Label Relaxation and Self-supervision

Laparoscopic cholecystectomy can be subject to complications such as bile duct injury, which can seriously harm the patient or even result in death. Computer-assisted interventions have the potential to prevent such complications by highlighting the critical structures (cystic duct and cystic artery) during surgery, helping the surgeon establish the Critical View of Safety and avoid structure misidentification. A method is presented to detect the critical structures using state-of-the-art computer vision techniques. The proposed label relaxation dramatically improves performance for segmenting critical structures, which have ambiguous extent and highly variable ground truth labels. We also demonstrate how pseudo-label self-supervision allows further detection improvement using unlabelled data. The system was trained using a dataset of 3,050 labelled and 3,682 unlabelled laparoscopic cholecystectomy frames. We achieved an IoU of 0.65 and a presence detection F1 score of 0.75. The model's outputs were further evaluated qualitatively by three expert surgeons, providing preliminary confirmation of our method's benefits. This work is among the first to perform detection of critical anatomy during laparoscopic cholecystectomy, and demonstrates the great promise of computer-assisted intervention to improve surgical safety and workflow.

David Owen, Maria Grammatikopoulou, Imanol Luengo, Danail Stoyanov
EMDQ-SLAM: Real-Time High-Resolution Reconstruction of Soft Tissue Surface from Stereo Laparoscopy Videos

We propose a novel stereo laparoscopy video-based non-rigid SLAM method called EMDQ-SLAM, which can incrementally reconstruct three-dimensional (3D) models of soft tissue surfaces in real time and preserve high-resolution color textures. EMDQ-SLAM uses the expectation maximization and dual quaternion (EMDQ) algorithm combined with SURF features to track the camera motion and estimate tissue deformation between video frames. To overcome the problem of accumulating errors over time, we have integrated a g2o-based graph optimization method that combines the EMDQ mismatch removal and as-rigid-as-possible (ARAP) smoothing methods. Finally, the multi-band blending (MBB) algorithm is used to obtain high-resolution color textures with real-time performance. Experimental results demonstrate that our method outperforms two state-of-the-art non-rigid SLAM methods, MISSLAM and DefSLAM. Quantitative evaluation shows an average error in the range of 0.8–2.2 mm for different cases.

Haoyin Zhou, Jagadeesan Jayender
Efficient Global-Local Memory for Real-Time Instrument Segmentation of Robotic Surgical Video

Performing real-time and accurate instrument segmentation from videos is of great significance for improving the performance of robot-assisted surgery. We identify two important clues for surgical instrument perception: local temporal dependency from adjacent frames and global semantic correlation over long-range duration. However, most existing works perform segmentation purely using visual cues from a single frame, and optical flow, when used, only models the motion between two frames and brings heavy computational cost. We propose a novel dual-memory network (DMNet) to relate both global and local spatio-temporal knowledge to augment the current features, boosting segmentation performance while retaining real-time prediction capability. On the one hand, we propose an efficient local memory that takes complementary advantage of convolutional LSTM and non-local mechanisms with respect to the receptive field. On the other hand, we develop an active global memory that gathers global semantic correlation over a long temporal range into the current frame, selecting the most informative frames based on model uncertainty and frame similarity. We have extensively validated our method on two public benchmark surgical video datasets. Experimental results demonstrate that our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.

Jiacheng Wang, Yueming Jin, Liansheng Wang, Shuntian Cai, Pheng-Ann Heng, Jing Qin
C-Arm Positioning for Spinal Standard Projections in Different Intra-operative Settings

Trauma and orthopedic surgeries that involve fluoroscopic guidance crucially depend on the acquisition of correct anatomy-specific standard projections for monitoring and evaluating the surgical result. This implies repeated acquisitions or even continuous fluoroscopy. To reduce radiation exposure and time, we propose to automate this procedure and estimate the C-arm pose update directly from a first X-ray without the need for a pre-operative computed tomography scan (CT) or additional technical equipment. Our method is trained on digitally reconstructed radiographs (DRRs) which uniquely provide ground truth labels for arbitrarily many training examples. The simulated images are complemented with automatically generated segmentations, landmarks, as well as a k-wire and screw simulation. To successfully achieve a transfer from simulated to real X-rays, and also to increase the interpretability of results, the pipeline was designed by closely reflecting on the actual clinical decision-making of spinal neurosurgeons. It explicitly incorporates steps like region-of-interest (ROI) localization, detection of relevant and view-independent landmarks, and subsequent pose regression. To validate the method on real X-rays, we performed a large specimen study with and without implants (i.e. k-wires and screws). The proposed procedure obtained superior C-arm positioning accuracy (Wilcoxon p ≪ 0.01), robustness, and generalization capabilities compared to the state-of-the-art direct pose regression framework.

Lisa Kausch, Sarina Thomas, Holger Kunze, Tobias Norajitra, André Klein, Jan Siad El Barbari, Maxim Privalov, Sven Vetter, Andreas Mahnken, Lena Maier-Hein, Klaus H. Maier-Hein
Quantitative Assessments for Ultrasound Probe Calibration

Ultrasound probe calibration remains an area of active research, but the science of validation has not received proportional attention in the current literature. In this paper, we propose a framework to improve, assess, and visualize the quality of probe calibration. The basis of our framework is a heteroscedastic fiducial localization error (FLE) model that is physically quantifiable, used to i) derive an optimal calibration transform in the presence of heteroscedastic FLE, ii) assess the quality of a particular instance of probe calibration using a registration circuit, and iii) visualize the distribution of target registration error (TRE). The novelty of our work is the extension of the registration circuit to Procrustean point-line registration, and a demonstration that it produces a quantitative metric that correlates with true TRE. By treating ultrasound calibration as a heteroscedastic errors-in-variables regression instead of a least-squares regression, a more accurate calibration can be consistently obtained. Our framework has direct implications for many calibration techniques using point- and line-based calibration phantoms.

Elvis C. S. Chen, Burton Ma, Terry M. Peters
Intra-operative Update of Boundary Conditions for Patient-Specific Surgical Simulation

Patient-specific Biomechanical Models (PBMs) can enhance computer-assisted surgical procedures with critical information. Although pre-operative data allow such PBMs to be parametrized according to each patient's properties, they cannot fully characterize them. In particular, simulation boundary conditions cannot be determined from pre-operative modalities, yet their correct definition is essential to improve the PBM's predictive capability. In this work, we introduce a pipeline that provides an up-to-date estimate of boundary conditions, starting from the pre-operative model of patient anatomy and the displacement undergone by points visible from an intra-operative vision sensor. The presented pipeline is experimentally validated in realistic conditions on an ex vivo pararenal fat tissue manipulation. We demonstrate its capability to update a PBM with clinically acceptable performance, in terms of both accuracy and intra-operative time constraints.

Eleonora Tagliabue, Marco Piccinelli, Diego Dall’Alba, Juan Verde, Micha Pfeiffer, Riccardo Marin, Stefanie Speidel, Paolo Fiorini, Stéphane Cotin
Deep Iterative 2D/3D Registration

Deep Learning-based 2D/3D registration methods are highly robust but often lack the necessary registration accuracy for clinical application. A refinement step using the classical optimization-based 2D/3D registration method applied in combination with Deep Learning-based techniques can provide the required accuracy. However, it also increases the runtime. In this work, we propose a novel Deep Learning driven 2D/3D registration framework that can be used end-to-end for iterative registration tasks without relying on any further refinement step. We accomplish this by learning the update step of the 2D/3D registration framework using Point-to-Plane Correspondences. The update step is learned using iterative residual refinement-based optical flow estimation, in combination with the Point-to-Plane correspondence solver embedded as a known operator. Our proposed method achieves an average runtime of around 8 s, a mean re-projection distance error of 0.60 ± 0.40 mm with a success ratio of 97% and a capture range of 60 mm. The combination of high registration accuracy, high robustness, and fast runtime makes our solution ideal for clinical applications.

Srikrishna Jaganathan, Jian Wang, Anja Borsdorf, Karthik Shetty, Andreas Maier
hSDB-instrument: Instrument Localization Database for Laparoscopic and Robotic Surgeries

Automated surgical instrument localization is an important technology for understanding and analyzing the surgical process, so as to provide the surgeon with meaningful guidance during surgery or surgical indices after surgery. We introduce a new dataset that reflects the kinematic characteristics of surgical instruments for automated surgical instrument localization in surgical videos. The hSDB (hutom Surgery DataBase)-instrument dataset consists of instrument localization information from 24 cases of laparoscopic cholecystectomy and 24 cases of robotic gastrectomy. Localization information for all instruments is provided in the form of bounding boxes for object detection. To handle the class imbalance problem between instruments, synthetic instruments modeled in Unity are included as training data. In addition, polygon annotations are provided for the 3D instrument data to enable instance segmentation of the tools. To reflect the kinematic characteristics of all instruments, laparoscopic instruments are annotated with head and body parts, and robotic instruments with head, wrist, and body parts. Annotation data for assistive tools (specimen bag, needle, etc.) that are frequently used in surgery are also included. Moreover, we provide statistical information on the hSDB-instrument dataset and the baseline localization performances of object detection networks trained with the MMDetection library, together with the resulting analyses (the dataset, additional dataset statistics and several trained models are publicly available at https://hsdb-instrument.github.io/ ).

Jihun Yoon, Jiwon Lee, Sunghwan Heo, Hayeong Yu, Jayeon Lim, Chi Hyun Song, SeulGi Hong, Seungbum Hong, Bokyung Park, SungHyun Park, Woo Jin Hyung, Min-Kook Choi
Co-generation and Segmentation for Generalized Surgical Instrument Segmentation on Unlabelled Data

Surgical instrument segmentation for robot-assisted surgery is needed for accurate instrument tracking and augmented reality overlays, and the topic has therefore been the subject of a number of recent papers in the CAI community. Deep learning-based methods have shown state-of-the-art performance for surgical instrument segmentation, but their results depend on labelled data. However, labelled surgical data is of limited availability and is a bottleneck in the surgical translation of these methods. In this paper, we demonstrate the limited generalizability of these methods on different datasets, including robot-assisted surgeries on human subjects. We then propose a novel joint generation and segmentation strategy to learn a segmentation model with better generalization capability to domains that have no labelled data. The method leverages the availability of labelled data in a different domain. The generator performs domain translation from the labelled domain to the unlabelled domain; simultaneously, the segmentation model learns from the generated data while regularizing the generative model. We compared our method with state-of-the-art methods and showed its generalizability on publicly available datasets and on our own recorded video frames from robot-assisted prostatectomies. Our method shows consistently high mean Dice scores on both labelled and unlabelled domains when data is available for only one of the domains.

Megha Kalia, Tajwar Abrar Aleef, Nassir Navab, Peter Black, Septimiu E. Salcudean
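
To make the joint strategy above concrete, here is a minimal, hedged sketch of one training step: a generator translates a labelled-domain image into the unlabelled domain, and the segmenter trains on the translated image with the original mask, so its gradient also regularizes the generator. The toy networks and the absence of adversarial and cycle terms are simplifications, not the authors' implementation.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))    # toy domain translator
S = nn.Sequential(nn.Conv2d(3, 2, 3, padding=1))    # toy 2-class segmenter
opt = torch.optim.Adam(list(G.parameters()) + list(S.parameters()), lr=1e-4)
seg_loss = nn.CrossEntropyLoss()

x_labelled = torch.rand(2, 3, 64, 64)               # labelled-domain images
y = torch.randint(0, 2, (2, 64, 64))                # their instrument masks

fake_target = G(x_labelled)                         # translate to unlabelled domain
loss = seg_loss(S(fake_target), y)                  # labels survive the translation
# a full model adds adversarial losses on real unlabelled frames here
loss.backward()                                     # gradients reach both S and G
opt.step()
```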

Surgical Data Science

Frontmatter
E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-Based Stereoscopic Depth Perception

Reconstructing the scene of robotic surgery from stereo endoscopic video is an important and promising topic in surgical data science, with the potential to support applications such as surgical visual perception, robotic surgery education and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy, assuming no tissue deformation, no tool occlusion and de-occlusion, and no camera movement. Yet these assumptions are not always satisfied in minimally invasive robotic surgery. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception module for efficient depth estimation and a light-weight tool segmentor to handle tool occlusion. A dynamic reconstruction algorithm, which estimates the tissue deformation and camera movement and aggregates the information over time, is then proposed for surgical scene reconstruction. We evaluate the proposed pipeline on two datasets, the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover scenes obstructed by the surgical tool and handle camera movement in realistic surgical scenarios effectively and at real-time speed.

Yonghao Long, Zhaoshuo Li, Chi Hang Yee, Chi Fai Ng, Russell H. Taylor, Mathias Unberath, Qi Dou
CataNet: Predicting Remaining Cataract Surgery Duration

Cataract surgery is a sight-saving procedure performed over 10 million times each year around the world. With such large demand, the ability to organize surgical wards and operating rooms efficiently is critical to delivering this therapy in routine clinical care. In this context, estimating the remaining surgical duration (RSD) during procedures is one way to help streamline patient throughput and workflows. To this end, we propose CataNet, a method for cataract surgeries that predicts the RSD in real time jointly with two influential elements: the surgeon's experience and the current phase of the surgery. We compare CataNet to state-of-the-art RSD estimation methods, showing that it outperforms them even when phase and experience are not considered. We investigate this improvement and show that a significant contributor is the way we integrate the elapsed time into CataNet's feature extractor.

Andrés Marafioti, Michel Hayoz, Mathias Gallardo, Pablo Márquez Neila, Sebastian Wolf, Martin Zinkernagel, Raphael Sznitman
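
The closing remark about feeding elapsed time into the feature extractor suggests a simple pattern, sketched below under assumed dimensions: per-frame CNN features are concatenated with the elapsed time before a recurrent layer regresses the RSD. `RSDModel` and all layer sizes are hypothetical, not the published CataNet architecture.

```python
import torch
import torch.nn as nn

class RSDModel(nn.Module):
    """Minimal sketch: fuse elapsed time with per-frame CNN features,
    then a GRU regresses the remaining surgical duration (RSD)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, feat_dim))
        self.rnn = nn.GRU(feat_dim + 1, 32, batch_first=True)
        self.rsd_head = nn.Linear(32, 1)

    def forward(self, frames, elapsed_min):
        # frames: (B, T, 3, H, W); elapsed_min: (B, T) elapsed time per frame
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        feats = torch.cat([feats, elapsed_min.unsqueeze(-1)], dim=-1)
        out, _ = self.rnn(feats)
        return self.rsd_head(out).squeeze(-1)      # RSD estimate per time step

model = RSDModel()
rsd = model(torch.rand(2, 8, 3, 64, 64), torch.arange(8).float().repeat(2, 1))
```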
Task Fingerprinting for Meta Learning in Biomedical Image Analysis

Shortage of annotated data is one of the greatest bottlenecks in biomedical image analysis. Meta learning studies how learning systems can increase in efficiency through experience and could thus evolve as an important concept to overcome data sparsity. However, the core capability of meta learning-based approaches is the identification of similar previous tasks given a new task - a challenge largely unexplored in the biomedical imaging domain. In this paper, we address the problem of quantifying task similarity with a concept that we refer to as task fingerprinting. The concept involves converting a given task, represented by imaging data and corresponding labels, to a fixed-length vector representation. In fingerprint space, different tasks can be directly compared irrespective of their data set sizes, types of labels or specific resolutions. An initial feasibility study in the field of surgical data science (SDS) with 26 classification tasks from various medical and non-medical domains suggests that task fingerprinting could be leveraged for both (1) selecting appropriate data sets for pretraining and (2) selecting appropriate architectures for a new task. Task fingerprinting could thus become an important tool for meta learning in SDS and other fields of biomedical image analysis.

Patrick Godau, Lena Maier-Hein
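
The core operation described above, comparing tasks in a shared fingerprint space, reduces to nearest-neighbor search over fixed-length vectors. A minimal sketch, with a made-up fingerprint library and cosine similarity as an assumed metric:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_source_tasks(new_fp, library):
    """Rank previously seen tasks by fingerprint similarity to a new task.
    `library` maps task name -> fixed-length fingerprint vector (toy data)."""
    return sorted(library, key=lambda name: cosine_sim(new_fp, library[name]),
                  reverse=True)

library = {"task_a": np.random.rand(128), "task_b": np.random.rand(128)}
print(rank_source_tasks(np.random.rand(128), library))  # best candidates first
```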
Acoustic-Based Spatio-Temporal Learning for Press-Fit Evaluation of Femoral Stem Implants

In this work, we propose a method utilizing tool-integrated vibroacoustic measurements and a spatio-temporal learning-based framework for detecting the insertion endpoint during femoral stem implantation in cementless Total Hip Arthroplasty (THA). In current practice, the optimal insertion endpoint is identified intraoperatively based on surgical experience and depends on a subjective decision. Leveraging spectrogram features and time-variant sequences of acoustic hammer-blow events, our proposed solution can give real-time feedback to the surgeon during the insertion procedure and prevent adverse events in clinical practice. To validate our method on real data, we built a realistic experimental human cadaveric setup and acquired acoustic signals of hammer blows while broaching the femoral stem cavity with a novel inserter tool enhanced with contact microphones. The optimal insertion endpoint was determined by a standardized preoperative plan following clinical guidelines and executed by a board-certified surgeon. We train and evaluate a Long-term Recurrent Convolutional Network (LRCN) on sequences of spectrograms to detect a reached target press fit corresponding to a seated implant. The proposed method achieves an overall per-class recall of 93.82 ± 5.11% for detecting an ongoing insertion and 70.88 ± 11.83% for identifying a reached target press fit on five independent test specimens. The obtained results open the path for the development of automated systems for intra-operative decision support, error prevention and robotic applications in hip surgery.

Matthias Seibold, Armando Hoch, Daniel Suter, Mazda Farshad, Patrick O. Zingg, Nassir Navab, Philipp Fürnstahl
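
A hedged sketch of the LRCN pattern the abstract names: a shared CNN encodes each hammer-blow spectrogram, an LSTM aggregates the sequence, and a linear head classifies the insertion state. All layer sizes and the two-class setup are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LRCN(nn.Module):
    """CNN-per-spectrogram + LSTM over the blow sequence + linear classifier."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.head = nn.Linear(16, n_classes)

    def forward(self, spec_seq):                 # (B, T, 1, freq, time) spectrograms
        b, t = spec_seq.shape[:2]
        f = self.cnn(spec_seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(f)
        return self.head(out[:, -1])             # classify from the last blow

logits = LRCN()(torch.rand(4, 6, 1, 64, 32))     # 4 sequences of 6 hammer blows
```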

Surgical Planning and Simulation

Frontmatter
Deep Simulation of Facial Appearance Changes Following Craniomaxillofacial Bony Movements in Orthognathic Surgical Planning

Facial appearance changes with the movements of bony segments in orthognathic surgery of patients with craniomaxillofacial (CMF) deformities. Conventional biomechanical methods for simulating such changes, such as finite element modeling (FEM), are labor-intensive and computationally expensive, preventing their use in clinical settings. To overcome these limitations, we propose a deep learning framework to predict post-operative facial changes. Specifically, FC-Net, a facial appearance change simulation network, is developed to predict the point displacement vectors associated with a facial point cloud. FC-Net learns the point displacements of a pre-operative facial point cloud from the bony movement vectors between pre-operative and post-operative bony models. FC-Net is a weakly-supervised point displacement network trained using paired data with strict point-to-point correspondence. To preserve the topology of the facial model during point transformation, we employ a local-point-transform loss to constrain the local movements of points. Experimental results on real patient data reveal that the proposed framework can predict post-operative facial appearance changes remarkably faster than a state-of-the-art FEM method, with comparable prediction accuracy.

Lei Ma, Daeseung Kim, Chunfeng Lian, Deqiang Xiao, Tianshu Kuang, Qin Liu, Yankun Lang, Hannah H. Deng, Jaime Gateno, Ye Wu, Erkun Yang, Michael A. K. Liebschner, James J. Xia, Pew-Thian Yap
A Self-supervised Deep Framework for Reference Bony Shape Estimation in Orthognathic Surgical Planning

Virtual orthognathic surgical planning involves simulating surgical corrections of jaw deformities on 3D facial bony shape models. Due to the lack of necessary guidance, the planning procedure is highly experience-dependent and the planning results are often suboptimal. A reference facial bony shape model representing normal anatomies can provide objective guidance to improve planning accuracy. We therefore propose a self-supervised deep framework to automatically estimate reference facial bony shape models. Our framework is an end-to-end trainable network consisting of a simulator and a corrector. In the training stage, the simulator maps jaw deformities of a patient bone onto a normal bone to generate a simulated deformed bone. The corrector then restores the simulated deformed bone back to normal. In the inference stage, the trained corrector is applied to generate a patient-specific, normal-looking reference bone from a real deformed bone. The proposed framework was evaluated on a clinical dataset and compared with a state-of-the-art method based on a supervised point-cloud network. Experimental results show that the estimated shape models given by our approach are clinically acceptable and significantly more accurate than those of the competing method.

Deqiang Xiao, Hannah H. Deng, Tianshu Kuang, Lei Ma, Qin Liu, Xu Chen, Chunfeng Lian, Yankun Lang, Daeseung Kim, Jaime Gateno, Steve Guofang Shen, Dinggang Shen, Pew-Thian Yap, James J. Xia
DLLNet: An Attention-Based Deep Learning Method for Dental Landmark Localization on High-Resolution 3D Digital Dental Models

Dental landmark localization is a fundamental step in analyzing dental models for the planning of orthodontic or orthognathic surgery. Current clinical practice requires clinicians to manually digitize more than 60 landmarks on 3D dental models. Automatic landmark detection can release clinicians from this tedious manual annotation and improve localization accuracy. However, most existing landmark detection methods fail to capture local geometric contexts, causing large errors and misdetections. We propose an end-to-end learning framework to automatically localize 68 landmarks on high-resolution dental surfaces. Our network hierarchically extracts multi-scale local contextual features along two paths: a landmark localization path and a landmark area-of-interest segmentation path. Higher-level features are learned by fusing local-to-global features from the two paths to predict the landmark heatmap and the landmark area segmentation map. An attention mechanism is then applied to the two maps to refine the landmark positions. We evaluated our framework on a real-patient dataset consisting of 77 high-resolution dental surfaces. Our approach achieves an average localization error of 0.42 mm, significantly outperforming related state-of-the-art methods.

Yankun Lang, Hannah H. Deng, Deqiang Xiao, Chunfeng Lian, Tianshu Kuang, Jaime Gateno, Pew-Thian Yap, James J. Xia
Personalized CT Organ Dose Estimation from Scout Images

With the rapid increase in CT usage, radiation dose across patient populations is also increasing, making it desirable to reduce the CT radiation dose. However, dose reduction incurs additional noise, and with degraded image quality, diagnostic performance can be compromised. Existing routine dosimetric quantities are usually based on the absorbed dose within cylindrical phantoms and do not appropriately represent the actual patient dose. More comprehensive dose metrics, such as effective dose, require estimation of patient-specific dose at the organ level. Unfortunately, currently available systems fall far short of this goal and are further limited by manual adjustments and time-consuming, inefficient procedures. To overcome these challenges and improve patient safety through reduced dose without compromising image quality, we devise a fully-automated, end-to-end deep learning-based solution for real-time, patient-specific, organ-level dosimetric prediction of CT scans. Leveraging the 2D scout (frontal and lateral) images of actual patients, which are routinely acquired prior to the CT scan, our proposed Scout-Net model estimates the patient-specific mean dose in real time for six different organs. Our experimental evaluation on real patient data demonstrates the effectiveness of the Scout-Net model not only for real-time dose estimation (only 11 ms on average per scan), but also as a potential tool for optimizing CT radiation dose for specific patients.

Abdullah-Al-Zubaer Imran, Sen Wang, Debashish Pal, Sandeep Dutta, Bhavik Patel, Evan Zucker, Adam Wang
High-Particle Simulation of Monte-Carlo Dose Distribution with 3D ConvLSTMs

Monte-Carlo simulation of radiotherapy dose remains an extremely time-consuming task, despite being the most precise tool for radiation transport calculation. To circumvent this issue, deep learning offers promising avenues. In this paper, we extend ConvLSTM to handle 3D data and introduce a 3D recurrent, fully convolutional neural network architecture. Our model's purpose is to infer a computationally expensive Monte-Carlo dose calculation result for VMAT plans with a high number of particles from a sequence of simulations with a low number of particles. We benchmark our framework against other learning methods commonly used for denoising and other medical tasks. Our model outperforms these methods with regard to several evaluation metrics used to assess the clinical viability of the predictions. Code is available at https://git.io/JcbxD .

Sonia Martinot, Norbert Bus, Maria Vakalopoulou, Charlotte Robert, Eric Deutsch, Nikos Paragios
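
The extension of ConvLSTM to 3D data amounts to computing the usual ConvLSTM gates with `Conv3d`, so the recurrent state preserves the volumetric layout of the dose grid. A minimal cell with illustrative channel counts (not the paper's architecture):

```python
import torch
import torch.nn as nn

class ConvLSTM3DCell(nn.Module):
    """3D ConvLSTM cell: standard LSTM gates realized with a single Conv3d."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv3d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)            # update the volumetric cell state
        h = o * torch.tanh(c)
        return h, c

cell = ConvLSTM3DCell(1, 8)
h = c = torch.zeros(1, 8, 16, 16, 16)
for x in torch.rand(5, 1, 1, 16, 16, 16):        # low-particle simulation sequence
    h, c = cell(x, (h, c))                       # h accumulates the denoised estimate
```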
Effective Semantic Segmentation in Cataract Surgery: What Matters Most?

Our work proposes neural network design choices that set the state-of-the-art on a challenging public benchmark on cataract surgery, CaDIS. Our methodology achieves strong performance across three semantic segmentation tasks with increasingly granular surgical tool class sets by effectively handling class imbalance, an inherent challenge in any surgical video. We consider and evaluate two conceptually simple data oversampling methods as well as different loss functions. We show significant performance gains across network architectures and tasks especially on the rarest tool classes, thereby presenting an approach for achieving high performance when imbalanced granular datasets are considered. Our code and trained models are available at https://github.com/RViMLab/MICCAI2021_Cataract_semantic_segmentation and qualitative results on unseen surgical video can be found at https://youtu.be/twVIPUj1WZM .

Theodoros Pissas, Claudio S. Ravasio, Lyndon Da Cruz, Christos Bergeles
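
One of the "conceptually simple data oversampling methods" for class imbalance can be illustrated with repeat-factor sampling, sketched below: images containing rare classes are repeated until every class reaches a target effective frequency. The threshold and input format are assumptions; the paper evaluates its own specific variants.

```python
import math
from collections import Counter

def repeat_factors(image_classes, thresh=0.5):
    """Per-image repeat factors: images holding rare classes get factor > 1.
    `image_classes` is a list of per-image class sets (toy input format)."""
    n = len(image_classes)
    freq = Counter(c for cls in image_classes for c in set(cls))
    class_rf = {c: max(1.0, math.sqrt(thresh / (freq[c] / n))) for c in freq}
    return [max(class_rf[c] for c in cls) if cls else 1.0 for cls in image_classes]

rf = repeat_factors([{"forceps"}, {"forceps", "rare_hook"}, {"cannula"}])
print(rf)   # images containing 'rare_hook' get a repeat factor > 1
```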
Facial and Cochlear Nerves Characterization Using Deep Reinforcement Learning for Landmark Detection

We propose a pipeline for the characterization of the facial and cochlear nerves in CT scans, a task specifically relevant for cochlear implant surgery planning. These structures are hard to locate in clinical CT scans due to their small size relative to the image resolution, the lack of contrast, and their proximity to other similar structures in this region. We define key landmarks around the facial and cochlear nerves and locate them using deep reinforcement learning with communicating multi-agents based on the C-MARL model. These landmarks initialize customized characterization methods, including automated direct measurement of the diameter of the cochlear nerve canal and extraction of the cochlear nerve cross-section followed by its segmentation using active contours. We also derive a path selection algorithm for optimal geodesic pathfinding, based on Dijkstra's algorithm, for the characterization of the facial nerve. A total of 119 clinical CT images from preoperative patients were used to develop this pipeline, which produces accurate characterizations of these nerves in the cochlear region and provides reliable measurements for computer-aided diagnosis and surgery planning.

Paula López Diez, Josefine Vilsbøll Sundgaard, François Patou, Jan Margeta, Rasmus Reinhold Paulsen
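
Since the facial-nerve characterization relies on Dijkstra's algorithm for geodesic pathfinding, a plain implementation is sketched below on a toy graph. In the actual pipeline the nodes would be voxels between detected landmarks and the edge costs would derive from the image, which is an assumption here.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path in `graph`, which maps node -> [(neighbor, edge_cost), ...]."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):        # stale queue entry, skip
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]                                 # walk predecessors back to start
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

g = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.5)], "C": []}
print(dijkstra(g, "A", "C"))                      # ['A', 'B', 'C']
```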
Patient-Specific Virtual Spine Straightening and Vertebra Inpainting: An Automatic Framework for Osteoplasty Planning

Symptomatic spinal vertebral compression fractures (VCFs) are often treated by osteoplasty, where a cement-like material is injected into the bone to stabilize the fracture, restore the vertebral body height and alleviate pain. Leakage is a common complication and may occur when too much cement is injected. Here, we propose an automated patient-specific framework that allows physicians to calculate an upper bound on the volume of cement for particular types of VCFs and estimate the optimal outcome of osteoplasty. The framework uses the patient's CT scan and the segmentation label of the fractured vertebra to build a virtual healthy spine. First, the fractured spine is segmented with a three-step Convolutional Neural Network architecture. Next, a per-vertebra rigid registration to a healthy reference spine restores its curvature. Finally, a GAN-based inpainting approach replaces the fractured vertebra with an estimate of its original shape, whose volume we use as an estimate of the original healthy vertebra volume. As a clinical application, we derive an upper bound on the amount of bone cement for the injection. We evaluate our framework by comparing the virtual vertebra volumes of ten patients to their healthy equivalents, and report an error of 3.88 ± 7.63%. The presented pipeline offers a first approach to a personalized, automatic, high-level framework for planning osteoplasty procedures.

Christina Bukas, Bailiang Jian, Luis Francisco Rodríguez Venegas, Francesca De Benetti, Sebastian Rühling, Anjany Sekuboyina, Jens Gempt, Jan Stefan Kirschke, Marie Piraud, Johannes Oberreuter, Nassir Navab, Thomas Wendler
A New Approach to Orthopedic Surgery Planning Using Deep Reinforcement Learning and Simulation

Computer-assisted orthopedic interventions require surgery planning based on patient-specific three-dimensional anatomical models. The state of the art has addressed the automation of this planning process either through mathematical optimization or supervised learning, the former requiring a handcrafted objective function and the latter sufficient training data. In this paper, we propose a completely model-free and automatic surgery planning approach for femoral osteotomies based on Deep Reinforcement Learning which is capable of generating clinical-grade solutions without needing patient data for training. One of our key contributions is that we solve the real-world task in a simulation environment tailored to orthopedic interventions, based on an analytical representation of real patient data, in order to overcome convergence, noise, and dimensionality problems. An agent was trained on simulated anatomy using Proximal Policy Optimization, and inference was performed on real patient data. A qualitative evaluation with expert surgeons and a complementary quantitative analysis demonstrated that our approach was capable of generating clinical-grade planning solutions from unseen data of eleven patient cases. In eight cases, a direct comparison to clinical gold standard (GS) planning solutions was performed, showing that our approach performs equally well or better in 80% of the cases according to surgeon 1 and in 100% according to surgeon 2.

Joëlle Ackermann, Matthias Wieland, Armando Hoch, Reinhold Ganz, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Patrick O. Zingg, Hooman Esfandiari, Philipp Fürnstahl
Whole Heart Mesh Generation for Image-Based Computational Simulations by Learning Free-Form Deformations

Image-based computer simulation of cardiac function can be used to probe the mechanisms of (patho)physiology and to guide the diagnosis and personalized treatment of cardiac diseases. This paradigm requires constructing simulation-ready meshes of cardiac structures from medical image data, a process that has traditionally required significant time and human effort, limiting large-cohort analyses and potential clinical translation. We propose a novel deep learning approach to reconstruct simulation-ready whole heart meshes from volumetric image data. Our approach learns to deform a template mesh to the input image data by predicting displacements of multi-resolution control point grids. We discuss the methods of this approach and demonstrate its application to efficiently create simulation-ready whole heart meshes for computational fluid dynamics simulations of cardiac flow. Our source code is available at https://github.com/fkong7/HeartFFDNet .

Fanwei Kong, Shawn C. Shadden
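
The core mechanism above, deforming a template mesh via control point grid displacements, is a trilinear free-form deformation (FFD). Below is a minimal sketch: vertices in the unit cube are displaced by trilinearly interpolating a coarse displacement grid. In the paper a network predicts the grid at several resolutions; here the displacements are random and single-resolution for illustration.

```python
import torch

def ffd_deform(points, ctrl_disp):
    """Displace points in [0,1]^3 by trilinearly interpolating a
    (G, G, G, 3) control-point displacement grid."""
    g = ctrl_disp.shape[0]
    u = points.clamp(0, 1) * (g - 1)              # continuous grid coordinates
    i0 = u.floor().long().clamp(max=g - 2)        # lower corner of each cell
    t = u - i0.float()                            # fractional position in cell
    disp = torch.zeros_like(points)
    for dx in (0, 1):                             # sum over the 8 cell corners
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1 - t[:, 0]
                wy = t[:, 1] if dy else 1 - t[:, 1]
                wz = t[:, 2] if dz else 1 - t[:, 2]
                corner = ctrl_disp[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                disp += (wx * wy * wz).unsqueeze(-1) * corner
    return points + disp

template = torch.rand(1000, 3)                    # stand-in for template vertices
warped = ffd_deform(template, 0.05 * torch.randn(4, 4, 4, 3))
```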
Automatic Path Planning for Safe Guide Pin Insertion in PCL Reconstruction Surgery

Reconstruction surgery of torn ligaments typically requires precise and anatomically correct fixation of the graft substitute on the bone surface. Several planning methodologies have been proposed that aim at standardizing the interventional procedure by localizing drill sites or defining the drill tunnel orientation with the help of anatomical landmarks. However, their practical implementation is limited by the often complex and time-consuming nature of the planning steps. For this reason, we propose an automatic solution for safe guide pin path planning based on bone contour extraction, axis detection, anatomical landmark detection, and geometrical construction. We evaluate our approach for the task of double-bundle posterior cruciate ligament reconstruction surgery on the lateral tibia using 38 clinical X-ray images. Our method achieves a median path angulation error of 0.37° and a median localization error of 0.96 mm for the ligament attachment center.

Florian Kordon, Andreas Maier, Benedict Swartman, Maxim Privalov, Jan Siad El Barbari, Holger Kunze
Improving Hexahedral-FEM-Based Plasticity in Surgery Simulation

Collecting, stretching and tearing soft tissue is common in surgery. These repeated deformations have a plastic component that surgeons take into consideration and that surgical simulation should model. Organs and tissues can often be modeled as curved cylinders or planes, offset orthogonally to form thick shells. A pair of primary directions, e.g., axial and radial for cylinders, then provides a quadrilateral mesh whose offset naturally yields a hexahedral mesh. To better capture tissue plasticity for such hexahedral meshes, this work compares to and extends existing volumetric finite element models of plasticity. Specifically, we extend the open-source simulation framework SOFA in the context of surgical simulation. Based on factored deformation gradients, the extension focuses on the challenge of separating symmetric and asymmetric, elastic and plastic deformation components, while preserving volume and avoiding re-meshing.

Ruiliang Gao, Jörg Peters
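
A small worked example of the "factored deformation gradients" idea: the polar decomposition F = RS splits a deformation gradient into a rotation R and a symmetric stretch S, the standard starting point for separating symmetric from asymmetric deformation components. This SVD-based sketch is generic, not the SOFA extension itself.

```python
import numpy as np

def polar_decompose(F):
    """Return (R, S) with F = R @ S, R a proper rotation, S symmetric."""
    U, sigma, Vt = np.linalg.svd(F)
    d = np.sign(np.linalg.det(U @ Vt))            # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt                                # nearest proper rotation
    S = Vt.T @ D @ np.diag(sigma) @ Vt            # symmetric stretch tensor
    return R, S

F = np.eye(3) + 0.1 * np.random.rand(3, 3)        # small random deformation
R, S = polar_decompose(F)
assert np.allclose(R @ S, F)                      # the factorization reproduces F
```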
Rapid Treatment Planning for Low-dose-rate Prostate Brachytherapy with TP-GAN

Treatment planning in low-dose-rate prostate brachytherapy (LDR-PB) aims to produce an arrangement of implantable radioactive seeds that delivers a minimum prescribed dose to the prostate whilst minimizing toxicity to healthy tissues. There can be multiple seed arrangements that satisfy this dosimetric criterion, not all of which are deemed 'acceptable' for implant from a physician's perspective. This makes planning subjective, with the quality of treatment depending on the expertise of the planner. We propose a method that learns to generate consistent treatment plans from a large pool of successful clinical data (961 patients). Our model is based on conditional generative adversarial networks and uses a novel loss function that penalizes the model on spatial constraints of the seeds. An optional optimizer based on a simulated annealing (SA) algorithm can be used to further fine-tune the plans if necessary (as determined by the treating physician). Performance analysis was conducted on 150 test cases, demonstrating results comparable to those of manual plans. On average, the clinical target volume covered by 100% of the prescribed dose was 98.9% for our method compared to 99.4% for manual plans. Moreover, using our model, the planning time was significantly reduced, to an average of 3 s/plan (2.5 min/plan with the optional SA). In comparison, manual planning at our centre takes around 20 min/plan.

Tajwar Abrar Aleef, Ingrid T. Spadinger, Michael D. Peacock, Septimiu E. Salcudean, S. Sara Mahdavi
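
The optional simulated-annealing fine-tuning stage follows the textbook accept/reject loop, sketched below with placeholder `energy` and `perturb` functions standing in for the real dosimetric objective and seed-move proposals:

```python
import math
import random

def anneal(plan, energy, perturb, t0=1.0, cooling=0.995, steps=2000):
    """Generic simulated annealing: lower `energy` is better; worse moves are
    accepted with probability exp(-delta / temperature)."""
    best = cur = plan
    e_cur = e_best = energy(cur)
    t = t0
    for _ in range(steps):
        cand = perturb(cur)                       # propose a small plan change
        e_cand = energy(cand)
        if e_cand < e_cur or random.random() < math.exp((e_cur - e_cand) / t):
            cur, e_cur = cand, e_cand             # accept the candidate
            if e_cur < e_best:
                best, e_best = cur, e_cur
        t *= cooling                              # cool the temperature
    return best

# toy usage: nudge a 1-D 'seed position' toward a target value of 3.0
best = anneal(0.0, lambda x: (x - 3.0) ** 2,
              lambda x: x + random.uniform(-0.5, 0.5))
```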

Surgical Skill and Work Flow Analysis

Frontmatter
Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

Real-time surgical phase recognition is a fundamental task in modern operating rooms. Previous works tackle this task with architectures arranged in spatio-temporal order; however, they do not consider the supportive benefits of intermediate spatial features. In this paper, we introduce the Transformer, for the first time in surgical workflow analysis, to reconsider the overlooked complementary effects of spatial and temporal features for accurate surgical phase recognition. Our hybrid embedding aggregation Transformer fuses carefully designed spatial and temporal embeddings by allowing active queries based on spatial information from temporal embedding sequences. More importantly, our framework processes the hybrid embeddings in parallel to achieve a high inference speed. Our method is thoroughly validated on two large surgical video datasets, the Cholec80 and M2CAI16 Challenge datasets, and outperforms state-of-the-art approaches at a processing speed of 91 fps.

Xiaojie Gao, Yueming Jin, Yonghao Long, Qi Dou, Pheng-Ann Heng
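
The "active queries based on spatial information from temporal embedding sequences" can be pictured with standard multi-head attention, as in this hedged sketch where a frame's spatial embedding queries a window of temporal embeddings. Dimensions and the 7-phase head are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

d = 64
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

spatial = torch.rand(8, 1, d)     # (batch, 1 query: current frame's spatial feature)
temporal = torch.rand(8, 16, d)   # (batch, 16 steps of temporal embeddings)

# the spatial embedding actively queries the temporal sequence
fused, weights = attn(query=spatial, key=temporal, value=temporal)
logits = nn.Linear(d, 7)(fused.squeeze(1))   # e.g. 7 surgical phases in Cholec80
```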
OperA: Attention-Regularized Transformers for Surgical Phase Recognition

In this paper we introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences. A novel attention regularization loss encourages the model to focus on high-quality frames during training. Moreover, the attention weights are utilized to identify characteristic high attention frames for each surgical phase, which could further be used for surgery summarization. OperA is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, outperforming various state-of-the-art temporal refinement approaches.

Tobias Czempiel, Magdalini Paschali, Daniel Ostler, Seong Tae Kim, Benjamin Busam, Nassir Navab
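
The attention regularization loss is described only at a high level above; one plausible (assumed, not the paper's exact) form penalizes attention mass placed on low-confidence frames, which pushes the model toward high-quality frames:

```python
import torch

def attention_regularizer(attn, conf):
    """Sketch of an attention regularizer: `attn` (B, T) are attention weights,
    `conf` (B, T) per-frame prediction confidences. Attention spent on
    low-confidence frames is penalized. The exact form in OperA differs."""
    return ((1.0 - conf.detach()) * attn).sum(dim=1).mean()

loss = attention_regularizer(torch.softmax(torch.rand(2, 10), -1),
                             torch.rand(2, 10))
```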
Surgical Workflow Anticipation Using Instrument Interaction

Surgical workflow anticipation, including surgical instrument and phase anticipation, is essential for an intra-operative decision-support system. It deciphers the surgeon's behaviors and the patient's status to forecast surgical instrument and phase occurrence before they appear, providing support for instrument preparation and computer-assisted intervention (CAI) systems. We investigate this unexplored surgical workflow anticipation problem by proposing an Instrument Interaction Aware Anticipation Network (IIA-Net). Spatially, it utilizes rich visual features describing the context around each instrument, i.e., the instrument's interaction with its surroundings. Temporally, it allows for a large receptive field to capture long-term dependencies in long, untrimmed surgical videos through a causal, dilated multi-stage temporal convolutional network. Our model performs online inference with reliable predictions even under severe noise and artifacts in the recorded videos. Extensive experiments on the Cholec80 dataset demonstrate that our proposed method exceeds the state-of-the-art method by a large margin (1.40 vs. 1.75 for inMAE and 2.14 vs. 2.68 for eMAE). The code is published at https://github.com/Flaick/Surgical-Workflow-Anticipation.

Kun Yuan, Matthew Holden, Shijian Gao, Won-Sook Lee
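
The causal dilated temporal convolution that gives IIA-Net its large receptive field while staying online can be sketched as follows; left-only padding guarantees no future frames leak into a prediction. Channel count and dilation schedule are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CausalDilatedBlock(nn.Module):
    """Causal dilated 1-D conv block with a residual connection."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = 2 * dilation                  # (kernel - 1) * dilation, left only
        self.conv = nn.Conv1d(ch, ch, kernel_size=3, dilation=dilation)

    def forward(self, x):                        # x: (B, C, T)
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))   # pad the past side
        return torch.relu(out) + x               # length T is preserved

x = torch.rand(1, 32, 100)
for d in (1, 2, 4, 8):                           # stacking dilations widens the view
    x = CausalDilatedBlock(32, d)(x)
```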
Multi-view Surgical Video Action Detection via Mixed Global View Attention

Automatic surgical activity detection in the operating room can enable intelligent systems that potentially lead to more efficient surgical workflow. While real-world implementations of video activity detection in the OR will most likely rely on multiple video feeds observing the environment from different viewpoints to handle occlusion and clutter, research on the matter remains under-explored, perhaps due to the lack of a suitable dataset. Thus, as our first contribution, we introduce the first large-scale multi-view surgical action detection dataset, which includes over 120 temporally annotated robotic surgery operations, each recorded from 4 different viewpoints, resulting in 480 full-length surgical videos. As our second contribution, we design a novel model architecture that detects surgical actions by utilizing multiple time-synchronized videos with a shared field of view to better detect the activity taking place at any time. We explore early, hybrid, and late fusion methods for combining data from different views. We settle on a late fusion model that remains insensitive to sensor locations and feeding order, improving over single-view performance by using an attention-style mixing: our model learns to dynamically weight and fuse information across all views. We demonstrate improvements in mean Average Precision across the board with our new model.

Adam Schmidt, Aidean Sharghi, Helene Haugerud, Daniel Oh, Omid Mohareri
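
An order-insensitive late fusion "in the style of attention" can be sketched as scoring each view's features and taking a softmax-weighted sum, which is invariant to the feeding order of the views. Dimensions and heads below are illustrative, not the authors' model.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Score each view, softmax across views, and fuse by weighted sum."""
    def __init__(self, d=128, n_actions=10):
        super().__init__()
        self.score = nn.Linear(d, 1)
        self.head = nn.Linear(d, n_actions)

    def forward(self, view_feats):               # (B, V, d): per-view clip features
        w = torch.softmax(self.score(view_feats), dim=1)   # (B, V, 1) view weights
        fused = (w * view_feats).sum(dim=1)                # permutation-invariant mix
        return self.head(fused)

logits = AttentionFusion()(torch.rand(2, 4, 128))  # 4 camera views per sample
```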
Interhemispheric Functional Connectivity in the Primary Motor Cortex Distinguishes Between Training on a Physical and a Virtual Surgical Simulator

Functional brain connectivity was investigated using functional near-infrared spectroscopy (fNIRS) during a pattern cutting (PC) task on physical and virtual simulators. Fourteen right-handed novice medical students were recruited and divided into separate cohorts for physical (N = 8) and virtual (N = 6) PC training. Functional brain connectivity measures were based on wavelet coherence (WCOH), computed from task-related oxygenated hemoglobin (HBO2) changes from baseline at the left and right prefrontal cortex (LPFC, RPFC), the left and right primary motor cortex (LPMC, RPMC), and the supplementary motor area (SMA). HBO2 changes within the neurovascular frequency band (0.01–0.07 Hz) from long-separation channels were used to compute average inter-regional WCOH metrics during the PC task. The coefficient of variation (CoV) of the WCOH metrics and the PC performance metrics were compared, and WCOH metrics from short-separation fNIRS time series were compared separately. The partial eta squared effect size (Bonferroni-corrected) between the physical and virtual simulator cohorts was found to be highest for LPMC-RPMC connectivity. The percent change in the magnitude-squared WCOH metric was also statistically different (p < 0.05) for LPMC-RPMC connectivity between the physical and virtual simulator cohorts, whereas the percent change in WCOH metrics from extracerebral sources was not different at the 5% significance level. Moreover, higher CoV for both the LPMC-RPMC magnitude-squared WCOH metric and the PC performance metrics was found in the physical than in the virtual simulator. We conclude that interhemispheric connectivity of the primary motor cortex is the distinguishing functional brain connectivity feature between the physical and virtual simulator cohorts. The brain-behavior relationship based on the CoV between the LPMC-RPMC magnitude-squared WCOH metric and the FLS PC performance metric provides novel insights into the neuroergonomics of physical and virtual simulators that are crucial for validating Virtual Reality technology.

Anirban Dutta, Anil Kamat, Basiel Makled, Jack Norfleet, Xavier Intes, Suvranu De

Surgical Visualization and Mixed, Augmented and Virtual Reality

Frontmatter
Image-Based Incision Detection for Topological Intraoperative 3D Model Update in Augmented Reality Assisted Laparoscopic Surgery

Augmented Reality (AR) is a promising way to precisely locate the internal structures of an organ in laparoscopy. Several methods have been proposed to register a preoperative 3D model reconstructed from MRI or CT to the intraoperative laparoscopic 2D images. These methods assume a fixed topology of the 3D model; they thus quickly fail once the organ is cut to remove pathological internal structures. We propose to add image-based incision detection to the registration pipeline in order to update the topology of the organ model. Whenever an incision is detected, it is transferred to the 3D model, whose topology is updated accordingly, and registration is then restarted. We trained a UNet as an incision detector on 181 labelled incision images collected from 10 myomectomy procedures. It obtains a mean precision, recall and F1 score of 0.05, 0.36 and 0.08 under 10-fold cross-validation. Overall, topology updating improves 3D registration accuracy by 5% on average.

Tom François, Lilian Calvet, Callyane Sève-d’Erceville, Nicolas Bourdel, Adrien Bartoli
Using Multiple Images and Contours for Deformable 3D-2D Registration of a Preoperative CT in Laparoscopic Liver Surgery

Deformable registration is required to achieve laparoscopic augmented reality, but it remains an open problem. Some existing methods reconstruct a preoperative model and register it using anatomical landmarks from a single image, which is inaccurate due to depth ambiguities. Other methods require non-standard devices that are not suited to clinical practice. A reasonable way to improve accuracy is to combine multiple images from a monocular laparoscope. We propose three novel registration methods exploiting information from multiple images. The first two are based on rigidly-related images (MV-B and MV-C) and the third on non-rigidly-related images (MV-D). We evaluated registration accuracy quantitatively on synthetic and phantom data, and qualitatively on patient data, comparing our results with state-of-the-art methods. Our methods outperform them, reducing the partial-visibility and depth-ambiguity issues of single-view approaches. We characterise the improvement margin, which may be slight or significant depending on the scenario.

Yamid Espinel, Lilian Calvet, Karim Botros, Emmanuel Buc, Christophe Tilmant, Adrien Bartoli
SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance

We present SurgeonAssist-Net: a lightweight framework making action-and-workflow-driven virtual assistance, for a set of predefined surgical tasks, accessible to commercially available optical see-through head-mounted displays (OST-HMDs). On a widely used benchmark dataset for laparoscopic surgical workflow, our implementation competes with state-of-the-art approaches in prediction accuracy for automated task recognition, yet requires 7.4× fewer parameters, 10.2× fewer floating-point operations per second (FLOPS), is 7.0× faster for inference on a CPU, and is capable of near real-time performance on the Microsoft HoloLens 2 OST-HMD. To achieve this, we use an efficient convolutional neural network (CNN) backbone to extract discriminative features from image data, and a low-parameter recurrent neural network (RNN) architecture to learn long-term temporal dependencies. To demonstrate the feasibility of our approach for inference on the HoloLens 2, we created a sample dataset that includes video of several surgical tasks recorded from a user-centric point-of-view. After training, we deployed our model and cataloged its performance in an online simulated surgical scenario for the prediction of the current surgical task. The utility of our approach is explored in the discussion of several relevant clinical use-cases. Our code is publicly available at https://github.com/doughtmw/surgeon-assist-net .

Mitchell Doughty, Karan Singh, Nilesh R. Ghugre
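
The efficient-CNN-plus-low-parameter-RNN pattern described above can be sketched with off-the-shelf parts; MobileNetV2 here is a stand-in backbone (an assumption, not necessarily the paper's choice) feeding a small GRU that produces per-frame task logits.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class TaskRecognizer(nn.Module):
    """Efficient CNN backbone + low-parameter GRU for per-frame task prediction."""
    def __init__(self, n_tasks=7, hid=64):
        super().__init__()
        backbone = mobilenet_v2(weights=None)
        self.features = backbone.features        # (B, 1280, h, w) feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.GRU(1280, hid, batch_first=True)
        self.head = nn.Linear(hid, n_tasks)

    def forward(self, clip):                     # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.pool(self.features(clip.flatten(0, 1))).flatten(1).view(b, t, -1)
        out, _ = self.rnn(f)                     # temporal context over frames
        return self.head(out)                    # task logits per frame

logits = TaskRecognizer()(torch.rand(1, 4, 3, 224, 224))
```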
Backmatter
Metadata
Title
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021
Editors
Prof. Dr. Marleen de Bruijne
Prof. Dr. Philippe C. Cattin
Stéphane Cotin
Nicolas Padoy
Prof. Stefanie Speidel
Yefeng Zheng
Caroline Essert
Copyright Year
2021
Electronic ISBN
978-3-030-87202-1
Print ISBN
978-3-030-87201-4
DOI
https://doi.org/10.1007/978-3-030-87202-1
