
2024 | Book

Bildverarbeitung für die Medizin 2024

Proceedings, German Conference on Medical Image Computing, Erlangen, March 10-12, 2024

Editors: Andreas Maier, Thomas M. Deserno, Heinz Handels, Klaus Maier-Hein, Christoph Palm, Thomas Tolxdorff

Publisher: Springer Fachmedien Wiesbaden

Book Series: Informatik aktuell


About this book

For more than 25 years, the workshop "Bildverarbeitung für die Medizin" has been established as a successful event. The goal in 2024 is once again to present current research results and to deepen the dialogue between scientists, industry, and users. The contributions in this volume, many of them in English, cover all areas of medical image processing, in particular image formation and acquisition, segmentation and analysis, visualization and animation, computer-aided diagnosis, and image-guided therapy planning and therapy. Methods of machine learning, biomechanical modeling, as well as validation and quality assurance are employed.

Table of Contents

Frontmatter
Keynote: Recent Advances in Surgical AI for Next Generation Interventions

Recent trends in Artificial Intelligence (AI) and surgical science have revolutionized the field of surgery, paving the way for a new era of AI-assisted robotic interventions. These cutting-edge technologies offer tremendous potential to enhance imaging, surgical navigation, and robotic interventions, ultimately reducing the cognitive load on surgeons and optimizing procedural efficiency. This talk will highlight AI applications in different surgical procedures and where we stand in terms of their clinical translation toward the next generation of surgical interventions [1].

Sophia Bano
Keynote: 4-D+ nanoSCOPE Project

The number of elderly and very elderly people is increasing worldwide, and so is the number of patients suffering from osteoporosis. This disease significantly impairs quality of life and leads to high social costs. Nevertheless, the origin and course of osteoporosis are still not sufficiently understood, because methods for an in-depth analysis of the fine bone structure over time in living individuals are not yet available, especially ones that also allow large matrix studies with statistical significance. An interdisciplinary research project now aims to change this. The 4-D+ nanoSCOPE project is developing a groundbreaking X-ray microscope (image acquisition with submicron resolution, over a hundred times faster than is currently possible). The interdisciplinary team intends to enable X-ray microscopy studies in living creatures for the very first time by combining state-of-the-art imaging techniques with innovative precision-learning software and a novel X-ray microscope. Their method has the potential to revolutionize our understanding of bone structure and bone remodelling by enabling an effective assessment of the effects of age, hormones, inflammation, and treatment on bone [1].

Silke Christiansen
Improving Hybrid Quantum Annealing Tomographic Image Reconstruction with Regularization Strategies

Quantum computing and quantum annealing present promising avenues for addressing complex problems in various fields, including tomographic image reconstruction. This study investigates the application of hybrid quantum annealing in the context of tomographic image reconstruction, focusing on the formulation of compatible conventional image regularization strategies: L2 and total variation. Using a Shepp-Logan phantom of image size 32 × 32 with 4-bit grayscale encoding, we study the effect of the regularization techniques under the influence of their parameters and the runtime of the hybrid quantum annealer. The study reveals that L2 regularization effectively enhances the obtained image reconstructions and that total variation can further improve them. Despite efforts to employ regularized hybrid quantum annealing reconstructions, they still fall short in comparison to traditional reconstruction techniques.

Merlin A. Nau, A. Hans Vija, Maximilian P. Reymann, Wesley Gohn, Andreas K. Maier
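
For orientation, the regularized objective that such a formulation minimizes before conversion to an annealer-compatible problem can be written as follows. This is a sketch in our own notation (λ and μ denote the L2 and total variation weights studied above, A the system matrix, b the measured sinogram); with the 4-bit grayscale encoding, each pixel is expanded into four binary variables, turning the objective into a QUBO for the hybrid annealer.

```latex
\min_{x}\; \|Ax - b\|_2^2 \;+\; \lambda \|x\|_2^2 \;+\; \mu\,\mathrm{TV}(x),
\qquad
\mathrm{TV}(x) = \sum_{i,j} \bigl(|x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}|\bigr)
```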
Abstract: Deep Learning-based Detection of Vessel Occlusions on CT-Angiography in Patients with Suspected Acute Ischemic Stroke

Swift diagnosis and treatment play a decisive role in the clinical outcome of patients with acute ischemic stroke (AIS), and computer-aided diagnosis (CAD) systems can accelerate the underlying diagnostic processes. Here, we developed an artificial neural network (ANN) that allows automated detection of abnormal vessel findings. Pseudoprospective external validation was performed in consecutive patients with suspected AIS from 4 different hospitals during a 6-month timeframe and demonstrated high sensitivity (≥87%) and negative predictive value (≥93%). Benchmarking against two CE- and FDA-approved software solutions showed significantly higher performance for our ANN, with improvements of 25–45% for sensitivity and 4–11% for NPV. We provide an imaging platform (https://stroke.neuroAI-HD.org) for online processing of medical imaging data with the developed ANN, including provisions for data crowdsourcing. Notably, this work has previously been published in Nature Communications [1].

Gianluca Brugnara, Michael Baumgartner, Edwin D. Scholze, Katerina Deike-Hofmann, Klaus Kades, Jonas Scherer, Stefan Denner, Hagen Meredig, Aditya Rastogi, Mustafa A. Mahmutoglu, Christian Ulfert, Ulf Neuberger, Silvia Schönenberger, Kai Schlamp, Zeynep Bendella, Thomas Pinetz, Carsten Schmeel, Wolfgang Wick, Peter A. Ringleb, Ralf Floca, Markus Möhlenbruch, Alexander Radbruch, Martin Bendszus, Klaus Maier-Hein, Philipp Vollmuth
Abstract: Cytologic Scoring of Equine Exercise-induced Pulmonary Hemorrhage
Performance of Human Experts and a Deep Learning-based Algorithm

Exercise-induced pulmonary hemorrhage (EIPH) is a common respiratory condition in racehorses with negative implications for performance. The gold standard diagnostic method is cytology of bronchoalveolar lavage fluid using the time-consuming total hemosiderin score (THS). For the routine THS, 300 alveolar macrophages are classified into 5 grades based on the amount of intracellular hemosiderin pigment (a degradation product of the heme iron of red blood cells). Besides the high time investment, there is notable inter-rater variability in assigning hemosiderin grades. Thus, automated image analysis is of high interest to improve this diagnostic test. In this study [1], we validated a deep learning-based algorithm (RetinaNet) on 52 whole slide images (WSI) against the performance of 10 experts (each graded 300 cells per case) and a ground truth with labels for all macrophages in the WSI (range: 596 - 8954 macrophages). Compared to the ground truth reference, the algorithm had a diagnostic accuracy of 92.3%, while the 10 experts had an accuracy of 75.5% (range: 63.4 - 92.3%). Automated analysis of a single WSI took on average 1:37 minutes, while experts required an average of 14 minutes for 300 cells. In conclusion, the deep learning-based algorithm has a high diagnostic accuracy and is, therefore, a promising tool to reduce expert labor and to facilitate the routine use of the THS.

Christof A. Bertram, Christian Marzahl, Alexander Bartel, Jason Stayt, Federico Bonsembiante, Janet Beeler-Marfisi, Ann K. Barton, Ginevra Brocca, Maria E. Gelain, Agnes Gläsel, Kelly du Preez, Kristina Weiler, Christiane Weissenbacher-Lang, Katharina Breininger, Marc Aubreville, Andreas Maier, Robert Klopfleisch, Jenny Hill
Abstract: Adaptive Region Selection for Active Learning in Whole Slide Image Semantic Segmentation

The annotation of gigapixel-sized whole slide images (WSI) in digital pathology can be time-intensive, especially when generating annotations for training deep segmentation models. Instead of requesting annotations for the full WSI, region-based active learning (AL) allows annotations to be requested for selected regions in an iterative process, reducing annotation effort while maintaining segmentation performance. Existing methods for region selection on WSIs evaluate the informativeness of regions on a quadratic grid with a predefined size of l × l pixels according to a suitable informativeness criterion and then select the k most informative regions. Our experiments show that the benefit of this method strongly depends on the choice of these two hyperparameters, which together define the AL step size, and that a suboptimal AL step size can result in uninformative or redundant annotation requests [1]. We evaluate our approach on the publicly available CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency, measured by annotated area, compared to the reference approach across various AL step sizes. With only 2.6% of the tissue area annotated, we achieve the same performance as in a full annotation setting and thereby substantially reduce the cost of annotating a WSI dataset for the task of segmentation. Our approach can in theory be applied with any informativeness measure, with future work looking closer into an improved characterization of annotation effort. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.

Jingna Qiu, Frauke Wilm, Mathias Öttl, Maja Schlereth, Chang Liu, Tobias Heimann, Marc Aubreville, Katharina Breininger
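
As a minimal illustration of the grid-based reference scheme described above, the following sketch scores and selects fixed l × l regions; the informativeness map, stride choice, and function names are our assumptions, and the paper's adaptive method goes beyond this fixed grid.

```python
# Sketch of region-based active learning selection on a WSI, assuming a
# per-pixel informativeness map (e.g., prediction entropy) is available.
import numpy as np

def select_regions(informativeness: np.ndarray, l: int, k: int):
    """Pick the k non-overlapping l x l grid regions with highest mean score."""
    H, W = informativeness.shape
    scores = []
    for y in range(0, H - l + 1, l):          # quadratic grid, stride l
        for x in range(0, W - l + 1, l):
            scores.append((informativeness[y:y + l, x:x + l].mean(), y, x))
    scores.sort(reverse=True)                  # most informative first
    return [(y, x) for _, y, x in scores[:k]]  # annotation requests
```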
Abstract: Flexible Unfolding of Circular Structures for Rendering Textbook-style Cerebrovascular Maps

Comprehensive, contiguous visualizations of the main cerebral arteries and the surrounding parenchyma offer considerable potential for improving diagnostic workflows in cerebrovascular disease. Instead of manually navigating through computed tomography angiography volumes, e.g., in time-critical stroke assessment, a 2D overview would allow for rapid examination of vascular topology and lumen. Unfolding the brain vasculature into a 2D vessel map is, however, infeasible using the common Curved Planar Reformation (CPR) due to the circular structure of the Circle of Willis (CoW). Additionally, the spatial configuration of the vessels typically renders them unsuitable for mapping onto simple geometric primitives. We propose CeVasMap [1], a flexible mesh-based solution for mapping multiple vascular structures, including circular ones, and their surroundings into a two-dimensional representation. It extends the As-Rigid-As-Possible (ARAP) deformation algorithm by a smart initialization of the required 3D readout mesh, which is fitted to the CoW. Depending on the resulting degree of distortion, neighboring arteries can be merged directly into the same view; in cases of high distortion, they are instead unfolded individually and attached to the main structure, creating a textbook-style overview image. An extensive distortion analysis is provided for each vessel, comparing global and local gradient norms of the 2D-3D vector field of individual and merged unfoldings with their CPR representations. In addition to enabling the unfolding of circular structures and allowing more realistic curvature preservation, our method is on par with optimally oriented CPRs in terms of incurred distortion for individual vessel unfoldings, and comparable to unfavorable CPR orientations when merging the complete CoW, with a median distortion of 65 μm/mm. Compared to the row-wise constant distortion of CPR, unfolding with CeVasMap yields a high ratio of distortion-free image parts, while the distortions that do occur are concentrated close to the centerlines.

Leonhard Rist, Oliver Taubmann, Hendrik Ditt, Michael Sühling, Andreas Maier
Attention-guided Erasing
Novel Augmentation Method for Enhancing Downstream Breast Density Classification

The assessment of breast density is crucial in the context of breast cancer screening, especially in populations with a higher percentage of dense breast tissues. This study introduces a novel data augmentation technique termed attention-guided erasing (AGE), devised to enhance the downstream classification of four distinct breast density categories in mammography, following the BI-RADS recommendation, in a Vietnamese cohort. The proposed method integrates supplementary information during transfer learning, utilizing visual attention maps derived from a vision transformer backbone trained with the self-supervised DINO method. These maps are used to erase background regions in the mammogram images, unveiling only the potential areas of dense breast tissue to the network. By incorporating AGE during transfer learning with varying random probabilities, we consistently surpass the classification performance of scenarios without AGE and of the traditional random erasing transformation. We validate our methodology using the publicly available VinDr-Mammo dataset. Specifically, we attain a mean F1-score of 0.5910, outperforming the values of 0.5594 and 0.5691 obtained without AGE and with random erasing (RE), respectively. This superiority is further substantiated by t-tests, yielding p < 0.0001 and underscoring the statistical significance of our approach.

Adarsh Bhandary Panambur, Hui Yu, Sheethal Bhat, Prathmesh Madhu, Siming Bayer, Andreas Maier
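
A minimal sketch of the AGE idea, assuming a precomputed DINO attention map per image; the threshold, fill value, and probability handling are our assumptions, not the authors' exact implementation.

```python
# Attention-guided erasing: background pixels, as judged by a DINO attention
# map, are erased with probability p during training.
import torch

def attention_guided_erasing(img, attn, p=0.5, thresh=0.5, fill=0.0):
    """img: (C,H,W) mammogram; attn: (H,W) attention map scaled to [0,1]."""
    if torch.rand(()) < p:
        background = attn < thresh          # low-attention regions
        img = img.clone()
        img[:, background] = fill           # keep only dense-tissue candidates
    return img
```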
Appearance-based Debiasing of Deep Learning Models in Medical Imaging

Out-of-distribution data can substantially impede the performance of deep learning models. In medical imaging, domain shifts can, for instance, be caused by different image acquisition protocols. To address these domain shifts, domain adversarial training can be employed to constrain a model to domain-agnostic features. This, however, requires prior knowledge about the domain variable, which might not always be accessible. Recent approaches make use of control regions to guide the training process and thereby alleviate the need for prior domain knowledge. In this work, we combine these approaches with traditional domain adversarial training to exploit the benefits of both methods. We test the proposed method on two medical datasets and demonstrate performance increases of up to 10% compared to a baseline trained without debiasing.

Frauke Wilm, Marcel Reimann, Oliver Taubmann, Alexander Mühlberg, Katharina Breininger
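
For context, below is a minimal PyTorch gradient reversal layer, the standard building block of domain adversarial training mentioned above; this is the generic DANN component, not the authors' combined method with control regions.

```python
# Gradient reversal: identity in the forward pass, negated (scaled) gradient
# in the backward pass, so the feature extractor learns domain-agnostic
# features while a domain classifier is trained on top.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb=1.0):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # one gradient per forward input (x, lamb)
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)
```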
Abstract: Interpretable Medical Image Classification Using Prototype Learning and Privileged Information

Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainability alongside high performance. In this work, we investigate whether additional information available during the training process can be used to create an understandable and powerful model. We propose an innovative solution called Proto-Caps that leverages the benefits of capsule networks, prototype learning, and the use of privileged information [1]. This hierarchical architecture establishes a basis for inherent interpretability. The capsule layers allow for mapping human-defined visual attributes onto the encapsulated representation of the high-level features. Furthermore, an active prototype learning algorithm incorporates even more interpretability. As a result, Proto-Caps provides case-based reasoning with attribute-specific prototypes. Applied to the LIDC-IDRI dataset [2], Proto-Caps predicts the malignancy of lung nodules and also provides prototypical samples that are similar with regard to the nodules' spiculation, calcification, and six more visual features. Besides the additional interpretability, the proposed solution shows above-state-of-the-art prediction performance. Compared to the explainable baseline model, our method achieves more than 6% higher accuracy in predicting both malignancy (93.0%) and mean characteristic features of lung nodules. Relatively good results can also be achieved when using only 1% of the attribute labels during training. This result motivates further research, as it shows that Proto-Caps requires only a few additional annotations of human-defined attributes to yield an interpretable decision-making process. The code is publicly available at https://github.com/XRad-Ulm/Proto-Caps.

Luisa Gallée, Meinrad Beer, Michael Götz
Abstract: Robust Multi-contrast MRI Denoising using Trainable Bilateral Filters without Noise-free Targets

Magnetic resonance imaging (MRI) is widely acknowledged as one of the most diagnostically valuable and versatile medical imaging techniques available today, characterized by its exceptional soft tissue contrast, the absence of ionizing radiation, and the capability to acquire multiple different image contrasts. However, low signal-to-noise ratio (SNR) is a common challenge, particularly in low-field MRI scans, leading to reduced image quality and impaired diagnostic value. The effectiveness of traditional denoising methods, such as bilateral filters (BFs), heavily relies on the choice of hyperparameters. In contrast, deep learning approaches like convolutional neural networks (CNNs) are computationally demanding, require paired noisy and noise-free data for supervised learning, and often struggle to generalize to different magnetic resonance (MR) image contrasts. To bridge the gap between traditional denoising methods and deep learning, we employ a novel approach based on a neural network composed of trainable BF layers. This network is trained using an extended version of Stein's unbiased risk estimator (SURE) as a self-supervised loss function, which estimates the mean squared error (MSE) between the denoised image and the unknown noise-free ground truth by incorporating a noise level map [1]. Our experiments demonstrate the effectiveness of our self-supervised approach, with the BF network outperforming the CNN by 14.7% in terms of peak signal-to-noise ratio (PSNR) when tested on unseen MR image contrasts. In conclusion, our research introduces a novel approach to address noise reduction challenges in MRI, particularly in low-SNR scenarios and across different MR image contrasts. The combination of trainable BF layers and SURE-based model supervision holds potential for future research in medical imaging, as it eliminates the dependency on noise-free training data, demonstrating parameter efficiency, robustness, and enhanced diagnostic outcomes even in the presence of unseen MR image features.

Laura Pfaff, Fabian Wagner, Julian Hossbach, Elisabeth Preuhs, Mareike Thies, Felix Denzinger, Dominik Nickel, Tobias Wuerfl, Andreas Maier
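
A sketch of a Monte-Carlo SURE loss in its classical scalar-noise form; the paper's extended SURE additionally incorporates a noise level map, which is simplified to a single sigma here.

```python
# Monte-Carlo SURE: estimates the MSE to the unknown clean image from the
# noisy input alone, using a random probe vector to approximate the
# divergence term (classical estimator, simplified).
import torch

def mc_sure_loss(denoiser, y, sigma, eps=1e-3):
    """y: noisy batch (B,C,H,W); sigma: noise std. Returns SURE estimate."""
    n = y[0].numel()
    fy = denoiser(y)
    b = torch.randn_like(y)                     # random probe vector
    div = (b * (denoiser(y + eps * b) - fy)).flatten(1).sum(1) / eps
    sure = ((fy - y) ** 2).flatten(1).sum(1) - n * sigma ** 2 \
           + 2 * sigma ** 2 * div
    return sure.mean()
```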
Privacy-enhancing Image Sampling for the Synthesis of High-quality Anonymous Chest Radiographs

The development of well-performing deep learning-based algorithms for thoracic abnormality detection and classification relies on access to large-scale chest X-ray datasets. However, the presence of patient-specific biometric information in chest radiographs impedes direct and public sharing of such data for research purposes due to the potential risk of patient re-identification. In this context, synthetic data generation emerges as a solution for anonymizing medical images. In this study, we utilize a privacy-enhancing sampling strategy within a latent diffusion model to generate fully anonymous chest radiographs. We conduct a comprehensive analysis of the employed method and examine the impact of different privacy degrees. For each configuration, the resulting synthetic images exhibit a substantial level of data utility, with only a marginal gap compared to real data. Qualitatively, a Turing test conducted with six radiologists confirms the high and realistic appearance of the generated chest radiographs, achieving an average classification accuracy of 55% across 50 images (25 real, 25 synthetic).

Kai Packhäuser, Lukas Folle, Tri-Thien Nguyen, Florian Thamm, Andreas Maier
Segmentation-guided Medical Image Registration
Quality Awareness using Label Noise Correction

Medical image registration methods can strongly benefit from anatomical labels, which can be provided by segmentation networks at reduced labeling effort. Yet, label noise may adversely affect registration performance. In this work, we propose a quality-aware segmentation-guided registration method that handles such noisy, i.e., low-quality, labels by self-correcting them using Confident Learning. Utilizing NLST and in-house acquired abdominal MR images, we show that our proposed quality-aware method effectively addresses the drop in registration performance observed in quality-unaware methods. Our findings demonstrate that incorporating an appropriate label-correction strategy during training can reduce labeling efforts, consequently enhancing the practicality of segmentation-guided registration.

Varsha Raveendran, Veronika Spieker, Rickmer F. Braren, Dimitrios C. Karampinos, Veronika A. Zimmer, Julia A. Schnabel
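
One possible realization of the Confident Learning correction step, sketched here with the open-source cleanlab package applied voxel-wise; the authors' exact correction scheme may differ.

```python
# Self-correcting noisy segmentation labels with Confident Learning:
# voxels whose given label disagrees confidently with the model's predicted
# probabilities are flagged and replaced by the model prediction.
import numpy as np
from cleanlab.filter import find_label_issues

def correct_segmentation_labels(labels, pred_probs):
    """labels: (n_voxels,) int; pred_probs: (n_voxels, n_classes)."""
    issues = find_label_issues(labels, pred_probs,
                               return_indices_ranked_by="self_confidence")
    corrected = labels.copy()
    corrected[issues] = pred_probs[issues].argmax(axis=1)  # self-correct
    return corrected
```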
Displacement Representation for Conditional Point Cloud Registration
HeatReg Applied to 2D/3D Freehand Ultrasound Reconstruction

In this work, we create a point cloud-based framework based on Free Point Transformers (FPTs) for 2D/3D registration of untracked ultrasound (US) sweeps. Applications include outpatient follow-up assessments and intraoperative scenarios like ultrasound-guided navigation. Through a simple modification of the displacement prediction representation, we improve registration results by more than 25% w.r.t. prior work while preserving the model-free paradigm, maintaining the number of network parameters, and only marginally increasing computation time. Experiments on the SegThy dataset, featuring manually segmented anatomies on magnetic resonance (MR) scans of the thyroid gland area, demonstrate our method's effectiveness. We simulate numerous realistic ultrasound sweeps, aiming to register them back into the MR volume. Beyond its methodological contributions, our fast registration framework strives to enable clinically capable systems, advancing ultrasound-guided surgery.

Lasse Hansen, Jürgen Lichtenstein, Mattias P. Heinrich
Joint Learning of Image Registration and Change Detection for Lung CT Images

Intuitive visualization of relevant changes between radiological image pairs in the form of change maps has the potential not only to increase efficiency in diagnostic reading, but also to decrease the number of missed abnormalities. Classically, change maps are created from difference images after an image registration step, which requires a careful balance in order to neither generate artifacts nor disguise relevant changes. We propose jointly learning the registration and the change map in order to address these limitations. As a proof of concept, the method was tested on NLST lung CT images and synthetically generated data and shows results comparable to the conventional approach. In a reader study, the use of change maps resulted in a 23% reduction in reading time while maintaining similar recall.

Temke Kohlbrandt, Jan Moltz, Stefan Heldmann, Alessa Hering, Jan Lellmann
Abstract: Focused Unsupervised Image Registration for Structure-specific Population Analysis

Population-based analysis of medical images plays an essential role in the identification and development of imaging biomarkers. Most commonly, the focus lies on a single structure or image region in order to identify variations that discriminate between patient groups. In many applications, existing automatic segmentation tools or trained neural networks are used to identify relevant image structures. However, if new structures are to be analyzed, these approaches have the disadvantage that extensive manually segmented image data are required for development and training. Thus, in our paper [1], we focus on atlas-based segmentation methods for the analysis of image populations. Since, most frequently, high segmentation accuracy is only required in specific image regions while the accuracy in the remaining image area is of less importance, we propose an efficient ROI-based approach for unsupervised learning of deformable atlas-to-image registration to facilitate structure-specific analysis. The proposed approach features a multi-stage registration pipeline using a transformer-based architecture to perform atlas-to-image transfer at high resolution in the specified region of interest and at low resolution in the remaining image space. This reduces computational cost in terms of memory consumption, computation time, and energy consumption without significant accuracy loss in the region of interest. The proposed method was evaluated for predicting cognitive impairment from morphological changes of the hippocampal region in brain MRI images. We compare our approach with models trained on full-resolution and half-resolution images, as well as with a U-Net based registration network and iterative optimization-based registration methods. The experiments show that, next to the efficient processing of 3D data, our method delivers accurate registration results comparable to state-of-the-art segmentation tools. Furthermore, the proposed method better captures morphological changes in a desired region of interest, enabling better discrimination between different cohorts.

Jan Ehrhardt, Hristina Uzunova, Paul Kaftan, Julia Krüger, Roland Opfer, Heinz Handels
Abstract: Combined 3D Dataset for CT- and Point Cloud-based Intra-patient Lung Registration
Lung250M-4B

Intra-patient lung registration aims to find correspondences between lung images of different respiratory phases, aiding in, e.g., the diagnosis of COPD, the estimation of tumour motion in radiotherapy planning, or the tracking of lung nodules. With recent developments, deep learning-based methods are competing for the state-of-the-art in various image registration tasks. Additionally, geometric deep learning on point clouds, in particular learning-based point cloud registration, shows great potential regarding computational efficiency, robustness, and anonymity preservation. Publicly available image datasets for intra-patient lung registration, however, are often not sufficiently large to properly train deep learning methods or include primarily small motions, which transfer poorly to larger deformations. When purely using point cloud data, on the other hand, a fair comparison with state-of-the-art image-based registration methods is not possible, and for both, expert supervision is desirable. With Lung250M-4B [1], we present a dataset that aims to tackle these problems. It consists of 248 curated and pre-processed public multi-centric in- and expiratory lung CT scans from 124 patients with large motion between scans. It comprises the DIR-LAB COPDgene [2] data as test data, which is popularly used to evaluate registration methods. Moreover, for each image, corresponding vessel point clouds are provided. For supervision, vein and artery segmentations as well as thousands of image-derived keypoint correspondences are included. Multiple validation scan pairs are annotated with manual landmarks. With all of this, Lung250M-4B is the first dataset to enable a fair comparison between image- and point cloud-based registration methods, while consisting of significantly more image pairs than previous lung CT datasets, and it contains accurate correspondences for supervised learning. The download link for the data, processing scripts, and benchmark results is available at https://github.com/multimodallearning/Lung250M-4B.

Fenja Falta, Christoph Großbröhmer, Alessa Hering, Alexander Bigalke, Mattias P. Heinrich
Influence of Prompting Strategies on Segment Anything Model (SAM) for Short-axis Cardiac MRI Segmentation

The segment anything model (SAM) has recently emerged as a significant breakthrough in foundation models, demonstrating remarkable zero-shot performance in object segmentation tasks. While SAM is designed for generalization, it exhibits limitations in handling specific medical imaging tasks that require fine-structure segmentation or precise boundaries. In this paper, we focus on the task of cardiac magnetic resonance imaging (cMRI) short-axis view segmentation using the SAM foundation model. We conduct a comprehensive investigation of the impact of different prompting strategies (including bounding boxes, positive points, negative points, and their combinations) on segmentation performance. We evaluate on two public datasets using the baseline model and models fine-tuned with varying amounts of annotated data, ranging from a limited number of volumes to a fully annotated dataset. Our findings indicate that prompting strategies significantly influence segmentation performance. Combining positive points with either bounding boxes or negative points shows substantial benefits, but combining all prompt types simultaneously yields little to no additional benefit. We further observe that fine-tuning SAM with a few annotated volumes improves segmentation performance when properly prompted. Specifically, fine-tuning with bounding boxes has a positive impact, while fine-tuning without bounding boxes leads to worse results compared to the baseline.

Josh Stein, Maxime Di Folco, Julia A. Schnabel
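
For readers unfamiliar with SAM prompting, the strategies compared above map to the public segment-anything API roughly as follows; the checkpoint path, coordinates, and the dummy image are placeholders.

```python
# Two of the prompting strategies from the study, expressed with the
# segment-anything package.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a cMRI slice (HxWx3)
predictor.set_image(image)

box = np.array([40, 60, 120, 140])               # x0, y0, x1, y1 around the target
pos = np.array([[80, 100]])                      # positive click inside the target
neg = np.array([[30, 30]])                       # negative click on background

# bounding box + positive point (a combination found beneficial)
masks, scores, _ = predictor.predict(
    point_coords=pos, point_labels=np.array([1]),
    box=box, multimask_output=False)

# positive + negative points, no box
masks, scores, _ = predictor.predict(
    point_coords=np.concatenate([pos, neg]),
    point_labels=np.array([1, 0]), multimask_output=False)
```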
Abstracting Volumetric Medical Images with Sparse Keypoints for Efficient Geometric Segmentation of Lung Fissures with a Graph CNN

Volumetric image segmentation often relies on voxel-wise classification using 3D convolutional neural networks (CNNs). However, 3D CNNs are inefficient for detecting thin structures that make up a tiny fraction of the entire image volume. We propose a geometric deep learning framework that leverages the representation of the image as a keypoint (KP) cloud and segments it with a graph convolutional network (GCN). From the sparse point segmentations, 3D meshes of the objects are reconstructed to obtain a dense surface. The method is evaluated for the lung fissure segmentation task on two public data sets of thorax CT images and compared to the nnU-Net as the current state-of-the-art 3D CNN-based method. Our method achieves fast inference times through the sparsity of the point cloud representation while maintaining accuracy. We measure a 34× speed-up at 1.5× the nnU-Net's error with Förstner KPs and a 6× speed-up at 1.3× the error with pre-segmentation KPs.

Paul Kaftan, Mattias P. Heinrich, Lasse Hansen, Volker Rasche, Hans A. Kestler, Alexander Bigalke
Advanced Deep Learning for Skin Histoglyphics at Cellular Level

In dermatology, the histological examination of skin cross-sections is essential for skin cancer diagnosis and treatment planning. However, complete coverage of tissue abnormalities is not possible due to time constraints as well as the sheer number of cell groups. We present an automatic segmentation approach for seven tissue classes (vessels, perspiration glands, hair follicles, sebaceous glands, tumor tissue, epidermis, and fatty tissue) for fast processing of the large datasets. The size of the data lends itself to the use of patch-based deep learning models, resulting in a good IoU score of 94.2% for the cancerous tissue and an overall IoU score of 83.6%.

Robert Kreher, Naveeth Reddy Chitti, Georg Hille, Janine Hürtgen, Miriam Mengonie, Andreas Braun, Thomas Tüting, Bernhard Preim, Sylvia Saalfeld
Combining Image- and Geometric-based Deep Learning for Shape Regression
Comparison to Pixel-level Methods for Segmentation in Chest X-ray

When solving a segmentation task, shape-based methods can be beneficial compared to pixel-wise classification due to their geometric understanding of the target object as a shape, preventing the generation of anatomically implausible predictions, in particular for corrupted data. In this work, we propose a novel hybrid method that combines a lightweight CNN backbone with a geometric neural network (Point Transformer) for shape regression. Using the same CNN encoder, the Point Transformer reaches segmentation quality on par with current state-of-the-art convolutional decoders (4 ± 1.9 vs. 3.9 ± 2.9 error in mm and 85 ± 13 vs. 88 ± 10 Dice) but, crucially, is more stable w.r.t. image distortion, starting to outperform them at a corruption level of 30%.

Ron Keuth, Mattias P. Heinrich
Abstract: Multi-dataset Approach to Medical Image Segmentation
MultiTalent

The medical imaging community generates a wealth of datasets, many of which are openly accessible and annotated for specific diseases and tasks such as multi-organ or lesion segmentation. Current practices continue to limit model training and supervised pre-training to one or a few similar datasets, neglecting the synergistic potential of other available annotated data. We propose MultiTalent, a method that leverages multiple CT datasets with diverse and conflicting class definitions to train a single model for comprehensive structure segmentation [1]. Our results demonstrate systematically improved segmentation performance compared to previous related approaches, and also compared to single-dataset training using state-of-the-art methods, especially for lesion segmentation and other challenging structures. We show that MultiTalent also represents a powerful foundation model that offers superior pre-training for various segmentation tasks compared to commonly used supervised or unsupervised pre-training baselines. Our findings offer a new direction for the medical imaging community to effectively utilize the wealth of available data for improved segmentation performance. The code and model weights are publicly available: https://github.com/MIC-DKFZ/MultiTalent.

Constantin Ulrich, Fabian Isensee, Tassilo Wald, Maximilian Zenk, Michael Baumgartner, Klaus H. Maier-Hein
Abstract: 3D Medical Image Segmentation with Transformer-based Scaling of ConvNets
MedNeXt

Transformer-based architectures have recently seen widespread adoption for medical image segmentation. However, achieving performance equivalent to that in natural images is challenging due to the absence of large-scale annotated datasets. In contrast, convolutional networks have higher inductive biases and, consequently, are easier to train to high performance. Recently, the ConvNeXt architecture attempted to improve the standard ConvNet by upgrading the popular ResNet blocks to mirror Transformer blocks. In this work, we build upon this to design MedNeXt, a modernized, Transformer-inspired, scalable large-kernel convolutional architecture customized to the challenges of dense segmentation tasks in data-scarce medical settings, with 4 key features: 1) a fully ConvNeXt 3D encoder-decoder architecture to leverage network-wide benefits of the block design, 2) residual ConvNeXt blocks for up- and downsampling to preserve semantic richness across scales, 3) UpKern, an algorithm to iteratively increase kernel size by upsampling small-kernel networks, thus preventing performance saturation on limited data, and 4) compound scaling of depth, width, and kernel size to leverage the benefits of large-scale variants of the MedNeXt architecture. With state-of-the-art performance on 4 popular segmentation tasks, across variations in imaging modalities (CT, MRI) and dataset sizes, MedNeXt represents a modernized deep architecture for medical image segmentation. This work was originally published in [1]. Our code is made publicly available at https://github.com/MIC-DKFZ/MedNeXt.

Saikat Roy, Gregor Koehler, Michael Baumgartner, Constantin Ulrich, Fabian Isensee, Paul F. Jaeger, Klaus Maier-Hein
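
The UpKern initialization can be sketched in a few lines; this is our simplified single-layer reading of the idea (trilinear upsampling of trained small-kernel weights), not the released implementation.

```python
# Initialize a large-kernel conv from a trained small-kernel one by
# trilinearly upsampling its weights; all other network weights are copied.
import torch.nn as nn
import torch.nn.functional as F

def upkern_init(large_conv: nn.Conv3d, small_conv: nn.Conv3d):
    """Copy small_conv weights (e.g. 3x3x3) into large_conv (e.g. 5x5x5)."""
    w = small_conv.weight.data                       # (out, in, k, k, k)
    k_large = large_conv.weight.shape[2:]
    w_up = F.interpolate(w, size=k_large, mode="trilinear", align_corners=False)
    large_conv.weight.data.copy_(w_up)
    if small_conv.bias is not None:
        large_conv.bias.data.copy_(small_conv.bias.data)

# usage: grow a 3x3x3 layer into a 5x5x5 layer
upkern_init(nn.Conv3d(32, 32, 5, padding=2), nn.Conv3d(32, 32, 3, padding=1))
```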
Abstract: Baseline Pipeline for Automated Eye Redness Extraction with Relation to Clinical Grading

An essential biomarker for detecting ocular surface diseases like dry eye disease is ocular redness. In clinical routine, this marker is graded by visual comparison with reference image scales. We aim to support clinicians in this time-consuming and subjective task by determining a redness score from images obtained with a novel device for standardized ocular surface photography (Cornea Dome Lens, Occyo GmbH, Innsbruck, Austria). To this end, in a previous work [1], we presented a baseline pipeline to automatically determine eye redness. Regions of interest were cropped from the recordings based on the iris center and split up into smaller square sub-regions called tiles. Each of these tiles was classified by a machine learning model, and the redness was extracted for the relevant regions. Using the pipeline, images from 36 healthy and 37 pathological eyes were divided into 5840 tiles (80 per eye). A typical 80/10/10% split was used as training, validation, and test set, respectively, to train the machine learning model. Here, the random forest model employed in the baseline was replaced by a deep learning model (ResNet50) to improve performance. This model showed an accuracy of 0.920 and an F1-score of 0.919 on the test data set, compared to an accuracy of 0.856 and an F1-score of 0.855 for the random forest [2]. In a follow-up work, we were able to relate the resulting redness scores to gradings from clinicians [3]; a positive relation between the scores and the gradings was observed. In the future, we will expand our data set and include more features (e.g., vessel density) to define a meaningful indicator for eye redness grading, which can be used as support in the clinical routine.

Philipp Ostheimer, Arno Lins, Bernhard Steger, Vito Romano, Marco Augustin, Daniel Baumgarten
Abstract: Reducing Domain Shift in Deep Learning for OCT Segmentation using Image Manipulations

Medical segmentation of optical coherence tomography (OCT) images using deep neural networks (DNNs) has been intensively studied in recent years, but generalization across datasets from different OCT devices is still a considerable challenge. In this work, we focus on the novel self-examination low-cost full-field (SELFF)-OCT, a handheld imaging device for home-monitoring of retinopathies, and the clinically used Spectralis-OCT. Images from both devices exhibit different characteristics, leading to different representations within DNNs and consequently to a reduced segmentation quality when switching between devices. To robustly segment OCT images from an OCT scanner unseen during training, we alter the appearance of the images using manipulation methods ranging from traditional data augmentation to noise-based methods to learning-based style transfer methods. We evaluate the effect of the manipulation methods with respect to segmentation quality and changes in the feature space of the DNN. Reducing the domain shift with style transfer methods results in a significantly better segmentation of pigment epithelial detachment (PED). We evaluate the obtained segmentation networks qualitatively using t-SNE and quantitatively by measuring the univariate Wasserstein distance between feature representations across domains. We find that the segmentation quality of PED is negatively correlated with the distance between training and test distributions. To obtain the best segmentation performance, we find that style transfer should be applied either at train or at test time (but not at both), depending on which domain is used for training. Our methods and results help researchers choose and evaluate image manipulation methods for developing OCT segmentation models that are robust against domain shifts. This paper was accepted and will be presented at SPIE Computer-Aided Diagnosis 2024 [1].

Marc S. Seibel, Joshua Niemeijer, Marc Rowedder, Helge Sudkamp, Timo Kepp, Gereon Hüttmann, Heinz Handels
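
The quantitative domain-shift measure mentioned above can be illustrated as follows; averaging over feature channels is our own aggregation choice.

```python
# Univariate Wasserstein distance between DNN feature activations from two
# domains, computed per channel and averaged.
import numpy as np
from scipy.stats import wasserstein_distance

def feature_domain_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_*: (num_samples, num_channels) pooled feature activations."""
    dists = [wasserstein_distance(feats_a[:, c], feats_b[:, c])
             for c in range(feats_a.shape[1])]
    return float(np.mean(dists))
```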
Neural Implicit k-space with Trainable Periodic Activation Functions for Cardiac MR Imaging

In MRI reconstruction, the neural implicit k-space (NIK) representation maps spatial frequencies to k-space intensity values using an MLP with periodic activation functions. However, the choice of hyperparameters for the periodic activation functions is challenging and influences training stability. In this work, we introduce and study the effectiveness of trainable (non-)periodic activation functions for NIK in the context of non-Cartesian cardiac MRI. Evaluated on 42 radially sampled datasets from 6 subjects, NIKs with the proposed trainable activation functions qualitatively and quantitatively outperform other state-of-the-art reconstruction methods, including NIK with fixed periodic activation functions.

Patrick T. Haft, Wenqi Huang, Gastao Cruz, Daniel Rueckert, Veronika A. Zimmer, Kerstin Hammernik
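
A sketch of a SIREN-style activation with a trainable frequency, illustrating the core idea of learning the periodic activation's hyperparameter jointly with the network; layer sizes and the initialization value are assumptions.

```python
# Trainable periodic activation for a neural implicit k-space MLP: the
# frequency omega is a learnable parameter instead of a hand-tuned constant.
import torch
import torch.nn as nn

class TrainableSine(nn.Module):
    def __init__(self, omega_init: float = 30.0):
        super().__init__()
        self.omega = nn.Parameter(torch.tensor(omega_init))

    def forward(self, x):
        return torch.sin(self.omega * x)

# maps (kx, ky, t) coordinates to a complex k-space value (2 output channels)
nik = nn.Sequential(nn.Linear(3, 256), TrainableSine(),
                    nn.Linear(256, 256), TrainableSine(),
                    nn.Linear(256, 2))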
Effect of Training Epoch Number on Patient Data Memorization in Unconditional Latent Diffusion Models

Deep diffusion models hold great promise for open data sharing while preserving patient privacy by utilizing synthetic high-quality data as surrogates for real patient data. Despite the promise, such models are also prone to patient data memorization, where generative models synthesize patient data copies instead of novel samples. This can compromise patient privacy and further lead to patient re-identification. Given the risks, it is of considerable importance to investigate the reasons underlying memorization in such models. One aspect that is typically ignored is the number of training epochs, as over-training a model can lead to memorization. Here, we evaluate the effect of over-training on memorization. We train diffusion models on a publicly available chest X-ray dataset for varying numbers of epochs and detect patient data copies among the synthesized samples using self-supervised models. Our results suggest that over-training can result in enhanced data memorization and that it is an important aspect to consider when training generative models.

Salman U. Hassan Dar, Isabelle Ayx, Marie Kapusta, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
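
A sketch of how copy detection via self-supervised embeddings can work; the percentile-based flagging rule is ours, not the paper's exact criterion.

```python
# Flag synthetic samples whose nearest training neighbor in a
# self-supervised embedding space is unusually close (potential copies).
import numpy as np

def flag_copies(train_emb: np.ndarray, synth_emb: np.ndarray,
                percentile: float = 5.0) -> np.ndarray:
    """Embeddings: (n, d), L2-normalized. Returns a boolean mask over synth."""
    sims = synth_emb @ train_emb.T                   # cosine similarities
    nn_sim = sims.max(axis=1)                        # closest training sample
    tau = np.percentile(nn_sim, 100 - percentile)    # flag the top similarities
    return nn_sim >= tau
```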
Exploring GPT-4 as MR Sequence and Reconstruction Programming Assistant
GPT4MR

In this study, we explore the potential of the generative pre-trained transformer (GPT) as a coding assistant for MRI sequence programming using the Pulseq framework. The programming of MRI sequences is traditionally a complex and time-consuming task, and the Pulseq standard has recently simplified this process by allowing researchers to define and generate complex pulse sequences used in MRI experiments. Leveraging GPT-4's capabilities in natural language generation, we adapted it for MRI sequence programming, creating a specialized assistant named GPT4MR. Our tests involved generating various MRI sequences, revealing that GPT-4, guided by a tailored prompt, outperformed GPT-3.5, producing fewer errors and demonstrating improved reasoning. Despite limitations in handling complex sequences, GPT4MR corrected its own errors and successfully generated code with step-by-step instructions. The study showcases GPT4MR's ability to accelerate MRI sequence development, even for novel ideas absent from its training set. While further research and improvements are needed to address complexity limitations, a well-designed prompt enhances performance. The findings propose GPT4MR as a valuable MRI sequence programming assistant, streamlining prototyping and development. A future prospect is the integration of a PyPulseq plugin into lightweight, open-source LLMs, potentially revolutionizing MRI sequence development and prototyping.

Moritz Zaiss, Junaid R. Rajput, Hoai N. Dang, Vladimir Golkov, Daniel Cremers, Florian Knoll, Andreas Maier
Abstract: Understanding Silent Failures in Medical Image Classification

To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved either by designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs on four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain. The open-source benchmark and tool are available at https://github.com/IML-DKFZ/sf-visuals [1].

Till J. Bungert, Levin Kobelke, Paul F. Jaeger
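
As a concrete example of one simple CSF from the benchmarked family, here is maximum softmax response together with the resulting notion of a silent failure; the threshold value is illustrative.

```python
# Maximum softmax response as a confidence scoring function (CSF); a silent
# failure is a wrong prediction whose confidence exceeds the deployment
# threshold.
import numpy as np

def max_softmax_csf(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)                             # confidence per case

def silent_failures(logits, labels, threshold=0.9):
    conf = max_softmax_csf(logits)
    wrong = logits.argmax(axis=1) != labels
    return wrong & (conf >= threshold)               # confident but wrong
```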
Abstract: Advancing Large-scale Deformable 3D Registration with Differentiable Volumetric Rasterisation of Point Clouds
Chasing Clouds

3D point clouds are an efficient and privacy-preserving representation of medical scans, highly suitable for complex segmentation and registration tasks. Yet, current loss functions for training self-supervised geometric networks are insufficient to handle large-scale clouds and provide robust derivatives. Here, we present Differentiable Volumetric Rasterisation of Point Clouds (DiVRoC), which overcomes those limitations and provides highly accurate learning- or optimisation-based deformable 3D registration [1]. The key contribution is the derivation of a reverse grid-sampling operation with gradients for the motion vectors that can rapidly transform between grid-based volumetric and sparse point representations. It enables scalable regularisation and loss computation on 3D point clouds with >100k points, being orders of magnitude faster than a Chamfer loss. The concept includes geometric registration networks that can be robustly trained in an unsupervised fashion and act on sparser point clouds, followed by a regularisation model that enables extrapolation for high-resolution distance metrics. Our experiments on the challenging PVT1010 lung dataset [2], which includes large motion of COPD patients between inspiration and expiration, demonstrate state-of-the-art accuracies for training a PointPWC-Net and/or alignment based on Adam instance optimisation. The model reduces registration errors to approx. 2.4 mm and runs on very large point clouds in one second. The DiVRoC module can also be used to learn shape models for 3D surfaces [3]. Implementation details for using DiVRoC as a drop-in replacement for point distances, a new out-of-domain dataset for evaluation, and demos for real-time inference can be found at https://github.com/mattiaspaul/ChasingClouds.

Mattias P. Heinrich, Alexander Bigalke, Christoph Großbröhmer, Lasse Hansen
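
The reverse grid-sampling idea can be sketched via autograd: splatting point values into a volume is the adjoint of trilinear grid sampling, so it can be obtained by differentiating F.grid_sample. This is our simplified reading of the concept, not a verified copy of the released module.

```python
# Volumetric rasterisation of points as the adjoint of grid sampling,
# obtained "for free" from autograd; gradients flow back to the point
# coordinates (motion vectors) via double backward.
import torch
import torch.nn.functional as F

def rasterise(values, points, shape=(1, 1, 64, 64, 64)):
    """values: (1,C,1,1,N) point features; points: (1,1,1,N,3) in [-1,1]^3.
    Returns a (1,C,D,H,W) volume, differentiable w.r.t. values and points."""
    volume = torch.zeros(shape, requires_grad=True)
    # sampling a zero volume at the points and backpropagating the point
    # values implements the transposed operation: trilinear splatting
    sampled = F.grid_sample(volume, points, align_corners=False)
    splat, = torch.autograd.grad(sampled, volume, grad_outputs=values,
                                 create_graph=True)
    return splat
```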
Harmonized Import of Clinical Research Data for the Open Source Image Analysis Platform Kaapana

While the DICOM standard facilitates a consistent approach to image data, integrating clinical and patient data from unstructured formats into medical image analysis platforms remains a complex challenge. To address this problem, we propose a web-based tool for interactively harmonizing semi-structured data tables and facilitating their integration into image analysis platforms such as Kaapana. Harmonization is performed with respect to a given schema. The approach supports researchers throughout the data lifecycle by enabling the interactive creation of migration scripts to extend the life of data in changing environments. The proposed tool helps researchers enhance data utilization in the medical field by making unharmonized data available. Despite its potential, the proposed solution has limitations when handling large data sets and faces potential security issues due to the use of JavaScript. Nevertheless, it offers considerable benefits by assisting in data harmonization, enabling the use of data from various sources, and thereby reducing costs by eliminating the need for redundant data collection.

Lucas Kulla, Philipp Schader, Klaus Maier-Hein, Marco Nolden
Interactive Exploration of Conditional Statistical Shape Models in the Web-browser
exploreCOSMOS

Statistical shape models of faces and various body parts are heavily used in medical image analysis, computer vision, and visualization. Whilst the field is well explored with many existing tools, all of them aim at experts, which limits their applicability. We demonstrate the first tool that enables the convenient exploration of statistical shape models in the browser, with the capability to manipulate the faces in a targeted manner. This manipulation is performed via a posterior model given partial observations. We release our code and application on GitHub: https://github.com/maximilian-hahn/exploreCOSMOS.

Maximilian Hahn, Bernhard Egger
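
The posterior model given partial observations follows standard Gaussian conditioning for statistical shape models; in our notation (an assumption, not necessarily the paper's), with shape s = μ + Qα, prior α ~ N(0, I), observed components indexed by o, and observation noise variance σ²:

```latex
% Posterior over the shape coefficients given a partial, noisy observation s_o
\alpha_{\mathrm{post}} = \left(Q_o^{\top} Q_o + \sigma^2 I\right)^{-1} Q_o^{\top} (s_o - \mu_o),
\qquad
\Sigma_{\mathrm{post}} = \sigma^2 \left(Q_o^{\top} Q_o + \sigma^2 I\right)^{-1}
```

Sampling α from this posterior and evaluating s = μ + Qα yields plausible shape completions consistent with the manipulated or observed points.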
Abstract: Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection

Data augmentation (DA) is a key factor in medical image analysis, such as in prostate cancer (PCa) detection on magnetic resonance images. State-of-the-art computer-aided diagnosis systems still rely on simplistic spatial transformations to preserve the pathological label post-transformation. However, such augmentations do not substantially increase the organ and tumor shape variability in the training set, limiting the model's generalization ability. We propose a new anatomy-informed transformation that leverages information from adjacent organs to simulate typical physiological deformations of the prostate and generates unique lesion shapes without altering their label. Due to its lightweight computational requirements, it can be easily integrated into common DA frameworks. We demonstrate the effectiveness of our augmentation on a dataset of 774 biopsy-confirmed examinations by evaluating a state-of-the-art method for PCa detection with different augmentation settings [1].

Balint Kovacs, Nils Netzer, Michael Baumgartner, Carolin Eith, Dimitrios Bounias, Clara Meinzer, Paul F. Jäger, Kevin S. Zhang, Ralf Floca, Adrian Schrader, Fabian Isensee, Regula Gnirs, Magdalena Görtz, Viktoria Schütz, Albrecht Stenzinger, Markus Hohenfellner, Heinz-Peter Schlemmer, Ivo Wolf, David Bonekamp, Klaus H. Maier-Hein
Abstract: Reformulating COPD Classification on Chest CT Scans as Anomaly Detection using Contrastive Representations
cOOpD

Classification of heterogeneous diseases is challenging due to their complexity and the variability of symptoms and imaging findings. Chronic obstructive pulmonary disease (COPD) is a prime example, being underdiagnosed despite being the third leading cause of death. Its sparse, diffuse, and heterogeneous appearance on computed tomography challenges supervised binary classification. We reformulate COPD binary classification as an anomaly detection task, proposing cOOpD: heterogeneous pathological regions are detected as out-of-distribution (OOD) from normal homogeneous lung regions. To this end, we learn representations of unlabeled lung regions employing a self-supervised contrastive pretext model, potentially capturing specific characteristics of diseased and healthy unlabeled regions. A generative model then learns the distribution of healthy representations and identifies abnormalities (stemming from COPD) as deviations. Patient-level scores are obtained by aggregating region OOD scores. We show that cOOpD achieves the best performance on two public datasets, with an increase of 8.2% and 7.7% in terms of AUROC compared to the previous supervised state-of-the-art. Additionally, cOOpD yields well-interpretable spatial anomaly maps and patient-level scores, which we show to be of additional value in identifying individuals in an early stage of progression. Experiments in artificially designed real-world prevalence settings further support that anomaly detection is a powerful way of tackling COPD classification [1].

Silvia D. Almeida, Carsten T. Lüth, Tobias Norajitra, Tassilo Wald, Marco Nolden, Paul F. Jäger, Claus P. Heussel, Jürgen Biederer, Oliver Weinheimer, Klaus H. Maier-Hein
Abstract: Handling Label Uncertainty on the Example of Automatic Detection of Shepherd’s Crook RCA in Coronary CT Angiography

Coronary artery disease (CAD) is often treated minimally invasively with a catheter being inserted into the diseased coronary vessel. If a patient exhibits a shepherd's crook (SC) right coronary artery (RCA), an anatomical norm variant of the coronary vasculature, the complexity of this procedure is increased. Automated reporting of this variant from coronary CT angiography screening would ease prior risk assessment. We propose a 1D convolutional neural network that leverages a sequence of residual dilated convolutions to automatically determine this norm variant from a previously extracted vessel centerline. As the SC RCA is not clearly defined with respect to concrete measurements, labeling also includes qualitative aspects. Consequently, 4.23% of the samples in our dataset of 519 RCA centerlines were labeled as unsure SC RCAs, with 5.97% labeled as sure SC RCAs. We explore measures to handle this label uncertainty, namely global/model-wise random assignment, exclusion, and soft label assignment. Furthermore, we evaluate how this uncertainty can be leveraged to determine a rejection class. With our best configuration, we reach an area under the receiver operating characteristic curve (AUC) of 0.938 on confident labels. Moreover, we observe an increase of up to 0.020 AUC when rejecting 10% of the data and leveraging the labeling uncertainty information in the exclusion process [1].

Felix Denzinger, Michael Wels, Oliver Taubmann, Florian Kordon, Fabian Wagner, Stephanie Mehltretter, Mehmet A. Gülsün, Max Schöbinger, Florian André, Sebastian Buß, Johannes Görich, Michael Sühling, Andreas Maier, Katharina Breininger
Segment-wise Evaluation in X-ray Angiography Stenosis Detection

X-ray coronary angiography is the gold standard imaging modality for the assessment of coronary artery disease (CAD). The SYNTAX score is a recommended instrument for therapy decision-making and predicts the post-procedural risk associated with the two revascularization strategies: percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG). The score requires expert assessment and manual measurements of coronary angiograms for stenosis characterization. In this work, we propose a deep learning workflow for automated stenosis detection to facilitate the calculation of the SYNTAX score. We use a region-based convolutional neural network for object detection, fine-tuned on a public dataset consisting of angiography frames with annotated stenotic regions. The model is evaluated on angiographic video sequences of complex CAD patients from the German Heart Center of the Charité University Hospital (DHZC), Berlin. We provide a customized graphical tool for cardiac experts that allows correction and segment annotation of the detected stenotic regions. The model reached a precision of 78.39% in the frame-wise object detection task on the clinical dataset. For the task of predicting the presence of coronary stenoses at the patient level, the model achieved a sensitivity of 49.55% for stenoses of all degrees and 59.18% for stenoses of relevant degrees (>75%). The results suggest that our stenosis detection tool can facilitate the visual assessment of CAD in angiography data and encourage further development towards a fully automated calculation of the SYNTAX score.

Antonia Popp, Alaa Abd El Al, Marie Hoffmann, Ann Laube, Peter McGranaghan, Volkmar Falk, Anja Hennemuth, Alexander Meyer
Automated Mitotic Index Calculation via Deep Learning and Immunohistochemistry

The volume-corrected mitotic index (M/V-Index) has demonstrated prognostic value in invasive breast carcinomas. However, despite its prognostic significance, it is not established as the standard method for assessing aggressive biological behaviour, due to the high additional workload associated with determining the epithelial proportion. In this work, we show that a deep learning pipeline, trained solely with an annotation-free, immunohistochemistry-based approach, provides accurate estimates of epithelial segmentation in canine mammary carcinomas. We compare our automatic framework with the manually annotated M/V-Index in a study with three board-certified pathologists. Our results indicate that the deep learning-based pipeline shows expert-level performance while providing time efficiency and reproducibility.

Jonas Ammeling, Moritz Hecker, Jonathan Ganz, Taryn A. Donovan, Robert Klopfleisch, Christof A. Bertram, Katharina Breininger, Marc Aubreville
Abstract: Radiomics Processing Toolkit
Role of Feature Computation on Prediction Performance

Radiomics focuses on extracting and analyzing quantitative features from medical images. Standardizing radiomics is difficult due to variations across studies and centers, making it challenging to identify optimal techniques for any given application. Recent works (WORC, Autoradiomics) [1, 2] have introduced radiomics-based frameworks for automated pipeline optimization. Both approaches span the entire workflow, enabling consistent and reproducible radiomics analyses. In contrast, finding ideal solutions for the feature extraction and feature selection components has received less attention. Therefore, we propose the Radiomics Processing Toolkit (RPTK) [3], which adds comprehensive feature extraction and selection components from PyRadiomics and the Medical Image Radiomics Processor (MIRP) to the radiomics pipeline. We compared RPTK with results from WORC and Autoradiomics on six different public benchmark data sets. We demonstrate significantly improved performance by incorporating the proposed feature processing and selection techniques across all datasets. Additionally, the choice of the feature extractor significantly enhances prediction performance. Our results provide additional guidance in selecting suitable components for optimized radiomics analyses.

Jonas R. Bohn, Christian M. Heidt, Silvia D. Almeida, Lisa Kausch, Michael Götz, Marco Nolden, Petros Christopoulos, Stephan Rheinheimer, Alan A. Peters, Oyunbileg von Stackelberg, Hans-Ulrich Kauczor, Klaus H. Maier-Hein, Claus P. Heußel, Tobias Norajitra
Towards Unified Multi-modal Dataset Creation for Deep Learning Utilizing Structured Reports

The unification of electronic health records promises interoperability of medical data. Divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality, among other factors, pose significant challenges to the integration of expansive datasets, especially across institutions. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization is of paramount importance. Leveraging the DICOM standard, we designed a data integration and filter tool that streamlines the creation of multi-modal datasets. This ensures that datasets from various locations consistently maintain a uniform structure. We enable the concurrent filtering of DICOM data (i.e. images and waveforms) and corresponding annotations (i.e. segmentations and structured reports) in a graphical user interface. The graphical interface as well as example structured report templates are openly available at https://github.com/Cardio-AI/fl-multi-modal-dataset-creation.

Malte Tölle, Lukas Burger, Halvar Kelm, Sandy Engelhardt
Abstract: Comprehensive Multi-domain Dataset for Mitotic Figure Detection

The density of mitotic figures is a well-established diagnostic marker for tumor malignancy across many tumor types and species. At the same time, the identification of mitotic figures in hematoxylin and eosin-stained tissue slices is known to have a high inter-rater variability, reducing its reproducibility. Hence, mitotic figure identification in tumor tissue is a task worth automating using deep learning models. Additionally, there is high variability in tissue across labs, tumor types, and scanning devices, which leads to a covariate domain shift responsible for reducing the performance of many models. To provide a data foundation for the investigation of robustness and the training of robust mitotic figure recognition models alike, we introduced the MIDOG++ dataset [1]. The dataset builds on the training data sets of the MIDOG 2021 and 2022 MICCAI challenges and extends them by two additional tumor types. In total, the dataset features regions of interest with a size of 2 mm² from 503 histological specimens across seven different tumor types (breast carcinoma, lung carcinoma, lymphosarcoma, neuroendocrine tumor, cutaneous mast cell tumor, cutaneous melanoma, and (sub)cutaneous soft tissue sarcoma). The annotation database, created from a consensus of three pathologists, aided by a machine learning algorithm to reduce the risk of missing mitotic figures, contains in total 11,937 mitotic figures. In our paper, we have demonstrated that there is a considerable domain gap between individual domains, but also that a combination of multiple domains yields robust mitotic figure detectors across tumor types and scanners.

Marc Aubreville, Frauke Wilm, Nikolas Stathonikos, Katharina Breininger, Taryn A. Donovan, Samir Jabari, Robert Klopfleisch, Mitko Veta, Jonathan Ganz, Jonas Ammeling, Paul J. van Diest, Christof A. Bertram
Assessment of Scanner Domain Shifts in Deep Multiple Instance Learning

Deep multiple instance learning is a popular method for classifying whole slide images, but it remains unclear how robust such models are against scanner-induced domain shifts. In this work, we studied this problem based on the classification of the mutational status of the c-Kit gene from whole slide images of canine mast cell tumors obtained with three different scanners. Furthermore, we investigated the possibility of utilizing image augmentation during feature extraction to overcome domain shifts. Our findings suggest that a notable domain shift exists between models trained on different scanners. Nevertheless, the use of image augmentations during feature extraction failed to address this domain shift and had no positive effect on in-domain performance.

Jonathan Ganz, Chloé Puget, Jonas Ammeling, Eda Parlak, Matti Kiupel, Christof A. Bertram, Katharina Breininger, Robert Klopfleisch, Marc Aubreville
Few Shot Learning for the Classification of Confocal Laser Endomicroscopy Images of Head and Neck Tumors

The surgical removal of head and neck tumors requires safe margins, which are usually confirmed intraoperatively by means of frozen sections. This method is, in itself, an oversampling procedure, which has a relatively low sensitivity compared to the definitive tissue analysis on paraffin-embedded sections. Confocal laser endomicroscopy (CLE) is an in-vivo imaging technique that has shown its potential in the live optical biopsy of tissue. An automated analysis of this notoriously hard-to-interpret modality would help surgeons. However, CLE images show a wide variability of patterns, caused by individual factors but also, and most strongly, by the anatomical structures of the imaged tissue, making this a challenging pattern recognition task. In this work, we evaluate four popular few-shot learning (FSL) methods with respect to their capability of generalizing to unseen anatomical domains in CLE images. We evaluate this on images of sinonasal tumors (SNT) from five patients and on images of the vocal folds (VF) from 11 patients using a cross-validation scheme. The best respective approach reached a median accuracy of 79.6% on the rather homogeneous VF dataset, but only 61.6% on the highly diverse SNT dataset. Our results indicate that FSL on CLE images is viable, but strongly affected by the number of patients, as well as the diversity of anatomical patterns.

Marc Aubreville, Zhaoya Pan, Matti Sievert, Jonas Ammeling, Jonathan Ganz, Nicolai Oetter, Florian Stelzle, Ann-Kathrin Frenken, Katharina Breininger, Miguel Goncalves
Computational Ontology and Visualization Framework for the Visual Comparison of Brain Atrophy Profiles

Alzheimer’s disease (AD) accounts for more than two-thirds of all dementia cases. Existing MRI volumetry tools summarize pathology found within brain MRI scans. However, they often lack methods for aggregating information at different brain abstraction levels as well as intuitive visualizations. We propose a computational pipeline for quantifying hierarchical volumetric deviations and generating interactive summary visualizations. We collected N=3115 MRI scans from five different data cohorts. We used the FastSurferCNN tool to obtain brain region segmentations and estimate their raw volumes. First, we created a semantic model, encoding hierarchical anatomical relationships in a web ontology language (OWL) model, and a computational framework for aggregating volumetric deviations. Second, we developed a visualization framework providing interactive visual ‘sunburst’ summary plots. The summary plots can highlight group-mean or single-subject atrophy profiles, enhancing the visual comparison of atrophy profiles across different AD phases. Our pipeline could assist clinicians in discovering brain pathologies or subgroups in an interpretable and reliable manner.

Devesh Singh, Martin Dyrba
Abstract: Denoising of Home OCT Images using Noise-to-noise Trained on Artificial Eye Data

Optical coherence tomography (OCT) has become established as an essential part of the diagnosis, monitoring and treatment programs of patients suffering from wet age-related macular degeneration (AMD). To further improve disease progression monitoring and just-in-time therapy, home OCT devices such as the innovative self-examination low-cost full-field OCT (SELFF-OCT) are being developed, enabling self-examination by patients due to their technical simplicity and cost efficiency, but coming at the cost of reduced image quality indicated by a low signal-to-noise ratio (SNR). Although deep learning denoising methods based on convolutional neural networks (CNN) or generative adversarial networks (GAN) achieve state-of-the-art denoising performance in improving the SNR for better image interpretability, they usually require noise-free images for training, which are not available for OCT imaging or can only be approximated by repeated scanning followed by complex and error-prone registration and multi-frame averaging processes. To circumvent this drawback, we propose a denoising approach based on paired SELFF-OCT images acquired from the retina of an artificial eye, which are used to train a Noise2Noise (N2N) network by repeatedly mapping one noisy image to another noisy realization of the same image. Training of the network is performed with a small amount of data comprising only two OCT volumes. The performance of the proposed approach is evaluated by denoising unseen SELFF-OCT images from the retina of the artificial eye as well as of real human eyes, utilizing standard image quality assessment (IQA) metrics like peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) as well as non-reference quality metrics. The qualitative and quantitative results of the evaluation verify the effectiveness of the proposed N2N approach by an improved SNR, while important structural information in the scans is preserved. Furthermore, the results reveal a superior denoising performance of the proposed approach compared to the application of conventional OCT denoising methods like block-matching and 3D filtering (BM3D) and probability-based non-local means (PNLM) [1].
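
The training principle can be sketched in a few lines of PyTorch. This is a minimal illustration rather than the authors' implementation: the tiny CNN and the random tensors merely stand in for the paired SELFF-OCT B-scans acquired from the artificial eye.

import torch
import torch.nn as nn

class DenoiseCNN(nn.Module):
    # Tiny placeholder backbone; the actual N2N network may differ.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return self.net(x)

def n2n_step(model, opt, noisy_a, noisy_b):
    # Noise2Noise step: map one noisy realization to the other.
    # For zero-mean noise, the L2 minimizer approaches the clean
    # image, so no noise-free targets are required.
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy_a), noisy_b)
    loss.backward()
    opt.step()
    return loss.item()

model = DenoiseCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# Placeholders for two noisy realizations of the same scene:
noisy_a, noisy_b = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
n2n_step(model, opt, noisy_a, noisy_b)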

Marc Rowedder, Timo Kepp, Tobias Neumann, Helge Sudkamp, Gereon Hüttmann, Heinz Handels
Abstract: Metal-conscious Embedding for CBCT Projection Inpainting

The existence of metallic implants in projection images for cone-beam computed tomography (CBCT) introduces undesired artifacts which degrade the quality of reconstructed images. In order to reduce metal artifacts, projection inpainting is an essential step in many metal artifact reduction algorithms. In this work, a hybrid network combining the shift window (Swin) vision transformer (ViT) and a convolutional neural network is proposed as a baseline network for the inpainting task. To incorporate metal information for the Swin ViT-based encoder, metal-conscious self-embedding and neighborhood-embedding methods are investigated [1]. Both methods improve the performance of the baseline network. Furthermore, by choosing an appropriate window size, the model with neighborhood embedding achieves the lowest mean absolute error of 0.079 in metal regions and the highest peak signal-to-noise ratio of 42.346 in CBCT projections. Finally, the efficiency of metal-conscious embedding is demonstrated on both simulated and real cadaver CBCT data, where it enhances the inpainting capability of the baseline network.

Fuxin Fan, Yangkong Wang, Ludwig Ritschl, Ramyar Biniazan, Marcel Beister, Björn Kreher, Yixing Huang, Steffen Kappler, Andreas Maier
Abstract: Self-supervised Pre-training for Dealing with Small Datasets in Deep Learning for Medical Imaging
Evaluation of Contrastive and Masked Autoencoder Methods

Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work [1] compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. To this end, we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Due to the challenge of obtaining sufficient annotated training data in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By experimenting with gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose the SparK pre-training for medical imaging tasks with only small annotated datasets.

Daniel Wolf, Tristan Payer, Catharina S. Lisson, Christoph G. Lisson, Meinrad Beer, Michael Götz, Timo Ropinski
Abstract: Enhanced Diagnostic Fidelity in Pathology Whole Slide Image Compression via Deep Learning

Accurate diagnosis of disease often depends on the exhaustive examination of whole slide images (WSI) at microscopic resolution. Efficient handling of these data-intensive images requires lossy compression techniques. This paper investigates the limitations of the widely used JPEG algorithm, the current clinical standard, and reveals severe image artifacts impacting diagnostic fidelity. To overcome these challenges, we introduce a novel deep learning (DL)-based compression method tailored for pathology images. By enforcing feature similarity of deep features between the original and compressed images, our approach achieves superior Peak Signal-to-Noise Ratio (PSNR), Multi-Scale Structural Similarity Index (MS-SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) scores compared to JPEG XL, WebP, and other DL compression methods. Our method increases the PSNR value from 39 (JPEG80) to 41, indicating improved image fidelity and diagnostic accuracy. This work was published at the International Workshop on Machine Learning in Medical Imaging [1].
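
The feature-similarity idea can be illustrated with a short PyTorch sketch; the VGG16 backbone, the layer cut-off, and the plain L2 distance below are assumptions for illustration, not necessarily the feature network used in the paper.

import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class FeatureSimilarityLoss(nn.Module):
    # L2 distance between deep features of original and compressed patch.
    def __init__(self, cut=16):  # 'cut' layer index is an arbitrary choice
        super().__init__()
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
        self.features = backbone.features[:cut].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, original, compressed):
        return nn.functional.mse_loss(self.features(original),
                                      self.features(compressed))

loss_fn = FeatureSimilarityLoss()
x = torch.rand(1, 3, 224, 224)          # original WSI patch (placeholder)
x_hat = x + 0.01 * torch.randn_like(x)  # decoded/compressed patch
print(loss_fn(x, x_hat).item())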

Maximilian Fischer, Peter Neher, Peter Schüffler, Shuhan Xiao, Silvia Dias Almeida, Constantin Ulrich, Alexander Muckenhuber, Rickmer Braren, Michael Götz, Jens Kleesiek, Marco Nolden, Klaus Maier-Hein
Abstract: Self-supervised CT Dual Domain Denoising using Low-parameter Models

Computed tomography (CT) is routinely used for three-dimensional non-invasive imaging. Numerous data-driven image denoising algorithms have been proposed to restore image quality in low-dose acquisitions. However, considerably less research investigates methods that already intervene in the raw detector data, due to limited access to suitable projection data or correct reconstruction algorithms. In this work, we present an end-to-end trainable CT reconstruction pipeline that contains denoising operators in both the projection and the image domain, which are optimized simultaneously without requiring ground-truth high-dose CT data [1]. In addition to experiments with shallow convolutional neural networks, we use trainable bilateral filter layers as known denoising operators [2]. These custom filter layers only require gradient-based optimization of four parameters, each with a well-defined effect on the filtering operation. Our experiments reveal that including an additional projection denoising operator in the CT reconstruction pipeline improved the overall denoising performance by 82.4–94.1%/12.5–41.7% (PSNR/SSIM) on abdomen CT and 1.5–2.9%/0.4–0.5% (PSNR/SSIM) on X-ray microscopy data relative to the low-dose baseline. We have publicly released our helical CT reconstruction framework, Helix2Fan [3], which includes a raw projection rebinning step to render helical projection data suitable for differentiable fan-beam reconstruction operators and end-to-end learning. Additionally, the trainable bilateral filter layers employed in this study have been contributed to the medical open network for artificial intelligence (MONAI).

Fabian Wagner, Mareike Thies, Laura Pfaff, Oliver Aust, Sabrina Pechmann, Daniela Weidner, Noah Maul, Maximilian Rohleder, Mingxuan Gu, Jonas Utz, Felix Denzinger, Andreas Maier
Comparative Analysis of Radiomic Features and Gene Expression Profiles in Histopathology Data using Graph Neural Networks

This study leverages graph neural networks to integrate MELC data with radiomics-extracted features for melanoma classification, focusing on cell-wise analysis. It assesses the effectiveness of gene expression profiles and radiomics features, revealing that radiomics features, particularly when combined with UMAP for dimensionality reduction, significantly enhance classification performance. Notably, using radiomics contributes to increased diagnostic accuracy and computational efficiency, as it allows for the extraction of critical data from fewer stains, thereby reducing operational costs. This methodology marks an advancement in computational dermatology for melanoma cell classification, setting the stage for future research and potential developments.

Luis C. Rivera Monroy, Leonhard Rist, Martin Eberhardt, Christian Ostalecki, Andreas Bauer, Julio Vera, Katharina Breininger, Andreas Maier
Magnetisation Reconstruction for Quantum Metrology

Widefield nitrogen-vacancy (NV) magnetometry presents a promising method for the detection of cancer biomarkers, offering a new frontier in medical diagnostics. The challenge lies in the inverse problem of accurately reconstructing magnetisation sources from magnetic field measurements, a task complicated by the noise sensitivity of the data and the ill-posed nature of the inverse problem. To address this, we employed a physics-informed neural network (PINN) on 2D magnetic materials, combining the strengths of convolutional neural networks (CNN) with the underlying physical laws of magnetism. The physics-informed loss during the training of the neural network constrains the parameter space to physically plausible reconstructions, resulting in improved accuracy and noise robustness. This paves the way for understanding the requirements for the development of such models for quantum sensing in biomedicine.
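
A heavily simplified sketch of such a physics-informed objective in PyTorch; the toy linear forward model (stray field as a convolution of the magnetization with a dipolar kernel) and the sparsity regularizer are illustrative assumptions, not choices taken from the paper.

import torch
import torch.nn as nn

def forward_field(m, kernel):
    # Toy linear forward model: field map as convolution of the
    # magnetization map with a (placeholder) dipolar kernel.
    return nn.functional.conv2d(m, kernel, padding=kernel.shape[-1] // 2)

def pinn_loss(cnn, b_meas, kernel, lam=0.1):
    m_pred = cnn(b_meas)                    # CNN proposes a magnetization map
    b_pred = forward_field(m_pred, kernel)  # physics: re-simulate the field
    data_term = nn.functional.mse_loss(b_pred, b_meas)
    reg_term = m_pred.abs().mean()          # illustrative source sparsity term
    return data_term + lam * reg_term

cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))
b = torch.randn(1, 1, 64, 64)   # stands in for a measured NV field map
k = torch.randn(1, 1, 5, 5)     # placeholder dipolar kernel
pinn_loss(cnn, b, k).backward()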

Kartikay Tehlan, Michele Bissolo, Riccardo Silvioli, Johannes Oberreuter, Andreas Stier, Nassir Navab, Thomas Wendler
Improving Segmentation Models for AR-guided Liver Surgery using Synthetic Images

AR-guided open liver surgery is a field of intense research. However, due to the lack of RGB-D videos of the surgical scene, there are as yet no solutions for automatic real-time tracking and registration of the virtual models to the patient’s anatomy. We provide the first proof of concept for generating synthetic liver surgery images using surgery phantoms with a 3D print of a real liver. Thus, the RGB-D camera of an AR device captures realistic depth patterns. The RGB images of the phantom are enriched with realistic liver textures using image synthesis methods. We use these data to augment training data for RGB-D segmentation. Furthermore, we compare three common image synthesis methods based on generative adversarial networks (GANs) in a demo setting for this purpose. We evaluate our synthetic data by measuring the performance of an RGB-D segmentation model for porcine liver images. Results show that we can outperform models trained only on real data by 3% to 4% when using a GauGAN approach. Furthermore, we observe biases due to the overuse of synthetic data for augmentation factors higher than 50%. Our results propose a novel phantom-based concept for data synthesis in AR-guided surgery and serve as guidance for future technical improvements.

Michael Schwimmbeck, Serouj Khajarian, Stefanie Remmele
Application of Active Learning Based on Uncertainty Quantification to Breast Segmentation in MRI

In medical image segmentation with deep learning, large amounts of annotated data are needed to train precise models. Such annotations are time-consuming and costly to create, since medical experts need to ensure their quality. Active learning techniques may reduce the expert effort. In this work, we compare different sample selection strategies for training a model for breast segmentation in MR images using 3D U-Nets: We evaluate two uncertainty-based approaches that compute the voxel-wise entropy or epistemic uncertainty based on a Bayesian neural network approximated via Monte Carlo dropout and compare them against a random selection as baseline. We find that both uncertainty-based approaches improve over the baseline in the earlier iterations, but lead to a similar performance in the long run. In early iterations they are 2-4 active learning iterations ahead of the "random selection" strategy, which corresponds to one or several days of saved annotation time. We also assess how well the different uncertainty measures correlate with the segmentation quality and find that epistemic uncertainty is a better surrogate measure than the commonly used plain entropy.
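
The two uncertainty measures can be sketched as follows in PyTorch. Epistemic uncertainty is computed here as the mutual information between predictions and model posterior, which is one common choice; the toy dropout model stands in for the 3D U-Net.

import torch

@torch.no_grad()
def mc_dropout_uncertainty(model, volume, n_samples=20, eps=1e-8):
    # Voxel-wise predictive entropy and epistemic uncertainty
    # (mutual information) from Monte Carlo dropout samples.
    model.train()  # keep dropout layers active at inference time
    probs = torch.stack([torch.softmax(model(volume), dim=1)
                         for _ in range(n_samples)])        # (S, B, C, ...)
    mean_p = probs.mean(dim=0)
    entropy = -(mean_p * (mean_p + eps).log()).sum(dim=1)   # total uncertainty
    expected_entropy = -(probs * (probs + eps).log()).sum(dim=2).mean(dim=0)
    epistemic = entropy - expected_entropy                  # model uncertainty
    return entropy, epistemic

# Toy stand-in for the segmentation network:
model = torch.nn.Sequential(torch.nn.Conv3d(1, 8, 3, padding=1),
                            torch.nn.Dropout3d(0.5),
                            torch.nn.Conv3d(8, 2, 3, padding=1))
ent, epi = mc_dropout_uncertainty(model, torch.randn(1, 1, 16, 32, 32))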

Kai Geißler, Markus Wenzel, Susanne Diekmann, Heinrich von Busch, Robert Grimm, Hans Meine
Guidance to Noise Simulation in X-ray Imaging

In medical imaging, noise is an inherently occurring signal corruption, especially in X-ray imaging, where the dose exposure of the patient should be minimal. Besides potential image degradation, which may hinder accurate diagnoses, the noise can have a negative impact on signal processing and evaluation algorithms, especially deep learning (DL) methods. Furthermore, for the training of DL-based noise reduction, or to bolster DL methods against degradation due to unseen types of noise or noise levels, a thorough and correct noise simulation is indispensable. This paper introduces a comprehensive noise simulation method that integrates the strengths of existing techniques into a more complete solution. Simultaneously, our approach aims to minimize the reliance on device-specific measurements and data by proposing an automatic detector gain estimation.
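
A minimal NumPy sketch of the common quantum-plus-electronic noise model that such simulations build on; the gain and noise levels are illustrative placeholders, and the paper's automatic gain estimation is not reproduced here.

import numpy as np

def simulate_xray_noise(clean, gain=4.0, sigma_e=2.0, rng=None):
    # Simplified X-ray noise model: Poisson-distributed quantum noise
    # scaled by the detector gain, plus additive Gaussian electronic
    # (readout) noise. Parameter values are illustrative only.
    rng = np.random.default_rng() if rng is None else rng
    quanta = rng.poisson(np.clip(clean, 0, None) / gain)   # photon counting
    electronic = rng.normal(0.0, sigma_e, clean.shape)     # readout noise
    return gain * quanta + electronic

noisy = simulate_xray_noise(np.full((256, 256), 1000.0))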

Dominik Eckert, Magdalena Herbst, Julia Wicklein, Christopher Syben, Ludwig Ritschl, Steffen Kappler, Sebastian Stober
Automated Lesion Detection in Endoscopic Imagery for Small Animal Models

Murine animal models are routinely used in research on gastrointestinal diseases, for example to analyze colorectal cancer or chronic inflammatory bowel diseases. By using suitable (miniaturized) endoscopy systems, it is possible to examine the large intestine of mice with respect to inflammatory, vascular or neoplastic changes without the need to sacrifice the animals. This enables the acquisition of high-resolution colonoscopy image sequences that can be used for the visual examination of tumors, the assessment of inflammation or the vasculature. Since the human resources for analyzing a multitude of videos are limited, an automated evaluation of such image data is desirable. Video recordings (n = 49) of mice with and without colorectal cancer (CRC) were employed for this purpose and scored by clinical experts. The videos contained mice with tumors in 33 cases, while 16 were pathologically normal. For the automatic detection of neoplastic changes (e.g. polyps), a deep neural network based on the YOLOv7-tiny architecture was applied. This network was previously trained on >36,000 human colon images with neoplasias visible in all frames. On test data with human images, the network reaches a precision of Prec = 0.92 and Rec = 0.90. The network was applied to the mouse data without any changes. To avoid false-positive detections, a color-based method was added to differentiate between stool residues and polyps. With the framework for the detection of neoplastic changes and classification of stool residues, we achieve results of Prec = 0.90, Rec = 0.98 and F1 score = 0.94. Without the detection of stool residues, the values dropped to Prec = 0.65 and Rec = 0.98, as 19 occurrences of stool were incorrectly classified as tumors. Our network trained on human data for neoplasia detection is able to accurately detect tumors in the murine colon. An additional module for the separation of stool residues is essential to avoid false-positive cases.

Thomas Eixelberger, Qi Fang, Bisan A. Zohud, Ralf Hackner, Rene Jackstadt, Michael Stürzl, Elisabeth Naschberger, Thomas Wittenberg
Ultrasound to CT Image-to-image Translation for Personalized Thyroid Screening

The use of 2D scintigraphy in screening for thyroid pathologies is widespread; however, its diagnostic value is limited because the activity of multiple thyroid lesions cannot be effectively resolved. Combining the scintigraphy with a CT would make it possible to simulate 3D SPECT and thereby increase the diagnostic value of the screening. However, during screening programs ultrasound is preferred over CT because of its widespread availability and harmlessness. Therefore, tools to translate thyroid ultrasound to CT are needed. With this in mind, we propose to translate ultrasound images of the thyroid into synthetic CT images using a GAN-based architecture. Our proposed approach results in a higher anatomical consistency between the input US and the output synthetic CT compared to the baseline. The synthetic CTs exhibit a realistic HU distribution compared to real CTs and maintain a realistic appearance, as indicated by the Fréchet Inception Distance.

Carl A. Noack, Francesca De Benetti, Kartikay Tehlan, Nassir Navab, Thomas Wendler
Abstract: Multistage Registration of CT and Biopsy CT Images of Lung Tumors

The research project “Radiomics enhanced CT-guided targeted biopsy in lung cancer” utilises pre-calculated intratumoural heterogeneity areas to perform CT-guided biopsies of lung tumors. This involves the fusion of CT images acquired during preliminary examinations, and their radiomics maps, with biopsy CT images to detect potential intratumoral heterogeneity areas, requiring registration of the corresponding images. We therefore developed a multistage registration approach with rigid preregistration. The dataset comprises 13 thorax CT volumes recorded during preliminary examinations (called CT images) and 13 narrow CT volumes (6 slices) acquired during biopsies (called biopsy CT images) of 13 patients with lung tumors. In some cases, due to the intervention, patients were lying on their side during the biopsy, whereas they were lying on their back during the preliminary examination. Rigid preregistration was initially performed to correct large rotations and translations. The rotation was determined using bounding boxes, and ITK-Snap was used to estimate corresponding slices in the images. The preregistered CT images were then registered to the biopsy CT images using a SimpleElastix multistage algorithm, including rigid, affine, and deformable transformations. The transformations from the rigid preregistration and SimpleElastix were then applied to the radiomics maps. The results demonstrate that the multistage registration resulted in high structural similarity and overlap of lung tumors in the CT and biopsy CT images, enabling “virtual biopsies” and the extraction of quantitative radiomics features at the exact puncture site [1].
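
With the SimpleElastix extension of SimpleITK, such a rigid-affine-deformable chain can be set up roughly as below; the file names are placeholders, and the study's actual parameter maps are not reproduced.

import SimpleITK as sitk  # requires the SimpleElastix build of SimpleITK

# Preregistered CT and the narrow biopsy CT (placeholder paths):
fixed = sitk.ReadImage("biopsy_ct.nii.gz")
moving = sitk.ReadImage("ct_prereg.nii.gz")

elastix = sitk.ElastixImageFilter()
elastix.SetFixedImage(fixed)
elastix.SetMovingImage(moving)

# Chain rigid -> affine -> B-spline (deformable) stages.
elastix.SetParameterMap(sitk.GetDefaultParameterMap("rigid"))
elastix.AddParameterMap(sitk.GetDefaultParameterMap("affine"))
elastix.AddParameterMap(sitk.GetDefaultParameterMap("bspline"))
elastix.Execute()
result = elastix.GetResultImage()

# The same transform chain can then be applied to the radiomics maps.
transformix = sitk.TransformixImageFilter()
transformix.SetTransformParameterMap(elastix.GetTransformParameterMap())
transformix.SetMovingImage(sitk.ReadImage("radiomics_map.nii.gz"))
transformix.Execute()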

Anika Strittmatter, Alexander Hertel, Steffen Diehl, Matthias F. Froelich, Stefan O. Schoenberg, Sonja Loges, Tobias Boch, Daniel Nowak, Alexander Streuer, Lothar R. Schad, Frank G. Zöllner
Abstract: Spatiotemporal Illumination Model for 3D Image Fusion in Optical Coherence Tomography

Optical coherence tomography (OCT) is a non-invasive, micrometer-scale imaging modality that has become a clinical standard in ophthalmology. By raster-scanning the retina, sequential cross-sectional image slices are acquired to generate volumetric data. In-vivo imaging suffers from discontinuities between slices that show up as motion and illumination artifacts. We present a new illumination model that exploits continuity in orthogonally raster-scanned volume data [1]. Our novel spatiotemporal parametrization adheres to illumination continuity both temporally, along the imaged slices, as well as spatially, in the transverse directions. Yet, our formulation makes no inter-slice assumptions, which could otherwise introduce discontinuities. This is the first optimization of a 3D inverse model in an image reconstruction context in OCT. Evaluation in 68 volumes from eyes with pathology showed a reduction of illumination artifacts in 88% of the data, and only 6% showed moderate residual illumination artifacts. The method enables the use of forward-warped motion-corrected data [2], which is more accurate, and enables supersampling and advanced 3D image reconstruction in OCT [3, 4].

Stefan B. Ploner, Jungeun Won, Julia Schottenhamml, Jessica Girgis, Kenneth Lam, Nadia Waheed, James G. Fujimoto, Andreas Maier
Abstract: Gradient-based Geometry Learning for Fan-beam CT Reconstruction

Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high-quality reconstruction results. Here, the differentiable formulation of fan-beam CT reconstruction is extended to the acquisition geometry: the CT reconstruction operation is analytically derived with respect to the acquisition geometry. This allows gradient information from a loss function on the reconstructed image to be propagated into the geometry parameters. As a proof-of-concept experiment, this idea is applied to rigid motion compensation. The cost function is parameterized by a trained neural network which regresses an image quality metric from the motion-affected reconstruction alone. Since this regressed quality index and the geometry parameters are connected in a differentiable manner, optimization can be performed using standard gradient-based procedures; in contrast, all previous approaches rely on gradient-free optimization in this context. The proposed motion compensation algorithm improves the structural similarity index measure (SSIM) from 0.848 for the initial motion-affected reconstruction to 0.946 after compensation. It also generalizes to real fan-beam sinograms which are rebinned from a helical trajectory, where the SSIM increases from 0.639 to 0.742. Furthermore, we show that the number of target function evaluations is decreased by several orders of magnitude compared to gradient-free optimization. Using the proposed method, we are the first to optimize an autofocus-inspired algorithm based on analytical gradients. Next to motion compensation, we see further use cases of our differentiable method in scanner calibration or hybrid techniques employing deep models. The GPU-accelerated source code for geometry-differentiable CT backprojection in fan-beam and cone-beam geometries is publicly available at https://github.com/mareikethies/geometry_gradients_CT [1].
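
The optimization structure can be caricatured in a few lines of PyTorch. Both reconstruct and quality_net below are toy stand-ins (the real pipeline uses the geometry-differentiable fan-beam backprojection and a trained quality regressor); the sketch only shows how gradients flow from the quality score into the geometry parameters.

import torch

def reconstruct(sinogram, geometry):
    # Toy differentiable op coupling data and per-view geometry parameters;
    # NOT the fan-beam backprojection from the paper.
    return (sinogram * torch.cos(geometry).unsqueeze(1)).sum(dim=0)

def quality_net(image):
    # Stand-in for the learned image-quality regressor.
    return image.var()

sinogram = torch.randn(360, 512)
geometry = torch.randn(360, requires_grad=True)  # e.g. per-view angle offsets
opt = torch.optim.Adam([geometry], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    loss = -quality_net(reconstruct(sinogram, geometry))  # maximize quality
    loss.backward()   # gradients flow through the recon into the geometry
    opt.step()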

Mareike Thies, Fabian Wagner, Noah Maul, Lukas Folle, Manuela Meier, Maximilian Rohleder, Linda-Sophie Schneider, Laura Pfaff, Mingxuan Gu, Jonas Utz, Felix Denzinger, Michael Manhart, Andreas Maier
Segmentation-inspired Image Registration

Artificial intelligence has been used with great success for the segmentation of anatomical structures in medical imaging. We use these achievements to improve classical registration schemes. In particular, we derive geometrical features such as centroids and principal axes of segments and use them in a combined approach. A smart filtering of the features results in a two-phase preregistration, followed in a third phase by an intensity-guided registration. We also propose to use a regularization which enables a coupling of all components of the 3D transformation in a unified framework. Finally, we show how easily our approach can be applied even to challenging 3D medical data.

Saskia Neuber, Pia F. Schulz, Sven Kuckertz, Jan Modersitzki
Exploring Epipolar Consistency Conditions

Intravital X-ray microscopy (XRM) in preclinical mouse models is of vital importance for the identification of microscopic structural pathological changes in the bone which are characteristic of osteoporosis. The complexity of this method stems from the requirement for high-quality 3D reconstructions of the murine bones. However, respiratory motion and muscle relaxation lead to inconsistencies in the projection data which result in artifacts in uncompensated reconstructions. Motion compensation using epipolar consistency conditions (ECC) has previously shown good performance in clinical CT settings. Here, we explore whether such algorithms are suitable for correcting motion-corrupted XRM data. Different rigid motion patterns are simulated and the quality of the motion-compensated reconstructions is assessed. The method is able to restore microscopic features for out-of-plane motion, but artifacts remain for more realistic motion patterns including all six degrees of freedom of rigid motion. Therefore, ECC is valuable for the initial alignment of the projection data followed by further fine-tuning of motion parameters using a reconstruction-based method.

Mareike Thies, Fabian Wagner, Mingxuan Gu, Siyuan Mei, Yixing Huang, Sabrina Pechmann, Oliver Aust, Daniela Weidner, Georgiana Neag, Stefan Uderhardt, Georg Schett, Silke Christiansen, Andreas Maier
Abstract: Enabling Geometry Aware Learning Through Differentiable Epipolar View Translation

Epipolar geometry is exploited in several applications in the field of cone-beam computed tomography (CBCT) imaging. By leveraging consistency conditions between multiple views of the same scene, motion artifacts can be minimized, the effects of beam hardening can be reduced, and segmentation masks can be refined. So far, these conditions have been formulated as optimization criteria to be minimized in post-processing. In this work, we explore the idea of enabling deep learning models to access the known geometrical relations between views in order to improve the prediction. The implicit 3D information contained in the relative pose between views can potentially enhance various projection domain algorithms such as segmentation, detection, or inpainting. Based on this hypothesis, we introduce a differentiable feature translation operator, which uses the available projection matrices to calculate and integrate over the epipolar line in a second view. After geometrically translating all channels of a layer’s output feature map, we concatenate these activations to a second instance of the same architecture processing a second view of the same 3D scene. As an example application, we evaluate the effects of this operation on the task of projection domain metal segmentation. By re-sampling a stack of projections into orthogonal view pairs, we segment each projection image jointly with a second view acquired 90° apart. The comparison with an equivalent single-view segmentation model reveals an improved segmentation performance of 0.95 over 0.91, measured by the Dice coefficient. By providing an implementation of this operator as an open-access differentiable layer, we seek to enable future research [1].
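
The geometric core of such an operator, computing the epipolar line in the second view from the two projection matrices, can be sketched in NumPy using standard multi-view geometry (this is not the authors' code):

import numpy as np

def fundamental_from_projections(p1, p2):
    # Fundamental matrix F from two 3x4 projection matrices, such that a
    # homogeneous point x1 in view 1 maps to the epipolar line l2 = F @ x1.
    _, _, vt = np.linalg.svd(p1)
    c1 = vt[-1]                        # camera center of view 1 (null space)
    e2 = p2 @ c1                       # epipole: center 1 seen in view 2
    e2_cross = np.array([[0, -e2[2], e2[1]],
                         [e2[2], 0, -e2[0]],
                         [-e2[1], e2[0], 0]])
    return e2_cross @ p2 @ np.linalg.pinv(p1)

rng = np.random.default_rng(0)
p1, p2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))  # placeholder geometry
f = fundamental_from_projections(p1, p2)
line = f @ np.array([100.0, 120.0, 1.0])  # epipolar line coefficients (a, b, c)
# The feature translation operator then samples/integrates feature-map
# activations along this line in the second view.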

Maximilian Rohleder, Charlotte Pradel, Fabian Wagner, Mareike Thies, Noah Maul, Felix Denzinger, Andreas Maier, Bjoern Kreher
Abstract: Realistic Collimated X-ray Image Simulation Pipeline

Collimator detection in X-ray systems has long posed a formidable challenge, particularly when information about the detector’s position relative to the source is either unreliable or completely unavailable. In this paper [1], we introduce a physically motivated image processing pipeline designed to simulate the intricate characteristics of collimator shadows in X-ray images. The primary objective of this pipeline is to address the scarcity of training data for deep neural networks, which are increasingly promising for collimator detection. By applying the pipeline to deep networks initially limited by small datasets, our approach equips them with the necessary information to learn and generalize effectively. Our pipeline is a comprehensive solution that leverages several key components to generate realistic collimator images. Employing randomized labels to describe collimator shapes and their respective locations ensures diversity and representativeness. In addition, we integrate a convolution-kernel-based scattered radiation simulation mechanism, a crucial factor in real-world X-ray imaging. To complete the simulation process, we introduce Poisson noise to replicate the inherent characteristics of collimator shadows in X-ray images. Comparing the simulated data with real collimator shadows demonstrates the authenticity of our approach and its potential to bridge the gap between synthetic and real-world data. Moreover, incorporating simulated data into our deep learning framework not only serves as a valid substitute for real collimators but also significantly improves generalization in real-world applications, holding great promise for the field of collimator detection. The concepts and information presented in this paper are based on research and are not commercially available.

Benjamin El-Zein, Dominik Eckert, Thomas Weber, Maximilian Rohleder, Ludwig Ritschl, Steffen Kappler, Andreas Maier
Abstract: RecycleNet
Latent Feature Recycling Leads to Iterative Decision Refinement

Despite the remarkable success of deep learning systems over the last decade, a key difference still remains between neural network and human decision-making: as humans, we can not only form a decision on the spot, but also ponder, revisiting an initial guess from different angles, distilling relevant information and arriving at a better decision. Here, we propose RecycleNet, a latent feature recycling method that instills this pondering capability in neural networks, refining initial decisions over a number of recycling steps, where outputs are fed back into earlier network layers in an iterative fashion. This approach makes minimal assumptions about the neural network architecture and thus can be implemented in a wide variety of contexts. Using medical image segmentation as the evaluation environment, we show that latent feature recycling enables the network to iteratively refine initial predictions even beyond the iterations seen during training, converging towards an improved decision. We evaluate this across a variety of segmentation benchmarks and show consistent improvements even compared with top-performing segmentation methods. This allows trading increased computation time for improved performance, which can be beneficial, especially for safety-critical applications [1].
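
A toy PyTorch sketch of the recycling idea, with a deliberately small network that is not the authors' architecture: the previous prediction is fed back into an early layer and refined over several steps.

import torch
import torch.nn as nn

class RecycleNetSketch(nn.Module):
    # Minimal recycling model: concatenate the previous prediction to the
    # input and run the network again, iteratively refining the decision.
    def __init__(self, in_ch=1, n_classes=2, width=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + n_classes, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, n_classes, 3, padding=1))
        self.n_classes = n_classes

    def forward(self, x, steps=4):
        b, _, h, w = x.shape
        y = torch.zeros(b, self.n_classes, h, w, device=x.device)
        for _ in range(steps):  # the number of recycling steps can even
            # exceed what was used during training
            y = self.body(torch.cat([x, y.softmax(dim=1)], dim=1))
        return y

logits = RecycleNetSketch()(torch.randn(2, 1, 64, 64), steps=6)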

Gregor Koehler, Tassilo Wald, Constantin Ulrich, David Zimmerer, Paul F. Jaeger, Jörg K. H. Franke, Simon Kohl, Fabian Isensee, Klaus H. Maier-Hein
Self-supervised Vessel Segmentation from X-ray Images using Digitally Reconstructed Radiographs

Coronary artery segmentation on angiograms can be beneficial in the diagnosis and treatment of coronary artery diseases. In this paper, we propose a self-supervised vessel segmentation framework that incorporates knowledge from generated digitally reconstructed radiographs (DRRs) to perform vessel segmentation on angiographic images without manual annotations. The framework is built on domain randomization, adversarial learning, and self-supervised learning. Domain randomization and adversarial learning effectively reduce the domain gaps between DRRs and angiograms, whereas self-supervised learning enables the network to learn photometrically invariant and geometrically equivariant features for angiographic images. The experimental results demonstrate that we achieve a better performance compared with state-of-the-art methods.

Zichen Zhang, Baochang Zhang, Mohammad F. Azampour, Shahrooz Faghihroohi, Agnieszka Tomczak, Heribert Schunkert, Nassir Navab
Influence of Imperfect Annotations on Deep Learning Segmentation Models

Convolutional neural networks are the most commonly used models for multi-organ segmentation in CT volumes. Most approaches are based on supervised learning, which means that the data used for training requires expert annotations, which is time-consuming and tedious. Errors introduced during that process inherently influence all downstream tasks and are difficult to counteract. To show the impact of such annotation errors when training deep segmentation models, we evaluate simple U-Net architectures trained on multi-organ datasets including artificially generated annotation errors. Specifically, three types of common annotation errors are simulated: a constant over- or under-segmentation error at the organ’s boundary, and a mixed segmentation error. Our results show that using the ground truth data leads to a mean Dice score of 0.780, compared to mean Dice scores of 0.761 and 0.663 for the constant over- and under-segmentation errors, respectively. In contrast, the mixed segmentation error introduces a rather small performance decrease, with a mean Dice score of 0.771.
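
Such label corruptions are straightforward to simulate with morphological operations; a minimal NumPy/SciPy sketch (the boundary radius is an illustrative choice):

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def corrupt_mask(mask, error="over", radius=2, rng=None):
    # Simulate constant over-/under-segmentation at the organ boundary,
    # or a per-sample mix of both ("mixed").
    rng = np.random.default_rng() if rng is None else rng
    if error == "mixed":
        error = rng.choice(["over", "under"])
    op = binary_dilation if error == "over" else binary_erosion
    return op(mask, iterations=radius)

gt = np.zeros((64, 64), dtype=bool)
gt[20:40, 20:40] = True                    # toy ground-truth organ mask
noisy_label = corrupt_mask(gt, error="mixed")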

Christopher Brückner, Chang Liu, Leonhard Rist, Andreas Maier
Addressing the Bias of the Dice Coefficient
Semantic Segmentation of Peripheral Airways in Lung CT

While self-configuring U-Net architectures excel at a vast majority of supervised medical image segmentation tasks, they strongly rely on the chosen loss function. We demonstrate that a commonly employed Dice or cross-entropy loss leads to a bias of the trained network that is critical for the clinical application of airway segmentation from CT scans: the effort to produce the most accurate segmentations is skewed towards larger anatomical structures, leaving smaller peripheral airways with poorer quality. To address this bias, we explore several different choices of amending the label definition, including morphological dilation, and find that separating the binary airway segmentation into at least two distinct structures yields substantial improvements of approximately 4% in peripheral areas. This finding could directly benefit several clinically relevant tasks, among others, virtual CT bronchoscopy.

Fenja Falta, Mattias P. Heinrich, Marian Himstedt
Automated Tooth Instance Segmentation and Pathology Annotation Pipeline for Panoramic Radiographs
Mask-R-CNN Approach with Elastic Transformations

Caries detection in dental radiographs is a challenging and time-consuming task, even for experts in the field. Recent studies have shown the potential of tooth instance segmentation and caries detection with neural networks. We present a tooth-level pathology annotation pipeline, based on automated tooth instance segmentation and numbering with a Mask R-CNN architecture, followed by the extraction of the bounding boxes of individual teeth as patches that can be reassembled into the original image. 5-fold cross-validation resulted in a mean average precision (mAP) of 0.898 ± 0.02 for tooth instance segmentation. Augmentation focusing on elastic transformation increased the mAP by 0.053 to 0.951 ± 0.014 and enhanced robustness across folds. At performance levels at least similar to published data, our approach provides flexibility for patch-based pathology diagnosis, combined with the option to reassemble annotated patches into the original image. This will permit combining tooth-number-specific, neighborhood-based and entire-image-based features in future modeling, along with tooth-centric review and diagnosis according to the clinical needs of dentists.

Christopher J. Hansen, Jonas Conrad, Ronald Seidel, Nicolai R. Krekiehn, Eren Yilmaz, Niklas Koser, Martin Goetze, Toni Gehrmann, Sebastian Lauterbach, Christian Graetz, Christof Dörfer, Claus C. Glüer
Multi-task Learning to Improve Semantic Segmentation of CBCT Scans using Image Reconstruction

Semantic segmentation is a crucial task in medical image processing, essential for segmenting organs or lesions such as tumors. In this study, we aim to improve automated segmentation in CBCTs through multi-task learning. To evaluate the effects on different volume qualities, a CBCT dataset is synthesised from the CT Liver Tumor Segmentation Benchmark (LiTS) dataset. To improve segmentation, two approaches are investigated. First, we perform multi-task learning to add morphology-based regularization through a volume reconstruction task. Second, we use this reconstruction task to reconstruct the best-quality CBCT (most similar to the original CT), facilitating denoising effects. We explore both holistic and patch-based approaches. Our findings reveal that, especially with a patch-based approach, multi-task learning improves segmentation in most cases, and that these results can be further improved by our denoising approach.
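
The joint objective can be sketched as a weighted sum of the segmentation and reconstruction terms; the loss choices and the weight alpha below are illustrative assumptions, not values from the paper.

import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_target, recon, recon_target, alpha=0.5):
    # Segmentation term (cross entropy) plus a volume reconstruction term
    # that regularizes the shared encoder; the reconstruction target is
    # the best-quality volume (most similar to the original CT).
    seg = F.cross_entropy(seg_logits, seg_target)
    rec = F.mse_loss(recon, recon_target)
    return seg + alpha * rec

loss = multitask_loss(torch.randn(2, 3, 16, 32, 32),          # seg logits
                      torch.randint(0, 3, (2, 16, 32, 32)),   # seg labels
                      torch.randn(2, 1, 16, 32, 32),          # reconstruction
                      torch.randn(2, 1, 16, 32, 32))          # recon target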

Maximilian E. Tschuchnig, Julia Coste-Marin, Philipp Steininger, Michael Gadermayr
Non-specialist Versus Neural Network
An Example of Gender Classification from Abdominal CT Data

The general paradigm is the following: more input information for a system leads to better performance. Our assumption is that the information must be appropriate rather than excessive, especially when the training data is limited. Moreover, the requirements of a machine learning system might differ from those of a human observer. In this work, we analyze and compare the performance of several neural network architectures and human readers, who had only basic common knowledge on the subject. The example task is gender classification using abdominal computerized tomography (CT) data. Our study demonstrates that training a neural network purely on pelvic bone segmentation masks is the most efficient approach and produces highly accurate results, whereas for human observers this information is not sufficient. The study confirms our original assumptions and emphasizes the importance of appropriate input feature selection for better performance on a specific task.

Stephan Prettner, Tatyana Ivanovska
Automatic Segmentation of Lymphatic Perfusion in Patients with Congenital Single Ventricle Defects

The Fontan circulation is the surgical end-point for a variety of single-ventricle congenital heart lesions. While recent decades have witnessed substantial improvements in survival rates, the associated physiology remains susceptible to severe complications such as protein-losing enteropathy and plastic bronchitis. These complications are often indicative of abnormal congestion in the lymphatic system, underscoring their significance as harbingers of impending comorbidities. An accurate assessment of congestion severity requires the detailed annotation of lymphatic perfusion patterns in each volumetric scan slice. The manual labelling of such intricate data is time-consuming and demands a high level of expertise, rendering it unfeasible within the confines of standard clinical protocols. We use a curated database consisting of manually annotated T2-weighted magnetic resonance imaging (MRI) scans from 71 Fontan patients post-surgery. Following the current state-of-the-art method for biomedical image segmentation, we evaluate its performance on multiple independent test sets with respect to the degree of severity and imaging quality. Incorporating the best-performing model, we have developed a user-friendly interface for the automatic segmentation of lymphatic malformations, which will be published before the conference starts.

Marietta Stegmaier, Johanna P. Müller, Christian Schröder, Thomas Day, Michela Cuomo, Oliver Dewald, Sven Dittrich, Bernhard Kainz
Data Augmentation for Images of Chronic Foot Wounds

Training data for neural networks is often scarce in the medical domain, which frequently results in models that struggle to generalize and consequently show poor performance on unseen datasets. Generally, adding augmentation methods to the training pipeline considerably enhances a model’s performance. Using the dataset of the Foot Ulcer Segmentation Challenge, we analyze two additional augmentation methods in the domain of chronic foot wounds: local warping of wound edges, along with projection and blurring of shapes inside wounds. Our experiments show that improvements in the Dice similarity coefficient and normalized surface distance metrics depend on a sensible selection of those augmentation methods.

Max Gutbrod, Benedikt Geisler, David Rauber, Christoph Palm
Segmentation of Acute Ischemic Stroke in Native and Enhanced CT using Uncertainty-aware Labels

In stroke diagnosis, a non-contrast CT (NCCT) is the first scan acquired and offers the possibility of identifying ischemic changes in the brain. Their identification and segmentation are subject to high inter-rater variability. We develop and evaluate models based on labels that reflect the uncertainty in segmentation hypotheses by annotation of minimum (“inner”) and maximum (“outer”) contours of the perceived presence of infarct core and hypoperfused tissue. These labels are used to train nnU-Net to segment both from NCCT and CT angiography (CTA) scans of 167 patients. The predicted output is post-processed to obtain delineations of the tissue of interest at varying distances between the inner and outer contours. Compared to the ground truth, infarcts of medium size (10 to 70 ml) could be segmented in the NCCT scans with a median error of 3.7 ml (6.2 ml for CTA) of excess predicted volume, while missing 6.4 ml (3.5 ml) of the infarct.

Linda Vorberg, Oliver Taubmann, Hendrik Ditt, Andreas Maier
Preprocessing Evaluation and Benchmark for Multi-structure Segmentation of the Male Pelvis in MRI on the Gold Atlas Dataset

In radiation therapy (RTx), an accurate delineation of the regions of interest and organs at risk allows for a more targeted irradiation with reduced side effects. In the case of prostate cancer treatment, RTx planning requires the delineation of many pelvic structures. This is a time-consuming task, and clinicians would greatly benefit from robust automatic multi-structure segmentation tools. With the final purpose of introducing an automatic segmentation algorithm into clinical practice, we first address the problem of multi-structure segmentation in pelvic MR using a publicly available dataset. Moreover, we evaluate three types of preprocessing approaches to enable training and inference using different MR sequences. Despite a limited number of training samples, we report an average Dice score of 84.7 ± 10.2% in the segmentation of 8 pelvic structures. The code and the trained models are available at: https://github.com/FrancescaDB/multi_structure_segmentation_gold_atlas

Francesca De Benetti, Smaranda Bogoi, Nassir Navab, Thomas Wendler
Evaluation of Semi-automatic Segmentation of Liver Tumors for Intra-procedural Planning

Transarterial chemoembolization (TACE) is a common procedure for the treatment of intermediate-stage primary liver cancer, in which the blood supply of the tumor is suppressed by occluding the supplying vessels. In this procedure, contrast-enhanced cone-beam computed tomography (CBCT) scans are used to localize the tumor lesions and identify their feeding vessels, potentially aided by segmentation software. With the help of semi-automatic segmentation algorithms, a high-quality segmentation of a tumor can be achieved with minimal user input, such as drawing a line approximating the tumor’s longest axis. In this paper, we conduct a user study to evaluate human tendencies when annotating tumors in this manner, build a simulator based on our findings, and design a semi-automatic segmentation method based on DeepGrow, trained on simulated inputs. We compare it to the random walker algorithm, acting as an established baseline, on the task of liver tumor segmentation using a dataset of CBCT scans along with simulated user inputs. We discover that human users tend to overestimate tumors, with an average distance of 2.8 voxels to the tumor’s boundary. Our customized network outperforms the random walker with an average Dice score of 0.89 and an average symmetric surface distance (ASSD) of 1.16 voxels, compared to a Dice score of 0.69 and an ASSD of 2.95 voxels. This shows the potential of learning-based methods to speed up the intra-procedural segmentation workflow.

Dominik Pysch, Maja Schlereth, Mihai Pomohaci, Peter Fischer, Katharina Breininger
Generalizable Kidney Segmentation for Total Volume Estimation

We introduce a deep learning approach for automated kidney segmentation in autosomal dominant polycystic kidney disease (ADPKD). Our method combines Nyul normalization, resampling, and attention mechanisms to create a generalizable network. We evaluated our approach on two distinct datasets and found that our proposed model outperforms the baseline method with an average improvement of 9.45% in Dice and 79.90% in mean symmetric surface distance scores across both datasets, demonstrating its potential for robust and accurate total kidney volume calculation from T1-weighted MRI images in ADPKD patients.
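
A simplified NumPy sketch of Nyul-style intensity standardization, mapping per-volume percentile landmarks onto a standard scale by piecewise linear interpolation; the landmark set and standard scale below are placeholders, and in practice the standard scale is learned from the training cohort.

import numpy as np

def nyul_normalize(volume, landmarks_std,
                   percentiles=(1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99)):
    # Map the volume's intensity landmarks (percentiles) onto the standard
    # scale; np.interp performs the piecewise linear mapping voxel-wise.
    landmarks_in = np.percentile(volume, percentiles)
    return np.interp(volume, landmarks_in, landmarks_std)

vol = np.random.gamma(2.0, 100.0, size=(16, 64, 64))  # toy T1-weighted volume
standard_scale = np.linspace(0, 100, 11)              # placeholder scale
norm = nyul_normalize(vol, standard_scale)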

Anish Raj, Laura Hansen, Fabian Tollens, Dominik Nörenberg, Giulia Villa, Anna Caroli, Frank G. Zöllner
Multi-organ Segmentation in CT from Partially Annotated Datasets using Disentangled Learning

While deep learning models are known to be able to solve the task of multi-organ segmentation, the scarcity of fully annotated multi-organ datasets poses a significant obstacle during training. The 3D volume annotation of such datasets is expensive and time-consuming, and the variety of labeled structures varies greatly between datasets. To this end, we propose a solution that leverages multiple partially annotated datasets using disentangled learning for a single segmentation model. Dataset-specific encoder and decoder networks are trained, while a joint decoder network gathers the encoders’ features to generate a complete segmentation mask. We evaluated our method using two simulated partially annotated datasets: one including the liver, lungs and kidneys, the other bones and the bladder. Our method is trained to segment all five organs, achieving a Dice score of 0.78 and an IoU of 0.67. Notably, this performance is close to a model trained on the fully annotated dataset, which scores 0.80 in Dice and 0.70 in IoU, respectively.

Tianyi Wang, Chang Liu, Leonhard Rist, Andreas Maier
Abstract: Generation of Synthetic 3D Data using Simulated MR Examinations in Augmented Reality

In the future, medical imaging devices such as computed tomography (CT) and magnetic resonance (MR) scanners are expected to become increasingly autonomous. Therefore, the design criteria and workflow of such devices will change substantially. Moreover, sensor data of the system are required to develop scene understanding algorithms that support and guide the user. Data availability can be a critical factor due to patient privacy issues, the high cost of labelling data, or the impossibility of acquiring it in dangerous situations. In this work, we present an approach to generate synthetic 3D point cloud data from a simulated MR examination experienced on the Microsoft HoloLens 2. The complete workflow of an MR examination using a virtual, autonomous MR scanner is reproduced in an AR scene. The user can interact with an avatar of the patient via voice commands, select a procedure in a GUI, or position a coil. The user is recorded by a system of active stereo vision RGBD cameras while interacting with the AR elements. A registration routine for the AR scene and the RGBD cameras is described, and accuracy measurements are provided. The real point clouds are fused with virtually generated point clouds from the AR scene. These point clouds are completely labelled, and 3D bounding boxes of the objects as well as the rotation and translation of their corresponding CAD models are saved. Our approach can be used to generate synthetic depth data such as a real depth camera would see once the system, or even a system that does not yet physically exist, is built and installed on-site [1].

Aniol Serra Juhé, Daniel Rinck, Andreas Maier
Smoke Classification in Laparoscopic Cholecystectomy Videos Incorporating Spatio-temporal Information

Heavy smoke development represents an important challenge for operating physicians during laparoscopic procedures and can potentially affect the success of an intervention due to reduced visibility and orientation. Reliable and accurate recognition of smoke is therefore a prerequisite for the use of downstream systems such as automated smoke evacuation systems. Current approaches distinguish between non-smoked and smoked frames but often ignore the temporal context inherent in endoscopic video data. In this work, we therefore present a method that computes the pixel-wise displacement between randomly sampled images and their preceding frames using an optical flow algorithm, and provides the transformed magnitude of this displacement as an additional input to the network. Furthermore, we incorporate the temporal context at evaluation time by applying an exponential moving average on the estimated class probabilities of the model output to obtain more stable and robust results over time. We evaluate our method on two convolutional and one state-of-the-art transformer architecture, and show improvements in the classification results over a baseline approach, regardless of the network used.
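
The temporal smoothing step can be sketched in a few lines of NumPy; the smoothing factor is an illustrative choice, not the value used in the paper.

import numpy as np

def smooth_probabilities(frame_probs, alpha=0.3):
    # Exponential moving average over per-frame class probabilities to
    # stabilize smoke predictions over time.
    smoothed = np.empty_like(frame_probs)
    smoothed[0] = frame_probs[0]
    for t in range(1, len(frame_probs)):
        smoothed[t] = alpha * frame_probs[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

probs = np.random.dirichlet([1, 1], size=50)    # 50 frames, 2 classes (toy)
labels = smooth_probabilities(probs).argmax(axis=1)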

Tobias Rueckert, Maximilian Rieder, Hubertus Feussner, Dirk Wilhelm, Daniel Rueckert, Christoph Palm
Learning High-resolution Delay-and-sum Beamforming

Ultrasound (US) imaging is a versatile tool in modern healthcare diagnostics that often faces spatial resolution challenges. Although ultrasound localization microscopy (ULM) surpasses these resolution constraints through fast perfusion scanning, it often relies on traditional delay-and-sum (DAS) beamformers. In response, we propose a differentiable DAS pipeline with a learnable apodization feature descriptor connected to a super-resolution network. Learning apodization weights together with image super-resolution improves B-mode image quality. Quantitative assessment on ULM data and validation with an in vivo dataset demonstrate the effectiveness of our approach. While this study employs ULM data, our findings hold broader implications for computational beamforming.
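
A simplified, differentiable DAS step with learnable per-channel apodization might look as follows; the integer sample delays and the plain per-channel weight vector are simplifying assumptions (the paper learns an apodization feature descriptor rather than a fixed weight vector).

```python
import torch
import torch.nn as nn

class LearnableDAS(nn.Module):
    """Toy delay-and-sum with learnable per-channel apodization weights."""
    def __init__(self, n_channels):
        super().__init__()
        self.apod = nn.Parameter(torch.ones(n_channels))  # learned weights

    def forward(self, rf, delays):
        # rf: (n_channels, n_samples) receive data
        # delays: (n_channels, n_pixels) precomputed sample index per pixel
        delayed = torch.gather(rf, 1, delays)             # align channels
        return (self.apod[:, None] * delayed).sum(dim=0)  # weighted sum

das = LearnableDAS(n_channels=64)
rf = torch.randn(64, 2048)
delays = torch.randint(0, 2048, (64, 1024))  # placeholder geometry
pixels = das(rf, delays)                     # differentiable w.r.t. apod
```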

Christopher Hahne
Neural Network-based Sinogram Upsampling in Real-measured CT Reconstruction

Computed tomography (CT) is one of the most popular non-invasive medical imaging modalities. A major downside of medical CT is the exposure of the patient to high-energy X-rays during image acquisition. One way to reduce the amount of ionising radiation is to record fewer projective views and then upsample the resulting subsampled sinogram. Post-acquisition, this can be achieved through conventional sinogram interpolation algorithms or with neural networks. This paper compares the results of two upsampling network architectures with those of conventional sinogram interpolation. We found that for subsampling factors of two and four, the neural networks did not substantially improve the predictions in terms of structural similarity and peak signal-to-noise ratio compared to conventional sinogram interpolation. This suggests that, for these subsampling factors and the given dataset, interpolation approximates the problem well enough.
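
For reference, the conventional baseline can be as simple as linear interpolation along the angular axis of the sinogram, as sketched below; the exact interpolation scheme used in the paper may differ.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_sinogram(sino, factor):
    """Linearly interpolate along the angular (view) axis of a
    (views, detectors) sinogram."""
    return zoom(sino, (factor, 1), order=1)  # order=1: linear

sino_sub = np.random.rand(90, 256)           # subsampled: 90 of 360 views
sino_full = upsample_sinogram(sino_sub, 4)   # back to 360 views
```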

Lena Augustin, Fabian Wagner, Mareike Thies, Andreas Maier
Data Consistent Variational Networks for Zero-shot Self-supervised MR Reconstruction

Variational Networks are a common approach in deep learning-based accelerated MR reconstruction. Due to their architecture, they may, however, fail to enforce data consistency. We propose an adjustment to the Variational Network that integrates an optimization block to ensure consistency with the measured k-space points. We show the superiority of the method for zero-shot self-supervised 3D reconstruction quantitatively on retrospectively undersampled knee data, and qualitatively on prospectively undersampled MR angiography images.
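
The simplest form of such a data-consistency operation is a hard projection onto the measured k-space samples, sketched below; the paper integrates a full optimization block rather than this one-step replacement.

```python
import torch

def data_consistency(x_recon, k_measured, mask):
    """Replace predicted k-space values with measured ones where available.

    x_recon:    reconstructed image (complex tensor)
    k_measured: acquired k-space (zeros at unmeasured points)
    mask:       1 at measured k-space locations, 0 elsewhere
    """
    k_pred = torch.fft.fftn(x_recon)
    k_dc = mask * k_measured + (1 - mask) * k_pred  # enforce consistency
    return torch.fft.ifftn(k_dc)
```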

Florian Fürnrohr, Jens Wetzl, Marc Vornehm, Daniel Giese, Florian Knoll
Deep Image Prior for Spatio-temporal Fluorescence Microscopy Images: DECO-DIP

Image deconvolution and denoising is a common postprocessing step to improve the quality of biomedical fluorescence microscopy images. In recent years, this task has been increasingly tackled with supervised deep learning methods. However, generating a large number of training pairs is, if at all possible, often laborious. Here, we present a new deep learning algorithm called DECO-DIP that builds on the Deep Image Prior (DIP) framework and does not rely on training data. We extend DIP by incorporating a novel loss function that, in addition to a standard L2 data term, contains a term that models the underlying image formation forward model. We apply our framework both to synthetic data and to Ca2+ microscopy data of biological samples, namely Jurkat T-cells and astrocytes. DECO-DIP outperforms both classical deconvolution and the standard DIP implementation. We further introduce an extension, DECO-DIP-T, which explicitly utilizes the time dependence in live cell microscopy image series.
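
A sketch of the assumed loss structure: the DIP output is pushed through the image formation forward model (here, convolution with the microscope point spread function) before the L2 data term is evaluated. The PSF and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def deco_dip_loss(net_out, observed, psf):
    """L2 data term evaluated after the forward model (PSF convolution)."""
    blurred = F.conv2d(net_out, psf, padding=psf.shape[-1] // 2)
    return F.mse_loss(blurred, observed)

x = torch.rand(1, 1, 64, 64, requires_grad=True)  # DIP network output
y = torch.rand(1, 1, 64, 64)                      # noisy, blurred frame
psf = torch.ones(1, 1, 5, 5) / 25.0               # toy PSF
loss = deco_dip_loss(x, y, psf)
loss.backward()                                   # drives DIP optimization
```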

Lina Meyer, Lena-Marie Woelk, Christine E. Gee, Christian Lohr, Sukanya A. Kannabiran, Björn-Philipp Diercks, René Werner
Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization

Advancements in computational power and algorithmic refinements have significantly amplified the impact and applicability of machine learning (ML), particularly in medical imaging. While ML in general thrives on extensive datasets to develop accurate, robust, and unbiased models, medical imaging faces unique challenges, including a scarcity of samples and a predominance of poorly annotated, heterogeneous datasets. This heterogeneity manifests in varied acquisition conditions, target populations, data formats, and structures. The acquisition of large datasets is often additionally hampered by compatibility issues between source-specific downloading tools and high-performance computing (HPC) environments. To address these challenges, we introduce the unified retrieval tool (URT), which unifies the acquisition of diverse medical imaging datasets and their standardization to the brain imaging data structure (BIDS). Currently, downloads from the cancer imaging archive (TCIA), OpenNeuro, and Synapse are supported, easing access to large-scale medical data. URT’s modularity allows straightforward extension to other sources. Moreover, URT’s compatibility with Docker and Singularity enables reproducible research and easy application on HPCs.

Raphael Maser, Meryem Abbad Andaloussi, François Lamoline, Andreas Husch
Abstract: Object Detection for Breast Diffusion-weighted Imaging

Diffusion-weighted imaging (DWI) is a rapidly emerging unenhanced MRI technique in oncologic breast imaging. This IRB-approved study included n=818 patients (with n=618 malignant lesions in n=268 patients). All patients underwent a clinically indicated multiparametric breast 3T MRI examination, including a multi-b-value DWI acquisition (b = 50, 750, 1500 s/mm²). We utilized nnDetection, a state-of-the-art self-configuring object detection model, with breast cancer-specific extensions to train a detection model. The model was trained with the following extensions: (i) the apparent diffusion coefficient (ADC) as additional input, (ii) random bias field, random spike, and random ghosting augmentations, (iii) a size-balanced data loader to ensure that the less frequent large lesions were given an equal chance of being picked in a mini-batch, and (iv) replacement of the loss function with a size-adjusted focal loss to prioritize finding primary lesions while disincentivizing small indeterminate false positives. The model achieved an AUC of 0.88 in 5-fold cross-validation using only the DWI acquisition, comparing favorably against multi-reader performance metrics reported for screening mammography in large studies in the literature (0.81, 0.87, 0.81). It also achieved an FROC of 0.70 for primary lesions, indicating relevant localization ability. This study shows that AI can complement breast cancer screening assessment in DWI-based examinations. This work was originally published at RSNA 2023 [1].
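
A hypothetical sketch of a size-adjusted focal loss is given below; the weighting scheme and the parameters gamma and tau are assumptions for illustration, not the published formulation.

```python
import torch

def size_adjusted_focal_loss(p, target, lesion_size, gamma=2.0, tau=100.0):
    """Focal loss with an assumed size weighting: larger (primary) lesions
    receive full weight, tiny candidates are down-weighted.

    p: predicted foreground probability per candidate, target: 0/1 labels,
    lesion_size: voxel count per candidate, tau: assumed size scale.
    """
    p_t = torch.where(target == 1, p, 1 - p)
    focal = -((1 - p_t) ** gamma) * torch.log(p_t.clamp_min(1e-8))
    size_w = torch.clamp(lesion_size / tau, max=1.0)  # grows with size
    weight = torch.where(target == 1, size_w, torch.ones_like(size_w))
    return (weight * focal).mean()
```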

Dimitrios Bounias, Michael Baumgartner, Peter Neher, Balint Kovacs, Ralf Floca, Paul F. Jaeger, Lorenz A. Kapsner, Jessica Eberle, Dominique Hadler, Frederik Laun, Sabine Ohlmeyer, Klaus H. Maier-Hein, Sebastian Bickelhaupt
Abstract: Semi-automatic White Matter Tract Segmentation using Active Learning: atTRACTive

Accurately identifying white matter tracts in medical images is essential for various applications, including surgery planning. Supervised machine learning models have reached state-of-the-art performance in solving this task automatically. However, these models are primarily trained on healthy subjects and struggle with strong anatomical aberrations, e.g. those caused by brain tumors. This limitation makes them unsuitable for tasks such as preoperative planning, which is why time-consuming and challenging manual delineation of the target tract is still employed. We propose semi-automatic, entropy-based active learning for quick and intuitive segmentation of tracts from tractography consisting of millions of streamlines. The method is evaluated on 21 openly available healthy subjects from the Human Connectome Project and an internal dataset of ten neurosurgical cases. With only a few annotations, this approach enables segmenting tracts on tumor cases at a quality comparable to healthy subjects (Dice = 0.71), while the performance of automatic methods drops substantially (Dice = 0.34). The method, named atTRACTive, is implemented in the software MITK Diffusion. Manual experiments on tumor data showed higher efficiency than traditional ROI-based segmentation [1].
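
The entropy-based query step could look like the following sketch, where the streamlines with the most uncertain membership predictions are selected for annotation; the selection rule is an assumption based on the abstract.

```python
import numpy as np

def query_streamlines(probs, n_query=100):
    """Pick the streamlines the classifier is least certain about.

    probs: (n_streamlines,) predicted probability of tract membership.
    Returns indices of the n_query highest-entropy streamlines, which
    would be shown to the annotator next.
    """
    p = np.clip(probs, 1e-8, 1 - 1e-8)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return np.argsort(entropy)[-n_query:]
```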

Robin Peretzke, Klaus Maier-Hein, Jonas Bohn, Yannick Kirchhoff, Saikat Roy, Sabrina Oberli-Palme, Daniela Becker, Pavlina Lenga, Peter Neher
Abstract: Metal Inpainting in CBCT Projections using Score-based Generative Model

During orthopaedic surgery, the insertion of metallic implants or screws is often performed under mobile C-arm systems. However, due to the high attenuation of metal, severe metal artifacts occur in 3D reconstructions, which degrade the image quality significantly. Therefore, many metal artifact reduction (MAR) algorithms have been developed. In this work, a score-based generative model is trained on simulated knee projections to learn the score function of the perturbed data distribution, and the inpainted images are obtained by removing the noise during the conditional sampling process [1]. Specifically, the backbone of the score-based neural network is a simple U-Net conditioned on a time variable, while the perturbation kernel uses the variance-exploding form. A hyperparameter sweep is conducted to determine the optimal sampling parameters, revealing that a signal-to-noise ratio of 0.4 and 1000 discretization steps achieve the best trade-off between efficiency and accuracy. The results show that the images inpainted by the proposed unsupervised method retain more detailed information and a stronger semantic connection to bones and soft tissue, achieving the lowest mean absolute error of 0.069 and the highest peak signal-to-noise ratio of 43.07 compared with inverse distance weighting interpolation and the mask pyramid network. Moreover, the score-based generative model can also recover projections with large circular and rectangular masks, demonstrating its generalization ability in inpainting tasks.

Siyuan Mei, Fuxin Fan, Andreas Maier
Abstract: Physics-informed Conditional Autoencoder Approach for Robust Metabolic CEST MRI at 7T

Chemical exchange saturation transfer (CEST) is an MRI technique used to identify solute molecules through proton exchange. The CEST spectrum reveals various metabolite effects, which are extracted using Lorentzian curve fitting. However, the separation of CEST effects is compromised by the inhomogeneity of the B1 saturation field and by acquisition noise. These inconsistencies result in variations within the associated metabolic maps. Existing B1 correction methods require at least two sets of CEST spectra, from which a B1-corrected CEST spectrum at a fixed B1 level is interpolated, effectively doubling the acquisition time. In this study, we investigated the use of an unsupervised physics-informed conditional autoencoder (PICAE) to efficiently correct B1 inhomogeneity and isolate metabolic maps from a single CEST scan. The proposed method uses two autoencoders: a conditional autoencoder (CAE) for B1 correction of the CEST spectrum at arbitrary B1 levels, and a physics-informed autoencoder (PIAE) for Lorentzian line fitting. The CAE consists of fully connected layers whose latent space and input are both conditioned on the B1 level, eliminating the need for a second scan. The PIAE uses a fully connected neural network as encoder and a Lorentzian distribution generator as decoder. This not only facilitates model interpretation but also overcomes the shortcomings of traditional curve fitting, in particular its susceptibility to noise. The PICAE-CEST maps showed improved visualization of tumor features compared to the conventional method. The proposed method yielded an at least 25% higher structural similarity index (SSIM) with respect to the T1-weighted reference image enhanced with the exogenous contrast agent gadolinium in the tumor ring region. In addition, the contrast maps exhibited lower noise and greater homogeneity throughout the brain compared to the Lorentzian fit of the interpolation-based B1-corrected CEST spectrum [1].
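
The decoder side of the PIAE can be understood through the standard multi-pool Lorentzian model of a Z-spectrum, sketched below; the pool parameters are illustrative values, not those of the paper.

```python
import numpy as np

def lorentzian_zspectrum(offsets, amps, widths, positions):
    """Z-spectrum modelled as 1 minus a sum of Lorentzian lines,
    one per CEST pool (water, amide, NOE, ...).

    offsets: saturation frequency offsets (ppm); amps, widths (FWHM,
    ppm) and positions (ppm) parametrize each pool.
    """
    z = np.ones_like(offsets, dtype=float)
    for a, w, d in zip(amps, widths, positions):
        z -= a * (w / 2) ** 2 / ((w / 2) ** 2 + (offsets - d) ** 2)
    return z

offsets = np.linspace(-5, 5, 101)
z = lorentzian_zspectrum(offsets, amps=[0.8, 0.05], widths=[2.0, 1.0],
                         positions=[0.0, 3.5])  # water + amide pools
```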

Junaid R. Rajput, Tim A. Möhle, Moritz S. Fabian, Angelika Mennecke, Jochen A. Sembill, Joji B. Kuramatsu, Manuel Schmidt, Arnd Dörfler, Andreas Maier, Moritz Zaiss
Comparing Image Segmentation Neural Networks for the Analysis of Precision Cut Lung Slices

Bronchodilators serve as a pivotal intervention for ameliorating symptoms associated with inflammatory and allergic lung diseases. The objective assessment of bronchodilator efficacy is critical for therapeutic optimization. Measuring airflow volume through precision cut lung slices (PCLS) imaging at varying time intervals provides a quantitative means to assess airway patency. To enhance the efficiency of this evaluation process, our study extends the existing image segmentation workflow to encompass a wider range of neural networks. Extensive experiments were conducted across varied data preprocessing methods and loss functions. Furthermore, we contrast the performance of single and ensemble models and visually compare their differences in segmentation detail. This refined workflow not only surpasses previous experimental results but also improves the accuracy of lung treatment evaluation, offering a broader array of choices for future image segmentation tasks.

Mohan Xu, Susann Dehmel, Lena Wiese
Comparison of Deep Learning Image-to-image Models for Medical Image Translation

We conducted a comparative analysis of six image-to-image deep learning models for MRI-to-CT image translation: resUNet, attUnet, DCGAN, pix2pixGAN with resUNet, pix2pixGAN with attUnet, and the denoising diffusion probabilistic model (DDPM). These models were trained and assessed using the SynthRAD2023 Grand Challenge dataset. For training, 170 MRI and CT image pairs (patients) were employed, while a set of 10 patients was reserved for testing. In summary, the pix2pixGAN with resUNet achieved the highest scores (SSIM = 0.81±0.21, MAE = 55.52±3.50, PSNR = 27.19±6.29). The DDPM displayed considerable potential in generating CT images that closely resemble real ones in terms of detail and fidelity. Nevertheless, the quality of its generated images exhibited notable fluctuations, so further refinement is necessary to stabilize its output quality.
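
The reported metrics can be computed with scikit-image as sketched below; details such as the data-range handling are assumptions, not taken from the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(ct_true, ct_pred):
    """SSIM, MAE and PSNR between a reference CT and a translated CT."""
    data_range = ct_true.max() - ct_true.min()
    ssim = structural_similarity(ct_true, ct_pred, data_range=data_range)
    psnr = peak_signal_noise_ratio(ct_true, ct_pred, data_range=data_range)
    mae = np.abs(ct_true - ct_pred).mean()
    return ssim, mae, psnr
```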

Zeyu Yang, Frank G. Zöllner
3D Deep Learning-based Boundary Regression of an Age-related Retinal Biomarker in High Resolution OCT

Vision is essential for quality of life but is threatened by vision-impairing diseases like age-related macular degeneration (AMD). A recently proposed biomarker that may distinguish normal aging from AMD is the gap visualized between the retinal pigment epithelium (RPE) and Bruch’s membrane. Due to the lack of automated processing, this gap has to date only been described sparsely in histologic data or on optical coherence tomography (OCT) B-scans. By segmenting the posterior RPE boundary automatically for the first time, we enable fully automatic quantification of the thickness of this gap in vivo across whole volumetric OCT images. Our processing pipeline leverages advancements in motion correction, volumetric image merging, and high resolution OCT. A novel 3D boundary regression network named depth map regression network (DMR-Net) estimates the gap thickness in the volume. As 3D networks require labor-intensive full-volume ground truth boundary labels, we developed a semi-automatic labeling approach that refines existing labels based on the visibility of the gap with minimal user input. We demonstrate thickness maps across a wide age range of healthy participants (23–79 years). The median absolute error in the test set is 0.161 μm, well below the axial pixel spacing (0.89 μm). For the first time, our results allow spatially resolved analysis of pathologic deviations in normal aging and AMD.

Wenke Karbole, Stefan B. Ploner, Jungeun Won, Anna Marmalidou, Hiroyuki Takahashi, Nadia K. Waheed, James G. Fujimoto, Andreas Maier
Accelerating Artificial Intelligence-based Whole Slide Image Analysis with an Optimized Preprocessing Pipeline

As the field of digital pathology continues to advance, the computer-aided analysis of whole slide images (WSI) has become an essential component for cancer diagnosis, staging, biomarker prediction, and therapy evaluation. However, even with the latest hardware developments, the processing of entire slides still demands significant computational resources. Therefore, many WSI analysis pipelines rely on patch-wise processing by tessellating a WSI into smaller sections and aggregating the results to retrieve slide-level outputs. One commonality among all these algorithms is the necessity of WSI preprocessing to extract patches, with each algorithm having its own requirements, such as sliding window extraction or extracting patches at multiple magnification levels. In this paper, we present PathoPatch, a novel Python-based software framework that leverages NVIDIA’s cuCIM library and parallelization to accelerate the preprocessing of WSIs. Compared to existing frameworks, we achieve a substantial reduction in processing time while maintaining or even improving the preprocessing capabilities. The code is available at https://github.com/TIO-IKIM/PathoPatcher.
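
At its core, patch-wise preprocessing is a sliding-window tessellation, as in this toy sketch; PathoPatch itself adds cuCIM-based slide reading, multi-magnification support, and parallelization on top of this basic idea.

```python
import numpy as np

def tessellate(wsi, patch_size=256, stride=256):
    """Yield sliding-window patches from a slide region.

    stride == patch_size gives non-overlapping tiles; a smaller stride
    gives overlapping sliding-window extraction.
    """
    h, w = wsi.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            yield wsi[y:y + patch_size, x:x + patch_size]

slide = np.zeros((1024, 1024, 3), dtype=np.uint8)  # stand-in for a WSI region
patches = list(tessellate(slide))                  # 16 non-overlapping patches
```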

Fabian Hörst, Sajad H. Schaheer, Giulia Baldini, Fin H. Bahnsen, Jan Egger, Jens Kleesiek
Abstract: Transient Hemodynamics Prediction using an Efficient Octree-based Deep Learning Model

Patient-specific hemodynamics assessment has the potential to support the diagnosis and treatment of neurovascular diseases. Currently, conventional medical imaging modalities cannot accurately acquire the high-resolution hemodynamic information required to assess complex neurovascular pathologies. Alternatively, computational fluid dynamics (CFD) simulations can be applied to tomographic reconstructions to obtain clinically relevant hemodynamic quantities. However, executing CFD simulations requires enormous computational resources and expert knowledge, which are usually not available in clinical environments. Recently, deep learning-based methods have been proposed as CFD surrogates to improve computational efficiency. Nevertheless, the prediction of high-resolution transient CFD simulations for complex vascular geometries poses a challenge to conventional deep learning models. In this work, we present an architecture tailored to predict high-resolution (spatial and temporal) velocity fields for complex synthetic vascular geometries. For this, an octree-based spatial discretization is combined with an implicit neural function representation to efficiently handle the prediction of the 3D velocity field for each time step. The presented method is evaluated for the task of cerebral hemodynamics prediction before and during the injection of contrast agent in the internal carotid artery (ICA). Compared to CFD simulations, the velocity field can be estimated with a mean absolute error of 0.024 m/s, while the run time reduces from several hours on a high-performance cluster to a few seconds on a consumer graphics processing unit [1].

Noah Maul, Katharina Zinn, Fabian Wagner, Mareike Thies, Maximilian Rohleder, Laura Pfaff, Markus Kowarschik, Annette Birkhold, Andreas Maier
Abstract: Utility-preserving Measure for Patient Privacy: Deep Learning-based Anonymization of Chest Radiographs

Robust and reliable anonymization of chest radiographs constitutes an essential step before publishing large datasets of such images for research purposes. The conventional anonymization process obscures personal information in the images with black boxes and removes or replaces meta-information. However, such simple measures retain biometric information in the chest radiographs, allowing patients to be re-identified by a linkage attack. Therefore, there is an urgent need to obfuscate the biometric information appearing in the images. We propose the first deep learning-based approach (PriCheXy-Net) for the targeted anonymization of chest radiographs while maintaining data utility for diagnostic and machine learning purposes. Our model architecture is a composition of three independent neural networks that, when used collectively, learn a deformation field that is able to impede patient re-identification. Quantitative results on the ChestX-ray14 dataset show a reduction of patient re-identification from 81.8% to 57.7% (AUC) after re-training, with little impact on the abnormality classification performance. This indicates the ability to preserve underlying abnormality patterns while increasing patient privacy. Lastly, we compare our anonymization approach with two other obfuscation-based methods (Privacy-Net, DP-Pix) and demonstrate the superiority of our method in resolving the privacy-utility tradeoff for chest radiographs. This work was previously published at MICCAI 2023 [1]. Code is available at https://github.com/kaipackhaeuser/PriCheXy-Net.
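
The core warping operation behind such a learned deformation field can be sketched with grid sampling; the random flow below merely stands in for the field produced by the trained networks.

```python
import torch
import torch.nn.functional as F

def apply_deformation(image, flow):
    """Warp a radiograph with a deformation field.

    image: (N, 1, H, W); flow: (N, H, W, 2) offsets in [-1, 1] grid units.
    """
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)  # identity grid
    return F.grid_sample(image, base + flow, align_corners=True)

img = torch.rand(1, 1, 128, 128)
flow = 0.02 * torch.randn(1, 128, 128, 2)  # small random deformation
anon = apply_deformation(img, flow)
```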

Kai Packhäuser, Sebastian Gündel, Florian Thamm, Felix Denzinger, Andreas Maier
Backmatter
Metadata
Title
Bildverarbeitung für die Medizin 2024
Editors
Andreas Maier
Thomas M. Deserno
Heinz Handels
Klaus Maier-Hein
Christoph Palm
Thomas Tolxdorff
Copyright Year
2024
Electronic ISBN
978-3-658-44037-4
Print ISBN
978-3-658-44036-7
DOI
https://doi.org/10.1007/978-3-658-44037-4
