
2025 | Book

Bildverarbeitung für die Medizin 2025

Proceedings, German Conference on Medical Image Computing, Regensburg, March 9-11, 2025

Edited by: Christoph Palm, Katharina Breininger, Thomas Deserno, Heinz Handels, Andreas Maier, Klaus H. Maier-Hein, Thomas M. Tolxdorff

Publisher: Springer Fachmedien Wiesbaden

Book series: Informatik aktuell


About this book

The conference "BVM - Bildverarbeitung für die Medizin" has for many years been established as the national platform for exchanging ideas and discussing the latest research results in medical image processing and artificial intelligence (AI). In 2025, we will again present current research results and deepen the dialogue between (young) scientists, industry, and users. The contributions in this volume – most of them in English – cover all areas of medical image processing, in particular imaging and image acquisition, segmentation and analysis, registration, visualization and animation, computer-aided diagnosis, and image-guided therapy planning and therapy. These employ methods of machine learning, biomechanical modeling, as well as validation and quality assurance.

The chapter "Leveraging multiple total body segmentators and anatomy-informed post-processing for segmenting bones in Lung CTs" is freely available (open access) at link.springer.com under a Creative Commons Attribution 4.0 International License.

Table of Contents

Frontmatter
Keynote: AI for Analysis of Coronary Artery Disease

Coronary artery disease is a leading cause of morbidity and mortality worldwide. Coronary CT is a non-invasive tool enabling analysis of the coronary arteries and providing important diagnostic information. However, current clinical analysis typically remains limited to visual evaluation of the coronary artery tree, as extracting detailed quantitative information requires a high level of expertise and is time-consuming. Besides information about the coronary arteries, these images also contain information about the whole heart and body composition that may be valuable for the prediction of cardiovascular risk.

Ivana Išgum
Keynote: Fifty Years of Medical Image Computing and my Small Part

Reluctantly abandoning my career in rock-and-roll, and inspired by the elegant mathematics of reconstruction in computed tomography, I started working in medical imaging as a clinical scientist. Working in the clinical environment, I came to appreciate how much in medicine was unknown. I became convinced that a physics and engineering approach could contribute to improving healthcare, and that this must be done in close collaboration with clinical colleagues in order to be relevant. I also became convinced that real impact in healthcare engineering would only come about by working closely with industry. Only this would enable wide dissemination.

David Hawkes
Keynote: Autonomous Surgery from a Surgeon’s Perspective

In addition to continuously rising costs, the healthcare system is currently characterised by an increasing shortage of staff. It is estimated that by the year 2030 there will be a shortage of 500,000 nurses and 6,000 doctors in Germany alone. Consequences of this are delays in medical treatment, suspended operations in individual departments resulting in medical undersupply, and overworked staff.

Dirk Wilhelm
Image Registration for a Dynamic Breathing Model

Respiratory surface electromyography measures the electrical muscle activity during breathing non-invasively. Electrophysiological modeling of the respiratory cycle is a valuable tool for the analysis of these signals. A promising approach for dynamic simulations is based on knowing the deformation of the torso at a finite number of time steps between expiration and inspiration. In order to provide a foundation for such models, we present a new image registration method that determines the torso transformation during the respiratory cycle. For this purpose, we extend a ResNet-LDDMM-based 3D/3D registration approach: we modify the network structure and incorporate 2D data acquired during respiration into the registration to include information about the breathing motion. Our experiments show that these modifications improve the registration quality, thereby providing a step towards a more realistic model of electrical transfer behavior over the respiratory cycle. The code is publicly available at https://github.com/schulz-p/Image-Registration-for-a-Dynamic-Breathing-Model .

Pia F. Schulz, Andra Oltmann, Johannes Bostelmann, Ole Gildemeister, Franz Wegner, Jan Lellmann, Philipp Rostalski, Jan Modersitzki
Abstract: ConvexAdam
Self-configuring Framework for Dual-Optimisation-based 3D Multitask Medical Image Registration

Effective medical image registration aims to align anatomical structures accurately and apply smooth, plausible transformations across diverse imaging tasks. Deep learning-based registration approaches require extensive training data and task-specific configurations, which limits their adaptability and usability across multiple modalities and anatomical regions. We present ConvexAdam [1], a dual-optimization framework that combines convex optimization for global alignment with Adam-based instance optimization for fine-tuning.
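
For illustration, a minimal PyTorch sketch of the second stage: Adam-based instance optimisation of a dense displacement field with an MSE similarity term and a diffusion regulariser. This is a generic re-implementation of the idea only; the convex global pre-alignment and the hand-crafted features of ConvexAdam are not reproduced, and all hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def instance_optimisation(fixed, moving, n_iters=100, lam=1.0, lr=0.05):
    """Adam-based fine-tuning of a dense 3D displacement field.

    fixed, moving: float tensors of shape (1, 1, D, H, W), intensity-normalised.
    Assumes a global pre-alignment has already been applied to `moving`.
    """
    # identity sampling grid in normalised [-1, 1] coordinates, shape (1, D, H, W, 3)
    identity = F.affine_grid(torch.eye(3, 4).unsqueeze(0), fixed.shape, align_corners=False)
    disp = torch.zeros_like(identity, requires_grad=True)
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(n_iters):
        warped = F.grid_sample(moving, identity + disp, align_corners=False)
        sim = F.mse_loss(warped, fixed)
        # diffusion regulariser: squared finite differences of the displacement
        reg = (disp[:, 1:] - disp[:, :-1]).pow(2).mean() \
            + (disp[:, :, 1:] - disp[:, :, :-1]).pow(2).mean() \
            + (disp[:, :, :, 1:] - disp[:, :, :, :-1]).pow(2).mean()
        loss = sim + lam * reg
        opt.zero_grad()
        loss.backward()
        opt.step()
    return disp.detach()
```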

Christoph Großbröhmer, Hanna Siebert, Lasse Hansen, Mattias P. Heinrich
Surrogate-based Respiratory Motion Estimation using Physics-enhanced Implicit Neural Representations

Medical image registration plays a key role in radiation-based cancer treatment in the thorax. Thorax image registration is naturally highly patient-specific, requiring registration models to be trained individually per patient. The available per-patient data is highly limited and often does not cover all potential scenarios occurring during treatment. In this work, we create patient-individual implicit neural representations (INRs) that represent the displacement fields during a breathing cycle. We tackle the data shortage by including physical knowledge, while simultaneously improving generalization capabilities. Our results show that physical constraints can be well integrated into INRs. However, we find that extrapolation capabilities are highly dependent on the induced physical regularization.
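
As a sketch of the general idea (not the authors' model), an INR can map space-time coordinates to displacements, with physical knowledge injected through autograd-based penalties; the divergence term below is a simple stand-in for whatever physical regularisation is actually used.

```python
import torch
import torch.nn as nn

class DisplacementINR(nn.Module):
    """MLP mapping space-time coordinates (x, y, z, t) to a 3D displacement."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3))

    def forward(self, coords):          # coords: (N, 4)
        return self.net(coords)

def divergence_penalty(model, coords):
    """Physics-style term penalising the divergence of the displacement field,
    computed with autograd (a placeholder for the paper's regularisation)."""
    coords = coords.requires_grad_(True)
    u = model(coords)                   # (N, 3)
    div = 0.0
    for i in range(3):
        grad = torch.autograd.grad(u[:, i].sum(), coords, create_graph=True)[0]
        div = div + grad[:, i]          # du_i / dx_i
    return (div ** 2).mean()
```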

Jan Boysen, Hristina Uzunova, Jan Ehrhardt, Heinz Handels
Comparison of Framewise Video Classification in Laryngoscopies

In this study, the performance of single-task and multi-task models, incorporating both static and temporal classification approaches, for various tasks in medical video laryngoscopy (VL) is assessed through a deep learning (DL) image encoder and LSTM networks. The data foundation is an in-house dataset of 464 individual recordings. In contrast to previous works, we consider the impact of multi-task learning and temporal dependencies in the data through video snippets. Results show that multi-task models outperform single-task models for tasks with sparse labels, indicating the benefits of shared learning across tasks. Moreover, LSTM-based models significantly improve temporal consistency and performance for tasks with inherent temporal dependencies, such as the process state of VL.

Ole Felber, Louis Bellmann, Philipp Breitfeld, Martin Petzoldt, Felix Rindt, René Werner, Maximilian Nielsen
Real-time Fiberscopic Image Improvement for Automated Lesion Detection in the Urinary Bladder

Fiber endoscopes are flexible devices that use glass fibers for the optical transmission of images out of hollow organs such as the urinary bladder, trans-nasal cavities, or the lung. Even though bendable tip-chip endoscopes have been available for the past decades, fiberscopes are still in broad use, as they provide good images at a reasonable price. However, images obtained from fiberscopes are particularly degraded by the honeycomb pattern related to the core and cladding of each fiber. To remove such honeycomb patterns for the human visual inspection of natural orifices, as well as to condition such images for machine and deep learning tasks, real-time compensation algorithms are needed. Using a large set of >15,000 fiberscopic images from the urinary bladder, two related approaches are investigated, namely (1) how to eliminate honeycomb patterns in real time and hence improve the image quality, and (2) how to use such improved image data to train a deep learning model to detect tumorous lesions in the urinary bladder. The investigated non-parametric filtering approach removes the honeycomb artifacts in the frequency domain using a DoG filter and thresholding. This filtering approach was evaluated on fiberscopic images with and without visible honeycomb pattern using the BRISQUE image quality algorithm [1]. Secondly, the thus improved images were used as an enhanced training dataset for a pre-trained deep neural network (RT-DETR) for automated lesion detection in the urinary bladder. Based on the BRISQUE score, the fiber structures could be eliminated and the image quality improved for approx. 10,000 cystoscopic images; the BRISQUE score improved on average from 31.19 to 18.32. The F1-score for lesion detection on 572 cystoscopic images improved from 0.623 to 0.802 when training on the honeycomb-structured images, and to F1 = 0.830 with the images with the honeycomb structures removed. The investigated approach is thus capable of removing the fiberscopic honeycomb pattern in real time, while the honeycomb removal also leads to improved rates for automated lesion detection in the urinary bladder.
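
A minimal NumPy sketch of frequency-domain honeycomb suppression using a difference-of-Gaussians (DoG) band-stop mask; the band limits and attenuation below are hypothetical parameters, and the thresholding step of the paper is omitted.

```python
import numpy as np

def remove_honeycomb(img, sigma_low=6.0, sigma_high=60.0, atten=0.05):
    """Suppress the fibre honeycomb pattern in the frequency domain.

    A DoG mask selects the radial-frequency band where the fibre pattern
    lives and attenuates it towards `atten`, while the DC component and
    low-frequency image content pass through unchanged.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None] * h      # integer frequency indices
    fx = np.fft.fftfreq(w)[None, :] * w
    r2 = fx**2 + fy**2
    dog = np.exp(-r2 / (2 * sigma_high**2)) - np.exp(-r2 / (2 * sigma_low**2))
    dog = np.clip(dog, 0.0, 1.0)             # ~1 inside the honeycomb band
    mask = 1.0 - (1.0 - atten) * dog         # band-stop filter
    spec = np.fft.fft2(img)
    return np.real(np.fft.ifft2(spec * mask))
```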

Thomas Eixelberger, Karl Weingärtner, Philipp Maisch, Christian Bolenz, Thomas Wittenberg
Abstract: Universal and Flexible Framework for Unsupervised Statistical Shape Model Learning

Statistical shape models (SSM) are an essential tool in medical image analysis and computational anatomy, facilitating a deeper understanding of anatomical variability across populations. Despite the evident utility of SSMs, their creation often comes with the drawback of depending on some form of human supervision, e.g. in the form of correspondence annotations. Although unsupervised learning-based 3D shape matching methods have made a major leap forward in recent years, the correspondence quality of existing methods does not meet the demanding requirements necessary for the construction of SSMs of complex anatomical structures.

Nafie El Amrani, Dongliang Cao, Florian Bernard
Robust Statistical Shape Modelling with Implicit Neural Representations

We present a framework for multi-label statistical shape modelling using implicit neural representations (INRs). By training a generalised INR alongside instance-specific latent codes to model individual shapes as continuous signed distance maps, the approach captures complex anatomical variations without relying on fixed landmarks. We further propose to employ the shape model for regularisation to obtain robust reconstructions of corrupted or incomplete data. Experiments on 2D chest X-ray segmentations demonstrate that this regularisation facilitates reconstructions under flawed data conditions, achieving high-fidelity segmentations. Conceptual examples highlight topological advantages of INR-based shape models over conventional point distribution models.

Christoph Großbröhmer, Fenja Falta, Ron Keuth, Timo Kepp, Mattias P. Heinrich
iRBSM
A Deep Implicit 3D Breast Shape Model

We present the first deep implicit 3D shape model of the female breast, building upon and improving the recently proposed Regensburg Breast Shape Model (RBSM). Compared to its PCA-based predecessor, our model employs implicit neural representations; hence, it can be trained on raw 3D breast scans and eliminates the need for computationally demanding non-rigid registration, a task that is particularly difficult for feature-less breast shapes. The resulting model, dubbed iRBSM, captures detailed surface geometry including fine structures such as nipples and belly buttons, is highly expressive, and outperforms the RBSM on different surface reconstruction tasks. Finally, leveraging the iRBSM, we present a prototype application to reconstruct 3D breast shapes from just a single image. Model and code are publicly available at https://rbsm.re-mic.de/implicit .

Maximilian Weiherer, Antonia von Riedheim, Vanessa Brébant, Bernhard Egger, Christoph Palm
Diffusion Models for Conditional Brain Tumor MRI Generation with Tumor-induced Deformations

Machine learning methods have achieved remarkable results in medical image processing but require large annotated training datasets, typically unavailable in the medical image domain. To overcome this issue, we propose a conditional diffusion model that generates brain MRIs with tumors together with the corresponding ground-truth annotations. An additional diffusion-based label modification approach is integrated in order to account for the tumor mass effect. Experimental results demonstrate that replacing up to 50% of real training data with generated samples does not significantly impact segmentation performance. Moreover, models trained exclusively on synthetic data still yield acceptable results, highlighting the potential of our approach to mitigate the problem of limited annotated data in medical imaging.

Mona Irsfeld, Heinz Handels, Hristina Uzunova
LLM-driven Baselines for Medical Image Segmentation
A Systematic Analysis

Large Language Models (LLMs) are increasingly utilized in tasks such as code generation for medical image analysis. However, their specific influence on study outcomes remains under-explored. In this study, we provide a comprehensive evaluation of various open- and closed-source LLMs, comparing their performance across medical imaging datasets. Each LLM was tasked with generating code for a U-Net-based baseline for a semantic segmentation task, guided by a tailored prompt. We evaluated each LLM’s generated model performance using the Dice coefficient and recorded all interactions with the LLM. Significant variations in baseline performance were observed among the LLMs, with differences of up to Δ 85.49% for the Bolus, 86.33% for the BAGLS, and 87.32% for the Brain Tumor test dataset. Additionally, we identified LLMs with minimal coding errors (best-performing LLMs: GPT o1 Preview and Claude 3.5 Sonnet with zero errors upon initial code execution; least-performing: Gemini 1.5 Pro and Llama 3.1 405B with 15 and 11 errors, respectively). In summary, careful selection of LLMs can significantly enhance medical image analysis code generation and establish reliable baselines for further algorithmic development.
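
The Dice coefficient used to score each generated baseline fits in a few lines; this is a generic re-implementation, not the study's evaluation code.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice score between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# example: perfect overlap yields 1.0
a = np.array([[0, 1], [1, 1]])
print(dice_coefficient(a, a))  # 1.0
```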

Jasmin Arjomandi, Luisa Neubig, Andreas M. Kist
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models

Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It halved the MSE relative to the baseline model and achieved the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions. Our source code is openly available at https://github.com/ZeinebZH/TumorNetSolvers
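
The forward models such neural solvers emulate are typically reaction-diffusion PDEs; as a toy illustration (not the authors' solver), one explicit finite-difference step of the widely used Fisher-KPP glioma growth model, with placeholder parameters and periodic boundaries for brevity:

```python
import numpy as np

def fisher_kpp_step(c, D=0.1, rho=0.05, dt=0.1, dx=1.0):
    """One explicit Euler step of dc/dt = D * laplacian(c) + rho * c * (1 - c),
    where c is the normalised tumor cell concentration on a 3D grid."""
    lap = (np.roll(c, 1, 0) + np.roll(c, -1, 0)
         + np.roll(c, 1, 1) + np.roll(c, -1, 1)
         + np.roll(c, 1, 2) + np.roll(c, -1, 2) - 6.0 * c) / dx**2
    return c + dt * (D * lap + rho * c * (1.0 - c))
```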

Zeineb Haouari, Jonas Weidner, Ivan Ezhov, Aswathi Varma, Daniel Rueckert, Bjoern Menze, Benedikt Wiestler
Is Self-supervision Enough?
Benchmarking Foundation Models Against End-to-end Training for Mitotic Figure Classification

Foundation models (FMs), i.e., models trained on a vast amount of typically unlabeled data, have become popular and available recently for the domain of histopathology. The key idea is to extract semantically rich vectors from any input patch, allowing for the use of simple subsequent classification networks, potentially reducing the required amount of labeled data and increasing domain robustness. In this work, we investigate to which degree this also holds for mitotic figure classification. Utilizing two popular public mitotic figure datasets, we compared linear probing of five publicly available FMs against models trained on ImageNet and a simple ResNet50 end-to-end-trained baseline. We found that the end-to-end-trained baseline outperformed all FM-based classifiers, regardless of the amount of data provided. Additionally, we did not observe the FM-based classifiers to be more robust against domain shifts, rendering both of the above assumptions incorrect.
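
Linear probing of a frozen foundation model amounts to fitting a linear classifier on precomputed patch embeddings; a sketch with hypothetical feature files (the study's exact setup and features are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# embeddings extracted once by the frozen foundation model (file names are illustrative)
X_train = np.load("fm_features_train.npy")   # (n_patches, feat_dim)
y_train = np.load("labels_train.npy")        # 1 = mitotic figure, 0 = look-alike
X_test = np.load("fm_features_test.npy")
y_test = np.load("labels_test.npy")

probe = LogisticRegression(max_iter=1000, class_weight="balanced")
probe.fit(X_train, y_train)
print("balanced accuracy:", balanced_accuracy_score(y_test, probe.predict(X_test)))
```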

Jonathan Ganz, Jonas Ammeling, Emely Rosbach, Ludwig Lausser, Christof A. Bertram, Katharina Breininger, Marc Aubreville
Look, No Convs! Permutation- and Rotation-invariance for MetaFormers

Despite their success in numerous medical image analysis tasks, convolutional neural networks and vision transformers (ViTs) have an important limitation: permutation and rotation dependency. By design, the convolutional filters and the positional encoding in ViTs impact performance, with oftentimes severe robustness issues when using scans with a different input orientation. For biomedical images of tissue samples the orientation is arbitrary, and ultrasound scans can be flipped depending on usage. Successful roto-translation equivariant networks tackle these problems at the expense of probing multiple versions of the same filter within each layer, resulting in many times higher computational demand. We revisit and expand this concept by implementing it into MetaFormers, which are already partially equivariant and have very few layers that require extra computations. Our experimental validation on several MedMNIST datasets demonstrates the advantages of rotational and permutation invariance for the replicated Roto-ResNet18 and our novel Roto-MetaFormer-S12.

Mattias P. Heinrich
Abstract: Evaluating the Explainability of Attributes and Prototypes for a Medical Classification Model

Due to the sensitive nature of medicine, it is particularly important and highly demanded that AI methods are explainable. This need has been recognised and there is great research interest in xAI solutions with medical applications. However, there is a lack of user-centred evaluation regarding the actual impact of the explanations. We evaluate attribute- and prototype-based explanations with the Proto-Caps model [1].

Luisa Gallée, Catharina S. Lisson, Christoph G. Lisson, Daniela Drees, Felix Weig, Daniel Vogele, Meinrad Beer, Michael Götz
Evaluating the Fidelity of Explanations for Convolutional Neural Networks in Alzheimer’s Disease Detection

The black-box nature of deep learning still prevents its widespread clinical use due to the high risk of hidden biases and prediction errors. Over the last decade, various explanation methods have been proposed to reveal the latent mechanisms of neural networks and support their decisions. However, interpreting the explanations themselves can be challenging, and there is still little consensus on how to evaluate the quality of explanations. To investigate the fidelity of explanations provided by prominent feature attribution methods for Convolutional Neural Networks in Alzheimer’s Disease (AD) detection, this paper applies relevance-guided perturbation to the Magnetic Resonance Imaging (MRI) input images. According to the fidelity metric, the AD class probability showed the steepest decline when the perturbation was guided by Integrated Gradients or DeepLift. We conclude by highlighting the role of the reference image in feature attribution with regard to AD detection from MRI images. The source code for the experiments is publicly available on GitHub at https://github.com/bckrlab/ad-fidelity .
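
A generic sketch of the relevance-guided perturbation protocol (function and argument names are hypothetical, not the paper's code):

```python
import numpy as np

def fidelity_curve(model_prob, image, attribution, reference, steps=10):
    """Occlude pixels in decreasing order of attribution and record the drop
    in class probability. `model_prob` maps an image to the AD probability;
    `reference` is the baseline image whose values replace occluded pixels.
    A steeper decline indicates a higher-fidelity explanation."""
    order = np.argsort(attribution.ravel())[::-1]   # most relevant first
    perturbed = image.copy().ravel()
    ref = reference.ravel()
    probs = [model_prob(image)]
    chunk = len(order) // steps
    for s in range(steps):
        idx = order[s * chunk:(s + 1) * chunk]
        perturbed[idx] = ref[idx]
        probs.append(model_prob(perturbed.reshape(image.shape)))
    return np.array(probs)
```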

Bjarne C. Hiller, Sebastian Bader, Devesh Singh, Thomas Kirste, Martin Becker, Martin Dyrba
Abstract: Metrics Reloaded
Recommendations and Online Toolkit for Image Analysis Validation

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain of interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics.

Emre Kavur, Lena Maier-Hein, Annika Reinke
Real-time Landmark Guidance for Radial Head Localization in Ultrasound Imaging

This work presents a method to enhance radial head localization in ultrasound imaging through landmark-based guidance. We created a dataset of ultrasound images containing three anatomical classes: radius, radial head, and humerus, recorded by an examiner. Using this dataset, we trained an nnU-Net model to perform real-time landmark detection, generating predictions at a rate of 20 frames per second. In two studies, participants without medical training were asked to locate their own radial head using a handheld ultrasound probe, first without guidance and then with visual and textual cues based on the model’s real-time segmentation. With this landmark-based guidance, 11 out of 12 participants successfully located the radial head, compared to 6 out of 12 without guidance. These results highlight the potential of landmark guidance to improve accuracy and usability in ultrasound interpretation, making it more accessible to users without medical training.

Lennart Meyling, Christoph Großbröhmer, Jürgen Lichtenstein, Mattias P. Heinrich, Lasse Hansen
Ultrasound-based 3D Reconstruction of Residual Limbs using Electromagnetic Tracking

When manufacturing leg prostheses, accurate measurement of the geometry of residual limbs is essential. Conventionally, this is done using plaster casts or optical scanners. Ultrasound (US) appears to be a promising complementary method, as it can be used to scan internal structures such as bone and muscle tissue. In this paper, we demonstrate the feasibility of 3D reconstruction of the skin surface and bone of a residual limb phantom using a US probe placed on the skin and localized by electromagnetic tracking. Different movement trajectories were investigated in order to capture as much of the limb as possible. Compared to a computed tomography (CT) reference image, a mean error of 2.3 mm was measured at 23 points distributed on the skin surface, while a mean error of 4.7 mm (n=20) was determined for the bone geometry.
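
The core geometry of tracked freehand 3D ultrasound is a chain of homogeneous transforms; a minimal sketch under generic naming (matrix names and the one-time calibration are assumptions, not the paper's notation):

```python
import numpy as np

def pixel_to_world(px, py, T_tracker_probe, T_probe_image, spacing):
    """Map a 2D ultrasound pixel into 3D tracker coordinates:
    p_world = T_tracker<-probe @ T_probe<-image @ p_image.
    T_tracker_probe comes from the electromagnetic tracker per frame,
    T_probe_image from a one-time probe calibration; both are 4x4
    homogeneous matrices. spacing = (sx, sy) in mm/pixel."""
    p_image = np.array([px * spacing[0], py * spacing[1], 0.0, 1.0])
    p_world = T_tracker_probe @ T_probe_image @ p_image
    return p_world[:3]
```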

Pauline Heine, Luise Robra, Jan Komposch, Janis Börsig, Jonas Bornmann, Andreas Leiniger, Rainer Brucher, Alfred M. Franz
Autocalibration for 3D Ultrasound Reconstruction in Infant Hip Dysplasia Screening

Hip dysplasia, a prevalent skeletal anomaly, necessitates accurate and early diagnosis to ensure effective treatment. This study presents a novel phantom-free autocalibration technique for 3D reconstruction of 2D ultrasound sweeps, evaluated for infant hip anatomy. By refining the probe-to-image transformation directly on anatomical structures, our approach eliminates the reliance on calibration phantoms, enhancing accessibility and accuracy in clinical settings. We demonstrate that our method improves alignment accuracy compared to traditional calibration methods as well as an existing autocalibration approach. By optimising the alignment of extracted surface keypoints from automatic segmentations, our results demonstrate superior performance, as evidenced by reduced Hausdorff (1.76 mm) and average surface distances (0.38 mm) between sweep reconstructions.

Wiebke Heyer, Christian Weihsbach, Christoph Otte, Jürgen Lichtenstein, Sebastian Lippross, Mattias P. Heinrich, Lasse Hansen
Weakly Supervised Segmentation of Hyper-reflective Foci with Compact Convolutional Transformers and SAM 2

Weakly supervised segmentation has the potential to greatly reduce the annotation effort for training segmentation models for small structures such as hyper-reflective foci (HRF) in optical coherence tomography (OCT). However, most weakly supervised methods either involve a strong downsampling of input images, or only achieve localization at a coarse resolution, both of which are unsatisfactory for small structures. We propose a novel framework that increases the spatial resolution of a traditional attention-based multiple instance learning (MIL) approach by using layer-wise relevance propagation (LRP) to prompt the segment anything model (SAM 2), and increases recall with iterative inference. Moreover, we demonstrate that replacing MIL with a compact convolutional transformer (CCT), which adds a positional encoding, and permits an exchange of information between different regions of the OCT image, leads to a further and substantial increase in segmentation accuracy.

Olivier Morelle, Justus Bisten, Maximilian WM. Wintergerst, Robert P. Finger, Thomas Schultz
Bridging Gaps in Retinal Imaging
Fusing OCT and SLO Information with Implicit Neural Representations for Improved Interpolation and Segmentation

Optical coherence tomography (OCT), the standard clinical imaging procedure in ophthalmology, provides high-resolution cross-sectional images of the retina, but is usually performed with large slice distances. Small structures in the sparsely scanned retina can therefore be missed, and volumetric measurements are impaired. Interpolation methods that work with single images can generate densely sampled volumes, but fail to correctly interpolate shapes and cannot generate information that is missing between the given slices. In this work, we propose to use generalized implicit neural representations (INRs) for OCT interpolation and retinal layer segmentation. By using population-based training, the shape representation is improved over baselines, while the training requires only very few annotated image slices thanks to the ability of INRs to handle highly anisotropic data. To enable the integration of inter-slice information, we use additional SLO images, demonstrating a new way to combine different eye imaging modalities. Finally, it is shown that the generalized INR can be adapted to images that were not seen during training, enabling the segmentation of new images.

Timo Kepp, Julia Andresen, Fenja Falta, Heinz Handels
Histologic Dataset of Normal and Atypical Mitotic Figures on Human Breast Cancer (AMi-Br)

Assessment of the density of mitotic figures (MFs) in histologic tumor sections is an important prognostic marker for many tumor types, including breast cancer. Recently, it has been reported in multiple works that the quantity of MFs with an atypical morphology (atypical MFs, AMFs) might be an independent prognostic criterion for breast cancer. AMFs are an indicator of mutations in the genes regulating the cell cycle and can lead to aberrant chromosome constitution (aneuploidy) of the tumor cells. To facilitate further research on this topic using pattern recognition, we present the first ever publicly available dataset of atypical and normal MFs (AMi-Br). For this, we utilized two of the most popular MF datasets (MIDOG 2021 and TUPAC) and subclassified all MFs using a three expert majority vote. Our final dataset consists of 3,720 MFs, split into 832 AMFs (22.4%) and 2,888 normal MFs (77.6%) across all 223 tumor cases in the combined set. We provide baseline classification experiments to investigate the consistency of the dataset, using a Monte Carlo cross-validation and different strategies to combat class imbalance. We found an averaged balanced accuracy of up to 0.806 when using a patch-level dataset split, and up to 0.713 when using a patient-level split.

Christof A. Bertram, Viktoria Weiss, Taryn A. Donovan, Sweta Banerjee, Thomas Conrad, Jonas Ammeling, Robert Klopfleisch, Christopher Kaltenecker, Marc Aubreville
Preservation of Image Content in Stain-to-stain Translation for Digital Pathology

In digital pathology, unsupervised domain adaptation of differently stained whole-slide images (WSIs) through image-to-image translation has become increasingly important for various applications such as stain augmentation or for the stain-independent application of deep learning models. In previous work, different variants of generative adversarial networks (GANs) were proposed to translate a real WSI obtained in the staining domain A into a fake WSI in the target staining domain B. However, GANs perform unpaired image-to-image translation and do not enforce consistency with respect to image content, which limits their applicability in digital pathology settings. In this paper, we first investigate the tissue inconsistency problem in such a stain-to-stain translation scenario using a quantitative evaluation of the distortion between real and fake images in different domains. Then, we investigate two possible solutions, namely (1) stain colorization inspired by natural image colorization, and (2) a modified Cycle-GAN, where an intensity invariant loss is proposed to balance the tissue consistency across staining domains. Our results highlight the superiority of these methods compared to conventional unpaired stain translation solutions for typical staining protocols in digital pathology.

Boqiang Huang, Wissem Benjeddou, Nadine S. Schaadt, Johannes Lotz, Friedrich Feuerhake, Dorit Merhof
Automation Bias in AI-assisted Medical Decision-making under Time Pressure in Computational Pathology

Artificial intelligence (AI)-based clinical decision support systems (CDSS) promise to enhance diagnostic accuracy and efficiency in computational pathology. However, human-AI collaboration might introduce automation bias, where users uncritically follow automated cues. This bias may worsen when time pressure strains practitioners’ cognitive resources. We quantified automation bias by measuring the adoption of negative system consultations and examined the role of time pressure in a web-based experiment, where trained pathology experts (n=28) estimated tumor cell percentages. Our results indicate that while AI integration led to a statistically significant increase in overall performance, it also resulted in a 7% automation bias rate, where initially correct evaluations were overturned by erroneous AI advice. Conversely, time pressure did not exacerbate automation bias occurrence, but appeared to increase its severity, evidenced by heightened reliance on the system’s negative consultations and subsequent performance decline. These findings highlight potential risks of AI use in healthcare. The final dataset (including a table with image patch details), participant demographics, and source code are available at: https://github.com/emelyrosbach/AB-TP.git

Emely Rosbach, Jonathan Ganz, Jonas Ammeling, Andreas Riener, Marc Aubreville
Abstract: Re-identification from Histopathology Images

In numerous studies, deep learning algorithms have proven their potential for the analysis of histopathology images, for example, for revealing the subtypes of tumors or the primary origin of metastases. These models require large datasets for training, which must be anonymized to prevent possible patient identity leaks. Our study demonstrates that even relatively simple deep learning algorithms can re-identify patients in large histopathology datasets with substantial accuracy.

Jonathan Ganz, Jonas Ammeling, Samir Jabari, Katharina Breininger, Marc Aubreville
Anatomy-aware Data Augmentation for Multi-organ Segmentation in CT
AnatoMix

Multi-organ segmentation in medical images is a widely researched task and can save much manual effort of clinicians in daily routines. Automating the organ segmentation process using deep learning (DL) is a promising solution, and state-of-the-art segmentation models, such as nnU-Net, achieve promising accuracy. However, overfitting is still a critical issue for DL-based models due to the limited size of segmentation datasets, especially in the medical domain. In this work, a novel data augmentation (DA) strategy, AnatoMix, is proposed to mitigate overfitting on multi-organ segmentation datasets. Different from basic DA strategies based on transformations of a single image, AnatoMix manipulates multiple images in the segmentation dataset and generates new data by mixing them while keeping the human anatomy as realistic as possible. AnatoMix takes the size and location of each organ into consideration, and the corresponding segmentation ground truth is obtained automatically. Initial experiments extend the publicly available CT-ORG dataset with AnatoMix, generating 1,545 new volumes from the 28 original ones. The extended dataset is then evaluated by training a U-Net on the original and the augmented dataset and testing on the same test data. In our experiments the extended dataset leads to a mean Dice score of 76.1, compared with 74.8 on the original dataset. This shows that AnatoMix can effectively improve the generalizability of a limited segmentation dataset.
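
A deliberately simplified sketch of the mixing idea: the published method additionally accounts for organ size and location, whereas this version assumes the volumes are already spatially normalised.

```python
import numpy as np

def mix_organ(target_img, target_seg, source_img, source_seg, organ_label):
    """Paste one organ from a source CT volume into a target volume and
    update the ground-truth labels accordingly, yielding a new synthetic
    training pair without any manual annotation."""
    mask = source_seg == organ_label
    mixed_img = target_img.copy()
    mixed_seg = target_seg.copy()
    mixed_img[mask] = source_img[mask]
    mixed_seg[mask] = organ_label
    return mixed_img, mixed_seg
```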

Chang Liu, Fuxin Fan, Annette Schwarz, Andreas Maier
Segmentation of Spinal Necrosis Zones in MRI

Accurate treatment assessment for spinal metastases requires MRI-based delineation of post-ablative necrosis zones. Given the rising incidence of cancer and limited recent research in this area, there is still a need for improvements to address existing limitations and optimize outcomes. In this work, we evaluated the segmentation capabilities of four recent neural networks and explored the optimal imaging sequence combinations with regard to segmentation accuracy. The networks were trained using five-fold cross-validation on data from 28 patients with overall 35 necrosis zones. The segmentation performance yielded a Dice Similarity Coefficient of 83.3 ± 13.2% for nnU-Net, 83.2 ± 7.9% for TransUNet, 76.1 ± 11.6% for SwinUNETR and 75.8 ± 14.7% for SwinUNETRV2 when utilizing a combination of contrast-enhanced T1-, native T1-, and T2-weighted sequences. The achieved results represent the current state of the art in spinal necrosis zone segmentation and are on par with the inter-rater variability of clinical experts. Therefore, this work could play an important role in the assessment of ablative interventions.

Janine Hürtgen, Sylvia Saalfeld, Robert Kreher, Mathias Becker, Georg Rose, Georg Hille
Augmented Reality Prompts for Foundation Model-based Semantic Segmentation

Mixed reality-guided surgery may benefit from live organ segmentation and tracking, such as adapting virtual models to the deformations of target structures. The use of foundation models allows for widely applicable solutions; however, these models need to be prompted for more demanding tasks, such as surgical applications. This work investigates several user interaction concepts for prompting segmentation foundation models directly in augmented reality (AR) using the Microsoft HoloLens for AR-guided surgical applications. To achieve this, we implement eye tracking, finger tracking, ArUco marker tracking, and pointer tracking concepts and evaluate their accuracy in terms of mean absolute errors during pointing experiments at different distances and with different prompters. Furthermore, we assess their impact on segmentation performance in terms of Dice scores and include an initial user study to query the user confidence and preference. While all methods are accurate enough to perform segmentation in phantom experiments, achieving Dice scores greater than 0.96, ArUco marker tracking and eye tracking prove to be the most accurate throughout our experiments. An initial user questionnaire along with feedback from an experienced visceral surgeon indicates a preference for eye tracking and finger tracking methods. Our code is publicly available at GitHub.

Michael Schwimmbeck, Christopher Auer, Johannes Schmidt, Stefanie Remmele
Abstract: nnU-Net Revisited
Call for Rigorous Validation in 3D Medical Image Segmentation

The release of nnU-Net marked a paradigm shift in 3D medical image segmentation, demonstrating that a properly configured U-Net architecture could still achieve state-of-the-art results. Despite this, the pursuit of novel architectures, and the respective claims of superior performance over the U-Net baseline, continued. In this study, we demonstrate that many of these recent claims fail to hold up when scrutinized for common validation shortcomings, such as the use of inadequate baselines, insufficient datasets, and neglected computational resources. By meticulously avoiding these pitfalls, we conduct a thorough and comprehensive benchmarking of current segmentation methods, including CNN-based, Transformer-based, and Mamba-based approaches.

Fabian Isensee, Tassilo Wald, Constantin Ulrich, Michael Baumgartner, Saikat Roy, Klaus Maier-Hein, Paul F. Jäger
Abstract: Information Mismatch in PHH3-assisted Mitosis Annotation Leads to Interpretation Shifts in H&E Slide Analysis

The count of mitotic figures (MFs) observed in hematoxylin and eosin (H&E)-stained slides is an important prognostic marker, as it is a measure for tumor cell proliferation. However, the identification of MFs has a known low inter-rater agreement. In a computer-aided setting, deep learning algorithms can help to mitigate this, but they require large amounts of annotated data for training and validation. Furthermore, label noise introduced during the annotation process may impede the algorithms’ performance. Unlike H&E, where identification of MFs is based mainly on morphological features, the mitosis-specific antibody phospho-histone H3 (PHH3) specifically highlights MFs. Counting MFs on slides stained against PHH3 leads to higher agreement among raters and has therefore recently been used as a ground truth for the annotation of MFs in H&E.

Jonathan Ganz, Christian Marzahl, Jonas Ammeling, Emely Rosbach, Taryn A. Donovan, Samir Jabari, Christof A. Bertram, Katharina Breininger, Marc Aubreville, the study participants
Abstract: Multi-level Cancer Profiling through Joint Cell-graph Representations

Digital pathology has enabled advanced cancer analysis, yet traditional patch-based methods struggle to capture complex, topological structures in whole-slide images (WSIs). This study introduces a graph-based framework using graph neural networks (GNNs) to model histopathology samples as multi-level cell graphs, integrating both cell- and disease-level information for improved classification. Our approach encompasses five cancer types and two staining protocols, modeling each sample as a graph to capture spatial and phenotypic relationships between cells and diseases. We implemented this framework using graph random neural networks (GRAND) [1], achieving a cell-level classification accuracy of 88% and a disease-level accuracy of 83%, significantly outperforming CNN and XGBoost baselines [2, 3]. These results emphasize the potential of GNNs to generalize across multiple cancers and staining methods, providing a valuable diagnostic tool for pathologists. Future research will extend this model to additional cancer types and explore its applicability and robustness in diverse clinical settings, with the goal of achieving interpretable and scalable computational pathology solutions [4].

Luis C. Rivera Monroy, Leonhard Rist, Frauke Wilm, Christian Ostalecki, Andreas Baur, Julio Vera, Katharina Breininger, Andreas Maier
Abstract: TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification
Leveraging Epistemic Uncertainty to Improve Model Performance

Deep learning in medical applications usually faces the problem of limited training data that does not represent most of the relevant distribution. This is because both the acquisition of the data is difficult and the labeling of medical data is costly: the former is caused by the complex and expensive nature of medical sensors and strict data privacy laws, the latter by the high wages of medical professionals.

Joshua Niemeijer, Jan Ehrhardt, Hristina Uzunova, Heinz Handels
End-to-end Encoders Stabilize Quantum Convolutional Neural Networks for Medical Image Classification

The advent of quantum machine learning has led to the development of quantum convolutional neural networks (QCNNs) for image classification across various domains. A key limitation of current quantum hardware is the limited number of available qubits and the bandwidth required to load data, necessitating classical dimensionality reduction before inputting image data into quantum circuits. The classification performance varies significantly depending on the quantum circuit and the parameters used. We propose integrating a classical encoder for dimensionality reduction before the QCNN and training it end-to-end. Tested against principal component analysis and autoencoder methods on the Pneumonia MedMNIST dataset, our approach improves performance and stabilizes results across different quantum circuits. While our method falls short compared to the classical baseline in terms of performance, it uses significantly fewer parameters.

Leyi Tang, Merlin A. Nau, Andreas K. Maier
Data Augmentation for Liver Tumor Segmentation using Structure, Texture, and Contrast

We present a new data augmentation method to address the data scarcity problem in deep learning-based automatic segmentation of liver tumors. A non-rigid registration algorithm was used to generate new anatomical variations of liver tumors from a publicly available benchmark dataset. Additionally, a conditional image-to-image generative adversarial network (GAN) was trained to translate the generated segmentation masks into corresponding textured CT volumes. We used a state-of-the-art segmentation model to investigate the benefit of our data augmentation method. Experiments with varying amounts of synthetic data were conducted, and the results show that our novel augmentation method improves tumor segmentation performance by approximately 3%, outperforming similar data augmentation techniques.

Serouj Khajarian, Oliver Amft, Stefanie Remmele
Abstract: IM-MoCo
Self-supervised MRI Motion Correction using Motion-guided Implicit Neural Representations

Motion artifacts in magnetic resonance imaging (MRI) arise from long acquisition times and can compromise the clinical utility of the images obtained. Traditional motion correction methods often struggle with severe motion, leading to distorted and unreliable results. Deep learning (DL) has improved upon these limitations through generalization, even though it also comes with challenges, such as vanishing structures and hallucinations.

Ziad Al-Haj Hemidi, Christian Weihsbach, Mattias P. Heinrich
Assessing Spatial Bias in Medical Imaging
An Empirical Study of PatchGAN Discriminator Effectiveness

Unpaired image-to-image (I2I) translation plays an important role in denoising, super-resolution, and modality conversion. These methods often rely on adversarial training for matching two distributions, and they suffer from hallucinations induced by biases. In this work, we study a spatial shift bias in the case of domain translation for retinal optical coherence tomography (OCT): in one domain the retinas are at the bottom of the image, while in the other domain the retinas are centered. We show that the conventional PatchGAN discriminator replicates the spatial bias of the target domain, which leads to imperfectly translated images. By explicitly limiting the receptive field of the PatchGAN, we recover the ability of the I2I network to truthfully translate OCTs from one domain to the other. Evidence is provided by improvements of the Fréchet inception distance, increased translational equivariance, and increased Dice scores.
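
How large an image patch a PatchGAN discriminator "sees" (and hence how much spatial bias it can pick up) follows directly from its kernel sizes and strides; a small helper reproduces the well-known 70x70 configuration, and shrinking the field is one way to limit the bias.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers, each given as
    (kernel_size, stride)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * jump
        jump *= s              # stride compounds the step between outputs
    return rf

# standard 70x70 PatchGAN: three stride-2 4x4 convs plus two stride-1 4x4 convs
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # 70
```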

Marc S. Seibel, Timo Kepp, Hristina Uzunova, Jan Ehrhardt, Heinz Handels
Unsupervised Single-source Domain Generalization for Robust Quantification of Lymphatic Perfusion

Infants and young children with Fontan circulation, resulting from lifesaving heart surgery, experience altered hemodynamics that can lead to severe, life-threatening complications, often driven by congestion in the lymphatic system. These complications typically become more apparent as the patients grow older. Early detection and management of these complications require precise assessment of lymphatic perfusion through segmentation of lymphatic fluids. However, manual evaluation is time-intensive and subject to variability, underscoring the need for automated, reliable segmentation in clinical practice. Yet, domain shifts arising from variations in MRI acquisition protocols (e.g., BLADE versus SPACE sequences) and scanner differences present significant challenges for current segmentation models. In this work, we propose the use of causality-inspired single-source domain generalization (CISDG) to develop a robust and accurate segmentation network for lymphatic perfusion patterns across diverse imaging domains. Using a dataset of T2-weighted MR images from 71 patients, we demonstrate that our CISDG model outperforms conventional segmentation networks, including nnU-Net, in both source and target domains. Our results indicate that the proposed method not only generalizes effectively across domain shifts but also holds promise for enhancing diagnostic efficiency and reliability in clinical settings.

Lisa K. Fischer, Johanna P. Müller, Christian Schröder, Anja Hanser, Michela Cuomo, Thomas Day, Cheng Ouyang, Oliver Dewald, Oliver Rompel, Sven Dittrich, Thomas Küstner, Bernhard Kainz
Data-proximal Neural Networks for Limited-view CT

Limited-angle computed tomography (CT) requires solving an inverse problem that is both ill-conditioned and underdetermined. In recent years, learned reconstruction methods have proven highly effective in addressing this challenge. Most of these methods follow a two-step process: first, an initial reconstruction method is applied to the data to generate an auxiliary reconstruction; second, a neural network is used to map the auxiliary reconstruction closer to the ground truth images. However, when applied to unseen data, there are no guarantees that the network’s output will remain consistent with the available measurement data. To address this, we recently introduced a data-proximal network architecture. In this paper, we implement this approach for limited-angle CT and compare its performance with a standard residual network and a null space network.
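
The null space construction mentioned above can be stated compactly: the learned correction is projected so that it cannot change the measurements. A toy sketch with a dense operator and an explicit pseudo-inverse (real CT operators are matrix-free; `network` stands for any learned image-to-image map):

```python
import numpy as np

def nullspace_correction(A, x0, network):
    """Null space network: x = x0 + (I - A^+ A) f(x0).

    A:      forward operator as a dense matrix (m, n)
    x0:     initial reconstruction, flattened to a vector of length n
    network: learned map from R^n to R^n
    Because (I - A^+ A) projects onto the null space of A, the data
    consistency of x0 is preserved exactly."""
    A_pinv = np.linalg.pinv(A)
    f = network(x0)
    return x0 + (f - A_pinv @ (A @ f))
```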

Simon Göppel, Jürgen Frikel, Markus Haltmeier
Towards Robust Zero-shot Chest X-ray Classification
Exploring Data Distribution Bias in Chest X-ray Datasets

In recent years, unsupervised classification models have become increasingly significant, primarily due to the difficulties associated with data labeling and its costs. This trend is also notable in the field of medical imaging, particularly with chest X-rays (CXRs). Among the various unsupervised pretraining methodologies, image-text models like CLIP are highlighted for their considerable enhancements in zero-shot classification. In this study, we perform a detailed analysis of CLIP’s performance using multiple large CXR datasets, investigating how batch size, dataset size, and distribution biases differentially influence outcomes across various findings. In two distinct experiments, we show an average 3% enhancement in macro-average zero-shot AUC scores when the batch size is increased, and a corresponding 8% improvement for pneumothorax from the addition of a second dataset. For pleural effusion, where performance is nearly saturated and previous changes had little effect, we examine adding weak supervisory meta-labels and an image-to-image contrastive loss, achieving an average 1% improvement in zero-shot AUC. Consequently, our work shows that incorporating dataset insights, meta-information, and contrastive learning strategies enhances the robustness and accuracy of CLIP-CXR for specific findings.
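
Zero-shot classification with a CLIP-style model reduces to cosine similarity between image embeddings and one text-prompt embedding per finding; a sketch assuming the encoders have already been run (prompt wording and the temperature value are placeholders):

```python
import torch

def zero_shot_scores(image_features, text_features, temperature=0.07):
    """CLIP-style zero-shot scoring.

    image_features: (n_images, d) embeddings of chest X-rays
    text_features:  (n_findings, d) embeddings of prompts such as
                    'a chest x-ray showing pneumothorax'
    Returns per-image probabilities over the candidate findings."""
    img = image_features / image_features.norm(dim=-1, keepdim=True)
    txt = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = img @ txt.t() / temperature
    return logits.softmax(dim=-1)
```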

Sheethal Bhat, Adarsh Bhandary Panambur, Awais Mansoor, Bogdan Georgescu, Sasa Grbic, Andreas Maier
Precision ICU Resource Planning
A Multimodal Model for Brain Surgery Outcomes

Although advances in brain surgery techniques have led to fewer postoperative complications requiring intensive care unit (ICU) monitoring, the routine transfer of patients to the ICU remains the clinical standard, despite its high cost. Predictive gradient-boosted trees based on clinical data have attempted to optimize ICU admission by identifying key risk factors pre-operatively; however, these approaches overlook valuable imaging data that could enhance prediction accuracy. In this work, we show that multimodal approaches combining clinical data with imaging data outperform the current clinical-data-only baseline, improving from 0.29 to 0.30 F1 when only pre-operative clinical data is used, and from 0.37 to 0.41 F1 for pre- and post-operative data. This study demonstrates that effective ICU admission prediction benefits from multimodal data fusion, especially in contexts of severe class imbalance.

Maximilian Fischer, Florian M. Hauptmann, Robin Peretzke, Paul Naser, Peter Neher, Jan-Oliver Neumann, Klaus Maier-Hein
Systematic Analysis of Input Modalities for Fracture Classification of the Paediatric Wrist

Fractures, particularly in the distal forearm, are among the most common injuries in children and adolescents, with approximately 800 000 cases treated annually in Germany. The AO/OTA system provides a structured fracture type classification, which serves as the foundation for treatment decisions. Although accurately classifying fractures can be challenging, current deep learning models have demonstrated performance comparable to that of experienced radiologists. While most existing approaches rely solely on radiographs, the potential impact of incorporating other additional modalities, such as automatic bone segmentation, fracture location, and radiology reports, remains underexplored. In this work, we systematically analyse the contribution of these three additional information types, finding that combining them with radiographs increases the AUROC from 91.71 to 93.25. Our code is available on https://github.com/multimodallearning/AO_Classification .

Ron Keuth, Maren Balks, Sebastian Tschauner, Ludger Tüshaus, Mattias Heinrich
Intrinsic Correspondence of Classification Ground Truth and Image Content on the Example of Endoscopic Images

The basis on which class labels (“ground truth”) are assigned to images depends heavily on the application scenario, sometimes not even involving visual inspection of the data. Therefore, it can be of interest to evaluate whether distinguishing intrinsic structures exist within the image data. In this study, it is investigated whether images from five small-scale endoscopic datasets, where class labels were assigned based on domain-specific criteria, can be algorithmically clustered into the desired classes. The image classification task is treated as a clustering comparison problem by comparing ground truth labels with clustering results derived from a variety of image representations.
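
The comparison can be sketched with k-means and the adjusted Rand index; the feature file names below are hypothetical, and the study compares a variety of image representations and clustering results rather than this single setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# precomputed embeddings of the endoscopic images (illustrative file names)
features = np.load("image_embeddings.npy")   # (n_images, d)
labels = np.load("class_labels.npy")         # domain-specific ground truth

clusters = KMeans(n_clusters=len(np.unique(labels)), n_init=10).fit_predict(features)
print("adjusted Rand index:", adjusted_rand_score(labels, clusters))
# ARI near 1: clusters recover the desired classes; near 0: no correspondence
```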

Johannes Schuiki, Andreas Uhl
Abstract: Real-world Federated Learning in Radiology
Hurdles to Overcome and Benefits to Gain

Federated learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles.

Markus Bujotzek, Ünal Akünal, Stefan Denner, Peter Neher, Maximilian Zenk, Klaus Maier-Hein, Andreas Bucher, Rickmer Braren
Abstract: Contrastive Learning Approach for Assessment of Speech in Patients with Tongue Cancer using MRI Data

The analysis of human speech with magnetic resonance imaging (MRI) provides essential information on the dynamic processes involved in speech production, allowing unobtrusive monitoring of the complete vocal tract during speech production. In clinical applications, personalized monitoring and increased speed of speech rehabilitation can be achieved through targeted phonological therapy, i.e., by breaking down spoken words into their linguistic units. While MRI provides detailed visualization of the anatomical structures involved in speech production, the acoustic information provided by synchronized audio signals allows a higher temporal resolution for processing speech sounds.

Tomás Arias-Vergara, Paula A. Pérez-Toro, Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Jerry L. Prince, Maria Schuster, Elmar Nöth, Jonghye Woo, Andreas Maier
Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention

Biomedical image segmentation is critical for precise structure delineation and downstream analysis. Traditional methods often struggle with noisy data, while deep learning models such as U-Net have set new benchmarks in segmentation performance. nnU-Net further automates model configuration, making it adaptable across datasets without extensive tuning. However, it requires a substantial amount of annotated data for cross-validation, posing a challenge when only raw images but no labels are available. Large foundation models offer zero-shot generalizability, but may underperform on specific datasets with unique characteristics, limiting their direct use for analysis. This work addresses these bottlenecks by proposing a data-centric AI workflow that leverages active learning and pseudo-labeling to combine the strengths of traditional neural networks and large foundation models while minimizing human intervention. The pipeline starts by generating pseudo-labels from a foundation model, which are then used for nnU-Net’s self-configuration. Subsequently, a representative core-set is selected for minimal manual annotation, enabling effective fine-tuning of the nnU-Net model. This approach significantly reduces the need for manual annotations while maintaining competitive performance, providing an accessible solution for biomedical researchers to apply state-of-the-art AI techniques in their segmentation tasks. The code is available at https://github.com/MMV-Lab/AL_BioMed_img_seg.
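
One common choice for the core-set step is greedy k-center selection on image embeddings; a sketch of that selection rule (an assumption for illustration, not necessarily the rule used in this pipeline):

```python
import numpy as np

def greedy_k_center(features, k, first=0):
    """Greedy k-center core-set selection: repeatedly pick the sample that is
    farthest from everything selected so far, yielding a small but
    representative subset to hand to human annotators.

    features: (n, d) embeddings of the unlabelled images
    returns: indices of the k selected samples"""
    selected = [first]
    dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```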

Shuo Zhao, Yu Zhou, Jianxu Chen
Improving Segmentation by Combining Preoperative CT and Intraoperative CBCT using Synthetic Data

Computer-assisted interventions enable clinicians to perform precise, minimally invasive procedures, often relying on advanced imaging methods. Cone-beam computed tomography (CBCT) can be used to facilitate computerassisted interventions, despite often suffering from artifacts that pose challenges for accurate interpretation. While the degraded image quality can affect image analysis, the availability of high quality, preoperative scans offers potential for improvements. Here we consider a setting where preoperative CT and intraoperative CBCT scans are available, however, the alignment (registration) between the scans is imperfect to simulate a real world scenario. We propose a multimodal learning method that fuses roughly aligned CBCT and CT scans and investigate the effect on segmentation performance. For this experiment we use synthetically generated data containing real CT and synthetic CBCT volumes with corresponding voxel annotations. We show that this fusion setup improves segmentation performance in 18 out of 20 setups investigated.

Maximilian E. Tschuchnig, Philipp Steininger, Michael Gadermayr
Automatic Thyroid Scintigram Segmentation using U-Net

Thyroid scintigraphy is an important tool to determine thyroid function and pathologies. The manual segmentation of these images is a time-intensive and error-prone task required to evaluate the scintigram. In this paper, a 5-layer U-Net is presented that automatically detects and evaluates thyroids in scintigrams by segmenting the left and right lobe and calculating the uptake used for diagnosis. The dataset used to train the network contains 2 734 different thyroid scintigrams collected over the course of four years from a medical office. The network reaches a median Dice score of 0.921 for the thyroid lobes and 0.937 for the complete thyroid, while maintaining a median difference of 3.520 cm2 for the size of the thyroid and 0.029 percentage points for the uptake. Overall, the trained network has the potential to speed up the diagnostic process, while improving the consistency and accuracy of medical diagnoses of thyroid scintigrams.

Moritz A. Mau, Marius Krusen, Floris Ernst
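
For reference, the reported Dice scores follow the standard definition for binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient of two binary masks (e.g. one lobe)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)
```
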
Coronary Tree Segmentation and Labelling in X-ray Angiography Images Using Graph Deep Learning

The significant impact of coronary artery disease on cardiovascular health underscores the need for precise diagnostic methods. The SYNTAX score, based on the locations of stenoses in the coronary tree, provides a quantitative estimation of the disease's complexity, allowing clinicians to settle on a treatment strategy. However, expert experience is required for visual localization of stenoses, which would be facilitated by automated labelling of the coronary segments. In this work, we represent the connected structure of the coronary tree as a graph, extracted from the angiography image, onto which we apply a graph convolutional network to label the coronary segments according to the SYNTAX score scheme, before creating a multi-class segmentation mask. The method was trained and evaluated using the ARCADE dataset. The segment classification of graph nodes reached an F1-score of 53.68. Our approach achieved a mean F1-score of 45.43 for multi-class segmentation of the image after applying the entire pipeline. Although a baseline nnU-Net model achieves better performance (mean F1-score of 62.53), the results are comparable to other existing approaches and show that graph-based approaches are suitable for the complex task of coronary segment labelling.

Robin Gayet, Alaa Abd El Al, Alexander Meyer, Anja Hennemuth, Matthias Ivantsits, Antonia Popp
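
A minimal graph-convolution step of the kind applied to the coronary-tree graph (symmetric normalisation in the style of Kipf and Welling; the node-feature choice, e.g. centerline geometry, is an assumption):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One message-passing step on a dense adjacency matrix: add
    self-loops, normalise symmetrically, average neighbours, project."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.shape[0], device=adj.device)
        d = a_hat.sum(1).clamp(min=1e-8).pow(-0.5)
        a_norm = d[:, None] * a_hat * d[None, :]
        return self.lin(a_norm @ x)

# Stacking a few such layers with a per-node softmax head yields the
# segment labels (SYNTAX scheme) for every node of the coronary tree.
```
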
Abstract: Skeleton Recall Loss
Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures

Accurate segmentation of thin tubular structures, e.g., vessels, nerves, or roads, is crucial in computer vision. Standard segmentation loss functions, such as Dice or cross-entropy, focus on volumetric overlap, often neglecting structural connectivity or topology. This can lead to segmentation errors affecting tasks such as flow calculation, navigation, and structural inspection.

Yannick Kirchhoff, Maximilian R. Rokuss, Saikat Roy, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Philipp Vollmuth, Jens Kleesiek, Fabian Isensee, Klaus Maier-Hein
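
The published loss uses precomputed, tube-dilated skeletons for efficiency; a single-mask sketch of the bare recall term could read:

```python
import torch
from skimage.morphology import skeletonize

def skeleton_recall_loss(pred_soft: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """1 - recall of the ground-truth skeleton under the soft prediction.
    pred_soft in [0, 1]; gt is a binary mask of the same shape (2D, or 3D
    with a recent scikit-image)."""
    skel = torch.from_numpy(
        skeletonize(gt.cpu().numpy().astype(bool)).astype("float32")
    ).to(pred_soft.device)
    recall = (pred_soft * skel).sum() / skel.sum().clamp(min=1.0)
    return 1.0 - recall
```
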
Unified Framework for Foreground and Anonymization Area Segmentation in CT and MRI Data

This study presents an open-source toolkit to address critical challenges in preprocessing data for self-supervised learning (SSL) for 3D medical imaging, focusing on data privacy and computational efficiency. The toolkit comprises two main components: a segmentation network that delineates foreground regions to optimize data sampling and thus reduce training time, and a segmentation network that identifies anonymized regions, preventing erroneous supervision in reconstruction-based SSL methods. Experimental results demonstrate high robustness, with mean Dice scores exceeding 98.5 across all anonymization methods and surpassing 99.5 for foreground segmentation tasks, highlighting the toolkit's efficacy in supporting SSL applications in 3D medical imaging for both CT and MRI images. The weights and code are available at https://github.com/MIC-DKFZ/Foreground-and-Anonymization-Area-Segmentation.

Michal Nohel, Constantin Ulrich, Jonathan Suprijadi, Tassilo Wald, Klaus Maier-Hein
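
One concrete way the foreground masks reduce training time is by restricting SSL patch sampling to foreground voxels; a small sketch (the sampling scheme is ours):

```python
import numpy as np

def sample_foreground_centers(fg_mask: np.ndarray, n: int, seed=None) -> np.ndarray:
    """Draw n patch-centre coordinates only from foreground voxels so
    that training batches skip empty air/background regions."""
    rng = np.random.default_rng(seed)
    coords = np.argwhere(fg_mask)  # (K, 3) voxel indices of foreground
    return coords[rng.choice(len(coords), size=n, replace=True)]
```
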
Automatic Detection of Bronchoscopes on X-ray Images

Bronchoscopy is a crucial step in lung cancer diagnosis after observing a nodule in a lung computed tomography (CT) scan. However, it is often a difficult procedure, particularly for less experienced clinicians. We propose a pipeline for the automatic detection of bronchoscopes in X-ray images to support clinicians in lung airway navigation. Segmentation of bronchoscopes, like other tubular structures, is inherently difficult due to their narrow shape and poor image contrast. Our approach employs two preprocessing strategies, contrast limited adaptive histogram equalization (CLAHE) and negative log (NegLog) transform, along with two loss functions: mean squared error (MSE) and weighted MSE (WMSE), with MSE serving as the baseline. We evaluate on two different datasets; the model using WMSE as the loss function together with CLAHE preprocessing detects the bronchoscope in 99% of the test images, with a mean Dice similarity coefficient (DSC) of 0.68, a centerline Dice of 0.63, and a pixel distance of 3 mm. Additionally, we apply random sample consensus (RANSAC) and skeletonization algorithms for post-processing and polyline retrieval; our findings indicate that skeletonization offers a more robust solution for extracting polylines from tubular structures than RANSAC.

Maryam Parvin, Maximilian Rohleder, Andreas Maier, Holger Kunze
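
Both preprocessing strategies are standard operations; a sketch with OpenCV (the parameter values are ours, not the paper's):

```python
import cv2
import numpy as np

def preprocess_xray(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Two contrast boosts for a faint, thin device: CLAHE and a
    negative-log transform of the normalised intensities."""
    img8 = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(img8)
    neglog = -np.log((img8.astype(np.float32) + 1.0) / 256.0)
    neglog = cv2.normalize(neglog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return clahe, neglog
```
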
Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging

Accurate detection and segmentation of cone cells in the retina are essential for diagnosing and managing retinal diseases. In this study, we used advanced imaging techniques, including confocal and non-confocal split detector images from adaptive optics scanning light ophthalmoscopy (AOSLO), to analyze photoreceptors for improved accuracy. Precise segmentation is crucial for understanding each cone cell’s shape, area, and distribution. It helps to estimate the surrounding areas occupied by rods, which allows the calculation of the density of cone photoreceptors in the area of interest. In turn, density is critical for evaluating overall retinal health and functionality. We explored two U-Net-based segmentation models: StarDist for confocal and Cellpose for non-confocal modalities. Analyzing cone cells in images from two modalities and achieving consistent results demonstrates the study’s reliability and potential for clinical application.

Prajol Shrestha, Mikhail Kulyabin, Aline Sindel, Hilde R. Pedersen, Stuart Gilson, Rigmor Baraas, Andreas Maier
Comprehensive Dataset of Coarse Tumor Annotations for The Cancer Genome Atlas Breast Invasive Carcinoma

Automated tumor segmentation of histologic images is crucial for the development of computer-assisted diagnostic workflows aiming at accurate prognostication. We present a dataset of coarse annotations of over 1,000 high-resolution breast tumor images from The Cancer Genome Atlas breast invasive carcinoma (TCGA-BRCA) repository, each annotated with binary segmentation masks that delineate coarse tumor areas. Additionally, a subset of 20 images includes fine-grained annotations, providing precise delineation of tumor boundaries beyond the broader outlines used in coarse annotations. Initial evaluations using U-Net and DeepLabv3 models show promising segmentation performance. On a held-out, coarsely annotated test set, U-Net achieved an average intersection over union (IoU) score of 0.795, while DeepLabv3 scored 0.783. On the finely annotated subset of this test set, U-Net reached an average IoU of 0.746, with DeepLabv3 slightly outperforming at 0.765. The public availability of this dataset aims to support research in automated tumor analysis, advancing diagnostic workflows and thereby ultimately improving breast cancer management.

Sweta Banerjee, Christof A. Bertram, Jonas Ammeling, Viktoria Weiss, Thomas Conrad, Robert Klopfleisch, Christopher Kaltenecker, Katharina Breininger, Marc Aubreville
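
The IoU values above use the standard definition for binary masks:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Intersection over union of predicted and annotated tumor masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return np.logical_and(pred, gt).sum() / (np.logical_or(pred, gt).sum() + eps)
```
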
Abstract: Neural Cellular Automata Learn and Predict on Low-Power Devices

Various deep-learning architectures have been proposed for medical image analysis in recent times. For example, U-Net-based models, most notably the nnU-Net framework, set the current gold standard in medical image segmentation. More recently, vision transformers have been adopted for medical image processing, improving accuracy even further.

Nick Lemke, Mirko Konstantin, Henry Krumb, John Kalkhof, Anirban Mukhopadhyay
Abstract: Nuclear Pleomorphism in Canine Cutaneous Mast Cell Tumors

In histological preparations of tumors, an important criterion for the evaluation of malignancy is the variation of nuclear size (anisokaryosis), which is traditionally estimated by pathologists. To improve reproducibility, nuclear size measurements (morphometry) can be performed. In this study [1], we developed a segmentation-based nuclear morphometry algorithm and compared its prognostic value with pathologists' estimates (routine method) and manual morphometry (gold standard method).

Andreas Haghofer, Eda Parlak, Alexander Bartel, Taryn A. Donovan, Charles-Antoine Assenmacher, Pompei Bolfa, Michael J. Dark, Andrea Fuchs-Baumgartinger, Andrea Klang, Kathrin Jäger, Robert Klopfleisch, Sophie Merz, Barbara Richter, Yvonne Schulman, Hannah Janout, Jonathan Ganz, Josef Scharinger, Marc Aubreville, Stephan M. Winkler, Matti Kiupel, Christof A. Bertram
Abstract: Leveraging Image Captions for Selective Whole Slide Image Annotation

Obtaining dense annotations for histopathological whole-slide images (WSI), such as segmentation masks or mitotic figure identification, is a labor-intensive process due to the large image size and the extensive manual effort required for annotation. Identifying informative regions in WSIs for annotation while leaving other regions unlabeled can significantly reduce the annotation effort. These selected annotation regions should contain valuable training information that allows proper model training without significantly impacting performance compared to full annotation.

Jingna Qiu, Marc Aubreville, Frauke Wilm, Mathias Öttl, Jonas Utz, Maja Schlereth, Katharina Breininger
Abstract: Würstchen
An Efficient Architecture for Large-scale Text-to-image Diffusion Models

We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language, which significantly reduces the computational requirements to achieve state-of-the-art results.

Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville
Enhancing Zero-shot Learning in Chest X-ray Diagnostics using BioBERT for Textual Representation

Accurate diagnosis of pulmonary diseases from chest X-rays remains a challenging task, especially due to the scarcity of labeled data. In this work, we propose an enhanced zero-shot learning framework that integrates BioBERT—a pre-trained model for biomedical text representation—into a contrastive learning pipeline for medical imaging diagnostics. Our method leverages the synergy between textual radiology reports and image data by combining BioBERT with a CLIP-based architecture. To address the lack of textual data in standard datasets, we generate synthetic radiology reports for each pathology, aiming to mimic the complexity of actual clinical descriptions. Through extensive experiments, we demonstrate significant improvements in disease classification accuracy and generalization, particularly in zero-shot inference scenarios. Our analysis also reveals how similarities in pathology descriptions can lead to misclassifications, emphasizing the importance of nuanced textual representations. This work establishes a zero-shot learning method in medical imaging, highlighting the potential of BioBERT in enhancing automated diagnostic systems for chest X-rays.

Prakhar Bhardwaj, Sheethal Bhat, Andreas Maier
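
At inference time, a CLIP-style zero-shot classifier reduces to a nearest-text-embedding lookup; a sketch assuming image and BioBERT report embeddings were projected into a shared space during contrastive training:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(image_emb: torch.Tensor, text_embs: torch.Tensor) -> int:
    """Pick the pathology whose (synthetic) report embedding is most
    similar to the chest X-ray embedding.

    image_emb: (D,) from the image encoder.
    text_embs: (num_pathologies, D) from the BioBERT text branch.
    """
    sims = F.cosine_similarity(image_emb[None, :], text_embs, dim=-1)
    return int(sims.argmax())
```
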
Abstract: Simulation-informed Learning for Time-resolved Angiographic Contrast Agent Concentration Reconstruction

Three-dimensional digital subtraction angiography (3D-DSA) is a well-established X-ray-based technique for visualizing vascular anatomy. Recently, four-dimensional DSA (4D-DSA) reconstruction algorithms have been developed to enable the visualization of volumetric contrast flow dynamics through time-series of volumes. This reconstruction problem is ill-posed mainly due to vessel overlap in the projection direction and geometric vessel foreshortening, which leads to information loss in the recorded projection images.

Noah Maul, Annette Birkhold, Fabian Wagner, Mareike Thies, Maximilian Rohleder, Philipp Berg, Markus Kowarschik, Andreas Maier
U-Net and GAN for Virtual Contrast in Breast MRI
How Do They Compare to Real Contrast Images?

Multiparametric MRI with gadolinium-based contrast agents demonstrates high sensitivity for detecting lesions in the breast, particularly in women with denser breast tissue. However, its use is limited by increased cost, examination time, and contraindications in certain patients. This study explores a pix2pix generative adversarial network (GAN) to create virtual contrast-enhanced (vCE) MRI from unenhanced T1w, T2w, and DWI sequences and compares it with a U-Net model. The vCE GAN achieved an SSIM of 80.75 and a PSNR of 21.90, while the vCE U-Net scored 87.39 and 24.39, respectively. A multi-reader Turing test showed that 45.89% of vCE GAN images were rated as real, comparable to 45.09% for true CE images. In contrast, 47.25% of vCE U-Net images were rated as real.

Aju George, Hannes Schreiter, Julian Hossbach, Tri-Thien Nguyen, Ihor Horishnyi, Chris Ehring, Shirin Heidarikahkesh, Lorenz A. Kapsner, Frederik B. Laun, Michael Uder, Sabine Ohlmeyer, Sebastian Bickelhaupt, Andrzej Liebert
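
The reported image-quality metrics can be computed with scikit-image; a minimal sketch (the normalisation convention is an assumption):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_vce(real_ce, virtual_ce):
    """SSIM and PSNR of a virtual contrast-enhanced slice against the
    acquired one; both arrays assumed normalised to [0, 1]."""
    ssim = structural_similarity(real_ce, virtual_ce, data_range=1.0)
    psnr = peak_signal_noise_ratio(real_ce, virtual_ce, data_range=1.0)
    return ssim, psnr
```
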
Abstract: Client Security Alone Fails in Federated Learning
2D and 3D Attack Insights

Federated learning (FL) plays a vital role in boosting both accuracy and privacy in the collaborative medical imaging field. The importance of privacy increases with the diverse security standards across nations and corporations, particularly in healthcare and global FL initiatives. Current research on privacy attacks in federated medical imaging focuses on sophisticated gradient inversion attacks that can reconstruct images from FL communications.

Santhosh Parampottupadam, Ralf Floca, Dimitrios Bounias, Benjamin Hamm, Saikat Roy, Sinem Sav, Maximilian Zenk, Klaus Maier-Hein
Abstract: Selective Reduction of CT Data for Self-supervised Pre-training Improves Downstream Classification Performance

Self-supervised pre-training of deep learning models with contrastive learning is a widely used technique in image analysis. As shown in our previous work [1], contrastive pre-training has strong potential on medical images. However, further research is necessary to incorporate the particular characteristics of these images. We hypothesize that the similarity of medical images hinders the success of contrastive learning in the medical imaging domain.

Daniel Wolf, Tristan Payer, Catharina S. Lisson, Christoph G. Lisson, Meinrad Beer, Michael Götz, Timo Ropinski
Abstract: Death by Retrospective Undersampling
Caveats and Solutions for Learning-Based MRI Reconstructions

This study challenges the validity of retrospective undersampling in MRI data science by analysis via an MRI physics simulation. We demonstrate that retrospective undersampling, a method often used to create training data for reconstruction models, can inherently alter MRI signals from their prospective counterparts. This arises from the sequential nature of MRI acquisition, where undersampling post-acquisition effectively alters the MR sequence and the magnetization dynamic in a non-linear fashion.

Junaid R. Rajput, Simon Weinmueller, Jonathan Endres, Peter Dawood, Florian Knoll, Andreas Maier, Moritz Zaiss
Two-stage Approach for Low-dose and Sparse-angle CT Reconstruction using Backprojection

This paper presents a novel two-stage approach for computed tomography (CT) reconstruction, focusing on sparse-angle and low-dose setups to minimize radiation exposure while maintaining high image quality. Two-stage approaches consist of an initial reconstruction followed by a neural network for image refinement. In the initial reconstruction, we apply the backprojection (BP) instead of the traditional filtered backprojection (FBP). This enhances computational speed and offers potential advantages for more complex geometries, such as fan-beam and cone-beam CT. Additionally, BP addresses noise and artifacts in sparse-angle CT by leveraging its inherent noise-smoothing effect, which reduces the streaking artifacts common in FBP reconstructions. For the second stage, we fine-tune the DRUNet proposed by Zhang et al. to further improve reconstruction quality. We call our method BP-DRUNet and evaluate its performance on a synthetically generated ellipsoid dataset alongside the well-established LoDoPaB-CT dataset. Our results show that BP-DRUNet is competitive in terms of PSNR and SSIM with the FBP-based counterpart, FBP-DRUNet, and delivers visually comparable results across all tested angular setups.

Tim Selig, Patrick Bauer, Jürgen Frikel, Thomas März, Martin Storath, Andreas Weinmann
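
The contrast between the two first-stage reconstructions can be reproduced with scikit-image, where omitting the filter turns FBP into plain backprojection:

```python
import numpy as np
from skimage.transform import iradon, radon

phantom = np.zeros((128, 128))
phantom[40:90, 50:80] = 1.0                          # toy object
theta = np.linspace(0.0, 180.0, 30, endpoint=False)  # sparse-angle setup
sino = radon(phantom, theta=theta)

fbp = iradon(sino, theta=theta, filter_name="ramp")  # sharp but streaky
bp = iradon(sino, theta=theta, filter_name=None)     # unfiltered BP: smooth, blurred
# Stage two would hand `bp` to a refinement network such as DRUNet.
```
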
Learned Shift-variant CBCT Reconstruction Weights for Non-continuous Trajectories

The differentiable shift-variant filtered back-projection neural network provides an efficient solution for the reconstruction of cone-beam computed tomography (CBCT) data with arbitrary source trajectories. This data-driven approach enables the automatic estimation of trajectory-specific redundancy weights, which are difficult to calculate in practice. The objective of this study is to learn redundancy weights tailored for non-continuous trajectories. The experimental results using random and random nearest-neighbor reordered (RNNR) trajectories show that the model achieves effective image reconstruction, even with non-continuous trajectories. A quantitative analysis of the reconstruction results demonstrates minimal variation in performance metrics across different random seeds, which highlights the model's robustness and stability. Furthermore, the model's consistency across random and RNNR trajectories demonstrates that a data-driven approach to learning redundancy weights is not strictly dependent on trajectory ordering. This marks a significant improvement over traditional analytic methods. The approach provides a flexible and effective alternative for CBCT imaging in scenarios with non-continuous or even unordered trajectories, opening new possibilities for imaging applications.

Chengze Ye, Linda-Sophie Schneider, Yipen Sun, Mareike Thies, Andreas Maier
Self-supervised 3D Vision Transformer Pre-training for Robust Brain Tumor Classification

Brain tumors pose significant challenges in neurology, making precise classification crucial for prognosis and treatment planning. This work investigates the effectiveness of a self-supervised learning approach, masked autoencoding (MAE), to pre-train a vision transformer (ViT) model for brain tumor classification. Our method uses non-domain-specific data, leveraging the ADNI and OASIS-3 MRI datasets, which primarily focus on degenerative diseases, for pre-training. The model is subsequently fine-tuned and evaluated on the BraTS glioma and meningioma datasets, representing a novel use of these datasets for tumor classification. The pre-trained MAE ViT model achieves an average F1 score of 0.91 in a 5-fold cross-validation setting, outperforming the nnU-Net encoder trained from scratch, particularly under limited data conditions. These findings highlight the potential of self-supervised MAE in enhancing brain tumor classification accuracy, even with restricted labeled data.

Danilo Weber Nunes, David Rauber, Christoph Palm
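
The core of MAE pre-training is random patch masking, with the decoder reconstructing only the dropped tokens; a sketch of the masking step:

```python
import torch

def random_patch_mask(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of ViT patch tokens per sample.
    tokens: (B, N, D); returns the visible tokens and their indices."""
    b, n, d = tokens.shape
    keep = int(n * (1.0 - mask_ratio))
    keep_idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
    visible = torch.gather(tokens, 1, keep_idx[..., None].expand(-1, -1, d))
    return visible, keep_idx  # the loss is computed on the masked complement
```
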
Abstract: Evolutionary Normalization Optimization Boosts Semantic Segmentation Network Performance

Normalization methods play a critical role in the performance and generalization of deep neural networks for semantic segmentation, particularly in medical imaging. While batch normalization (BN) is widely used, its limitations, such as reduced performance with small batch sizes, have led to alternative normalization methods, such as instance normalization (IN), layer normalization (LN), group normalization (GN), and filter response normalization (FRN). However, most segmentation networks apply a single normalization method uniformly, without optimizing layer- or task-specific configurations.

Luisa Neubig, Andreas M. Kist
Integration of Key-value Attention into Pure and Hybrid Transformers for Semantic Segmentation

While CNNs were long considered state of the art for image processing, the introduction of Transformer architectures has challenged this position. While achieving excellent results in image classification and segmentation, Transformers are inherently reliant on large training datasets and remain computationally expensive. A newly introduced Transformer derivative named KV Transformer shows promising results in synthetic, NLP, and image classification tasks, while reducing complexity and memory usage. This is especially conducive to use cases where local inference is required, such as medical screening applications. We endeavoured to further evaluate the merit of KV Transformers on semantic segmentation tasks, specifically in the domain of medical imaging. By directly comparing traditional and KV variants of the same base architectures, we provide further insight into the practical tradeoffs of reduced model complexity. We observe a reduction in parameter count and multiply-accumulate operations of between 5% and 10%, while most of the KV variant models achieve performance similar to their QKV counterparts.

DeShin Hwa, Tobias Holmes, Klaus Drechsler
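
The abstract does not define the KV variant precisely; one plausible reading consistent with the reported parameter savings ties the query projection to the key projection, dropping one of the three QKV weight matrices (our assumption, not the authors' definition):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVAttention(nn.Module):
    """Self-attention with Q tied to K: only key and value projections."""

    def __init__(self, dim: int):
        super().__init__()
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = self.k(x)  # reused as both query and key
        attn = F.softmax(k @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ self.v(x)
```
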
Category-fragment Segmentation Framework for Pelvic Fracture Segmentation in X-ray Images

Pelvic fractures, often caused by high-impact trauma, frequently require surgical intervention. Imaging techniques such as CT and 2D X-ray imaging are used to transfer the surgical plan to the operating room through image registration, enabling quick intraoperative adjustments. Specifically, segmenting pelvic fractures from 2D X-ray imaging can assist in accurately positioning bone fragments and guiding the placement of screws or metal plates. In this study, we propose a novel deep learning-based category and fragment segmentation (CFS) framework for the automatic segmentation of pelvic bone fragments in 2D X-ray images. This framework consists of three consecutive steps. First, the category segmentation network extracts the left and right ilia and sacrum from X-ray images. Then, the fragment segmentation network further isolates the fragments in each masked bone region. Finally, the initially predicted bone fragments are reordered and refined through post-processing operations to form the final prediction. In the best-performing model, segmentation of pelvic fracture fragments achieves an intersection over union (IoU) of 0.91 for anatomical structures and 0.78 for fracture segmentation. Experimental results demonstrate that our CFS framework is effective in segmenting pelvic categories and fragments. For further research and development, the source code is publicly available at https://github.com/DaE-plz/CFSSegNet.

Daiqi Liu, Fuxin Fan, Andreas Maier
DualPath-FFNet
Dynamic Fusion of Pre-trained Features with Sigmoid-gated Attention for Improved Medical Image Segmentation

In medical imaging, accurate image segmentation is essential for precise diagnosis and treatment planning. However, challenges such as variations in image quality and complex anatomical structures limit traditional segmentation methods, which require extensive, source-specific feature extraction. In this study, we introduce DualPath-FFNet, a novel deep-learning architecture for medical image segmentation. The architecture utilizes DenseNet201 and EfficientNetV2 pre-trained networks as encoders to extract diverse feature representations from input images. Two decoders generate intermediate outputs, refined with the original input image and further enhanced by skip connections. A sigmoid-gated attention (SGA) module dynamically fuses features, prioritizing task-specific information. A third encoder aggregates information from previous stages, integrating skip connections and utilizing the SGA module. This dynamic fusion, combined with multi-view encoding and skip connections, promotes robust and informative representation learning. The model was evaluated with the Dice coefficient and intersection over union metrics on five different medical image datasets against three other relevant models. DualPath-FFNet achieves an average Dice score of 0.914 across multiple datasets, demonstrating robust performance and generalizability.

Sadat H. Chowdhury, Mohamed Y. Jabarulla, Hinrich B. Winther, Steffen Oeltze-Jafra
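
The internals of the SGA module are not spelled out above; a plausible minimal form of sigmoid-gated fusion of two encoder branches (our reading, not the published block):

```python
import torch
import torch.nn as nn

class SigmoidGatedFusion(nn.Module):
    """Per-location gate deciding how much to trust each encoder branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([feat_a, feat_b], dim=1)))
        return g * feat_a + (1.0 - g) * feat_b
```
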
Abstract: Longitudinal Segmentation of MS Lesions via Temporal Difference Weighting

Accurate segmentation of multiple sclerosis (MS) lesions in longitudinal MRI scans is crucial for monitoring disease progression and treatment efficacy. Although changes across time are taken into account when assessing images in clinical practice, most existing deep learning methods treat scans from different timepoints separately. Among studies utilizing longitudinal images, a simple channel-wise concatenation is the primary albeit suboptimal method employed to integrate timepoints. We introduce a novel approach that explicitly incorporates temporal differences between baseline and follow-up scans through a unique architectural inductive bias called difference weighting block [1].

Maximilian R. Rokuss, Yannick Kirchhoff, Saikat Roy, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Stefan Denner, Fabian Isensee, Philipp Vollmuth, Jens Kleesiek, Klaus Maier-Hein
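
The published difference-weighting block differs in its details [1]; the gist, re-weighting follow-up features by how much they changed relative to baseline, might be sketched as:

```python
import torch
import torch.nn as nn

class DifferenceWeighting(nn.Module):
    """Emphasise feature locations that changed between timepoints,
    where new or growing MS lesions are expected."""

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, f_base: torch.Tensor, f_follow: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.proj(torch.abs(f_follow - f_base)))
        return f_follow * (1.0 + w)
```
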
Internal Organ Localization using Depth Images
A Framework for Automated MRI Patient Positioning

Automated patient positioning is a crucial step in streamlining MRI workflows and enhancing patient throughput. RGB-D camera-based systems offer a promising approach to automate this process by leveraging depth information to estimate internal organ positions. This paper investigates the feasibility of a learning-based framework to infer approximate internal organ positions from the body surface. Our approach utilizes a large-scale dataset of MRI scans to train a deep learning model capable of accurately predicting organ positions and shapes from depth images alone. We demonstrate the effectiveness of our method in localizing multiple internal organs, including bones and soft tissues. Our findings suggest that RGB-D camera-based systems integrated into MRI workflows have the potential to streamline scanning procedures and improve patient experience by enabling accurate and automated patient positioning.

Eytan Kats, Kai Geißler, Jochen G. Hirsch, Stefan Heldman, Mattias P. Heinrich

Open Access

Leveraging Multiple Total Body Segmentators and Anatomy-informed Post-processing for Segmenting Bones in Lung CTs

Accurate segmentation of structures in CT is essential for clinical tasks such as tumour staging, radiotherapy planning, fracture assessment, and monitoring of disease progression. Current deep learning-based automated "segmentators" face challenges due to variability in scanner parameters, anatomical regions, and training data, which impact performance consistency across diverse datasets. We evaluated various total body segmentators on publicly available lung CT data excluded from their training sets. We found that these segmentators exhibit label mixing within individual ribs and vertebrae, often requiring anatomy-informed post-processing steps to improve accuracy. Combining multiple models and incorporating anatomical information enhances segmentation outcomes compared to using single models, highlighting the complementary strengths of different segmentation approaches and task-dependent a priori knowledge.

Lukas Förner, Kartikay Tehlan, Constantin Bauer, Josua A. Decker, Thomas Wendler
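
The paper's exact ensembling and anatomy-informed rules are not reproduced here; the simplest way to combine several total body segmentators is a voxel-wise majority vote over their label maps:

```python
import numpy as np

def majority_vote(label_maps: list[np.ndarray]) -> np.ndarray:
    """Voxel-wise majority vote over label maps from several models
    (all maps must share one label convention and one shape)."""
    stacked = np.stack(label_maps)  # (n_models, D, H, W)
    n_labels = int(stacked.max()) + 1
    votes = np.zeros((n_labels, *stacked.shape[1:]), dtype=np.int32)
    for lbl in range(n_labels):
        votes[lbl] = (stacked == lbl).sum(axis=0)
    return votes.argmax(axis=0).astype(stacked.dtype)
```
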
Evaluation of TransUNet for the Segmentation of Retinal Structures in OCT-A

Optical coherence tomography angiography (OCT-A) is a novel, noninvasive technology for visualizing retinal structures. Segmentation of these structures can indicate ophthalmic diseases such as diabetic retinopathy or glaucoma to aid diagnosis. However, limited data availability and artifacts make this task challenging. Adapted from other domains, transformer-based models yield promising results in medical image segmentation, challenging U-Net as the de facto standard method. This work evaluates TransUNet, a hybrid transformer- and convolution-based architecture, against U-Net in the task of segmenting retinal structures in OCT-A in a multifaceted comparison. Robustness under a reduced training volume and the effect of varying degrees of data augmentation, including noise simulations specific to OCT-A, are included as further aspects of comparison. Although no significant advantage in overall segmentation performance was found, TransUNet outperformed U-Net in the data-reduced setting and showed a better aptitude to capitalize on augmented data.

Leonie Schüßler, Anna-Sophia Hertlein, Alexander K. Schuster, Stefan Wesarg
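
The OCT-A-specific noise simulations are not specified in the abstract; as a stand-in illustration, multiplicative speckle-like noise is a common augmentation for OCT-type data (parameters are ours):

```python
import numpy as np

def speckle_augment(img: np.ndarray, sigma: float = 0.1, seed=None) -> np.ndarray:
    """Multiplicative speckle-like noise on an image in [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = img * (1.0 + sigma * rng.standard_normal(img.shape))
    return np.clip(noisy, 0.0, 1.0)
```
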
Camera-based Guide Wire Tracking for a Hybrid Neurovascular Intervention Training System

Currently, both fully virtual training simulators and 3D printed vessel phantoms are available for training neuroradiological interventions. With this work, we aim to combine both approaches for the first time via camera-based guide wire tracking. To achieve this, the camera has to be calibrated to the mask of the 3D printed vessel phantom. Then, the guide wire is segmented in the camera image, its 3D position is estimated, and it is projected into a virtual projection of a whole-body CT scan. A resulting example of our approach is shown, and necessary future work is discussed.

Sonja Wichelmann, Roman Leonov, Claudia Rittmüller, Torben Pätz
Abstract: Learned Image Compression for HE-stained Histopathological Images via Stain Deconvolution

Processing histopathological whole slide images (WSI) leads to massive storage requirements for clinics worldwide. Even after lossy image compression during image acquisition, additional lossy compression is frequently possible without substantially affecting the performance of deep learning-based (DL) downstream tasks. In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression and we propose stain quantized latent compression (SQLC), a novel DL based histopathology data compression approach.

Maximilian Fischer, Peter Neher, Tassilo Wald, Constantin Ulrich, Peter Schüffler, Shuhan Xiao, Silvia Dias Almeida, Alexander Muckenhuber, Rickmer Braren, Michael Götz, Jens Kleesiek, Marco Nolden, Klaus Maier-Hein
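
Stain deconvolution itself is a standard operation; with scikit-image, an H&E image can be separated into per-stain optical-density channels before the learned compression step:

```python
from skimage.color import hed2rgb, rgb2hed

def to_stain_space(rgb_image):
    """Haematoxylin/eosin/residual channels of an H&E-stained tile."""
    return rgb2hed(rgb_image)  # (H, W, 3) optical-density representation

def from_stain_space(hed_image):
    return hed2rgb(hed_image)  # invert the deconvolution
```
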
Backmatter
Metadata
Title
Bildverarbeitung für die Medizin 2025
edited by
Christoph Palm
Katharina Breininger
Thomas Deserno
Heinz Handels
Andreas Maier
Klaus H. Maier-Hein
Thomas M. Tolxdorff
Copyright year
2025
Electronic ISBN
978-3-658-47422-5
Print ISBN
978-3-658-47421-8
DOI
https://doi.org/10.1007/978-3-658-47422-5