2025 | Book

Simulation and Synthesis in Medical Imaging

9th International Workshop, SASHIMI 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024, Proceedings

Editors: Virginia Fernandez, Jelmer M. Wolterink, David Wiesner, Samuel Remedios, Lianrui Zuo, Adrià Casamitjana

Publisher: Springer Nature Switzerland

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 9th International Workshop on Simulation and Synthesis in Medical Imaging, SASHIMI 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024, in Marrakesh, Morocco, in October 2024.

The 19 papers included in this book were carefully reviewed and selected from 32 submissions. They focus on recent developments in methods for image-to-image translation, image synthesis, biophysical modelling, super-resolution, and image segmentation and classification.

Table of Contents

Frontmatter
Synthetic Augmentation for Anatomical Landmark Localization Using DDPMs
Abstract
Deep learning techniques for anatomical landmark localization (ALL) have shown great success, but their reliance on large annotated datasets remains a problem due to the tedious and costly nature of medical data acquisition and annotation. While traditional data augmentation, variational autoencoders (VAEs), and generative adversarial networks (GANs) have already been used to synthetically expand medical datasets, diffusion-based generative models have recently started to gain attention for their ability to generate high-quality synthetic images. In this study, we explore the use of denoising diffusion probabilistic models (DDPMs) for generating medical images and their corresponding heatmaps of landmarks to enhance the training of a supervised deep learning model for ALL. Our novel approach involves a DDPM with a 2-channel input, incorporating both the original medical image and its heatmap of annotated landmarks. We also propose a novel way to assess the quality of the generated images using a Markov Random Field (MRF) model for landmark matching and a Statistical Shape Model (SSM) to check landmark plausibility, before we evaluate the DDPM-augmented dataset in the context of an ALL task involving hand X-rays.
Arnela Hadzic, Lea Bogensperger, Simon Johannes Joham, Martin Urschler
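For readers unfamiliar with DDPMs, the sketch below illustrates the forward (noising) step of a standard DDPM applied to a 2-channel tensor that stacks an image with its landmark heatmap, as described in this abstract. It is a minimal PyTorch illustration under assumed shapes and a generic linear noise schedule, not the authors' implementation.

    import torch

    # Linear beta schedule (a common default; assumed here, not taken from the paper).
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        """Forward diffusion q(x_t | x_0) for a batch of 2-channel samples."""
        a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)          # (B, 1, 1, 1)
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Hypothetical batch: channel 0 = X-ray image, channel 1 = landmark heatmap.
    image = torch.rand(4, 1, 256, 256)
    heatmap = torch.rand(4, 1, 256, 256)
    x0 = torch.cat([image, heatmap], dim=1)                  # (B, 2, H, W)

    t = torch.randint(0, T, (4,))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)                             # noisy input for the denoising network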
AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation
Abstract
Deep learning has shown remarkable performance in medical image segmentation. However, despite its promise, deep learning faces many challenges in practice due to its inability to effectively transition to unseen domains, caused by the inherent data distribution shift and the lack of manual annotations to guide domain adaptation. To tackle this problem, we present an unsupervised domain adaptation (UDA) method named AdaptDiff that enables a retinal vessel segmentation network trained on fundus photography (FP) to produce satisfactory results on unseen modalities (e.g., OCT-A) without any manual labels. For all our target domains, we first adopt a segmentation model trained on the source domain to create pseudo-labels. With these pseudo-labels, we train a conditional semantic diffusion probabilistic model to represent the target domain distribution. Experimentally, we show that even with low-quality pseudo-labels, the diffusion model can still capture the conditional semantic information. Subsequently, we sample in the target domain with binary vessel masks from the source domain to obtain paired data, i.e., target-domain synthetic images conditioned on the binary vessel maps. Finally, we fine-tune the pre-trained segmentation network using the synthetic paired data to mitigate the domain gap. We assess the effectiveness of AdaptDiff on seven publicly available datasets across three distinct modalities. Our results demonstrate a significant improvement in segmentation performance across all unseen datasets. Our code is publicly available at https://github.com/DeweiHu/AdaptDiff.
Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz
Adapted nnU-Net: A Robust Baseline for Cross-Modality Synthesis and Medical Image Inpainting
Abstract
In medical image synthesis, the development of robust and reliable baseline methods is crucial due to the complexity and variability of existing techniques. Despite advances with architectures such as GANs and diffusion models, a clear state of the art has yet to be established. This paper introduces a versatile adaptation of the nnU-Net framework as a robust baseline for both cross-modality synthesis and image inpainting tasks. Known for its superior performance in segmentation challenges, nnU-Net offers automatic configuration and parameter optimization capabilities, which we adapt for these new applications. We evaluate this method on two use cases: pelvis MR to CT translation using the SynthRAD2023 challenge dataset and local synthesis using the BraTS 2023 inpainting challenge dataset. Standard synthesis metrics (MAE, MSE, SSIM, and PSNR) demonstrate that our adapted nnU-Net outperforms GAN-based methods like pix2pixHD and ranks among the best methods for both challenges. We recommend this adapted nnU-Net as a new benchmark for medical image translation and inpainting tasks, and provide our implementations for public use on GitHub.
Arthur Longuefosse, Edern Le Bot, Baudouin Denis De Senneville, Rémi Giraud, Boris Mansencal, Pierrick Coupé, Pascal Desbarats, Fabien Baldacci
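The standard synthesis metrics named in this abstract (MAE, MSE, SSIM, PSNR) can be computed with NumPy and scikit-image as in the minimal sketch below; the array names are placeholders and this is not the challenge evaluation code.

    import numpy as np
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio

    def synthesis_metrics(reference: np.ndarray, prediction: np.ndarray) -> dict:
        """Compute common image-synthesis metrics on a pair of grayscale images."""
        data_range = reference.max() - reference.min()
        return {
            "MAE": float(np.mean(np.abs(reference - prediction))),
            "MSE": float(np.mean((reference - prediction) ** 2)),
            "SSIM": float(structural_similarity(reference, prediction, data_range=data_range)),
            "PSNR": float(peak_signal_noise_ratio(reference, prediction, data_range=data_range)),
        }

    # Placeholder arrays standing in for a ground-truth CT slice and a synthesized one.
    gt = np.random.rand(64, 64).astype(np.float32)
    pred = gt + 0.05 * np.random.randn(64, 64).astype(np.float32)
    print(synthesis_metrics(gt, pred))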
Beyond MR Image Harmonization: Resolution Matters Too
Abstract
Magnetic resonance (MR) imaging is commonly used in the clinical setting to non-invasively monitor the body. There exists a large variability in MR imaging due to differences in scanner hardware, software, and protocol design. Ideally, a processing algorithm should perform robustly to this variability, but that is not always the case in reality. This introduces a need for image harmonization to overcome issues of domain shift when performing downstream analysis such as segmentation. Most image harmonization models focus on acquisition parameters such as inversion time or repetition time, but they ignore an important aspect of MR imaging: resolution. In this paper, we evaluate the impact of image resolution on harmonization using a pretrained harmonization algorithm. We simulate 2D acquisitions of various slice thicknesses and gaps from 3D-acquired, 1 mm³ isotropic MR images and demonstrate how the performance of a state-of-the-art image harmonization algorithm varies as resolution changes. We discuss the ideal scenarios for image resolution, including acquisition orientation when 3D imaging is not available, which is common for many clinical scanners. Our results show that harmonization on low-resolution images does not account for acquisition resolution and orientation variations. Super-resolution can be used to alleviate resolution variations, but it is not always applied. Our methodology can generalize to help evaluate the impact of image acquisition resolution for multiple tasks. Determining the limits of a pretrained algorithm is important when considering preprocessing steps and trust in the results.
Savannah P. Hays, Samuel W. Remedios, Lianrui Zuo, Ellen M. Mowry, Scott D. Newsome, Peter A. Calabresi, Aaron Carass, Blake E. Dewey, Jerry L. Prince
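One common way to simulate thick-slice 2D acquisitions from an isotropic volume, as described in this abstract, is to average over an assumed rectangular slice profile and then subsample along the slice axis with a gap. The sketch below illustrates that idea; the profile, axis convention, and parameter names are assumptions, not the authors' exact simulation.

    import numpy as np
    from scipy.ndimage import uniform_filter1d

    def simulate_thick_slices(volume: np.ndarray, thickness_mm: float,
                              spacing_mm: float, axis: int = 2,
                              voxel_size_mm: float = 1.0) -> np.ndarray:
        """Approximate a 2D acquisition by averaging a rectangular slice profile of
        `thickness_mm` and sampling slice centres every `spacing_mm`
        (spacing > thickness leaves a slice gap)."""
        thickness_vox = max(1, int(round(thickness_mm / voxel_size_mm)))
        spacing_vox = max(1, int(round(spacing_mm / voxel_size_mm)))
        blurred = uniform_filter1d(volume, size=thickness_vox, axis=axis)
        index = [slice(None)] * volume.ndim
        index[axis] = slice(thickness_vox // 2, None, spacing_vox)
        return blurred[tuple(index)]

    iso = np.random.rand(180, 180, 180).astype(np.float32)   # stand-in for a 1 mm isotropic volume
    lowres = simulate_thick_slices(iso, thickness_mm=4.0, spacing_mm=5.0)  # 4 mm slices, 1 mm gap
    print(lowres.shape)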
Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data
Abstract
Accurate depth perception is crucial for patient outcomes in endoscopic surgery, yet it is compromised by image distortions common in surgical settings. To tackle this issue, our study presents a benchmark for assessing the robustness of endoscopic depth estimation models. We have compiled a comprehensive dataset that reflects real-world conditions, incorporating a range of synthetically induced corruptions at varying severity levels. To further this effort, we introduce the Depth Estimation Robustness Score (DERS), a novel metric that combines measures of error, accuracy, and robustness to meet the multifaceted requirements of surgical applications. This metric acts as a foundational element for evaluating performance, establishing a new paradigm for the comparative analysis of depth estimation technologies. Additionally, we set forth a benchmark focused on robustness for the evaluation of depth estimation in endoscopic surgery, with the aim of driving progress in model refinement. A thorough analysis of two monocular depth estimation models using our framework reveals crucial information about their reliability under adverse conditions. Our results emphasize the essential need for algorithms that can tolerate data corruption, thereby advancing discussions on improving model robustness. The impact of this research transcends theoretical frameworks, providing concrete gains in surgical precision and patient safety. This study establishes a benchmark for the robustness of depth estimation and serves as a foundation for developing more resilient surgical support technologies. Code is available at https://github.com/lofrienger/EndoDepthBenchmark.
An Wang, Haochen Yin, Beilei Cui, Mengya Xu, Hongliang Ren
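Severity-parameterized synthetic corruptions of the kind this benchmark applies can be illustrated as below; the two corruptions and their severity levels are illustrative assumptions, not the benchmark's corruption suite.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_noise(img: np.ndarray, severity: int) -> np.ndarray:
        """Additive Gaussian noise; higher severity widens the noise distribution."""
        sigma = [0.02, 0.04, 0.08, 0.12, 0.18][severity - 1]
        return np.clip(img + np.random.randn(*img.shape) * sigma, 0.0, 1.0)

    def defocus_blur(img: np.ndarray, severity: int) -> np.ndarray:
        """Isotropic Gaussian blur as a simple stand-in for defocus."""
        sigma = [1, 2, 3, 4, 6][severity - 1]
        return gaussian_filter(img, sigma=(sigma, sigma, 0))   # do not blur across channels

    frame = np.random.rand(256, 256, 3).astype(np.float32)     # placeholder endoscopic frame in [0, 1]
    corrupted = [defocus_blur(gaussian_noise(frame, s), s) for s in range(1, 6)]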
A Dual-Task Mutual Learning Framework for Predicting Post-thrombectomy Cerebral Hemorrhage
Abstract
Ischemic stroke is a severe condition caused by the blockage of brain blood vessels, and can lead to the death of brain tissue due to oxygen deprivation. Thrombectomy has become a common treatment choice for ischemic stroke due to its immediate effectiveness. However, it carries the risk of postoperative cerebral hemorrhage. Clinically, multiple CT scans within 0–72 h post-surgery are used to monitor for hemorrhage. However, this approach exposes patients to additional radiation dose and may delay the detection of cerebral hemorrhage. To address this dilemma, we propose a novel framework for predicting postoperative cerebral hemorrhage using only the patient’s initial CT scan. Specifically, we introduce a dual-task mutual learning framework that takes the initial CT scan as input and simultaneously estimates both the follow-up CT scan and prognostic label to predict the occurrence of postoperative cerebral hemorrhage. Our proposed framework incorporates two attention mechanisms, i.e., self-attention and interactive attention. The self-attention mechanism allows the model to focus more on high-density areas in the image, which are critical for diagnosis (i.e., potential hemorrhage areas). The interactive attention mechanism further models the dependencies between the interrelated generation and classification tasks, enabling both tasks to perform better than when conducted individually. Validated on clinical data, our method generates follow-up CT scans better than state-of-the-art methods and achieves an accuracy of 86.37% in predicting follow-up prognostic labels. Our work thus contributes to the timely screening of post-thrombectomy cerebral hemorrhage, and could significantly reform the clinical process of thrombectomy and other similar operations related to stroke.
Caiwen Jiang, Tianyu Wang, Xiaodan Xing, Mianxin Liu, Guang Yang, Zhongxiang Ding, Dinggang Shen
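An interactive-attention block coupling two task branches can be sketched as cross-attention between their feature sequences, as below in PyTorch; the module layout, dimensions, and names are assumptions for illustration and do not reproduce the paper's architecture.

    import torch
    import torch.nn as nn

    class InteractiveAttention(nn.Module):
        """Cross-attention: queries from one task branch attend to the other branch."""
        def __init__(self, dim: int = 256, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, gen_feats: torch.Tensor, cls_feats: torch.Tensor) -> torch.Tensor:
            # gen_feats, cls_feats: (B, N, dim) token sequences from the two branches
            attended, _ = self.attn(query=gen_feats, key=cls_feats, value=cls_feats)
            return self.norm(gen_feats + attended)          # residual connection

    block = InteractiveAttention()
    fused = block(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
    print(fused.shape)   # torch.Size([2, 196, 256])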
TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification
Leveraging Epistemic Uncertainty to Improve Model Performance
Abstract
The usage of medical image data for the training of large-scale machine learning approaches is particularly challenging due to its scarce availability and the costly generation of data annotations, typically requiring the engagement of medical professionals. The rapid development of generative models enables us to tackle this problem by generating large amounts of realistic synthetic data for the training process. However, randomly choosing synthetic samples might not be an optimal strategy.
In this work, we investigate the targeted generation of synthetic training data in order to improve the accuracy and robustness of image classification. To this end, our approach guides the generative model to synthesize data with high epistemic uncertainty, since large measures of epistemic uncertainty indicate underrepresented data points in the training set. During image generation, we feed images reconstructed by an autoencoder into the classifier and compute the mutual information over the class-probability distribution as a measure of uncertainty. We alter the feature space of the autoencoder through an optimization process with the objective of maximizing the classifier uncertainty on the decoded image. By training on such data, we improve performance and robustness against test-time data augmentations and adversarial attacks on several classification tasks.
Joshua Niemeijer, Jan Ehrhardt, Hristina Uzunova, Heinz Handels
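The core idea of steering generation toward high classifier uncertainty can be sketched as gradient ascent on a latent code, using predictive entropy as a simple stand-in for the mutual-information measure used in the paper. The decoder, classifier, and all shapes below are hypothetical placeholders, not the authors' models.

    import torch
    import torch.nn.functional as F

    def maximize_uncertainty(z: torch.Tensor, decoder, classifier,
                             steps: int = 50, lr: float = 0.05) -> torch.Tensor:
        """Gradient-ascent sketch: nudge a latent code so the decoded image makes the
        classifier maximally uncertain (predictive entropy as a simple proxy)."""
        z = z.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            probs = F.softmax(classifier(decoder(z)), dim=1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
            loss = -entropy                 # maximize entropy = minimize its negative
            opt.zero_grad()
            loss.backward()
            opt.step()
        return z.detach()

    # Toy stand-ins (assumptions): a 16-dim latent decoded to a 1x28x28 image, 10 classes.
    decoder = torch.nn.Sequential(torch.nn.Linear(16, 28 * 28), torch.nn.Unflatten(1, (1, 28, 28)))
    classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    z_star = maximize_uncertainty(torch.randn(8, 16), decoder, classifier)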
Beyond Intensity Transforms: Medical Image Synthesis Under Large Deformation
Abstract
Deep generative models have achieved remarkable performance in various medical image-to-image translation tasks, including image reconstruction, denoising, and multimodal synthesis. However, these models typically learn to change the intensity of an image while preserving structure. In many medical image-to-image translation scenarios, there is often a significant deformation between the source and target images, such as the deformation of the lungs during breathing, adding an additional layer of complexity. Conventional generative models are not suited to capture spatial deformation. To address this, we propose a framework for medical image synthesis under large deformation, which consists of two stages: the first stage predicts a dense displacement field to deform the moving image into the fixed image space, and the second stage predicts the intensity changes. We demonstrate our method on inspiratory-expiratory chest computed tomography images from a large cohort of nearly 500 subjects with varying degrees of disease severity. Ablation studies were conducted to understand the contribution of various model components. Our method achieved reliable alignment between the source and target images with a Dice similarity coefficient of 0.90 and a high multiscale structural similarity of 0.863 within the testing cohort.
Muhammad F. A. Chaudhary, Joseph M. Reinhardt, Sarah E. Gerard
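The first stage's dense displacement field can be applied to a moving image with a spatial-transformer-style warp. A minimal 2D PyTorch sketch is shown below; the pixel-unit displacement convention and the shapes are assumptions, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def warp_2d(moving: torch.Tensor, displacement: torch.Tensor) -> torch.Tensor:
        """Warp a (B, C, H, W) image with a dense displacement field given in pixels,
        shaped (B, 2, H, W) with channel order (dx, dy)."""
        b, _, h, w = moving.shape
        # Identity sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)   # (B, H, W, 2)
        # Convert pixel displacements to the normalized coordinate range.
        disp = torch.stack((displacement[:, 0] * 2 / (w - 1),
                            displacement[:, 1] * 2 / (h - 1)), dim=-1)            # (B, H, W, 2)
        return F.grid_sample(moving, grid + disp, mode="bilinear",
                             padding_mode="border", align_corners=True)

    moving = torch.rand(1, 1, 128, 128)
    disp = torch.zeros(1, 2, 128, 128)      # zero field: output equals the input
    warped = warp_2d(moving, disp)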
Sim2Real in Endoscopy Segmentation with a Novel Structure Aware Image Translation
Abstract
Automatic segmentation of anatomical landmarks in endoscopic images can provide assistance to doctors and surgeons for diagnosis, treatment, or medical training. However, obtaining the annotations required to train commonly used supervised learning methods is a tedious and difficult task, in particular for real images. While ground truth annotations are easier to obtain for synthetic data, models trained on such data often do not generalize well to real data. Generative approaches can add realistic texture to synthetic images, but have difficulty maintaining the structure of the original scene. The main contribution of this work is a novel image translation model that adds realistic texture to simulated endoscopic images while keeping the key scene layout information. Our approach produces realistic images in different endoscopy scenarios. We demonstrate that these images can be used to train a model for a challenging end task without any real labeled data. In particular, we demonstrate our approach for the task of fold segmentation in colonoscopy images. Folds are key anatomical landmarks that can occlude parts of the colon mucosa and possible polyps. Our approach maintains the shape and location of the original folds after image-style translation better than existing methods. We run experiments both on a novel simulated dataset for fold segmentation and on real data from the EndoMapper (EM) dataset [1]. All our newly generated data and new EM metadata are released to facilitate further research, as no public benchmark is currently available for the task of fold segmentation.
Clara Tomasini, Luis Riazuelo, Ana C. Murillo
Fireflies: Photorealistic Simulation and Optimization of Structured Light Endoscopy
Abstract
There exist various approaches for the 3D reconstruction of dynamic scenes. In medicine, particularly in endoscopy, single-shot structured light systems are frequently explored, as they allow for the reconstruction of dynamic, feature-less surfaces. The design and manufacturing of structured light endoscopes, however, imply high initial costs that significantly hinder the availability and development of these systems. To streamline this process, simulation systems are necessary that allow researchers to model not only the intricacies of medical domains but also those of structured light systems themselves. To address this, we propose Fireflies, a differentiable framework for the physically based simulation and domain randomization of structured light endoscopy. Based on the differentiable Mitsuba renderer, Fireflies facilitates and simplifies the development of domain-specific algorithms for endoscopic procedures. In this paper, we demonstrate the effectiveness of our framework by jointly optimizing a domain-specific laser-based projection pattern for structured light endoscopy and generating large-scale synthetic training data for efficient supervised learning without manual labeling. We show that a) an optimized projection pattern can increase the reconstructability of a target domain and b) the synthetic data generated by Fireflies lowers the labeling effort required for endoscopic machine learning tasks. The source code is available at https://github.com/Henningson/Fireflies.
Jann-Ole Henningson, Reinhard Veltrup, Marion Semmler, Michael Döllinger, Marc Stamminger
Exogenous Agent-Free Synthetic Post-contrast Imaging with a Cascade of Deep Networks for Enhancement Prediction After Tumor Resection. A Parametric-Map Oriented Approach
Abstract
Gadolinium-based contrast agents (GBCAs) have become a cornerstone in clinical routine for the detection, characterization, and monitoring of several diseases. In particular, GBCAs are clinically relevant for the detection of blood-brain barrier (BBB) damage, which is associated with aggressive tumor behavior. However, issues such as safety concerns related to GBCA deposition in the brain, prolonged acquisitions, and increased cost advocate against their usage. In this work, we propose a novel approach based on a cascade of deep networks for pre- and post-contrast parametric mapping and the synthesis of post-contrast T1-weighted images. Only a pair of pre-contrast weighted images acquired with conventional pulse sequences is used as input; thus, our approach is GBCA-free. Results reveal the potential of this approach to obtain T1w enhancement information after tumor resection that is comparable with another state-of-the-art prediction approach. We provide not only the predictions, but also the pre- and post-contrast parametric maps, all without the use of GBCAs.
Elisa Moya-Sáez, Rodrigo de Luis-García, Laura Nunez-Gonzalez, Carlos Alberola-López, Juan Antonio Hernández-Tamames
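Synthesizing a weighted image from parametric maps typically relies on a signal equation. As a hedged illustration (not the authors' pipeline), the sketch below evaluates the standard spoiled gradient-echo equation on placeholder PD and T1 maps; the sequence parameters are arbitrary.

    import numpy as np

    def spoiled_gre_signal(pd: np.ndarray, t1_ms: np.ndarray,
                           tr_ms: float = 20.0, flip_deg: float = 25.0) -> np.ndarray:
        """Spoiled gradient-echo signal equation: synthesize a T1-weighted image
        from proton-density (PD) and T1 maps (T2* decay neglected for brevity)."""
        alpha = np.deg2rad(flip_deg)
        e1 = np.exp(-tr_ms / np.maximum(t1_ms, 1e-3))
        return pd * np.sin(alpha) * (1.0 - e1) / (1.0 - np.cos(alpha) * e1)

    # Placeholder maps; post-contrast synthesis would use T1 shortened by the predicted enhancement.
    pd_map = np.random.rand(128, 128).astype(np.float32)
    t1_map = np.random.uniform(300, 2000, (128, 128)).astype(np.float32)   # ms
    t1w = spoiled_gre_signal(pd_map, t1_map)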
OCT Scans Simulation Framework for Data Augmentation and Controlled Evaluation of Signal Processing Approaches
Abstract
Optical Coherence Tomography (OCT) is an emerging approach for tissue diagnostics and optical biopsy. OCT can evaluate biological structures, including vessels (such as blood and lymphatic vessels), tissue layers, tumor margins, and other inclusions. OCT scans reveal coherent speckle patterns and signal decay. These properties can be characterized by speckle contrast (SC) and the optical attenuation coefficient (OAC). This work presents the principles of OCT signal formation, demonstrates a computationally efficient OCT signal simulation framework, and outlines its applicability to the evaluation of SC and OAC processing. We then demonstrate the presented approach on real OCT signals of cartilage under laser treatment. The presented OCT scan simulation and signal processing tools are available on the cloud-based online platform https://www.opticelastograph.com.
Aleksandr Sovetsky, Alexander Matveyev, Peter Chizhov, Vladimir Zaitsev, Lev Matveev
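Speckle contrast and the optical attenuation coefficient mentioned in this abstract are commonly estimated as below: SC as a local std/mean ratio and the OAC with a depth-resolved estimator. This is a generic sketch on placeholder data, not the authors' processing code.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def speckle_contrast(intensity: np.ndarray, window: int = 7) -> np.ndarray:
        """Local speckle contrast: std / mean over a sliding window."""
        mean = uniform_filter(intensity, size=window)
        mean_sq = uniform_filter(intensity ** 2, size=window)
        std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
        return std / np.maximum(mean, 1e-8)

    def attenuation_depth_resolved(intensity: np.ndarray, dz_mm: float) -> np.ndarray:
        """Depth-resolved attenuation estimate mu(z) ~ I(z) / (2*dz*sum_{z'>z} I(z')),
        with depth along axis 0; the last few depth samples are unreliable because
        the tail sum vanishes."""
        tail = np.cumsum(intensity[::-1], axis=0)[::-1] - intensity
        return intensity / (2.0 * dz_mm * np.maximum(tail, 1e-8))

    bscan = np.random.rand(512, 256).astype(np.float32)    # placeholder linear-scale B-scan
    sc = speckle_contrast(bscan)
    oac = attenuation_depth_resolved(bscan, dz_mm=0.005)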
Enhancing Quantitative Image Synthesis Through Pretraining and Resolution Scaling for Bone Mineral Density Estimation from a Plain X-Ray Image
Abstract
While most vision tasks are essentially visual in nature (for recognition), some important tasks, especially in the medical field, also require quantitative analysis (for quantification) using quantitative images. Unlike in visual analysis, pixel values in quantitative images correspond to physical metrics measured by specific devices (e.g., a depth image). However, recent work has shown that it is sometimes possible to synthesize accurate quantitative values from visual ones (e.g., depth from visual cues or defocus). This research aims to improve quantitative image synthesis (QIS) by exploring pretraining and image resolution scaling. We propose a benchmark for evaluating pretraining performance using the task of QIS-based bone mineral density (BMD) estimation from plain X-ray images, where the synthesized quantitative image is used to derive BMD. Our results show that appropriate pretraining can improve QIS performance, significantly raising the correlation of BMD estimation from 0.820 to 0.898, while others do not help or even hinder it. Scaling up the resolution can further boost the correlation up to 0.923, a significant enhancement over conventional methods.
Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Seiji Okada, Nobuhiko Sugano, Hugues Talbot, Yoshinobu Sato
Latent Pollution Model: The Hidden Carbon Footprint in 3D Image Synthesis
Abstract
Contemporary developments in generative AI are rapidly transforming the field of medical AI. These developments have been predominantly driven by the availability of large datasets and high computing power, which have facilitated a significant increase in model capacity. Despite their considerable potential, these models demand substantially high power, leading to high carbon dioxide (CO₂) emissions. Although such models harm the environment, there has been little focus on their carbon footprints. This study analyzes carbon emissions from 2D and 3D latent diffusion models (LDMs) during training and data generation phases, revealing a surprising finding: the synthesis of large images contributes most significantly to these emissions. We assess different scenarios including model sizes, image dimensions, distributed training, and data generation steps. Our findings reveal substantial carbon emissions from these models, with training 2D and 3D models comparable to driving a car for 10 km and 90 km, respectively. The process of data generation is even more significant, with CO₂ emissions equivalent to driving 160 km for 2D models and up to 3345 km for 3D synthesis. Additionally, we found that the location of the experiment can increase carbon emissions by up to 94 times, and even the time of year can influence emissions by up to 50%. These figures are alarming, considering they represent only a single training and data generation phase for each model. Our results emphasize the urgent need for developing environmentally sustainable strategies in generative AI.
Marvin Seyfarth, Salman Ul Hassan Dar, Sandy Engelhardt
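Carbon estimates of this kind usually reduce to energy consumption times grid carbon intensity. The back-of-the-envelope sketch below illustrates the arithmetic; every default value is an illustrative assumption and none of the figures come from the paper.

    def training_emissions_kg(gpu_power_w: float, hours: float,
                              n_gpus: int = 1, pue: float = 1.5,
                              grid_kgco2_per_kwh: float = 0.4) -> float:
        """Back-of-the-envelope CO2 estimate: energy (kWh) times grid carbon intensity.
        All default values are illustrative assumptions, not figures from the paper."""
        energy_kwh = gpu_power_w * n_gpus * hours / 1000.0 * pue
        return energy_kwh * grid_kgco2_per_kwh

    # Example: one 300 W GPU for 48 h on a 0.4 kg CO2/kWh grid -> ~8.6 kg CO2.
    print(round(training_emissions_kg(300, 48), 1))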
Synthesizing Scalable CFD-Enhanced Aortic 4D Flow MRI for Assessing Accuracy and Precision of Deep-Learning Image Reconstruction and Segmentation Tasks
Abstract
Systematic and random errors of MRI measurements in conjunction with the absence of ground truth data limit the assessment of accuracy and precision of 4D flow MRI image reconstruction and other downstream tasks. In this work, we propose to generate scalable synthetic CFD-enhanced aortic 4D flow MRI data, which we assemble into a dataset named RACLETTE. Our approach takes in-vivo 4D flow MRI data as input and pairs it with CFD-based “ground-truth” mean and turbulent flow fields. Specifically, high-resolution pulsatile velocity-vector and turbulent flow fields are simulated for varying degrees of aortic stenosis for a set of 139 time-resolved compliant aortic geometries. To generate realistic datasets, the synthetic flow fields are projected and embedded into the background of the in-vivo 4D flow MRI scans. Upon Fourier transform, data sampling using a given velocity encoding and undersampling scheme yields k-space data as input to deep-learning image reconstruction, segmentation and other downstream tasks. Since the synthetic 4D flow MRI data is paired with noise-free reference values including velocity, pressure, wall shear stress, the Reynolds stress tensor and pulse wave velocity, accuracy and precision of reconstruction and inference are readily available. To demonstrate the value of synthetic CFD-enhanced 4D flow MRI data, we utilize the dataset to train and apply (1) deep-learning based image reconstruction and (2) automatic vessel segmentation. It is shown that the synthetically trained deep-learning tasks generalize sufficiently and provide insights into the performance of reconstruction and processing tasks, indicating the potential value of our synthetic dataset also for further applications.
Pietro Dirix, Luuk Jacobs, Stefano Buoso, Sebastian Kozerke
MedEdit: Counterfactual Diffusion-Based Image Editing on Brain MRI
Abstract
Denoising diffusion probabilistic models enable high-fidelity image synthesis and editing. In biomedicine, these models facilitate counterfactual image editing, producing pairs of images where one is edited to simulate hypothetical conditions. For example, they can model the progression of specific diseases, such as stroke lesions. However, current image editing techniques often fail to generate realistic biomedical counterfactuals, either by inadequately modeling indirect pathological effects like brain atrophy or by excessively altering the scan, which disrupts correspondence to the original images. Here, we propose MedEdit, a conditional diffusion model for medical image editing. MedEdit induces pathology in specific areas while balancing the modeling of disease effects and preserving the original scan’s integrity. We evaluated MedEdit on the ATLAS v2.0 stroke dataset using Fréchet Inception Distance and Dice scores, outperforming state-of-the-art diffusion-based methods such as Palette (by 45%) and SDEdit (by 61%). Additionally, clinical evaluations by a board-certified neuroradiologist confirmed that MedEdit generated realistic stroke scans indistinguishable from real ones. We believe this work will enable counterfactual image editing research to further advance the development of realistic and clinically useful imaging tools.
Malek Ben Alaya, Daniel M. Lang, Benedikt Wiestler, Julia A. Schnabel, Cosmin I. Bercea
Using MR Physics for Domain Generalisation and Super-Resolution
Abstract
MRI is a very flexible imaging modality, but with flexibility comes heterogeneity. MRI sequence choice, acquisition parameters, and image resolution form an extrinsic source of variability, reducing our ability to extract the underlying relevant biological signal and causing difficulties in downstream analyses. We propose a new method that can create resolution and acquisition-parameter invariant representations by removing external sources of variability. We use realistic physics models of image resolution and combine them with a differentiable model of MRI sequences to create an invariant high-resolution multi-parametric (MPM) MRI estimate from an arbitrary number of inputs, all trained via self-supervision. The proposed method allows clinical imaging sessions with sequences acquired at arbitrary resolutions to be transformed into a single-domain generalisable representation. We demonstrate the model’s validity by showing improved MPM reconstruction and imputation quality compared to previous methods and a significantly improved ability to super-resolve. We also demonstrate domain generalisation capabilities via a downstream classification model that is more robust to the choice of input sequences in an out-of-distribution dataset.
Pedro Borges, Virginia Fernandez, Petru Daniel Tudosiu, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso
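A differentiable MRI sequence model, in its simplest form, maps quantitative parameter maps to a weighted image through a signal equation so that gradients can flow back to the maps. The sketch below uses the spin-echo equation as a generic example; it is not the authors' physics model, and all shapes and parameter values are assumptions.

    import torch

    def spin_echo_signal(pd: torch.Tensor, t1: torch.Tensor, t2: torch.Tensor,
                         tr: float, te: float) -> torch.Tensor:
        """Differentiable spin-echo forward model: weighted image from quantitative maps.
        Gradients flow back to pd/t1/t2, so the maps can be refined by self-supervision."""
        return pd * (1.0 - torch.exp(-tr / t1.clamp_min(1e-3))) * torch.exp(-te / t2.clamp_min(1e-3))

    # Placeholder multi-parametric maps (ms); requires_grad enables optimizing the estimates.
    pd = torch.rand(1, 1, 128, 128, requires_grad=True)
    t1 = torch.full((1, 1, 128, 128), 900.0, requires_grad=True)
    t2 = torch.full((1, 1, 128, 128), 80.0, requires_grad=True)

    t2w = spin_echo_signal(pd, t1, t2, tr=4000.0, te=100.0)    # T2-weighted contrast
    loss = t2w.mean()
    loss.backward()                                            # gradients reach the parameter maps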
Single-Scan mpMRI Calibration of Multi-species Brain Tumor Dynamics with Mass Effect
Abstract
We propose an inverse problem formulation and solution algorithm for the calibration of a multi-species biophysical model of glioblastoma (GBM) growth. The model couples GBM growth with GBM-induced deformation of the healthy parenchyma, the so-called “mass effect”. The model is a multi-species partial differential equation (PDE) that models interactions between proliferative, infiltrative, and necrotic tumor cells as well as hypoxia. This PDE has several unknown parameters and fields: the pretumor brain anatomy, initial conditions, and ten coefficients representing mechanisms like diffusive proliferation, migration, and growth. A key challenge is estimating these parameters using just a single multiparametric magnetic resonance imaging (mpMRI) scan.
To solve this inverse problem, we first segment the mpMRI and then use a single-species PDE model to invert for the tumor initial condition and the pretumor anatomy. Then we solve a second inverse problem with the multi-species PDE, but without mass effect, to obtain an initial estimate of the ten unknown scalar coefficients. We use these estimates to appropriately scale the mass-effect estimates from the first inverse problem. Finally, we solve a third inverse problem with the multi-species PDE including mass effect to finalize the ten coefficient values.
We solve all three inverse problems using quasi-Newton solvers, using an adjoint-based formulation for the single-species problem, and a sensitivity-based formulation for the two multi-species problems.
We present preliminary results from evaluating our methodology on five subjects from the BraTS20 dataset. We show that despite the model complexity and the sparse data it is still possible to solve the inverse problem. When comparing to the single-species reconstruction, we find that the new model results in similar or better reconstruction for the overall tumor region while extracting a richer set of biophysical biomarkers.
Ali Ghafouri, George Biros
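For orientation, the single-species model referenced in this abstract is commonly written as a reaction-diffusion equation with logistic growth; the formulation below is the standard textbook form, not necessarily the exact system solved in the paper.

\[
\frac{\partial c}{\partial t} = \nabla \cdot \bigl(\kappa \, \nabla c\bigr) + \rho \, c \, (1 - c), \qquad \mathbf{x} \in \Omega,\; t \in (0, T],
\]

where \(c(\mathbf{x}, t)\) is the normalized tumor cell density, \(\kappa\) a (possibly tissue-dependent) diffusion or migration coefficient, and \(\rho\) the proliferation rate; the mass effect additionally couples \(c\) to a mechanical deformation of the surrounding parenchyma.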
Annotated Biomedical Video Generation Using Denoising Diffusion Probabilistic Models and Flow Fields
Abstract
The segmentation and tracking of living cells play a vital role within the biomedical domain, particularly in cancer research, drug development, and developmental biology. These are usually tedious and time-consuming tasks that are traditionally done by biomedical experts. Recently, to automate these processes, deep learning-based segmentation and tracking methods have been proposed. These methods require large-scale datasets, and their full potential is constrained by the scarcity of annotated data in the biomedical imaging domain. To address this limitation, we propose the Biomedical Video Diffusion Model (BVDM), capable of generating realistic-looking synthetic microscopy videos. Trained only on a single real video, BVDM can generate videos of arbitrary length with pixel-level annotations that can be used for training data-hungry models. It is composed of a denoising diffusion probabilistic model (DDPM) generating high-fidelity synthetic cell microscopy images and a flow prediction model (FPM) predicting the non-rigid transformation between consecutive video frames. During inference, the DDPM first imposes realistic cell textures on synthetic cell masks, which are generated based on real data statistics. The flow prediction model then predicts the flow field between consecutive masks and applies it to the DDPM output from the previous time frame to create the next one while maintaining temporal consistency. BVDM outperforms state-of-the-art synthetic live cell microscopy video generation models. Furthermore, we demonstrate that a sufficiently large synthetic dataset enhances the performance of cell segmentation and tracking models compared to using a limited amount of available real data. The code is available at https://github.com/ruveydayilmaz0/BVDM.
Rüveyda Yilmaz, Dennis Eschweiler, Johannes Stegmaier
Backmatter
Metadata
Title
Simulation and Synthesis in Medical Imaging
Editors
Virginia Fernandez
Jelmer M. Wolterink
David Wiesner
Samuel Remedios
Lianrui Zuo
Adrià Casamitjana
Copyright Year
2025
Electronic ISBN
978-3-031-73281-2
Print ISBN
978-3-031-73280-5
DOI
https://doi.org/10.1007/978-3-031-73281-2
