2025 | Book

Deep Generative Models

4th MICCAI Workshop, DGM4MICCAI 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024, Proceedings


About this book

This book constitutes the proceedings of the 4th Workshop on Deep Generative Models for Medical Image Computing and Computer Assisted Intervention, DGM4MICCAI 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024, in Marrakesh, Morocco, in October 2024.

The 21 papers presented here were carefully reviewed and selected from 40 submissions. They cover a broad range of topics, from methodology (such as causal inference, latent interpretation, and generative factor analysis) to applications (such as mammography, vessel imaging, surgical videos, and more).

Table of Contents

Frontmatter
DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels
Abstract
Breast cancer is among the most fatal cancers in women, and early detection is crucial for treatment. HER2 status, a valuable diagnostic marker based on Immunohistochemistry (IHC) staining, is instrumental in determining breast cancer status. The high cost of IHC staining and the ubiquity of Hematoxylin and Eosin (H&E) staining make the conversion from H&E to IHC staining essential. In this article, we propose a destain-restain framework for converting H&E staining to IHC staining, leveraging the characteristic that H&E and IHC stainings of the same tissue sections share the Hematoxylin channel. We further design loss functions specifically for the Hematoxylin and Diaminobenzidine (DAB) channels to generate IHC images, exploiting insights from the separated staining channels. Beyond the benchmark metrics of the BCI contest, we have developed semantic information metrics for the HER2 level. The experimental results demonstrate that our method outperforms previous open-sourced methods in terms of intrinsic image properties and semantic information.
Linda Wei, Shengyi Hua, Shaoting Zhang, Xiaofan Zhang
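The destain-restain idea above relies on separating a stained slide into per-stain channels. As a rough, self-contained illustration of that kind of Hematoxylin/Eosin/DAB decoupling (standard scikit-image color deconvolution, not the authors' DeReStainer code):

```python
# Illustrative stain separation only; rgb2hed/hed2rgb are scikit-image's
# color-deconvolution routines, not the paper's pipeline.
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def split_stain_channels(rgb_patch: np.ndarray):
    """RGB slide patch in [0, 1] -> list of RGB images, one per stain (H, E, DAB)."""
    hed = rgb2hed(rgb_patch)                 # to Hematoxylin-Eosin-DAB optical-density space
    per_stain = []
    for i in range(3):                       # 0: Hematoxylin, 1: Eosin, 2: DAB
        single = np.zeros_like(hed)
        single[..., i] = hed[..., i]         # keep one stain channel, zero the others
        per_stain.append(hed2rgb(single))    # back to RGB for inspection
    return per_stain
```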
WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis
Abstract
Due to the three-dimensional nature of CT or MR scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into the limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model on wavelet-decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single 40 GB GPU. Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of \(128 \times 128 \times 128\) demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, Diffusion Models, and Latent Diffusion Models. Our proposed method is the only one capable of consistently generating high-quality images at a resolution of \(256 \times 256 \times 256\), outperforming all compared methods. The project page is available at https://pfriedri.github.io/wdm-3d-io.
Paul Friedrich, Julia Wolleb, Florentin Bieder, Alicia Durrer, Philippe C. Cattin
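To make the wavelet trick concrete: a single-level 3D discrete wavelet transform turns a volume into eight half-resolution sub-bands that can be stacked as channels for a standard diffusion backbone. A minimal sketch with PyWavelets follows; the diffusion model itself is omitted and the function names are illustrative, not the WDM code.

```python
# Sketch of the wavelet packing/unpacking around a 3D diffusion model.
import numpy as np
import pywt

# Fixed ordering of the eight 3D Haar sub-bands returned by pywt.dwtn.
_SUBBANDS = ("aaa", "aad", "ada", "add", "daa", "dad", "dda", "ddd")

def volume_to_wavelet_channels(volume: np.ndarray) -> np.ndarray:
    """(D, H, W) volume -> (8, D/2, H/2, W/2) channel stack of wavelet coefficients."""
    coeffs = pywt.dwtn(volume, wavelet="haar")
    return np.stack([coeffs[k] for k in _SUBBANDS], axis=0)

def wavelet_channels_to_volume(channels: np.ndarray) -> np.ndarray:
    """Inverse transform after the diffusion model has generated the sub-bands."""
    coeffs = {k: channels[i] for i, k in enumerate(_SUBBANDS)}
    return pywt.idwtn(coeffs, wavelet="haar")
```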
Energy-Based Prior Latent Space Diffusion Model for Reconstruction of Lumbar Vertebrae from Thick Slice MRI
Abstract
Lumbar spine problems are ubiquitous, motivating research into targeted imaging for treatment planning and guided interventions. While high-resolution, high-contrast CT has been the modality of choice, MRI can capture both bone and soft tissue without the ionizing radiation of CT, albeit with a longer acquisition time. The critical tradeoff between contrast quality and acquisition time has motivated ‘thick slice MRI’, which prioritises faster imaging with high in-plane resolution but variable contrast and low through-plane resolution. We investigate a recently developed post-acquisition pipeline which segments vertebrae from thick-slice acquisitions and uses a variational autoencoder to enhance quality after an initial 3D reconstruction. We instead propose a latent space diffusion energy-based prior (the code for this work is available at https://github.com/Seven-year-promise/LSD_EBM_MRI) to leverage diffusion models, which exhibit high-quality image generation. Crucially, we mitigate their high computational cost and low sample efficiency by learning an energy-based latent representation to perform the diffusion processes. Our resulting method outperforms existing approaches across metrics including Dice and VS scores, and more faithfully captures 3D features.
Yanke Wang, Yolanne Y. R. Lee, Aurelio Dolfini, Markus Reischl, Ender Konukoglu, Kyriakos Flouris
Anatomically-Guided Inpainting for Local Synthesis of Normal Chest Radiographs
Abstract
Chest radiography (CXR) is one of the most used medical imaging modalities. Nevertheless, the interpretation of CXR images is time-consuming and subject to variability. As such, automated systems for pathology detection have been proposed and promising results have been obtained, particularly using deep learning. However, these tools suffer from poor explainability, which represents a major hurdle for their adoption in clinical practice. One proposed explainability method in CXR is through contrastive examples, i.e. by showing an alternative version of the CXR except without the lesion being investigated. While image-level normal/healthy image synthesis has been explored in the literature, normal patch synthesis via inpainting has received little attention. In this work, a method to synthesize contrastive examples in CXR based on local synthesis of normal CXR patches is proposed. Based on a contextual attention inpainting network (CAttNet), an anatomically-guided inpainting network (AnaCAttNet) is proposed that leverages anatomical information of the original CXR through segmentation to guide the inpainting for a more realistic reconstruction. A quantitative evaluation of the inpainting is performed, showing that AnaCAttNet outperforms CAttNet (FID of 0.0125 and 0.0132, respectively). Qualitative evaluation by three readers also showed that AnaCAttNet delivers superior reconstruction quality and anatomical realism. In conclusion, the proposed anatomical segmentation module for inpainting is shown to improve inpainting performance.
João Pedrosa, Sofia Cardoso Pereira, Joana Silva, Ana Maria Mendonça, Aurélio Campilho
Enhancing Cross-Modal Medical Image Segmentation Through Compositionality
Abstract
Cross-modal medical image segmentation presents a significant challenge, as different imaging modalities produce images with varying resolutions, contrasts, and appearances of anatomical structures. We introduce compositionality as an inductive bias in a cross-modal segmentation network to improve segmentation performance and interpretability while reducing complexity. The proposed network is an end-to-end cross-modal segmentation framework that enforces compositionality on the learned representations using learnable von Mises-Fisher kernels. These kernels facilitate content-style disentanglement in the learned representations, resulting in compositional content representations that are inherently interpretable and effectively disentangle different anatomical structures. The experimental results demonstrate enhanced segmentation performance and reduced computational costs on multiple medical datasets. Additionally, we demonstrate the interpretability of the learned compositional features. Code and checkpoints will be publicly available at: https://​github.​com/​Trustworthy-AI-UU-NKI/​Cross-Modal-Segmentation.
Aniek Eijpe, Valentina Corbetta, Kalina Chupetlovska, Regina Beets-Tan, Wilson Silva
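As a rough illustration of how learnable von Mises-Fisher kernels can produce soft, compositional assignments of feature vectors to content components (a sketch under assumed names and tensor shapes, not the released implementation):

```python
import torch
import torch.nn.functional as F

class VonMisesFisherKernels(torch.nn.Module):
    """Scores unit-norm feature vectors against K learnable directions (vMF means)."""

    def __init__(self, num_kernels: int, feature_dim: int, kappa: float = 20.0):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(num_kernels, feature_dim))
        self.kappa = kappa                                     # concentration (kept fixed here)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        """features: (B, C, H, W) -> soft kernel assignments (B, K, H, W)."""
        f = F.normalize(features, dim=1)                       # project features onto the unit sphere
        mu = F.normalize(self.mu, dim=1)                       # unit-norm kernel means
        logits = self.kappa * torch.einsum("bchw,kc->bkhw", f, mu)
        return logits.softmax(dim=1)                           # each pixel assigned softly to one kernel
```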
Unpaired Modality Translation for Pseudo Labeling of Histology Images
Abstract
The segmentation of histological images is critical for various biomedical applications, yet the lack of annotated data presents a significant challenge. We propose a microscopy pseudo labeling pipeline utilizing unsupervised image translation to address this issue. Our method generates pseudo labels by translating between labeled and unlabeled domains without requiring prior annotation in the target domain. We evaluate two pseudo labeling strategies across three image domains increasingly dissimilar from the labeled data, demonstrating their effectiveness. Notably, our method achieves a mean Dice score of \(0.736 \pm 0.005\) on a SEM dataset using the tutoring path, which involves training a segmentation model on synthetic data created by translating the labeled dataset (TEM) to the target modality (SEM). This approach aims to accelerate the annotation process by providing high-quality pseudo labels as a starting point for manual refinement.
Arthur Boschet, Armand Collin, Nishka Katoch, Julien Cohen-Adad
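For reference, the Dice score reported above can be computed from binary masks as follows (a generic metric implementation, not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```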
SNAFusion: Distilling 2D Axial Plane Diffusion Priors for Sparse-View 3D Cone-Beam CT Imaging
Abstract
The low radiation dose of X-rays is often a dominant source of artifacts in cone-beam computed tomography (CBCT) images, making reconstruction a long-standing and challenging inverse problem. Existing data-driven techniques employ 3D decoders trained on huge volumes of paired data, resulting in limited generalizability and ignoring the clinical shortage of datasets. Even though some implicit neural rendering (INR) methods focus on per-patient 3D representations under implicit coordinates to enhance the final reconstructions, they often struggle to simultaneously achieve 3D-consistent and detailed results in the case of extremely sparse views. In this work, we unify recent advances in INR and probabilistic generation and propose a geometry-informed score distillation sampling technique for 3D CBCT imaging. In particular, the framework distills robust prior knowledge from pre-trained 2D axial diffusion models and incorporates plug-and-play geometric information of the measurement process to refine the neural radiance field, aiming for high-quality and coherent reconstructions across all dimensions of the CBCT volume. We conduct experiments on several challenging in-distribution and out-of-distribution public CT datasets without any retraining. Both quantitative and qualitative assessments demonstrate that our approach outperforms recent works and exhibits superior generalizability.
Xiaoyue Li, Tielong Cai, Kai Shang, Mark D. Butala, Gaoang Wang
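For context, score distillation sampling in its standard form (as introduced for text-to-3D generation) updates the parameters \(\theta\) of the 3D representation with a gradient of the form below; the paper's geometry-informed variant additionally injects measurement-consistency information, which is not shown here.

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) \;=\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\, \frac{\partial x}{\partial \theta} \right]
\]

where \(x\) is a rendered 2D view, \(x_t\) its noised version at timestep \(t\), \(\hat{\epsilon}_\phi\) the pre-trained 2D diffusion model's noise prediction conditioned on \(y\), and \(w(t)\) a weighting schedule.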
SynthBrainGrow: Synthetic Diffusion Brain Aging for Longitudinal MRI Data Generation in Young People
Abstract
Synthetic longitudinal brain MRI simulates brain aging and would enable more efficient research on neurodevelopmental and neurodegenerative conditions. Synthetically generated, age-adjusted brain images could serve as valuable alternatives to costly longitudinal imaging acquisitions, serve as internal controls for studies looking at the effects of environmental or therapeutic modifiers on brain development, and allow data augmentation for diverse populations. In this paper, we present a diffusion-based approach called SynthBrainGrow for synthetic brain aging with a two-year step. To validate the feasibility of using synthetically generated data on downstream tasks, we compared structural volumetrics of two-year-aged brains against synthetically aged brain MRI. The use of structural similarity indices, such as the Structural Similarity Index Measure (SSIM), for evaluating synthetic medical images has come under recent scrutiny. These indices may not effectively capture the perceptual quality or clinical usefulness in synthesized radiology scans. To assess the performance of SynthBrainGrow, we evaluated the substructural volumetric similarity between synthetic and real patient scans. Results show that SynthBrainGrow can accurately capture substructure volumetrics and simulate structural changes such as ventricle enlargement and cortical thinning. Generating longitudinal brain datasets from cross-sectional data could enable augmented training and benchmarking of computational tools for analyzing lifespan trajectories. This work signifies an important advance in generative modeling to synthesize realistic longitudinal data with limited lifelong MRI scans. The code is available at https://​github.​com/​zapaishchykova/​SynthBrainGrow.
Anna Zapaishchykova, Benjamin H. Kann, Divyanshu Tak, Zezhong Ye, Daphne A. Haas-Kogan, Hugo J. W. L. Aerts
Denoising Diffusion Models for 3D Healthy Brain Tissue Inpainting
Abstract
Monitoring diseases that affect the brain’s structural integrity requires automated analysis of magnetic resonance images, e.g., for the evaluation of volumetric changes. However, many of the evaluation tools are optimized for analyzing healthy tissue. To enable the evaluation of scans containing pathological tissue, it is therefore required to restore healthy tissue in the pathological areas. In this work, we explore and extend denoising diffusion probabilistic models (DDPMs) for consistent inpainting of healthy 3D brain tissue. We modify state-of-the-art 2D, pseudo-3D, and 3D DDPMs working in the image space, as well as 3D latent and 3D wavelet DDPMs, and train them to synthesize healthy brain tissue. Our evaluation shows that the pseudo-3D model performs best regarding the structural-similarity index, peak signal-to-noise ratio, and mean squared error. To emphasize the clinical relevance, we fine-tune this model on synthetic multiple sclerosis lesions and evaluate it on a downstream brain tissue segmentation task, where it outperforms the established FMRIB Software Library (FSL) lesion-filling method.
Alicia Durrer, Julia Wolleb, Florentin Bieder, Paul Friedrich, Lester Melie-Garcia, Mario Alberto Ocampo Pineda, Cosmin I. Bercea, Ibrahim Ethem Hamamci, Benedikt Wiestler, Marie Piraud, Oezguer Yaldizli, Cristina Granziera, Bjoern Menze, Philippe C. Cattin, Florian Kofler
Panoptic Segmentation of Mammograms with Text-to-Image Diffusion Model
Abstract
Mammography is crucial for breast cancer surveillance and early diagnosis. However, analyzing mammography images is a demanding task for radiologists, who often review hundreds of mammograms daily, leading to overdiagnosis and overtreatment. Computer-Aided Diagnosis (CAD) systems have been developed to assist in this process, but their capabilities, particularly in lesion segmentation, have remained limited. With contemporary advances in deep learning, their performance may be improved. Recently, vision-language diffusion models emerged, demonstrating outstanding performance in image generation and transferability to various downstream tasks. We aim to harness their capabilities for breast lesion segmentation in a panoptic setting, which encompasses both semantic and instance-level predictions. Specifically, we propose leveraging pretrained features from a Stable Diffusion model as inputs to a state-of-the-art panoptic segmentation architecture, resulting in accurate delineation of individual breast lesions. To bridge the gap between natural and medical imaging domains, we incorporated a mammography-specific MAM-E diffusion model and BiomedCLIP image and text encoders into this framework. We evaluated our approach on two recently published mammography datasets, CDD-CESM and VinDr-Mammo. For the instance segmentation task, we noted 40.25 AP0.1 and 46.82 AP0.05, as well as 25.44 PQ0.1 and 26.92 PQ0.05. For the semantic segmentation task, we achieved Dice scores of 38.86 and 40.92, respectively.
Kun Zhao, Jakub Prokop, Javier Montalt-Tordera, Sadegh Mohammadi
Interactive Generation of Laparoscopic Videos with Diffusion Models
Abstract
Generative AI in general, and synthetic visual data generation in particular, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as well as the pixel-wise F1-score for the spatial control of tool generation. We achieve an FID of 38.097 and an F1-score of 0.71.
Ivan Iliash, Simeon Allmendinger, Felix Meissen, Niklas Kühl, Daniel Rückert
Multi-parametric MRI to FMISO PET Synthesis for Hypoxia Prediction in Brain Tumors
Abstract
This research paper presents a novel approach to the prediction of hypoxia in brain tumors, using multi-parametric Magnetic Resonance Imaging (MRI). Hypoxia, a condition characterized by low oxygen levels, is a common feature of malignant brain tumors associated with poor prognosis. Fluoromisonidazole Positron Emission Tomography (FMISO PET) is a well-established method for detecting hypoxia in vivo, but it is expensive and not widely available.
Our study proposes the use of MRI, a more accessible and cost-effective imaging modality, to predict FMISO PET signals. We investigate Deep Learning (DL) based approaches trained on the ACRIN 6684 dataset, a resource that contains paired MRI and FMISO PET images from patients with brain tumors. With 3D extensions of state-of-the-art models and spatial constraints on the objective function, specifically in the tumor region, our trained models effectively learn the complex relationships between the MRI features and the corresponding FMISO PET signals, thereby enabling the prediction of hypoxia from MRI scans alone.
The results show a strong correlation between the predicted and actual FMISO PET signals, with an overall PSNR score above 29.6 and a SSIM score greater than 0.94, confirming MRI as a promising option for hypoxia prediction in brain tumors. This approach could significantly improve the accessibility of hypoxia detection in clinical settings, with the potential for more timely and targeted treatments.
Daniele Perlo, Georgia Kanli, Selma Boudissa, Olivier Keunen
qMRI Diffuser: Quantitative T1 Mapping of the Brain Using a Denoising Diffusion Probabilistic Model
Abstract
Quantitative MRI (qMRI) offers significant advantages over weighted images by providing objective parameters related to tissue properties. Deep learning-based methods have demonstrated effectiveness in estimating quantitative maps from series of weighted images. In this study, we present qMRI Diffuser, a novel approach to qMRI utilising deep generative models. Specifically, we implemented denoising diffusion probabilistic models (DDPM) for T1 quantification in the brain, framing the estimation of quantitative maps as a conditional generation task. The proposed method is compared with the residual neural network (ResNet) and the recurrent inference machine (RIM) on both phantom and in vivo data. The results indicate that our method achieves improved accuracy and precision in parameter estimation, along with superior visual performance. Moreover, our method inherently incorporates stochasticity, enabling straightforward quantification of uncertainty. Hence, the proposed method holds significant promise for quantitative MR mapping.
Shishuai Wang, Hua Ma, Juan A. Hernandez-Tamames, Stefan Klein, Dirk H. J. Poot
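For orientation, T1 quantification assumes a known signal model relating the weighted images to the parameter maps. A common inversion-recovery example is sketched below; it is illustrative only and not necessarily the acquisition model used in the paper.

```python
import numpy as np

def inversion_recovery_signal(t1_map: np.ndarray, m0_map: np.ndarray, inversion_times) -> np.ndarray:
    """Simulate magnitude images S(TI) = |M0 * (1 - 2 * exp(-TI / T1))| for each inversion time TI."""
    tis = np.asarray(inversion_times, dtype=float).reshape(-1, *([1] * t1_map.ndim))
    return np.abs(m0_map * (1.0 - 2.0 * np.exp(-tis / t1_map)))
```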
On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models
Abstract
Generally, the small size of public medical imaging datasets, coupled with stringent privacy concerns, hampers the advancement of data-hungry deep learning models in medical imaging. This study addresses these challenges for 3D cardiac MRI images in the short-axis view. We propose Latent Diffusion Models that generate synthetic images conditioned on medical attributes, while ensuring patient privacy through differentially private model training. To our knowledge, this is the first work to apply and quantify differential privacy in 3D medical image generation. We pre-train our models on public data and finetune them with differential privacy on the UK Biobank dataset. Our experiments reveal that pre-training significantly improves model performance, achieving a Fréchet Inception Distance (FID) of 26.77 at \(\epsilon =10\), compared to 92.52 for models without pre-training. Additionally, we explore the trade-off between privacy constraints and image quality, investigating how tighter privacy budgets affect output controllability and may lead to degraded performance. Our results demonstrate that proper consideration during training with differential privacy can substantially improve the quality of synthetic cardiac MRI images, but there are still notable challenges in achieving consistent medical realism. Code: https://github.com/compai-lab/2024-miccai-dgm-daum.
Deniz Daum, Richard Osuala, Anneliese Riess, Georgios Kaissis, Julia A. Schnabel, Maxime Di Folco
Five Pitfalls When Assessing Synthetic Medical Images with Reference Metrics
Abstract
Reference metrics have been developed to objectively and quantitatively compare two images. Especially for evaluating the quality of reconstructed or compressed images, these metrics have proven very useful. Extensive tests of such metrics on benchmarks of artificially distorted natural images have revealed which metrics correlate best with human perception of quality. Direct transfer of these metrics to the evaluation of generative models in medical imaging, however, can easily lead to pitfalls, because assumptions about image content, image data format, and image interpretation are often very different. Also, the correlation of reference metrics and human perception of quality can vary strongly for different kinds of distortions, and commonly used metrics such as SSIM, PSNR, and MAE are not the best choice for all situations. We selected five pitfalls that showcase unexpected and probably undesired reference metric scores and discuss strategies to avoid them.
Melanie Dohmen, Tuan Truong, Ivo M. Baltruschat, Matthias Lenga
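One pitfall in this spirit, the assumed data range, is easy to demonstrate: SSIM and PSNR are only defined relative to a data range, which is ambiguous for floating-point medical images. A minimal sketch with scikit-image that makes the assumption explicit:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare_with_explicit_range(reference: np.ndarray, synthetic: np.ndarray):
    """Compute PSNR, SSIM, and MAE with an explicitly stated data range."""
    data_range = float(reference.max() - reference.min())   # state the assumption instead of relying on dtype defaults
    psnr = peak_signal_noise_ratio(reference, synthetic, data_range=data_range)
    ssim = structural_similarity(reference, synthetic, data_range=data_range)
    mae = float(np.mean(np.abs(reference - synthetic)))
    return psnr, ssim, mae
```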
Augmenting Prostate MRI Dataset with Synthetic Volumetric Images from Zone-Conditioned Diffusion Generative Model
Abstract
The need for artificial intelligence (AI)-driven computer-assisted diagnosis (CAD) tools drives up the demand for large high-quality datasets in medical imaging. However, collecting the necessary amount of data is often impractical due to patient privacy concerns or restricted time for medical annotation. Recent advances in generative models in medical imaging with a focus on diffusion-based techniques could provide realistic-looking synthetic samples as a supplement for real data. In this work, we study whether synthetic volumetric MRIs generated by the diffusion model can be used to train downstream models, e.g., semantic segmentation. We can create an arbitrarily large dataset with ground truth by conditioning the diffusion model with a segmentation mask. Thus, the additional synthetic data can be used to control the dataset diversity. Experiments revealed that downstream tasks profit from additional synthetic data. However, the effect will eventually diminish when sufficient real samples are available. We showcase the strength of the synthetic data and provide practical recommendations for using the generated data in zonal prostate segmentation.
Oleksii Bashkanov, Marko Rak, Lucas Engelage, Christian Hansen
TiBiX: Leveraging Temporal Information for Bidirectional X-Ray and Report Generation
Abstract
With the emergence of vision language models in the medical imaging domain, numerous studies have focused on two dominant research activities: (1) report generation from Chest X-rays (CXR), and (2) synthetic scan generation from text or reports. Despite some research incorporating multi-view CXRs into the generative process, prior patient scans and reports have been generally disregarded. This can inadvertently lead to the omission of important medical information, thus affecting generation quality. To address this, we propose TiBiX: Leveraging Temporal information for Bidirectional X-ray and Report Generation. Considering previous scans, our approach facilitates bidirectional generation, primarily addressing two challenging problems: (1) generating the current image from the previous image and current report and (2) generating the current report based on both the previous and current images. Moreover, we extract and release a curated temporal benchmark dataset derived from the MIMIC-CXR dataset, which focuses on temporal data. Our comprehensive experiments and ablation studies explore the merits of incorporating prior CXRs and achieve state-of-the-art (SOTA) results on the report generation task. Furthermore, we attain on-par performance with SOTA image generation efforts, thus serving as a new baseline in longitudinal bidirectional CXR-to-report generation. The code is available at https://github.com/BioMedIA-MBZUAI/TiBiX.
Santosh Sanjeev, Fadillah Adamsyah Maani, Arsen Abzhanov, Vijay Ram Papineni, Ibrahim Almakky, Bartłomiej W. Papież, Mohammad Yaqub
Segmentation-Guided MRI Reconstruction for Meaningfully Diverse Reconstructions
Abstract
Inverse problems, such as accelerated MRI reconstruction, are ill-posed and an infinite amount of possible and plausible solutions exist. This may not only lead to uncertainty in the reconstructed image but also in downstream tasks such as semantic segmentation. This uncertainty, however, is mostly not analyzed in the literature, even though probabilistic reconstruction models are commonly used. These models can be prone to ignore plausible but unlikely solutions like rare pathologies. Building on MRI reconstruction approaches based on diffusion models, we add guidance to the diffusion process during inference, generating two meaningfully diverse reconstructions corresponding to an upper and lower bound segmentation. The reconstruction uncertainty can then be quantified by the difference between these bounds, which we coin the ‘uncertainty boundary’. We analyzed the behavior of the upper and lower bound segmentations for a wide range of acceleration factors and found the uncertainty boundary to be both more reliable and more accurate compared to repeated sampling. Code is available at https://​github.​com/​NikolasMorshuis/​SGR.
Jan Nikolas Morshuis, Matthias Hein, Christian F. Baumgartner
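One simple way to turn the two bound segmentations into a scalar uncertainty measure is the fraction of pixels on which they disagree; this is an illustrative reading of the 'uncertainty boundary', and the paper's exact definition may differ.

```python
import numpy as np

def uncertainty_boundary_fraction(lower: np.ndarray, upper: np.ndarray) -> float:
    """Fraction of pixels where the upper- and lower-bound segmentations disagree."""
    lower, upper = lower.astype(bool), upper.astype(bool)
    return float(np.logical_xor(lower, upper).mean())
```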
Non-reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs
Abstract
Generating high-quality synthetic data is crucial for addressing challenges in medical imaging, such as domain adaptation, data scarcity, and privacy concerns. Existing image quality metrics often rely on reference images, are tailored for group comparisons, or are intended for 2D natural images, limiting their efficacy in complex domains like medical imaging. This study introduces a novel deep learning-based non-reference approach to assess brain MRI quality by training a 3D ResNet. The network is designed to estimate quality across six distinct artifacts commonly encountered in MRI scans. Additionally, a diffusion model is trained on diverse datasets to generate synthetic 3D images of high fidelity. The approach leverages several datasets for training and comprehensive quality assessment, benchmarking against state-of-the-art metrics for real and synthetic images. Results demonstrate superior performance in accurately estimating distortions and reflecting image quality from multiple perspectives. Notably, the method operates without reference images, indicating its applicability for evaluating deep generative models. Besides, the quality scores in the [0, 1] range provide an intuitive assessment of image quality across heterogeneous datasets. Evaluation of generated images offers detailed insights into specific artifacts, guiding strategies for improving generative models to produce high-quality synthetic images. This study presents the first comprehensive method for assessing the quality of real and synthetic 3D medical images in MRI contexts without reliance on reference images.
Karl Van Eeden Risager, Torkan Gholamalizadeh, Mostafa Mehdipour Ghazi
LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework
Abstract
Histological artifacts pose challenges for both pathologists and Computer-Aided Diagnosis (CAD) systems, leading to errors in analysis. Current approaches for histological artifact restoration, based on Generative Adversarial Networks (GANs) and pixel-level Diffusion Models, suffer from performance limitations and computational inefficiencies. In this paper, we propose a novel framework, LatentArtiFusion, which leverages the latent diffusion model (LDM) to reconstruct histological artifacts with high performance and computational efficiency. Unlike traditional pixel-level diffusion frameworks, LatentArtiFusion executes the restoration process in a lower-dimensional latent space, significantly improving computational efficiency. Moreover, we introduce a novel regional artifact reconstruction algorithm in latent space to prevent mistransfer in non-artifact regions, distinguishing our approach from GAN-based methods. Through extensive experiments on real-world histology datasets, LatentArtiFusion demonstrates remarkable speed, outperforming state-of-the-art pixel-level diffusion frameworks by more than \(30{\times }\). It also consistently surpasses GAN-based methods by at least 5% across multiple evaluation metrics. Furthermore, we evaluate the effectiveness of our proposed framework in downstream tissue classification tasks, showcasing its practical utility. Code is available at https://​github.​com/​bugs-creator/​LatentArtiFusion​.
Zhenqi He, Wenrui Liu, Minghao Yin, Kai Han
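The core idea of restoring only artifact regions in latent space can be sketched as follows; encoder, denoiser, and decoder are placeholders standing in for an LDM's components, not the LatentArtiFusion implementation.

```python
import torch.nn.functional as F

def restore_artifact_region(image, artifact_mask, encoder, denoiser, decoder):
    """Blend denoised latents into the original latents only where the artifact mask is active."""
    z = encoder(image)                                             # (B, C, h, w) latents
    mask = F.interpolate(artifact_mask, size=z.shape[-2:], mode="nearest")
    z_restored = denoiser(z)                                       # e.g. a latent diffusion sampler
    z_blend = mask * z_restored + (1.0 - mask) * z                 # leave non-artifact latents untouched
    return decoder(z_blend)
```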
How to Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-angle Maximum Intensity Projections and Diffusion Models
Abstract
Prostate specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) imaging provides a tremendously exciting frontier in visualization of prostate cancer (PCa) metastatic lesions. However, accurate segmentation of metastatic lesions is challenging due to low signal-to-noise ratios and variable sizes, shapes, and locations of the lesions. This study proposes a novel approach for automated segmentation of metastatic lesions in PSMA PET/CT 3D volumetric images using 2D denoising diffusion probabilistic models (DDPMs). Instead of 2D trans-axial slices or 3D volumes, the proposed approach segments the lesions on generated multi-angle maximum intensity projections (MA-MIPs) of the PSMA PET images, then obtains the final 3D segmentation masks from 3D ordered subset expectation maximization (OSEM) reconstruction of 2D MA-MIPs segmentations. Our proposed method achieved superior performance compared to state-of-the-art 3D segmentation approaches in terms of accuracy and robustness in detecting and segmenting small metastatic PCa lesions. The proposed method has significant potential as a tool for quantitative analysis of metastatic burden in PCa patients.
Amirhosein Toosi, Sara Harsini, François Bénard, Carlos Uribe, Arman Rahmim
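As a rough illustration of the multi-angle maximum intensity projection (MA-MIP) step described above (angles, axes, and interpolation settings are assumptions, not the authors' configuration):

```python
import numpy as np
from scipy.ndimage import rotate

def multi_angle_mips(volume: np.ndarray, n_angles: int = 12) -> np.ndarray:
    """(D, H, W) PET volume -> (n_angles, D, W) stack of 2D maximum intensity projections."""
    mips = []
    for angle in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        rotated = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)  # rotate about the axial (D) axis
        mips.append(rotated.max(axis=1))                                      # project along one in-plane axis
    return np.stack(mips, axis=0)
```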
Backmatter
Metadata
Title
Deep Generative Models
Editors
Anirban Mukhopadhyay
Ilkay Oksuz
Sandy Engelhardt
Dorit Merhof
Yixuan Yuan
Copyright Year
2025
Electronic ISBN
978-3-031-72744-3
Print ISBN
978-3-031-72743-6
DOI
https://doi.org/10.1007/978-3-031-72744-3
