

Simplifying Medical Ultrasound

5th International Workshop, ASMUS 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 6, 2024, Proceedings


About this book

This book constitutes the proceedings of the 5th International Workshop on Simplifying Medical Ultrasound, ASMUS 2024, held in conjunction with MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer-Assisted Intervention. The workshop took place in Marrakesh, Morocco, on October 6, 2024.

The 21 full papers presented in this book were carefully reviewed and selected from 34 submissions. They were organized in topical sections as follows: Image Acquisition, Synthesis and Enhancement; Tracking, Registration and Image-guided Interventions; Segmentation; and Classification and Detection.

Table of Contents

Frontmatter

Image Acquisition, Synthesis and Enhancement

Frontmatter
Unsupervised Physics-Inspired Shear Wave Speed Estimation in Ultrasound Elastography
Abstract
Shear wave elastography (SWE) is a promising tool to quantify tissue stiffness variations, with increasing applications in tissue characterization. In SWE, the tissue is excited by an acoustic radiation force pulse sequence induced by an ultrasound probe, and the generated shear waves propagate laterally away from the push location. The shear wave speed (SWS) can be measured to estimate elasticity, a physical property that can be used to characterize the tissue. SWS estimation requires two steps: speckle tracking from radiofrequency (RF)/IQ data to obtain particle displacement or velocity, and SWS estimation from the estimated velocity, which aims to find the speed of the wave propagating in the lateral direction. The SWS can be calculated by comparing the velocity-time profiles at two locations separated by a few millimeters. In supervised deep learning methods for SWS estimation, simulation data generated by finite element analysis is employed to train the network. However, the computational cost and complexity of modeling the wave propagation limit the practicality of supervised methods. In this paper, we present an unsupervised physics-inspired learning method for SWS estimation using the equations governing wave propagation in a viscoelastic medium. The proposed method does not require any finite element simulated data; instead, training data is synthetically generated using forward modeling of the wave propagation equation. Furthermore, unlabeled experimental data is utilized to train/fine-tune the network. We validated the proposed method using experimental data imaged by different machines and data created by placing pork fat on top of a phantom. The findings show that the proposed approach achieves comparable (or superior) performance relative to the traditional cross-correlation method.
Ali Kafaei Zad Tehrani, E. G. Sunethra Dayavansha, Yuyang Gu, Ion Candel, Michael Wang, Rimon Tadross, Yiming Xiao, Hassan Rivaz, Kai Thomenius, Anthony Samir
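The classical baseline referenced in this abstract, time-of-flight SWS estimation from two velocity-time profiles, fits in a few lines. The sketch below is illustrative (synthetic Gaussian pulses, assumed sampling rate and lateral separation), not the authors' pipeline.

```python
import numpy as np

def sws_time_of_flight(v1, v2, dx_mm, fs_hz):
    """Estimate shear wave speed (m/s) from two velocity-time profiles.

    v1, v2 : velocity traces at lateral positions separated by dx_mm.
    fs_hz  : temporal sampling rate of the traces (Hz).
    """
    # Cross-correlation peak gives the arrival-time delay between positions.
    corr = np.correlate(v2 - v2.mean(), v1 - v1.mean(), mode="full")
    lag = np.argmax(corr) - (len(v1) - 1)   # delay of v2 relative to v1, samples
    dt = lag / fs_hz
    if dt <= 0:
        raise ValueError("non-positive lag: check wave direction or SNR")
    return (dx_mm * 1e-3) / dt

# Toy check: a pulse arriving 0.5 ms later at a point 2 mm away -> 4 m/s.
t = np.arange(0, 5e-3, 1e-5)
pulse = lambda t0: np.exp(-((t - t0) ** 2) / (2 * (2e-4) ** 2))
print(sws_time_of_flight(pulse(1.0e-3), pulse(1.5e-3), dx_mm=2.0, fs_hz=1e5))
```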
Simplifying Prostate Elastography Using Micro-ultrasound and Transfer Function Imaging
Abstract
Prostate cancer is one of the most commonly diagnosed cancers worldwide, yet working towards more accurate and cost-effective detection strategies for this disease remains an active area of research. This includes the recently introduced micro-ultrasound (microUS), which has shown performance equal to that of the gold standard, multiparametric magnetic resonance imaging, for prostate biopsy guidance. Cancerous lesions are often stiffer than their healthy surroundings, and this tissue stiffness can be imaged using elastography. Clinical strain elastography generally uses manual compression of the tissue via the probe face; however, this method is highly user-dependent and can produce unreliable images, as applying ideal compression is complex and has a steep learning curve. Here, we implement and validate a relative elastography method called transfer function (TF) imaging, which uses automatic tissue compression from a voice coil motor attached to the microUS probe for excitation and calculates the tissue's relative stiffness from its frequency response to this excitation. We demonstrate our method's improved repeatability compared to manual strain elastography using quantitative and qualitative evaluations performed on a commercial quality assurance elasticity phantom. Overall, this method makes elastography much simpler for clinicians, further enabling its use in guiding prostate biopsy procedures.
Reid Vassallo, Tajwar Abrar Aleef, Vedanth Desaigoudar, Qi Zeng, David Black, Brian Wodlinger, Miles Mannas, Peter C. Black, Septimiu E. Salcudean
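For readers unfamiliar with transfer function estimation, the following hedged sketch shows the core signal-processing idea with SciPy: estimate H(f) = S_xy(f)/S_xx(f) between an excitation reference and a tracked tissue motion signal. The excitation frequency, window length, and the use of mean |H| as a relative stiffness proxy are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import csd, welch

fs = 2000.0                                 # displacement-tracking rate (Hz), assumed
t = np.arange(0, 2.0, 1 / fs)
excitation = np.sin(2 * np.pi * 5 * t)      # voice-coil reference motion (assumed)
tissue = 0.4 * excitation + 0.01 * np.random.randn(t.size)  # soft region response

f, s_xx = welch(excitation, fs=fs, nperseg=512)     # auto-spectrum of excitation
_, s_xy = csd(excitation, tissue, fs=fs, nperseg=512)  # cross-spectrum
H = s_xy / s_xx                             # complex transfer function H(f)
band = (f >= 2) & (f <= 20)                 # excitation band (assumed)
relative_compliance = np.abs(H[band]).mean()
print(f"mean |H| in band: {relative_compliance:.3f}")   # lower -> relatively stiffer
```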
Do High-Performance Image-to-Image Translation Networks Enable the Discovery of Radiomic Features? Application to MRI Synthesis from Ultrasound in Prostate Cancer
Abstract
This study investigates the foundational characteristics of image-to-image translation networks, specifically examining their suitability and transferability within the context of routine clinical environments, despite their achieving high levels of performance, as indicated by a Structural Similarity Index (SSIM) exceeding 0.85. The evaluation was conducted using data from 794 patients diagnosed with prostate cancer (PCa). To synthesize MRI from ultrasound (US) images, we employed five widely recognized image-to-image translation networks in medical imaging: 2D-Pix2Pix, 2D-CycleGAN, 3D-CycleGAN, 3D-UNET, and 3D-AutoEncoder. For quantitative assessment, we report four prevalent evaluation metrics: Mean Absolute Error (MAE), Mean Square Error (MSE), Structural Similarity Index (SSIM), and Peak Signal to Noise Ratio (PSNR). Moreover, a complementary analysis employing radiomic features (RFs) via the Spearman correlation coefficient was conducted to investigate, for the first time, whether networks achieving high performance (SSIM > 0.85) could identify low-level RFs. The RF analysis showed that 75 out of 186 RFs were recovered by the 2D-Pix2Pix algorithm alone, while half of the RFs were lost in the translation process. Finally, a detailed qualitative assessment by five medical doctors indicated a lack of low-level feature discovery in image-to-image translation tasks. This study indicates that current image-to-image translation networks, even with high performance (SSIM > 0.85), do not guarantee the discovery of low-level information, which is essential for the integration of synthesized MRI data into routine clinical practice.
Mohammad R. Salmanpour, Amin Mousavi, Yixi Xu, William B. Weeks, Ilker Hacihaliloglu
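The quantitative protocol described above can be reproduced in miniature with off-the-shelf tools; the sketch below computes the four image metrics and a Spearman correlation for a single stand-in feature (mean intensity), whereas the study evaluates 186 radiomic features.

```python
import numpy as np
from scipy.stats import spearmanr
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

rng = np.random.default_rng(0)
real = rng.random((8, 64, 64))                     # real MRI slices (toy data)
synth = np.clip(real + 0.05 * rng.standard_normal(real.shape), 0, 1)

mae = np.abs(real - synth).mean()
mse = ((real - synth) ** 2).mean()
ssim = np.mean([structural_similarity(r, s, data_range=1.0)
                for r, s in zip(real, synth)])
psnr = np.mean([peak_signal_noise_ratio(r, s, data_range=1.0)
                for r, s in zip(real, synth)])

# A feature counts as "discovered" if it correlates across patients (high |rho|).
feat_real = real.mean(axis=(1, 2))
feat_synth = synth.mean(axis=(1, 2))
rho, p = spearmanr(feat_real, feat_synth)
print(f"MAE={mae:.4f} MSE={mse:.4f} SSIM={ssim:.3f} PSNR={psnr:.1f} rho={rho:.2f}")
```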
PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement
Abstract
Ultrasound is widely used in medical diagnostics, offering accessible and powerful imaging, but it suffers from resolution limitations due to diffraction and the finite aperture of the imaging system, which restrict diagnostic use. The impulse response of an ultrasound imaging system is called the point spread function (PSF), which is convolved with the spatial distribution of reflectors in the image formation process. Recovering high-resolution reflector distributions by removing the image distortions induced by this convolution improves image clarity and detail. Conventionally, deconvolution techniques attempt to invert the system-dependent PSF by working directly on the radio-frequency (RF) data. However, RF data is often not readily accessible. Therefore, we introduce a physics-based deconvolution process using a modeled PSF, working directly on the more commonly available B-mode images. By leveraging Implicit Neural Representations (INRs), we learn a continuous mapping from spatial locations to their respective echogenicity values, effectively compensating for the discretized image space. Our contribution is a novel methodology for retrieving a continuous echogenicity map directly from a B-mode image through a differentiable physics-based rendering pipeline for ultrasound resolution enhancement. We qualitatively and quantitatively evaluate our approach on synthetic data, demonstrating improvements over traditional methods in metrics such as PSNR and SSIM. Furthermore, we show qualitative enhancements on an ultrasound phantom and an in-vivo acquisition of a carotid artery.
Felix Duelmer, Walter Simson, Mohammad Farid Azampour, Magdalena Wysocki, Angelos Karlas, Nassir Navab
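A minimal differentiable rendering loop conveys the deconvolution idea: optimize a latent echogenicity map so that its convolution with a modeled PSF reproduces the B-mode image. The paper parameterizes the continuous map with an INR; the sketch below substitutes a pixel grid for brevity, and the Gaussian PSF and regularization weight are assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_psf(size=9, sigma_ax=1.0, sigma_lat=2.5):
    """Separable Gaussian as a stand-in for the modeled system PSF."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    psf = torch.exp(-(yy**2 / (2 * sigma_ax**2) + xx**2 / (2 * sigma_lat**2)))
    return (psf / psf.sum()).view(1, 1, size, size)

psf = gaussian_psf()
bmode = torch.rand(1, 1, 64, 64)                     # observed B-mode (toy)
echo = torch.zeros_like(bmode, requires_grad=True)   # latent echogenicity map
opt = torch.optim.Adam([echo], lr=0.1)

for step in range(200):
    opt.zero_grad()
    # Differentiable forward model: echogenicity convolved with the PSF.
    rendered = F.conv2d(echo, psf, padding=psf.shape[-1] // 2)
    loss = F.mse_loss(rendered, bmode) + 1e-4 * echo.abs().mean()  # sparsity prior
    loss.backward()
    opt.step()
```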

Tracking, Registration and Image-guided Interventions

Frontmatter
PIPsUS: Self-supervised Point Tracking in Ultrasound
Abstract
Finding point-level correspondences is a fundamental problem in ultrasound (US), enabling US landmark tracking for intraoperative image guidance and motion estimation. Most US tracking methods are based on optical flow or feature matching, originally designed for RGB images; therefore, domain shift can impact their performance. Ground-truth correspondences could supervise training, but these are expensive to acquire. To solve these problems, we propose a self-supervised point-tracking model called PIPsUS. Our model can track an arbitrary number of points at the pixel level in one forward pass and exploits temporal information by considering multiple, instead of just consecutive, frames. We developed a new self-supervised training strategy that uses a long-term point-tracking model trained on RGB images as a teacher to guide the model to learn realistic motions, and uses data augmentation to enforce tracking from US appearance. We evaluate our method on neck and oral US and echocardiography, showing higher point-tracking accuracy compared with fast normalized cross-correlation and tuned optical flow. Code is available at https://github.com/aliciachenw/PIPsUS.
Wanwen Chen, Adam Schmidt, Eitan Prisman, Septimiu E. Salcudean
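The teacher-student recipe described above can be summarized in one loss function. In this hedged sketch, `teacher`, `student`, and `augment` are placeholder callables standing in for the frozen RGB-trained long-term tracker, the PIPsUS model, and the US-specific augmentations; the real training loop is more involved.

```python
import torch

def distillation_loss(student, teacher, frames, query_points, augment):
    """frames: (T, C, H, W) ultrasound clip; query_points: (N, 2) pixel coords."""
    with torch.no_grad():
        # Pseudo ground truth: the teacher's tracks on the unmodified clip.
        pseudo_tracks = teacher(frames, query_points)          # (T, N, 2)
    # The student must reproduce those tracks from an augmented copy,
    # forcing it to rely on ultrasound appearance rather than shortcuts.
    student_tracks = student(augment(frames), query_points)    # (T, N, 2)
    return torch.nn.functional.l1_loss(student_tracks, pseudo_tracks)
```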
Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train
Abstract
The complex structure of the heart leads to significant challenges in echocardiography, especially in acquiring cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference: predicting masked structures on a 2D plane and imagining another plane based on pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on a test set of 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits the scanning task.
Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang
An Evaluation of Low-Cost Hardware on 3D Ultrasound Reconstruction Accuracy
Abstract
Advances in consumer-grade hardware, such as optical trackers and portable ultrasound machines, have paved the way for the development of more cost-effective systems. In this paper, we assess the accuracy of low-cost tracking alternatives in the context of 3D freehand ultrasound (US) reconstruction. Specifically, we compared two low-cost tracking options, a depth camera and a low-end optical tracker, to an FDA-approved high-end infrared optical tracking system. Additionally, we compared two US systems: a low-cost handheld US system and a high-resolution ultrasound mobile station. Each tracker and probe pair underwent 20 acquisitions in ideal conditions, and an additional 20 acquisitions were made at 3 suboptimal tracker placements. These two experiments showed no statistically significant difference in reconstruction accuracy between probes, and none between the low- and high-end optical trackers. As a proof of principle, we performed volume-to-volume registration using the US reconstructions and found that using a low-cost probe with low-cost optical tracking performs similarly to the standard high-cost system. These findings suggest that low-cost hardware may offer a solution for the operating room, or for environments where commercial hardware systems are not available, without compromising the accuracy and usability of US image guidance.
Étienne Léger, Niki Najafi, Houssem-Eddine Gueziri, D. Louis Collins, Marta Kersten-Oertel
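Independently of tracker cost, freehand 3D-US reconstruction rests on one transform chain: pixel to image plane (pixel spacing), image to probe (spatial calibration), probe to world (tracker pose). A sketch with placeholder matrices:

```python
import numpy as np

def pixel_to_world(u, v, sx_mm, sy_mm, T_image_to_probe, T_probe_to_world):
    """Map pixel (u, v) with pixel spacing (sx, sy) in mm to world coordinates."""
    p_image = np.array([u * sx_mm, v * sy_mm, 0.0, 1.0])   # homogeneous, mm
    return (T_probe_to_world @ T_image_to_probe @ p_image)[:3]

T_cal = np.eye(4)                        # from spatial calibration (placeholder)
T_pose = np.eye(4)
T_pose[:3, 3] = [10.0, 0.0, 50.0]        # one tracker reading (placeholder)
print(pixel_to_world(128, 256, 0.1, 0.1, T_cal, T_pose))
```

Tracker noise enters through `T_probe_to_world`, which is why reconstruction accuracy is the natural end-to-end measure for comparing trackers.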
Learning to Match 2D Keypoints Across Preoperative MR and Intraoperative Ultrasound
Abstract
We propose in this paper a texture-invariant 2D keypoint descriptor specifically designed for matching preoperative Magnetic Resonance (MR) images with intraoperative Ultrasound (US) images. We introduce a matching-by-synthesis strategy, where intraoperative US images are synthesized from MR images, accounting for multiple MR modalities and intraoperative US variability. We build our training set by enforcing keypoint localization over all images, then train a patient-specific descriptor network that learns texture-invariant discriminant features in a supervised contrastive manner, leading to robust keypoint descriptors. Our experiments on real cases with ground truth show the effectiveness of the proposed approach, outperforming state-of-the-art methods and achieving 80.35% matching precision on average.
Hassan Rasheed, Reuben Dorent, Maximilian Fehrentz, Tina Kapur, William M. Wells III, Alexandra Golby, Sarah Frisken, Julia A. Schnabel, Nazim Haouchine
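The supervised contrastive training mentioned above is commonly realized with an InfoNCE-style objective over matched descriptor pairs; the sketch below is one such formulation with an assumed temperature and descriptor dimension, not necessarily the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(desc_mr, desc_us, temperature=0.07):
    """desc_mr, desc_us: (N, D) descriptors; row i of each is a true MR/US match."""
    mr = F.normalize(desc_mr, dim=1)
    us = F.normalize(desc_us, dim=1)
    logits = mr @ us.t() / temperature       # (N, N) cosine similarity matrix
    targets = torch.arange(mr.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)  # pull matches, push all others

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```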
Automatic Facial Axes Standardization of 3D Fetal Ultrasound Images
Abstract
Craniofacial anomalies indicate early developmental disturbances and are usually linked to many genetic syndromes. Early diagnosis is critical, yet ultrasound (US) examinations often fail to identify these features. This study presents an AI-driven tool to assist clinicians in standardizing fetal facial axes/planes in 3D US, reducing sonographer workload and facilitating facial evaluation. Our network, structured into three blocks (feature extractor, rotation and translation regression, and spatial transformer), processes three orthogonal 2D slices to estimate the transformations necessary for standardizing the facial planes in the 3D US volume. These transformations are applied to the original 3D US using a differentiable module (the spatial transformer block), yielding a standardized 3D US volume and the corresponding 2D facial standard planes. The dataset consists of 1180 fetal facial 3D US images acquired between weeks 20 and 35 of gestation. Results show that our network considerably reduces inter-observer rotation variability on the test set, with a mean geodesic angle difference of 14.12° ± 18.27° and a Euclidean angle error of 7.45° ± 14.88°. These findings demonstrate the network's ability to effectively standardize facial axes, crucial for consistent fetal facial assessments. In conclusion, the proposed network shows potential for improving the consistency and accuracy of fetal facial assessments in clinical settings, facilitating early evaluation of craniofacial anomalies.
Antonia Alomar, Ricardo Rubio, Laura Salort, Gerard Albaiges, Antoni Payà, Gemma Piella, Federico Sukno
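The geodesic angle metric reported above measures the distance between two 3D rotations; for rotation matrices it is theta = arccos((trace(R1^T R2) - 1) / 2). A small sketch:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def geodesic_angle_deg(R_pred, R_ref):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_pred.T @ R_ref) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

R1 = Rotation.from_euler("xyz", [10, 0, 0], degrees=True).as_matrix()
R2 = np.eye(3)
print(geodesic_angle_deg(R1, R2))   # -> 10.0
```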

Segmentation

Frontmatter
C-TRUS: A Novel Dataset and Initial Benchmark for Colon Wall Segmentation in Transabdominal Ultrasound
Abstract
Examining the colon wall in transabdominal ultrasound images is emerging as a promising, non-invasive approach for diagnosing and managing ulcerative colitis, a widespread inflammatory bowel disease affecting millions of people worldwide. However, due to its intricacies, this examination has thus far been confined to experts with specialized training. To the best of our knowledge, we are the first to evaluate automated colon wall segmentation using several advanced deep learning segmentation architectures in combination with established and specialized loss functions. To this end, we publish a new open-source dataset, named C-TRUS, including expert annotations for 827 transabdominal ultrasound images as well as image quality categorizations. Furthermore, we establish inter-observer variability and find that colon wall segmentation is challenging even for medical experts, reaching a moderate average consensus Dice score of 0.6134. The best performing model is the Mask R-CNN architecture, achieving an average Dice score of 0.7249 across all image quality categories and a Dice score of 0.8218 on high quality images. We provide the C-TRUS dataset at https://github.com/wwu-mmll/c-trus.
Ramona Leenings, Maximilian Konowski, Nils R. Winter, Jan Ernsting, Lukas Fisch, Carlotta Barkhau, Udo Dannlowski, Andreas Lügering, Xiaoyi Jiang, Tim Hahn
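For reference, the Dice score used throughout the benchmark, applied to binary masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice overlap between two binary masks: 2|A intersect B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64), bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), bool); b[15:45, 15:45] = True
print(f"{dice(a, b):.4f}")
```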
Label Dropout: Improved Deep Learning Echocardiography Segmentation Using Multiple Datasets with Domain Shift and Partial Labelling
Abstract
Echocardiography (echo) is the first imaging modality used when assessing cardiac function. The measurement of functional biomarkers from echo relies upon the segmentation of cardiac structures, and deep learning models have been proposed to automate the segmentation process. However, in order to translate these tools to widespread clinical use, it is important that the segmentation models are robust to a wide variety of images (e.g., acquired from different scanners or by operators with different levels of expertise). To achieve this level of robustness, it is necessary that the models are trained with multiple diverse datasets. A significant challenge when training with multiple diverse datasets is the variation in label presence, i.e., the combined data are often partially labelled. Adaptations of the cross entropy loss function have been proposed to deal with partially labelled data. In this paper, we show that training naively with such a loss function and multiple diverse datasets can lead to a form of shortcut learning, where the model associates label presence with domain characteristics, leading to a drop in performance. To address this problem, we propose a novel label dropout scheme to break the link between domain characteristics and the presence or absence of labels. We demonstrate that label dropout improves echo segmentation Dice score by 62% and 25% on two cardiac structures when training using multiple diverse partially labelled datasets.
Iman Islam, Esther Puyol-Antón, Bram Ruijsink, Andrew J. Reader, Andrew P. King
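One simple way to realize the label dropout idea is sketched below, under stated assumptions: each label present in a sample is randomly treated as absent, and the cross entropy is masked so that pixels of dropped (or genuinely absent) classes contribute nothing. The dropout rate and the always-kept background class are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def partial_ce(logits, target, present, p_drop=0.3):
    """Cross entropy for partially labelled data with label dropout.

    logits: (N, C, H, W); target: (N, H, W) class indices;
    present: (N, C) bool, True where that label exists for that sample.
    """
    # Randomly "forget" some present labels to break the label/domain link.
    keep = present & (torch.rand_like(present, dtype=torch.float) > p_drop)
    keep[:, 0] = True                               # background always kept (assumed)
    loss = F.cross_entropy(logits, target, reduction="none")     # (N, H, W)
    # Zero the loss at pixels whose ground-truth class was dropped or absent.
    kept = keep.gather(1, target.flatten(1)).view_as(target).float()
    return (loss * kept).sum() / kept.sum().clamp(min=1.0)

logits = torch.randn(2, 4, 8, 8)
target = torch.randint(0, 4, (2, 8, 8))
present = torch.tensor([[True, True, True, False],
                        [True, False, True, True]])
print(partial_ce(logits, target, present))
```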
Introducing Anatomical Constraints in Mitral Annulus Segmentation in Transesophageal Echocardiography
Abstract
The morphology of the mitral annulus plays an important role in diagnosing and treating mitral valve disorders. Automated segmentation promises to be time-saving and to improve consistency in clinical practice. In recent years, segmentation has been dominated by deep learning-based methods, which have shown good results, but their consistency and robustness are still subjects of active research. In this work, we introduce a method that combines Graph Convolutional Networks with a 3D CNN model to integrate an anatomical shape template into the predictions. Our method leverages the feature extraction capability of CNN models to provide input features to the graph neural networks, combining the strengths of a shape-model approach with those of deep learning. Further, we propose loss functions for the CNN designed to guide the graph model training. The CNN was trained with transfer learning, using a limited number of labeled transesophageal echocardiography volumes to adapt to the mitral annulus segmentation task. When comparing the segmentation of the mitral annulus achieved by the proposed method with the test set annotations, the method showed a high degree of accuracy, achieving a curve-to-curve error of 2.00 ± 0.81 mm and a relative perimeter error of 4.42 ± 3.33%. Our results show that the proposed method is a promising new approach for introducing anatomical template structures in medical segmentation tasks.
Børge Solli Andreassen, Sarina Thomas, Anne H. Schistad Solberg, Eigil Samset, David Völgyes
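A common definition of the curve-to-curve error quoted above is the symmetric mean nearest-point distance between the two sampled curves; the sketch below assumes that definition and uses a toy annulus-like curve in millimeters.

```python
import numpy as np
from scipy.spatial import cKDTree

def curve_to_curve_error(pred_pts, gt_pts):
    """Symmetric mean nearest-point distance between two (K, 3) point sets."""
    d_pg = cKDTree(gt_pts).query(pred_pts)[0]    # predicted -> nearest annotated
    d_gp = cKDTree(pred_pts).query(gt_pts)[0]    # annotated -> nearest predicted
    return 0.5 * (d_pg.mean() + d_gp.mean())

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
gt = np.c_[30 * np.cos(theta), 30 * np.sin(theta), 2 * np.sin(2 * theta)]  # saddle
pred = gt + np.random.default_rng(0).normal(0, 1.0, gt.shape)
print(f"{curve_to_curve_error(pred, gt):.2f} mm")
```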
Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound Images
Abstract
Placenta volume measurement from 3D ultrasound images is critical for predicting pregnancy outcomes, and manual annotation is the gold standard. However, such manual annotation is expensive and time consuming. Automated segmentation algorithms can often successfully segment the placenta, but these methods may not consistently produce robust segmentations suitable for practical use. Recently, inspired by the Segment Anything Model (SAM), deep learning-based interactive segmentation models have been widely applied in the medical imaging domain. These models produce a segmentation from visual prompts provided to indicate the target region, which may offer a feasible solution for practical use. However, none of these models are specifically designed for interactively segmenting 3D ultrasound images, which remains challenging due to the inherent noise of this modality. In this paper, we evaluate publicly available state-of-the-art 3D interactive segmentation models against a human-in-the-loop approach for the placenta segmentation task. The Dice score, normalized surface Dice, averaged symmetric surface distance, and 95-percent Hausdorff distance are used as evaluation metrics, and we consider a Dice score of 0.95 a successful segmentation. Our results indicate that the human-in-the-loop segmentation model reaches this standard. Moreover, we assess the efficiency of the human-in-the-loop model as a function of the number of prompts, demonstrating that it is both effective and efficient for interactive placenta segmentation. The code is available at https://github.com/MedICL-VU/PRISM-placenta.
Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz
Enhanced Uncertainty Estimation in Ultrasound Image Segmentation with MSU-Net
Abstract
Efficient intravascular access in trauma and critical care significantly impacts patient outcomes. However, the availability of skilled medical personnel in austere environments is often limited. Despite advances in autonomous needle insertion, inaccuracies in vessel segmentation predictions pose risks. Understanding the uncertainty of predictive models in ultrasound imaging is crucial for assessing their reliability. We introduce MSU-Net, a novel multistage approach for training an ensemble of U-Nets to yield ultrasound image segmentation maps. We demonstrate a substantial improvement of 27.7% over a single Monte Carlo U-Net, enhancing uncertainty evaluations, model transparency, and trustworthiness. By identifying areas where the model is highly confident, MSU-Net helps to better interpret anatomical details and improve the understanding of vessel locations.
Rohini Banerjee, Cecilia G. Morales, Artur Dubrawski
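The uncertainty bookkeeping that an ensemble of U-Nets enables can be sketched as follows: mean softmax for the prediction, predictive entropy for total uncertainty, and mutual information for inter-member disagreement (the epistemic part). This is the standard decomposition, not necessarily MSU-Net's exact estimator, and the shapes are illustrative.

```python
import torch

def ensemble_uncertainty(member_probs):
    """member_probs: (M, N, C, H, W) softmax outputs of M ensemble members."""
    mean_p = member_probs.mean(dim=0)                           # (N, C, H, W)
    eps = 1e-8
    total = -(mean_p * (mean_p + eps).log()).sum(dim=1)         # predictive entropy
    expected = -(member_probs * (member_probs + eps).log()) \
                   .sum(dim=2).mean(dim=0)                      # expected entropy
    mutual_info = total - expected                              # member disagreement
    return mean_p, total, mutual_info

probs = torch.softmax(torch.randn(5, 2, 3, 64, 64), dim=2)     # 5 members (toy)
pred, h_total, mi = ensemble_uncertainty(probs)
```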

Classification and Detection

Frontmatter
Multi-site Class-Incremental Learning with Weighted Experts in Echocardiography
Abstract
Building an echocardiography view classifier that maintains performance in real-life cases requires diverse multi-site data and frequent updates with newly available data to mitigate model drift. Simply fine-tuning on new datasets results in “catastrophic forgetting” and cannot adapt to variations of view labels between sites. Alternatively, collecting all data on a single server and re-training may not be feasible, as data sharing agreements may restrict image transfer or datasets may only become available at different times. Furthermore, the time and cost associated with re-training grow with every new dataset. We propose a class-incremental learning method which learns an expert network for each dataset and combines all expert networks with a score fusion model. The influence of “unqualified experts” is minimised by weighting each contribution with a learnt in-distribution score. These weights promote transparency, as the contribution of each expert is known during inference. Instead of using the original images, we use learned features from each dataset, which are easier to share and raise fewer licensing and privacy concerns. We validate our work on six datasets from multiple sites, demonstrating significant reductions in training time while improving view classification performance.
Kit M. Bransby, Woo-Jin Cho Kim, Jorge Oliveira, Alex Thorley, Arian Beqiri, Alberto Gomez, Agisilaos Chartsias
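The weighted score fusion reads as a softmax gate over experts; in the sketch below the in-distribution scores are given directly, whereas the paper learns them from data.

```python
import numpy as np

def fuse(expert_probs, id_scores):
    """expert_probs: (E, C) per-expert class posteriors; id_scores: (E,) raw scores."""
    w = np.exp(id_scores - id_scores.max())   # softmax over experts (stable)
    w /= w.sum()
    return w @ expert_probs                   # (C,) fused posterior

# Expert 0 looks most in-distribution, so its vote dominates the fusion.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(fuse(probs, id_scores=np.array([2.0, -1.0, 0.0])))
```

The weights `w` are exactly the per-expert contributions exposed at inference, which is what gives the method its transparency.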
Masked Autoencoders for Medical Ultrasound Videos Using ROI-Aware Masking
Abstract
In routine clinical practice, a vast amount of data is generated, including myriad ultrasound recordings. However, their annotation and interpretation are labor-intensive; thus, a method that can incorporate this unlabeled data into deep learning pipelines would be highly beneficial. Video masked autoencoders (VideoMAE) are state-of-the-art pre-training techniques and have performed exceptionally well in various computer vision tasks. Accordingly, we hypothesized that a VideoMAE pre-trained on a large unlabeled dataset of ultrasound recordings could also perform well in a downstream task following supervised training on a smaller but labeled dataset. Nevertheless, we found that the conventional masking strategy of the VideoMAE pipeline may perform sub-optimally in the specific domain of ultrasound videos. Motivated by this, we proposed a novel region of interest (ROI)-aware masking method that considers the specific characteristics of this domain. We demonstrated that applying our method instead of the conventional masking strategy significantly improves the VideoMAE's performance in clinically relevant downstream tasks, even when we reduced the labeled training dataset to one-tenth of its original sample size. The source code for this paper is available at https://github.com/szadam96/ROI-aware-masking.
Ádám Szijártó, Bálint Magyar, Thomas Á. Szeier, Máté Tolvaj, Alexandra Fábián, Bálint K. Lakatos, Zsuzsanna Ladányi, Zsolt Bagyura, Béla Merkely, Attila Kovács, Márton Tokodi
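A hedged sketch of the ROI-aware idea: bias the sampling probability of masked patches toward patches overlapping the ultrasound region of interest, so reconstruction effort concentrates on anatomy rather than background. The bias strength and mask ratio below are assumptions, not the paper's settings.

```python
import numpy as np

def roi_aware_mask(roi_fraction, mask_ratio=0.9, roi_bias=4.0, rng=None):
    """roi_fraction: (P,) fraction of each patch covered by the ROI.

    Returns indices of the patches to mask, sampled without replacement
    with probability increasing in ROI coverage.
    """
    if rng is None:
        rng = np.random.default_rng()
    weights = 1.0 + roi_bias * roi_fraction        # ROI patches masked more often
    p = weights / weights.sum()
    n_mask = int(mask_ratio * roi_fraction.size)
    return rng.choice(roi_fraction.size, size=n_mask, replace=False, p=p)

masked = roi_aware_mask(np.clip(np.random.rand(196), 0, 1))   # 14x14 patch grid
```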
Uncertainty-Based Multi-modal Learning for Myocardial Infarction Diagnosis Using Echocardiography and Electrocardiograms
Abstract
Medical devices used in cardiac diagnostics typically capture only one aspect of heart function. For instance, 2D B-mode echocardiography reveals the heart’s anatomy and mechanical changes, while an electrocardiogram (ECG) records the heart’s electrical activity from various positions. These examinations, essential for diagnosing cardiac diseases, are usually performed sequentially rather than simultaneously, providing complementary information for the final diagnosis. Recently, the integration of multi-modal information in AI research for healthcare has gained popularity, aiming for more robust diagnostic outcomes. However, the scarcity of publicly available multi-modal data for cardiac disease diagnosis poses a significant challenge to multi-modal learning and evaluation. In this study, we propose an uncertainty-based deep learning framework that utilizes unpaired data from different modalities to improve the diagnosis of myocardial infarction (MI) using both echocardiography and ECG data. Specifically, we trained two unimodal classification models incorporating uncertainty using public single-modal datasets. We then performed multi-modal classification using uncertainty-based decision fusion on a paired dataset, without the need for transfer learning or retraining. Our experiments demonstrated that uncertainty-based multi-modal decision fusion outperforms conventional fusion strategies by 4% in accuracy and unimodal models by 7% in accuracy. This approach is both flexible and data-efficient, making uncertainty-based multi-modal fusion a sustainable and strong solution for both unpaired and paired multi-modal classification.
Yingyu Yang, Marie Rocher, Pamela Moceri, Maxime Sermesant
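The fusion rule in miniature: weight each modality's posterior by its confidence, here the inverse predictive entropy, and renormalize. The paper's uncertainty estimator may differ; this sketch only illustrates the decision-fusion step that needs no retraining.

```python
import numpy as np

def entropy(p, eps=1e-8):
    return -(p * np.log(p + eps)).sum()

def fuse_two(p_echo, p_ecg):
    """Confidence-weighted fusion of two unimodal class posteriors."""
    w = np.array([1.0 / (entropy(p_echo) + 1e-8),
                  1.0 / (entropy(p_ecg) + 1e-8)])
    w /= w.sum()                          # an uncertain modality defers to a sure one
    fused = w[0] * p_echo + w[1] * p_ecg
    return fused / fused.sum()

print(fuse_two(np.array([0.95, 0.05]),   # confident echo prediction
               np.array([0.45, 0.55])))  # near-uninformative ECG prediction
```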
Fetal Ultrasound Video Representation Learning Using Contrastive Rubik’s Cube Recovery
Abstract
Contrastive learning (CL), which relies on the contrast between positive and negative pairs, has become the leading paradigm in self-supervised learning. In this paper, we propose a self-supervised learning framework, the feature-level Contrastive Rubik's Cube Recovery (CRCR). CRCR creates contrastive sub-cube pairs from ultrasound video, which capture local spatio-temporal ultrasound features, unlike traditional CL methods, which are spatial and work at the global frame level. This approach learns a representation with both intra- and inter-feature contrast to provide strong local feature discrimination. The proposed method is validated on two fetal ultrasound video tasks. Extensive experiments demonstrate that our approach is effective for learning representations that transfer to both in-domain (second-trimester) and cross-domain (first-trimester) clinical downstream classification tasks. In particular, CRCR outperforms four state-of-the-art contrastive learning-based methods on the in-domain task by 3.8%, 2.0%, 1.9% and 1.1%, with each improvement being statistically significant.
Kangning Zhang, Jianbo Jiao, J. Alison Noble
LoRIS - Weakly-Supervised Anomaly Detection for Ultrasound Images
Abstract
This paper presents LoRIS (Localized Reconstruction-by-Inpainting with Single-mask), a novel weakly-supervised anomaly detection technique designed to identify knee joint recess distension in musculoskeletal ultrasound images, which are noisy and unbalanced (as distended cases are rarer). In this context, supervised techniques require a large number of annotated images of both classes (distended and non-distended). On the other hand, we show that existing unsupervised anomaly detection techniques, which can be trained with images from a single class, are ineffective and often unable to correctly localize the anomaly. To overcome these issues, LoRIS is trained with non-distended images only and uses the recess bounding box as a location prior to guide the reconstruction. Experimental results show that LoRIS outperforms state-of-the-art unsupervised anomaly detection techniques. Compared to a state-of-the-art fully supervised solution, LoRIS presents similar performance but has two key advantages: during training it requires images from a single class only, and it also outputs the recess segmentation without the need for segmentation annotations.
Marco Colussi, Dragan Ahmetovic, Gabriele Civitarese, Claudio Bettini, Aiman Solyman, Roberta Gualtierotti, Flora Peyvandi, Sergio Mascetti
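The scoring logic can be sketched in a few lines: mask the recess bounding box, inpaint it with a model trained only on non-distended images, and use the in-box reconstruction error as the anomaly score. The `inpaint` callable is a placeholder for the trained reconstruction network.

```python
import numpy as np

def anomaly_score(image, box, inpaint):
    """box = (y0, y1, x0, x1) recess bounding box; inpaint(masked, mask) -> image."""
    y0, y1, x0, x1 = box
    mask = np.zeros(image.shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    # A model trained on non-distended recesses reconstructs what it knows;
    # a distended recess therefore reconstructs poorly.
    recon = inpaint(np.where(mask, 0.0, image), mask)
    return np.abs(image[mask] - recon[mask]).mean()   # high score -> distended
```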
Unsupervised Detection of Fetal Brain Anomalies Using Denoising Diffusion Models
Abstract
Congenital malformations of the brain are among the most common fetal abnormalities that impact fetal development. Previous anomaly detection methods on ultrasound images are based on supervised learning, rely on manual annotations, and risk missing underrepresented categories. In this work, we frame fetal brain anomaly detection as an unsupervised task using diffusion models. To this end, we employ an inpainting-based Noise Agnostic Anomaly Detection approach that identifies the abnormality using diffusion-reconstructed fetal brain images from multiple noise levels. Our approach only requires normal fetal brain ultrasound images for training, addressing the limited availability of abnormal data. Our experiments on a real-world clinical dataset show the potential of using unsupervised methods for fetal brain anomaly detection. Additionally, we comprehensively evaluate how different noise types affect diffusion models in the fetal anomaly detection domain.
Markus Ditlev Sjøgren Olsen, Jakob Ambsdorf, Manxi Lin, Caroline Taksøe-Vester, Morten Bo Søndergaard Svendsen, Anders Nymark Christensen, Mads Nielsen, Martin Grønnebæk Tolsgaard, Aasa Feragen, Paraskevas Pegios
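The noise-agnostic aggregation can be sketched compactly: reconstruct the image from several diffusion noise levels and average the per-pixel errors, so the anomaly map does not hinge on a single noise choice. `reconstruct` is a placeholder for the diffusion model's noising-plus-denoising of the input, and the noise levels are illustrative.

```python
import numpy as np

def noise_agnostic_score(image, reconstruct, noise_levels=(0.2, 0.5, 0.8)):
    """Aggregate reconstruction error over noise levels into one anomaly map."""
    errors = [np.abs(image - reconstruct(image, t)) for t in noise_levels]
    return np.mean(errors, axis=0)        # per-pixel anomaly map
```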
Diffusion Models for Unsupervised Anomaly Detection in Fetal Brain Ultrasound
Abstract
Ultrasonography is an essential tool in mid-pregnancy for assessing fetal development, appreciated for its non-invasive and real-time imaging capabilities. Yet, the interpretation of ultrasound images is often complicated by acoustic shadows, speckle, and other artifacts that obscure crucial diagnostic details. To address these challenges, our study presents a novel unsupervised anomaly detection framework specifically designed for fetal ultrasound imaging. This framework incorporates gestational age filtering, precise identification of fetal standard planes, and targeted segmentation of brain regions to enhance diagnostic accuracy. Furthermore, we introduce the use of denoising diffusion probabilistic models in this context, marking a significant innovation in detecting previously unrecognized anomalies. We rigorously evaluated the framework using various diffusion-based anomaly detection methods, noise types, and noise levels. Notably, AutoDDPM emerged as the most effective, achieving an area under the precision-recall curve of 79.8% in detecting anomalies. This advancement holds promise for improving the tools available for nuanced and effective prenatal diagnostics.
Hanna Mykula, Lisa Gasser, Silvia Lobmaier, Julia A. Schnabel, Veronika Zimmer, Cosmin I. Bercea
Correction to: Unsupervised Physics-Inspired Shear Wave Speed Estimation in Ultrasound Elastography
Ali Kafaei Zad Tehrani, E. G. Sunethra Dayavansha, Yuyang Gu, Ion Candel, Michael Wang, Rimon Tadross, Yiming Xiao, Hassan Rivaz, Kai Thomenius, Anthony Samir
Backmatter
Metadata
Title
Simplifying Medical Ultrasound
Editors
Alberto Gomez
Bishesh Khanal
Andrew King
Ana Namburete
Copyright Year
2025
Electronic ISBN
978-3-031-73647-6
Print ISBN
978-3-031-73646-9
DOI
https://doi.org/10.1007/978-3-031-73647-6
