2024 | Book

Applications of Medical Artificial Intelligence

Second International Workshop, AMAI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings


About this book

This book constitutes the refereed proceedings of the Second International Workshop on Applications of Medical Artificial Intelligence, AMAI 2023, held in conjunction with MICCAI 2023, in Vancouver, Canada, in October 2023.
The book includes 17 papers which were carefully reviewed and selected from 26 full-length submissions.
The AMAI 2023 workshop created a forum to bring together researchers, clinicians, domain experts, AI practitioners, industry representatives, and students to investigate and discuss various challenges and opportunities related to applications of medical AI.

Table of Contents

Clinical Trial Histology Image Based End-to-End Biomarker Expression Levels Prediction and Visualization Using Constrained GANs
The gold standard for diagnosing cancer is through pathological examination. This typically involves the utilization of staining techniques such as hematoxylin-eosin (H&E) and immunohistochemistry (IHC) as relying solely on H&E can sometimes result in inaccurate cancer diagnoses. IHC examination offers additional evidence to support the diagnostic process. Given challenging accessibility issues of IHC examination, generating virtual IHC images from H&E-stained images presents a viable solution. This study proposes Active Medical Segmentation and Rendering (AMSR), an end-to-end framework for biomarker expression levels prediction and virtual staining, leveraging constrained Generative Adversarial Networks (GAN). The proposed framework mimics the staining processes, surpassing prior works and offering a feasible substitute for traditional histopathology methods. Preliminary results are presented using a clinical trial dataset pertaining to the CEACAM5 biomarker.
Wei Zhao, Bozhao Qi, Yichen Li, Roger Trullo, Elham Attieh, Anne-Laure Bauchet, Qi Tang, Etienne Pochet
More Than Meets the Eye: Physicians’ Visual Attention in the Operating Room
During surgery, the patient’s vital signs and the field of endoscopic view are displayed on multiple screens. As a result, both surgeons’ and anesthesiologists’ visual attention (VA) is crucial. Moreover, the distribution of said VA and the acquisition of specific cues might directly impact patient outcomes.
Recent research utilizes portable, head-mounted eye-tracking devices to gather precise and comprehensive information. Nevertheless, these technologies are not feasible for enduring data acquisition in an operating room (OR) environment. This is particularly the case during medical emergencies.
This study presents an alternative methodology: a webcam-based gaze target prediction model. Such an approach may provide continuous visual behavioral data with minimal interference to the physicians’ workflow in the OR. The proposed end-to-end framework is suitable for both standard and emergency surgeries.
In the future, such a platform may serve as a crucial component of context-aware assistive technologies in the OR.
Sapir Gershov, Fadi Mahameed, Aeyal Raz, Shlomi Laufer
CNNs vs. Transformers: Performance and Robustness in Endoscopic Image Analysis
In endoscopy, imaging conditions are often challenging due to organ movement, user dependence, fluctuations in video quality and real-time processing, which pose requirements on the performance, robustness and complexity of computer-based analysis techniques. This paper poses the question whether Transformer-based architectures, which are capable of directly capturing global contextual information, can handle the aforementioned endoscopic conditions and even outperform the established Convolutional Neural Networks (CNNs) for this task. To this end, we evaluate and compare clinically relevant performance and robustness of CNNs and Transformers for neoplasia detection in Barrett’s esophagus. We have selected several top-performing CNNs and Transformers on endoscopic benchmarks, which we have trained and validated on a total of 10,208 images (2,079 patients), and tested on a total of 4,661 images (743 patients), divided over a high-quality test set and three different robustness test sets. Our results show that Transformers generally perform better on classification and segmentation for the high-quality challenging test set, and show on-par or increased robustness to various clinically relevant input data variations, while requiring comparable model complexity. This robustness against challenging video-related conditions and equipment variations across hospitals is an essential trait for adoption in clinical practice. The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.
Carolus H. J. Kusters, Tim G. W. Boers, Tim J. M. Jaspers, Jelmer B. Jukema, Martijn R. Jong, Kiki N. Fockens, Albert J. de Groof, Jacques J. Bergman, Fons van der Sommen, Peter H. N. de With
Investigating the Impact of Image Quality on Endoscopic AI Model Performance
Virtually all endoscopic AI models are developed with clean, high-quality imagery from expert centers; however, clinical data quality is much more heterogeneous. Endoscopic image quality can be degraded by, e.g., poor lighting, motion blur, and image compression. This disparity between training and validation data on the one hand and real-world clinical practice on the other can have a substantial impact on the performance of deep neural networks (DNNs), potentially resulting in clinically unreliable models. To address this issue and develop more reliable models for automated cancer detection, this study focuses on identifying the limitations of current DNNs. Specifically, we evaluate the performance of these models under clinically relevant and realistic image corruptions, as well as on a manually selected dataset that includes images with lower subjective quality. Our findings highlight the importance of understanding the impact of a decrease in image quality and the need to include robustness evaluation for DNNs used in endoscopy.
Tim J. M. Jaspers, Tim G. W. Boers, Carolus H. J. Kusters, Martijn R. Jong, Jelmer B. Jukema, Albert J. de Groof, Jacques J. Bergman, Peter H. N. de With, Fons van der Sommen
Ensembling Voxel-Based and Box-Based Model Predictions for Robust Lesion Detection
This paper presents a novel generic method to improve lesion detection by ensembling semantic segmentation and object detection models. The proposed approach makes it possible to benefit from both voxel-based and box-based predictions, thus improving the ability to accurately detect lesions. The method consists of three main steps: (i) semantic segmentation and object detection models are trained separately; (ii) voxel-based and box-based predictions are matched spatially; (iii) corresponding lesion presence probabilities are combined into summary detection maps. We illustrate and validate the robustness of the proposed approach on three different oncology applications: liver and pancreas neoplasm detection in single-phase CT, and significant prostate cancer detection in multi-modal MRI. Performance is evaluated on publicly available databases, and compared to two state-of-the-art baseline methods. The proposed ensembling approach improves the average precision metric in all considered applications, with an 8% gain for prostate cancer.
Noëlie Debs, Alexandre Routier, Clément Abi-Nader, Arnaud Marcoux, Alexandre Bône, Marc-Michel Rohé
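As a rough illustration of steps (ii) and (iii) in the ensembling abstract above, the following sketch matches lesion candidates from a segmentation model with boxes from a detection model and merges their probabilities. The abstract does not state the matching criterion or the combination rule, so everything here is an assumption made for illustration: the `Candidate` type, the IoU threshold, averaging matched probabilities, and halving unmatched ones. It also returns a list of candidates rather than the summary detection maps the paper describes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned 3D box (z0, y0, x0, z1, y1, x1), upper bounds exclusive.
Box = Tuple[int, int, int, int, int, int]


@dataclass
class Candidate:
    box: Box      # bounding box of a predicted lesion
    prob: float   # lesion presence probability in [0, 1]


def volume(b: Box) -> int:
    """Volume of a box; zero if the box is empty or inverted."""
    return max(0, b[3] - b[0]) * max(0, b[4] - b[1]) * max(0, b[5] - b[2])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two 3D boxes."""
    inter: Box = (max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
                  min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]))
    iv = volume(inter)
    union = volume(a) + volume(b) - iv
    return iv / union if union > 0 else 0.0


def ensemble(seg: List[Candidate], det: List[Candidate],
             iou_thr: float = 0.1) -> List[Candidate]:
    """Greedy spatial matching followed by probability fusion.

    Hypothetical fusion rule: matched pairs get the mean of both
    probabilities; unmatched candidates from either model are kept
    with their probability halved.
    """
    fused: List[Candidate] = []
    used = set()
    for s in seg:
        best_j, best_iou = -1, iou_thr
        for j, d in enumerate(det):
            if j in used:
                continue
            overlap = iou(s.box, d.box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            fused.append(Candidate(s.box, 0.5 * (s.prob + det[best_j].prob)))
        else:
            fused.append(Candidate(s.box, 0.5 * s.prob))
    # Detection boxes that no segmentation candidate claimed.
    fused.extend(Candidate(d.box, 0.5 * d.prob)
                 for j, d in enumerate(det) if j not in used)
    return fused
```

In practice the segmentation candidates would be derived from connected components of the voxel-wise prediction; that step is omitted here to keep the sketch free of dependencies.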
Advancing Abdominal Organ and PDAC Segmentation Accuracy with Task-Specific Interactive Models
Deep learning-based segmentation algorithms have the potential to expedite the cumbersome clinical task of creating detailed target delineations for disease diagnosis and prognosis. However, these algorithms have yet to be widely adopted in clinical practice, partly because the resulting model segmentations often fall short of the necessary accuracy and robustness that clinical practice demands. This research aims to make AI work in the real world, where domain shift is anticipated and inter-observer variability is inherent to medical practice. While current research aims to design models that can address these challenges, we propose an alternative approach that involves minimal user (clinician) interaction in the segmentation process. By combining the pattern recognition abilities of neural networks with the domain knowledge of clinicians, segmentation predictions can deliver the desired clinical result with little effort on the part of clinicians. To test this approach, we implemented, fine-tuned and compared three state-of-the-art (SOTA) interactive AI (IAI) methods for segmenting six different abdominal organs and pancreatic ductal adenocarcinoma (PDAC), an extremely challenging structure to segment, in CT images. We demonstrate that the fine-tuned RITM (Reviving Iterative Training with Mask Guidance for Interactive Segmentation) method can achieve higher segmentation accuracy than non-interactive SOTA models with as few as three clicks, potentially reducing the time required for treatment planning. Overall, IAI may be an effective method for bridging the gap between what deep learning-based segmentation algorithms have to offer and the high standard that is required for patient care.
Sanne E. Okel, Christiaan G. A. Viviers, Mark Ramaekers, Terese A. E. Hellström, Nick Tasios, Dimitrios Mavroeidis, Jon Pluyter, Igor Jacobs, Misha Luyer, Peter H. N. de With, Fons van der Sommen
Anatomical Location-Guided Deep Learning-Based Genetic Cluster Identification of Pheochromocytomas and Paragangliomas from CT Images
Pheochromocytomas and paragangliomas (PPGLs) are respectively intra-adrenal and extra-adrenal neuroendocrine tumors whose pathogenesis and progression are greatly regulated by genetics. Identifying a PPGL’s genetic cluster (SDHx, VHL/EPAS1, kinase signaling, or sporadic) is essential, as PPGL management depends critically on genotype. However, genetic testing for PPGLs is expensive and time-consuming. Contrast-enhanced CT (CE-CT) scans of PPGL patients are usually acquired at the beginning of patient management for PPGL staging and determining the next therapeutic steps. Given a CE-CT sub-image of the PPGL, this work demonstrates a two-branch vision transformer (PPGL-Transformer) to identify each tumor’s genetic cluster. The standard of reference for each tumor included two items: its genetic cluster from clinical testing, and its anatomical location. One branch of our PPGL-Transformer identifies the PPGL’s anatomic location while the other characterizes its genetic type. A supervised contrastive learning strategy was used to train the PPGL-Transformer by optimizing contrastive and classification losses for PPGLs’ genetic group and anatomic location. Our method was evaluated on a dataset comprising 1010 PPGLs extracted from the CE-CT images of 289 patients. PPGL-Transformer achieved an accuracy of \(0.63 \pm 0.08\), balanced accuracy (BA) of \(0.63 \pm 0.06\) and F1-score of \(0.46 \pm 0.08\) on five-fold cross-validation and outperformed competing methods by 2–29% on accuracy, 3–18% on BA and 3–14% on F1-score. The performance for the sporadic cluster was higher on BA (\(0.68 \pm 0.13\)) while the performance for the SDHx cluster was higher on recall (\(0.83 \pm 0.06\)) and F1-score (\(0.74 \pm 0.07\)).
Bikash Santra, Abhishek Jha, Pritam Mukherjee, Mayank Patel, Karel Pacak, Ronald M. Summers
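The PPGL abstract above reports both accuracy and balanced accuracy (BA). As a reminder of how the two differ on imbalanced classes, here is a minimal sketch that takes BA to be the mean of per-class recalls, which is the common definition; the abstract does not say how BA or the F1-score were averaged, so this is an assumption. The label names and the toy data in the usage note are made up for illustration and are not the paper's data.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable],
                      y_pred: Sequence[Hashable]) -> float:
    """Mean recall over the classes that occur in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, with six hypothetical "SDHx" cases, two "VHL/EPAS1", one "kinase" and one "sporadic", a classifier that always answers "SDHx" is right on 6 of 10 cases, but its per-class recalls are 1, 0, 0 and 0, so BA weights the three missed minority classes equally with the majority class.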
Video-Based Gait Analysis for Assessing Alzheimer’s Disease and Dementia with Lewy Bodies
Dementia with Lewy Bodies (DLB) and Alzheimer’s Disease (AD) are two common neurodegenerative diseases among elderly people. Gait analysis plays a significant role in clinical assessments to discriminate these neurological disorders from healthy controls, to grade disease severity, and to further differentiate dementia subtypes. In this paper, we propose a deep learning-based model specifically designed to evaluate the gait impairment score for assessing dementia severity using monocular gait videos. Named MAX-GR, our model estimates the sequence of 3D body skeletons, applies corrections based on spatio-temporal gait features extracted from the input video, and performs classification on the corrected 3D pose sequence to determine the MDS-UPDRS gait scores. Experimental results show that our technique outperforms alternative state-of-the-art methods. The code, demo videos, and 3D skeleton dataset are available at https://github.com/lisqzqng/Video-based-gait-analysis-for-dementia.
Diwei Wang, Chaima Zouaoui, Jinhyeok Jang, Hassen Drira, Hyewon Seo
Enhancing Clinical Support for Breast Cancer with Deep Learning Models Using Synthetic Correlated Diffusion Imaging
Breast cancer is the second most common type of cancer in women in Canada and the United States, representing over 25% of all new female cancer cases. As such, there has been immense research and progress on improving screening and clinical support for breast cancer. In this paper, we investigate enhancing clinical support for breast cancer with deep learning models using a newly introduced magnetic resonance imaging (MRI) modality called synthetic correlated diffusion imaging (CDIs). More specifically, we leverage a volumetric convolutional neural network to learn volumetric deep radiomic features from a pre-treatment cohort and construct a predictor based on the learnt features for grade and post-treatment response prediction. As the first study to learn CDIs-centric radiomic sequences within a deep learning perspective for clinical decision support, we evaluated the proposed approach using the ACRIN-6698 study against those learnt using gold-standard imaging modalities. We find that the proposed approach can achieve better performance for both grade and post-treatment response prediction and thus may be a useful tool to aid oncologists in improving treatment recommendations for patients. Subsequently, the approach to leverage volumetric deep radiomic features for breast cancer can be further extended to other applications of CDIs in the cancer domain to further improve clinical support.
Chi-en Amy Tai, Hayden Gunraj, Nedim Hodzic, Nic Flanagan, Ali Sabri, Alexander Wong
Image-Based 3D Reconstruction of Cleft Lip and Palate Using a Learned Shape Prior
We present a novel pipeline that takes smartphone videos of the intraoral region of newborn cleft patients as input and produces a 3D mesh. The mesh can be used to facilitate the plate treatment of the cleft and support surgery planning. A retrained LoFTR-based method creates an initial sparse point cloud. Next, we utilize our collection of existing scans of previous patients to train an implicit shape model. The shape model allows for refined denoising of the initial sparse point cloud and therefore enhances the camera pose estimation. Finally, we complete the model with a dense reconstruction based on multi-view stereo. With Moving Least Squares and Poisson reconstruction, we convert the point cloud into a mesh. This method requires only low-cost acquisition hardware and minimal user training.
Lasse Lingens, Baran Gözcü, Till Schnabel, Yoriko Lill, Benito K. Benitez, Prasad Nalabothu, Andreas A. Mueller, Markus Gross, Barbara Solenthaler
Breaking down the Hierarchy: A New Approach to Leukemia Classification
The complexities inherent to leukemia, a multifaceted cancer affecting white blood cells, pose considerable diagnostic and treatment challenges, primarily due to reliance on laborious morphological analyses and expert judgment that are susceptible to errors. Addressing these challenges, this study presents a refined, comprehensive strategy leveraging advanced deep-learning techniques for the classification of leukemia subtypes. We commence by developing a hierarchical label taxonomy, paving the way for differentiating between various subtypes of leukemia. The research further introduces a novel hierarchical approach inspired by clinical procedures capable of accurately classifying diverse types of leukemia alongside reactive and healthy cells. An integral part of this study involves a meticulous examination of the performance of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as classifiers. The proposed method exhibits an impressive success rate, achieving approximately 90% accuracy across all leukemia subtypes, as substantiated by our experimental results. A visual representation of the experimental findings is provided to enhance the model’s explainability and aid in understanding the classification process.
Ibraheem Hamdi, Hosam El-Gendy, Ahmed Sharshar, Mohamed Saeed, Muhammad Ridzuan, Shahrukh K. Hashmi, Naveed Syed, Imran Mirza, Shakir Hussain, Amira Mahmoud Abdalla, Mohammad Yaqub
Single-Cell Spatial Analysis of Histopathology Images for Survival Prediction via Graph Attention Network
The tumor microenvironment is a complex ecosystem consisting of various immune and stromal cells in addition to neoplastic cells. The spatial interaction and organization of these cells play a critical role in tumor progression. Single-cell analysis of histopathology images offers an intrinsic advantage over traditional patch-based approaches by providing fine-grained cellular information. However, existing studies do not perform explicit cell classification, and therefore still suffer from limited interpretability and lack biological relevance, which may negatively affect the performance for clinical outcome prediction. To address these challenges, we propose a cell-level contextual learning approach to explicitly capture the major cell types and their spatial interaction in the tumor microenvironment. To do this, we first segmented and classified each cell into tumor cells, lymphocytes, fibroblasts, macrophages, neutrophils, and other nonmalignant cells on histopathology images. Given this single-cell map, we constructed a graph and trained a graph attention network to learn the cell-level contextual features for survival prediction. Extensive experiments demonstrate that our model consistently outperforms existing patch-based and cell graph-based approaches in two independent datasets. Further, we used the feature attribution method to discover distinct spatial patterns that are associated with prognosis, leading to biologically meaningful and interpretable results.
Zhe Li, Yuming Jiang, Leon Liu, Yong Xia, Ruijiang Li
Ultrafast Labeling for Multiplexed Immunobiomarkers from Label-free Fluorescent Images
Labeling pathological images based on different immunobiomarkers holds immense clinical significance, serving as an instrumental tool in various fields such as disease diagnostics and biomedical research. However, the existing predominant techniques harnessed for immunobiomarker labeling, such as immunofluorescence (IF) and immunohistochemistry (IHC), are marred by shortcomings such as inconsistent specificity, cost/time-intensive staining procedures, and potential cellular damage incurred during labeling. In response to these impediments, deep-learning-powered generative models have emerged as a promising avenue for immunolabeling prediction, owing to their adeptness in image-to-image translation. To realize automatic immunolabeling prediction, we devised an auto-immunolabeling (Auto-iL) network capable of simultaneously labeling various immunobiomarkers by generating the corresponding immunofluorescence-stained images from dual-modal label-free inputs. To enhance the feature extraction potential of the Auto-iL network, we utilize random masked autoencoders on the dual-modal inputs. Subsequently, a self-attention block merges the dual features, which yields a robust predictive capacity. In the experiments, immunolabeling performance of four biomarkers for gastric cancer patients was validated. Moreover, pathologists carried out clinical observation assessments on the immunolabeled results to ensure the reliability at the cellular level.
Zixia Zhou, Yuming Jiang, Ruijiang Li, Lei Xing
M U-Net: Intestine Segmentation Using Multi-dimensional Features for Ileus Diagnosis Assistance
The intestine is an essential digestive organ that can cause serious health problems once diseased. This paper proposes a method for intestine segmentation, called multi-dimensional U-Net (M U-Net), to assist in the diagnosis of intestinal obstruction. We employ two encoders to extract features from two-dimensional (2D) CT slices and three-dimensional (3D) CT patches. These two encoders collaborate to enhance the segmentation accuracy of the model. Additionally, we incorporate deep supervision into the M U-Net to reduce the limitations of training with sparsely labeled data sets. The experimental results demonstrated that the Dice of the proposed method was 73.22%, the recall was 79.89%, and the precision was 70.61%.
Qin An, Hirohisa Oda, Yuichiro Hayashi, Takayuki Kitasaka, Akinari Hinoki, Hiroo Uchida, Kojiro Suzuki, Aitaro Takimoto, Masahiro Oda, Kensaku Mori
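The M U-Net abstract above reports Dice, recall and precision. For readers less familiar with these overlap metrics, the sketch below computes them for one pair of binary masks using the standard definitions. It assumes masks flattened to sequences of 0/1 values, and returning 1.0 when a denominator is zero is a convention chosen here, not something stated in the abstract. How the paper aggregates scores across cases is also not stated.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice, recall and precision for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)   # false negatives

    def ratio(num: int, den: int) -> float:
        # Convention (assumption): an empty denominator counts as perfect.
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
    }
```

For a single mask pair, Dice equals the harmonic mean of precision and recall, so a prediction that over-segments raises recall at the cost of precision while Dice sits between the two.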
Enhancing Cardiac MRI Segmentation via Classifier-Guided Two-Stage Network and All-Slice Information Fusion Transformer
Cardiac Magnetic Resonance imaging (CMR) is the gold standard for assessing cardiac function. Segmenting the left ventricle (LV), right ventricle (RV), and LV myocardium (MYO) in CMR images is crucial but time-consuming. Deep learning-based segmentation methods have emerged as effective tools for automating this process. However, CMR images present additional challenges due to irregular and varying heart shapes, particularly in basal and apical slices. In this study, we propose a classifier-guided two-stage network with an all-slice fusion transformer to enhance CMR segmentation accuracy, particularly in basal and apical slices. Our method was evaluated on extensive clinical datasets and demonstrated better performance in terms of Dice score compared to previous CNN-based and transformer-based models. Moreover, our method produces visually appealing segmentation shapes resembling human annotations and avoids common issues like holes or fragments in other models’ segmentation.
Zihao Chen, Xiao Chen, Yikang Liu, Eric Z. Chen, Terrence Chen, Shanhui Sun
Accessible Otitis Media Screening with a Deep Learning-Powered Mobile Otoscope
Otitis media (OM) is the leading cause of hearing loss in children globally, affecting nearly a billion people per year. Impoverished areas typically lack trained ear specialists, which prevents millions from being diagnosed and treated while causing severe complications. Currently, there is no viable diagnostic system for inexpensively and accurately detecting such ear conditions. This research presents OtoScan, a novel pipeline for the detection of middle ear infections using diagnosis networks and a cost-effective mobile otoscope. The physical attachment was developed using custom-designed 3D models, a compact magnification lens, fiber optics, and various electronics for illumination. To develop detection algorithms, public otoscopic images were collected and augmented with realistic perturbations. A dynamic ensemble of Inception-based architectures trained using transfer learning and label smoothing was developed to mitigate class imbalance and overconfidence while improving diagnostic accuracy for acute and chronic suppurative OM. Regions of interest are highlighted as gradient saliency maps in a smartphone application using Grad-CAM++. Evaluation shows that the proposed algorithm surpasses architectures such as CBAM in accuracy and F1 score. Further testing using an industry-standard medical simulator validated the potential viability of this system. With a production cost of $9.50 USD, OtoScan represents a step towards the democratization of ear care and improvement of patient outcomes.
Omkar Kovvali, Lakshmi Sritan Motati
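The OtoScan abstract above mentions label smoothing as a way to reduce overconfidence. The sketch below shows the standard formulation, in which the one-hot target is mixed with a uniform distribution before computing cross-entropy. The smoothing factor `eps = 0.1` and the three-class example in the usage note are assumptions for illustration; the abstract gives neither the value used nor the exact number of classes.

```python
import math
from typing import List, Sequence


def smooth_labels(target: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """One-hot target mixed with a uniform distribution: (1 - eps) * onehot + eps / K."""
    uniform = eps / num_classes
    return [(1.0 - eps) * (1.0 if k == target else 0.0) + uniform
            for k in range(num_classes)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def smoothed_cross_entropy(logits: Sequence[float], target: int,
                           eps: float = 0.1) -> float:
    """Cross-entropy between the smoothed target and the softmax of the logits."""
    q = smooth_labels(target, len(logits), eps)
    log_p = log_softmax(logits)
    return -sum(qk * lk for qk, lk in zip(q, log_p))
```

With three hypothetical classes and logits `[10.0, 0.0, 0.0]` for a sample of class 0, the unsmoothed loss (`eps=0.0`) is close to zero, whereas the smoothed loss stays clearly positive, which penalizes pushing the logits to extreme values.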
Feature Selection for Malapposition Detection in Intravascular Ultrasound - A Comparative Study
Coronary atherosclerosis is a leading cause of morbidity and mortality worldwide. It is often treated by placing stents in the coronary arteries. Inappropriately placed stents or malappositions can result in post-interventional complications. Intravascular Ultrasound (IVUS) imaging offers a potential solution by providing real-time endovascular guidance for stent placement. The signature of malapposition is very subtle and requires exploring second-order relationships between blood flow patterns, vessel walls, and stents. In this paper, we perform a comparative study of various deep learning methods and their feature extraction capabilities for building a malapposition detector. Our results in the study address the importance of incorporating domain knowledge in performance improvement while still indicating the limitations of current systems for achieving clinically ready performance.
Satyananda Kashyap, Neerav Karani, Alexander Shang, Niharika D’Souza, Neel Dey, Lay Jain, Ray Wang, Hatice Akakin, Qian Li, Wenguang Li, Corydon Carlson, Polina Golland, Tanveer Syeda-Mahmood
Applications of Medical Artificial Intelligence
Shandong Wu
Behrouz Shabestari
Lei Xing