2024 | Book

Machine Learning in Medical Imaging

14th International Workshop, MLMI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings, Part II


About this book

The two-volume set LNCS 14348 and 14349 constitutes the proceedings of the 14th International Workshop on Machine Learning in Medical Imaging, MLMI 2023, held in conjunction with MICCAI 2023, in Vancouver, Canada, in October 2023.
The 93 full papers presented in the proceedings were carefully reviewed and selected from 139 submissions. They focus on major trends and challenges in artificial intelligence and machine learning in the medical imaging field, translating medical imaging research into clinical practice. Topics of interest included deep learning, generative adversarial learning, ensemble learning, transfer learning, multi-task learning, manifold learning, and reinforcement learning, along with their applications to medical image analysis, computer-aided diagnosis, multi-modality fusion, image reconstruction, image retrieval, cellular image analysis, molecular imaging, digital pathology, etc.

Table of Contents

GEMTrans: A General, Echocardiography-Based, Multi-level Transformer Framework for Cardiovascular Diagnosis

Echocardiography (echo) is an ultrasound imaging modality that is widely used for various cardiovascular diagnosis tasks. Due to inter-observer variability in echo-based diagnosis, which arises from the variability in echo image acquisition and the interpretation of echo images based on clinical experience, vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification. For such safety-critical applications, it is essential for any proposed ML method to present a level of explainability along with good accuracy. In addition, such methods must be able to process several echo videos obtained from various heart views and the interactions among them to properly produce predictions for a variety of cardiovascular measurements or interpretation tasks. Prior work lacks explainability or is limited in scope by focusing on a single cardiovascular task. To remedy this, we propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability, while simultaneously enabling multi-video training where the inter-play among echo image patches in the same frame, all frames in the same video, and inter-video relationships are captured based on a downstream task. We show the flexibility of our framework by considering two critical tasks including ejection fraction (EF) and aortic stenosis (AS) severity detection. Our model achieves mean absolute errors of 4.15 and 4.84 for single and dual-video EF estimation and an accuracy of 96.5% for AS detection, while providing informative task-specific attention maps and prototypical explainability.

Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi, Renjie Liao
Unsupervised Anomaly Detection in Medical Images with a Memory-Augmented Multi-level Cross-Attentional Masked Autoencoder

Unsupervised anomaly detection (UAD) aims to find anomalous images by optimising a detector using a training set that contains only normal images. UAD approaches can be based on reconstruction methods, self-supervised approaches, and ImageNet pre-trained models. Reconstruction methods, which detect anomalies from image reconstruction errors, are advantageous because they do not rely on the design of problem-specific pretext tasks needed by self-supervised approaches, or on the unreliable translation of models pre-trained from non-medical datasets. However, reconstruction methods may fail because they can have low reconstruction errors even for anomalous images. In this paper, we introduce a new reconstruction-based UAD approach that addresses this low-reconstruction error issue for anomalous images. Our UAD approach, the memory-augmented multi-level cross-attentional masked autoencoder (MemMC-MAE), is a transformer-based approach, consisting of a novel memory-augmented self-attention operator for the encoder and a new multi-level cross-attention operator for the decoder. MemMC-MAE masks large parts of the input image during its reconstruction, reducing the risk that it will produce low reconstruction errors because anomalies are likely to be masked and cannot be reconstructed. However, when the anomaly is not masked, then the normal patterns stored in the encoder’s memory combined with the decoder’s multi-level cross-attention will constrain the accurate reconstruction of the anomaly. We show that our method achieves SOTA anomaly detection and localisation on colonoscopy, pneumonia, and COVID-19 chest X-ray datasets.

Yu Tian, Guansong Pang, Yuyuan Liu, Chong Wang, Yuanhong Chen, Fengbei Liu, Rajvinder Singh, Johan W. Verjans, Mengyu Wang, Gustavo Carneiro
LMT: Longitudinal Mixing Training, a Framework to Predict Disease Progression from a Single Image

Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression toward earlier and better patient-specific pathology management. However, conventional approaches rarely take advantage of longitudinal information for detection and prediction purposes, especially for Diabetic Retinopathy (DR). In the past years, Mix-up training and pretext tasks with longitudinal context have effectively enhanced DR classification results and captured disease progression. In the meantime, a novel type of neural network named Neural Ordinary Differential Equation (NODE) has been proposed for solving ordinary differential equations, with a neural network treated as a black box. By definition, NODE is well suited for solving time-related problems. In this paper, we propose to combine these three aspects to detect and predict DR progression. Our framework, Longitudinal Mixing Training (LMT), can be considered both as a regularizer and as a pretext task that encodes the disease progression in the latent space. Additionally, we evaluate the trained model weights on a downstream task with a longitudinal context using standard and longitudinal pretext tasks. We introduce a new way to train time-aware models using $t_{mix}$, a weighted average time between two consecutive examinations. We compare our approach to standard mixing training on DR classification using OPHDIAT, a longitudinal retinal Color Fundus Photographs (CFP) dataset. We were able to predict whether an eye would develop a severe DR in the following visit using a single image, with an AUC of 0.798 compared to baseline results of 0.641. Our results indicate that our longitudinal pretext task can learn the progression of DR disease and that introducing $t_{mix}$ augmentation is beneficial for time-aware models.
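
The abstract describes $t_{mix}$ only as a weighted average time between two consecutive examinations. A minimal sketch of that idea, assuming a standard Mix-up weight drawn from a Beta distribution and the same weight applied to the images and to the examination times, could look as follows. The function name `longitudinal_mix`, the `alpha` parameter, and the flat-list image representation are illustrative assumptions, not details taken from the paper.

```python
import random


def longitudinal_mix(x_a, t_a, x_b, t_b, alpha=1.0, rng=random):
    """Mix two examinations of the same eye together with their times.

    x_a, x_b: images flattened to lists of floats (assumption for brevity).
    t_a, t_b: examination times, e.g. in months.
    """
    # Mix-up weight in [0, 1]; Beta(alpha, alpha) is the usual Mix-up choice.
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the two images.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # The same weight applied to the two examination times gives t_mix.
    t_mix = lam * t_a + (1.0 - lam) * t_b
    return x_mix, t_mix, lam


if __name__ == "__main__":
    random.seed(0)
    x_mix, t_mix, lam = longitudinal_mix([0.0, 1.0], 0.0, [1.0, 0.0], 12.0)
    print(lam, t_mix, x_mix)
```

With this construction the mixed time always lies between the two examination times, so a time-aware model sees interpolated images paired with a matching interpolated time.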

Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Hugo Le Boité, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Ikram Brahim, Gwenolé Quellec, Mathieu Lamard
Identifying Alzheimer’s Disease-Induced Topology Alterations in Structural Networks Using Convolutional Neural Networks

Identifying topology alterations in white matter connectivity has emerged as a promising avenue for exploring potential markers of Alzheimer’s disease (AD). However, conventional graph learning methods struggle to accurately represent the subtle and heterogeneous topology alterations caused by AD, leading to marginal classification accuracy. In this study, we address this issue through a two-fold approach. Firstly, to more reliably capture AD-induced alterations, we collect multi-shell high-angular resolution diffusion MRI data and construct a topology tensor to incorporate multiple edge-based attributes. Secondly, we propose a novel CNN framework called REST-Net, utilizing lightweight convolutional kernels to integrate the multiple attributes, enhancing its capacity for topology representation. With extensive experiments, REST-Net outperforms seven state-of-the-art graph learning methods for binary and tertiary classification tasks. Of utmost importance, the white matter connections identified by REST-Net guide the selection of target bundles for further analysis, which can potentially provide valuable insights for clinical and pharmacological investigations.

Feihong Liu, Yongsheng Pan, Junwei Yang, Fang Xie, Xiaowei He, Han Zhang, Feng Shi, Jun Feng, Qihao Guo, Dinggang Shen
Specificity-Aware Federated Graph Learning for Brain Disorder Analysis with Functional MRI

Resting-state functional magnetic resonance imaging (rs-fMRI) provides a non-invasive solution to explore abnormal brain connectivity patterns caused by brain disorders. Graph neural networks (GNNs) have been widely used for fMRI representation learning and brain disorder analysis, thanks to their potent graph representation abilities. Training a generalizable GNN model often requires large-scale subjects from different medical centers/sites, but the traditional centralized utilization of multi-site data unavoidably encounters challenges related to data privacy and storage. Federated learning (FL) can coordinate multiple sites to train a shared model without centrally integrating multi-site fMRI data. However, previous FL-based methods for fMRI analysis usually ignore the specificity of each site, including factors such as age, gender, and population. To this end, we propose a specificity-aware federated graph learning (SFGL) framework for fMRI-based brain disorder diagnosis. The proposed SFGL consists of a shared branch and a personalized branch, where the parameters of the shared branch are sent to a server and the parameters of the personalized branch remain in each local site. In the shared branch, we employ a graph isomorphism network and a Transformer to learn dynamic representations from fMRI data. In the personalized branch, vectorized representations of demographic information (i.e., gender, age, and education) and functional connectivity network are integrated to capture the specificity of each site. We aggregate representations learned by shared branches and personalized branches for classification. Experimental results on two fMRI datasets with a total of 1,218 subjects demonstrate that SFGL outperforms several state-of-the-art methods.

Junhao Zhang, Xiaochuan Wang, Qianqian Wang, Lishan Qiao, Mingxia Liu
3D Transformer Based on Deformable Patch Location for Differential Diagnosis Between Alzheimer’s Disease and Frontotemporal Dementia

Alzheimer’s disease and Frontotemporal dementia are common types of neurodegenerative disorders that present overlapping clinical symptoms, making their differential diagnosis very challenging. Numerous efforts have been made for the diagnosis of each disease but the problem of multi-class differential diagnosis has not been actively explored. In recent years, transformer-based models have demonstrated remarkable success in various computer vision tasks. However, their use in disease diagnosis is uncommon due to the limited amount of 3D medical data given the large size of such models. In this paper, we present a novel 3D transformer-based architecture using a deformable patch location module to improve the differential diagnosis of Alzheimer’s disease and Frontotemporal dementia. Moreover, to overcome the problem of data scarcity, we propose an efficient combination of various data augmentation techniques, adapted for training transformer-based models on 3D structural magnetic resonance imaging data. Finally, we propose to combine our transformer-based model with a traditional machine learning model using brain structure volumes to better exploit the available data. Our experiments demonstrate the effectiveness of the proposed approach, showing competitive results compared to state-of-the-art methods. Moreover, the deformable patch locations can be visualized, revealing the most relevant brain regions used to establish the diagnosis of each disease.

Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
Consisaug: A Consistency-Based Augmentation for Polyp Detection in Endoscopy Image Analysis

Colorectal cancer (CRC), which frequently originates from initially benign polyps, remains a significant contributor to global cancer-related mortality. Early and accurate detection of these polyps via colonoscopy is crucial for CRC prevention. However, traditional colonoscopy methods depend heavily on the operator’s experience, leading to suboptimal polyp detection rates. Besides, the public databases are limited in polyp size and shape diversity. To enhance the available data for polyp detection, we introduce Consisaug, an innovative and effective methodology to augment data that leverages deep learning. We utilize the constraint that when the image is flipped the class label should be equal and the bounding boxes should be consistent. We implement our Consisaug on five public polyp datasets and with three backbones, and the results show the effectiveness of our method. All the code is available at ( ).
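
The flip constraint in this abstract can be illustrated with a short sketch. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` pixel coordinates and uses `1 - IoU` as the consistency penalty. Both the box format and the penalty are assumptions made for illustration; the abstract does not state the loss the authors use, and the function names are hypothetical.

```python
def hflip_box(box, image_width):
    """Map a box to its position in the horizontally flipped image."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def flip_consistency_penalty(box_on_original, box_on_flipped, image_width):
    """Penalty that is 0 when both predictions describe the same region.

    The prediction made on the flipped image is mapped back to the
    original frame before it is compared with the original prediction.
    """
    mapped_back = hflip_box(box_on_flipped, image_width)
    return 1.0 - iou(box_on_original, mapped_back)


if __name__ == "__main__":
    original = (10, 20, 50, 80)
    flipped = hflip_box(original, image_width=100)
    print(flipped)
    print(flip_consistency_penalty(original, flipped, image_width=100))
```

A detector whose prediction on the flipped image is the exact mirror of its prediction on the original receives zero penalty; any disagreement between the two views increases it.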

Ziyu Zhou, Wenyuan Shen, Chang Liu
Cross-view Contrastive Mutual Learning Across Masked Autoencoders for Mammography Diagnosis

Mammography is a widely used screening tool for breast cancer, and accurate diagnosis is critical for the effective management of breast cancer. In this study, we propose a novel cross-view mutual learning method that leverages a Cross-view Masked Autoencoder (CMAE) and a Dual-View Affinity Matrix (DAM) to extract cross-view features and facilitate malignancy classification in mammography. CMAE aims to extract the underlying features from multi-view mammography data without relying on lesion labeling information or multi-view registration. DAM helps overcome the limitations of single-view models and identifies unique patterns and features in each view, thereby improving the accuracy and robustness of breast tissue representations. We evaluate our approach on a large-scale in-house mammography dataset and demonstrate promising results compared to existing methods. Additionally, we perform an ablation analysis to investigate the influence of different loss functions on the performance of our method. The results show that all the proposed components contribute positively to the final performance. In summary, the proposed cross-view mutual learning method shows great potential for assisting malignant classification.

Qingxia Wu, Hongna Tan, Zhi Qiao, Pei Dong, Dinggang Shen, Meiyun Wang, Zhong Xue
Modeling Life-Span Brain Age from Large-Scale Dataset Based on Multi-level Information Fusion

Predicted brain age could be used to measure individual brain status over development and degeneration, which could also indicate the potential risk of age-related brain disorders. Although various techniques for the estimation of brain age have been developed, most approaches only cover a small age range, either young or elderly period, leading to limited applications. In this work, we propose a novel approach to build a brain age prediction model on a lifespan dataset with T1-weighted magnetic resonance imaging (MRI) scans. First, we utilize different neural networks to extract features from 1) an original 3D MRI scan associated with the brain maturing and aging process, 2) three (axial, coronal, and sagittal) 2D slices selected based on prior knowledge to provide possible white matter hypointensity information, and 3) volume ratios of different brain regions related to maturing and aging. Then, these extracted features of multiple levels are fused by the transformer-based cross-attention mechanism to predict the brain age. Our experiments are conducted on a total of 5376 subjects aged from 6 to 96 years from 8 cohorts. In particular, our model is built on 3372 healthy subjects and applied to 2004 subjects with brain disorders. Experimental results show that our method achieves a mean absolute error (MAE) of 2.72 years between estimated brain age and chronological age. Furthermore, when applying our model to age-related brain disorders, it turns out that both cerebral small vessel disease (SVD) and Alzheimer’s disease (AD) groups demonstrate accelerated brain aging.

Nan Zhao, Yongsheng Pan, Kaicong Sun, Yuning Gu, Mianxin Liu, Zhong Xue, Han Zhang, Qing Yang, Fei Gao, Feng Shi, Dinggang Shen
Boundary-Constrained Graph Network for Tooth Segmentation on 3D Dental Surfaces

Accurate tooth segmentation on 3D dental models is an important task in computer-aided dentistry. In recent years, several deep learning-based methods have been proposed for automatic tooth segmentation. However, previous tooth segmentation methods often face challenges in accurately delineating boundaries, leading to a decline in overall segmentation performance. In this paper, we propose a boundary-constrained graph-based neural network that establishes the connectivity of mesh cells based on feature distances and utilizes several modules to encode local regions. To enhance segmentation performance in tooth-gingiva boundary regions, we integrate an auxiliary loss to segment the tooth and gingiva. Furthermore, to improve the performance in tooth-tooth boundary regions, we introduce a contrastive boundary-constrained loss that specifically enhances the distinctiveness of features within boundary mesh cells. Following the network prediction, we apply a post-processing step based on the graph cut to refine the boundaries. Experimental results demonstrate that our method achieves state-of-the-art performance in 3D tooth segmentation.

Yuwen Tan, Xiang Xiang
FAST-Net: A Coarse-to-fine Pyramid Network for Face-Skull Transformation

Face-skull transformation, i.e., shape transformation between facial surface and skull structure, has a wide range of applications in various fields such as forensic facial reconstruction and craniomaxillofacial (CMF) surgery planning. However, this transformation is a challenging task due to the significant differences between the geometric topologies of the face and skull shapes. In this paper, we propose a novel coarse-to-fine face-skull transformation network (i.e., FAST-Net) that has a pyramid architecture to gradually improve the transformation level by level. Specifically, using face-to-skull transformation for instance, in the first pyramid level, we use a point displacement sub-network to predict a coarse skull shape of point cloud from a given facial shape of point cloud with a skull template of point cloud as prior information. In the following pyramid levels, we further refine the predicted skull shape by first dividing the skull shape together with the given facial shape into different sub-regions, individually feeding the regions to a new sub-network, and merging the outputs as a refined skull shape. Finally, we generate a surface mesh model for the final predicted skull point cloud by non-rigid registration with a skull template. Experimental results show that our method achieves the state-of-the-art performance on the task of face-skull transformation.

Lei Zhao, Lei Ma, Zhiming Cui, Jie Zheng, Zhong Xue, Feng Shi, Dinggang Shen
Mixing Histopathology Prototypes into Robust Slide-Level Representations for Cancer Subtyping

Whole-slide image analysis via the means of computational pathology often relies on processing tessellated gigapixel images with only slide-level labels available. Applying multiple instance learning-based methods or transformer models is computationally expensive as, for each image, all instances have to be processed simultaneously. The MLP-Mixer is an under-explored alternative model to common vision transformers, especially for large-scale datasets. Due to the lack of a self-attention mechanism, it has linear computational complexity in the number of input patches but achieves comparable performance on natural image datasets. We propose a combination of feature embedding and clustering to preprocess the full whole-slide image into a reduced prototype representation which can then serve as input to a suitable MLP-Mixer architecture. Our experiments on two public benchmarks and one in-house malignant lymphoma dataset show comparable performance to current state-of-the-art methods, while achieving lower training costs in terms of computational time and memory load. Code is publicly available at .

Joshua Butke, Noriaki Hashimoto, Ichiro Takeuchi, Hiroaki Miyoshi, Koichi Ohshima, Jun Sakuma
Consistency Loss for Improved Colonoscopy Landmark Detection with Vision Transformers

Colonoscopy is a procedure used to examine the colon and rectum for colorectal cancer or other abnormalities including polyps or diverticula. Apart from the actual diagnosis, manually processing the snapshots taken during the colonoscopy procedure (for medical record keeping) consumes a large amount of the clinician’s time. This can be automated through post-procedural machine learning based algorithms which classify anatomical landmarks in the colon. In this work, we have developed a pipeline for training vision-transformers for identifying anatomical landmarks, including appendiceal orifice, ileocecal valve/cecum landmark and rectum retroflection. To increase the accuracy of the model, we utilize a hybrid approach that combines algorithm-level and data-level techniques. We introduce a consistency loss to enhance model immunity to label inconsistencies, as well as a semantic non-landmark sampling technique aimed at increasing focus on colonic findings. For training and testing our pipeline, we have annotated 307 colonoscopy videos and 2363 snapshots with the assistance of several medical experts for enhanced reliability. The algorithm identifies landmarks with an accuracy of 92% on the test dataset.

Aniruddha Tamhane, Daniel Dobkin, Ore Shtalrid, Moshe Bouhnik, Erez Posner, Tse’ela Mida
Radiomics Boosts Deep Learning Model for IPMN Classification

Intraductal Papillary Mucinous Neoplasm (IPMN) cysts are pre-malignant pancreas lesions, and they can progress into pancreatic cancer. Therefore, detecting and stratifying their risk level is of ultimate importance for effective treatment planning and disease control. However, this is a highly challenging task because of the diverse and irregular shape, texture, and size of the IPMN cysts as well as the pancreas. In this study, we propose a novel computer-aided diagnosis pipeline for IPMN risk classification from multi-contrast MRI scans. Our proposed analysis framework includes an efficient volumetric self-adapting segmentation strategy for pancreas delineation, followed by a newly designed deep learning-based classification scheme with a radiomics-based predictive approach. We test our proposed decision-fusion model in multi-center data sets of 246 multi-contrast MRI scans and obtain superior performance to the state of the art (SOTA) in this field. Our ablation studies demonstrate the significance of both radiomics and deep learning modules for achieving the new SOTA performance compared to international guidelines and published studies (81.9% vs 61.3% in accuracy). Our findings have important implications for clinical decision-making. In a series of rigorous experiments on multi-center data sets (246 MRI scans from five centers), we achieved unprecedented performance (81.9% accuracy). The code is available upon publication.

Lanhong Yao, Zheyuan Zhang, Ugur Demir, Elif Keles, Camila Vendrami, Emil Agarunov, Candice Bolan, Ivo Schoots, Marc Bruno, Rajesh Keswani, Frank Miller, Tamas Gonda, Cemal Yazici, Temel Tirkes, Michael Wallace, Concetto Spampinato, Ulas Bagci
Class-Balanced Deep Learning with Adaptive Vector Scaling Loss for Dementia Stage Detection

Alzheimer’s disease (AD) leads to irreversible cognitive decline, with Mild Cognitive Impairment (MCI) as its prodromal stage. Early detection of AD and related dementia is crucial for timely treatment and slowing disease progression. However, classifying cognitive normal (CN), MCI, and AD subjects using machine learning models faces class imbalance, necessitating the use of balanced accuracy as a suitable metric. To enhance model performance and balanced accuracy, we introduce a novel method called VS-Opt-Net. This approach incorporates the recently developed vector-scaling (VS) loss into a machine learning pipeline named STREAMLINE. Moreover, it employs Bayesian optimization for hyperparameter learning of both the model and loss function. VS-Opt-Net not only amplifies the contribution of minority examples in proportion to the imbalance level but also addresses the challenge of generalization in training deep networks. In our empirical study, we use MRI-based brain regional measurements as features to conduct the CN vs MCI and AD vs MCI binary classifications. We compare the balanced accuracy of our model with other machine learning models and deep neural network loss functions that also employ class-balanced strategies. Our findings demonstrate that after hyperparameter optimization, the deep neural network using the VS loss function substantially improves balanced accuracy. It also surpasses other models in performance on the AD dataset. Moreover, our feature importance analysis highlights VS-Opt-Net’s ability to elucidate biomarker differences across dementia stages.
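
Balanced accuracy, the metric this abstract relies on, is the mean of the per-class recalls. The sketch below shows why it is preferred over plain accuracy under class imbalance; the toy labels are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions."""
    hits = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return hits / len(y_true)


if __name__ == "__main__":
    # Imbalanced toy set: a classifier that always predicts the majority class.
    y_true = ["CN"] * 8 + ["MCI"] * 2
    y_pred = ["CN"] * 10
    print(accuracy(y_true, y_pred))           # 0.8
    print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor looks good under plain accuracy but scores no better than chance under balanced accuracy, which is the behaviour a class-balanced loss is meant to correct.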

Boning Tong, Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J. Saykin, Jason Moore, Marylyn Ritchie, Li Shen
Enhancing Anomaly Detection in Melanoma Diagnosis Through Self-Supervised Training and Lesion Comparison

Melanoma, a highly aggressive form of skin cancer notorious for its rapid metastasis, necessitates early detection to mitigate complex treatment requirements. While considerable research has addressed melanoma diagnosis using convolutional neural networks (CNNs) on individual dermatological images, a deeper exploration of lesion comparison within a patient is warranted for enhanced anomaly detection, which often signifies malignancy. In this study, we present a novel approach founded on an automated, self-supervised framework for comparing skin lesions, working entirely without access to ground truth labels. Our methodology involves encoding lesion images into feature vectors using a state-of-the-art representation learner, and subsequently leveraging an anomaly detection algorithm to identify atypical lesions. Remarkably, our model achieves robust anomaly detection performance on ISIC 2020 without needing annotations, highlighting the efficacy of the representation learner in discerning salient image features. These findings pave the way for future research endeavors aimed at developing better predictive models as well as interpretable tools that enhance dermatologists’ efficacy in scrutinizing skin lesions.

Jules Collenne, Rabah Iguernaissi, Séverine Dubuisson, Djamal Merad
DynBrainGNN: Towards Spatio-Temporal Interpretable Graph Neural Network Based on Dynamic Brain Connectome for Psychiatric Diagnosis

Mounting evidence has highlighted the involvement of altered functional connectivity (FC) within resting-state functional networks in psychiatric disorder. Considering the fact that the FCs of the brain can be viewed as a network, graph neural networks (GNNs) have recently been applied to develop useful diagnostic tools and analyze the brain connectome, providing new insights into the functional mechanisms of the psychiatric disorders. Despite promising results, existing GNN-based diagnostic models are usually unable to incorporate the dynamic properties of the FC network, which fluctuates over time. Furthermore, it is difficult to produce temporal interpretability and obtain temporally attended brain markers elucidating the underlying neural mechanisms and diagnostic decisions. These issues hinder their possible clinical applications for the diagnosis and intervention of psychiatric disorder. In this study, we propose DynBrainGNN, a novel GNN architecture to analyze the dynamic brain connectome, by leveraging dynamic variational autoencoders (DVAE) and spatio-temporal attention. DynBrainGNN is capable of obtaining disease-specific dynamic brain network patterns and quantifying the temporal properties of the brain. We conduct experiments on three distinct real-world psychiatric datasets, and our results indicate that DynBrainGNN achieves exceptional performance. Moreover, DynBrainGNN effectively identifies clinically meaningful brain markers that align with current neuro-scientific knowledge.

Kaizhong Zheng, Bin Ma, Badong Chen
Precise Localization Within the GI Tract by Combining Classification of CNNs and Time-Series Analysis of HMMs

This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by exploring the combination of a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that successive time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of 98.04% on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters and thus provides a method suitable for low power devices.
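
One common way to combine per-frame classifier outputs with an HMM is Viterbi decoding, sketched below. It assumes the CNN's per-frame class probabilities are used directly as emission scores and that the capsule can only stay in a section or move to the next one. The three-section layout, the transition values, and the frame probabilities are invented for illustration; the abstract does not specify the authors' exact decoding procedure.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(frame_probs, transition, initial):
    """Most likely state sequence given per-frame class probabilities."""
    n_states = len(initial)
    score = [safe_log(initial[s]) + safe_log(frame_probs[0][s])
             for s in range(n_states)]
    backpointers = []
    for probs in frame_probs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            best = max(range(n_states),
                       key=lambda r: prev[r] + safe_log(transition[r][s]))
            pointers.append(best)
            score.append(prev[best] + safe_log(transition[best][s])
                         + safe_log(probs[s]))
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical sections: 0 = stomach, 1 = small bowel, 2 = colon.
    transition = [[0.9, 0.1, 0.0],
                  [0.0, 0.9, 0.1],
                  [0.0, 0.0, 1.0]]
    initial = [1.0, 0.0, 0.0]
    frame_probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.1, 0.7],
                   [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.1, 0.7, 0.2]]
    per_frame = [max(range(3), key=lambda s: p[s]) for p in frame_probs]
    print(per_frame)                                  # [0, 0, 2, 0, 1, 1]
    print(viterbi(frame_probs, transition, initial))  # [0, 0, 0, 0, 1, 1]
```

In this toy sequence the third frame is misclassified as colon by the per-frame argmax. Because the transition model forbids jumping back from the colon to the stomach, the decoded path overrides that single-frame error.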

Julia Werner, Christoph Gerum, Moritz Reiber, Jörg Nick, Oliver Bringmann
Towards Unified Modality Understanding for Alzheimer’s Disease Diagnosis Using Incomplete Multi-modality Data

Multi-modal neuroimaging data, e.g., magnetic resonance imaging (MRI) and positron emission tomography (PET), has greatly advanced computer-aided diagnosis of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, incomplete multi-modality data often limits the diagnostic performance of deep learning-based methods, as only partial data can be used for training neural networks, and meanwhile it is challenging to synthesize missing scans (e.g., PET) with meaningful patterns associated with AD. To this end, we propose a novel unified modality understanding network to directly extract discriminative features from incomplete multi-modal data for AD diagnosis. Specifically, the incomplete multi-modal neuroimages are first branched into the corresponding encoders to extract modality-specific features and a Transformer is then applied to adaptively fuse the incomplete multi-modal features for AD diagnosis. To alleviate the potential problem of domain shift due to incomplete multi-modal input, the cross-modality contrastive learning strategy is further leveraged to align the incomplete multi-modal features into a unified embedding space. On the other hand, the proposed network also employs inter-modality and intra-modality attention weights for achieving local-to-local and local-to-global attention consistency so as to better transfer the diagnostic knowledge from one modality to another. Meanwhile, we leverage multi-instance attention rectification to rectify the localization of AD-related atrophic area. Extensive experiments on ADNI datasets with 1,950 subjects demonstrate the superior performance of the proposed methods for AD diagnosis and MCI conversion prediction.

Kangfu Han, Fenqiang Zhao, Dajiang Zhu, Tianming Liu, Feng Yang, Gang Li
COVID-19 Diagnosis Based on Swin Transformer Model with Demographic Information Fusion and Enhanced Multi-head Attention Mechanism

Coronavirus disease 2019 (COVID-19) is an acute disease, which can rapidly become severe. Hence, it is of great significance to realize the automatic diagnosis of COVID-19. However, existing models are often inapplicable for fusing patients’ demographic information due to its low dimensionality. To address this, we propose a COVID-19 patient diagnosis method with feature fusion and a model based on Swin Transformer. Specifically, two auxiliary tasks are added for fusing computed tomography (CT) images and patients’ demographic information, which utilize the patients’ demographic information as the label for the auxiliary tasks. Besides, our approach involves designing a Swin Transformer model with Enhanced Multi-head Self-Attention (EMSA) to capture different features from CT data. Meanwhile, the EMSA module is able to extract and fuse attention information in different representation subspaces, further enhancing the performance of the model. Furthermore, we evaluate our model on the COVIDx CT-3 dataset with different tasks to classify Normal Controls (NC), COVID-19 cases and community-acquired pneumonia (CAP) cases and compare the performance of our method with other models, which shows the effectiveness of our model.

Yunlong Sun, Yiyao Liu, Junlong Qu, Xiang Dong, Xuegang Song, Baiying Lei
MoViT: Memorizing Vision Transformers for Medical Image Analysis

The synergy of long-range dependencies from transformers and local representations of image content from convolutional neural networks (CNNs) has led to advanced architectures and increased performance for various medical image analysis tasks due to their complementary benefits. However, compared with CNNs, transformers require considerably more training data, due to a larger number of parameters and an absence of inductive bias. The need for increasingly large datasets continues to be problematic, particularly in the context of medical imaging, where both annotation efforts and data protection result in limited data availability. In this work, inspired by the human decision-making process of correlating new “evidence” with previously memorized “experience”, we propose a Memorizing Vision Transformer (MoViT) to alleviate the need for large-scale datasets to successfully train and deploy transformer-based architectures. MoViT leverages an external memory structure to cache historical attention snapshots during the training stage. To prevent overfitting, we incorporate an innovative memory update scheme, attention temporal moving average, to update the stored external memories with the historical moving average. For inference speedup, we design a prototypical attention learning method to distill the external memory into smaller representative subsets. We evaluate our method on a public histology image dataset and an in-house MRI dataset, demonstrating that MoViT, applied to varied medical image analysis tasks, can outperform vanilla transformer models across varied data regimes, especially in cases where only a small amount of annotated data is available. More importantly, MoViT can reach performance competitive with ViT using only 3.0% of the training data. In conclusion, MoViT provides a simple plug-in for transformer architectures which may contribute to reducing the training data needed to achieve acceptable models for a broad range of medical image analysis tasks.

Yiqing Shen, Pengfei Guo, Jingpu Wu, Qianqi Huang, Nhat Le, Jinyuan Zhou, Shanshan Jiang, Mathias Unberath
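
The moving-average memory update mentioned in the MoViT abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so the exponential form below, the function name `moving_average_update`, and the `momentum` parameter are assumptions made for illustration, not the authors' implementation.

```python
def moving_average_update(memory, snapshot, momentum=0.9):
    """Blend a stored memory vector with a new snapshot.

    Hypothetical helper: an exponential moving average is assumed here;
    the abstract only says stored memories are updated with a
    "historical moving average".
    """
    if len(memory) != len(snapshot):
        raise ValueError("memory and snapshot must have the same length")
    return [momentum * m + (1.0 - momentum) * s for m, s in zip(memory, snapshot)]


# Two successive updates of a 2-dimensional memory slot.
memory = [0.0, 1.0]
for snapshot in ([1.0, 1.0], [1.0, 0.0]):
    memory = moving_average_update(memory, snapshot, momentum=0.5)
print(memory)
```

A higher `momentum` keeps the memory closer to its history, which is one plausible way such a scheme could damp the influence of any single training step.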
Fact-Checking of AI-Generated Reports

With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The examiner is then demonstrated for verifying automatically generated reports.

Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan
Is Visual Explanation with Grad-CAM More Reliable for Deeper Neural Networks? A Case Study with Automatic Pneumothorax Diagnosis

While deep learning techniques have provided state-of-the-art performance in various clinical tasks, explainability regarding their decision-making process can greatly enhance the credence of these methods for safer and quicker clinical adoption. With high flexibility, Gradient-weighted Class Activation Mapping (Grad-CAM) has been widely adopted to offer intuitive visual interpretation of various deep learning models’ reasoning processes in computer-assisted diagnosis. However, despite the popularity of the technique, there is still a lack of systematic study on Grad-CAM’s performance on different deep learning architectures. In this study, we investigate its robustness and effectiveness across different popular deep learning models, with a focus on the impact of the networks’ depths and architecture types, by using a case study of automatic pneumothorax diagnosis in X-ray scans. Our results show that deeper neural networks do not necessarily contribute to a strong improvement of pneumothorax diagnosis accuracy, and the effectiveness of Grad-CAM also varies among different network architectures.

Zirui Qiu, Hassan Rivaz, Yiming Xiao
Group Distributionally Robust Knowledge Distillation

Knowledge distillation enables fast and effective transfer of features learned from a bigger model to a smaller one. However, distillation objectives are susceptible to sub-population shifts, a common scenario in medical imaging analysis which refers to groups/domains of data that are underrepresented in the training set. For instance, training models on health data acquired from multiple scanners or hospitals can yield subpar performance for minority groups. In this paper, inspired by distributionally robust optimization (DRO) techniques, we address this shortcoming by proposing a group-aware distillation loss. During optimization, a set of weights is updated based on the per-group losses at a given iteration. This way, our method can dynamically focus on groups that have low performance during training. We empirically validate our method, GroupDistil, on two benchmark datasets (natural images and cardiac MRIs) and show consistent improvement in terms of worst-group accuracy.

Konstantinos Vilouras, Xiao Liu, Pedro Sanchez, Alison Q. O’Neil, Sotirios A. Tsaftaris
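
The per-group weight update described in the GroupDistil abstract can be sketched as follows. The abstract does not state the exact rule; the exponentiated-gradient update commonly used in group DRO is assumed here, and the names `update_group_weights`, `weighted_loss` and `step_size` are hypothetical.

```python
import math


def update_group_weights(weights, group_losses, step_size=0.1):
    """Raise the weight of groups with high loss, then renormalise.

    Assumption: an exponentiated-gradient step, as is common in group DRO.
    """
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def weighted_loss(weights, group_losses):
    """Training objective: per-group losses combined with the current weights."""
    return sum(w * loss for w, loss in zip(weights, group_losses))


# Two groups; the second (e.g. a minority scanner) currently has a higher loss.
weights = [0.5, 0.5]
group_losses = [0.2, 1.5]
weights = update_group_weights(weights, group_losses, step_size=1.0)
print(weights, weighted_loss(weights, group_losses))
```

After the update, the poorly performing group carries more weight in the combined loss, which matches the abstract's description of dynamically focusing on low-performing groups.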
A Bone Lesion Identification Network (BLIN) in CT Images with Weakly Supervised Learning

Malignant bone lesions often lead to poor prognosis if not detected and treated in time. They also influence the treatment plan for the primary tumor. However, diagnosing these lesions can be challenging because of their subtle resemblance in appearance to other pathological conditions. Precise segmentation can help identify lesion types, but the regions of interest (ROIs) are often difficult to delineate, particularly for bone lesions. We propose a bone lesion identification network (BLIN) for whole-body non-contrast CT scans based on weakly supervised learning through class activation maps (CAM). In the algorithm, the location of the focal box of each lesion is used to supervise network training through CAM. Compared with precise segmentation, focal boxes are relatively easy to obtain, either by manual annotation or by automatic detection algorithms. Additionally, to deal with the uneven distribution of training samples of different lesion types, a new sampling strategy is employed to reduce overfitting of the majority classes. Instead of using complicated network structures such as grouping and ensemble for long-tailed data classification, we use a single-branch structure with CBAM attention to prove the effectiveness of the weakly supervised method. Experiments were carried out using a bone lesion dataset, and the results showed that the proposed method outperformed the state-of-the-art algorithms for bone lesion classification.

Kehao Deng, Bin Wang, Shanshan Ma, Zhong Xue, Xiaohuan Cao
Post-Deployment Adaptation with Access to Source Data via Federated Learning and Source-Target Remote Gradient Alignment

Deployment of Deep Neural Networks in medical imaging is hindered by distribution shift between training data and data processed after deployment, causing performance degradation. Post-Deployment Adaptation (PDA) addresses this by tailoring a pre-trained, deployed model to the target data distribution using limited labelled or entirely unlabelled target data, while assuming no access to source training data as they cannot be deployed with the model due to privacy concerns and their large size. This makes reliable adaptation challenging due to limited learning signal. This paper challenges this assumption and introduces FedPDA, a novel adaptation framework that brings the utility of learning from remote data from Federated Learning into PDA. FedPDA enables a deployed model to obtain information from source data via remote gradient exchange, while aiming to optimize the model specifically for the target domain. Tailored for FedPDA, we introduce a novel optimization method StarAlign (Source-Target Remote Gradient Alignment) that aligns gradients between source-target domain pairs by maximizing their inner product, to facilitate learning a target-specific model. We demonstrate the method’s effectiveness using multi-center databases for the tasks of cancer metastases detection and skin lesion classification, where our method compares favourably to previous work. Code is available at: .

Felix Wagner, Zeju Li, Pramit Saha, Konstantinos Kamnitsas
Data-Driven Classification of Fatty Liver From 3D Unenhanced Abdominal CT Scans

Fatty liver disease is a prevalent condition with significant health implications and early detection may prevent adverse outcomes. In this study, we developed a data-driven classification framework using deep learning to classify fatty liver disease from unenhanced abdominal CT scans. The framework consisted of a two-stage pipeline: 3D liver segmentation and feature extraction, followed by a deep learning classifier. We compared the performance of different deep learning feature representations with volumetric liver attenuation, a hand-crafted radiomic feature. Additionally, we assessed the predictive capability of our classifier for the future occurrence of fatty liver disease. The deep learning models outperformed the liver attenuation model for baseline fatty liver classification, with an AUC of 0.90 versus 0.86, respectively. Furthermore, our classifier was better able to detect mild degrees of steatosis and demonstrated the ability to predict future occurrence of fatty liver disease.

Jacob S. Leiby, Matthew E. Lee, Eun Kyung Choe, Dokyoon Kim
Replica-Based Federated Learning with Heterogeneous Architectures for Graph Super-Resolution

Having access to brain connectomes at various resolutions is important for clinicians, as they can reveal vital information about brain anatomy and function. However, the process of deriving the graphs from magnetic resonance imaging (MRI) is computationally expensive and error-prone. Furthermore, an existing challenge in the medical domain is the small amount of data that is available, as well as privacy concerns. In this work, we propose a new federated learning framework, named RepFL. At its core, RepFL is a replica-based federated learning approach for heterogeneous models, which creates replicas of each participating client by copying its model architecture and perturbing its local training dataset. This solution enables learning from limited data with a small number of participating clients by aggregating multiple local models and diversifying the data distributions of the clients. Specifically, we apply the framework for graph super-resolution using heterogeneous model architectures. In addition, to the best of our knowledge, this is the first federated multi-resolution graph generation approach. Our experiments prove that the method outperforms other federated learning methods on the task of brain graph super-resolution. Our RepFL code is available at .

Ramona Ghilea, Islem Rekik
A Multitask Deep Learning Model for Voxel-Level Brain Age Estimation

Global brain age estimation has been used as an effective biomarker to study the correlation between brain aging and neurological disorders. However, it fails to provide spatial information on the brain aging process. Voxel-level brain age estimation can give insights into how different regions of the brain age in a diseased versus healthy brain. We propose a multitask deep-learning-based model that predicts voxel-level brain age with a Mean Absolute Error (MAE) of 5.30 years on our test set (n = 50) and 6.92 years on an independent test set (n = 359). The results of our model outperformed a recently proposed voxel-level age prediction model. The source code and pre-trained models will be made publicly available to make our research reproducible.

Neha Gianchandani, Johanna Ospel, Ethan MacDonald, Roberto Souza
Deep Nearest Neighbors for Anomaly Detection in Chest X-Rays

Identifying medically abnormal images is crucial to the diagnosis procedure in medical imaging. Due to the scarcity of annotated abnormal images, most reconstruction-based approaches for anomaly detection are trained only with normal images. At test time, images with large reconstruction errors are declared abnormal. In this work, we propose a novel feature-based method for anomaly detection in chest x-rays in a setting where only normal images are provided during training. The model consists of lightweight adaptor and predictor networks on top of a pre-trained feature extractor. The parameters of the pre-trained feature extractor are frozen, and training only involves fine-tuning the proposed adaptor and predictor layers using Siamese representation learning. During inference, multiple augmentations are applied to the test image, and our proposed anomaly score is simply the geometric mean of the k-nearest neighbor distances between the augmented test image features and the training image features. Our method achieves state-of-the-art results on two challenging benchmark datasets, the RSNA Pneumonia Detection Challenge dataset, and the VinBigData Chest X-ray Abnormalities Detection dataset. Furthermore, we empirically show that our method is robust to different amounts of anomalies among the normal images in the training dataset. The code is available at: .

Xixi Liu, Jennifer Alvén, Ida Häggström, Christopher Zach
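
The anomaly score described in the chest X-ray abstract above (a geometric mean of k-nearest-neighbour distances between augmented test features and training features) can be sketched in a few lines. This is a minimal illustration on toy 2-D features: the Euclidean metric, the pooling of all distances into one geometric mean, and the names `knn_distances` and `anomaly_score` are assumptions, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distances(query, train_features, k):
    """Distances from one query feature to its k nearest training features."""
    return sorted(euclidean(query, t) for t in train_features)[:k]


def anomaly_score(augmented_features, train_features, k=2, eps=1e-12):
    """Geometric mean of k-NN distances pooled over all augmentations.

    Assumption: distances from every augmented view are pooled together;
    eps guards against log(0) when a view coincides with a training feature.
    """
    distances = []
    for feature in augmented_features:
        distances.extend(knn_distances(feature, train_features, k))
    log_mean = sum(math.log(d + eps) for d in distances) / len(distances)
    return math.exp(log_mean)


# Toy features standing in for the output of a frozen feature extractor.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
normal_views = [[0.1, 0.0], [0.0, 0.1]]
abnormal_views = [[5.0, 5.0], [5.2, 4.9]]
print(anomaly_score(normal_views, train), anomaly_score(abnormal_views, train))
```

Computing the mean in log space avoids overflow or underflow when many distances are multiplied together.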
CCMix: Curriculum of Class-Wise Mixup for Long-Tailed Medical Image Classification

Deep learning-based methods have been widely used for medical image classification. However, in clinical practice, rare diseases are usually underrepresented with limited labeled data, which results in long-tailed medical datasets and significantly degrades the performance of deep classification networks. Previous strategies employ re-sampling or re-weighting techniques to alleviate this issue by increasing the influence of underrepresented classes and reducing the influence of overrepresented ones. Still, poor performance may occur due to overfitting of the tail classes. Further, Mixup is employed to introduce additional information into model training. Despite considerable improvements, the significant noise in medical images means that random batch mixing may introduce ambiguity into training, thereby impairing performance. This observation motivates us to develop a fine-grained mixing approach. In this paper we present Curriculum of Class-wise Mixup (CCMix), a novel method for addressing the challenge of long-tailed distributions. CCMix leverages a novel curriculum that takes into account both the degree of mixing and the class-wise performance to identify the ideal Mixup proportions of different classes. Our method’s simplicity enables its effortless integration with existing long-tailed recognition techniques. Comprehensive experiments on two long-tailed medical image classification datasets demonstrate that our method, requiring no modifications to the framework structure or algorithmic details, achieves state-of-the-art results across diverse long-tailed classification benchmarks. The source code is available at .

Sirui Li, Fuheng Zhang, Tianyunxi Wei, Li Lin, Yijin Huang, Pujin Cheng, Xiaoying Tang
MEDKD: Enhancing Medical Image Classification with Multiple Expert Decoupled Knowledge Distillation for Long-Tail Data

Medical image classification is a challenging task, particularly when dealing with long-tailed datasets where rare diseases are underrepresented. The imbalanced class distribution in such datasets poses significant challenges in accurately classifying minority classes. Existing methods for alleviating the long-tail problem in medical image classification suffer from limitations such as noise introduction, loss of crucial information, and the need for manual tuning and additional computational resources. In this study, we propose a novel framework called Multiple Expert Decoupled Knowledge Distillation (MEDKD) to tackle the imbalanced class distribution in medical image classification. The knowledge distillation of multiple teacher models can significantly alleviate the class imbalance by partitioning the dataset into several subsets. However, current frameworks of this kind have not yet explored the integration of more advanced distillation methods. Our framework incorporates TCKD and NCKD concepts to improve classification performance. Through comprehensive experiments on publicly available datasets, we evaluate the performance of MEDKD and compare it with state-of-the-art methods. Our results demonstrate remarkable accuracy improvements achieved by the proposed method, highlighting its effectiveness in alleviating the challenges of medical image classification with long-tailed datasets.

Fuheng Zhang, Sirui Li, Tianyunxi Wei, Li Lin, Yijin Huang, Pujin Cheng, Xiaoying Tang
Leveraging Ellipsoid Bounding Shapes and Fast R-CNN for Enlarged Perivascular Spaces Detection and Segmentation

Enlarged perivascular spaces (EPVS) are small fluid-filled spaces surrounding blood vessels in the brain. They have been found to be important in the development and progression of cerebrovascular disease, including stroke, dementia, and cerebral small vessel disease. Their accurate detection and quantification are crucial for early diagnosis and better management of these diseases. In recent years, object detection techniques such as the Mask R-CNN approach have been widely used to automate the detection and segmentation of small objects. To account for the tubular shape of these markers, we use ellipsoid shapes instead of bounding boxes to express the location of individual elements in the implementation of the Fast R-CNN. We investigate the performance of this model under different modality combinations and find that the T2 modality alone, as well as the combination of T1+T2, deliver better performance.

Mariam Zabihi, Chayanin Tangwiriyasakul, Silvia Ingala, Luigi Lorenzini, Robin Camarasa, Frederik Barkhof, Marleen de Bruijne, M. Jorge Cardoso, Carole H. Sudre
Non-uniform Sampling-Based Breast Cancer Classification

The emergence of deep learning models and their remarkable success in visual object recognition and detection have fueled the medical imaging community’s interest in integrating these algorithms to improve medical screening and diagnosis. However, natural images, which have been the main focus of deep learning models, and medical images, such as mammograms, have fundamental differences. First, breast tissue abnormalities are often smaller than salient objects in natural images. Second, breast images have significantly higher resolutions. To fit these images to deep learning approaches, they must be heavily downsampled. Otherwise, models that address high-resolution mammograms require many exams and complex architectures. Spatially resizing mammograms leads to losing discriminative details that are essential for accurate diagnosis. To address this limitation, we develop an approach to exploit the relative importance of pixels in mammograms by conducting non-uniform sampling based on task-salient regions generated by a convolutional network. Classification results demonstrate that non-uniformly sampled images preserve discriminant features requiring lower resolutions to outperform their uniformly sampled counterparts.

Santiago Posso Murillo, Oscar Skean, Luis G. Sanchez Giraldo
A Scaled Denoising Attention-Based Transformer for Breast Cancer Detection and Classification

Breast cancer significantly threatens women’s health, and early, accurate diagnosis via mammogram screening has considerably reduced overall disease burden and mortality. Computer-Aided Diagnosis (CAD) systems have been used to assist radiologists by automatically detecting, segmenting, and classifying medical images. However, precise breast lesion diagnosis has remained challenging. In this paper, we propose a novel approach for breast cancer detection and classification in screening mammograms. Our model is a hybrid of CNN and Transformers, specifically designed to detect and classify breast cancer. The model first utilizes a depthwise convolution-based hierarchical backbone for deep feature extraction, coupled with an Enhancement Feature Block (EFB) to capture and aggregate multi-level features to the same scale. Subsequently, it introduces a transformer with Scale-Denoising Attention (SDA) to simultaneously capture global features. Finally, the model employs regression and classification heads for detecting and localizing lesions and classifying mammogram images. We evaluate the proposed model using the CBIS-DDSM dataset and compare its performance with those of state-of-the-art models. Our experimental results and extensive ablation studies demonstrate that our method outperforms others in both detection and classification tasks.

Masum Shah Junayed, Sheida Nabavi
Distilling Local Texture Features for Colorectal Tissue Classification in Low Data Regimes

Multi-class colorectal tissue classification is a challenging problem that is typically addressed in a setting where ample training data are assumed to be available. However, manual annotation of fine-grained colorectal tissue samples of multiple classes, especially rare ones such as stromal tumor and anal cancer, is laborious and expensive. To address this, we propose a knowledge distillation-based approach, named KD-CTCNet, that effectively captures local texture information from few tissue samples, through a distillation loss, to improve the standard CNN features. The resulting enriched feature representation achieves improved classification performance, specifically in low data regimes. Extensive experiments on two public datasets of colorectal tissues reveal the merits of the proposed contributions, with a consistent gain achieved over different approaches across low data settings. The code and models are publicly available on GitHub .

Dmitry Demidov, Roba Al Majzoub, Amandeep Kumar, Fahad Khan
Delving into Ipsilateral Mammogram Assessment Under Multi-view Network

In recent years, multi-view mammogram analysis has been a major focus of AI-based cancer assessment. In this work, we aim to explore diverse fusion strategies (average and concatenate) and examine the model’s learning behavior with varying individuals and fusion pathways, involving Coarse Layer and Fine Layer. The Ipsilateral Multi-View Network, comprising five fusion types (Pre, Early, Middle, Last, and Post Fusion) in ResNet-18, is employed. Notably, the Middle Fusion emerges as the most balanced and effective approach, enhancing deep-learning models’ generalization performance by +5.29% (concatenate) and +5.9% (average) on the VinDr-Mammo dataset and +2.03% (concatenate) and +3% (average) on the CMMD dataset in terms of macro F1-Score. The paper emphasizes the crucial role of layer assignment in multi-view network extraction with various strategies.

Toan T. N. Truong, Huy T. Nguyen, Thinh B. Lam, Duy V. M. Nguyen, Phuc H. Nguyen
ARHNet: Adaptive Region Harmonization for Lesion-Aware Augmentation to Improve Segmentation Performance

Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model’s robustness. However, such augmentations often introduce intensity disparities between foreground and background areas as well as boundary artifacts, which weakens their effectiveness. In this paper, we propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic. In particular, we propose an Adaptive Region Harmonization (ARH) module to dynamically align foreground feature maps to the background with an attention mechanism. We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images. Experimental results on the ATLAS 2.0 dataset show that ARHNet outperforms other methods for image harmonization tasks, and boosts the down-stream segmentation performance. Our code is publicly available at .

Jiayu Huo, Yang Liu, Xi Ouyang, Alejandro Granados, Sébastien Ourselin, Rachel Sparks
Normative Aging for an Individual’s Full Brain MRI Using Style GANs to Detect Localized Neurodegeneration

In older adults, changes in brain structure can be used to identify and predict the risk of neurodegenerative disorders and dementias. Traditional ‘brainAge’ methods seek to identify differences between chronological age and biological brain age predicted from MRI that can indicate deviations from normative aging trajectories. These methods provide one metric for the entire brain and lack anatomical specificity and interpretability. By predicting an individual’s healthy brain at a specific age, one may be able to identify regional deviations and abnormalities in a true brain scan relative to healthy aging. This study aims to address the problem of domain transfer in estimating age-related brain differences. We develop a fully unsupervised generative adversarial network (GAN) with cycle consistency reconstruction losses, trained on 4,000 cross-sectional brain MRI data from UK Biobank participants aged 60 to 80. By converting the individual anatomic information from their T1-weighted MRI as “content” and adding the “style” information related to age and sex from a reference group, we demonstrate that brain MRIs for healthy males and females at any given age can be predicted from one cross-sectional scan. Paired with a full brain T1w harmonization method, this new MRI can also be generated for any image from any scanner. Results in the ADNI cohort showed that without relying on longitudinal data from the participants, our style-encoding domain transfer model might successfully predict cognitively normal follow-up brain MRIs. We demonstrate how variations from the expected structure are a sign of a potential risk for neurodegenerative diseases.

Shruti P. Gadewar, Alyssa H. Zhu, Sunanda Somu, Abhinaav Ramesh, Iyad Ba Gari, Sophia I. Thomopoulos, Paul M. Thompson, Talia M. Nir, Neda Jahanshad
Deep Bayesian Quantization for Supervised Neuroimage Search

Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to low computation and storage costs, hashing-based search techniques have been widely adopted for establishing image retrieval systems. However, these methods often suffer from nonnegligible quantization loss, which can degrade the overall search performance. To address this issue, this paper presents a compact coding solution, namely Deep Bayesian Quantization (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieve superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines deep representation learning and compact representation quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue for traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce the quantization losses. By utilizing pre-computed lookup tables, the proposed DBQ can enable efficient and effective similarity search. Extensive experiments conducted on 2,008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-art methods.

Erkun Yang, Cheng Deng, Mingxia Liu
Triplet Learning for Chest X-Ray Image Search in Automated COVID-19 Analysis

Chest radiology images such as CT scans and X-ray images have been extensively employed in computer-assisted analysis of COVID-19, utilizing various learning-based techniques. As a trending topic, image retrieval is a practical solution by providing users with a selection of remarkably similar images from a retrospective database, thereby assisting in timely diagnosis and intervention. Many existing studies utilize deep learning algorithms for chest radiology image retrieval by extracting features from images and searching the most similar images based on the extracted features. However, these methods seldom consider the complex relationship among images (e.g., images belonging to the same category tend to share similar representations, and vice versa), which may result in sub-optimal retrieval accuracy. In this paper, we develop a triplet-constrained image retrieval (TIR) framework for chest radiology image search to aid in COVID-19 diagnosis. The TIR contains two components: (a) feature extraction and (b) image retrieval, where a triplet constraint and an image reconstruction constraint are embedded to enhance the discriminative ability of learned features. In particular, the triplet constraint is designed to minimize the distances between images belonging to the same category and maximize the distances between images from different categories. Based on the extracted features, we further perform chest X-ray (CXR) image search. Experimental results on a total of 29,986 CXR images from a public COVIDx dataset with 16,648 subjects demonstrate the effectiveness of the proposed method compared with several state-of-the-art approaches.

Linmin Wang, Qianqian Wang, Xiaochuan Wang, Yunling Ma, Lishan Qiao, Mingxia Liu
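
The triplet constraint described in the TIR abstract can be illustrated with the standard margin-based triplet loss. The abstract does not give the exact formulation, so the hinge form, the Euclidean distance, and the `margin` parameter below are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)   # same-category distance, to be minimised
    d_neg = euclidean(anchor, negative)   # different-category distance, to be maximised
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same category as the anchor
far_negative = [3.0, 4.0]    # different category, already well separated
near_negative = [0.0, 1.5]   # different category, too close to the anchor
print(triplet_loss(anchor, positive, far_negative))
print(triplet_loss(anchor, positive, near_negative))
```

Only triplets that violate the margin contribute a non-zero loss, so training effort concentrates on images from different categories that are still embedded close together.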
Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers

Whole-Slide Imaging allows for the capturing and digitization of high-resolution images of histological specimens. An automated analysis of such images using deep learning models is therefore in high demand. The transformer architecture has been proposed as a possible candidate for effectively leveraging the high-resolution information. Here, the whole-slide image is partitioned into smaller image patches and feature tokens are extracted from these image patches. However, while the conventional transformer allows for a simultaneous processing of a large set of input tokens, the computational demand scales quadratically with the number of input tokens and thus quadratically with the number of image patches. To address this problem, we propose a novel cascaded cross-attention network (CCAN) based on the cross-attention mechanism that scales linearly with the number of extracted patches. Our experiments demonstrate that this architecture is at least on par with and even outperforms other attention-based state-of-the-art methods on two public datasets: on the use-case of lung cancer (TCGA NSCLC) our model reaches a mean area under the receiver operating characteristic curve (AUC) of 0.970 ± 0.008, and on renal cancer (TCGA RCC) it reaches a mean AUC of 0.985 ± 0.004. Furthermore, we show that our proposed model is efficient in low-data regimes, making it a promising approach for analyzing whole-slide images in resource-limited settings. To foster research in this direction, we make our code publicly available on GitHub: .

Firas Khader, Jakob Nikolas Kather, Tianyu Han, Sven Nebelung, Christiane Kuhl, Johannes Stegmaier, Daniel Truhn
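
The quadratic versus linear scaling contrasted in the CCAN abstract can be made concrete by counting attention scores. The sketch assumes that cross-attention uses a fixed-size set of latent query tokens attending to the patch tokens; this is one common way to obtain linear scaling and is an assumption here, as are the function names and the value 64.

```python
def self_attention_scores(num_tokens):
    """Every token attends to every token: num_tokens x num_tokens scores."""
    return num_tokens * num_tokens


def cross_attention_scores(num_tokens, num_latents):
    """A fixed set of latent queries attends to all tokens (assumed design)."""
    return num_latents * num_tokens


for patches in (1_000, 10_000, 100_000):
    print(patches, self_attention_scores(patches), cross_attention_scores(patches, 64))
```

Doubling the number of patches quadruples the self-attention score count but only doubles the cross-attention count under this assumption.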
Enhanced Diagnostic Fidelity in Pathology Whole Slide Image Compression via Deep Learning

Accurate diagnosis of disease often depends on the exhaustive examination of Whole Slide Images (WSI) at microscopic resolution. Efficient handling of these data-intensive images requires lossy compression techniques. This paper investigates the limitations of the widely-used JPEG algorithm, the current clinical standard, and reveals severe image artifacts impacting diagnostic fidelity. To overcome these challenges, we introduce a novel deep-learning (DL)-based compression method tailored for pathology images. By enforcing feature similarity of deep features between the original and compressed images, our approach achieves superior Peak Signal-to-Noise Ratio (PSNR), Multi-Scale Structural Similarity Index (MS-SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) scores compared to JPEG-XL, WebP, and other DL compression methods. Our method increases the PSNR value from 39 (JPEG80) to 41, indicating improved image fidelity and diagnostic accuracy. Our approach can help to drastically reduce storage costs while maintaining high levels of image quality. Our method is available online.

Maximilian Fischer, Peter Neher, Peter Schüffler, Shuhan Xiao, Silvia Dias Almeida, Constantin Ulrich, Alexander Muckenhuber, Rickmer Braren, Michael Götz, Jens Kleesiek, Marco Nolden, Klaus Maier-Hein
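To put the PSNR figures in the abstract above into perspective, the following sketch computes PSNR from the mean squared error (MSE) and converts a PSNR gain into an MSE ratio. It assumes the reported values are in decibels, as is conventional, and 8-bit images with a peak value of 255; the abstract states neither, and the helper names are hypothetical.

```python
import numpy as np

def psnr(original, compressed, max_value=255.0):
    """Peak signal-to-noise ratio in dB (max_value=255 assumes 8-bit images)."""
    original = np.asarray(original, dtype=np.float64)
    compressed = np.asarray(compressed, dtype=np.float64)
    mse = np.mean((original - compressed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def mse_ratio(psnr_gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (psnr_gain_db / 10.0)

# A gain from 39 to 41 (2 dB under the assumption above)
print(round(mse_ratio(41 - 39), 3))
```

Under these assumptions, a 2 dB gain corresponds to an MSE that is lower by a factor of about 1.58 at the same peak value.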
RoFormer for Position Aware Multiple Instance Learning in Whole Slide Image Classification

Whole slide image (WSI) classification is a critical task in computational pathology. However, the gigapixel size of such images remains a major challenge for the current state of deep learning. Current methods rely on multiple-instance learning (MIL) models with frozen feature extractors. Given the high number of instances in each image, MIL methods have long assumed independence and permutation-invariance of patches, disregarding the tissue structure and correlation between patches. Recent works started studying this correlation between instances, but the computational workload of such a high number of tokens remained a limiting factor. In particular, the relative position of patches remains unaddressed. We propose to apply a straightforward encoding module, namely a RoFormer layer, relying on memory-efficient exact self-attention and relative positional encoding. This module can perform full self-attention with relative position encoding on patches of large and arbitrarily shaped WSIs, solving the need for correlation between instances and spatial modeling of tissues. We demonstrate that our method outperforms state-of-the-art MIL models on three commonly used public datasets (TCGA-NSCLC, BRACS and Camelyon16) on weakly supervised classification tasks. Code is available at .

Etienne Pochet, Rami Maroun, Roger Trullo
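The relative positional encoding mentioned in the abstract above can be illustrated with a minimal sketch of a rotary embedding, the mechanism RoFormer is known for. Each pair of features is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their position difference. The one-dimensional position, the function name, and the base frequency are assumptions made for illustration; the abstract does not say how two-dimensional patch coordinates are handled.

```python
import numpy as np

def rotary(x, position, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    x: (d,) vector with even d; position: scalar token position.
    """
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=np.float64)
    out[0::2] = x_even * cos - x_odd * sin     # 2-D rotation of each pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Both position pairs are 4 apart, so the attention scores should match.
score_a = rotary(q, 3) @ rotary(k, 7)
score_b = rotary(q, 13) @ rotary(k, 17)
print(np.isclose(score_a, score_b))
```

Because the score depends only on the offset between positions, shifting all patches by the same amount leaves the attention pattern unchanged.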
Structural Cycle GAN for Virtual Immunohistochemistry Staining of Gland Markers in the Colon

With the advent of digital scanners and deep learning, diagnostic operations may move from a microscope to a desktop. Hematoxylin and Eosin (H&E) staining is one of the most frequently used stains for disease analysis, diagnosis, and grading, but pathologists do need different immunohistochemical (IHC) stains to analyze specific structures or cells. Obtaining all of these stains (H&E and different IHCs) on a single specimen is a tedious and time-consuming task. Consequently, virtual staining has emerged as an essential research direction. Here, we propose a novel generative model, Structural Cycle-GAN (SC-GAN), for synthesizing IHC stains from H&E images, and vice versa. Our method expressly incorporates structural information in the form of edges (in addition to color data) and employs attention modules exclusively in the decoder of the proposed generator model. This integration enhances feature localization and preserves contextual information during the generation process. In addition, a structural loss is incorporated to ensure accurate structure alignment between the generated and input markers. To demonstrate the efficacy of the proposed model, experiments are conducted with two IHC markers emphasizing distinct structures of glands in the colon: the nucleus of epithelial cells (CDX2) and the cytoplasm (CK818). Quantitative metrics such as FID and SSIM are frequently used for the analysis of generative models, but they do not correlate explicitly with higher-quality virtual staining results. Therefore, we propose two new quantitative metrics that correlate directly with the virtual staining specificity of IHC markers.

Shikha Dubey, Tushar Kataria, Beatrice Knudsen, Shireen Y. Elhabian
NCIS: Deep Color Gradient Maps Regression and Three-Class Pixel Classification for Enhanced Neuronal Cell Instance Segmentation in Nissl-Stained Histological Images

Deep learning has proven to be more effective than other methods in medical image analysis, including the seemingly simple but challenging task of segmenting individual cells, an essential step for many biological studies. Comparative neuroanatomy studies are an example where the instance segmentation of neuronal cells is crucial for cytoarchitecture characterization. This paper presents an end-to-end framework to automatically segment single neuronal cells in Nissl-stained histological images of the brain, thus aiming to enable solid morphological and structural analyses for the investigation of changes in the brain cytoarchitecture. A U-Net-like architecture with an EfficientNet as the encoder and two decoding branches is exploited to regress four color gradient maps and classify pixels into contours between touching cells, cell bodies, or background. The decoding branches are connected through attention gates to share relevant features, and their outputs are combined to return the instance segmentation of the cells. The method was tested on images of the cerebral cortex and cerebellum, outperforming other recent deep-learning-based approaches for the instance segmentation of cells.

Valentina Vadori, Antonella Peruffo, Jean-Marie Graïc, Livio Finos, Livio Corain, Enrico Grisan
Regionalized Infant Brain Cortical Development Based on Multi-view, High-Level fMRI Fingerprint

The human brain demonstrates higher spatial and functional heterogeneity during the first two postnatal years than any other period of life. Infant cortical developmental regionalization is fundamental for illustrating brain microstructures and reflecting functional heterogeneity during early postnatal brain development. It aims to establish smooth cortical parcellations based on the local homogeneity of brain development. Therefore, charting infant cortical developmental regionalization can reveal neurodevelopmentally meaningful cortical units and advance our understanding of early brain structural and functional development. However, existing parcellations are solely built based on either local structural properties or single-view functional connectivity (FC) patterns due to limitations in neuroimage analysis tools. These approaches fail to capture the diverse consistency of local and global functional development. Hence, we aim to construct a multi-view functional brain parcellation atlas, enabling a better understanding of infant brain functional organization during early development. Specifically, a novel fMRI fingerprint is proposed to fuse complementary regional functional connectivities. To ensure the smoothness and interpretability of the discovered map, we employ non-negative matrix factorization (NNMF) with dual graph regularization in our method. Our method was validated on the Baby Connectome Project (BCP) dataset, demonstrating superior performance compared to previous functional and structural parcellation approaches. Furthermore, we track functional development trajectory based on our brain cortical parcellation to highlight early development with high neuroanatomical and functional precision.

Tianli Tao, Jiawei Huang, Feihong Liu, Mianxin Liu, Lianghu Guo, Xinyi Cai, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Lixuan Zhu, Qing Yang, Dinggang Shen, Han Zhang
Machine Learning in Medical Imaging
Xiaohuan Cao
Xuanang Xu
Islem Rekik
Zhiming Cui
Xi Ouyang