2024 | Book

Machine Learning for Multimodal Healthcare Data

First International Workshop, ML4MHD 2023, Honolulu, Hawaii, USA, July 29, 2023, Proceedings

Editors: Andreas K. Maier, Julia A. Schnabel, Pallavi Tiwari, Oliver Stegle

Publisher: Springer Nature Switzerland

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the First International Workshop on Machine Learning for Multimodal Healthcare Data, ML4MHD 2023, held in Honolulu, Hawaii, USA, in July 2023.
The 18 full papers presented were carefully reviewed and selected from 30 submissions. The workshop's primary objective was to bring together experts from diverse fields such as medicine, pathology, biology, and machine learning, with the aim of presenting novel methods and solutions that address healthcare challenges, especially those that arise from the complexity and heterogeneity of patient data.

Table of Contents

Death Prediction by Race in Colorectal Cancer Patients Using Machine Learning Approaches
Colorectal cancer (CRC) cases have increased worldwide. In the USA, African Americans have a higher incidence than other races. In this paper, we aimed to use ML to study specific factors or variables affecting the high incidence of CRC mortality by race after receiving treatments and create models to predict death. We used metastatic CRC Genes Sequencing Studies as data. Patient inclusion was based on receiving chemotherapy, and patients were grouped by race (White-American and African-American). Five supervised ML methods were implemented for creating model predictions and a Mini-Batched-Normalized-Mutual-Information-Hybrid-Feature-Selection method to extract features including more than 25,000 genes. As a result, the best model was obtained with the Classification-Regression-Trees algorithm (AUC-ROC = 0.91 for White-American, AUC-ROC = 0.89 for African-American). The features “DBNL gene”, “PIN1P1 gene” and “Days-from-birth” were the most significant variables associated with CRC mortality for White-American, while “IFI44L-gene”, “ART4-gene” and “Sex” were the most relevant for African-American. In conclusion, these features and models are promising for further analysis and decision-making tools to study CRC from a precision medicine perspective for minority health.
Frances M. Aponte-Caraballo, Frances Heredia-Negrón, Brenda G. Nieves-Rodriguez, Abiel Roche-Lima
Neural Graph Revealers
Sparse graph recovery methods work well where the data follows their assumptions; however, they are not always designed for doing downstream probabilistic queries. This limits their adoption to only identifying connections among domain variables. On the other hand, Probabilistic Graphical Models (PGMs) learn an underlying base graph together with a distribution over the variables (nodes). PGM design choices are carefully made such that the inference and sampling algorithms are efficient. This results in certain restrictions and simplifying assumptions. In this work, we propose Neural Graph Revealers (NGRs), which attempt to efficiently merge the sparse graph recovery methods with PGMs into a single flow. The task is to recover a sparse graph showing connections between the features and learn a probability distribution over them at the same time. NGRs use a neural network as a multitask learning framework. We introduce a graph-constrained path norm that NGRs leverage to learn a graphical model that captures complex non-linear functional dependencies between features in the form of an undirected sparse graph. NGRs can handle multimodal inputs like images, text, categorical data, embeddings, etc., which are not straightforward to incorporate in the existing methods. We show experimental results on data from Gaussian graphical models and a multimodal infant mortality dataset by CDC (Software: https://github.com/harshs27/neural-graph-revealers).
Harsh Shrivastava, Urszula Chajewska
Multi-modal Biomarker Extraction Framework for Therapy Monitoring of Social Anxiety and Depression Using Audio and Video
This paper introduces a framework for feature extraction relevant to monitoring the speech therapy progress of individuals suffering from social anxiety or depression. It operates in a multi-modal manner (decision fusion) by incorporating audio and video recordings of a patient and the corresponding interviewer at two separate test assessment sessions. The data used are provided by an ongoing project in a day-hospital and outpatient setting in Germany, with the goal of investigating whether an established speech therapy group program for adolescents, which is implemented in a stationary and semi-stationary setting, can be successfully carried out via telemedicine. The features proposed in this multi-modal approach could form the basis for interpretation and analysis by medical experts and therapists, in addition to acquired data in the form of questionnaires. Extracted audio features focus on prosody (intonation, stress, rhythm, and timing), as well as predictions from a deep neural network model, which is inspired by the Pleasure, Arousal, Dominance (PAD) emotional model space. Video features are based on a pipeline that is designed to enable visualization of the interaction between the patient and the interviewer in terms of Facial Emotion Recognition (FER), utilizing the mini-Xception network architecture.
Tobias Weise, Paula Andrea Pérez-Toro, Andrea Deitermann, Bettina Hoffmann, Kubilay can Demir, Theresa Straetz, Elmar Nöth, Andreas Maier, Thomas Kallert, Seung Hee Yang
RobustSsF: Robust Missing Modality Brain Tumor Segmentation with Self-supervised Learning-Based Scenario-Specific Fusion
All modalities of Magnetic Resonance Imaging (MRI) have an essential role in diagnosing brain tumors, but there are some challenges posed by missing or incomplete modalities in multimodal MRI. Existing models have failed to achieve robust performance across all scenarios. To address this issue, this paper proposes a novel 4encoder-4decoder architecture that incorporates both “dedicated” and “single” models. Our model includes multiple Scenario-specific Fusion (SsF) decoders that construct different features depending on the missing modality scenarios. To train our model, we introduce a novel self-supervised learning-based loss function called Couple Regularization (CReg) to achieve robust learning and the Lifelong Learning Strategy (LLS) to enhance model performance. The experimental results on BraTS2018 demonstrate that RobustSsF improves robustness, reducing standard deviations by factors of 12 to 76, and achieves state-of-the-art results in all scenarios when the T1ce modality is missing.
Jeongwon Lee, Dae-Shik Kim
Semi-supervised Cooperative Learning for Multiomics Data Fusion
Multiomics data fusion integrates diverse data modalities, ranging from transcriptomics to proteomics, to gain a comprehensive understanding of biological systems and enhance predictions on outcomes of interest related to disease phenotypes and treatment responses. Cooperative learning, a recently proposed method, unifies the commonly used fusion approaches, including early and late fusion, and offers a systematic framework for leveraging the shared underlying relationships across omics to strengthen signals. However, the challenge of acquiring large-scale labeled data remains, and there are cases where multiomics data are available but annotated labels are absent. To harness the potential of unlabeled multiomics data, we introduce semi-supervised cooperative learning. By utilizing an “agreement penalty”, our method incorporates the additional unlabeled data in the learning process and achieves consistently superior predictive performance on simulated data and a real multiomics study of aging. It offers an effective solution to multiomics data fusion in settings with both labeled and unlabeled data and maximizes the utility of available data resources, with the potential of significantly improving predictive models for diagnostics and therapeutics in an increasingly multiomics world.
Daisy Yi Ding, Xiaotao Shen, Michael Snyder, Robert Tibshirani
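
The "agreement penalty" mentioned in the abstract above can be illustrated with a small sketch. It assumes two omics views X and Z, plain linear models with no sparsity penalty, and an objective of the form 0.5 * ||y - X a - Z b||^2 on labeled samples plus 0.5 * rho * ||X a - Z b||^2 on labeled and unlabeled samples together. The function name, the parameter `rho`, and the least-squares solver are illustrative assumptions, not the authors' implementation, and the objective in the paper may differ in its details.

```python
import numpy as np


def fit_semi_supervised_cooperative(X_lab, Z_lab, y, X_unl, Z_unl, rho=1.0):
    """Hypothetical linear sketch of cooperative learning with unlabeled data.

    Minimizes
        0.5 * ||y - X_lab a - Z_lab b||^2
      + 0.5 * rho * ||X_all a - Z_all b||^2
    where X_all / Z_all stack the labeled and unlabeled rows, so the
    agreement penalty also sees samples that have no label.
    """
    X_all = np.vstack([X_lab, X_unl])
    Z_all = np.vstack([Z_lab, Z_unl])
    s = np.sqrt(rho)

    # Rewrite the objective as one ordinary least-squares problem:
    # the top block fits the labels, the bottom block asks the two
    # views' predictions to agree (target 0 for their difference).
    A = np.vstack([
        np.hstack([X_lab, Z_lab]),
        np.hstack([s * X_all, -s * Z_all]),
    ])
    b = np.concatenate([y, np.zeros(X_all.shape[0])])

    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    p = X_lab.shape[1]
    return coef[:p], coef[p:]  # coefficients for view X, view Z
```

With `rho=0` the bottom block vanishes and the fit reduces to ordinary least squares on the concatenated labeled features (early fusion); larger `rho` pulls the two views' predictions toward each other, and the unlabeled rows contribute only through that term.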
Exploiting Partial Common Information Microstructure for Multi-modal Brain Tumor Segmentation
Learning with multiple modalities is crucial for automated brain tumor segmentation from magnetic resonance imaging data. Explicitly optimizing the common information shared among all modalities (e.g., by maximizing the total correlation) has been shown to achieve better feature representations and thus enhance the segmentation performance. However, existing approaches are oblivious to partial common information shared by subsets of the modalities. In this paper, we show that identifying such partial common information can significantly boost the discriminative power of image segmentation models. In particular, we introduce a novel concept of partial common information mask (PCI-mask) to provide a fine-grained characterization of what partial common information is shared by which subsets of the modalities. By solving a masked correlation maximization and simultaneously learning an optimal PCI-mask, we identify the latent microstructure of partial common information and leverage it in a self-attention module to selectively weight different feature representations in multi-modal data. We implement our proposed framework on the standard U-Net. Our experimental results on the Multi-modal Brain Tumor Segmentation Challenge (BraTS) datasets outperform those of state-of-the-art segmentation baselines, with validation Dice similarity coefficients of 0.920, 0.897, 0.837 for the whole tumor, tumor core, and enhancing tumor on BraTS-2020.
Yongsheng Mei, Guru Venkataramani, Tian Lan
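
For readers unfamiliar with the metric quoted in the abstract above, the Dice similarity coefficient for two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition; the function name and the toy masks are illustrative assumptions, and the challenge's official evaluation code may handle edge cases differently.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention,
        # assumed for this sketch).
        return 1.0
    return 2.0 * intersection / denom


# Toy 2x2 masks (made-up values): one overlapping pixel,
# two predicted pixels, one target pixel.
pred_mask = [[1, 1], [0, 0]]
target_mask = [[1, 0], [0, 0]]
score = dice_coefficient(pred_mask, target_mask)  # 2 * 1 / (2 + 1)
```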
Multimodal LLMs for Health Grounded in Individual-Specific Data
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual’s health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM’s token embedding space, and encodes simple modalities like tabular data by serializing them into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte
Speed-of-Sound Mapping for Pulse-Echo Ultrasound Raw Data Using Linked-Autoencoders
Recent studies showed the possibility of extracting speed-of-sound (SoS) information from pulse-echo ultrasound raw data (a.k.a. RF data) using deep neural networks that are fully trained on simulated data. These methods take sensor domain data, i.e., RF data, as input and train a network in an end-to-end fashion to learn the implicit mapping between the RF data domain and the SoS domain. However, such networks are prone to overfitting to simulated data, which results in poor performance and instability when tested on measured data. We propose a novel method for SoS mapping employing learned representations from two linked autoencoders. We test our approach on simulated and measured data acquired from human breast mimicking phantoms. We show that SoS mapping is possible using the learned representations by linked autoencoders. The proposed method has a Mean Absolute Percentage Error (MAPE) of 2.39% on the simulated data. On the measured data, the predictions of the proposed method are close to the expected values (MAPE of 1.1%). Compared to an end-to-end trained network, the proposed method shows higher stability and reproducibility.
Farnaz Khun Jush, Peter M. Dueppenbecker, Andreas Maier
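
The Mean Absolute Percentage Error reported in the abstract above follows the standard definition, 100 * mean(|true - predicted| / |true|). The sketch below shows that definition only; the function name and the speed-of-sound values are made up for illustration, and the abstract does not state whether the authors average per pixel or per image.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Assumes no zero in y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Hypothetical speed-of-sound values in m/s (not taken from the paper).
expected = [1500.0, 1540.0]
predicted = [1515.0, 1540.0]
error_percent = mape(expected, predicted)  # (1% + 0%) / 2
```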
HOOREX: Higher Order Optimizers for 3D Recovery from X-Ray Images
We propose a method to address the challenge of generating a 3D digital twin of a patient during an X-ray guided medical procedure from a single 2D X-ray projection image, a problem that is inherently ill-posed. To tackle this issue, we aim to infer the parameters of the Bones, Organs and Skin Shape (BOSS) model, a deformable human shape and pose model. There are currently two main approaches for model-based estimation. Optimization-based methods try to iteratively fit a body model to 2D measurements; they produce accurate 2D alignments but are slow and sensitive to initialization. On the other hand, regression-based methods use neural networks to estimate the model parameters directly, resulting in faster predictions but often with misalignments. Our approach combines the benefits of both techniques by implementing a fully differentiable paradigm through the use of higher-order optimizers that only require the Jacobian, which can be determined implicitly. The network was trained on synthetic CT and real CBCT image data, ensuring view independence. We demonstrate the potential clinical applicability of our method by validating it on multiple datasets covering diverse anatomical regions, achieving an error of 27.98 mm.
Karthik Shetty, Annette Birkhold, Bernhard Egger, Srikrishna Jaganathan, Norbert Strobel, Markus Kowarschik, Andreas Maier
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection
Integrating real-time artificial intelligence (AI) systems in clinical practices faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets is the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual efforts required for accurate annotations from clinicians. To address these challenges, we present GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from Bærum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on popular deep learning baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at https://osf.io/84e7f/.
Debesh Jha, Vanshali Sharma, Neethi Dasu, Nikhil Kumar Tomar, Steven Hicks, M. K. Bhuyan, Pradip K. Das, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas de Lange
MaxCorrMGNN: A Multi-graph Neural Network Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction
With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorrMGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Rényi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs that learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model on an outcome prediction task on a Tuberculosis (TB) dataset, where it consistently outperforms several state-of-the-art neural, graph-based and traditional fusion techniques.
Niharika S. D’Souza, Hongzhi Wang, Andrea Giovannini, Antonio Foncubierta-Rodriguez, Kristen L. Beck, Orest Boyko, Tanveer Syeda-Mahmood
SIM-CNN: Self-supervised Individualized Multimodal Learning for Stress Prediction on Nurses Using Biosignals
Precise stress recognition from biosignals is inherently challenging due to the heterogeneous nature of stress, individual physiological differences, and scarcity of labeled data. To address these issues, we developed SIM-CNN, a self-supervised learning (SSL) method for personalized stress-recognition models using multimodal biosignals. SIM-CNN involves training a multimodal 1D convolutional neural network (CNN) that leverages SSL to utilize massive unlabeled data, optimizing individual parameters and hyperparameters for precision health. SIM-CNN is evaluated on a real-world multimodal dataset collected from nurses that consists of 1,250 h of biosignals, 83 h of which are explicitly labeled with stress levels. SIM-CNN is pre-trained on the unlabeled biosignal data with next-step time series forecasting and fine-tuned on the labeled data for stress classification. Compared to SVMs and baseline CNNs with an identical architecture but without self-supervised pre-training, SIM-CNN shows clear improvements in the average AUC and accuracy, but a further examination of the data also suggests some intrinsic limitations of patient-specific stress recognition using biosignals recorded in the wild.
Sunmin Eom, Sunwoo Eom, Peter Washington
InterSynth: A Semi-Synthetic Framework for Benchmarking Prescriptive Inference from Observational Data
Treatments are prescribed to individuals in pursuit of contemporaneously unobserved outcomes, based on evidence derived from populations with historically observed treatments and outcomes. Since neither treatments nor outcomes are typically replicable in the same individual, alternatives remain counterfactual in both settings. Prescriptive fidelity therefore cannot be evaluated empirically at the individual-level, forcing reliance on lossy, group-level estimates, such as average treatment effects, that presume an implausibly low ceiling on individuation. The lack of empirical ground truths critically impedes the development of individualised prescriptive models, on which realising personalised care inevitably depends. Here we present InterSynth, a general platform for modelling biologically-plausible, empirically-informed, semi-synthetic ground truths, for the evaluation of prescriptive models operating at the individual level. InterSynth permits comprehensive simulation of heterogeneous treatment effect sizes and variability, and observed and unobserved confounding treatment allocation biases, with explicit modelling of decoupled response failure and spontaneous recovery. Operable with high-dimensional data such as high-resolution brain lesion maps, InterSynth offers a principled means of quantifying the fidelity of prescriptive models across a wide range of plausible real-world conditions. We demonstrate end-to-end use of the platform with an example employing real neuroimaging data from patients with ischaemic stroke, volume image-based succinct lesion representations, and semi-synthetic ground truths informed by functional, transcriptomic and receptomic data. We make our platform freely available to the scientific community.
Dominic Giles, Robert Gray, Chris Foulon, Guilherme Pombo, Tianbo Xu, James K. Ruffle, H. Rolf Jäger, Jorge Cardoso, Sebastien Ourselin, Geraint Rees, Ashwani Jha, Parashkev Nachev