
2023 | Book

Artificial Intelligence in Medicine

21st International Conference on Artificial Intelligence in Medicine, AIME 2023, Portorož, Slovenia, June 12–15, 2023, Proceedings


About this book

This book constitutes the refereed proceedings of the 21st International Conference on Artificial Intelligence in Medicine, AIME 2023, held in Portorož, Slovenia, on June 12–15, 2023.

The 23 full papers and 21 short papers presented together with 3 demonstration papers were selected from 108 submissions. The papers are grouped in topical sections on: machine learning and deep learning; explainability and transfer learning; natural language processing; image analysis and signal analysis; data analysis and statistical models; knowledge representation and decision support.

Table of Contents

Frontmatter

Machine Learning and Deep Learning

Frontmatter
Survival Hierarchical Agglomerative Clustering: A Semi-Supervised Clustering Method Incorporating Survival Data

Heterogeneity in patient populations presents a significant challenge for healthcare professionals, as different sub-populations may require individualized therapeutic approaches. To address this issue, clustering algorithms are often employed to identify patient groups with homogeneous characteristics. Clustering algorithms are mainly unsupervised, resulting in clusters that are biologically meaningful but not necessarily correlated with a clinical or therapeutic outcome of interest. In this study we introduce survival hierarchical agglomerative clustering (S-HAC), a novel semi-supervised clustering method that extends the popular hierarchical agglomerative clustering algorithm. Our approach makes use of both patients’ descriptive variables and survival times to form clusters that are homogeneous in the descriptive space and include cohesive survival times. In a benchmark study, S-HAC outperformed several existing semi-supervised clustering algorithms. The algorithm was also evaluated on a critical care database of atrial fibrillation management, where it identified clusters that could readily be mapped to existing knowledge of treatment effects such as contraindications and adverse events following drug exposures. These results demonstrate the effectiveness of the algorithm in identifying clinically relevant sub-populations within heterogeneous patient cohorts. S-HAC represents an attractive clustering method for the biomedical domain due to its interpretability and computational simplicity. Its application to different patient cohorts may enable healthcare professionals to tailor treatments and more effectively meet the needs of individual patients. We believe that this approach has the potential to greatly improve patient outcomes and enhance the efficiency of healthcare delivery.

Alexander Lacki, Antonio Martinez-Millana
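The core idea described in the abstract, agglomerative merging under a linkage that blends descriptive-feature distance with survival-time distance, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the blending weight `alpha`, the centroid linkage, and the toy data are all assumptions.

```python
# Hypothetical sketch of a survival-aware agglomerative clustering.
# The linkage blends Euclidean feature distance with survival-time gap.

def combined_dist(a, b, alpha=0.5):
    """Blend of feature distance and survival-time distance (alpha is assumed)."""
    feat = sum((x - y) ** 2 for x, y in zip(a["x"], b["x"])) ** 0.5
    surv = abs(a["t"] - b["t"])
    return alpha * feat + (1 - alpha) * surv

def shac(patients, k, alpha=0.5):
    """Greedily merge the closest pair of clusters until k clusters remain."""
    clusters = [{"x": list(p["x"]), "t": p["t"], "members": [i]}
                for i, p in enumerate(patients)]
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: combined_dist(clusters[ij[0]], clusters[ij[1]], alpha))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {  # centroid update weighted by cluster sizes
            "x": [(na * xa + nb * xb) / (na + nb) for xa, xb in zip(a["x"], b["x"])],
            "t": (na * a["t"] + nb * b["t"]) / (na + nb),
            "members": a["members"] + b["members"],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]

patients = [
    {"x": [0.0, 0.1], "t": 12},  # short survival, similar features
    {"x": [0.1, 0.0], "t": 14},
    {"x": [5.0, 5.1], "t": 60},  # long survival, distinct features
    {"x": [5.1, 5.0], "t": 58},
]
print(shac(patients, k=2))  # two clusters: short- and long-survival patients
```

With `alpha` close to 1 this reduces to ordinary feature-space agglomeration; close to 0 it groups mainly by survival time.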
Boosted Random Forests for Predicting Treatment Failure of Chemotherapy Regimens

Cancer patients may undergo lengthy and painful chemotherapy treatments, comprising several successive regimens or plans. Treatment inefficacy and other adverse events can lead to discontinuation (or failure) of these plans, or to prematurely changing them, which results in a significant amount of physical, financial, and emotional toxicity to the patients and their families. In this work, we build treatment failure models based on the Real World Evidence (RWE) gathered from patients’ profiles available in our oncology EMR/EHR system. We also describe our feature engineering pipeline, experimental methods, and valuable insights obtained about treatment failures from the trained models. We report our findings on the five primary cancer types with the most frequent treatment failures (or discontinuations), building unique and novel feature vectors from the clinical notes, diagnoses, and medications that are available in our oncology EMR. After following a novel three-axis (performance, complexity, and explainability) design exploration framework, boosted random forests are selected because they provide a baseline accuracy of 80% and an F1 score of 75% with reduced model complexity, thus making them more interpretable to and usable by oncologists.

Muhammad Usamah Shahid, Muddassar Farooq
A Binning Approach for Predicting Long-Term Prognosis in Multiple Sclerosis

Multiple sclerosis is a complex disease with a highly heterogeneous disease course. Early treatment of multiple sclerosis patients could delay or even prevent disease worsening, but selecting the right treatment is difficult due to the heterogeneity. To alleviate this decision-making process, predictions of the long-term prognosis of the individual patient are of interest (especially at diagnosis, when not much is known yet). However, most prognosis studies for multiple sclerosis currently focus on a short-term binary endpoint, answering questions like “will the patient significantly progress in 2 years”. In this paper, we present a novel approach that provides a comprehensive perspective on the long-term prognosis of the individual patient, by dividing the years after diagnosis up into bins and predicting the level of disability in each of these bins. Our approach addresses several general issues in observational datasets, such as sporadic measurements at irregular time-intervals, widely varying lengths of follow-up, and unequal number of measurements even for the same follow-up. We evaluated our approach on real-world clinical data from an observational single-center cohort of multiple sclerosis patients in Belgium. On this dataset, a regressor chain of random forests achieved a Pearson correlation of 0.72 between its cross-validated test set predictions and the actual disability measurements assessed by a clinician.

Robbe D’hondt, Sinéad Moylett, An Goris, Celine Vens
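The binning-and-chaining idea in the abstract can be sketched with a regressor chain, where each bin's model also receives the earlier bins as inputs. This is a hedged sketch under stated assumptions: it substitutes a trivial 1-nearest-neighbour regressor for the random forests used in the paper, and all data and names are invented.

```python
# Illustrative regressor chain over yearly disability bins.

def fit_1nn(X, y):
    """Return a 1-nearest-neighbour predictor trained on (X, y)."""
    def predict(x):
        i = min(range(len(X)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], x)))
        return y[i]
    return predict

def fit_chain(X, Y):
    """Fit one regressor per bin; bin b's model sees earlier bins' targets."""
    models = []
    for b in range(len(Y[0])):
        Xb = [x + yrow[:b] for x, yrow in zip(X, Y)]  # augment with earlier bins
        models.append(fit_1nn(Xb, [yrow[b] for yrow in Y]))
    return models

def predict_chain(models, x):
    """Predict bins sequentially, feeding each prediction forward."""
    preds = []
    for m in models:
        preds.append(m(x + preds))
    return preds

# toy data: a feature at diagnosis -> disability score in 3 yearly bins
X = [[0.0], [1.0], [2.0]]
Y = [[1.0, 1.5, 2.0], [2.0, 2.5, 3.0], [3.0, 4.0, 5.0]]
models = fit_chain(X, Y)
print(predict_chain(models, [1.1]))  # → [2.0, 2.5, 3.0]
```

Chaining lets later bins exploit the predicted trajectory so far, rather than treating each horizon as an independent regression problem.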
Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data

Lung cancer has the highest cancer mortality rate in the UK. Most patients are diagnosed at an advanced stage because common symptoms of lung cancer such as cough, pain, dyspnoea and anorexia are also present in other diseases. This partly accounts for the low survival rate. Therefore, it is crucial to screen high-risk patients for lung cancer at an early stage through computed tomography (CT) scans. As shown in a previous study, for patients who were screened for lung cancer and identified with stage I lung cancer, the estimated survival rate was 88%, compared to only 5% for those with stage IV lung cancer. This paper aims to build tree-based machine learning models for predicting lung cancer risk by extracting significant factors associated with lung cancer. The Clinical Practice Research Datalink (CPRD) data was used in this study, which comprises anonymised patient data collected from 945 general practices across the UK. Two tree-based models (decision trees and random forests) are developed and implemented. The performance of the two models is compared with a logistic regression model in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity and specificity, and both achieve better results. However, as for interpretability, it was found that, unlike coefficients in logistic regression, the default feature importance is non-negative in random forests and decision trees. This makes tree-based models less interpretable than logistic regression.

Teena Rai, Yuan Shen, Jaspreet Kaur, Jun He, Mufti Mahmud, David J. Brown, David R. Baldwin, Emma O’Dowd, Richard Hubbard
Causal Discovery with Missing Data in a Multicentric Clinical Study

Causal inference for testing clinical hypotheses from observational data presents many difficulties because the underlying data-generating model and the associated causal graph are not usually available. Furthermore, observational data may contain missing values, which impact the recovery of the causal graph by causal discovery algorithms: a crucial issue often ignored in clinical studies. In this work, we use data from a multi-centric study on endometrial cancer to analyze the impact of different missingness mechanisms on the recovered causal graph. This is achieved by extending state-of-the-art causal discovery algorithms to exploit expert knowledge without sacrificing theoretical soundness. We validate the recovered graph with expert physicians, showing that our approach finds clinically-relevant solutions. Finally, we discuss the goodness of fit of our graph and its consistency from a clinical decision-making perspective using graphical separation to validate causal pathways.

Alessio Zanga, Alice Bernasconi, Peter J. F. Lucas, Hanny Pijnenborg, Casper Reijnen, Marco Scutari, Fabio Stella
Novel Approach for Phenotyping Based on Diverse Top-K Subgroup Lists

The discovery of phenotypes is useful to describe a population. Providing a set of diverse patient phenotypes with the same medical condition may help clinicians to understand it. In this paper, we approach this problem by defining the technical task of mining diverse top-k phenotypes and proposing an algorithm called DSLM to solve it. The phenotypes obtained are evaluated according to their quality and predictive capacity in a bacterial infection problem.

Antonio Lopez-Martinez-Carrasco, Hugo M. Proença, Jose M. Juarez, Matthijs van Leeuwen, Manuel Campos
Patient Event Sequences for Predicting Hospitalization Length of Stay

Predicting patients’ hospital length of stay (LOS) is essential for improving resource allocation and supporting decision-making in healthcare organizations. This paper proposes a novel transformer-based model, termed Medic-BERT (M-BERT), for predicting LOS by modeling patient information as sequences of events. We performed empirical experiments on a cohort of 48k emergency care patients from a large Danish hospital. Experimental results show that M-BERT can achieve high accuracy on a variety of LOS problems and outperforms traditional non-sequence-based machine learning approaches.

Emil Riis Hansen, Thomas Dyhre Nielsen, Thomas Mulvad, Mads Nibe Strausholm, Tomer Sagi, Katja Hose
Autoencoder-Based Prediction of ICU Clinical Codes

Availability of diagnostic codes in Electronic Health Records (EHRs) is crucial for patient care as well as for reimbursement purposes. However, entering them in the EHR is tedious, and some clinical codes may be overlooked. Given an incomplete list of clinical codes, we investigate the performance of ML methods at predicting the complete list, and assess the added predictive value of including other clinical patient data in this task. We used the MIMIC-III dataset and frame the task of completing the clinical codes as a recommendation problem. We consider various autoencoder approaches plus two strong baselines: item co-occurrence and Singular Value Decomposition (SVD). Inputs are (1) a record’s known clinical codes, or (2) the codes plus clinical variables. The co-occurrence-based approach performed slightly better (F1 score = 0.26, Mean Average Precision [MAP] = 0.19) than the SVD (F1 = 0.24, MAP = 0.18). However, the adversarial autoencoder achieved the best performance when using the codes plus variables (F1 = 0.32, MAP = 0.25). Adversarial autoencoders performed best in terms of F1 and were equal to vanilla and denoising autoencoders in terms of MAP. Using clinical variables in addition to the incomplete codes list improves the predictive performance of the models.

Tsvetan R. Yordanov, Ameen Abu-Hanna, Anita CJ. Ravelli, Iacopo Vagliano
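The item co-occurrence baseline mentioned in the abstract lends itself to a compact sketch: score each candidate code by how often it co-occurred with the record's known codes in the training records. The ICD-style codes and data below are illustrative, not taken from MIMIC-III.

```python
# Co-occurrence-based recommendation of missing clinical codes (toy sketch).
from collections import Counter
from itertools import combinations

def fit_cooccurrence(records):
    """Count how often each ordered pair of codes appears in the same record."""
    cooc = Counter()
    for codes in records:
        for a, b in combinations(sorted(set(codes)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return cooc

def recommend(cooc, known, all_codes, top_n=2):
    """Rank absent codes by total co-occurrence with the known codes."""
    scores = {c: sum(cooc[(k, c)] for k in known)
              for c in all_codes if c not in known}
    return [c for c, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))][:top_n]

records = [
    {"I10", "E11", "N18"},   # hypertension, diabetes, CKD
    {"I10", "E11"},
    {"I10", "N18"},
    {"J45"},                 # asthma, unrelated to the others
]
cooc = fit_cooccurrence(records)
all_codes = {"I10", "E11", "N18", "J45"}
print(recommend(cooc, known={"I10"}, all_codes=all_codes))  # → ['E11', 'N18']
```

The autoencoder variants in the paper replace this counting step with a learned low-dimensional reconstruction of the code vector, but the recommendation framing is the same.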

Explainability and Transfer Learning

Frontmatter
Hospital Length of Stay Prediction Based on Multi-modal Data Towards Trustworthy Human-AI Collaboration in Radiomics

To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. Although black-box models predict better on average than interpretable ones, like Cox proportional hazards, they are not inherently understandable. To overcome this trust issue, we introduce time-dependent model explanations into the human-AI decision making process. Explaining models built on both human-annotated and algorithm-extracted radiomics features provides valuable insights for physicians working in a hospital. We believe the presented approach to be general and widely applicable to other time-to-event medical use cases. For reproducibility, we open-source code and the tlos dataset at https://github.com/mi2datalab/xlungs-trustworthy-los-prediction .

Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek
Explainable Artificial Intelligence for Cytological Image Analysis

Emerging new technologies are entering the medical market. Among them, the use of Machine Learning (ML) is becoming more common. This work explores the associated Explainable Artificial Intelligence (XAI) approaches, which should help to provide insight into the often opaque methods and thus gain the trust of users and patients as well as facilitate interdisciplinary work. Using the differentiation of white blood cells with the aid of a high-throughput quantitative phase microscope as an example, we developed a web-based XAI dashboard to assess the effect of different XAI methods on the perception and judgment of our users. To this end, we conducted a study with two user groups, data scientists and biomedical researchers, and evaluated their interaction with our XAI modules with respect to their behavioral understanding of the algorithm, their ability to detect biases, and the algorithm's trustworthiness. The results of the user tests show considerable improvement achieved through the XAI dashboard on the measured set of aspects. A deep-dive analysis aggregated over the different user groups compares the five implemented modules. Furthermore, the results reveal that a combination of modules achieves higher appreciation than the individual modules. Finally, we observe a tendency of users to overestimate the trustworthiness of the algorithm compared to their perceived abilities to understand its behavior and to detect biases.

Stefan Röhrl, Hendrik Maier, Manuel Lengl, Christian Klenk, Dominik Heim, Martin Knopp, Simon Schumann, Oliver Hayden, Klaus Diepold
Federated Learning to Improve Counterfactual Explanations for Sepsis Treatment Prediction

In recent years, we have witnessed both artificial intelligence obtaining remarkable results in clinical decision support systems (CDSSs) and explainable artificial intelligence (XAI) improving the interpretability of these models. In turn, this fosters the adoption by medical personnel and improves trustworthiness of CDSSs. Among others, counterfactual explanations prove to be one such XAI technique particularly suitable for the healthcare domain due to its ease of interpretation, even for less technically proficient staff. However, the generation of high-quality counterfactuals relies on generative models for guidance. Unfortunately, training such models requires a huge amount of data that is beyond the means of ordinary hospitals. In this paper, we therefore propose to use federated learning to allow multiple hospitals to jointly train such generative models while maintaining full data privacy. We demonstrate the superiority of our approach compared to locally generated counterfactuals on a CDSS for sepsis treatment prescription using various metrics. Moreover, we prove that generative models for counterfactual generation that are trained using federated learning in a suitable environment perform only marginally worse compared to centrally trained ones while offering the benefit of data privacy preservation.

Christoph Düsing, Philipp Cimiano
Explainable AI for Medical Event Prediction for Heart Failure Patients

The past decade has witnessed significant progress in deploying AI in the medical field. However, most AI models are considered black-boxes, making predictions neither understandable nor interpretable by humans. This limitation is especially significant when they contradict clinicians’ expectations based on medical knowledge, which can lead to a lack of trust in the model. In this work, we propose a pipeline to explain AI models. We used a previously devised Neural Network model to present our approach. It predicts the daily risk for patients with heart failure and is a part of a Decision Support System. In our pipeline, we deployed the DeepSHAP algorithm to obtain global and local explanations. With the global explanation, we identified the most important features in the model and their influence on the prediction. With local explanations, we analyzed individual observations and explained why a specific prediction was made. To validate the clinical relevance of our results, we discussed them with medical experts and conducted a literature review. Moreover, we described how the proposed pipeline can be integrated into Decision Support Systems. With the above tools, medical personnel can analyze the root of decisions and gain insights into how medical parameters should be changed to improve the patient’s health state.

Weronika Wrazen, Kordian Gontarska, Felix Grzelka, Andreas Polze
Adversarial Robustness and Feature Impact Analysis for Driver Drowsiness Detection

Drowsy driving is a major cause of road accidents, but drivers are dismissive of the impact that fatigue can have on their reaction times. To detect drowsiness before any impairment occurs, a promising strategy is using Machine Learning (ML) to monitor Heart Rate Variability (HRV) signals. This work presents multiple experiments with different HRV time windows and ML models, a feature impact analysis using Shapley Additive Explanations (SHAP), and an adversarial robustness analysis to assess their reliability when processing faulty input data and perturbed HRV signals. The most reliable model was Extreme Gradient Boosting (XGB), and the optimal time window was between 120 and 150 s. Furthermore, the 18 most impactful features were selected and new, smaller models were trained, achieving performance as good as the initial ones. Despite the susceptibility of all models to adversarial attacks, adversarial training enabled them to preserve significantly better performance, so it can be a valuable approach to provide more robust driver drowsiness detection.

João Vitorino, Lourenço Rodrigues, Eva Maia, Isabel Praça, André Lourenço
Computational Evaluation of Model-Agnostic Explainable AI Using Local Feature Importance in Healthcare

Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. In the XAI field, effective evaluation methods are still being developed. The straightforward way is to evaluate via user feedback. However, this requires substantial effort (involving large numbers of users and test cases) and can still be subject to various biases. A computational evaluation of explanation methods is not easy either, since there is not yet a standard output format for XAI models, and their behavior resembles that of unsupervised learning. In this paper, we propose a computational evaluation method for XAI models which generate local feature importances as explanations. We use the output of the XAI model (local feature importances) as features and the output of the prediction problem (labels) again as labels. We evaluate the method on a real-world tabular electronic health records dataset. At the end, we answer the research question: “How can we computationally evaluate XAI models for a specific prediction model and dataset?”.

Seda Polat Erdeniz, Michael Schrempf, Diether Kramer, Peter P. Rainer, Alexander Felfernig, Trang Tran, Tamim Burgstaller, Sebastian Lubos
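The proposed evaluation can be made concrete: take each instance's local feature importances (the XAI output) as a feature vector, keep the prediction labels as targets, and check whether a simple classifier can recover the labels from the explanations alone. The leave-one-out 1-NN probe and the synthetic data below are assumptions for illustration, not the paper's exact protocol.

```python
# Probe: can labels be recovered from local feature importances alone?

def knn_accuracy(importances, labels):
    """Leave-one-out 1-NN accuracy of labels predicted from importances."""
    correct = 0
    for i, x in enumerate(importances):
        # nearest other explanation vector
        j = min((j for j in range(len(importances)) if j != i),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(importances[j], x)))
        correct += labels[j] == labels[i]
    return correct / len(labels)

# synthetic local importances: the two classes show distinct patterns,
# so a faithful explainer should make the labels recoverable
importances = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]
print(knn_accuracy(importances, labels))  # → 1.0
```

A high probe accuracy suggests the explanations carry label-relevant information; a near-chance accuracy flags explanations that do not reflect the prediction problem.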
Batch Integrated Gradients: Explanations for Temporal Electronic Health Records

eXplainable Artificial Intelligence (XAI) is integral to the usability of black-box models in high-risk domains. Many problems in such domains are concerned with analysing temporal data: we must consider a sequence of instances that occur in time and explain why the prediction transitions from one time point to the next. Currently, XAI techniques do not leverage the temporal nature of data and instead treat each instance independently. Therefore, we introduce a new approach advancing the Integrated Gradients method from the literature, namely the Batch-Integrated Gradients (Batch-IG) technique, which (1) produces explanations over a temporal batch for instance-to-instance state transitions and (2) takes into account features that change over time. In Electronic Health Records (EHRs), patient records can be stored as temporal sequences. Thus, we demonstrate Batch-Integrated Gradients in producing explanations over a temporal sequence that satisfy proposed properties corresponding to XAI for EHR data.

Jamie Duell, Xiuyi Fan, Hsuan Fu, Monika Seisenberger
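Batch-IG builds on the standard Integrated Gradients method, which attributes a prediction by averaging gradients along a straight path from a baseline to the input. A minimal, generic sketch of that building block follows; the numerical gradients, step count, and toy linear model are illustrative choices, not the paper's setup.

```python
# Generic Integrated Gradients with numerical gradients (toy sketch).

def grad(f, x, eps=1e-6):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp = x[:]; xp[i] += eps
        xm = x[:]; xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def integrated_gradients(f, x, baseline, steps=100):
    """Average gradients along the straight path baseline -> x,
    scaled by the input difference (Riemann-sum approximation)."""
    n = len(x)
    total = [0.0] * n
    for s in range(1, steps + 1):
        point = [baseline[i] + s / steps * (x[i] - baseline[i]) for i in range(n)]
        g = grad(f, point)
        for i in range(n):
            total[i] += g[i]
    return [(x[i] - baseline[i]) * total[i] / steps for i in range(n)]

# toy linear model: attributions should exactly recover the coefficients
f = lambda x: 2 * x[0] + 3 * x[1]
attr = integrated_gradients(f, [1.0, 1.0], [0.0, 0.0])
print([round(a, 4) for a in attr])  # → [2.0, 3.0]
```

For a linear model the attributions recover the coefficients, illustrating IG's completeness property: the attributions sum to f(x) − f(baseline). Batch-IG extends this idea from single instances to transitions between consecutive time points in a sequence.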
Improving Stroke Trace Classification Explainability Through Counterexamples

Deep learning process trace classification is proving powerful in several application domains, including medical ones; however, classification results are typically not explainable, an issue which is particularly relevant in medicine. In our recent work we tackled this problem, by proposing trace saliency maps, a novel tool able to highlight what trace activities are particularly significant for the classification task. A trace saliency map is built by generating artificial perturbations of the trace at hand that are classified in the same class as the original one, called examples. In this paper, we investigate the role of counterexamples (i.e., artificial perturbations that are classified in a different class with respect to the original trace) in refining trace saliency map information, thus improving explainability. We test the approach in the domain of stroke.

Giorgio Leonardi, Stefania Montani, Manuel Striani
Spatial Knowledge Transfer with Deep Adaptation Network for Predicting Hospital Readmission

A hospital readmission risk prediction model based on electronic health record (EHR) data can be an important tool for identifying high-risk patients in need of additional support. Performant readmission models based on deep learning approaches require large, high-quality training datasets to perform optimally. Utilizing EHR data from a source hospital system to enhance prediction on a target hospital using traditional approaches might bias the dataset if distributions of the source and target data are different. There is a lack of an end-to-end readmission model that can capture cross-domain knowledge. Herein, we propose an early readmission risk temporal deep adaptation network, ERR-TDAN, for cross-domain spatial knowledge transfer. ERR-TDAN transforms source and target data to a common embedding space while capturing temporal dependencies of the sequential EHR data. Domain adaptation is then applied on a domain-specific fully connected linear layer. The model is optimized by a loss function that combines distribution discrepancy loss to match the mean embeddings of the two distributions and the task loss to optimize predicting readmission at the target hospital. In a use case of patients with diabetes, a model developed using target data of 37,091 patients from an urban academic hospital was enhanced by transferring knowledge from high-quality source data of 20,471 patients from a rural academic hospital. The proposed method yielded a 5% increase in F1-score compared to baselines. ERR-TDAN may be an effective way to increase a readmission risk model’s performance when data from multiple sites are available.

Ameen Abdel Hai, Mark G. Weiner, Alice Livshits, Jeremiah R. Brown, Anuradha Paranjape, Zoran Obradovic, Daniel J. Rubin
Dealing with Data Scarcity in Rare Diseases: Dynamic Bayesian Networks and Transfer Learning to Develop Prognostic Models of Amyotrophic Lateral Sclerosis

The extremely low prevalence of rare diseases exacerbates many of the typical challenges of prognostic model development, resulting in low data availability and, at the same time, difficulties in procuring additional data due to, e.g., privacy concerns over the risk of patient reidentification. Yet, developing prognostic models with possibly limited in-house data is often of interest for many applications (e.g., prototyping, hypothesis confirmation, exploratory analyses). Several options exist beyond simply training a model with the available data: data from a larger database might be acquired; or, lacking that, one might sidestep limitations on data sharing by resorting to simulators based, e.g., on dynamic Bayesian networks (DBNs). Additionally, transfer learning techniques might be applied to integrate external and in-house data sources. Here, we compare the effectiveness of these strategies in developing a predictive model of 3-year mortality in amyotrophic lateral sclerosis (ALS, a rare neurodegenerative disease with <0.01% prevalence) using the in-house dataset of a single ALS clinic in Milan, Italy (N = 116). We test several combinations of direct and transfer-learning-mediated development based on additional real data from the Italian PARALS register (N = 568). We also train two DBNs, one for each dataset, and use them to simulate large numbers of virtual subjects whose variables are linked by the same probabilistic relationships as in the real data. We show that, compared to a baseline model developed on the smaller dataset (AUROC = 0.633), the largest performance increase was obtained using data simulated with a DBN trained on the larger PARALS register (AUROC = 0.734).

Enrico Longato, Erica Tavazzi, Adriano Chió, Gabriele Mora, Giovanni Sparacino, Barbara Di Camillo

Natural Language Processing

Frontmatter
A Rule-Free Approach for Cardiological Registry Filling from Italian Clinical Notes with Question Answering Transformers

The huge volume of textual information generated in hospitals constitutes an essential but underused asset that could be exploited to improve patient care and management. The encoding of raw medical texts into fixed data structures is traditionally addressed with knowledge-based models and complex hand-crafted rules, but the rigidity of this approach poses limitations to the generalizability and transferability of the solutions, in particular for a non-English setting under data scarcity conditions. This paper shows that transformer-based language representation models have the right characteristics to be employed as a more flexible but equally high-performing clinical information retrieval system for this scenario, without relying upon a knowledge-driven component. We demonstrate it pragmatically on the extraction of clinical entities from Italian cardiology reports for patients with inherited arrhythmias, outperforming the previous ontology-based work with our proposed transformer pipeline under the same setting and exploring a new rule-free approach based on question answering to automate cardiological registry filling.

Tommaso Mario Buonocore, Enea Parimbelli, Valentina Tibollo, Carlo Napolitano, Silvia Priori, Riccardo Bellazzi
Classification of Fall Types in Parkinson's Disease from Self-report Data Using Natural Language Processing

Falls are a leading cause of injury globally, and people with Parkinson’s disease are particularly at risk. An important step in reducing the probability of falls is to identify their causes, but manually classifying fall types is laborious and requires expertise. Natural language processing (NLP) approaches hold potential to automate fall type identification from descriptions. The aim of this study was to develop and evaluate NLP–based methods to classify fall types from Parkinson’s disease patient self-report data. We trained supervised NLP classifiers using an existing dataset consisting of both structured and unstructured data, including the age, gender, and duration of Parkinson's disease of the faller, as well as the fall location, free-text fall description, and fall class of each fall. We trained supervised classification models to predict fall class based on these attributes, and then performed an ablation study to determine the most important factors influencing the model. The best performing classifier was a hard voting ensemble model that combined the Adaboost, unweighted decision tree, weighted k-nearest neighbor, naïve Bayes, random forest, and support vector machine classifiers. On the testing set, this ensemble classifier achieved an F1-macro of 0.89. We also experimented with a transformer-based model, but its performance was subpar compared to that of the other models. Our study demonstrated that automatic fall type classification in Parkinson's disease patients is possible via NLP and supervised classification.

Jeanne M. Powell, Yuting Guo, Abeed Sarker, J. Lucas McKay
BERT for Complex Systematic Review Screening to Support the Future of Medical Research

This work presents a Natural Language Processing approach to screen complex datasets of medical articles to provide a timely and efficient response to pressing issues in medicine. The approach is based on the Bidirectional Encoder Representation from Transformers (BERT) model to screen the articles using their titles and abstracts. Systematic review screening is a classification task aiming at selecting articles fulfilling the criteria for the next step of the review. A number of BERT models are fine-tuned for this classification task. Two challenging space medicine systematic review datasets that include human, animal, and in-vitro studies are used for the evaluation of the models. Backtranslation is used as a data augmentation technique to handle the class imbalance, and a performance comparison of the models on the original and augmented data is presented. The BERT models provide an accessible solution for screening systematic reviews, which are considered complex and time-consuming. The proposed approach can change the workflow of conducting these types of reviews, especially in response to urgent policy and practice questions in medicine. The source code and datasets are available on GitHub: https://github.com/ESA-RadLab/BERTCSRS .

Marta Hasny, Alexandru-Petru Vasile, Mario Gianni, Alexandra Bannach-Brown, Mona Nasser, Murray Mackay, Diana Donovan, Jernej Šorli, Ioana Domocos, Milad Dulloo, Nimita Patel, Olivia Drayson, Nicole Meerah Elango, Jéromine Vacquie, Ana Patricia Ayala, Anna Fogtman
GGTWEAK: Gene Tagging with Weak Supervision for German Clinical Text

Accurate extraction of biomolecular named entities like genes and proteins from medical documents is an important task for many clinical applications. So far, most gene taggers were developed in the domain of English-language scientific articles. However, documents from other genres, like clinical practice guidelines, are usually created in the respective language used by clinical practitioners. To our knowledge, no annotated corpora and machine learning models for gene named entity recognition are currently available for the German language. In this work, we present GGTweak, a publicly available gene tagger for German medical documents based on a large corpus of clinical practice guidelines. Since obtaining sufficient gold-standard annotations of gene mentions for training supervised machine learning models is expensive, our approach relies solely on programmatic, weak supervision for model training. We combine various label sources based on the surface form of gene mentions and gazetteers of known gene names, with only partial individual coverage of the training data. Using a small amount of hand-labelled data for model selection and evaluation, our weakly supervised approach achieves an F1 score of 76.6 on a held-out test set, an increase of 12.4 percentage points over a strongly supervised baseline. While there is still a performance gap to state-of-the-art gene taggers for the English language, weak supervision is a promising direction for obtaining solid baseline models without the need to conduct time-consuming annotation projects. GGTweak can readily be applied in-domain to derive semantic metadata and enable the development of computer-interpretable clinical guidelines, while its out-of-domain robustness still needs to be investigated.

Sandro Steinwand, Florian Borchert, Silvia Winkler, Matthieu-P. Schapranow
Soft-Prompt Tuning to Predict Lung Cancer Using Primary Care Free-Text Dutch Medical Notes

We examine the use of large Transformer-based pretrained language models (PLMs) for the early prediction of lung cancer from free-text patient medical notes of Dutch primary care physicians. Specifically, we investigate: 1) how soft prompt-tuning compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how the models fare when trained on notes from a small number of patients. All our code is available open source at https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/ .

Auke Elfrink, Iacopo Vagliano, Ameen Abu-Hanna, Iacer Calixto
Machine Learning Models for Automatic Gene Ontology Annotation of Biological Texts

Gene ontology (GO) is a major source of biological knowledge that describes the functions of genes and gene products using a comprehensive set of controlled vocabularies, or terms, organized in a hierarchical structure. Automatic annotation of biological texts with GO terms has gained the attention of the scientific community, as it helps to quickly identify documents or parts of text related to specific biological functions or processes. In this paper, we propose and investigate a new GO-term annotation strategy that uses a non-parametric k-nearest neighbor model and relies on various vector-based representations of documents and of the GO terms linked to those documents. Our vector representations are based on machine learning and natural language processing (NLP) models, including singular value decomposition, Word2Vec, and topic-based scoring. We evaluate the performance of our model on a large benchmark corpus using a variety of standard and hierarchical evaluation metrics.
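The core of a nearest-neighbour annotation strategy like the one described above can be sketched as follows. This is an illustrative simplification, not the authors' exact formulation: the toy document vectors, GO-term labels, and the scoring rule (label votes weighted by cosine similarity) are all assumptions for demonstration.

```python
import numpy as np

def knn_go_annotate(query_vec, doc_vecs, doc_go_terms, k=2):
    """Score GO terms for a query document by cosine-weighted votes
    from its k nearest annotated documents (illustrative sketch)."""
    # Cosine similarity between the query and every annotated document
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    top = np.argsort(sims)[::-1][:k]          # indices of the k nearest docs
    scores = {}
    for i in top:
        for term in doc_go_terms[i]:
            scores[term] = scores.get(term, 0.0) + float(sims[i])
    return sorted(scores, key=scores.get, reverse=True)

# Tiny example: three "documents" in a 4-dimensional embedding space
docs = np.array([[1.0, 0.0, 0.0, 0.1],
                 [0.9, 0.1, 0.0, 0.0],
                 [0.0, 1.0, 0.9, 0.0]])
labels = [["GO:0008150"], ["GO:0008150", "GO:0003674"], ["GO:0005575"]]
ranked = knn_go_annotate(np.array([1.0, 0.05, 0.0, 0.0]), docs, labels)
print(ranked[0])  # the two nearest neighbours both vote for GO:0008150
```

In practice the document vectors would come from SVD, Word2Vec, or topic models as in the paper, and the ranked terms would then be scored with hierarchical evaluation metrics.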

Jayati H. Jui, Milos Hauskrecht

Image Analysis and Signal Analysis

Frontmatter
A Robust BKSVD Method for Blind Color Deconvolution and Blood Detection on H&E Histological Images

Hematoxylin and Eosin (H&E) color variation between histological images from different laboratories degrades the performance of Computer-Aided Diagnosis systems. Histology-specific models to solve color variation are designed taking into account the staining procedure, where most color variations are introduced. In particular, Blind Color Deconvolution (BCD) methods aim to identify the real underlying colors in the image and to separate the tissue structure from the color information. A commonly used assumption is that images are stained with, and only with, the pure staining colors (e.g., blue and pink for H&E). However, this assumption does not hold in the presence of common artifacts such as blood, where the blood cells need a third color component to be represented. Blood usually hampers the ability of color standardization algorithms to correctly identify the stains in the image, producing unexpected outputs. In this work, we propose a robust Bayesian K-Singular Value Decomposition (BKSVD) model to simultaneously detect blood and separate color from structure in histological images. Our method was tested on synthetic and real images containing different amounts of blood pixels.
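For context, the classical (non-blind) color deconvolution step that BCD methods generalize can be sketched as follows: RGB pixels are converted to optical density via the Beer-Lambert law and unmixed into per-stain concentrations with the pseudo-inverse of a stain color matrix. The H&E stain vectors below are illustrative values, not those estimated by the BKSVD model.

```python
import numpy as np

def color_deconvolve(rgb, stain_matrix):
    """Classical colour deconvolution: convert RGB pixels to optical
    density (Beer-Lambert) and unmix them into per-stain concentrations
    using the pseudo-inverse of the stain colour matrix."""
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))   # optical density per channel
    return od @ np.linalg.pinv(stain_matrix)  # concentration per stain

# Illustrative H&E stain vectors (rows: hematoxylin, eosin), unit-normalised
stains = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11]])
stains = stains / np.linalg.norm(stains, axis=1, keepdims=True)

# A synthetic pixel containing pure hematoxylin at optical density 0.8
pixel = np.power(10.0, -0.8 * stains[0])[None, :]
conc = color_deconvolve(pixel, stains)
print(conc.round(2))  # hematoxylin concentration near 0.8, eosin near 0
```

A blind method such as BKSVD estimates the stain matrix itself from the image, which is exactly where a third component (e.g., blood) breaks the two-stain assumption.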

Fernando Pérez-Bueno, Kjersti Engan, Rafael Molina
Can Knowledge Transfer Techniques Compensate for the Limited Myocardial Infarction Data by Leveraging Hæmodynamics? An in silico Study

The goal of this work is to investigate the ability of transfer learning (TL) and multitask learning (MTL) algorithms to predict tasks related to myocardial infarction (MI) in a small-data regime, leveraging a larger dataset of hæmodynamic targets. The data are generated in silico by solving steady-state Navier–Stokes equations in a patient-specific bifurcation geometry. Stenoses, whose location, shape, and dimension vary among the datapoints, are artificially incorporated into the geometry to replicate coronary artery disease conditions. The model input consists of a pair of greyscale images, obtained by postprocessing the velocity field resulting from the numerical simulations. The output is a synthetic MI risk index, designed as a function of various geometrical and hæmodynamic parameters, such as the diameter stenosis and the wall shear stress (WSS) at the plaque throat. Moreover, the Fractional Flow Reserve (FFR) at each outlet branch is computed. The ResNet18 model trained on all the available MI labels is taken as reference. We consider two scenarios. In the first, we assume that only a fraction of the MI labels is available. For TL, models pretrained on FFR data (learned on the full dataset) reach accuracies comparable to the reference. In the second scenario, we instead suppose that the number of known FFR labels is also small. We employ MTL algorithms to leverage domain-specific feature sharing, and observe significant accuracy gains with respect to the baseline single-task learning approach. Ultimately, we conclude that exploiting representations learned from hæmodynamics-related tasks improves the predictive capability of the models.

Riccardo Tenderini, Federico Betti, Ortal Yona Senouf, Olivier Muller, Simone Deparis, Annalisa Buffa, Emmanuel Abbé
COVID-19 Diagnosis in 3D Chest CT Scans with Attention-Based Models

The three-dimensional information in CT scans reveals important findings in the medical context, including for detecting symptoms of COVID-19 in chest CT scans. However, due to the lack of large-scale 3D datasets, the use of attention-based models in this field has proven difficult. This work tackles this problem with transfer learning, investigating the performance of a pre-trained TimeSformer model, originally developed for video classification, on COVID-19 classification of three-dimensional chest CT scans. The attention-based model outperforms a DenseNet baseline. Furthermore, we propose three new attention schemes for TimeSformer that improve the accuracy of the model by 1.5% and reduce runtime by almost 25% compared to the original attention scheme.

Kathrin Hartmann, Enrique Hortal
Generalized Deep Learning-Based Proximal Gradient Descent for MR Reconstruction

Data consistency with the physical forward model is crucial in inverse problems, especially in MR imaging reconstruction. The standard approach is to unroll an iterative algorithm into a neural network with the forward model embedded. However, the forward model frequently changes in clinical practice, so the learning component's entanglement with the forward model makes the reconstruction hard to generalize. We propose a deep learning-based proximal gradient descent that uses a network as a regularization term independent of the forward model, which makes it more generalizable across different MR acquisition settings. This one-time pre-trained regularization was applied to different MR acquisition settings and compared to conventional ℓ1 regularization, showing a ~3 dB improvement in peak signal-to-noise ratio. We also demonstrate the flexibility of the proposed method in choosing different undersampling patterns.
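The conventional ℓ1-regularized baseline mentioned above corresponds to a classical proximal gradient iteration (ISTA); the learned variant in the paper replaces the soft-thresholding proximal operator with a trained network while keeping the data-consistency gradient step. The toy forward model `A` and measurements below are assumptions for illustration, not MR data.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.1, iters=200):
    """Proximal gradient descent for min_x 0.5||Ax - y||^2 + lam*||x||_1.
    A learned method would swap soft_threshold for a trained network."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = spectral norm^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                  # data-consistency gradient
        x = soft_threshold(x - step * grad, step * lam)  # proximal step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[[2, 7]] = [1.5, -1.0]   # sparse ground truth
x_hat = ista(A, A @ x_true)
print(np.argsort(np.abs(x_hat))[-2:])  # indices of the two strongest coefficients
```

The appeal of the paper's formulation is visible here: only the regularization (the `soft_threshold` line) would be learned, so a change of forward model `A` requires no retraining.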

Guanxiong Luo, Mengmeng Kuang, Peng Cao
Crowdsourcing Segmentation of Histopathological Images Using Annotations Provided by Medical Students

Segmentation of histopathological images is an essential task for cancer diagnosis and prognosis in computational pathology. Unfortunately, Machine Learning (ML) techniques need large labeled datasets to train accurate segmentation algorithms that generalize well. A possible solution to alleviate this burden is crowdsourcing, which distributes the labeling effort among a group of (non-expert) individuals. However, the bias and noise from less experienced annotators may hamper the performance of ML techniques. So far, crowdsourcing approaches in ML that leverage these noisy labels have achieved promising results in classification, but crowdsourced segmentation of histopathological images remains a challenge. This paper presents a novel crowdsourcing approach to the segmentation of Triple Negative Breast Cancer images. Our method is based on the UNet architecture with a pre-trained ResNet-34 as a backbone; the noisy behavior of the annotators is modeled with a coupled network. Our methodology is validated on a real-world dataset annotated by medical students, where five classes are distinguished. The results show that our method with crowd labels achieves a high level of accuracy in segmentation (DICE: 0.7578), outperforming the well-known STAPLE (DICE: 0.7039) and approaching the segmentation model trained on expert labels (DICE: 0.7723). In conclusion, these initial results suggest that crowdsourcing is a feasible approach to segmentation of histopathological images (code: https://github.com/wizmik12/CRowd_Seg ).

Miguel López-Pérez, Pablo Morales-Álvarez, Lee A. D. Cooper, Rafael Molina, Aggelos K. Katsaggelos
Automatic Sleep Stage Classification on EEG Signals Using Time-Frequency Representation

Sleep stage scoring based on electroencephalogram (EEG) signals is a repetitive task required for basic and clinical sleep studies. Sleep stages are defined on 30 s EEG-epochs from brainwave patterns present in specific frequency bands. Time-frequency representations such as spectrograms can be used as input for deep learning methods. In this paper we compare different spectrograms, encoding multiple EEG channels, as input to a deep network designed to recognize visual patterns in images. We further investigate how contextual input enhances classification by using EEG-epoch sequences of increasing lengths. We also propose a common evaluation framework to allow a fair comparison between state-of-the-art methods. Evaluations performed on a standard dataset using this unified protocol show that our method outperforms four state-of-the-art methods.
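The conversion of an EEG epoch into a time-frequency image can be sketched with a windowed FFT. The sampling rate, window length, and the synthetic 10 Hz alpha-band signal below are illustrative assumptions; the paper's actual spectrogram parameters and channel encodings differ.

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed FFT: rows are frequency
    bins, columns are time frames (a minimal sketch)."""
    window = np.hanning(win)
    frames = [signal[s:s + win] * window
              for s in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

# Illustrative 30 s "EEG epoch" at 100 Hz containing a 10 Hz alpha rhythm
fs = 100
t = np.arange(0, 30, 1 / fs)
epoch = np.sin(2 * np.pi * 10 * t) \
    + 0.1 * np.random.default_rng(1).standard_normal(t.size)

spec = spectrogram(epoch)
freqs = np.fft.rfftfreq(256, d=1 / fs)
peak = freqs[spec.mean(axis=1).argmax()]
print(round(peak, 1))  # dominant frequency band, near 10 Hz
```

Images like `spec` (one per channel, possibly stacked) are what the deep network then consumes as its visual input.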

Paul Dequidt, Mathieu Seraphim, Alexis Lechervy, Ivan Igor Gaez, Luc Brun, Olivier Etard
Learning EKG Diagnostic Models with Hierarchical Class Label Dependencies

The electrocardiogram (EKG/ECG) is a key diagnostic tool to assess a patient's cardiac condition and is widely used in clinical applications such as patient monitoring, surgery support, and heart medicine research. With recent advances in machine learning (ML) technology there has been a growing interest in developing models that support automatic EKG interpretation and diagnosis based on past EKG data. The problem can be modeled as multi-label classification (MLC), where the objective is to learn a function that maps each EKG reading to a vector of diagnostic class labels reflecting the underlying patient condition at different levels of abstraction. In this paper, we propose and investigate an ML model that considers the class-label dependency embedded in the hierarchical organization of EKG diagnoses to improve EKG classification performance. Our model first transforms the EKG signals into a low-dimensional vector, and then uses that vector to predict different class labels with the help of a conditional tree-structured Bayesian network (CTBN), which is able to capture hierarchical dependencies among class variables. We evaluate our model on the publicly available PTB-XL dataset. Our experiments demonstrate that modeling hierarchical dependencies among class variables improves diagnostic performance under multiple classification metrics compared to models that predict each class label independently.
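The intuition behind exploiting a diagnosis hierarchy can be sketched with the chain rule: if a child label (a specific diagnosis) implies its parent (a broader category), then P(child) = P(child | parent) x P(parent). The label names and probabilities below are hypothetical; the CTBN in the paper learns much richer conditional structure than this toy tree.

```python
# Hedged sketch: combine per-label conditional probabilities with a label
# hierarchy. Labels and probabilities are fabricated for illustration.
hierarchy = {"anterior_MI": "MI", "inferior_MI": "MI", "MI": None}

# Illustrative outputs of (hypothetical) per-label conditional classifiers:
# P(label = 1 | parent = 1), or P(label = 1) for the root
p_given_parent = {"MI": 0.6, "anterior_MI": 0.7, "inferior_MI": 0.2}

def marginal(label):
    """Marginal probability of a label via the chain rule down the tree."""
    parent = hierarchy[label]
    p = p_given_parent[label]
    return p if parent is None else p * marginal(parent)

for label in hierarchy:
    print(label, round(marginal(label), 2))
# e.g. anterior_MI: 0.7 * 0.6 = 0.42, so a specific diagnosis can never be
# more probable than its parent category -- the consistency the CTBN enforces
```

Predicting each label independently, by contrast, can assign a child a higher probability than its parent, which is clinically incoherent.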

Junheng Wang, Milos Hauskrecht
Discriminant Audio Properties in Deep Learning Based Respiratory Insufficiency Detection in Brazilian Portuguese

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous works [2, 6] collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved 96.5% accuracy, showing the feasibility of RI detection via AI. Here, we collect RI patient data (P2) with several causes besides COVID-19, aiming to extend AI-based RI detection. We also collected control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.

Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger
ECGAN: Self-supervised Generative Adversarial Network for Electrocardiography

High-quality synthetic data can support the development of effective predictive models for biomedical tasks, especially in rare diseases or under compelling privacy constraints. These limitations, for instance, negatively impact open access to electrocardiography datasets about arrhythmias. This work introduces a self-supervised approach to the generation of synthetic electrocardiography time series which is shown to promote morphological plausibility. Our model (ECGAN) allows conditioning the generative process on specific rhythm abnormalities, enhancing synchronization and diversity across samples relative to models from the literature. A dedicated sample quality assessment framework is also defined, leveraging arrhythmia classifiers. The empirical results highlight a substantial improvement over state-of-the-art generative models for sequence and audio synthesis.

Lorenzo Simone, Davide Bacciu

Data Analysis and Statistical Models

Frontmatter
Nation-Wide ePrescription Data Reveals Landscape of Physicians and Their Drug Prescribing Patterns in Slovenia

Throughout biomedicine, researchers aim to characterize entities of interest, infer landscapes of cells, tissues, diseases, treatments, and drugs, and reason about their relations. Here we report on a data-driven approach to construct the landscape of all physicians in Slovenia and uncover patterns in their drug prescriptions. To characterize physicians, we use data on their ePrescriptions as provided by the Slovenian National Institute of Public Health. The data, covering the entire year of 2018, include 10,766 physicians and 23,380 drugs. We describe physicians with vectors of drug prescription frequencies and use the t-SNE dimensionality reduction technique to create a visual map of practicing physicians. We develop an embedding annotation technique that describes each visually discernible cluster in the visualization with enriched top-level Anatomical Therapeutic Chemical classification terms. Our analysis shows that distinct groups of physicians correspond to different specializations, including dermatology, gynecology, and psychiatry. The visualization also reveals potential overlaps of drug prescribing patterns, indicating possible trends of physicians practicing multiple disciplines. Our approach provides a helpful visual representation of the landscape of the country's physicians, reveals their prescription domains, and provides an instrument to inform and support healthcare managers and policymakers in reviewing the country's public health status and resource allocation.
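The mapping step described above can be sketched as follows. The real ePrescription counts are not public, so the code fabricates two artificial "specializations" that prescribe disjoint drug groups; the t-SNE parameters are also illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hedged sketch: embed physicians, described by drug-prescription frequency
# vectors, into 2-D with t-SNE. Synthetic data stands in for real counts.
rng = np.random.default_rng(0)
n_drugs = 50
counts = np.vstack([
    # 10 physicians prescribing only the first 25 drugs...
    rng.poisson(5.0, (10, n_drugs)) * (np.arange(n_drugs) < 25),
    # ...and 10 prescribing only the last 25
    rng.poisson(5.0, (10, n_drugs)) * (np.arange(n_drugs) >= 25),
]).astype(float)
freqs = counts / counts.sum(axis=1, keepdims=True)  # frequency vectors

emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(freqs)
print(emb.shape)  # one 2-D coordinate per physician
```

The paper's embedding annotation step would then test each visually discernible cluster in `emb` for enrichment of top-level ATC classification terms.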

Pavlin G. Poličar, Dalibor Stanimirović, Blaž Zupan
Machine Learning Based Prediction of Incident Cases of Crohn’s Disease Using Electronic Health Records from a Large Integrated Health System

Early diagnosis and treatment of Crohn's Disease (CD) is associated with a decreased risk of surgery and complications. However, diagnostic delay is common in clinical practice. To better understand CD risk factors and disease indicators, we identified incident CD patients and controls within the Mount Sinai Data Warehouse (MSDW) and developed machine learning (ML) models for disease prediction. CD incident cases were defined based on CD diagnosis codes, medication prescriptions, healthcare utilization before first CD diagnosis, and clinical text, using structured Electronic Health Records (EHR) and clinical notes from MSDW. Cases were matched to controls based on sex, age, and healthcare utilization. Thus, we identified 249 incident CD cases and 1,242 matched controls in MSDW. We excluded data from the 180 days before first CD diagnosis for cohort characterization and predictive modeling. Clinical text was encoded with term frequency-inverse document frequency (TF-IDF) and structured EHR features were aggregated. We compared three ML models: Logistic Regression, Random Forest, and XGBoost. Gastrointestinal symptoms, for instance anal fistula and irritable bowel syndrome, are significantly overrepresented in cases at least 180 days before the first CD code (prevalence of 33% in cases compared to 12% in controls). XGBoost is the best-performing model for predicting CD, with an AUROC of 0.72 based on structured EHR data only. The features with the highest predictive importance from structured EHR data include anemia lab values and race (white). The results suggest that ML algorithms could enable earlier diagnosis of CD and reduce diagnostic delay.
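The text-encoding and classification pipeline can be sketched as follows. The paper uses XGBoost on real clinical notes; here scikit-learn's GradientBoostingClassifier serves as a stand-in boosted-tree model, and the toy notes and labels are fabricated for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

# Hedged sketch of the modelling pipeline: TF-IDF-encode notes, then fit
# a boosted-tree classifier. Notes/labels below are fabricated examples.
notes = [
    "anal fistula and chronic diarrhea",          # case-like
    "irritable bowel syndrome, abdominal pain",   # case-like
    "annual wellness visit, no complaints",       # control-like
    "seasonal allergies, mild rhinitis",          # control-like
] * 10
labels = [1, 1, 0, 0] * 10

X = TfidfVectorizer().fit_transform(notes).toarray()
model = GradientBoostingClassifier(random_state=0).fit(X, labels)
print(model.score(X, labels))  # training accuracy on this separable toy data
```

In the study, the structured EHR features would be concatenated with (or evaluated separately from) the TF-IDF matrix, and performance would be measured on held-out patients with AUROC rather than training accuracy.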

Julian Hugo, Susanne Ibing, Florian Borchert, Jan Philipp Sachs, Judy Cho, Ryan C. Ungaro, Erwin P. Böttinger
Prognostic Prediction of Pediatric DHF in Two Hospitals in Thailand

Dengue virus infection is a major global health problem. While dengue fever rarely results in serious complications, the more severe illness dengue hemorrhagic fever (DHF) has a significant mortality rate due to the associated plasma leakage. Proper care thus requires identifying patients with DHF among those with suspected dengue so that they can be provided with adequate and prompt fluid replacement. In this paper, we use 18 years of pediatric patient data collected prospectively from two hospitals in Thailand to develop models to predict DHF among patients with suspected dengue. The best model using pooled data from both hospitals achieved an AUC of 0.92. We then investigate the generalizability of the models by constructing a model for one hospital and testing it on the other, a question that has not yet been adequately explored in the literature on DHF prediction. For some models, we find significant degradation in performance. We show this is due to differences in attribute values between the two hospital patient populations. Possible sources of this are differences in the definition of attributes and differences in the pathogenesis of the disease between the two sub-populations. We conclude that while high predictive accuracy is possible, care must be taken when applying DHF predictive models from one clinical setting to another.

Peter Haddawy, Myat Su Yin, Panhavath Meth, Araya Srikaew, Chonnikarn Wavemanee, Saranath Lawpoolsri Niyom, Kanokwan Sriraksa, Wannee Limpitikul, Preedawadee Kittirat, Prida Malasit, Panisadee Avirutnan, Dumrong Mairiang
The Impact of Bias on Drift Detection in AI Health Software

Despite the potential of AI in healthcare decision-making, there are also risks to the public for different reasons. Bias is one risk: any data unfairness present in the training set, such as the under-representation of certain minority groups, will be reflected by the model resulting in inaccurate predictions. Data drift is another concern: models trained on obsolete data will perform poorly on newly available data. Approaches to analysing bias and data drift independently are already available in the literature, allowing researchers to develop inclusive models or models that are up-to-date. However, the two issues can interact with each other. For instance, drifts within under-represented subgroups might be masked when assessing a model on the whole population. To ensure the deployment of a trustworthy model, we propose that it is crucial to evaluate its performance both on the overall population and across under-represented cohorts. In this paper, we explore a methodology to investigate the presence of drift that may only be evident in sub-populations in two protected attributes, i.e., ethnicity and gender. We use the BayesBoost technique to capture under-represented individuals and to boost these cases by inferring cases from a Bayesian network. Lastly, we evaluate the capability of this technique to handle some cases of drift detection across different sub-populations.

Asal Khoshravan Azar, Barbara Draghi, Ylenia Rotalinti, Puja Myles, Allan Tucker
A Topological Data Analysis Framework for Computational Phenotyping

Topological Data Analysis (TDA) aims to extract relevant information from the underlying topology of data projections. In the healthcare domain, TDA has been successfully used to infer structural phenotypes from complex data by linking patients who display demographic, clinical, and biomarker similarities. In this paper we propose pheTDA, a TDA-based framework to assist the computational definition of novel phenotypes. More specifically, pheTDA (i) guides the application of the Topological Mapper algorithm to derive a robust data representation as a topological graph; (ii) identifies relevant subgroups of patients from the topology; and (iii) assesses discriminative features for each subgroup of patients via predictive models. We applied the proposed tool to a population of 725 patients with suspected coronary artery disease (CAD). pheTDA identified five novel subgroups, one of which is characterized by the presence of diabetic patients with a high cardiovascular risk score. In addition, we compared the results with those of existing clustering algorithms, showing that pheTDA performs better than spectral decomposition followed by k-means.

Giuseppe Albi, Alessia Gerbasi, Mattia Chiesa, Gualtiero I. Colombo, Riccardo Bellazzi, Arianna Dagliati
Ranking of Survival-Related Gene Sets Through Integration of Single-Sample Gene Set Enrichment and Survival Analysis

The onset and progression of a disease are often associated with changes in the expression of groups of genes from a particular molecular pathway. Gene set enrichment analysis has thus become a widely used tool in studying disease expression data; however, it has scarcely been utilized in the domain of survival analysis. Here we propose a computational approach to gene set enrichment analysis tailored to survival data. Our technique computes a single-sample gene set enrichment score for a particular gene set, separates the samples into an enriched and non-enriched cohort, and evaluates the separation according to the difference in survival of the cohorts. Using our method on the data from The Cancer Genome Atlas and Molecular Signatures Database Hallmark gene set collection, we successfully identified the gene sets whose enrichment is predictive of survival in particular cancer types. We show that the results of our method are supported by the empirical literature, where genes in the top-ranked gene sets are associated with survival prognosis. Our approach presents the potential of applying gene set enrichment to the domain of survival analysis, linking the disease-related changes in molecular pathways to survival prognosis.
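The score-then-split idea in the approach above can be sketched as follows. The enrichment score here is a simplified stand-in (mean z-scored expression of the gene set, rather than an ssGSEA-style statistic), and the expression matrix and survival times are synthetic.

```python
import numpy as np

def enrichment_score(expr, gene_set_idx):
    """Single-sample enrichment: mean z-scored expression over the gene
    set (a simplified stand-in for ssGSEA-style scores)."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return z[:, gene_set_idx].mean(axis=1)

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 200
expr = rng.normal(size=(n_samples, n_genes))
gene_set = np.arange(10)                       # illustrative 10-gene set

# Make the gene set prognostic in this synthetic data:
# the first 50 samples are enriched and have shorter survival
expr[:50, gene_set] += 2.0
survival = np.where(np.arange(n_samples) < 50, 1.0, 5.0)  # toy years

scores = enrichment_score(expr, gene_set)
enriched = scores > np.median(scores)          # split into two cohorts
print(survival[enriched].mean(), survival[~enriched].mean())
```

In the paper, the separation between the enriched and non-enriched cohorts is evaluated with a proper survival-analysis statistic (accounting for censoring), not the raw mean survival used in this sketch.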

Martin Špendl, Jaka Kokošar, Ela Praznik, Luka Ausec, Blaž Zupan

Knowledge Representation and Decision Support

Frontmatter
Supporting the Prediction of AKI Evolution Through Interval-Based Approximate Predictive Functional Dependencies

In this paper, we focus on the early prediction of patterns related to the severity stage of Acute Kidney Injury (AKI) in an ICU setting. This problem is challenging from several points of view: (i) AKI is a high-risk complication for ICU patients and needs to be suitably prevented, and (ii) the detection of AKI pathological states occurs with some delay, due to the required data collection. To support the early prediction of AKI diagnosis, we extend a recently proposed temporal framework to deal with the prediction of multivalued interval-based patterns, representing the evolution of patients' pathological states. We evaluated our approach on the MIMIC-IV dataset.

Beatrice Amico, Carlo Combi
Integrating Ontological Knowledge with Probability Data to Aid Diagnosis in Radiology

Radiological diagnosis requires integration of imaging observations with patient factors, such as age, sex, and medical history. Imaging manifestations of a disease can be highly variable; conversely, one imaging finding may suggest several possible causes. To account for the inherent uncertainty of radiological diagnosis, this report explores the integration of probability data with an ontology of radiological diagnosis. The Radiology Gamuts Ontology (RGO) incorporates 16,839 entities that define diseases, interventions, and imaging observations of relevance to diagnostic radiology. RGO’s 55,564 causal (“may cause”) relationships link disorders and their potential imaging manifestations. From a cohort of 1.7 million radiology reports on more than 1.3 million patients, the frequency of individual RGO entities and of their pairwise co-occurrence was identified. These data allow estimation of conditional probabilities of pairs of entities. A user interface enables one to traverse the ontology’s network of causal relations with associated conditional-probability data. The system generates Bayesian network models that integrate an entity’s age and sex distribution with its causally related conditions.

Charles E. Kahn Jr.
Ontology Model for Supporting Process Mining on Healthcare-Related Data

In the field of Medicine, Process Mining (PM) can be used to analyse healthcare-related data to infer the underlying diagnostic, treatment, and management processes. The PM paradigm provides techniques and tools to obtain information about the processes carried out by analysing the trace of healthcare events in Electronic Health Records. In PM, workflows are the most frequently used formalism for representing PM models. Despite efforts to develop user-friendly tools, understanding PM models remains problematic. To improve this situation, we target the representation of PM models using ontologies. In this paper, we present a first version of the Clinical Process Model Ontology (CPMO), aimed at describing the sequential structure and associated metadata of PM models. Finally, we show the application of the CPMO to the domain of prostate cancer.

José Antonio Miñarro-Giménez, Carlos Fernández-Llatas, Begoña Martínez-Salvador, Catalina Martínez-Costa, Mar Marcos, Jesualdo Tomás Fernández-Breis
Real-World Evidence Inclusion in Guideline-Based Clinical Decision Support Systems: Breast Cancer Use Case

Adopting Clinical Decision Support Systems (CDSS) in clinical practice has been shown to benefit both patients and healthcare providers. These CDSS need to be updated when new evidence, data, or guidelines arise, since up-to-date evidence directly impacts physician acceptance of and adherence to these systems. To this end, in previous studies, methodologies have been developed to update CDSS content by taking advantage of machine learning (ML) algorithms. Modifications to the domain knowledge require a review and validation process before being implemented in clinical practice. Hence, this paper presents a methodology for including real-world evidence in an evidence-based CDSS for a breast cancer use case. Decision tree (DT) algorithms are used to suggest modifications based on the analysis of retrospective data, which clinical experts review before they are implemented in the CDSS. In this way, our methodology makes it possible to combine clinical knowledge from both guidelines and real-world data and to enrich the clinical domain knowledge with real-world evidence.

Jordi Torres, Eduardo Alonso, Nekane Larburu
Decentralized Web-Based Clinical Decision Support Using Semantic GLEAN Workflows

Decentralized clinical decision support (CDS), using the Web browser as a local application platform, fully decouples the CDS from vendor-specific EMRs, removes reliance on server infrastructure, and does not require custom software. Using GLEAN, a clinical workflow can be loaded within a Web browser to provide decentralized and specialized CDS at the point of care. To that end, GLEAN workflows include all the knowledge needed for their local execution; standards-based and secure data sharing with the EMR, if needed; and detection of multimorbidity conflicts. This specialized CDS executes all decision logic locally in the Web browser; using SMART-on-FHIR, locally entered data can be securely submitted to a FHIR-compliant EMR, and remote data can be retrieved. In such a decentralized setting, clinicians can securely collaborate on multimorbidity patients: (1) by sharing workflow traces, i.e., the progression of their local workflows, other clinicians can be kept apprised of their decision making; and (2) by leveraging online medical knowledge sources, conflicts (e.g., drug-drug interactions) between multimorbidity decisions can be detected and resolved.

William Van Woensel, Samina Abidi, Syed Sibte Raza Abidi
An Interactive Dashboard for Patient Monitoring and Management: A Support Tool to the Continuity of Care Centre

In recent years, dashboards have emerged in healthcare organizations as a useful tool to monitor and improve the quality of care. In this work, we present a dashboard specifically designed to support the daily clinical activities of the Continuity of Care Centre (CCA) at Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome. The CCA team, composed of multidisciplinary professional figures, plans and optimizes the treatment and discharge pathway for frail hospitalized patients. The team monitors the conditions of a large number of patients spread across hospital departments, and the fragmentation of patients' information across electronic health records (EHRs) is one of the greatest issues it has to deal with. The presented dashboard integrates and harmonizes data from several data sources, providing physicians with a daily updated and interactive visualization of each patient's longitudinal history throughout hospitalization. The monitoring activity of the CCA is supported through filtering options that allow subgroups of patients to be managed individually. Moreover, automatic alerts promptly identify critical clinical conditions and deviations of patient processes from hospital protocols. To optimize patient discharge and bed occupancy, the dashboard also displays the trend in intensity of medical care during hospitalization and highlights low intensity of care in patients ready to be discharged. The dashboard is currently in use by the CCA team as a support tool to monitor and manage admitted patients.

Mariachiara Savino, Nicola Acampora, Carlotta Masciocchi, Roberto Gatta, Chiara Dachena, Stefania Orini, Andrea Cambieri, Francesco Landi, Graziano Onder, Andrea Russo, Sara Salini, Vincenzo Valentini, Andrea Damiani, Stefano Patarnello, Christian Barillaro
A General-Purpose AI Assistant Embedded in an Open-Source Radiology Information System

Radiology AI models have made significant progress toward near-human performance or beyond. However, the partnership between AI models and human radiologists remains an unexplored challenge due to the lack of health information standards, contextual and workflow differences, and data labeling variations. To overcome these challenges, we integrated an AI model service that uses DICOM standard SR annotations into the OHIF viewer in the open-source LibreHealth Radiology Information System (RIS). In this paper, we describe the novel human-AI partnership capabilities of the platform, including few-shot learning and swarm learning approaches to retrain the AI models continuously. Building on the concept of machine teaching, we developed an active learning strategy within the RIS, so that the human radiologist can enable/disable AI annotations as well as "fix"/relabel them. These annotations are then used to retrain the models. This helps establish a partnership between the radiologist user and a user-specific AI model. The weights of these user-specific models are finally shared among multiple models in a swarm learning approach.

Saptarshi Purkayastha, Rohan Isaac, Sharon Anthony, Shikhar Shukla, Elizabeth A. Krupinski, Joshua A. Danish, Judy Wawira Gichoya
Management of Patient and Physician Preferences and Explanations for Participatory Evaluation of Treatment with an Ethical Seal

Clinical practice guidelines (CPGs) suffer from several limitations, including limited patient participation. APPRAISE-RS is a methodology for generating treatment recommendations that overcomes this limitation by enabling both patients and clinicians to express their personal preferences about treatment outcomes. However, patient and clinician preferences are treated with equal importance, while it seems reasonable and fair to give more weight to clinicians' preferences, as they have more experience on the matter. In this work we present APPRAISE-RS-E, which applies different weightings to users' preferences based on their experience when generating treatment recommendations. Moreover, since users are involved in the decision loop, an explanation of the recommendations is provided. Finally, as APPRAISE-RS-E uses AI methods, it has been evaluated using a set of principles and observable indicators, obtaining an ethical seal that informs users about the ethical issues involved. The experiments have been carried out in the field of attention deficit hyperactivity disorder (ADHD).

Oscar Raya, Xavier Castells, David Ramírez, Beatriz López
Backmatter
Metadata
Title
Artificial Intelligence in Medicine
Editors
Jose M. Juarez
Mar Marcos
Gregor Stiglic
Allan Tucker
Copyright Year
2023
Electronic ISBN
978-3-031-34344-5
Print ISBN
978-3-031-34343-8
DOI
https://doi.org/10.1007/978-3-031-34344-5
