
About this book

This book constitutes the refereed proceedings of the 15th Conference on Artificial Intelligence in Medicine, AIME 2015, held in Pavia, Italy, in June 2015. The 19 revised full and 24 short papers presented were carefully reviewed and selected from 99 submissions. The papers are organized in the following topical sections: process mining and phenotyping; data mining and machine learning; temporal data mining; uncertainty and Bayesian networks; text mining; prediction in clinical practice; and knowledge representation and guidelines.



Keynote Presentations


Physics of the Medical Record: Handling Time in Health Record Studies

The rapid increase in adoption of electronic health records (EHRs) creates the possibility of tracking billions of patient visits per year and exploiting them for clinical research. The international observational research collaboration, Observational Health Data Sciences and Informatics (OHDSI), has counted 682 million patient records that have been converted to a common format known as the OMOP Common Data Model [1]. While this number includes duplicates and records that have not been made broadly available to researchers, its scale demonstrates that converting the world population to a common format is feasible.

George Hripcsak

Contextualisation of Biomedical Knowledge Through Large-Scale Processing of Literature, Clinical Narratives and Social Media

Medicine is often pictured as one of the main examples of “big data science” with a number of challenges and successful stories where data have saved lives [1]. In addition to structured databases that store expert-curated information, unstructured and semi-structured data is a huge and often most up-to-date resource of medical knowledge. These include scientific literature, clinical narratives and social media, which typically capture findings, knowledge and experience of the three main “stakeholder” communities: researchers, clinicians and patients/carers. The ability to harness such data is essential for the integration of medical information to support clinical decision making and medical research.

Goran Nenadic

Process Mining and Phenotyping


An Active Learning Framework for Efficient Condition Severity Classification

Understanding condition severity, as extracted from Electronic Health Records (EHRs), is important for many public health purposes. Methods requiring physicians to annotate condition severity are time-consuming and costly. Previously, a passive learning algorithm called CAESAR was developed to capture severity in EHRs. This approach required physicians to label conditions manually, an exhaustive process. We developed a framework that uses two Active Learning (AL) methods (Exploitation and Combination_XA) to decrease manual labeling efforts by selecting only the most informative conditions for training. We call our approach CAESAR-Active Learning Enhancement (CAESAR-ALE). As compared to passive methods, CAESAR-ALE’s first AL method, Exploitation, reduced labeling efforts by 64% and achieved an equivalent true positive rate, while CAESAR-ALE’s second AL method, Combination_XA, reduced labeling efforts by 48% and achieved equivalent accuracy. In addition, both these AL methods outperformed the traditional AL method (SVM-Margin). These results demonstrate the potential of AL methods for decreasing the labeling efforts of medical experts, while achieving greater accuracy and lower costs.

Nir Nissim, Mary Regina Boland, Robert Moskovitch, Nicholas P. Tatonetti, Yuval Elovici, Yuval Shahar, George Hripcsak

Predictive Monitoring of Local Anomalies in Clinical Treatment Processes

Local anomalies are small outliers that exist in some subsegments of clinical treatment processes (CTPs). They provide crucial information to medical staff and hospital managers for determining the efficient medical service delivered to individual patients, and for promptly handling unusual treatment behaviors in CTPs. Existing studies have mainly focused on the detection of large deviations of CTPs, called global anomalous inpatient traces. However, local anomalies in inpatient traces are easily overlooked by existing approaches. In some medical problems, such as unstable angina, local anomalies are important since they may indicate unexpected changes in patients’ physical conditions. In this work, we propose a predictive monitoring service on local anomalies using a Latent Dirichlet Allocation (LDA)-based probabilistic model. The proposal was evaluated in the study of unstable angina CTP, testing 12,152 patient traces from the Chinese PLA General Hospital.

Zhengxing Huang, Jose M. Juarez, Wei Dong, Lei Ji, Huilong Duan

Mining Surgery Phase-Related Sequential Rules from Vertebroplasty Simulations Traces

We present in this paper an algorithm for extracting perceptual-gestural rules from heterogeneous multisource traces. The challenge that we address is two-fold: 1) represent traces such that they coherently render all aspects of this multimodal knowledge; 2) ensure that key tutoring services can be produced on top of represented traces. In the spirit of the automatic knowledge acquisition paradigm proposed in the literature, we implemented PhARules, a modified version of an existing algorithm, CMRules, for mining surgery phase-aware sequential rules from simulated surgery traces. We demonstrated the efficiency of our algorithm as well as its performance limits on traces of simulations of vertebroplasty recorded in TELEOS, an Intelligent Tutoring System dedicated to percutaneous orthopedic surgery.

Ben-Manson Toussaint, Vanda Luengo

Data Driven Order Set Development Using Metaheuristic Optimization

An unanticipated negative consequence of using healthcare information technology for clinical care is the cognitive workload imposed on users due to poor usability characteristics. This is a widely recognized challenge in the context of computerized provider order entry (CPOE) technology. In this paper, we investigate cognitive workload in the use of order sets, a core feature of CPOE systems that assists clinicians with medical order placement. We propose an automated, data-driven algorithm for developing order sets such that clinicians’ cognitive workload is minimized. Our algorithm incorporates a two-stage optimization model embedded with bisecting K-means clustering and tabu search to optimize the content of order sets, as well as the time intervals where specific order sets are recommended in the CPOE. We evaluate our algorithm using real patient data from a pediatric hospital, and demonstrate that data-driven order sets have the potential to dominate existing, consensus order sets in terms of usability and cognitive workload.

Yiye Zhang, Rema Padman

Conceptual Modeling of Clinical Pathways: Making Data and Processes Connected

In this paper, we propose a framework for seamless conceptual modeling of both data and processes, and the seamless integration of temporalities for both clinical data and clinical tasks. Moreover, we apply our approach to model the clinical pathway for managing patients with ROSC (Return Of Spontaneous Circulation) in a real ICU clinical setting.

Carlo Combi, Barbara Oliboni, Alberto Gabrieli

Data Mining and Machine Learning


Distributed Learning to Protect Privacy in Multi-centric Clinical Studies

Research in medicine has to deal with the growing amount of patient data made available by modern technologies. All these data might be used to support statistical studies and to identify causal relations. To use these data, which are spread across hospitals, efficient merging techniques as well as policies to deal with this sensitive information are strongly needed. In this paper we introduce and empirically test a distributed learning approach for training Support Vector Machines (SVM) that overcomes problems related to privacy and to data being spread across sites. The introduced technique allows algorithms to be trained without sharing any patient-related information, which ensures privacy and avoids the need to develop merging tools. We tested this approach on a large dataset and describe the results in terms of convergence and performance; we also provide considerations about the features of an IT architecture designed to support distributed learning computations.

Andrea Damiani, Mauro Vallati, Roberto Gatta, Nicola Dinapoli, Arthur Jochems, Timo Deist, Johan van Soest, Andre Dekker, Vincenzo Valentini

Mining Hierarchical Pathology Data Using Inductive Logic Programming

Considerable amounts of data are continuously generated by pathologists in the form of pathology reports. To date, there has been relatively little work exploring how to apply machine learning and data mining techniques to these data in order to extract novel clinical relationships. From a learning perspective, these pathology data possess a number of challenging properties, in particular, the temporal and hierarchical structure that is present within the data. In this paper, we propose a methodology based on inductive logic programming to extract novel associations from pathology excerpts. We discuss the challenges posed by analyzing these data and discuss how we address them. As a case study, we apply our methodology to Dutch pathology data for discovering possible causes of two rare diseases: cholangitis and breast angiosarcomas.

Tim Op De Beéck, Arjen Hommersom, Jan Van Haaren, Maarten van der Heijden, Jesse Davis, Peter Lucas, Lucy Overbeek, Iris Nagtegaal

What if Your Floor Could Tell Someone You Fell? A Device Free Fall Detection Method

Falls in the home environment are a serious cause of injury in older people leading to loss of independence and increased health related financial costs. In this study we investigate a device free method to detect falls by using simple batteryless radio frequency identification (RFID) tags in a smart RFID enabled carpet. Our method extracts information from the tags and the environment of the carpeted floor and applies machine learning techniques to make an autonomous decision regarding the posture of a person on the floor. This information can be used to automatically seek assistance to help the subject and decrease the negative effects of ‘long-lie’ after a fall. Our approach does not require video monitoring or body worn kinematic sensors; hence preserves the privacy of the dwellers, reduces costs and eliminates the need to remember to wear a device. Our results indicate a good performance for fall detection with an overall F-score of 94%.

Roberto Luis Shinmoto Torres, Asanga Wickramasinghe, Viet Ninh Pham, Damith Chinthana Ranasinghe

Domain knowledge Based Hierarchical Feature Selection for 30-Day Hospital Readmission Prediction

Many studies fail to provide models for 30-day hospital readmission prediction with satisfactory performance due to high dimensionality and sparsity. Efficient feature selection techniques allow better generalization of predictive models and improved interpretability, which is a very important property for applications in health care. We propose a feature selection method that exploits hierarchical domain knowledge together with data. The new method is evaluated on predicting 30-day hospital readmission for pediatric patients from California and provides evidence that a knowledge-based approach outperforms traditional methods and that the newly proposed method is competitive with state-of-the-art methods.

Sandro Radovanovic, Milan Vukicevic, Ana Kovacevic, Gregor Stiglic, Zoran Obradovic

A Genomic Data Fusion Framework to Exploit Rare and Common Variants for Association Discovery

Collapsing methods are used in association studies to exploit the effect of rare genetic variants in diseases. In this work we model an enriched collapsing approach by including genes, protein domains, pathways and protein-protein interaction data. We applied the collapsing technique to a data set of epileptic (85 cases) and healthy (61 controls) subjects. The method retrieved 4 genes, 5 domains, 33 gene interactions and 14 pathways showing a significant association with the disease. Collapsed data have also been used as features for prediction models. We found that the use of protein-protein interactions as model features increases the area under the ROC curve (+1.5%) compared with the purely gene-based approach.

Simone Marini, Ivan Limongelli, Ettore Rizzo, Tan Da, Riccardo Bellazzi

Collaborative Filtering for Estimating Health Related Utilities in Decision Support Systems

A distinctive feature of most advanced clinical decision support systems is the ability to adapt to the habits and preferences of patients. However, effective preference elicitation is still among the most challenging tasks in achieving fully personalized guidance. On the other hand, the availability of data related to patients’ lives and habits is steadily increasing, making its exploitation an interesting opportunity for such purposes. In the MobiGuide project, decision trees are used to implement shared decision making, using utility coefficients to incorporate patient preferences in the model. The main focus of this paper is the effort devoted to enhancing traditional elicitation techniques by proposing a methodology to predict patients’ health-related utility coefficients. In particular, we describe a recommender system, based on collaborative filtering, capable of estimating utilities by integrating different data sources such as medical surveys, questionnaires and utility elicitation tools, along with patient self-reported experiences in the form of natural language.

Enea Parimbelli, Silvana Quaglini, Riccardo Bellazzi, John H. Holmes

Temporal Data Mining


Updating Stochastic Networks to Integrate Cross-Sectional and Longitudinal Studies

Clinical trials are typically conducted over a population within a defined time period in order to illuminate certain characteristics of a health issue or disease process. These cross-sectional studies provide a snapshot of these disease processes over a large number of people but do not allow us to model the temporal nature of disease, which is essential for modelling detailed prognostic predictions. Longitudinal studies on the other hand, are used to explore how these processes develop over time in a number of people but can be expensive and time-consuming, and many studies only cover a relatively small window within the disease process. This paper explores the application of intelligent data analysis techniques for building reliable models of disease progression from both cross-sectional and longitudinal studies. The aim is to learn disease ‘trajectories’ from cross-sectional data by building realistic trajectories from healthy patients to those with advanced disease. We focus on exploring whether we can ‘calibrate’ models learnt from these trajectories with real longitudinal data using Baum-Welch re-estimation.

Allan Tucker, Yuanxi Li

Optimal Sub-Sequence Matching for the Automatic Prediction of Surgical Tasks

Surgery is one of the riskiest and most important medical acts that is performed today. The desires to improve patient outcomes, surgeon training, and also to reduce the costs of surgery, have motivated surgeons to equip their Operating Rooms with sensors that describe the surgical intervention. The richness and complexity of the data that is collected calls for new machine learning methods to support pre-, peri- and post-surgery (before, during and after).

This paper introduces a new method for the prediction of the next task that the surgeon is going to perform during the surgery. Our method bases its prediction on the optimal matching of the current surgery to a set of pre-recorded surgeries.

We assess our method on a set of neurosurgeries (lumbar disc herniation removal) and show that our method outperforms the state of the art by providing a prediction (of the next task that is going to be performed by the surgeon) more than 85% of the time with a 95% accuracy.

Germain Forestier, François Petitjean, Laurent Riffaud, Pierre Jannin

On the Advantage of Using Dedicated Data Mining Techniques to Predict Colorectal Cancer

Electronic Medical Records (EMRs) provide a wealth of data that can be used to generate predictive models for diseases. Quite a few studies have used EMRs to generate such models for specific diseases, but most of them are based on the more traditional techniques used in the medical domain, such as logistic regression. This paper studies the benefit of using advanced data mining techniques for Colorectal Cancer (CRC). CRC is the second most common cancer in the EU and is known to be a disease with very nonspecific predictors, making it difficult to generate good predictive models. In addition, the EMR data itself has its own challenges, including sparsity, differences in how physicians code the data, the temporal nature of the data, and class imbalance. Results show that state-of-the-art data mining techniques, including temporal data mining, are able to generate better predictive models than those currently available in the literature.

Reinier Kop, Mark Hoogendoorn, Leon M. G. Moons, Mattijs E. Numans, Annette ten Teije

Identifying Chemotherapy Regimens in Electronic Health Record Data Using Interval-Encoded Sequence Alignment

Electronic health records (EHRs) play an essential role in patient management and guideline-based care. However, EHRs often do not encode therapy protocols directly, and instead only catalog the individual drug agents patients receive. In this paper, we present an automated approach for protocol identification using EHR data. We introduce a novel sequence alignment method based on the Needleman-Wunsch algorithm that models variation in treatment gaps. Using data on 178 breast cancer patients that included manually annotated chemotherapy protocols, our method successfully matched 93% of regimens based on the top score and had 98% accuracy using the top two scored regimens. These results indicate that our sequence alignment approach can accurately find chemotherapy plans in patient event logs while measuring temporal variation in treatment administration.

Haider Syed, Amar K. Das
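For readers unfamiliar with the Needleman-Wunsch algorithm mentioned in the abstract above, the following is a minimal sketch of the classic global alignment score only. It is not the authors' interval-encoded method and does not model variation in treatment gaps; the function name, the scoring parameters (`match`, `mismatch`, `gap`) and the placeholder event labels are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of two sequences.

    `a` and `b` can be strings or lists of hashable items, for example
    lists of drug-administration events. The scoring values are
    illustrative defaults, not values taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per item.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of match/mismatch, or a gap
    # in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


# Hypothetical event sequences; the labels are placeholders, not real data.
protocol = ["drug_A", "drug_B", "drug_C"]
observed = ["drug_A", "drug_C"]
alignment_score = needleman_wunsch(protocol, observed)
```

In a protocol-identification setting, one would presumably score an observed sequence against every candidate protocol and rank the candidates by score; how the paper encodes time intervals into the sequences is not described in the abstract.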

An Evaluation Framework for the Comparison of Fine-Grained Predictive Models in Health Care

Within the domain of health care, more and more fine-grained models are observed that predict the development of specific health (or disease-related) states over time. This is due to the increased use of sensors, which allow for continuous assessment and lead to a sharp increase in data. These specific models are often much more complex than high-level predictive models that, for example, give a general risk score for a disease, making the evaluation of these models far from trivial. In this paper, we present an evaluation framework which is able to score fine-grained temporal models that aim at predicting multiple health states, considering their capability to describe data, their capability to predict, the quality of the model parameters, and the model complexity.

Ward R. J. van Breda, Mark Hoogendoorn, A. E. Eiben, Matthias Berking

A Model for Cross-Platform Searches in Temporal Microarray Data

Even with the advance of next-generation sequencing, microarray technology still has its place in molecular biology. There is a large body of information available through a growing number of studies in public repositories like NCBI GEO and ArrayExpress. Software is now being developed to allow for cross-platform comparison. An important part of temporal translational research is based on stimulus response studies and includes searching for particular time patterns, such as peaks, in a set of given genes across studies and platforms. This study explores the feasibility of such searches based on a statistical model and temporal abstraction using our SPOT software.

Guenter Tusch, Olvi Tole, Mary Ellen Hoinski

Uncertainty and Bayesian Networks


Risk Assessment for Primary Coronary Heart Disease Event Using Dynamic Bayesian Networks

Coronary heart disease (CHD) is the leading cause of mortality worldwide. Primary prevention of CHD denotes limiting a first CHD event in individuals who have not been formally diagnosed with the disease. This paper demonstrates how the integration of a Dynamic Bayesian network (DBN) and temporal abstractions (TAs) can be used for assessing the risk of a primary CHD event. More specifically, we introduce basic TAs into the DBN nodes and apply the extended model to a longitudinal CHD dataset for risk assessment. The obtained results demonstrate the effectiveness of our proposed approach.

Kalia Orphanou, Athena Stassopoulou, Elpida Keravnou

Uncertainty Propagation in Biomedical Models

Mathematical models are prevalent in modern medicine. However, reasoning with realistic biomedical models is computationally demanding as parameters are typically subject to nonlinear relations, dynamic behavior, and uncertainty. This paper addresses this problem by proposing a new framework based on constraint programming for a sound propagation of uncertainty from model parameters to results. We apply our approach to an important problem in the obesity research field, the estimation of free-living energy intake in humans. Complementary to alternative solutions, our approach is able to correctly characterize the provided estimates given the uncertainty inherent to the model parameters.

Andrea Franco, Marco Correia, Jorge Cruz

A Bayesian Network for Probabilistic Reasoning and Imputation of Missing Risk Factors in Type 2 Diabetes

We propose a novel Bayesian network tool to model the probabilistic relations between a set of type 2 diabetes risk factors. The tool can be used for probabilistic reasoning and for imputation of missing values among risk factors.

The Bayesian network is learnt from a joint training set of three European population studies. Tested on an independent patient set, the network is shown to be competitive with both a standard imputation tool and a widely used risk score for type 2 diabetes, providing in addition a richer description of the interdependencies between diabetes risk factors.

Francesco Sambo, Andrea Facchinetti, Liisa Hakaste, Jasmina Kravic, Barbara Di Camillo, Giuseppe Fico, Jaakko Tuomilehto, Leif Groop, Rafael Gabriel, Tuomi Tiinamaija, Claudio Cobelli

Causal Discovery from Medical Data: Dealing with Missing Values and a Mixture of Discrete and Continuous Data

Causal discovery is an increasingly popular method for data analysis in the field of medical research. In this paper we consider two challenges in causal discovery that occur very often when working with medical data: a mixture of discrete and continuous variables and a substantial amount of missing values. To the best of our knowledge there are no methods that can handle both challenges at the same time. In this paper we develop a new method that can handle these challenges based on the assumption that data is missing completely at random and that variables obey a non-paranormal distribution. We demonstrate the validity of our approach for causal discovery for empiric data from a monetary incentive delay task. Our results may help to better understand the etiology of attention deficit-hyperactivity disorder (ADHD).

Elena Sokolova, Perry Groot, Tom Claassen, Daniel von Rhein, Jan Buitelaar, Tom Heskes

Modeling Coronary Artery Calcification Levels from Behavioral Data in a Clinical Study

Cardiovascular disease (CVD) is one of the key causes of death worldwide. We consider the problem of modeling an imaging biomarker, Coronary Artery Calcification (CAC) measured by computed tomography, based on behavioral data. We employ the formalism of Dynamic Bayesian Networks (DBNs) and learn a DBN from these data. Our learned DBN provides insights about the associations of specific risk factors with CAC levels. Exhaustive empirical results demonstrate that the proposed learning method yields reasonable performance during cross-validation.

Shuo Yang, Kristian Kersting, Greg Terry, Jefferey Carr, Sriraam Natarajan

Running Genome Wide Data Analysis Using a Parallel Approach on a Cloud Platform

Hierarchical Naïve Bayes (HNB) is a multivariate classification algorithm that can be used to forecast the probability of a specific disease by analysing a set of Single Nucleotide Polymorphisms (SNPs). In this paper we present the implementation of HNB using a parallel approach based on the Map-Reduce paradigm built natively on the Hadoop framework, relying on the Amazon Cloud Infrastructure. We tested our approach on two GWAS datasets aimed at identifying the genetic bases of Type 1 (T1D) and Type 2 Diabetes (T2D). Both datasets include individual level data of 1,900 cases and 1,500 controls with ~ 420,000 SNPs. For T2D the best results were obtained using the complete set of SNPs, whereas for T1D the best performances were reached using few SNPs selected through standard univariate association tests. Our cloud-based implementation allows running genome wide simulations cutting down computational time and overall infrastructure costs.

Andrea Demartini, Davide Capozzi, Alberto Malovini, Riccardo Bellazzi

Text Mining


Extracting Adverse Drug Events from Text Using Human Advice

Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society in general. When methods extract ADEs from observational data, there is a necessity to evaluate these methods. More precisely, it is important to know what is already known in the literature. Consequently, we employ a novel relation extraction technique based on a recently developed probabilistic logic learning algorithm that exploits human advice. We demonstrate on a standard adverse drug events database that the proposed approach can successfully extract existing adverse drug events from a limited amount of training data and compares favorably with state-of-the-art probabilistic logic learning methods.

Phillip Odom, Vishal Bangera, Tushar Khot, David Page, Sriraam Natarajan

An Analysis of Twitter Data on E-cigarette Sentiments and Promotion

We investigate general sentiments and information dissemination concerning electronic cigarettes using Twitter. E-cigs are relatively new products, and hence, not much research has been conducted in this area using large-scale social media data. However, the fact that e-cigs contain potentially dangerous substances makes them an interesting subject to study. In this paper, we propose novel features for e-cigs sentiment classification and create sentiment dictionaries relevant to e-cigs. We combine the proposed features with traditional features (i.e., bag-of-words and SentiStrength features) and use them in conjunction with supervised machine learning classifiers. The feature combination proves to be more effective than the traditional features for e-cigs sentiment classification. We also found that Twitter users are mainly concerned with sharing information (33%) and promoting e-cigs (22%). Although a low percentage of users share opinions, the majority of these users have positive opinions about e-cigs (11% positive, 3% negative).

Andreea Kamiana Godea, Cornelia Caragea, Florin Adrian Bulgarov, Suhasini Ramisetty-Mikler

Determining User Similarity in Healthcare Social Media Using Content Similarity and Structural Similarity

More and more health consumers discuss healthcare topics with peers in online health social websites. These health social websites empower consumers to actively participate in their own healthcare and promote communication between people. However, it is difficult for consumers to find information efficiently from hundreds of thousands of discussion threads. Finding similar users for consumers enables them to see what their peers are doing or experiencing, and thus enables automated selection of “relevant” information. In this work, we proposed two different methods for computing user similarity in healthcare social media, using content and structural information respectively. Experimental results showed that the method using structural information from a heterogeneous healthcare information network performed better than content similarity in finding active similar users. However, when users are less active or contribute relatively few messages in social media, content similarity performed better in identifying these users.

Ling Jiang, Christopher C. Yang

Biomedical Concepts Extraction Based on Possibilistic Network and Vector Space Model

This paper proposes a new approach for indexing biomedical documents based on the combination of a Possibilistic Network and a Vector Space Model. The latter carries out partial matching between documents and biomedical vocabularies. The main contribution of the proposed approach is to combine the cosine similarity and the two measures of possibility and necessity to enhance the estimation of the similarity between a document and a given concept. The possibility estimates the extent to which a document is not similar to the concept. The necessity allows the confirmation that the document is similar to the concept. Experiments were carried out on the OHSUMED corpus and showed encouraging results.

Wiem Chebil, Lina Fatima Soualmia, Mohamed Nazih Omri, Stéfan Jacques Darmoni
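The cosine similarity used by the Vector Space Model component in the abstract above can be sketched as follows. This is only a bag-of-words illustration under assumed names (`cosine_similarity`, `doc_tokens`, `concept_tokens`) with raw term counts as weights; the abstract does not specify the term weighting the authors use, and the possibility and necessity measures are not modeled here.

```python
import math
from collections import Counter


def cosine_similarity(doc_tokens, concept_tokens):
    """Cosine of the angle between two term-count vectors.

    Both arguments are lists of tokens. Raw counts are an assumption
    made for this sketch; a real system would likely use a weighting
    scheme such as TF-IDF.
    """
    doc_vec = Counter(doc_tokens)
    concept_vec = Counter(concept_tokens)

    # Only terms present in both vectors contribute to the dot product.
    shared_terms = doc_vec.keys() & concept_vec.keys()
    dot = sum(doc_vec[t] * concept_vec[t] for t in shared_terms)

    norm_doc = math.sqrt(sum(v * v for v in doc_vec.values()))
    norm_concept = math.sqrt(sum(v * v for v in concept_vec.values()))

    # An empty document or concept has no direction; treat it as dissimilar.
    if norm_doc == 0 or norm_concept == 0:
        return 0.0
    return dot / (norm_doc * norm_concept)


# Hypothetical token lists, not taken from any corpus.
similarity = cosine_similarity(["heart", "attack"], ["heart"])
```

Because the measure depends only on shared terms, a document can match a multi-word vocabulary entry partially, which is consistent with the partial matching the abstract attributes to the Vector Space Model.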

Answering PICO Clinical Questions: A Semantic Graph-Based Approach

In this paper, we tackle the retrieval of the best evidence that fits a PICO (Population, Intervention, Comparison and Outcome) question. We propose a new document ranking algorithm that relies on semantic-based query expansion bounded by the local search context to better discard irrelevant documents. Experiments using a standard dataset including 423 PICO questions and more than 1.2 million documents show that our approach is promising.

Eya Znaidi, Lynda Tamine, Chiraz Latiri

Semantic Analysis and Automatic Corpus Construction for Entailment Recognition in Medical Texts

Textual Entailment Recognition (RTE) consists in detecting inference relationships between natural language sentences. It has a wide range of applications such as machine translation, question answering or text summarization. Significant interest has been brought to RTE with several challenges. However, most current approaches are dedicated to open domains. The major challenge facing RTE in specialized domains is the lack of relevant training corpora and resources. In this paper we present an automatic corpus construction approach for RTE in the medical domain. We also quantify the impact of using (open-)domain RDF datasets on supervised learning based RTE. We evaluate the relevance of our corpus construction method by comparing the results obtained by an efficient memory-based learning algorithm on PASCAL RTE corpora and on our automatically constructed corpus. The results show an accuracy increase of +6 to +28% and an improvement of +8 to +23% in terms of F-measure. We also found that semantic annotations from large open-domain datasets increased F1 score by 6%, while smaller medical RDF datasets actually decreased the overall performance. We discuss these findings and give some pointers to future investigations.

Asma Ben Abacha, Duy Dinh, Yassine Mrabet

Automatic Computing of Global Emotional Polarity in French Health Forum Messages

Social media provide the possibility for people to freely communicate. These discussions are rich with subjectivity and emotions, which is due to the anonymity of contributors. We propose to work on health fora in French and on subjective entities (emotions, feelings, uncertainties). Our specific interest is to study how the polarity of emotions is influenced by negation, uncertainty, modifiers and discourse markers, and how the global polarity of sentences is constructed. We design a rule-based system and evaluate it against manually built reference data. Inter-annotator agreement is between 0.50 and 0.66. An evaluation of the automatic system shows between 40 and 56% precision.

Natalia Grabar, Loïc Dumonet
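
To make the idea of rule-based polarity computation concrete, here is a minimal sketch in Python. It is not the authors' system: the lexicon, the negation and modifier lists, and the rule that a negation flips and a modifier scales the next emotion word are all illustrative assumptions, and English tokens are used only for readability.

```python
# Hypothetical toy resources (assumptions, not taken from the paper).
LEXICON = {"happy": 1, "relieved": 1, "afraid": -1, "sad": -1}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}


def sentence_polarity(tokens):
    """Sum emotion-word polarities over a tokenized sentence.

    A negation flips the sign and a modifier scales the weight of the
    next emotion word only; both are reset once that word is consumed.
    """
    total = 0.0
    sign = 1.0
    weight = 1.0
    for tok in tokens:
        word = tok.lower()
        if word in NEGATIONS:
            sign = -sign
        elif word in INTENSIFIERS:
            weight *= INTENSIFIERS[word]
        elif word in LEXICON:
            total += sign * weight * LEXICON[word]
            sign, weight = 1.0, 1.0
    return total


def label(score):
    """Map a numeric score to a global polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["I am happy", "I am not very happy", "I am not afraid"]:
        score = sentence_polarity(text.split())
        print(text, "->", score, label(score))
```

A real system for French forum messages would need a French emotion lexicon and explicit handling of uncertainty and discourse markers, which this sketch leaves out.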

Automatic Symptom Extraction from Texts to Enhance Knowledge Discovery on Rare Diseases

This paper reports ongoing research on automatic symptom recognition, aimed at the diagnosis of rare diseases and at knowledge acquisition on this subject. We describe a hybrid approach combining sequential pattern mining and natural language processing techniques in order to automate the discovery of symptoms from textual content. More precisely, our weakly supervised approach uses linguistic knowledge to enhance an incremental pattern mining process, in order to filter the discovered patterns and make relevant use of them.

Jean-Philippe Métivier, Laurie Serrano, Thierry Charnois, Bertrand Cuissart, Antoine Widlöcher

Prediction in Clinical Practice


A Composite Model for Classifying Parotid Shrinkage in Radiotherapy Patients Using Heterogeneous Data

Identifying head-and-neck radiotherapy patients who are likely to undergo parotid gland shrinkage would help to plan adaptive therapy for them. The goal of this paper is to build predictive models to be included in a Decision Support System that can operate with a wide set of heterogeneous data and classify parotid shrinkage. The main idea is to combine a set of models, each working separately with a group of features regarding clinical data, dosimetric data, or information extracted from Computed Tomography images, into one or more composite models using the most informative variables, in order to obtain more accurate and reliable decisions. Each of these models is built by using Likelihood-Fuzzy Analysis, which is based on both statistics and fuzzy logic, in order to grant semantic interpretability. This solution shows good accuracy, sensitivity and specificity, and proves more effective than the well-known Fisher’s Linear Discriminant Analysis in classifying parotids, even in the case of missing values. The best models operating with the available features are obtained, and the advantages of acquiring data from different sources are outlined. Other interesting findings are the confirmation of already known predictors and the identification of others not previously reported.

Marco Pota, Elisa Scalco, Giuseppe Sanguineti, Maria Luisa Belli, Giovanni Mauro Cattaneo, Massimo Esposito, Giovanna Rizzo

Feasibility of Spirography Features for Objective Assessment of Motor Symptoms in Parkinson’s Disease

Parkinson’s disease (PD) is currently incurable; however, proper treatment can ease the symptoms and significantly improve patients’ quality of life. Since PD is a chronic disease, its efficient monitoring and management are very important. The objective of this paper is to investigate the feasibility of using the features and methodology of a spirography device, originally designed to measure early PD symptoms, for assessing motor symptoms of advanced PD patients suffering from motor fluctuations. More specifically, the aim is to objectively assess motor symptoms related to bradykinesias (slowness of movements occurring as a result of under-medication) and dyskinesias (involuntary movements occurring as a result of over-medication). The work combines spirography data and clinical assessments from a longitudinal clinical study in Sweden with the features and pre-processing methodology of a Slovenian spirography application. The target outcome was to learn to predict the “cause” of upper limb motor dysfunctions as assessed by a clinician who observed animated spirals in a web interface. Using machine learning methods with feature descriptions from the Slovenian application resulted in 86% classification accuracy and over 90% AUC, demonstrating the usefulness of this approach for objective monitoring of PD patients.

Aleksander Sadikov, Jure Žabkar, Martin Možina, Vida Groznik, Dag Nyholm, Mevludin Memedi

Using Multivariate Sequential Patterns to Improve Survival Prediction in Intensive Care Burn Unit

Resuscitation and stabilization are key issues in Intensive Care Burn Units and early survival predictions help to decide the best clinical action during these phases. Current survival scores of burns focus on clinical variables such as age or the body surface area. However, the evolution of other parameters (e.g. diuresis or fluid balance) during the first days is also valuable knowledge. In this work we suggest a methodology and we propose a Temporal Data Mining algorithm to estimate the survival condition from the patient’s evolution. Experiments conducted on 480 patients show the improvement of survival prediction.

Isidoro J. Casanova, Manuel Campos, Jose M. Juarez, Antonio Fernandez-Fernandez-Arroyo, Jose A. Lorente

A Heterogeneous Multi-Task Learning for Predicting RBC Transfusion and Perioperative Outcomes

Before a surgical procedure, it would be desirable to have a prediction rule that could accurately estimate the probability of bleeding, the need for blood transfusion, and other important outcomes for a patient. Such a prediction rule would allow optimal planning, more efficient use of blood bank resources, and identification of high-risk patient cohorts for specific perioperative interventions. The goal of this study is to develop an efficient and accurate algorithm that could estimate the risk of multiple outcomes simultaneously. Specifically, a heterogeneous multi-task learning method is proposed for learning outcomes such as perioperative bleeding, intraoperative RBC transfusion, ICU care, and ICU length of stay. Additional outcomes not normally predicted are incorporated in the model for transfer learning and help improve the performance on the relevant outcomes. Results for predicting perioperative bleeding and need for blood transfusion for patients undergoing non-cardiac operations, drawn from an institutional transfusion datamart, show that the proposed method significantly increases AUC and G-Mean by more than 6% and 5%, respectively, over standard single-task learning methods.

Che Ngufor, Sudhindra Upadhyaya, Dennis Murphree, Nageswar Madde, Daryl Kor, Jyotishman Pathak
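
The abstract above reports G-Mean without defining it. As a hedged illustration, the sketch below computes the usual definition for binary classification, the geometric mean of sensitivity and specificity; that the paper uses exactly this definition is an assumption, and the function name and label encoding (1 = event, 0 = no event) are chosen here for illustration only.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Made-up labels: sensitivity is 3/4 and specificity is 2/4.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 1, 1]
    print(round(g_mean(truth, predicted), 4))
```

Unlike plain accuracy, this measure drops to zero when a classifier ignores one class entirely, which is why it is often reported for imbalanced outcomes such as transfusion.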

Comparison of Probabilistic versus Non-probabilistic Electronic Nose Classification Methods in an Animal Model

An electronic nose (eNose) is a promising device for exhaled breath tests. Principal Component Analysis (PCA) is the most used technique for eNose sensor data analysis, and the use of probabilistic methods is scarce. In this paper, we developed probabilistic models based on the logistic regression framework and compared them to non-probabilistic classification methods in a case study of predicting Acute Liver Failure (ALF) in 16 rats in which ALF was surgically induced. Performance measures included accuracy, AUC and Brier score. Robustness was evaluated by randomly selecting subsets of repeatedly measured sensor values before calculating the model variables. Internal validation for both aspects was obtained by a leave-one-out scheme. The probabilistic methods achieved equally good performance and robustness results when appropriate feature extraction techniques were applied. Since probabilistic models allow employing sound methods for assessing calibration and uncertainty of predictions, they are a proper choice for decision making. Hence we recommend adopting probabilistic classifiers with their associated predictive performance in eNose data analysis.

Camilla Colombo, Jan Hendrik Leopold, Lieuwe D. J. Bos, Riccardo Bellazzi, Ameen Abu-Hanna
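
Of the performance measures named in the abstract above, the Brier score is the one that requires probabilistic output. The following minimal sketch shows how it differs from accuracy; the data are made up for illustration, the 0.5 decision threshold is an assumption, and nothing here reproduces the paper's models or results.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the outcome."""
    hits = sum(1 for y, p in zip(y_true, p_pred) if (p >= threshold) == (y == 1))
    return hits / len(y_true)


if __name__ == "__main__":
    outcomes = [1, 0, 1, 0]            # made-up labels
    confident = [0.9, 0.1, 0.9, 0.1]   # well-separated probabilities
    hesitant = [0.6, 0.4, 0.6, 0.4]    # same decisions, weaker probabilities
    for name, probs in [("confident", confident), ("hesitant", hesitant)]:
        print(name, accuracy(outcomes, probs), brier_score(outcomes, probs))
```

Both sets of predictions yield the same accuracy, but the Brier score is lower (better) for the more confident, correct probabilities, which is the kind of information a non-probabilistic classifier cannot provide.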

Knowledge Representation and Guidelines


Detecting New Evidence for Evidence-Based Guidelines Using a Semantic Distance Method

To ensure timely use of new results from medical research in daily medical practice, evidence-based medical guidelines must be updated using the latest medical articles as evidence. Finding such new relevant medical evidence manually is time-consuming and labor-intensive. Traditional information retrieval methods can improve the efficiency of finding evidence from the medical literature, but they usually require a large training corpus for determining relevance. This means that neither the manual approach nor traditional IR approaches are suitable for automatically finding new medical evidence in real time. This paper proposes the use of a semantic distance measure to automatically find relevant new evidence to support guideline updates. The advantage of using our semantic distance measure is that this relevance measure can be easily obtained from a search engine (e.g., PubMed), rather than by gathering a large corpus for analysis. We have conducted several experiments that use our semantic distance measure to find new relevant evidence for guideline updates. We selected two versions of the Dutch Breast Cancer Guidelines (2004 and 2012), and we checked whether the new evidence items in the 2012 version could be found by using our method. The experiment shows that our method not only finds at least some evidence for 10 out of the 16 guideline statements in our experiment (i.e. a reasonable recall), but also returns reasonably small numbers of evidence candidates (i.e. a good precision) with an acceptable real-time performance (an average of approximately 10 minutes for each guideline statement).

Qing Hu, Zhisheng Huang, Annette ten Teije, Frank van Harmelen
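
The abstract above does not give the formula of the semantic distance measure. One well-known distance that can be computed from search-engine hit counts alone is the Normalized Google Distance; the sketch below implements that formula as an illustration, on the assumption that the paper's measure is of a similar kind. The hit counts are invented, and no search engine is actually queried.

```python
import math


def normalized_distance(hits_x, hits_y, hits_xy, total_docs):
    """Normalized Google Distance from hit counts.

    hits_x, hits_y: documents matching each term alone.
    hits_xy: documents matching both terms.
    total_docs: size of the indexed collection.
    """
    if hits_xy == 0:
        return float("inf")  # the terms never co-occur
    log_x, log_y = math.log(hits_x), math.log(hits_y)
    numerator = max(log_x, log_y) - math.log(hits_xy)
    denominator = math.log(total_docs) - min(log_x, log_y)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two terms in a collection of one million documents.
    print(normalized_distance(1000, 100, 10, 1_000_000))
```

Terms that always occur together get a distance of 0, and the distance grows as co-occurrence becomes rarer relative to the individual counts, so candidate articles could be ranked by their distance to the terms of a guideline statement.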

Analyzing Recommendations Interactions in Clinical Guidelines: Impact of Action Type Hierarchies and Causation Beliefs

Accounting for patients with multiple health conditions is a complex task that requires analysing potential interactions among recommendations meant to address each condition. Although some approaches have been proposed to address this issue, important features still require more investigation, such as (re)usability and scalability. To this end, this paper presents an approach that relies on reusable rules for detecting interactions among recommendations coming from various guidelines. It extends previously proposed models by introducing the notions of action type hierarchy and causation beliefs, and provides a systematic analysis of relevant interactions in the context of multimorbidity. Finally, the approach is assessed based on a case-study taken from the literature to highlight the added value of the approach.

Veruska Zamborlini, Marcos da Silveira, Cedric Pruski, Annette ten Teije, Frank van Harmelen

A General Approach to Represent and Query Now-Relative Medical Data in Relational Databases

Now-related temporal data play an important role in the medical context. Current relational temporal database (TDB) approaches are limited since (i) they (implicitly) assume that the span of time occurring between the time when facts change in the world and the time when the changes are recorded in the database is exactly known, and (ii) do not explicitly provide an extended relational algebra to query now-related data. We propose an approach that, widely adopting AI symbolic manipulation techniques, overcomes the above limitations.

Luca Anselma, Luca Piovesan, Abdul Sattar, Bela Stantic, Paolo Terenziani

Temporal Conformance Analysis of Clinical Guidelines Execution

Physicians often have to combine clinical guideline recommendations with their own basic medical knowledge to cope with specific patients in specific contexts. Both knowledge sources may include temporal constraints for the execution of actions. In this paper we approach the problem of compliance analysis with both sources of knowledge, pointing out discrepancies – including temporal ones – with respect to them, and where such discrepancies may be due to multiple and possibly conflicting recommendations.

Matteo Spiotta, Paolo Terenziani, Daniele Theseider Dupré

Combining Decision Support System-Generated Recommendations with Interactive Guideline Visualization for Better Informed Decisions

The main task of decision support systems based on computer-interpretable guidelines (CIG) is to send recommendations to physicians, combining patients’ data with guideline knowledge. Another important task is providing physicians with explanations for such recommendations. For this purpose, some systems may show, for every recommendation, the guideline path activated by the reasoner. However, the fact that the physician does not have a global view of the guideline may represent a limitation. Indeed, there are instances (e.g. when the clinical presentation does not perfectly fit the guideline) in which an analysis of the alternatives that were not activated by the system becomes warranted. Furthermore, possibly valid alternatives may not have been activated because of missing data or incorrect knowledge representation. This paper illustrates a CIG implementation that combines the two functionalities, i.e., sending specific recommendations and allowing meaningful navigation of the entire guideline. The training example concerns atrial fibrillation management.

Lucia Sacchi, Enea Parimbelli, Silvia Panzarasa, Natalia Viani, Elena Rizzo, Carlo Napolitano, Roxana Ioana Budasu, Silvana Quaglini

